Training-Free Neural Matte Extraction for Visual Effects: Limitations and Conclusion

6 Jul 2024


(1) Sharif Elcott, Google Japan (equal contribution);

(2) J.P. Lewis, Google Research USA (equal contribution);

(3) Nori Kanazawa, Google Research USA;

(4) Christoph Bregler, Google Research USA.


We have introduced a matte extraction approach using the deep image prior (DIP). The algorithm is simple, requiring only a few tens of lines of code modification to an existing U-net. Our approach is training-free and is thus particularly suitable for the diverse, few-of-a-kind subjects in entertainment video production. It may also be of intrinsic theoretical interest in terms of the nature and solution of the matte extraction problem. A further potential use would be to produce ground-truth mattes for training deep learning models. As is the case with many matting algorithms, it assumes coarse guidance in the form of a trimap or similar constraints. This guidance can be created by the artist using readily available semi-automatic tools.
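To make the role of the trimap concrete, the following is a minimal sketch of the standard compositing model, I = alpha * F + (1 - alpha) * B, combined with a penalty that ties alpha to the trimap in its known regions. It is an illustration only: the function names, the single-channel pixel lists, the squared-error terms, and the `weight` parameter are assumptions made for this example, not the loss used in the paper.

```python
def composite(alpha, fg, bg):
    """Standard matting model per pixel: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def matting_loss(image, alpha, fg, bg, trimap, weight=1.0):
    """Hypothetical objective: reconstruction error plus a trimap penalty.

    trimap entries: 1.0 = known foreground, 0.0 = known background,
    None = unknown region (left unconstrained).
    """
    recon = composite(alpha, fg, bg)
    recon_err = sum((r - i) ** 2 for r, i in zip(recon, image))
    constraint_err = sum(
        (a - t) ** 2 for a, t in zip(alpha, trimap) if t is not None
    )
    return recon_err + weight * constraint_err


# Three grayscale pixels: foreground, background, and a soft edge.
fg = [1.0, 1.0, 1.0]
bg = [0.0, 0.0, 0.0]
alpha = [1.0, 0.0, 0.5]
trimap = [1.0, 0.0, None]

image = composite(alpha, fg, bg)
good_loss = matting_loss(image, alpha, fg, bg, trimap)
bad_loss = matting_loss(image, [0.0, 0.0, 0.5], fg, bg, trimap)
```

In this toy setting, an alpha that both reproduces the image and agrees with the trimap has zero loss, while an alpha that contradicts the known-foreground pixel is penalized by both terms.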

Computational cost is the major limitation of the method, as it is for classic methods [Levin et al. 2008]. Compute times for the examples shown in the paper are measured in minutes (not hours) on a single previous-generation Nvidia Volta GPU. This restricts the use of our algorithm to high-quality offline applications where extensive non-real-time computation is the norm, primarily movies and videos. On the other hand, the computation can take advantage of the multi-GPU support provided by deep learning frameworks, and intermediate results can be visualized.

Our method can produce temporally consistent matte extractions from video by warm-starting the optimization from the previous frame (see accompanying video). In our experience, however, this requires that the trimaps move smoothly from frame to frame. A topic for future work is to consider recurrent or other network architectures that might make the trimap choice more forgiving. This paper has focused on introducing the DIP matting algorithm. There was relatively little architecture and parameter exploration, and further improvements may be possible.
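The warm-starting idea can be illustrated with a toy optimization. The sketch below is not the DIP network: a simple quadratic objective stands in for the per-frame fit, and all names (`optimize`, `run_sequence`, `frames`) and parameter values are hypothetical. In the actual method the carried-over state would presumably be the network parameters, which the text does not spell out. The sketch only shows why starting from the previous frame's solution helps when consecutive frames are similar.

```python
def optimize(target, init, lr=0.25, tol=1e-3, max_steps=10_000):
    """Minimize sum((x - target)^2) by gradient descent.

    Returns the solution and the number of update steps taken.
    """
    x = list(init)
    for step in range(max_steps):
        grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        if max(abs(g) for g in grad) < tol:
            return x, step
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x, max_steps


def run_sequence(frames, warm_start):
    """Optimize every frame; optionally reuse the previous frame's solution."""
    total_steps = 0
    solutions = []
    x = [0.0] * len(frames[0])
    for target in frames:
        init = x if warm_start else [0.0] * len(target)
        x, steps = optimize(target, init)
        solutions.append(x)
        total_steps += steps
    return solutions, total_steps


# Toy "alpha values" for three pixels, drifting slowly across three frames.
frames = [
    [0.50, 0.20, 0.90],
    [0.52, 0.22, 0.88],
    [0.54, 0.24, 0.86],
]

cold_solutions, cold_steps = run_sequence(frames, warm_start=False)
warm_solutions, warm_steps = run_sequence(frames, warm_start=True)
```

Because each frame's optimum lies close to the previous one, the warm-started run needs fewer total steps than the cold-started run. Each solution is also initialized near its predecessor, which is one plausible reason warm-starting favors temporal consistency.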


G.G. Heitmann, Peter Hillman, and Kathleen Beeler gave helpful insights and feedback.


This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.