Sunday, August 21, 2022


DFL-SAEHDBW - The Grayscale deepfake model now Renders Back in Color - Colorization and Color Stabilization with Pix2Pix model etc.

I had colorization in mind before I started refactoring DeepFaceLab to include grayscale models, and I recently added this functionality, for now as a proof of concept (POC) to be published later. The colorization is done with a Pix2Pix model (based on the Colab example with the facades, maps etc.), trained on the faceset of the color video to convert grayscale faces to color. A color stabilization step was required for a more pleasant output, because without it there was slight but noticeable flickering. So far the experiments were on a single video segment only (about 1000 frames used to train the pix2pix model) and without pretraining on other, more varied faces - that is something to be done in the future.
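As an illustration of the training data described above, here is a minimal sketch of how a (grayscale input, color target) pair could be derived from one aligned color face. It is not the code used in the project: the function names `to_gray3` and `make_pair` are hypothetical, the RGB channel order is an assumption, and the BT.601 luma weights are just one common choice for grayscale conversion.

```python
import numpy as np

def to_gray3(color_face):
    """Convert an HxWx3 RGB uint8 face to a 3-channel grayscale image.

    Assumption: RGB channel order and BT.601 luma weights.
    """
    weights = np.array([0.299, 0.587, 0.114])
    luma = color_face.astype(np.float64) @ weights          # HxW
    gray = np.clip(np.rint(luma), 0, 255).astype(np.uint8)
    # Repeat the single channel so input and target have the same shape.
    return np.repeat(gray[..., None], 3, axis=-1)

def make_pair(color_face):
    """Return (input, target) for an image-to-image model: grayscale in, color out."""
    return to_gray3(color_face), color_face
```

Each face of the color faceset would yield one such pair, so a segment of about 1000 frames gives about 1000 training pairs.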

I am still pushing the limits of a GeForce GTX 750 Ti with 2 GB - it can now produce 192x192 color lip-synced deepfakes of reasonable quality. The SAEHDBW DF-UDT model is about 345 MB initially (a bit more when trained), and the Pix2Pix model is about 131 MB.

Note that the pix2pix model fit in GPU memory only at 128x128, but sharpening at the end of the pipeline makes the image look even better than the original grayscale output. Training at 256x256 on the CPU is a possible option too, because the pix2pix model seems to be fast, and the color stabilization may repair some fluctuations, i.e. the model may not need to be perfectly trained to produce decent results - that is to be verified with other videos. I haven't tried to colorize the Arnold model yet.

Stoltenberg's glasses are reasonably depicted in most cases, except for a few small glitches from the grayscale model.


After investigating the properties of the colorized faces and debugging the merging, I successfully applied an idea for stabilizing the colorized output and merging with precomputed faces (which is useful for other purposes as well, e.g. prerendered 3D models or synchronously performing faces). In the video example below the output is also sharpened after merging (over the whole frame) - eventually the sharpening should be applied per face only, or with some antialiasing.

See a merged and sharpened segment with Jens, whole frame:

Only aligned faces:

The raw colorized face from the pix2pix model, without color stabilization, was flickering; it was not very bad, but still noticeable, especially at some moments.

After color-gamma stabilization, that artifact was gone (only the aligned face, 146 KB): 

The color-gamma stabilization is done by first probe-rendering all faces and computing the total pixel weight of each frame and the average over all frames. The gamma of each frame is then adjusted toward that average in order to flatten the fluctuations: if the face is too dark it gets lighter, and vice versa. This phenomenon itself seems to reveal some intrinsic properties of the pix2pix model.
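The stabilization step can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's actual code: the names `frame_brightness` and `stabilize_gamma` are hypothetical, the "pixel weight" is taken to be the mean pixel value (equivalent to the total for equally sized faces), and the gamma is chosen so that a frame's mean level maps onto the sequence average.

```python
import numpy as np

def frame_brightness(face):
    """Mean pixel value of a uint8 face image, scaled to [0, 1]."""
    return float(np.mean(face.astype(np.float64) / 255.0))

def stabilize_gamma(faces, eps=1e-6):
    """Return gamma-corrected copies of `faces` with brightness pulled
    toward the average brightness of the whole sequence."""
    # Probe pass: measure every frame, then take the sequence average.
    levels = np.array([frame_brightness(f) for f in faces])
    target = float(np.clip(levels.mean(), eps, 1.0 - eps))
    out = []
    for face, level in zip(faces, levels):
        level = float(np.clip(level, eps, 1.0 - eps))
        # Pick gamma so that level ** gamma == target:
        # dark frames get gamma < 1 (lighter), bright frames gamma > 1 (darker).
        gamma = np.log(target) / np.log(level)
        x = face.astype(np.float64) / 255.0
        y = np.power(x, gamma)
        out.append(np.clip(np.rint(y * 255.0), 0, 255).astype(np.uint8))
    return out
```

Because the gamma curve is nonlinear, the corrected mean of a real face lands only approximately on the target, but that is enough to flatten frame-to-frame fluctuations. The corrected faces are then the ones passed on to the merge step.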

Finally, the corrected faces are sharpened and the merging is performed with them.
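For the sharpening, a simple per-face unsharp mask is one possible approach. The sketch below is an assumption about how such a step might look, not the filter actually used in the pipeline; `box_blur3` and `unsharp_mask` are hypothetical helper names, and the 3x3 box blur stands in for whatever blur kernel one prefers.

```python
import numpy as np

def box_blur3(img):
    """3x3 box blur with edge replication; img is an HxW or HxWxC float array."""
    pad = [(1, 1), (1, 1)] + [(0, 0)] * (img.ndim - 2)
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape[:2]
    acc = np.zeros(img.shape, dtype=np.float64)
    # Sum the nine shifted copies of the image, then average.
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0

def unsharp_mask(face, amount=1.0):
    """Sharpen a uint8 face by adding back the difference from its blurred copy."""
    x = face.astype(np.float64)
    sharpened = x + amount * (x - box_blur3(x))
    return np.clip(np.rint(sharpened), 0, 255).astype(np.uint8)
```

Applying this to each aligned face before merging, instead of to the whole frame afterwards, keeps the background untouched; a smaller `amount` gives a milder effect.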


* The neural model didn't capture the blue tint of the model's eyes, but it has a small excuse: the eye color in the video varies, and there are even frames where the ground-truth eyes have different colors - one is a very gray-bluish purple and the other is brown.
