This is a pretty clever use of machine learning to solve a prevalent problem.
- A high-definition reference image of person A is sent to person B
- After that, most of the data sent from A to B is a stream of low-definition images
- The high-def reference image is combined with each new image, animating the parts that move (mouth, eyes, head position, etc.)
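The steps above can be sketched as a toy encoder. This is only an illustration of the payload sizes, not the actual system: `detect_keypoints`, the keypoint count, and the frame dimensions are all assumptions standing in for the real learned models.

```python
import numpy as np

FRAME_SHAPE = (720, 1280, 3)   # hypothetical HD frame, 8-bit RGB
NUM_KEYPOINTS = 10             # hypothetical number of facial keypoints

def detect_keypoints(frame: np.ndarray) -> np.ndarray:
    # Placeholder: a real system runs a learned keypoint detector here.
    return np.zeros((NUM_KEYPOINTS, 2))

def encode_frame_traditional(frame: np.ndarray) -> bytes:
    # Traditional path: every frame carries full pixel data
    # (shown uncompressed here purely for scale).
    return frame.tobytes()

def encode_frame_neural(frame: np.ndarray) -> bytes:
    # Neural path: the sender transmits only keypoint positions;
    # the receiver warps its cached high-def reference image to match.
    keypoints = detect_keypoints(frame)              # shape (NUM_KEYPOINTS, 2)
    return keypoints.astype(np.float32).tobytes()

frame = np.zeros(FRAME_SHAPE, dtype=np.uint8)
full = len(encode_frame_traditional(frame))   # 720 * 1280 * 3 = 2,764,800 bytes
tiny = len(encode_frame_neural(frame))        # 10 * 2 * 4 = 80 bytes
print(full, tiny)
```

The per-frame payload collapses from megabytes of pixels to a few dozen bytes of coordinates, which is where the bandwidth savings come from.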
The researchers report remarkable results: by replacing the traditional H.264 video codec with a neural network, they reduced the bandwidth required for a video call by roughly three orders of magnitude. In one example, the required data rate fell from 97.28 KB/frame to a measly 0.1165 KB/frame – about 0.12% of the original bandwidth.
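The ratio follows directly from the two figures quoted above:

```python
h264_kb_per_frame = 97.28      # figure quoted for the H.264 baseline
neural_kb_per_frame = 0.1165   # figure quoted for the neural codec

ratio = neural_kb_per_frame / h264_kb_per_frame
print(f"{ratio:.4%}")          # ~0.12% of the original bandwidth
print(f"{1 / ratio:.0f}x")     # roughly an 835x reduction
```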
Since the image you see is actually an animation and not a video, it can do some pretty spooky stuff, like twisting your head for you to make sure you’re looking into the camera or replacing your face with an animated avatar.
What we have ahead of us is certainly double-edged:
- A potentially incredible jump in the reliability of video calls
- Another nail in the coffin for unverified online identity, since it makes it possible to appear in a Zoom call wearing the face of anyone you have a good reference photo of.