
Apple isn't standing still on generative AI, and making human models dance is proof

Apple has released a research paper discussing what it calls HUGS, a generative AI technique that can create a digital human avatar from a brief video in about 30 minutes of training.

Released via Apple's Machine Learning Research page and shared by Apple researcher Anurag Ranjan on X, "HUGS: Human Gaussian Splats" describes techniques for creating digital avatars of humans. Using machine learning and computer vision, the paper details a process that builds those avatars from relatively little source material.

Current neural rendering techniques are a marked improvement over earlier versions, but they are still best suited for "photogrammetry of static scenes and do not generalize well to freely moving humans in the environment," the paper's introduction explains.

Human Gaussian Splats (HUGS) uses a technique called 3D Gaussian Splatting to create an animatable human within a scene.
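For readers unfamiliar with the representation, the sketch below shows the per-primitive parameters a 3D Gaussian Splatting scene typically carries: a center, an anisotropic covariance factored into rotation and scale, an opacity, and a color. It is a toy illustration with invented names, not code from the paper.

```python
# Toy sketch of a single 3D Gaussian "splat" and its density falloff.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianSplat:
    mean: np.ndarray      # (3,) center of the splat in world space
    rotation: np.ndarray  # (3, 3) rotation matrix for the principal axes
    scale: np.ndarray     # (3,) standard deviation along each axis
    opacity: float        # base alpha in [0, 1]
    color: np.ndarray     # (3,) RGB; the real method stores spherical harmonics

    def covariance(self) -> np.ndarray:
        # Sigma = R S S^T R^T, which keeps the matrix positive semi-definite.
        S = np.diag(self.scale)
        return self.rotation @ S @ S.T @ self.rotation.T

    def falloff(self, x: np.ndarray) -> float:
        # Unnormalized Gaussian weight at point x, used to modulate opacity.
        d = x - self.mean
        return float(np.exp(-0.5 * d @ np.linalg.inv(self.covariance()) @ d))

# Rendering sorts the splats front-to-back per ray and alpha-composites them;
# that step is omitted here for brevity.
splat = GaussianSplat(np.zeros(3), np.eye(3), np.array([0.1, 0.1, 0.2]), 0.8, np.ones(3))
print(splat.falloff(np.array([0.05, 0.0, 0.0])))
```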

The method itself requires only a small amount of video of the subject, typically in motion within a scene and showing as many surfaces as possible for the system to work from. The technique can work from very short clips: in some cases, monocular video with as few as 50 to 100 frames, equating to roughly two to four seconds of 24fps footage.
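As a quick sanity check on those numbers, the arithmetic relating clip length to frame count at 24fps works out as follows; the helper below is purely illustrative.

```python
# Illustrative arithmetic only: mapping clip duration to frame count at 24fps.
def frame_count(duration_s: float, fps: int = 24) -> int:
    return round(duration_s * fps)

for seconds in (2, 3, 4):
    print(f"{seconds}s at 24fps -> {frame_count(seconds)} frames")
# 2s -> 48 frames and 4s -> 96 frames, in line with the 50-to-100-frame
# range quoted above.
```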

The system has been trained to "disentangle the static scene and a fully-animatable human avatar within 30 minutes," Apple claims.

While the SMPL body model is used to initialize the human Gaussians, it cannot capture every detail. The Gaussians are therefore allowed to deviate from the SMPL surface, filling in elements the body model doesn't represent, such as clothing and hair.
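As a rough sketch of that initialization idea, the Gaussian centers can start on the SMPL template surface (6,890 vertices in the standard model), with a learnable offset term that lets them drift to cover geometry the body model does not represent; the names and PyTorch parameterization below are assumptions, not the paper's code.

```python
# Hedged sketch: Gaussian centers initialized on the SMPL body surface, plus a
# learnable offset that lets the avatar deviate from the template for detail
# SMPL cannot express, such as clothing and hair.
import torch

smpl_vertices = torch.rand(6890, 3)  # placeholder for posed SMPL template vertices
offsets = torch.nn.Parameter(torch.zeros_like(smpl_vertices))  # learned deviation

def gaussian_centers() -> torch.Tensor:
    # During optimization, a photometric loss can push `offsets` away from zero
    # wherever the video shows geometry the SMPL template does not model.
    return smpl_vertices + offsets

print(gaussian_centers().shape)  # torch.Size([6890, 3])
```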

The paper also proposes optimizing the linear blend skinning weights that govern how the Gaussians move with the underlying skeleton during animation, improving the appearance of the deformed model.
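Linear blend skinning deforms each point by a weighted combination of the skeleton's joint transforms. The sketch below shows one plausible way to make those per-Gaussian weights learnable so they can be optimized alongside the splats; the shapes, names, and softmax parameterization are assumptions for illustration, not details taken from the paper.

```python
# Hedged sketch of learnable linear blend skinning weights (not the paper's code).
import torch

num_gaussians, num_joints = 6890, 24            # 24 joints, as in the SMPL skeleton
rest_centers = torch.rand(num_gaussians, 3)     # Gaussian centers in the rest pose
weight_logits = torch.nn.Parameter(torch.zeros(num_gaussians, num_joints))

def skin(joint_transforms: torch.Tensor) -> torch.Tensor:
    """Deform every Gaussian center by a convex mix of rigid joint transforms.

    joint_transforms: (num_joints, 4, 4) transforms for the current pose.
    """
    weights = torch.softmax(weight_logits, dim=-1)                      # (G, J), rows sum to 1
    homo = torch.cat([rest_centers, torch.ones(num_gaussians, 1)], -1)  # (G, 4) homogeneous coords
    blended = torch.einsum("gj,jab->gab", weights, joint_transforms)    # (G, 4, 4) per-Gaussian transform
    posed = torch.einsum("gab,gb->ga", blended, homo)                   # (G, 4) transformed points
    return posed[:, :3]

identity_pose = torch.eye(4).expand(num_joints, 4, 4)
print(skin(identity_pose).shape)  # torch.Size([6890, 3])
```

Optimizing the skinning weights jointly with the splat parameters is what lets the avatar's surface follow the skeleton cleanly when the model is animated.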

In the end, the time from training video to a "state-of-the-art rendering quality" animation of both the human model and the scene, rendered at 60fps at HD resolution, is about half an hour. Apple claims this is roughly 100 times faster than other methods, including NeuMan and Vid2Avatar.

The research paper lists its authors as Muhammed Kocabas, Rick Chang, James Gabriel, Oncel Tuzel, and Anurag Ranjan, and was produced in collaboration with the Max Planck Institute for Intelligent Systems.

Apple has been working on the idea of creating digital avatars for quite some time, with a high-detail version appearing in the Apple Vision Pro. To enable FaceTime conversations, as well as an external view of the user's eyes, the headset creates a digital "Persona," which is used in various ways to represent the user.



6 Comments

Marvin 19 Years · 15361 comments


This will help a lot in making 3D avatars quickly at high quality. They can do lighting control so the avatar will blend better with the virtual environment:

https://www.youtube.com/watch?v=1V85241UJmg

https://www.youtube.com/watch?v=s6Lz-qjs_mA

These don't have the uncanny valley feeling that CGI avatars do. Animating them (especially faces) can still be difficult but Meta's ones look good with animation.

3 Likes · 0 Dislikes
ddawson100 17 Years · 539 comments

Look, I have no idea how to assess the technical merits of this but it looks absolutely stunning and if training is that quick then 2024 is going to be an interesting year.

3 Likes · 0 Dislikes
9secondkox2 9 Years · 3188 comments

Sheesh that’s amazing. 

Sucks for those who’ve spent much of their lives getting good at these things. 


2 Likes · 0 Dislikes
FileMakerFeller 7 Years · 1561 comments

Is the next technique going to be called KISSES?

1 Like · 0 Dislikes
chasm 11 Years · 3641 comments

We’ve actually already seen an earlier version of HUGS in the demo of the digital persona on a FaceTime call when the user has on their Apple Vision Pro.

That demo in and of itself was impressive, now imagine the months that have been spent improving it since June.