Thank you for your outstanding work!
I understand that the current solution operates on a single-frame basis with 2D input, similar to GeneFace++. Although a video-driven solution is available, inference still appears to run frame by frame.
I am exploring the application of these audio-to-face solutions in a 3D video streaming system that uses depth sensors to capture data. With depth information and data from previous frames, I believe inference could be accelerated and reconstruction quality improved.
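To make the idea concrete, here is a minimal sketch of the kind of temporal reuse I have in mind. Everything here is hypothetical: `encode` stands in for whatever heavy single-frame network the pipeline runs, and `TemporalCache` simply skips re-running it when consecutive depth frames barely differ, on the assumption that neighboring frames in a stream are strongly correlated.

```python
def encode(depth_frame):
    """Stand-in for the heavy per-frame audio-to-face network (hypothetical)."""
    return sum(depth_frame) / len(depth_frame)

class TemporalCache:
    """Reuses the previous frame's result when the new depth frame is nearly unchanged."""

    def __init__(self, tol=1e-2):
        self.tol = tol          # max per-pixel depth change before recomputing
        self.prev_frame = None
        self.prev_feat = None
        self.reused = 0         # how many frames were served from the cache

    def infer(self, depth_frame):
        if self.prev_frame is not None:
            delta = max(abs(a - b) for a, b in zip(depth_frame, self.prev_frame))
            if delta < self.tol:
                self.reused += 1
                return self.prev_feat  # temporal coherence: skip recomputation
        feat = encode(depth_frame)
        self.prev_frame, self.prev_feat = list(depth_frame), feat
        return feat
```

In a real system the threshold test would likely be replaced by something smarter (e.g. warping the previous frame's features using the depth map), but even this crude form illustrates where depth and inter-frame data could cut per-frame cost.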
I would appreciate any insights or advice on this approach. Thank you!