
FW Plugin Sequence vs. Webcam

Posted By Kelleytoons 4 Years Ago
Kelleytoons
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 3 hours ago
Posts: 7.7K, Visits: 16.7K
Will folks who know (which basically means those working for RL, or closely WITH RL, or who were at Siggraph talking with the FW folks -- even folks playing with the FW demo won't really know) comment here on the quality of lip-sync capture from a video sequence vs. a webcam?

Here's the issue: even with a very fast hard drive, I/O is not going to be nearly as good as streaming from a webcam.  And we already know that getting 60fps or higher (at 720p) is the key to really high-quality lip-sync.  I worry that a video sequence, even captured at 60fps, will mean so many thousands of images coming off a drive that it will be *really* hard to get the same quality of lip-sync they tout from the webcam interface.

(Rampa may argue we can "trick" the plugin into accepting a video file, and that's possible, but we may still end up with I/O issues, and the trick may not deliver as fast a frame rate as we need.)

I'd like to know this because it's crucial to getting good lip-sync from remote actors -- if it MUST be done via webcam, then obviously accommodations will need to be made that otherwise wouldn't be.

Looking to hear specifically from either Peter or Stuckon3D, who have intimate knowledge of this (one theoretical, one practical ;)).



Alienware Aurora R7, Win 10, i7-8700k, 4.7GHz CPU, 32GB RAM, GTX Titan XP (12GB), Samsung 960 Pro 2TB M-2 SSD, TB+ Disk space
Mike "ex-genius" Kelley
Jfrog
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: Yesterday
Posts: 529, Visits: 3.8K
Good point!  And what about the sound? There is no production sound attached to an image sequence, so how can FW record audio simultaneously with the images for lip-sync? It would have to be a two-step process.

Intel I7 6850K 3.6 with MSI X99 SLI plus , GTX 1080,  32Gb Ram , 1 TB Samsung Pro M.2 SSD
Kelleytoons
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 3 hours ago
Posts: 7.7K, Visits: 16.7K
I'm guessing, J, that the sequence would be coming from whatever video you record (for example, you can export a sequence from Premiere Pro of any video).  So whatever audio you recorded at the time will end up being your audio track.

*However* -- if your point is how do you use the audio to "help" with the lip-sync (something I think is possible with live webcam recording -- don't know a lot of details about the FW other than the demos I've seen) then that's an excellent question.  Can we feed the audio track in along with that sequence?  If not, again, then the ability to use a sequence becomes a lot less attractive (and talent will need in-studio recording).

Which then begs the question: can we install this on both our desktop AND our laptop?  I would hope so, and not have to deal with any licensing hassles, because pretty obviously we'll need it on a portable solution for recording actors who can't come to the studio as well as for those times we need it in the studio.  I know iClone allows this with its base software, but I'm going to guess FW will be a problem (and that needs to change).



animagic
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 14 minutes ago
Posts: 13.1K, Visits: 23.2K
If you have an SSD then throughput isn't really an issue, but even a regular hard drive should be able to handle it.

I run image sequences from a regular hard drive for my video editing, which is 30 fps @ 1920x1080, and it plays back without issue. 60 fps @ 1280x720 is about the same throughput.
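A rough back-of-the-envelope check (assuming worst-case uncompressed 24-bit RGB frames; real JPG/PNG sequences are far smaller) shows the two formats really are in the same ballpark:

```python
# Worst-case disk throughput for an uncompressed 24-bit RGB image sequence.
# Compressed formats (JPG, PNG) need far less than this.
def throughput_mb_s(width, height, fps, bytes_per_pixel=3):
    """Raw data rate in MB/s for a given resolution and frame rate."""
    return width * height * bytes_per_pixel * fps / 1e6

editing = throughput_mb_s(1920, 1080, 30)   # ~187 MB/s worst case
faceware = throughput_mb_s(1280, 720, 60)   # ~166 MB/s worst case
print(f"1080p30: {editing:.0f} MB/s, 720p60: {faceware:.0f} MB/s")
```

Both figures are well within what a modern SSD (and even a fast spinning disk, for compressed frames) can sustain.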

Audio should be recorded separately for quality reasons; webcam audio wouldn't be sufficient anyway. It is not required for lip-syncing, as that is taken care of by the facial mocap, from what I understand.

You will need a powerful PC though, so any laptop requirements would be pretty severe as well. For my latest system I wanted a more portable solution and considered a laptop, but in the end I didn't, mainly because I didn't see how it could be cooled sufficiently and be quiet at the same time. So I built a "transportable" PC as a compromise, and it works well.



https://forum.reallusion.com/uploads/images/1f3bed13-1788-45e1-9ace-4fd0.jpg


Kelleytoons
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 3 hours ago
Posts: 7.7K, Visits: 16.7K
Ani,

That is NOT my experience at all (and my PC is about the highest end you could buy).  If I am editing a video then, yes, my NLE does just fine, but if I have a sequence of stills (like those produced from iClone) there is no way it will scrub easily on the timeline unless it precomposes that sequence (and even that isn't smooth but rather a "cheat," since it won't then compose at 60fps).  Now, I'm running off a very fast HD (because there is no way I'd have enough room to do this on my SSD -- that's another fallacy, as most folks won't put their stuff there).

I think you are VERY mistaken about this.  But I still want to hear from those who know, not those who speculate.



Edited 4 Years Ago by Kelleytoons
freerange
Posted 4 Years Ago
Veteran Member

Group: Forum Members
Last Active: 4 Years Ago
Posts: 165, Visits: 437
Image sequences load faster than video, since video goes through an extra decode step, which is why almost all VFX software prefers or even requires image sequences. Decoding for common video formats (H.264, ProRes, DNxHR, etc.) is done on the GPU, so it affects the performance of GPU-based apps. Frame sequences are mostly pure disk I/O, with the images handled on the CPU. For editorial it is the reverse, though you pretty much want your whole system (especially the GPU) dedicated to the editorial task. Most NLEs handle frame sequences just fine but do better with clips, and the workflow is WAY easier with clips; an NLE makes good use of ALL the system resources it can and will eat up all the GPU and disk I/O available. Finishing systems (think Resolve, NukeStudio, Flame, etc.) bridge the gap between VFX software and NLEs in that they treat an image sequence like a clip.

You can keep things pretty small and light with image sequences, too. We used JPG sequences just fine with Faceware. We tested as low as SD resolutions and it was fine as well, as long as the face was decently close to the camera. It is more about frame rate, though we used a lot of 30fps footage just fine; 720p at 60fps is pretty much the sweet spot, as has been mentioned. Even an older gaming box should be able to handle a 720p image sequence, especially if it is JPG.

Also remember that the Faceware processing happens on a frame-by-frame basis, so there are no dropped frames; you just lose real-time playback. The whole sequence will be handled without issue, so you will not lose any of the facial performance even if playback performance suffers.
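A minimal sketch of what that offline, per-frame workflow looks like (track_face here is a hypothetical stand-in for the plugin's tracker, not a real Faceware API):

```python
import glob
import os

def track_face(frame_path):
    # Hypothetical stand-in: in a real pipeline this would return the
    # facial animation data extracted from one frame.
    return {"frame": frame_path}

def process_sequence(folder, pattern="*.jpg"):
    # Frames are sorted and processed one at a time. If disk I/O is slow,
    # the job simply takes longer -- no frame is ever skipped, so the
    # captured performance stays complete regardless of playback speed.
    frames = sorted(glob.glob(os.path.join(folder, pattern)))
    return [track_face(f) for f in frames]
```

The point is that quality depends only on what's in the frames, not on how fast they can be read back.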

Unless you do something strange, audio stays locked, since it is generated from the same clip.

These days your working drive should be flash storage: an SSD, or even better an M.2 or U.2 (rarer) drive. Even consumer models are very good and are common up to 2TB, with a few larger models available (but probably too pricey). I am a big fan of Samsung when it comes to flash storage and HGST (owned by WD) when it comes to spinning disks (one of the few that attaches the disk spindle at both top and bottom, even on consumer disks). It will help with I/O performance more than just about anything else, and they are pretty cheap now. RAM would be the second thing I recommend: the more you can load into memory, the smoother things will run, and it helps keep system I/O from getting starved.

That said, you can do full HD 1080p 24fps editorial over a standard gigabit network, and that has much higher overhead than your local system does. I have noticed some odd performance issues with Windows 10 crop up every now and then, so make sure you keep your system and GPU drivers up to date.

On a side note, I'm curious about your thoughts on an idea I talked to Reallusion about, since you do editorial. With Python we could parse an XML file (Premiere) or AAF (Avid) and use that to build a sequence in iClone. The script could handle transcoding and prep for audio, video, and image sequences for iClone. You're pretty much doing a conform of your edit in iClone, and then you can use that to set shots with either editorial data or placeholders. You could then do the reverse, where rendering a shot in the sequence from iClone updates the edit in your NLE. You could take it one step further and add takes or versions for each clip within a shot so you could do performance takes. That's pretty much how most finishing systems work and how shot tracking is done (though usually not whole-sequence based) for VFX tools. I figured iClone users, who are usually handling most of the load on their own, would prefer a sequence-based solution vs. the typical VFX pipeline, which is shot-driven. It is all Python-driven, so it doesn't really need to come from Reallusion, but I was wondering if you think this would be a useful feature for users?
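The parsing half of that idea can be sketched with the standard library alone. This assumes an FCP7-style XML of the kind Premiere can export (element names like <clipitem>, <in>, <out>, and <rate> come from that interchange format); the iClone side is left as a comment, since it would go through iClone's scripting API:

```python
import xml.etree.ElementTree as ET

# Minimal FCP7-style sequence XML, the flavor Premiere can export.
# Frame counts in <in>/<out> are converted to seconds via the timebase.
SAMPLE = """
<xmeml version="4">
  <sequence>
    <rate><timebase>30</timebase></rate>
    <media><video><track>
      <clipitem><name>shot_010</name><in>0</in><out>90</out></clipitem>
      <clipitem><name>shot_020</name><in>90</in><out>210</out></clipitem>
    </track></video></media>
  </sequence>
</xmeml>
"""

def parse_shots(xml_text):
    """Extract shot names, start times, and durations from sequence XML."""
    root = ET.fromstring(xml_text)
    fps = int(root.findtext(".//rate/timebase"))
    shots = []
    for clip in root.iter("clipitem"):
        start = int(clip.findtext("in"))
        end = int(clip.findtext("out"))
        shots.append({"name": clip.findtext("name"),
                      "start_sec": start / fps,
                      "duration_sec": (end - start) / fps})
    return shots

# Each resulting dict could then drive shot setup in iClone
# (that step is omitted here).
```

Real Premiere exports are much richer (file paths, audio tracks, effects), but the conform logic is the same walk over clip items.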

Free Range
My IMDB
Edited 4 Years Ago by freerange
but0fc0ursee
Posted 4 Years Ago
Distinguished Member

Group: Banned Members
Last Active: 4 Years Ago
Posts: 595, Visits: 1.7K
Wow!
freerange droppin' knowledge.
+1
animagic
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 14 minutes ago
Posts: 13.1K, Visits: 23.2K
Kelleytoons (8/10/2017)
I think you are VERY mistaken about this.  But I still want to hear from those who know, not those who speculate.

I wasn't speculating...





Kelleytoons
Posted 4 Years Ago
Distinguished Member

Group: Forum Members
Last Active: 3 hours ago
Posts: 7.7K, Visits: 16.7K
freerange (8/10/2017)

You can keep things pretty small and light with image sequences, too. We used JPG sequences just fine with Faceware. We tested as low as SD resolutions and it was fine as well, as long as the face was decently close to the camera. It is more about frame rate, though we used a lot of 30fps footage just fine; 720p at 60fps is pretty much the sweet spot, as has been mentioned. Even an older gaming box should be able to handle a 720p image sequence, especially if it is JPG.

Also remember that the Faceware processing happens on a frame-by-frame basis, so there are no dropped frames; you just lose real-time playback. The whole sequence will be handled without issue, so you will not lose any of the facial performance even if playback performance suffers.

Unless you do something strange, audio stays locked, since it is generated from the same clip.


That's great info, particularly the last part (about handling the sequence without issues even if playback performance suffers).  I really don't care about real-time playback, since the performance will already be captured (and that's where the editing will have already taken place).  I'm *most* interested in remote talent, as nowadays all the folks I used to work with live on one of the coasts (for some odd reason they didn't retire with me to Florida ;)).  I intend to have them record their performances and feed them into FW -- and are you, or aren't you, confirming that audio IS used with FW to improve lip-sync?  I thought I had heard that detail in one of the demos.

Your idea about reading an XML file from Premiere is interesting, but I'd have to see what the goal would be.  Are you saying you'd use live footage inside Premiere to drive character-based scenes in iClone?  If so, it's an intriguing idea, but I think it would pertain more to a studio than to someone like myself, who's doing this for fun (mostly).  I like to see the full backgrounds, and I use iClone as its own storyboard first and foremost.  Then again, I have zero experience with FW, so perhaps that might be the best way to go (although how you'd handle multiple characters in a scene I don't know).





freerange
Posted 4 Years Ago
Veteran Member

Group: Forum Members
Last Active: 4 Years Ago
Posts: 165, Visits: 437
Unfortunately, audio has no effect on the tracking quality. One area where iClone needs some love is its viseme system. We did a test where we used Faceware for everything but the mouth: we used the text lip-sync from CrazyTalk (for some reason it works the best of their software), generated the lip-sync file, imported that into iClone, and then lined it up to another character that had audio-driven visemes to get correct spacing on the phonemes. This gave results that were on par with any AAA game character and better than a lot of TV 3D animation. Adobe Character Animator uses a similar system for its 2D face tracking: the face tracking drives the eyes and head, but visemes drive the mouth. iClone's viseme system is in need of a bit of an overhaul; it isn't even as accurate as their other software. They would make a lot of users really happy if they made procedural mouth animation more accurate.
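The hybrid approach described above -- mocap for everything except the mouth, a viseme track overriding the mouth -- can be sketched as a simple track merge. The channel names and the frame/channel data layout here are illustrative, not an actual iClone or Faceware format:

```python
# Hypothetical channel names: which controls belong to the mouth.
MOUTH_CHANNELS = {"jaw_open", "lip_pucker", "mouth_wide"}

def merge_tracks(mocap, visemes):
    """Combine two animation tracks, each shaped {frame: {channel: value}}.

    Eyes, head, and brows come from the facial mocap; any mouth channel
    present in the viseme track overrides the mocap's mouth data.
    """
    merged = {}
    for frame in sorted(set(mocap) | set(visemes)):
        # Keep every non-mouth channel from the mocap pass.
        pose = {ch: v for ch, v in mocap.get(frame, {}).items()
                if ch not in MOUTH_CHANNELS}
        # Mouth shapes come from the viseme pass.
        for ch, v in visemes.get(frame, {}).items():
            if ch in MOUTH_CHANNELS:
                pose[ch] = v
        merged[frame] = pose
    return merged
```

In practice the lining-up step freerange describes (matching the lip-sync file's timing to the audio) is the hard part; the merge itself is mechanical.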

For our project we were working with a lot of VO talent in LA, while the mocap folks and animators were in NY. A lot of the time the footage was just simple cell-phone footage the VO actors shot themselves. This worked very well for us, especially if they shot 60fps on their phone. One trick we used was to have the actor hold a neutral face pose for a little bit before going into their performance; this helps the tracker lock on better throughout the performance. If possible, shoot slightly up at the face, from chin or mouth height. We used tons of different cameras, and surprisingly the Logitech webcams gave some of the best results, even beating out Faceware's own HD cam setup. Eyes, head tracking, and expression were all pretty much equal, but you could see the benefits on the mouth, especially with challenging mouth shapes like the pucker. The Martin bros built their own helmet using the Logitech cam, which was really cool. We were never doing dialogue during body capture, so we didn't need a helmet cam and instead used a standard desktop setup.

Yeah, it would not help much if iClone is used as the storyboard. It's only useful if you do storyboards and an animatic in your NLE to work out timing and audio; then, using the NLE's XML export, you set up all your iClone shots automatically. Pretty much: open iClone and the audio is prepped and you have a backdrop image to animate against. We often had performers or artists act out the characters on video and used that as reference to animate against. So the edit became the hub of the work, and it fed out to all the tools we used, setting the correct in and out points for each shot and prepping the shots with all relevant reference material. If you are pretty much doing all of that in iClone itself, then it would be too much hassle, and it is always better to keep it simple.



