FW Plugin Sequence vs. Webcam

FW Plugin Sequence vs. Webcam

Home
»
Archive
»
iClone 7
»
General

https://forum.reallusion.com/Topic336480.aspx

Print Topic | Close Window

By Kelleytoons - 8 Years Ago

Will folks who know (which basically means either those working for RL, or closely WITH RL, or who have been there at Siggraph talking with the FW folks -- even folks playing with the FW demo won't really know) comment here about the quality of lip-sync capture of a video sequence vs. a webcam?

Here's the issue: even with a very fast hard drive, I/O is not going to be nearly as good as streaming from a webcam. And we already know that getting that 60fps or higher (at 720) is the key to really quality lipsync. I worry that a video sequence, even captured at 60fps, will result in so many thousands of images coming off a drive that it will be *really* hard to get the same quality of lip sync they tout from the webcam interface.

(Rampa may argue we can "trick" the plugin into accepting a video file and that's possible, but we may still end up with I/O issues and the trick may not be as fast a frame rate as we need).

I'd like to know this because it's crucial to getting good lip sync from remote actors -- if it MUST be done via webcam then obviously some accommodations need to be made that won't otherwise.

Looking to hear specifically from either Peter or Stuckon3D, who have intimate knowledge of this (one theoretical, one practical :>).

By Jfrog - 8 Years Ago

Good point! And what about the sound? There is no production sound attached to image sequences so how can FW record audio simultaneously with the image for lip-sync ? This will have to be a 2 steps process.

By Kelleytoons - 8 Years Ago

I'm guessing, J, that the sequence would be coming from whatever video you record (for example, you can export a sequence from Premiere Pro of any video). So whatever audio you recorded at the time will end up being your audio track.

*However* -- if your point is how do you use the audio to "help" with the lip-sync (something I think is possible with live webcam recording -- don't know a lot of details about the FW other than the demos I've seen) then that's an excellent question. Can we feed the audio track in along with that sequence? If not, again, then the ability to use a sequence becomes a lot less attractive (and talent will need in-studio recording).

Which then begs the question: can we install this on both our desktop AND our laptop? I would hope so and not deal with any shit licensing problems, because pretty obviously we'll need it on a portable solution for recording actors who can't come to the studio as well as those times we need it in the studio. I know iClone allows this with their base software, but I'm going to guess FW will be a problem (and that needs to change).

By animagic - 8 Years Ago

If you have an SSD then throughput isn't really an issue, but even a regular hard drive should be able to handle it.

I run image sequences from a regular hard drive for my video editing, which is 30 fps @ 1920x1080, and it plays back without issue. 60 fps @ 1280x720 is about the same throughput.

Audio should be recorded separately for quality considerations; webcam audio wouldn't be sufficient anyway. It is not required for lip-synching as this is taken care of by the facial Mocap from what I understand.

You will need a powerful PC though, so any laptop requirements would be pretty severe as well. For my latest system I wanted a more portable solution and considered a laptop, but in the end I didn't, mainly because I didn't see how it could be cooled sufficiently and be quiet at the same time. So I built a "transportable" PC as a compromise, and it works well.

By Kelleytoons - 8 Years Ago

Ani,

That is NOT my experience at all (and my PC is about the highest end you could buy). If I am editing a video then, yes, my NLE does just fine, but if I have a sequence of stills (like produced from iClone) then there is no way it will just scrub easily on the timeline unless it precomposes that sequence (and even that isn't smooth but rather a "Cheat" since it won't then compose at 60fps). Now, I'm running off a very fast HD (because there is no way I'd have enough room to do this on my SSD -- that's another fallacy, as most will not put their stuff there).

I think you are VERY mistaken about this. But I still want to hear from those who know, not those who speculate.

By freerange - 8 Years Ago

Image sequence load faster than video since video goes through extra decode process, which is why almost all VFX software prefers or even requires image sequences. Decode for common video formats like H.264, ProRes, DNxHR, etc are done on the GPU so affect performance of GPU based apps. Frame sequences are just pure disk IO for the most part and the image is handled on the CPU for the most part. For editorial though it is the reverse, though you pretty much want all your system (especially GPU) dedicated to the editorial task. Most NLEs while they handle frame sequences just fine, just do better with clips and the workflow is WAY easier with clips. NLEs make good sue of ALL the system resources it can and will eat up all the GPU and disk IO it can. Finishing systems bridge the gap between VFX software and NLE (think Resolve, NukeStudio, Flame etc) in that they treat an image sequence like a clip.

You can keep things pretty small and light too with image sequences. We used JPG sequences just fine with Faceware. We tested as low as SD resolutions and it was fine too, as long as the face was decently close to camera. It is more about frame rate though we used a lot of 30fps footage just fine. 720p 60fps is pretty much the sweet spot as has been mentioned. Even an older gaming box should be able to handle a 720p image sequence especially if it is JPG.

Also remember the processing for faceware happens on a frame by frame basis so there is no dropped frame, you just lose realtime playback, but the whole sequence will get handled without issue so you will not lose any of the facial performance even if playback performance suffers.

Unless you do something strange audio stays locked since it is generated from the same clip.

Anymore your working drive should be flash storage like SSD or even better an M.2 or U.2 (more rare)drive, even consumer based models are very good and are common up to 2TB with a few models having larger sizes (but probably too pricy). I am a big fan of Samsung with it comes to flash storage and HGST (Owned by WD) when it comes to spinning disks (one of the few that has the disk spindle attached both top and bottom even on consumer disks). It will help with IO performance more so than just about anything else and they are pretty cheap now. Ram would be the 2nd thing I recommend as the more you can load into memory the smoother things will run and helps keep the system IO from getting starved.

That said you can do full HD 1080P 24fps editorial over standard gigabit networks and that has much higher overhead than you system does. I have noticed some odd performance issues with Windows 10 crop up every now and then. So make sure you keep your system up to date and GPU drivers up to date.

On a side note, curious as to your thoughts on an idea I talked to Reallusion about since you do editorial. With python we could parse an XML file (premiere) or AAF (Avid) and use that to build a sequence in iClone. The script could handle transcoding and prep for audio, video and image sequences for iClone. Your pretty much doing a conform of your edit in iClone then you can use that to set shots with either editorial data or placeholders. You could then do the reverse where rendering a shot in the sequence from iClone updates the edit in your NLE. You could take it one step further and add takes or versions for each clip within a shot so you could do performance takes. Pretty much how most finishing systems work and how shot tracking is done (though usually not whole sequence based) for VFX tools. I figured for iClone users who are usually handling most the load on their own they would prefer a sequence based solution vs the typical VFX pipeline which is shot driven. It is all python driven so it doesn't really need to come from Reallusion but was wondering if you think this would be a useful feature for users?

By but0fc0ursee - 8 Years Ago

Wow!
freerange droppin' knowledge.
+1 :rolleyes:

By animagic - 8 Years Ago

Kelleytoons (8/10/2017)

I think you are VERY mistaken about this. But I still want to hear from those who know, not those who speculate.

I wasn't speculating...

By Kelleytoons - 8 Years Ago

freerange (8/10/2017)

You can keep things pretty small and light too with image sequences. We used JPG sequences just fine with Faceware. We tested as low as SD resolutions and it was fine too, as long as the face was decently close to camera. It is more about frame rate though we used a lot of 30fps footage just fine. 720p 60fps is pretty much the sweet spot as has been mentioned. Even an older gaming box should be able to handle a 720p image sequence especially if it is JPG.

Also remember the processing for faceware happens on a frame by frame basis so there is no dropped frame, you just lose realtime playback, but the whole sequence will get handled without issue so you will not lose any of the facial performance even if playback performance suffers.

Unless you do something strange audio stays locked since it is generated from the same clip.

That's great info, particularly the last (about handling the sequence without issues even if playback performance suffers). I really don't care about real time playback since the performance will already be captured (and that's where the editing will have already taken place). I'm *most* interested in remote talent, as nowadays all the folks I used to work with live on one of the coasts (for some odd reason they didn't retire with me to Florida :>). I intend to have them record their performances and feed them into FW -- and you are or aren't confirming that audio IS used with FW to improve lip sync? I thought I had heard that detail in one of the demos.

Your idea about reading an XML file from Premiere is interesting but I'd have to see what the goal would be. Are you trying to say using live footage inside of Premiere to drive character based scenes in iClone? If so, it's an intriguing idea, but I thought it would pertain more to a studio than to someone like myself, who's doing this for run (mostly). I like to see the full backgrounds and I use iClone as its own storyboard first and foremost. Then again, I have zero experience with FW so perhaps that might be the best way to go (although how you'd handle multiple characters in a scene I don't know).

By freerange - 8 Years Ago

Unfortunately audio has no affect on the tracking quality. One area iClone needs some love is with their viseme system. We did do a test where we used face ware for everything but the mouth. Used the text lip sync from Crazytalk (for some reason it works the best of their software), generated the lip-sync file and imported that into iClone and then lined it up to another character that had audio visme to get correct spacing on phonemes. This gave results that were equal with any AAA game character and better than a lot of TV 3D animation. Adobe Character Animator uses a similar system for their 2D face tracking, the face tracking drives eyes and head but viseme drives the mouth. iClones viseme system is in need of a bit of an overhaul, it isn't even as accurate as their other software. They would make a lot of users really happy if they made procedural mouth animation more accurate.

For our project we were working with a lot of VO talent in LA and the mocap folks and animators were in NY. A lot of the time the footage was just simple cell phone footage the VO actors shot themselves. This worked very well for us, especially if they shot 60fps on their phone. One trick we used was have the actor hold a neutral face pose for a little bit before going into their performance. This helps the tracker lock-on better throughout the performance. If possible shoot slightly up at their face so chin or mouth height. We used tons of different cameras and surprisingly the Logitech webcams gave some of the best results, even beating out facewares own HD cam setup. Eyes, head tracking, expression were all pretty much equal but you would could see the benefits on the mouth, especially with challenging mouth shapes like pucker. The Martin bros built their own helmet using the Logitech cam which was really cool. We were never doing dialogue during body capture so we didn't have the need for a helmet cam and instead used a standard desktop setup.

Yeah, would not help much if iClone was used as the storyboard. Only useful if you do storyboards and do an animatic in your NLE to work out timing and audio. Then using the NLE XML export you use that to setup all your iClone shots automatically. Pretty much open iClone and audio is prepped and you have a backdrop image to animate against. We often had performers or artists act out the characters on video and used that as reference to animate against. So the edit became the hub of the work and that fed out to all the tools we used to set the correct in and out points for each shot and prep the shots with all relevant reference material. If you are pretty much doing all that in iClone itself then it would be too much hassle and it is always better to keep it simple.

By Kelleytoons - 8 Years Ago

Thanks for that info on your VO talent. I will keep those tips in mind as I solicit my friends work.

I'm actually hopeful they expose the visemes to python scripting (which I have a ton of experience in) because then I can use Papagayo (the absolutely BEST lip-sync program I've ever found, and I've spend thousands of dollars on them) to generate the tracks and then script the visemes into iClone. Otherwise perhaps I'll try the CT route as well.

By Jfrog - 8 Years Ago

Am I missing something? I thought that Faceware allows simultaneous recording of facial recording along with the visems bypassing the Iclone visems workflow.

By Rampa - 8 Years Ago

Jfrog (8/11/2017)

Am I missing something? I thought that Faceware allows simultaneous recording of facial recording along with the visems bypassing the Iclone visems workflow.

Check this video. Stuckon is recording audio and mocap at the same time, and then can edit the visemes after the fact.

The actual facial mocap by itself is equivalent to Face Puppet, but driven by FW from the video.

By VirtualMedia - 8 Years Ago

Thanks for the share rampa, this related video shows the use of an image sequence with FW and worth noting their running demos through laptops. I'm sure their using high end laptops but they drive FW nicely.

By CtrlZ - 8 Years Ago

Even though These features are out of my range (experience) and price I am super excited to see what you guys do with this technology!

By paulg625 - 8 Years Ago

Kellytoon,
I'm curious when you say your system is the "highest end you could buy" but then talk about SSD room restriction because you can get Multiple SSD in 2TB and run them in a Raid configuration. or a 500 G M2 drive and use it strictly for swap memory. I think you were meaning Highest you could afford ( which is what we all do.) because highest you could buy runs into 20K+ trust me I've dreamed such systems just couldn't afford.

Sorry, Guess my point is your contradicting yourself when you speak of the restrictions your system has and it being the "highest end" . I think if you had the highest end or just a higher end possible add a large SSD you could get the results you are wanting.

Kelleytoons (8/10/2017)

By Kelleytoons - 8 Years Ago

I'm saying I HAVE SSDs but I would never use them for the HUNDREDS of video sequences I am using -- if you have a system that can run 20 SSDs in it I'd love to see it (I have around 40TBs of disk space -- but I'm sure you have plenty more than that, and all in SSD).

Grow up, guys. It's not practical to use SSDs for this kind of processing, unless you are going to be constantly cleaning and removing your videos (and I'm too old to have that kind of time in my life. Maybe that's what I should have said -- I'm not YOUNG enough to use SSD space :>)

By Jfrog - 8 Years Ago

"Check this video. Stuckon is recording audio and mocap at the same time, and then can edit the visemes after the fact."

Thank you Rampa.

Regarding SSDs, I was using 500 MB/s (read / write speed) SSD from Samsung, they are a huge improvement over Sata 3 HD drive (max 180 MB/s) but now my main "work drive" is a 1TB M2 drive by Samsung that is really fast (2000MB/s write, 3000 MB/s read). The pro model is as expensive as a GTX 1080 but it is worth it when working with huge file (4K video image sequences for example). When I am done with a project I just transfer it to one of my two 5TB internal drive, keeping only the current projects on the M2 SSD.

One question, on the video above they said that the set was create in "Dust" or "Dusk"? and imported into clone. I did not get the name properly. Does anybody knows what is the name of this set creator?

Thank you.

By paulg625 - 8 Years Ago

Wow,
I wasn't trying to be a jerk Kellytoon I was asking a serious question Just trying to understand your perspective and this topic.

Kelleytoons (8/11/2017)

By animagic - 8 Years Ago

I was hoping that the facial Mocap would NOT rely on iClone's current speech to viseme assignment as it is messy and requires a lot of cleanup. So more clarification would be welcome.

The CT route mentioned earlier only gives improvements when using TTS (where the text is used to aid the viseme assignment), otherwise the results are the same as with iClone.

3DTest devised a method that uses the TTS-generated script as a reference (see https://www.youtube.com/watch?v=DGs7idDhsPk ), but the hope is that text-support lip-sync will be integrated in iClone in the future.

By CaseClosed - 8 Years Ago

When I was test driving Faceware a couple of years ago, the animation was created using facial bones. It became clear that Reallusion had no intention of using bones in their characters, and that was the obstacle for my intended pipeline. It seems from watching those videos posted in this thread that Faceware and Reallusion created a plugin that uses morphs for facial animation, which works with Reallusion’s preferred approach. This makes sense for Faceware to be able to provide facial mocap at such a huge discount for us. Using the imbedded Viseme technology to fine-tune facial mocap is a great advantage for iClone, though I’d prefer to just get quality facial mocap the first time.

The primary concern I have is that the facial animation in iClone appears to not always be smooth, but rather jumps. I see this primarily in the lips. Is there a detailed demo of Faceware facial mocap using iClone’s plugin? I’d like lo see how well this new plugin will capture expressions and talking. Thanks!

By Jfrog - 8 Years Ago

I was hoping that the facial Mocap would NOT rely on iClone's current speech to viseme assignment as it is messy and requires a lot of cleanup. So more clarification would be welcome.

I really thought it wasn't. As it is implemented now, the lip sync decoding feature is far from being real. And I am not talking about the facial motions which will be solved with Faceware obviously. The facials for me is just ice on the cake but we need a cake.

The primary concern I have is that the facial animation in iClone appears to not always be smooth, but rather jumps. I see this primarily in the lips. Is there a detailed demo of Faceware facial mocap using iClone’s plugin? I’d like lo see how well this new plugin will capture expressions and talking.

I would love to see some demos too with different types of characters, real ones and cartoon characters too. Thank you!

By but0fc0ursee - 8 Years Ago

CaseClosed (8/11/2017)

Have you seen the RL_Max_Templates?
~ G1 thru G5
They earlier versions only had jaw and eye bones.
Then twist and roll bones were added.
Then RL added UI and a complete set of facial bones
"All of this Before IC6".... and the Bones are used for Lyp Snc.

It seems from watching those videos posted in this thread that Faceware and Reallusion created a plugin that uses morphs for facial animation, which works with Reallusion’s preferred approach.

Faceware exports "Bone Animation" that's refined with solvers and retargeting algorithms.
"I've NEVER seen any Markerless Facial Mocap Software "Export Vertex Animation DATA..... only bone animation data!"

Are you sure Faceware export vertex animation DATA???
Teach Me.... please.

Using the imbedded Viseme technology to fine-tune facial mocap is a great advantage for iClone, though I’d prefer to just get quality facial mocap the first time.

Allow me to break this down in easy to understand terms.

The embedded Viseme technology you speak of is "Progressive Morphing"
~ iClone Morphs, but does not have "Progressive Morphing"
Progressive Morphing is when you "Blend 2 or 3 Phonemes AND Visemes" all at ONCE to obtain that PERFECT look your after.

MYTH: Morph Animation provides better animation results than Bone-Base Animation.... Bolonga.
MYTH: Bone-Base animation provides better animation results and Vertex Animation..... Bolonga

Learn From Reallusion:
TODAY.... The most powerful method of achieving AWESOME lip syn is....
HYBRID RIG.... Bone-Base and BLEND MORPHS.

Refer to Reallusion's.... CROCODILE HUNTER
THIS is how you do it.

"All iClone needs is "PROGRESSIVE MORPHING" and iClone's Speech Engine would be COMLETE! :Wow::rolleyes:

The primary concern I have is that the facial animation in iClone appears to not always be smooth, but rather jumps. I see this primarily in the lips.

Yeah, this is what happens if you don't have Progressive Morphing.... It tends to "Snap" when blending "Certian Visemes to Certain Phonemes." :angry:

Is there a detailed demo of Faceware facial mocap using iClone’s plugin? I’d like lo see how well this new plugin will capture expressions and talking. Thanks!

No Worries Here.... the results are smooth..... The Analyzer and Retargeter Takes care of this.
...the sole reason it cost so dang much. :hehe:

By but0fc0ursee - 8 Years Ago

CaseClosed,
I understand your meaning when you said iClone's preferred method..... Morph (Vertex Animation)
Reallusion created the Morphing animation...

...and most love it, because it super easy.
But....

Using ONLY bone animation....
or....
Using ONLY Morph animation
"Dose NOT produce that beautiful, smooth facial mocap you seek."

By SeanMac - 8 Years Ago

@JFrog Re: One question, on the video above they said that the set was create in "Dust" or "Dusk"? and imported into clone. I did not get the name properly. Does anybody knows what is the name of this set creator?

I think they may have meant https://www.3d.sk/
Alternatively there is a NASA thing called Digital Shape Kernel https://naif.jpl.nasa.gov/pub/naif/self_training/individual_docs/27_shape_model_preview.pdf

Hope this helps.
Regards

Sean McHugh

By CaseClosed - 8 Years Ago

[b]but0fc0ursee (8/12/2017)the results are smooth..... The Analyzer and Retargeter Takes care of this.
...the sole reason it cost so dang much. Hehe

Thank you for your detailed response. Yes, I do remember now that there are face bones in the characters (thank you for reminding me), but using iClone 5 and iClone 6 I did not have access to those face bones for facial mocap. I exported the characters in multiple formats (G5, CC etc. iClone 5, iClone6) to Maya and the face bones were absent. Reallusion technical support basically told me that’s the way it is. I didn’t challenge them on their reasoning for this. I accepted they had a business strategy and that was that. I could not import face bone animation to an iClone character either. That’s why this news regarding Faceware integration into iClone is huge news for Reallusion and for me.

As I’ve said in another thread, Faceware provided a top quality facial mocap experience for me, and it’s exciting to hear their collaboration with iClone. It’s good to hear you say that the facial mocap plugin provides smooth quality animation. Looking forward to seeing a detailed demo. Thanks!

By but0fc0ursee - 8 Years Ago

CaseClosed (8/12/2017)

[b]but0fc0ursee (8/12/2017)the results are smooth..... The Analyzer and Retargeter Takes care of this.
...the sole reason it cost so dang much. Hehe

but using iClone 5 and iClone 6 I did not have access to those face bones for facial mocap. I exported the characters in multiple formats (G5, CC etc. iClone 5, iClone6) to Maya and the face bones were absent. Reallusion technical support basically told me that’s the way it is. I didn’t challenge them on their reasoning for this.

Regular iClone users.... Yes, that's the way it is for THEM...
But...

You have Maya... You DON'T have to settle for less.... I DIDN'T
No Face Bones - - - - Give it face bones
~ Build a UI in Maya to control the Bones... Reallusion does this with iClone 5
~ I learned this from the RL_G5_Max Template.

You can't possible say G5 RL bones are NOT present.
Also:
Look closely at the Face.... Reallusion added approx. 60 (CP points) to control the mesh.
CP points = Point Helpers (WITH BONE PROPERTIES) that control the mesh... just like bones do!

Non-Human WILL accept facial animation.
The moment you settle for less.... They got you by the .....
Buy this
Buy that
Buy these
Buy those

Oh... HeLp NO... I'll rig it myself and save! :rolleyes: