What would be a general workflow for working with mocap in iClone?


https://forum.reallusion.com/Topic272785.aspx

By Ibis Fernandez - 9 Years Ago
We were playing around last night and realized I really have no clue hahaha

Traditionally we use a slate or clapper to help us sync the sound and video together when shooting a film. In this mocap process with iClone, where the audio, body, and facial capture are each done as separate passes, what techniques are people using to sync up all their stuff?

Should one follow a traditional animation approach where audio is grabbed first so that the acting can be animated based on the audio?
Should the audio be captured at the same time as the mocap? If so, how do you guys keep it in sync? Does iClone have a real-time audio interface that can be used with the mocap?

I'd be interested in hearing about your experiences and how you've solved some of these issues.

So far what I've done is revert to a basic clap sound and gesture combo, where the sound of the clap can be synced to the motion of the character clapping. Pretty old school, but it works fine. Is there a more efficient approach?
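
(Side note: if you ever want to script that lining-up step, here is a minimal sketch, assuming numpy and scipy are installed and the audio has been exported as WAV. The file name and the 60 fps project rate are placeholders, and it simply treats the loudest transient in the recording as the clap.)

    # find_clap.py -- locate the clap in the audio and convert it to a frame number
    # (sketch only: file name and frame rate below are assumptions)
    import numpy as np
    from scipy.io import wavfile

    FPS = 60  # assumed iClone project frame rate

    rate, data = wavfile.read("take01_audio.wav")
    if data.ndim > 1:                       # mix stereo down to mono
        data = data.mean(axis=1)

    clap_sample = np.argmax(np.abs(data.astype(np.float64)))  # loudest transient
    clap_seconds = clap_sample / rate
    print(f"clap at {clap_seconds:.3f} s = frame {round(clap_seconds * FPS)}")
    print("slide the mocap clip so the character's clap pose lands on that frame")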






By james_muia - 9 Years Ago
The best thing to do is write yourself a script, and then write a shot list, and write a mocap list so you only capture what you need to.

A shot list is basically all of the camera angles you are going to capture - from that, create a mocap list based on each item on the shot list: what are the actors doing in each shot? If one shot is only of the bartender, then only focus on the bartender. You basically want to know which shots each of your actors is going to be in. Wide angle of the entire bar? Then you'll need to record motion for every character. Use the shot list to determine who's in each scene. Use your mocap list to document which motions you need to record for each character in each shot.

Just slapped this together but you get the idea:

Script: A group of people enter the bar
Shot list: A medium shot of the bartender, slowly zooming in waving to the people entering the bar.
Mocap list: Bartender waves as the group enters the bar.
Mocap list: Bartender puts down a glass he was cleaning, then waves as the group enters the bar.
Mocap list: Bartender turns around, then waves to the people entering the bar.
Etc, etc.
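
A side note: if you want that mocap list machine-readable too, even a plain CSV will do. Here's a minimal sketch (the mocap_list.csv file and its shot/character/motion columns are just one assumed layout) that groups the capture list per character so you can check takes off as you record:

    # mocap_list.py -- group the mocap list by character to see what still needs capturing
    # (sketch only; the CSV name and columns are assumptions)
    import csv
    from collections import defaultdict

    per_character = defaultdict(list)
    with open("mocap_list.csv", newline="") as f:
        for row in csv.DictReader(f):       # columns: shot, character, motion
            per_character[row["character"]].append(f'{row["shot"]}: {row["motion"]}')

    for character, motions in per_character.items():
        print(character)
        for m in motions:
            print("  -", m)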

How you were doing it in your video also works, but if you write the script and the shot list, and have a mocap list, it makes it a little easier.
By Kelleytoons - 9 Years Ago
It kind of depends on the software you are using.

ipiSoft (which I use) and, I believe, the new RL mocap stuff both capture audio at the same time, so if you need audio you can just do both at once. That's the way most mocap works now (performance capture, like what Sekeris does).

But otherwise a clapboard makes a lot of sense.
By Ibis Fernandez - 9 Years Ago
sw00000p (2/18/2016)
is there a more efficient approach?

Professional Technique.
Applying animation "Data" to bones is far more efficient!

This is what ALL mocap systems provide.
All you need is a program to "Show You the Data!"
...and of course the knowledge of what to do with it!


Really!? Why didn't I think of that!!! Did you even read the post?
By Ibis Fernandez - 9 Years Ago
sw00000p (2/18/2016)
Just a suggestion. Works for me.

I shoot the video in 2 passes.
 • First Pass: Sound and Body Animation.
 • Second Pass: Markerless Facial Motion Capture.

In Both Passes... I do this...
1. Extract the animation data and script the bones to accept it.
2. Smooth the animation with a curve editor.
Fast and Accurate.
Roll 'Em! :)

 • Camcorder
 • MotionBuilder
 • 3ds Max
 • Adobe Audition

I MAKE this work with Kinect and the latest Brekel software... for body animation.
I use Maskerad for markerless facial motion capture.

You can quickly do nearly the same with Perception Neuron.

MB has a new plugin..... This guy used 32 sensors.






Kind Regards,
sw00000p



Good post. Any idea how to make this workflow work with iClone using the least amount of third-party software possible? I mean, that's kind of the point here. But I do see that, with that hardware in place, why would anyone really want to use iClone when you can just use MotionBuilder and something else better suited for filmmaking?

Facial mocap with iClone, as far as I know, is not possible. Maybe something can be done with CrazyTalk 8, but that would still leave the issue of how people sync the facial and body captures, etc. Do most people just eyeball it? Or is there some convention in place that allows for better planning of these things?
By Ibis Fernandez - 9 Years Ago
sw00000p (2/18/2016)
Ibis Fernandez (2/18/2016)
sw00000p (2/18/2016)
is there a more efficient approach?

Professional Technique.
Applying animation "Data" to bones is far more efficient!

This is what ALL mocap systems provide.
All you need is a program to "Show You the Data!"
...and of course the knowledge of what to do with it!


Really!? Why didn't I think of that!!! Did you even read the post?

Why yes.
Sync your audio and animation using modern techniques... is my suggestion.
I use MotionBuilder while you use a funky clapper!

The modern technique is far more efficient. Remember, you asked.










What is the technique?
By mtakerkart - 9 Years Ago
I use the Neuron mocap plugin with an 18-Neuron kit. I use a wireless microphone on the performer, recorded with Audacity on the same machine.
I start Audacity first to name the performance, then I say "top!" when starting the mocap recording. So the "top" in your audio file is the start of your mocap. The difference may be a few frames, which can be adjusted in video editing software.
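
For reference, once you've read the position of that "top!" off the waveform, trimming the audio so it starts exactly there can be scripted as well. A rough sketch assuming scipy is available; the file names and the 0.84 s cue time are placeholders:

    # trim_to_top.py -- cut the dialogue so sample 0 lines up with the mocap's first frame
    # (sketch only; cue time and file names are assumptions)
    from scipy.io import wavfile

    TOP_CUE_SECONDS = 0.84                  # where "top!" sits in the recording

    rate, data = wavfile.read("performance_audio.wav")
    trimmed = data[int(TOP_CUE_SECONDS * rate):]
    wavfile.write("performance_audio_synced.wav", rate, trimmed)
    print(f"dropped {TOP_CUE_SECONDS:.2f} s from the head of the audio")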
By Rampa - 9 Years Ago
I think you'll be much more accurate in general starting with the dialogue, as iClone stands now. That'll change with the future introduction of facial mocap.

By doing dialogue first, you can always be frame accurate by hitting record on your motion capture at frame 1 of the timeline. If your dialogue is in place already, it's always the same, take after take.

For conversations it may become valuable to record both speaking and moving together, but that gives you sync issues. Currently, iClone records and then inserts the speaking; it's not real-time, so it must be recorded externally as mtakerkart is doing. Speaking first gives you pretty good timing as well. It's much easier to add gesturing to speaking than speaking to gesturing! The worst is a canned conversation animation that has speaking added to it; there is almost always no correlation between the gestures and the speaking.

An interesting interim option is the auto-animation stuff in CT. I'm under the impression that it only exports the facial animation, though, and not the upper body. Has anyone tested this? The upper-body movement seems a pretty important part of it.
By Ibis Fernandez - 9 Years Ago
mtakerkart (2/18/2016)
I use the Neuron mocap plugin with an 18-Neuron kit. I use a wireless microphone on the performer, recorded with Audacity on the same machine.
I start Audacity first to name the performance, then I say "top!" when starting the mocap recording. So the "top" in your audio file is the start of your mocap. The difference may be a few frames, which can be adjusted in video editing software.


Hmmm, that's an interesting approach.

By Ibis Fernandez - 9 Years Ago
sw00000p (2/18/2016)
Ibis Fernandez (2/18/2016)
What is the technique?

... you have awesome raw animation data that's PRECISE!

Yes, I realize this. I have compared the results of the data in terms of what iClone sees versus what other programs see, and there is a huge difference. The hands don't even work in the iClone captures most of the time; the thumbs never do.

I've actually switched to recording the raw data using the Axis Neuron software. I'm thinking that at some point, if it comes down to it, I can use that data in other apps that handle it better. To get the data from Axis Neuron into iClone or anything else, all you have to do is hit play and it gets broadcast over to whatever plugin is set up to listen for it. Still, that doesn't address the sync issues. This has always been an issue even in regular live-action filmmaking; that's why they came up with the clapper board. You see video of the object clapping, you listen for the clap (or look at the waveform in the audio file), you match the visual with the audio, and boom, it's synced (by sliding the track).


...Slide the track with precision to Sync...

Yes, this is basically what I'm doing now, except I give myself a visual cue in the performance track to match my audio cue so I'm not aimlessly sliding around indefinitely.

It feels like there would be some standardized technique by now. How do the people on Planet of the Apes or Avatar handle it? Seems like they just brute-force the synchronization of their stuff. I guess with millions of dollars in budget you can afford to brute-force your way through anything hahaha.

By Ibis Fernandez - 9 Years Ago
I still feel like it's more efficient to just clap. You see the character clap and you sync the visual cue of the clap to the sound of the clap. Once synced, you just trim off the unwanted parts.

I'll have to try your method though, and see how that goes.
By Ibis Fernandez - 9 Years Ago
I think you keep missing the point. I'm already perfectly capable of "animating" from scratch. What I'm experimenting with at this time is motion-captured performances.

At this time I'm only really interested in finding or establishing a conventional and comprehensive workflow. 

It's only natural that any type of capture will need to be tweaked and fine-tuned manually to achieve the best results, but as a filmmaker I really don't care about how it's done, as long as it gets done.

Bones, morph targets, curve editors... it doesn't matter. What matters is that the objective is achieved. If I wanted to make life more difficult for myself I wouldn't be using iClone or Neurons, lol.
By animagic - 9 Years Ago
I'm following this with interest, because even though I don't have PN right now, I may get it if the iClone plugin becomes a reality. As Rampa points out, using "canned" gestures with dialog is problematic. It's not impossible but takes a lot of tweaking.

I would think that if you have an audiovisual reference point, such as a clapper, to mark the beginning of your capture, you would be fine. I would also think that syncing gestures and dialog would be less critical than, say, mouth movements and dialog, where you need frame accuracy.

The problem I see is that the performance actor and the voice actor may not be the same, in which case the gesturing would have to follow what is being said. I've tried that (with Kinect) and it's not so easy.
By Ibis Fernandez - 9 Years Ago
I've been experimenting, and I've been noticing that the mocap data somehow ends up out of sync with the audio even when the performances are captured at the same time. It seems the mocap is not recorded in real time; instead the motions are added in as they come in. So capturing 1 minute of audio and 1 minute of mocap at the same time seems to yield 1 minute of audio and about 1.25 minutes of mocap.

So while the movements are fairly accurate, there is some kind of time shifting involved. 
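
If the stretch really is uniform (1:00 of audio against roughly 1:15 of motion), one blunt workaround is to rescale the mocap's frame time before importing it. A rough sketch that rewrites the "Frame Time:" line of an exported BVH file; the measured durations and file names are placeholders, and it assumes the drift is spread evenly across the take:

    # retime_bvh.py -- uniformly squash a stretched BVH so it matches the audio length
    # (sketch only; durations and file names are assumptions)
    AUDIO_SECONDS = 60.0    # measured length of the audio take
    MOCAP_SECONDS = 75.0    # measured length of the captured motion
    scale = AUDIO_SECONDS / MOCAP_SECONDS

    with open("take01.bvh") as f:
        lines = f.readlines()

    for i, line in enumerate(lines):
        if line.strip().startswith("Frame Time:"):
            dt = float(line.split(":")[1])
            lines[i] = f"Frame Time: {dt * scale:.7f}\n"
            break

    with open("take01_retimed.bvh", "w") as f:
        f.writelines(lines)
    print(f"frame time scaled by {scale:.3f}")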


By Rampa - 9 Years Ago
Maybe don't stream it to iClone? Just capture with the Noitom software and an audio capture running concurrently?

I would suspect that the BVH broadcast to iClone might be where the issue is.
By Ibis Fernandez - 9 Years Ago
No. This happens even when capturing directly into the Axis Neuron software.
By Rampa - 9 Years Ago
Probably just a latency issue. For speech, make sure you're recording mono. A lower sampling rate is fine too, as you don't need as wide a frequency response for speech. Try 32 kHz, or even 22 kHz. Keep the bit depth at 16, as that affects your dynamic range (soft to loud).

It certainly is not really critical for most body mocap. It will be an issue when we get facial mocap though.
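
If a take was already recorded in stereo at 44.1 or 48 kHz, it can be knocked down to those settings after the fact. A minimal sketch using numpy/scipy (the file names are placeholders; 32 kHz, 16-bit mono matches the settings suggested above):

    # downmix_dialogue.py -- convert a dialogue take to 16-bit mono at 32 kHz
    # (sketch only; file names are assumptions)
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import resample_poly

    TARGET_RATE = 32000

    rate, data = wavfile.read("dialogue_take.wav")
    if data.ndim > 1:
        data = data.mean(axis=1)            # stereo -> mono
    data = resample_poly(data.astype(np.float64), TARGET_RATE, rate)
    data = np.clip(data, -32768, 32767).astype(np.int16)
    wavfile.write("dialogue_take_32k.wav", TARGET_RATE, data)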
By animagic - 9 Years Ago
rampa (2/21/2016)
Probably just a latency issue. For speech, make sure you're recording mono. A lower sampling rate is fine too, as you don't need as wide a frequency response for speech. Try 32 kHz, or even 22 kHz. Keep the bit depth at 16, as that affects your dynamic range (soft to loud).

It certainly is not really critical for most body mocap. It will be an issue when we get facial mocap though.


From what I understand, the latency is in the motion capture. I would think, though, that when played back it would be OK. Would it be possible to adjust the playback frame rate?
By Ibis Fernandez - 9 Years Ago
James, in terms of making the shot lists and capturing just what you need in short bursts, that's probably a very sound approach to dealing with it for the most part.
Kinda frustrating for me because I'm used to recording my scenes in master-scene format, which lends itself to more natural acting from the talent. But it is what it is hahaha.

I'm also thinking of maybe experimenting with additional sync markers somewhere in the middle and at the end of each capture, in the case of long sequences, to allow for quicker resyncing. I don't know, I'll give it a shot next chance I get and see how that works.
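
For what it's worth, extra markers also make the drift correction easy to script: with a clap or "top!" at the start, middle, and end, the motion timing can be piecewise-stretched between them. A rough sketch with numpy.interp; the marker times are placeholders measured off the audio and the capture, and it only remaps timestamps, so you'd still re-key or re-export the result:

    # piecewise_retime.py -- remap mocap time onto the audio timeline using several sync markers
    # (sketch only; the marker times below are assumptions)
    import numpy as np

    mocap_marks = [0.0, 37.5, 75.0]   # where the claps land in the captured motion (seconds)
    audio_marks = [0.0, 30.0, 60.0]   # where the same claps land in the audio (seconds)

    def to_audio_time(t_mocap):
        """Map a mocap timestamp onto the audio timeline, stretching between markers."""
        return np.interp(t_mocap, mocap_marks, audio_marks)

    # e.g. a pose captured 50 s into the motion really belongs at ~40 s of audio
    print(to_audio_time(50.0))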