
Lip Sync Improvement...Any Tips or Suggestions

Posted By TopOneTone 6 Years Ago
animagic
Posted 6 Years Ago
Distinguished Member (32.5K reputation)

Group: Forum Members
Last Active: 10 hours ago
Posts: 15.7K, Visits: 30.5K
It is instructive to do some lip-synching with FaceWare (if you have it), because then you'll indeed notice that a speaker's mouth moves far less than you would get when using iClone's lip-synching.

For some reason iClone's lip-synching has hardly evolved since it was first introduced (in iClone 2, I believe) and still shows inaccuracies that could be avoided if some work were put into it. I have given up pointing out the obvious to RL. There is an FT entry to improve lip-sync, but there haven't been that many takers. As has already been mentioned, most viseme assignments are incorrect and there are also too many. There is no reason to have visemes where there is no speech, and all those stray visemes give a cluttered impression on the timeline. RL has introduced some quick and dirty "fixes", such as smoothing, which help somewhat, but things should be better, as not everyone can afford FaceWare.

3DTest (who did the FaceWare tutorials) has spent time analyzing lip-synching issues. One interesting find is that if you use a TTS voice in CrazyTalk the viseme assignments are correct, so it is possible to get it right. My assumption is that the text used to generate the speech in TTS is also used to assign the visemes. Text support for viseme assignment has been used in other lip-synching solutions. The one I'm aware of is Mimic (now included in DAZ), which I used for Poser many years ago. So improvement is attainable, it just doesn't seem to be a high priority for RL.
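To illustrate why the text helps so much, here is a toy sketch in Python (the word list and viseme names are simplified assumptions of mine, not RL's internal mapping): with the script available, each word is looked up in a pronunciation dictionary and every phoneme is mapped to a viseme, so only the timing still has to come from the audio.

```python
# Toy sketch of text-assisted viseme assignment (not RL's internal code).
# Both the pronunciation dictionary and the viseme names are simplified
# assumptions for illustration only.

PRONOUNCE = {                     # word -> phonemes (ARPAbet-style, tiny sample)
    "hello": ["HH", "EH", "L", "OW"],
    "world": ["W", "ER", "L", "D"],
}

VISEME = {                        # phoneme -> viseme group (simplified)
    "HH": "Open", "EH": "EE", "L": "L", "OW": "Oh",
    "W": "W_OO", "ER": "Er", "D": "T_L_D",
}

def text_to_visemes(text):
    """Return one viseme per phoneme, in speaking order."""
    out = []
    for word in text.lower().split():
        for ph in PRONOUNCE.get(word, []):
            out.append(VISEME.get(ph, "Open"))
    return out

print(text_to_visemes("Hello world"))
# -> ['Open', 'EE', 'L', 'Oh', 'W_OO', 'Er', 'T_L_D']
```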



https://forum.reallusion.com/uploads/images/436b0ffd-1242-44d6-a876-d631.jpg

Edited 6 Years Ago by animagic
Kelleytoons
Posted 6 Years Ago
Distinguished Member (35.6K reputation)

Group: Forum Members
Last Active: Yesterday
Posts: 9.1K, Visits: 21.8K
Ani is correct on all points here. I should point out that my own solution (using Papagayo) also depends on a text read: it uses both audio and text, and the text assigns the correct phonemes, which are then aligned to the audio. That is why it is a perfect solution. Moving the phonemes around is also easy-peasy, and even very long 10-15 minute sequences can be done in just a few minutes of work.
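For anyone who wants to script around that Papagayo output, here is a rough sketch; it assumes the usual MOHO switch .dat export (a "MohoSwitch1" header followed by frame/phoneme pairs), and the file name and frame rate are only examples.

```python
# Rough sketch: read a Papagayo/MOHO switch export (.dat) and convert frame
# numbers to seconds. Assumes the usual layout of a "MohoSwitch1" header
# followed by "frame phoneme" lines; file name and fps are just examples.

def read_moho_dat(path, fps=24.0):
    """Return a list of (time_in_seconds, phoneme) pairs."""
    cues = []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2 or not parts[0].isdigit():
                continue  # skip the header and anything unexpected
            frame, phoneme = int(parts[0]), parts[1]
            cues.append((frame / fps, phoneme))
    return cues

if __name__ == "__main__":
    for t, ph in read_moho_dat("dialogue.dat", fps=30.0):
        print(f"{t:7.3f}s  {ph}")
```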

I'm still a *tiny* bit hopeful that one day the Python interface will expose the viseme track so that we can do the same thing in iClone, although the hope I'll be alive to see it has mostly faded.



Alienware Aurora R12, Win 10, i9-11900KF, 3.5GHz CPU, 128GB RAM, RTX 3090 (24GB), Samsung 960 Pro 4TB M-2 SSD, TB+ Disk space
Mike "ex-genius" Kelley
Delerna
Posted 6 Years Ago
Distinguished Member (8.2K reputation)

Group: Forum Members
Last Active: 2 Years Ago
Posts: 1.5K, Visits: 14.8K
Tony,
I have shown off this video to some of the people at work and there was a lot of enjoyment and grinning. I have sent the YouTube URL to them. If they make any comments to me I will share them with you.

i7-3770 3.4GHz CPU 16 GB Ram   
GeForce GTX1080 TI 11GB
Windows 10 Pro 64bit
TopOneTone
Posted 6 Years Ago
Distinguished Member (3.6K reputation)

Group: Forum Members
Last Active: Last Year
Posts: 329, Visits: 3.2K
Thanks Graham, I really appreciate you passing the link around.
I can confirm what Kelleytoons has said about 30 vs 60 fps: I rendered one episode at 60 fps and there was no noticeable improvement in the lip-sync. I have experimented with Faceware, but I'm still having difficulty getting consistency. While it added more expressive facial movement, it really messed up some of the mouth movement as I was trying to blend it into the existing visemes. I'll keep trying; I'm sure it will get better with practice.
I have also thought about using TTS to generate the lip sync and then replacing the soundtrack with the real voice recording, though I suspect this will also require a lot of adjustment.


It's great to see that, even though lip sync doesn't get a lot of discussion on the forum, it is an area where people have issues and would like to see improvement, so thanks to everyone who has contributed to keeping this topic running.
Cheers,
Tony   
K00L Breeze
Posted 6 Years Ago
New Member (10 reputation)

Group: Banned Members
Last Active: 6 Years Ago
Posts: 8, Visits: 65
If you use Faceware (Analyzer and Retargeter), you won't have this problem.
It's very accurate.
pruiz
Posted 6 Years Ago
Distinguished Member (1.7K reputation)

Group: Forum Members
Last Active: 4 Years Ago
Posts: 101, Visits: 1.8K
Not trying to be cute or controversial or stupid, but do you mean 'plosive' (versus explosive), or am I misunderstanding? Which I am wont to do with every day of aging. (Charles de Gaulle called old age 'naufrage', which is hard to translate exactly but means, more or less, shipwreck.) Maybe we are talking about an explosive situation in an animation?
TopOneTone
Posted 6 Years Ago
Distinguished Member (3.6K reputation)

Group: Forum Members
Last Active: Last Year
Posts: 329, Visits: 3.2K

I created this test to compare TTS vs audio to drive the lip sync: I typed in the script for the left face and used an audio recording for the right face. Then I realigned the TTS viseme track as best I could to match the audio viseme track, given that the pace was obviously different and less even in the live audio recording. After rendering, I replaced both audio tracks with the original audio file. There is a second take in the video, which merely has the right face's mouth opened up more than in the first take, but it had little impact.

The first thing I noticed was that the blink rate changed between the faces; no idea why.
More importantly, to me the left face looks smoother and more consistent and makes me believe that the left face is actually talking. Without making any changes to the phonemes on either face, it would appear that TTS delivers a more accurate lip sync. I know it's a one-off rough test, but I wasn't expecting as good a result as I got from TTS, so I'm going to keep trying this approach to see whether it's worth the inconvenience of typing a whole script.

I did want to try the same comparison using Faceware and the pre-recorded soundtrack, but I couldn't work out how to get it into the plug-in other than using the "What U Hear" option, which did not work very well at all. Can anyone tell me how to import a sound file into the Faceware iClone plug-in?

Cheers,
Tony    


            
Delerna
Posted 6 Years Ago
Distinguished Member (8.2K reputation)

Group: Forum Members
Last Active: 2 Years Ago
Posts: 1.5K, Visits: 14.8K
I have tried doing lip syncing through TTS. I don't know if this is what you tried; I'm just mentioning it here in case it helps.
I find when using TTS that spelling the words as they sound usually works better than spelling them properly; I can't think of the name for that process.
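In case it helps, here is a minimal sketch of that idea in Python; the respelling list is made up, and any TTS engine that takes plain text would simply be fed the rewritten script.

```python
# Sketch of the "spell it like it sounds" trick: substitute troublesome words
# with phonetic respellings before sending the script to a TTS voice.
# The respellings below are made-up examples; tune them by ear for your voice.

import re

RESPELL = {
    "colonel": "kernel",
    "Worcestershire": "wooster sheer",
    "iClone": "eye clone",
}

def respell(script):
    """Replace each listed word with its phonetic respelling (case-insensitive)."""
    for word, sounds_like in RESPELL.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        script = re.sub(pattern, sounds_like, script, flags=re.IGNORECASE)
    return script

print(respell("The colonel opened iClone."))
# -> The kernel opened eye clone.
```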

One more thing. I can't remember if this was mentioned by the people who recommended using Audacity to record the speech for lip syncing, or if it's something I thought of myself.
If you wear headphones and listen to the actual speech while you record yourself speaking into a microphone, you can get the timing between the original and your own delivery pretty close.
Your delivery also needs to be done in a way that improves the lip sync rather than sounds good, so speak in a similar manner to what I mentioned about spelling words for TTS. The main benefit here, I think, is that it's easier to get the timing right, although I can only speak in theory because I haven't tried this myself yet.


Maybe the others who have done it can give some tips on it?


i7-3770 3.4GHz CPU 16 GB Ram   
GeForce GTX1080 TI 11GB
Windows 10 Pro 64bit
Delerna
Posted 6 Years Ago
Distinguished Member (8.2K reputation)

Group: Forum Members
Last Active: 2 Years Ago
Posts: 1.5K, Visits: 14.8K
Just googled the name for spelling words as they sound: phonetic spelling.

i7-3770 3.4GHz CPU 16 GB Ram   
GeForce GTX1080 TI 11GB
Windows 10 Pro 64bit
animagic
Posted 6 Years Ago
Distinguished Member (32.5K reputation)

Group: Forum Members
Last Active: 10 hours ago
Posts: 15.7K, Visits: 30.5K
I have to amend what I stated earlier: you now also get correct visemes when using TTS in iClone. That is an "improvement" since IC6. For that version you needed to use CT8 to get correct visemes, which was an extra step.

I haven't really tried to do this myself to see if it's possible to come up with a feasible procedure.



https://forum.reallusion.com/uploads/images/436b0ffd-1242-44d6-a876-d631.jpg



