Lips Editor... is this supposed to be phonics based?

dickymac
Posted 8 Years Ago

When I let CT auto lip sync the audio file, the result was way out and looked really unnatural, so I deleted all the edits (not simple) and then started from scratch.
So, once I'd discovered how to get the Lip Editor up and started working my way through the song, I noticed that something was missing, and that got me thinking about how the CT creators had worked this editor out. I thought it was phonics based, but as an ex-primary teacher, it isn't anything I'm familiar with. Some of the options were obvious, but others not so, and when you started to put these together, you got some really strange expressions!

Here is an idea that may help. According to the Oxford Dictionary, the 100 words below make up about 50% of the words used daily, but more importantly, they form a good phonic base. Could one of the team create a bank of these words and offer them as an option to select? Each word will obviously take several Lip Editor selections to form, but I think this would not only help users of the program, it would also help them develop an understanding of how letters and words are formed. It's an idea... Most examples I've seen seem to use one option from the Lip Editor for each word or phrase.

Rank	Word
1	the
2	be
3	to
4	of
5	and
6	a
7	in
8	that
9	have
10	I
11	it
12	for
13	not
14	on
15	with
16	he
17	as
18	you
19	do
20	at

Rank	Word
21	this
22	but
23	his
24	by
25	from
26	they
27	we
28	say
29	her
30	she
31	or
32	an
33	will
34	my
35	one
36	all
37	would
38	there
39	their
40	what

Rank	Word
41	so
42	up
43	out
44	if
45	about
46	who
47	get
48	which
49	go
50	me
51	when
52	make
53	can
54	like
55	time
56	no
57	just
58	him
59	know
60	take

Rank	Word
61	people
62	into
63	year
64	your
65	good
66	some
67	could
68	them
69	see
70	other
71	than
72	then
73	now
74	look
75	only
76	come
77	its
78	over
79	think
80	also

Rank	Word
81	back
82	after
83	use
84	two
85	how
86	our
87	work
88	first
89	well
90	way
91	even
92	new
93	want
94	because
95	any
96	these
97	give
98	day
99	most
100	us

I would like to hear what others think of this idea, and any other ideas that might help.

It seems to be one of those things you see all the time... lip syncing... just not quite there!?

Peter (RL)
Posted 8 Years Ago

Thank you for the feedback. The quality of the lip-sync depends on the quality of the source audio. Background noise, music or other sounds in the audio will degrade the lip-sync, so it is always wise to check the source audio first. This should help greatly.

For your suggestions, please be sure to submit any requests via the Feedback Tracker. This way the development team can review them and keep you updated on their progress. You can find the Feedback Tracker below.

http://www.reallusion.com/FeedBackTracker/

Peter
Forum Administrator

www.reallusion.com

Kelleytoons
Posted 8 Years Ago

The other thing most lip sync programs offer is the ability to type the text that is being said so that it can also be matched up -- this can improve the accuracy of sync 1000%.

I'm not much for putting in suggestions, though -- my experience has been they seldom make any difference unless you're a beta (and I'm a beta on a LOT of software and can see the results of my suggestions a lot better there).

For the OP -- you might want to hold out for the promised land (that is to say, SOMEDAY there may actually be facial mocap, maybe even in our lifetime. This should improve things a LOT).

Alienware Aurora R16, Win 11, i9-14900KF, 3.20GHz CPU, 64GB RAM, RTX 4090 (24GB), Samsung 870 Pro 8TB, Gen3 NVMe M.2 SSD, 4TBx2, 39" Alienware Widescreen Monitor
Mike "ex-genius" Kelley

dickymac
Posted 8 Years Ago

Hi SW00000p... sorry I don't know your name!

Thanks for taking the time to reply and offering such good advice.

I did try to do as you suggest. Although the concept is certainly sound, doing this with CT 8 Pro isn't that easy. Could you produce a quick tutorial? With you being a tutor, I'm sure others would benefit from this, and I'm certain I would! Just something simple like "Hello! My name is Finn", or a simple audio file used to demonstrate the best way to produce an acceptable finished piece?

The track I used was a single track. I've added a clip (1MB zip) if you want to check it out, and any advice about how to improve quality would be gratefully received!

Can I ask you a few questions regarding your reply, please:

1. You mention a PROPER audio file... can you please clarify what would be best? I did try aif and mp3.

2. What would you say is the best way to use Reallusion's Speech Engine, as there seem to be different ways possible?

3. What is the best method to refine the result, if needed? I did try expanding the Timeline, but couldn't zoom in enough to see each segment.

4. This seems a bit laborious for each word... I did try, but found it impossible to move by keyframes, or know what keyframe I was on!?
Second Pass:
3. At frame one.... Hit EVERY vowel.
Set a Key (For Each Vowel)......"3 KeyFrames BEFORE THE ACTUAL SOUND" (with the proper strength)
Note:
Now you focus on each vowel.

Third Pass:
3. Before and After... each vowel.... Blend the proper Consonant.
_________________________________________________________
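
For anyone following along, the timing rule in the quoted second pass (key each vowel 3 frames before the actual sound) can be sketched roughly in Python. This is only an illustration of the arithmetic and nothing here is a CrazyTalk feature: the shape labels, the `VOWELS` set and the `lead_vowel_keys` helper are all made up for the example.

```python
# Hypothetical vowel mouth-shape labels; these are NOT CrazyTalk's own names.
VOWELS = {"AI", "E", "O", "U"}


def lead_vowel_keys(keys, lead=3):
    """Move every vowel key `lead` frames earlier; consonant keys stay put.

    `keys` is a list of (frame, shape) pairs where `frame` is the frame on
    which the sound is actually heard. Frames never go below 1.
    """
    shifted = []
    for frame, shape in keys:
        if shape in VOWELS:
            frame = max(1, frame - lead)
        shifted.append((frame, shape))
    # Keep the result in timeline order.
    return sorted(shifted, key=lambda key: key[0])


# Made-up timings, purely for illustration.
heard = [(10, "MBP"), (16, "AI"), (24, "etc"), (30, "O")]
keyed = lead_vowel_keys(heard)
print(keyed)
```

The vowels end up on frames 13 and 27 while the consonants stay where they are heard, which leaves room to blend a consonant before and after each vowel as the third pass describes.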

could you please demonstrate?

Hopefully, with your help, I will be able to use CT to produce good speech!

Many thanks for your help!

Audio 1.mp3.zip (178 views, 1.00 MB)

dickymac
Posted 8 Years Ago

Hi again!

I'm not 100% sure what you mean, or how I do this within CT, so could you give a tutorial... please?

dickymac
Posted 8 Years Ago

Hi Kelleytoons!

Thanks for taking the time to reply! It seems that you have experience with lip sync programs!? Typing in the words would certainly help, so can I do this with CT?

Is CT the best way to go with my attempt at getting my dog talking, or is there some other way I should go? Other than an operation! :)

dickymac
Posted 8 Years Ago

Hi Again!

The graph you've shown, RL's Phoneme Chart, is awesome and ideal for what I'm looking for... so how do I access this in CT, please?

The Lip Sync Editor I've been using in CT is obviously not the right place!

You agreed 100% with Kelleytoons about typing in text... so how do I do this, please?

Kelleytoons
Posted 8 Years Ago

dickymac (10/26/2016)

I do a lot of animation with many other programs but for lip sync work I mostly use Anime Studio (strictly 2D animation). However, let me see if I can help you with CrazyTalk.

What you have to understand is that CT is the very low end of the spectrum in terms of "professional" versus "amateur" software. It's really strictly designed so that folks like yourself, who might want to have their dog talk, can do so without much (or any) understanding of the basics of animation. It doesn't have higher end tools because there aren't higher end users using it. It's also why Swoop's advice is difficult to follow, because he's advising you to use techniques that are way above your pay grade, no offense meant.

I don't think there are any easier programs to do what you want, although Anime Studio (the lower end that isn't very expensive) might be worth looking at, as recently they added some very good image manipulation that would do it. But it isn't nearly the canned solution that CT is. The problem, as you've found, is that a canned solution that gives you a "one-size-fits-all" isn't a very good fit for anyone, really.

Okay, so here's (hopefully) something that will help. What you really need is the ability to translate your text into those symbols that CT *does* support. I *think* you may well be able to use the freeware program Papagayo to do this (Google "Papagayo lip sync" to find it -- Mac or PC versions are available). Papagayo DOES allow you to type in your text and will sync this to an audio file. In my years as a professional animator I've found nothing better. The only problem (and it's real and I don't want to minimize it) is that the file it produces can't be easily imported into anything other than Anime Studio (where it does an amazing job -- as I said, I've found nothing better and I've paid thousands for programs to do this process alone. It's how we were able to do 22 minutes of mostly facial animation for our series each week).

The file it produces IS an ASCII file, though, so that will help. You will get a file that has the phonemes broken down, as well as the frames on which they occur. Reading this will help you work through what you need to do in CT. Because Papagayo uses a standard ASCII dictionary for its phoneme breakdown, you can even edit THIS to show the symbols you want, so you could (in theory at least) edit it to reflect what is available in CT.
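
A rough sketch of that idea in Python, assuming the exported file is a header line followed by one `frame phoneme` pair per line (check your own export, as the layout may differ). The sample text, the helper names and especially the CT-side labels in `CT_LABELS` are placeholders made up for illustration; swap in whatever names the Lip Editor actually shows.

```python
# Made-up sample of an exported breakdown: a header, then "frame phoneme" lines.
SAMPLE = """MohoSwitch1
1 rest
4 etc
6 E
9 L
12 O
18 rest
"""

# Hypothetical relabelling table: left side is the label in the export,
# right side is a placeholder for whatever the CT Lip Editor calls that shape.
CT_LABELS = {
    "rest": "None",
    "etc": "S_Z",
    "E": "EE",
    "L": "T_L_D",
    "O": "Oh",
}


def parse_breakdown(text):
    """Return (frame, phoneme) pairs, skipping the header and blank lines."""
    keys = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        keys.append((int(parts[0]), parts[1]))
    return keys


def relabel(keys, table=CT_LABELS):
    """Swap each phoneme for its CT-side label; unknown ones pass through."""
    return [(frame, table.get(shape, shape)) for frame, shape in keys]


cheat_sheet = relabel(parse_breakdown(SAMPLE))
for frame, shape in cheat_sheet:
    print(f"frame {frame:>3}: {shape}")
```

The printed list is just a cheat sheet of which shape goes on which frame. You would still place each one by hand in CT.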

But the main reason I'm mentioning this is it will at least give you an inside look at the breakdown of each word without having to parse it yourself. Then you can try (manually, in CT) to align things. It won't be easy because, again, CT doesn't give you the right tools. But it might make it a bit easier.

(Or you can, as I said, wait for facial mocap -- this would be the ultimate solution to EVERYONE'S issues, and RL has said it will come with iClone 7, unless I was dreaming).

Mike "ex-genius" Kelley

dickymac
Posted 8 Years Ago

Hi KelleyToons and thanks for the prompt reply!

No offence taken... thick skinned and I'm certainly no animator!

For the first YouTube video I tried, I used an app on my iPhone called Talking Pet, and if you check out the video, https://www.youtube.com/watch?v=vVO6rAe_O2M you'll see that although it is nowhere near perfect, it is much better than the results in CT! Or that's what I think. I just wanted to add a bit more realism and thought CT would help... not sure I'm right. I was hoping that I could get a good auto-generated baseline to work from and that it would be a fairly simple matter of adjusting a few things!? I must say that working with the program... it doesn't follow many everyday conventions... but I suppose it is Windows based and I'm a solid Mac user.

I've looked at your suggestion about Papagayo and it looks good at what it does... it adds a cartoon-based mouth... but I want to animate the dog's mouth, so it looks like he is singing. Anime Studio doesn't appear to give you this option either, so is there a program that animates a given image, rather than adding a mouth on top of an image?

dickymac
Posted 8 Years Ago

Hi Swoop .... did you miss this post?
