Acculips - speech to text engine needs a lot of improvement

Posted By Darren01 Last Year
Darren01
Posted Last Year
I like the Acculips feature, but I'm finding its speech-to-text conversion is fairly bad.
I have a song that I want my character to sing, with lip movements that match the vocals.
First I used software to separate the vocals from the music. The separation was actually really good and clear.
I loaded that vocals-only track into Acculips and it failed miserably at converting the speech to text. It produced a bunch of garbled words that didn't even resemble the lyrics.
It would be impractical to edit every word with the correct timing.

So I recorded myself speaking the lyrics clearly in time with the song (not singing, just speaking) while the song played in my headphones.
Once again I uploaded the vocals-only recording to Acculips. The conversion to text was ever so slightly better than the first attempt, but still terrible, full of words that weren't even words.

Just to make sure the problem wasn't my voice or microphone, I uploaded both vocal files to an online speech-to-text converter, and the conversion was almost perfect.
That suggests the Acculips speech-to-text engine is the problem.

Also, I'm not sure if there is a limit to how much speech Acculips will convert, but it only seemed to convert maybe half of the vocal file.
And the iClone instruction manual for Acculips shows an icon that lets you jump to the word currently being played. That icon no longer exists, and it would be a handy feature to have back.

What I'd like to see is an overhaul of Acculips: update the speech-to-text engine to something more current, and ensure all features work as described in the iClone user manual.










animagic
Posted Last Year
I don't have the link at hand, but there is a pretty good online tool that does a decent conversion. You have found one as well. I have tried several and only one did a reasonable job.

Speaker-independent speech recognition is a difficult problem, so I don't think it's fair to hold that against AccuLips. I'm sure a singing voice is even harder.

I always provide my own text file, which is where reliable lip-synching starts. I and others asked for years for that feature to be added (as the old lip-synching gave horrible results), and I'm glad it has been added with AccuLips.

Overall I'm quite satisfied with AccuLips. There is/was a bug (reported in FT) where AccuLips only worked correctly at a project frame rate of 60 fps.


https://forum.reallusion.com/uploads/images/436b0ffd-1242-44d6-a876-d631.jpg

Darren01
Posted Last Year
If it were just a matter of cutting and pasting lyrics, I could have pasted the lyrics as text into Acculips and had it generate the lip sync.
But the problem is that it would be completely out of time with the song. The entire reason I wanted Acculips to convert it was so that it would be timed correctly in iClone.

I first tried the isolated vocal track on its own, which didn't work, so then I recorded my own speech in time with the music while the song was playing in my headphones.
Even then, it couldn't get it even remotely right. And I didn't sing the song; I spoke the lyrics as clearly as possible.

The only reason I tried the online speech-to-text converter was to check whether my recording was clear enough for speech-to-text software to work. It proved that it was.
So Acculips should have been able to do it too. But obviously it has a poor speech-to-text engine, even compared to free online software.





4u2ges
Posted Last Year
I agree, transcribing audio in iClone can result in highly inaccurate text.
But my guess is that online STT transcribers use powerful back-end AI models backed by a lot of data, which might not be suitable for a standalone application like iClone.
They filter the voice track by default and can generate karaoke-style SRT. They are not perfect, though, and also require some degree of online editing.

And generally yes, Acculips needs some upgrades.
I would love to see a more powerful editor, all in one place (words and visemes).
Working with visemes separately on the timeline is a nightmare given the size of the viseme blocks, which gives me eye strain.




4u2ges
Posted Last Year
I honestly was not aware that Acculips supports SRT import!
That was a pleasant surprise.
So here, I created an SRT on media.io (some online editing in their studio was involved).
And I did not do anything at all in iClone with Acculips. I just imported the audio, then did a raw SRT import and a quick face puppet.



studio view
https://forum.reallusion.com/uploads/images/66d80a81-9b9b-46b7-8021-aafa.jpg






AutoDidact
Posted Last Year
@4u2ges that is excellent!!!:D


RAG DOLL COLLISION ANIMATIONS FOR ICLONE 8 & 7
---------------------------------------------------------------------------------------------------------------------
Ghost Origins
My latest Feature length film created with Iclone.
https://forum.reallusion.com/uploads/images/adf9b210-df59-4cb6-aa1b-9de5.jpg
My Sci- Fi Graphic Novel on Amazon: https://a.co/d/9k3cwoY


4u2ges
Posted Last Year
Thanks AutoDidact :)

A small update.
SRT is not actually supported the way I thought it was... sigh.
You can import this format, but it's treated no differently than plain text.
I initially thought the SRT timing was kept somewhere in the background because the alignment was perfect.
But after trying plain lyrics text, I learned there is no difference. So the work I did in the studio was practically useless :Whistling:

Nonetheless, it aligned almost perfectly (I only had to slide a couple of words in the editor) because I was using the filtered voice track only.
After I Aligned and Applied the track, I muted it and loaded the actual soundtrack at the Project level (you can see it in the video).

Now, I only tried one song. You may try your song and see if it's any different.
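
For anyone who wants to repeat the plain-text comparison: since the SRT timing appears to be ignored on import, the cue numbers and timestamps can be stripped first so that only the lyrics are pasted in. Below is a minimal Python sketch of that idea. The function name `srt_to_plain_text` and the sample text are made up for illustration, and it assumes a conventional SRT layout (cue number, timing line, text). It has not been tested against files exported by any particular online tool.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING_LINE = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}"
)

def srt_to_plain_text(srt: str) -> str:
    """Drop cue numbers, timing lines and blank lines; keep only the words."""
    kept = []
    for raw in srt.splitlines():
        line = raw.strip()
        # Caveat: a lyric line made only of digits would also be dropped here.
        if not line or line.isdigit() or TIMING_LINE.match(line):
            continue
        kept.append(line)
    return "\n".join(kept)

# Hypothetical sample, not taken from any real song.
sample = """1
00:00:01,000 --> 00:00:03,500
La la la

2
00:00:10,500 --> 00:00:12,500
Hold that note
"""

if __name__ == "__main__":
    print(srt_to_plain_text(sample))
```

The result is the same text you would get by pasting the lyrics by hand, which matches the observation that the SRT import and the plain-text import aligned identically.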





AutoDidact
Posted Last Year
I used an online AI to separate the vocals from the music for this animation in iClone 7 several months ago.
Not great, but it sort of works for this "sultry, seductive" type of performance where she is smiling while singing.

There is a little-known utility called "autolipsync-o-tizer".
It creates .dat lipsync files for apps like MOHO, but it actually has a decent audio-to-text engine.
It is not as accurate as these new AI systems and will need some editing.
Kellytoons had a Python script to import .dat files, but he sort of abandoned it after Acculips was added to iClone.
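
To give a rough idea of what reading such a file involves, here is a small Python sketch. It is not Kellytoons' script. It assumes the .dat file uses the common Moho switch-data layout (a `MohoSwitch1` header line followed by `frame shape` pairs), which is how Papagayo-style exporters write it. The output of autolipsync-o-tizer may differ. The function name `parse_moho_dat` and the default of 24 fps are assumptions, and mapping the mouth shapes onto iClone visemes is not shown.

```python
from typing import List, Tuple

def parse_moho_dat(text: str, fps: float = 24.0) -> List[Tuple[float, str]]:
    """Return (time_in_seconds, mouth_shape) keys from Moho switch data.

    Assumed layout (not verified against autolipsync-o-tizer):
        MohoSwitch1
        0 rest
        12 AI
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("MohoSwitch"):
        raise ValueError("this does not look like a Moho switch file")
    keys = []
    for ln in lines[1:]:
        # Each data line is "<frame number> <shape name>".
        frame_str, shape = ln.split(None, 1)
        keys.append((int(frame_str) / fps, shape))
    return keys

# Hypothetical sample data.
sample_dat = "MohoSwitch1\n0 rest\n12 AI\n24 O\n"

if __name__ == "__main__":
    for seconds, shape in parse_moho_dat(sample_dat):
        print(f"{seconds:6.3f}s  {shape}")
```

Converting frames to seconds up front makes the keys independent of the project frame rate, which matters if the target project does not run at the same fps as the file was exported at.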





4u2ges
Posted Last Year
"I used an online AI to separate the vocals from the music for this animation in iclone 7 several months ago"

Yes, that is what I also did. I guess that is what did the trick for proper alignment, though.
"True" SRT support (using the built-in time intervals) would have been even better, at least as an alternative to the average built-in speech-to-text.
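
To illustrate what "using the built-in time intervals" could mean, here is a Python sketch that reads the start and end time of each SRT cue and spreads that cue's words evenly across it. This is only an assumption about how timing could be used. It is not how Acculips works, and nothing here calls an iClone API. The names `parse_srt` and `word_times` and the sample cues are hypothetical, and the even spread is a crude guess that sustained notes would still break.

```python
import re
from typing import List, Tuple

CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)

def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def parse_srt(srt: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for every cue."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        for i, ln in enumerate(lines):
            match = CUE_TIME.search(ln)
            if match:
                g = match.groups()
                text = " ".join(lines[i + 1:])
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues

def word_times(cues: List[Tuple[float, float, str]]) -> List[Tuple[float, str]]:
    """Spread each cue's words evenly over the cue interval (a rough guess)."""
    out = []
    for start, end, text in cues:
        words = text.split()
        if not words:
            continue
        step = (end - start) / len(words)
        for i, word in enumerate(words):
            out.append((round(start + i * step, 3), word))
    return out

# Hypothetical sample cues.
sample = (
    "1\n00:00:01,000 --> 00:00:03,000\nla la\n\n"
    "2\n00:00:10,500 --> 00:00:12,500\nhold that note\n"
)

if __name__ == "__main__":
    for t, w in word_times(parse_srt(sample)):
        print(f"{t:7.3f}s  {w}")
```

Even this rough version keeps the long pause between the two cues, which is exactly the information that is lost when the SRT is treated as plain text.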




Darren01
Posted Last Year
4u2ges (9/8/2024)
Thanks AutoDidact :)

A small update.
SRT is not actually supported the way I thought it was... sigh.
You can import this format, but it's treated no differently than plain text.
I initially thought the SRT timing was kept somewhere in the background because the alignment was perfect.
But after trying plain lyrics text, I learned there is no difference. So the work I did in the studio was practically useless :Whistling:

Nonetheless, it aligned almost perfectly (I only had to slide a couple of words in the editor) because I was using the filtered voice track only.
After I Aligned and Applied the track, I muted it and loaded the actual soundtrack at the Project level (you can see it in the video).

Now, I only tried one song. You may try your song and see if it's any different.



Yeah, if I could find an external site or software that would convert to text with the correct timing, in a form that could be imported into Acculips, I'd be happy with that.
But Acculips doesn't support it.
The song I'm trying to lip sync in iClone has several pauses and sustained vocals, so it would be extremely difficult to manually add each word with the correct timing.

Also, the 'snapping to specific words' button is missing on mine. The iClone manual shows it, but the button isn't on my panel.





