I guess it will be a bit more work if your goal is to get the results back into iClone, but it should be possible with the commands available in iClone's Python API.
I'm currently just working with English dialogue, but probably will have to deal with translations later.
To generate phonemes I have just followed the instructions on their website: Examples — Montreal Forced Aligner 2.0.0a22 documentation (montreal-forced-aligner.readthedocs.io)
- I have my dialogue organized in folders for each character
- I have for example LINE1.wav, and a corresponding LINE1.lab (this is just a text file with the dialogue written out, it's the same as the .txt file you can make for iClone).
- I run the command to generate the phonemes as shown in the example (but pointed at the directory where all my files are instead of the example dataset).
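The alignment step above is a single CLI call. A rough sketch of what it looks like (the exact arguments and pretrained model names vary by MFA version, so check the docs for yours; `corpus` and `aligned_output` are placeholder paths):

```shell
# corpus/ holds per-character folders, each with LINE1.wav + LINE1.lab pairs.
# "english_us_arpa" is the name of a pretrained dictionary/acoustic model
# from the MFA docs; substitute whichever you downloaded.
mfa align ./corpus english_us_arpa english_us_arpa ./aligned_output
```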
Now in the output directory you have all these .TextGrid files with the phonemes and the times at which they occur.
You would have to parse these files to extract the phonemes and their timings, then use that to place the visemes on the timeline.
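TextGrid files are plain text, so a small parser is enough. A minimal sketch, assuming MFA's long-format output with a tier named "phones" (a dedicated library like `praatio` or `textgrid` would be more robust):

```python
import re

def parse_textgrid_phones(textgrid_text):
    """Extract (start, end, phoneme) triples from the 'phones' tier
    of a long-format TextGrid, skipping silence markers."""
    # Grab just the chunk of text belonging to the "phones" tier.
    tier = re.search(r'name = "phones"(.*?)(?:item \[\d+\]:|\Z)',
                     textgrid_text, re.DOTALL)
    if not tier:
        return []
    interval_re = re.compile(
        r'intervals \[\d+\]:\s*'
        r'xmin = ([\d.]+)\s*'
        r'xmax = ([\d.]+)\s*'
        r'text = "([^"]*)"')
    phones = []
    for xmin, xmax, text in interval_re.findall(tier.group(1)):
        if text and text not in ("sil", "sp", "spn"):
            phones.append((float(xmin), float(xmax), text))
    return phones
```

Calling it on a TextGrid for a clip of someone saying "five" would yield something like `[(0.0, 0.3, "F"), (0.3, 0.7, "AY1"), (0.7, 1.0, "V")]` (with an ARPAbet dictionary; the phone labels depend on the dictionary you aligned with).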
In the iClone install folder there is a file, "iClone 7\Resource\ICTextToSpeech\Dictionary\en.PhonemeVisemeMapping", where you can read which phonemes get turned into which visemes; for example, both the "F" and "V" phonemes get turned into the "F_V" viseme.
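Applying that mapping to the parsed phoneme intervals is then just a dictionary lookup. A sketch with a hypothetical subset of the table (only "F"/"V" → "F_V" is confirmed by the mapping file; the other entries here are illustrative guesses, and in practice you would read the real table out of en.PhonemeVisemeMapping):

```python
# Illustrative subset only -- the authoritative table is the
# en.PhonemeVisemeMapping file in the iClone install folder.
PHONEME_TO_VISEME = {
    "F": "F_V", "V": "F_V",          # confirmed by the mapping file
    "B": "B_M_P", "M": "B_M_P", "P": "B_M_P",  # assumed entries
    "AY": "Ah", "AA": "Ah",                    # assumed entries
}

def phones_to_visemes(phones):
    """Map (start, end, phoneme) triples to (start, end, viseme),
    stripping ARPAbet stress digits first (e.g. AY1 -> AY)."""
    visemes = []
    for start, end, phone in phones:
        base = phone.rstrip("0123456789")
        viseme = PHONEME_TO_VISEME.get(base)
        if viseme:
            visemes.append((start, end, viseme))
    return visemes
```

The resulting (time, viseme) pairs are what you would then feed to iClone's Python API to key the visemes on the timeline.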