Well, the high-level view is you just do it -- it's more or less how I do all my animation with iClone. I record the body and facial mocap separately (body mocap done with the PN, facial mocap done inside of iClone with either Faceware or Live Face), then bring them both into iClone and export frame by frame to my video editor, where I bring in the audio for the scene.
Audio aside, it's pretty easy to eyeball the body and facial mocap so they look right. For the body mocap I capture a T-pose at the start of the click track, which I can then use to line everything up (and I end up cutting this pre-roll out in my video editor) -- the sketch below spells out that alignment. Getting the facial mocap in sync is a bit trickier in the editor, but because I've captured the audio track inside of iClone for the tongue movements, I can sync to that track. Without it you can still eyeball it (just like we used to do for all lipsync -- the eyes are VERY good at getting sync right).
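If it helps to see the line-up arithmetic spelled out, here's a minimal sketch (plain Python, no iClone or PN API involved; the frame numbers, frame rates, and function names are just placeholders I made up for illustration):

```python
# Sketch of the T-pose / click-track sync described above.
# Assumptions (not from any real tool): you know the frame where the
# T-pose lands in the body-mocap take, the frame where the click track
# starts in the scene, and the frame rates of both.

def align_offset(tpose_frame: int, mocap_fps: float,
                 click_frame: int, scene_fps: float) -> int:
    """How many scene frames to shift the body mocap so the T-pose
    lands on the start of the click track."""
    # Convert the T-pose position in the mocap take to scene frames.
    tpose_in_scene = round(tpose_frame * scene_fps / mocap_fps)
    # Positive result: delay the mocap; negative: pull it earlier.
    return click_frame - tpose_in_scene


def trim_preroll(frames: list, tpose_frame: int) -> list:
    """Drop everything before the T-pose -- the pre-roll that gets
    cut out in the video editor."""
    return frames[tpose_frame:]


if __name__ == "__main__":
    # Example: T-pose at frame 120 of a 100 fps PN take, click track
    # starting at frame 72 of a 60 fps scene.
    shift = align_offset(tpose_frame=120, mocap_fps=100.0,
                         click_frame=72, scene_fps=60.0)
    print(f"Shift body mocap by {shift} scene frames")  # -> 0
```

In practice I do this by eye on the timeline rather than with numbers, but the idea is the same: convert everything to one frame rate, slide until the T-pose sits on the click, then cut the pre-roll.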
So that's the high-level view, but I expect what you really want is a LOW-level view of exactly how to bring that data through. For that you'd need 3DXchange to bring in the FBX body capture (and translate it to rlMotion files). I'm not sure how you'd use Faceware if you don't have Faceware for iClone, but perhaps others can comment on that.
Alienware Aurora R16, Win 11, i9-14900KF, 3.20GHz CPU, 64GB RAM, RTX 4090 (24GB), Samsung 870 Pro 8TB, Gen3 NVMe M.2 SSD, 4TBx2, 39" Alienware Widescreen Monitor
Mike "ex-genius" Kelley