I think you may be confusing the Motion track with the Auto-Motion track. With the Motion track, you can break and delete parts without affecting your Viseme (vocal) track. Auto-Motion, on the other hand, is tied to the Viseme track because its motion is generated automatically from the audio data. In that case, removing the Auto-Motion will also remove the Viseme track.
You can avoid this by recording lip-sync only and then adding your motion data manually using puppeteering and other preset motion files. Auto-Motion is there for users who want to generate facial animation quickly and without hassle, but you don't have to use it. If you want complete control, use the Motion track instead of the Auto-Motion track.
The Viseme track can also be edited (Pro only) by breaking parts and moving them around. However, it isn't possible to edit or change the text-to-speech once it has been generated.
I hope this helps explain the situation.