My family are very regular users of one of those communication systems. (I'm trying not to make it sound like an advert.) Sometimes, if the internet connection on the other person's system is too slow, we get the scenario that you describe. I've had conversations with my stepbrother in Brazil where I speak to him and he types back to me, which can be quite weird at times. If you turn off the video and just use sound, less bandwidth is used on the internet connection, which may help if the connection at either end is poor. Hope that makes sense.
P.S. Has the other user checked his/her settings?