I've stumbled upon WebRTC, which allows a fairly simple "semi-p2p" media connection between browsers. That's awesome for streaming or video conferencing, but it can't do remote desktop because a browser has no way to control the mouse and keyboard. You need native code for that. Also, getting audio from ASIO drivers into a browser is impossible on its own.
But then I realized, wait a minute. A plugin is a bit of native code. It might be able to communicate with JavaScript running inside a browser through the localhost loopback. The plugin might pass audio to the JS app through localhost and receive commands from the JS app the same way. And as it's a native bit of code, it can talk to the OS to execute those mouse movements and keystrokes. Heck, it can even pass MIDI events.
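To make the "commands over localhost" part a bit more concrete, here's a rough sketch of what the JS side could send to the plugin. Everything in it is my own assumption: the JSON wire format, the `HostCommand` type and the `encodeCommand` / `decodeCommand` names are made up for illustration, and the transport (a WebSocket, plain HTTP, whatever the plugin listens on) is left out entirely.

```typescript
// Hypothetical command format for the JS app -> plugin direction.
// None of these names come from WebRTC or any plugin SDK.
type HostCommand =
  | { kind: "mouseMove"; x: number; y: number } // x, y normalized to 0..1
  | { kind: "mouseButton"; button: number; down: boolean }
  | { kind: "key"; code: string; down: boolean }
  | { kind: "midi"; bytes: number[] }; // raw MIDI message bytes, 0..255 each

// Serialize a command as one JSON text message.
function encodeCommand(cmd: HostCommand): string {
  return JSON.stringify(cmd);
}

// Parse and validate a message. Returns null for anything malformed,
// so the native side never acts on garbage input.
function decodeCommand(text: string): HostCommand | null {
  let raw: unknown;
  try {
    raw = JSON.parse(text);
  } catch {
    return null;
  }
  if (typeof raw !== "object" || raw === null) return null;

  const { kind, x, y, button, down, code, bytes } = raw as Record<string, unknown>;
  const isUnit = (v: unknown): v is number =>
    typeof v === "number" && v >= 0 && v <= 1;
  const isByte = (v: unknown): boolean =>
    typeof v === "number" && Math.floor(v) === v && v >= 0 && v <= 255;

  switch (kind) {
    case "mouseMove":
      return isUnit(x) && isUnit(y) ? { kind: "mouseMove", x, y } : null;
    case "mouseButton":
      return typeof button === "number" && typeof down === "boolean"
        ? { kind: "mouseButton", button, down }
        : null;
    case "key":
      return typeof code === "string" && typeof down === "boolean"
        ? { kind: "key", code, down }
        : null;
    case "midi":
      return Array.isArray(bytes) && bytes.every(isByte)
        ? { kind: "midi", bytes: bytes as number[] }
        : null;
    default:
      return null;
  }
}
```

The native plugin would need a matching parser on its end. Normalized 0..1 coordinates are just one option; they avoid having to agree on screen resolution up front.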
That JavaScript app doesn't have to be in the browser at all. If we embed a Chromium instance into the plugin, it can actually load and run the JS app from the plugin itself. (But leaving it external, in Chrome, would have advantages too: a smaller load in the DAW.)
On your collaborator's side, he would run a JS client in his browser: a fairly straightforward WebRTC applet that decodes the video, plus a subroutine that registers his mouse clicks and keystrokes. It would feed those commands into the WebRTC data channel and send them to the hosting site for the aforementioned execution.
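Here's roughly how I picture the collaborator's side, as a sketch only. The helper names `toUnit` and `sendInput` and the message shape are hypothetical; the only real API assumed is that an `RTCDataChannel` has a `readyState` string and a `send()` method, which I model with a tiny interface so the snippet doesn't depend on browser types.

```typescript
// Minimal shapes, so the sketch runs outside a browser too.
// A real RTCDataChannel and a DOMRect both satisfy these.
interface Rect { left: number; top: number; width: number; height: number }
interface ChannelLike { readyState: string; send(data: string): void }

// Convert a pointer position (page client coordinates) into 0..1 coordinates
// relative to the video element showing the remote screen.
function toUnit(clientX: number, clientY: number, rect: Rect): { x: number; y: number } {
  const clamp = (v: number): number => Math.min(1, Math.max(0, v));
  return {
    x: rect.width > 0 ? clamp((clientX - rect.left) / rect.width) : 0,
    y: rect.height > 0 ? clamp((clientY - rect.top) / rect.height) : 0,
  };
}

// Send one input message if the channel is open; report whether it went out.
function sendInput(channel: ChannelLike, message: object): boolean {
  if (channel.readyState !== "open") return false;
  channel.send(JSON.stringify(message));
  return true;
}

// In the browser, the wiring might look like this (assumed, untested here):
//
//   video.addEventListener("mousemove", (e) => {
//     const pos = toUnit(e.clientX, e.clientY, video.getBoundingClientRect());
//     sendInput(channel, { kind: "mouseMove", ...pos });
//   });
//   window.addEventListener("keydown", (e) =>
//     sendInput(channel, { kind: "key", code: e.code, down: true }));
```

Mouse moves fire very often, so a real client would probably want to throttle them before sending.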
Here's a VERY LAME image I've just sketched up to help visualize this crazy idea:

So? How bonkers am I?
