A Peek Under the Hood: How OpenTok & WebRTC Make Nice

Today we released an early-access build of OpenTok in our Labs that uses a brand-new controller stack along with WebRTC for media transport. This matters for two main reasons.

  • First, our early-access build fully supports an OpenTok peer-to-peer session using WebRTC under the covers. This demonstrates a principle we strive for: a consistent programming interface for application developers, with the platform choosing the best available underlying transport.
  • Second, the Labs version of OpenTok on WebRTC is a fully non-Flash, HTML5 version of OpenTok.
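To make the "consistent interface, platform picks the transport" principle concrete, here is a minimal sketch. Every name in it (`ClientCapabilities`, `MediaTransport`, `chooseTransport`, the two transport classes) is hypothetical and is not OpenTok's actual API; the point is only that application code calls the same `publish()` whichever transport ends up underneath.

```typescript
// Hypothetical capability flags detected on the client; not OpenTok's real API.
interface ClientCapabilities {
  webrtc: boolean; // e.g. a PeerConnection implementation is available
  flash: boolean;  // e.g. the Flash Player plugin is installed
}

type TransportKind = "webrtc" | "flash";

// The single interface application code sees, whatever runs underneath.
interface MediaTransport {
  readonly kind: TransportKind;
  publish(streamName: string): string;
}

class WebRtcTransport implements MediaTransport {
  readonly kind = "webrtc" as const;
  publish(streamName: string): string {
    // Placeholder: a real transport would set up a peer connection here.
    return `webrtc:${streamName}`;
  }
}

class FlashTransport implements MediaTransport {
  readonly kind = "flash" as const;
  publish(streamName: string): string {
    // Placeholder: a real transport would hand the stream to a Flash component.
    return `flash:${streamName}`;
  }
}

// Prefer the plugin-free transport, fall back to Flash, otherwise fail loudly.
function chooseTransport(caps: ClientCapabilities): MediaTransport {
  if (caps.webrtc) return new WebRtcTransport();
  if (caps.flash) return new FlashTransport();
  throw new Error("no supported media transport on this client");
}
```

With this shape, `chooseTransport({ webrtc: true, flash: true }).publish("alice")` returns `"webrtc:alice"`, while a Flash-only client gets `"flash:alice"` from the very same call.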

With our iOS SDK, the JavaScript SDK and now the Labs version of OpenTok on WebRTC, we are making progress toward our vision of letting application developers concentrate on what matters: building rich, compelling and fun applications that talk to each other.

While the OpenTok JavaScript library has traditionally used Flash under the covers, our engineering team has been actively evolving the OpenTok API for a post-Flash HTML5 world. WebRTC is one of the many pieces of the puzzle that make up this vision of the future. For those of you not following the action, the W3C is making real progress toward a viable in-browser communication stack (with no plugins required) through the WebRTC standard, and the effort has garnered significant support from major browser vendors. Google's public release of Chrome 21 last week marked an important milestone toward that end, with support for the getUserMedia API, which lets web applications capture the camera and microphone within the browser. In addition, key WebRTC constructs such as PeerConnection and MediaStream can be enabled manually by turning on the corresponding flags at chrome://flags.

We see a fragmented future for real-time communications. It's a jungle out there: different media protocols (RTP, RTMP, RTMFP, etc.), a variety of video codecs (VP8, H.264, etc.), a slew of audio codecs (iSAC, iLBC, Speex, etc.) and differing signaling stacks (Jingle, SIP, etc.). We want OpenTok to provide the "smarts" to handle all of these endpoints seamlessly under the covers.

To move toward this vision, we made the architectural decision early on to decouple media transport from session negotiation and session-level messaging. We built a custom distributed messaging fabric, code-named “Rumor”. Rumor is a low-latency, high-throughput messaging library built on raw sockets and our own highly optimized messaging protocol. Each Rumor message uses a custom format carrying a binary payload of at most 64K. Rumor servers can be linked together in a mesh architecture to scale out, and they provide out-of-band messaging and a very simple but high-performance publish-subscribe system.
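Rumor's actual wire format is not described here, so the following is only a hypothetical sketch of what a length-prefixed binary frame with a 64K payload cap could look like. The 1-byte type field, the 4-byte big-endian length, and the reading of "64K" as 64 × 1024 bytes are all assumptions. The decoder returns `null` on an incomplete buffer, which is how a reader on a raw socket typically copes with partial reads.

```typescript
// Hypothetical framing in the spirit of Rumor; the real wire format is not public.
const MAX_PAYLOAD_BYTES = 64 * 1024; // assumes "64K" means 64 x 1024 bytes
const HEADER_BYTES = 5; // 1-byte message type + 4-byte big-endian payload length

interface Frame {
  type: number;        // hypothetical message-type code, 0..255
  payload: Uint8Array; // opaque binary payload
}

function encodeFrame(frame: Frame): Uint8Array {
  if (!Number.isInteger(frame.type) || frame.type < 0 || frame.type > 255) {
    throw new RangeError("type must be an integer in 0..255");
  }
  if (frame.payload.length > MAX_PAYLOAD_BYTES) {
    throw new RangeError(`payload exceeds ${MAX_PAYLOAD_BYTES} bytes`);
  }
  const out = new Uint8Array(HEADER_BYTES + frame.payload.length);
  const view = new DataView(out.buffer);
  view.setUint8(0, frame.type);
  view.setUint32(1, frame.payload.length); // DataView writes big-endian by default
  out.set(frame.payload, HEADER_BYTES);
  return out;
}

// Returns the decoded frame and how many bytes it consumed, or null when
// `buf` does not yet contain a complete frame (e.g. after a partial socket read).
function decodeFrame(buf: Uint8Array): { frame: Frame; consumed: number } | null {
  if (buf.length < HEADER_BYTES) return null;
  const view = new DataView(buf.buffer, buf.byteOffset, buf.byteLength);
  const type = view.getUint8(0);
  const length = view.getUint32(1);
  if (length > MAX_PAYLOAD_BYTES) throw new RangeError("frame declares an oversized payload");
  if (buf.length < HEADER_BYTES + length) return null;
  const payload = buf.slice(HEADER_BYTES, HEADER_BYTES + length); // copy out the payload
  return { frame: { type, payload }, consumed: HEADER_BYTES + length };
}
```

A fixed cap like this lets a receiver reject a malformed or hostile length field before allocating any buffer for it.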

In v0.91 of the OpenTok API, all messaging within a session is controlled by a Flash-based “Controller” SWF. With the latest OpenTok Labs SDK, we are testing a pure JavaScript, HTML5 controller that uses WebSockets to talk to Rumor. Rumor lets us completely decouple session negotiation and signaling from the other moving parts of our API. As a result, the OpenTok platform can treat session-level events separately from media transport, which makes interoperability much easier to solve.
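The decoupling described above can be sketched as a controller that depends only on a small signaling interface. In this hypothetical example, `SignalingChannel`, `InMemoryChannel`, `SessionController`, `attachFakeServer` and the topic names are illustrative, not the OpenTok or Rumor API. A WebSocket connection to a Rumor server would be one implementation of the interface; the in-memory publish-subscribe channel stands in for it so the sketch runs anywhere.

```typescript
// Hypothetical sketch; names and topics are illustrative, not the OpenTok/Rumor API.
type Handler = (payload: string) => void;

// All the session controller needs from a signaling transport.
interface SignalingChannel {
  publish(topic: string, payload: string): void;
  subscribe(topic: string, handler: Handler): void;
}

// In-memory publish-subscribe stand-in for a WebSocket-to-Rumor connection.
class InMemoryChannel implements SignalingChannel {
  private subscribers = new Map<string, Handler[]>();

  publish(topic: string, payload: string): void {
    // Copy the list so handlers that subscribe during delivery don't affect this pass.
    for (const handler of [...(this.subscribers.get(topic) ?? [])]) handler(payload);
  }

  subscribe(topic: string, handler: Handler): void {
    const list = this.subscribers.get(topic) ?? [];
    list.push(handler);
    this.subscribers.set(topic, list);
  }
}

// The controller talks only to the channel and never to media transport,
// so replacing the signaling transport leaves this class untouched.
class SessionController {
  private channel: SignalingChannel;
  private sessionId: string;

  constructor(channel: SignalingChannel, sessionId: string) {
    this.channel = channel;
    this.sessionId = sessionId;
  }

  connect(onConnected: () => void): void {
    this.channel.subscribe(`${this.sessionId}/connected`, () => onConnected());
    this.channel.publish(`${this.sessionId}/connect`, JSON.stringify({ session: this.sessionId }));
  }
}

// Toy "server" that acknowledges connect requests on the same channel.
function attachFakeServer(channel: SignalingChannel, sessionId: string): void {
  channel.subscribe(`${sessionId}/connect`, () => channel.publish(`${sessionId}/connected`, "ok"));
}
```

Usage: create an `InMemoryChannel`, call `attachFakeServer(channel, "demo")`, then `new SessionController(channel, "demo").connect(...)`; the callback fires once the fake server acknowledges. Swapping in a WebSocket-backed channel would change only the constructor argument.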

If you are curious, the WebRTC forums host spirited discussions about which protocols (if any) should be defined for out-of-band messaging in the WebRTC world. WebRTC is largely designed to be agnostic to the messaging protocol itself, and we believe this is the right direction too: it means we can use any out-of-band messaging protocol. For instance, in controlled lab tests our JavaScript HTML5 controller using the Rumor protocol is about 6x faster than the Flash-based controller in session connection time, measured as the time between calling Session.connect() and receiving the onSessionConnected event.
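The connection-time metric above is simply the elapsed time from calling connect to receiving the connected event. Here is a hypothetical sketch of how such a measurement could be scripted: `ConnectFn`, `measureConnectMs` and `medianConnectMs` are illustrative names, and `connect` stands in for a call like Session.connect() with `done` fired on the connected event. The clock is injectable so runs can be tested deterministically; taking the median of several runs is one reasonable way to damp outliers, not necessarily how the lab tests were conducted.

```typescript
// Hypothetical: `connect` wraps something like Session.connect(), and must call
// `done` when the connected event arrives.
type ConnectFn = (done: () => void) => void;

// Elapsed milliseconds from starting `connect` to its completion callback.
function measureConnectMs(connect: ConnectFn, now: () => number = Date.now): Promise<number> {
  return new Promise((resolve) => {
    const start = now();
    connect(() => resolve(now() - start));
  });
}

// Median over several sequential runs; less sensitive to outliers than one sample.
async function medianConnectMs(
  connect: ConnectFn,
  runs: number,
  now: () => number = Date.now,
): Promise<number> {
  if (!Number.isInteger(runs) || runs < 1) throw new RangeError("runs must be a positive integer");
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) samples.push(await measureConnectMs(connect, now));
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2 === 1 ? samples[mid] : (samples[mid - 1] + samples[mid]) / 2;
}
```

Comparing two controllers then comes down to dividing the baseline median by the candidate's median.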

We believe that, as an application developer, you shouldn't have to worry about the underlying technologies; the OpenTok platform should do the heavy lifting for you. Our goal is to give application developers simple, consistent and powerful API primitives for building face-to-face video applications, while the OpenTok API and its underlying cloud infrastructure intelligently choose the best technology stack to deliver audio and video streams. To this end, we are pleased to announce today a Labs version of the OpenTok on WebRTC API, which offers a sneak preview of the technologies we are working on.

Try it out at labs.opentok.com. We would love to hear your feedback.