Where’s the Web in “Real Time Web”?

by mikepk on April 14, 2009

photo by jurvetson

The new hotness online for the technoscenti and early adopter crowd is “Real Time”. The recent release of the new beta interface for FriendFeed has intensified the conversation regarding the consumption of information “as it happens”. Its most breathless proponents regard it as the evolution of news and the web as a whole.

The “Real Time Web” (RTW), at least in concept, is something of a hybrid between live broadcast technology (one to many) and the ability to custom tailor and blend a number of broadcast sources to get just the information that you want. As soon as new information is available in the channel, you can consume it immediately in real time.

When asked for examples, many people point to Twitter and FriendFeed as the poster children of the RTW. The ‘subscription’ to individuals’ status updates has led to a lot of interesting phenomena, such as Twitter breaking many major news stories much more rapidly than traditional news outlets.

But do they really represent the real time web? Clearly they are a form of near real time communication, but what they carry is primarily individuals’ updates within a single domain. Look closely at the frenetic flow of content that can appear on the new FriendFeed beta and you’ll see that the “real time” portion is people messaging, sharing and conversing. The data that is presented in real time is FriendFeed’s user data, the data specifically controlled by FriendFeed itself (the same is true for Twitter).

I’m not trying to diminish the value of these real time communications; they have tremendous value in their own right. What’s exciting, though, is that they also present a glimpse of what could be possible with rapid availability of information. In my opinion, what we are missing is the “web” part of the RTW. Twitter and FriendFeed are just the first step because they do not represent a true consolidation of online information in a real time channel.

Why make this distinction? Technologically, presenting information in real time to the user from within a controlled environment (within the walled gardens of a single site) is a significantly easier task than trying to move towards a real time presentation of the web as a whole. It is this latter concept of the RTW that interests me.

Already it seems that these two examples of real time are competing with each other on data availability. Twitter updates take a long time to appear in the FriendFeed stream. This has been a loud complaint about FriendFeed, but it points to the problem of making real time cross site and application boundaries. It is not in Twitter’s or FriendFeed’s interests to make their data available in real time outside their sites.

The web is inherently distributed. This is one of its primary strengths. This has allowed the web to grow and scale in amazing and unpredictable ways. Synchronizing distributed systems is notoriously difficult though.

Although many have called for an open source, federated version of Twitter, I think this lack of a true RTW is the primary stumbling block to its creation and deployment. Some federated synchronization mechanism needs to exist to allow islands of communication to update each other in near real time.

How do we get to the “Web” in “Real Time Web”?

I’ve only begun thinking about this, so this post is partly an invitation to share research and ideas in this area. There are technologies that were created with similar objectives but that fell short for various reasons.

Ping networks, XMPP, update streams, FriendFeed’s SUP protocol, all try to improve update efficiencies but all seem to suffer from specific shortcomings. Whether it’s single points of failure, spam problems, or difficulty in implementation, there doesn’t seem to exist a real time updating silver bullet. I’d be interested to hear people’s experience with these technologies.
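
To put rough numbers on the inefficiency these technologies are trying to remove, here is a minimal back-of-the-envelope sketch in Python. It assumes the baseline is plain polling, where a consumer re-fetches every source on a fixed interval. The `PollingPlan` class, the 5,000-source figure and the intervals are illustrative assumptions, not measurements of any real service.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass(frozen=True)
class PollingPlan:
    """Hypothetical model of a consumer that polls every source on a fixed interval."""

    sources: int           # number of feeds or sites being tracked
    interval_seconds: int  # time between two polls of the same source

    def requests_per_day(self) -> int:
        # Every source is fetched once per interval, whether or not it changed.
        return self.sources * SECONDS_PER_DAY // self.interval_seconds

    def average_delay_seconds(self) -> float:
        # An update published at a random moment waits, on average,
        # half an interval before the next poll picks it up.
        return self.interval_seconds / 2

    def worst_case_delay_seconds(self) -> int:
        # An update published just after a poll waits a full interval.
        return self.interval_seconds


if __name__ == "__main__":
    for interval in (60, 7_200):  # one minute vs. two hours (assumed values)
        plan = PollingPlan(sources=5_000, interval_seconds=interval)
        print(
            f"interval={interval}s "
            f"requests/day={plan.requests_per_day()} "
            f"avg delay={plan.average_delay_seconds()}s"
        )
```

Under these assumptions, polling 5,000 sources once a minute costs 7.2 million requests a day, most of which presumably find nothing new, while backing off to a two-hour interval cuts that to 60,000 requests but leaves updates an hour stale on average. That trade-off between load and freshness is what a push or ping mechanism tries to escape.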

  • http://brampitoyo.com Bram Pitoyo

    “Ping networks, XMPP, update streams, FriendFeed’s SUP protocol, all try to improve update efficiencies but all seem to suffer from specific shortcomings.”

    What about using technologies like XML to deliver information?

    As someone who uses a Social Intelligence Dashboard to consume and filter a large number of content sources (of varying timescale: short ones like Twitter, and long ones like blog posts), RSS or Atom seems to me like an easy and near-real time way to do it.

    But this carries two inherent limitations: 1) RSS, while universal, isn’t rich. 2) Pass these files through any third party filtering and splicing service (like Yahoo! Pipes, PostRank, etc.) and the latency increases.

  • http://mikepk.com mikepk

    Bram, XML is what we have now. Its primary problem (with regard to real time) is that agents have to continuously poll to find out if there's anything new to consume. It's an inefficient and wasteful use of computing resources, especially if you intend to track more than a handful of sources. For any significant number of sources, a polling paradigm just won't scale (and still keep the data near real time). Imagine a long tail of several thousand blogs that you want to filter for content while still getting the data in near real time. Even Google, with its massive resources, can only promise feed updates (for Google Reader) in about a 2 hour window. That's not good enough (IMHO) to be called real time.

  • http://www.leggetter.co.uk Phil Leggetter

    Ok, I'm a bit late responding to this article. I bookmarked it a while back and just got around to reading it. I'm a software engineer working for a company whose components facilitate the distribution of data across the web in real time. I’m hopeful that we have the real-time web silver bullet. The section of our website that details our streaming push server is here: http://www.caplin.com/caplinplatform/?curart_id=36

    The server is a highly scalable comet server (Liberator) that supports thousands of concurrent connections and handles connection problems, load-balancing, clustering and a whole host of other features that I think are exactly what the real-time web needs. The really good news is that there is a free version available: http://www.freeliberator.com

    I'm presently trying to encourage people to use our components in areas other than the financial sector, since Liberator is an excellent piece of kit that could be put to so many other uses. It can be used to distribute real-time updates across the Internet to a number of clients (we have APIs for Java, .NET, JavaScript/Ajax). Obviously the standard clients are web pages, .NET Web Form apps and Java applications, but this doesn’t have to be the case. A web page example that I really like is the Twitter website; they’ve experienced numerous problems, and even today I saw the fail whale due to high load. By using Liberator and SL4B (the JavaScript API) there would be less frantic hitting of F5 to reload the page, which puts strain on the Twitter servers; updates would simply be pushed to clients as soon as they become available. Liberator also caches content, taking additional strain off the Twitter servers.

    I recently wrote a post, “What is the Real-Time web”, which I think is relevant here: http://blog.caplin.com/2009/04/20/what-is-the-r…

    Has anybody tried out Free Liberator? Does anybody think that this could be the silver bullet for the real-time web? If not, what is it missing?

  • http://mikepk.com mikepk

    Phil, interesting tech; I'll take a look. With the caveat that I haven't looked very deeply at it, I don't think Liberator gives us the whole package in terms of infrastructure. Liberator seems strongly suited for broadcast to the endpoints of the content network. I could envision industrial-strength comet servers for content distribution, but I think the internal distributed update mechanism could be done with more direct, application-specific protocols. What I'm exploring now is the idea of an RTW stack where the top-level protocol is embedded in the content payload along with authorization. Basically, make the updates and keys transport agnostic. This gives you a lot of interesting abilities: for not-so-real-time updates, for example, you might use something like email as a transport for bulk messages. This would allow utilizing existing distributed infrastructure and addressing, and then possibly having member machines move the content along on whatever transport makes the most sense (XMPP, comet to the client, etc.). It could also act as a strong bootstrap: rather than force the network to adapt to the new requirements, utilize existing infrastructure even if it's not perfect. It's just the start of an idea, so I'm still exploring the ramifications.
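
The transport-agnostic idea in the last comment, where the protocol identifier and the authorization travel inside the content payload, can be sketched roughly as follows. This is only one possible reading of that idea, not the commenter's design: the `rtw-sketch/0` identifier, the field names, and the `seal` / `open_envelope` helpers are all hypothetical, and a shared-secret HMAC stands in for whatever key scheme a real federated network would use.

```python
import hashlib
import hmac
import json

PROTOCOL = "rtw-sketch/0"  # hypothetical protocol identifier, invented for this sketch


def _canonical(fields: dict) -> bytes:
    # Stable byte form, so sender and receiver sign exactly the same bytes.
    return json.dumps(fields, sort_keys=True, separators=(",", ":")).encode("utf-8")


def seal(source: str, body: str, key: bytes) -> bytes:
    """Wrap an update so that protocol and authorization ride inside the payload."""
    fields = {"protocol": PROTOCOL, "source": source, "body": body}
    signature = hmac.new(key, _canonical(fields), hashlib.sha256).hexdigest()
    return _canonical({**fields, "signature": signature})


def open_envelope(raw: bytes, key: bytes) -> dict:
    """Verify an envelope received over any transport and return its fields."""
    envelope = json.loads(raw.decode("utf-8"))
    signature = str(envelope.pop("signature", ""))
    expected = hmac.new(key, _canonical(envelope), hashlib.sha256).hexdigest()
    # Constant-time comparison of the received and recomputed signatures.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        raise ValueError("signature mismatch: update rejected")
    return envelope


if __name__ == "__main__":
    key = b"shared-secret"  # assumption: both ends already hold this key
    raw = seal("http://example.org/feed", "new post published", key)
    # `raw` is plain bytes: it could be attached to an email, sent as an
    # XMPP message body, or pushed over comet without changing its content.
    print(open_envelope(raw, key)["body"])
```

Because the envelope is just bytes that verify on their own, the receiving side does not need to trust or even know which transport delivered it. A shared secret is the simplest thing that works for a sketch; a network of independent sites would more plausibly need public-key signatures so that receivers can verify updates without being able to forge them.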
