Monday, March 12, 2007

Yahoo! Pipes and The Web As Database

Written by Alex Iskold / February 13, 2007 / 19 comments

http://www.readwriteweb.com/archives/yahoo_pipes_web_database.php

Written by Alex Iskold and edited by Richard MacManus. In this post Alex tests out and explores the emergent world of Yahoo! Pipes. He sees some interesting parallels with Relational Databases in the 1990s, concluding that with Pipes, the Web essentially becomes a giant database that can be queried and remixed in any number of ways.

One of the central concepts in Complex Systems is Emergence. It is this automagical process through which elements of a system give rise to a higher order system. Emergence is how physics becomes chemistry and chemistry becomes biology. It is how web 1.0 evolved into web 2.0, and how that, in turn, will become the next web.

While the exact mechanics of emergence are complicated and far from being completely understood, scientists know that a new system emerges as a combination of its elements and their interactions. In other words, complex systems are really networks - where elements interact with each other and give rise to a new system.

Perhaps today we are witnessing one of the most vivid examples of emergence - the remixing of the world wide web. The parts of the new web have crystallized - blogs, photos, video, audio, maps, RSS, social network profiles and even plain old HTML pages have formed an impressive network that can now be mined and remixed. Mashups are really nothing new; the web has been a programmable oyster for at least a few years now.

What is new, though, is the recent systematic thinking about the web as a database. A few companies, including Dapper, have been working on the problem. But with the recent launch of Yahoo! Pipes, we are beginning to see the real power of remixing.

Ye Olde Relational Databases

The Web is just a vast database of information. Every day, we interact with it without thinking about that too much. We simply take our best query tool, usually called Google, and fire away. Yet decades before the web made its way into our lives, a different kind of database revolutionized computing. The Relational Database qualifies as one of our best computer science inventions. Lesser known to the non-techie crowd, it nowadays quietly stores terabytes of information behind most familiar ecommerce and corporate sites.


Microsoft Access Circa 1999

But Relational Databases are remarkably simple. They are collections of tables (structured data) that can be joined (mixed) together via keys to produce a new set of results. For example, the table of sales can be joined with the table of employees to produce a report of who sold what. By combining the tables in various ways, programmers are able to bring seemingly hidden information into the spotlight (think emergence). For example, by combining the sales information with employee records and their geographical locations, one can determine the best salespeople in each country.
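
To make the sales-and-employees example concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, columns and sample rows are all invented for illustration; they do not come from any real system. The join matches each sale to its employee via a key, and the grouping then surfaces the top seller per country.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two small, made-up tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        CREATE TABLE sales (id INTEGER PRIMARY KEY, employee_id INTEGER, amount REAL);
        INSERT INTO employees VALUES
            (1, 'Ana', 'France'), (2, 'Ben', 'France'), (3, 'Cho', 'Japan');
        INSERT INTO sales (employee_id, amount) VALUES
            (1, 100), (2, 250), (1, 50), (3, 75);
    """)
    return conn


def totals_by_employee(conn):
    """Join sales to employees via the employee key and sum sales per person."""
    rows = conn.execute("""
        SELECT e.country, e.name, SUM(s.amount) AS total
        FROM sales AS s
        JOIN employees AS e ON e.id = s.employee_id
        GROUP BY e.id, e.country, e.name
        ORDER BY e.country, total DESC
    """)
    return rows.fetchall()


def best_seller_per_country(conn):
    """Keep the first (highest-total) row seen for each country."""
    best = {}
    for country, name, total in totals_by_employee(conn):
        best.setdefault(country, (name, total))
    return best


if __name__ == "__main__":
    print(best_seller_per_country(build_demo_db()))
```

Neither table alone says who the best salesperson in France is; the answer only appears once the two are joined on the employee key.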

Another thing that Relational Databases are famous for is visual query and UI tools. Because databases are so simple, and the data is well structured, people have created GUI builders like Visual Basic or PowerBuilder to automate the UI for fetching and exploring the data. We got so good at mapping the databases to the UI that it has been quite a boring thing to do since about 1997.

Well, now Yahoo! is making this whole business cool again, by changing the rules of the game - the Web is now the new database.

Yahoo! Pipes - Applying Old Wisdom to the Web


Yahoo! Pipes Circa 2007

Yahoo! Pipes is a remarkable offering that was announced last week. It is the first GUI builder for the biggest database in the world, the Web itself. When compared to Visual Basic and PowerBuilder, Yahoo! Pipes comes out as more inventive and no less rigorous than its predecessors. It empowers developers to remix the building blocks of the web in a whole new way. And it does it with remarkable simplicity.

In Yahoo! Pipes, what used to be a table in the relational database is now a web page, an RSS feed, and so on. The current list of sources includes: Yahoo! Search, Yahoo! Local, Fetch (RSS feeds), Google Base and Flickr. Each source can be searched or queried using either pre-defined or user-defined parameters. For example, there can be a search of all French restaurants in Chicago via Yahoo! Local. The data sources and the searches can be mixed together (think emergence), using a rich set of operators. Among them are the iterator (which lets the user loop through the results), a counter and many other functions that facilitate cleaning, manipulating and recombining the information.

By bringing together many sources and operators, the user can build sophisticated queries that fetch interesting, non-obvious information from the web. For example, one can build a pipe that extracts the listings of all French restaurants in Chicago, along with their Flickr photos. Since the underlying data is virtually limitless and the set of operators is quite powerful, the number of interesting possible pipes is vast. And for this reason, unlike its predecessor the Relational Database, Yahoo! Pipes will never get boring.
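
The restaurant-and-photos pipe can be sketched in code as a chain of small steps, each feeding the next. This is only an illustration of the idea, not how Yahoo! Pipes is implemented: the function names and sample data below are hypothetical, and in-memory lists stand in for the Yahoo! Local and Flickr sources so that the example runs without any network access.

```python
# Hypothetical stand-ins for a local-search source and a photo source.
LISTINGS = [
    {"name": "Chez Exemple", "cuisine": "French", "city": "Chicago"},
    {"name": "Pasta Placeholder", "cuisine": "Italian", "city": "Chicago"},
    {"name": "Bistro Sample", "cuisine": "French", "city": "Boston"},
]
PHOTOS_BY_TAG = {"chez exemple": ["photo-1.jpg", "photo-2.jpg"]}


def local_search(listings, cuisine, city):
    """Source: yield only the listings that match the cuisine and city."""
    for item in listings:
        if item["cuisine"] == cuisine and item["city"] == city:
            yield item


def attach_photos(items, photos_by_tag):
    """Operator: loop over the results and attach photos tagged with each name."""
    for item in items:
        photos = photos_by_tag.get(item["name"].lower(), [])
        yield {**item, "photos": photos}


def run_pipe(listings, photos_by_tag):
    """Wire the source into the operator, the way modules are wired in a pipe."""
    results = local_search(listings, "French", "Chicago")
    return list(attach_photos(results, photos_by_tag))


if __name__ == "__main__":
    for row in run_pipe(LISTINGS, PHOTOS_BY_TAG):
        print(row["name"], row["photos"])
```

Each function plays the role of one box in the visual editor, and swapping a source or adding another operator changes the result without touching the other steps.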

Evolving Yahoo! Pipes

Yahoo! Pipes are cool, but they have room to evolve. The biggest issue is that, unlike in Relational Databases, the data is neither structured nor clean. For example, how can we ensure that Flickr pictures of restaurants in Chicago will be the right ones? We really cannot. The same problem will exist in all pipes, simply because the underlying data online is not as precise and polished as data usually is in a Relational Database. What are the consequences of this? Well, users currently forgive some imprecision in tags on Flickr and del.icio.us, yet they expect near perfect answers from Google. So having precise instruments to clean the data in the pipes would go a long way.
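
As a rough idea of what one such cleaning instrument might look like, the sketch below keeps a photo only if it carries every required tag and drops repeated URLs. The function name, the record layout and the sample photos are assumptions made for illustration; this is not an existing Pipes operator, and stricter tag matching reduces wrong matches without eliminating them.

```python
def clean_photos(photos, required_tags):
    """Keep photos that carry all required tags (case-insensitive), once per URL."""
    required = {tag.lower() for tag in required_tags}
    seen_urls = set()
    kept = []
    for photo in photos:
        tags = {tag.lower() for tag in photo["tags"]}
        # A photo passes only if the required tags are a subset of its tags.
        if required <= tags and photo["url"] not in seen_urls:
            seen_urls.add(photo["url"])
            kept.append(photo)
    return kept


if __name__ == "__main__":
    sample = [
        {"url": "a.jpg", "tags": ["Chicago", "Chez Exemple"]},
        {"url": "b.jpg", "tags": ["Paris", "Chez Exemple"]},    # wrong city
        {"url": "a.jpg", "tags": ["chicago", "chez exemple"]},  # duplicate URL
    ]
    print(clean_photos(sample, ["chicago", "chez exemple"]))
```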

Another, very different, axis for the evolution of the pipes is to make them usable by a less technical crowd. As it stands right now, like Relational Databases, the pipes require a techie brain to be used efficiently. Yet, it seems like there is a possibility, particularly from the user interface and operator simplification point of view, to make this tool usable by moms and pops. But even if not, judging again from the Relational Database, getting wide adoption in the technical community would be just fine.

Conclusion

So what is the catch - why did Yahoo! do it? The answer is the same as ever: search and ads. The majority of the current data sources are from Yahoo!, which means Yahoo! will get the ad revenue when the pipes are run. So empowering thousands of enthusiastic techies to remix the web using Yahoo!'s data is a great idea.

Will this work? Will developers start using pipes? At the time of this writing there are over 5,000 pipes, which is an impressive number given that the application is not even a week old. But we should check in a month or so to see how things unfold. Certainly the key to its success will be polishing the UI and adding new operators and data sources. Since Yahoo! is known for its good design and focus on the user experience, it is likely that we will see the pipes improving in that regard over time. 

Please give the pipes a try if you have not done so yet, and let us know what you think is going to happen to them over time.
