Jan Schreiber

jan.bio

Pragmatic topic map streaming

I started this day quite innocently at 5 a.m., until the phrase “sensor data” on linkeddata.deri.ie toggled the Robert Barta switch in my head and, after a short discussion on the #topicmaps IRC channel, set off a chain reaction. Here is the relevant core dump:

I’ve been thinking a lot about knowledge streaming lately. C-SPARQL (PDF) looks very interesting, and I started wondering how to implement some kind of streaming for Topic Maps. The general idea of streaming topic maps and topic map changes is not new. SDShare was presented at TMRA 2008, and NetworkedPlanet had its own update feed long before that; I remember looking more closely at this feed in connection with automatically updating the GREP topic map (the Norwegian Curriculum is published as a topic map, in case you haven’t heard about it before).

The protocols mentioned above supply a feed of changes to a topic map, which can be used to synchronize those changes with other topic maps. Pragmatics ahead.

I’m going to concentrate on a subset of this problem: knowledge aggregation. This means that I’m not directly interested in changes to a topic map, only in newly added topic map constructs. Many popular services like twitter, facebook, delicious, and flickr provide APIs that in turn provide feeds of newly added information. I use a twitter client, an RSS feed reader, and other applications. Additionally, I like to store snippets of interesting web pages (I currently use DevonThink for this task). I try to keep up with all those feeds on a daily basis, and this mostly works fine until I try to find that information again, sometimes months later. Have you ever tried to find a tweet that you saw a couple of weeks ago?

The solution is obvious: Topic Maps can help me to aggregate that information, and, hopefully, make it easy to find it when I need it most.

Here is the idea: I store all information from the social services I use in a personal topic map (short: TM). To build this topic map, I run a client (a “topic map stream reader”) that periodically checks all topic map stream feeds. If new items appear, it fetches the corresponding topic maps and merges them into my TM. For each service that I’m interested in, I create a simple wrapper that provides me with an ATOM feed. Each item of that feed is, you guessed it, a topic map. The trick is that the items are complete topic maps, not topic map fragments. This allows me to make use of the most powerful and most frightening feature of ISO 13250-2: merging.
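Because each feed item is a complete topic map, the reader never has to know how a new item relates to what is already stored: merging takes care of that. ISO 13250-2 merges two topics when they share a subject identifier, share an item identifier, share a subject locator, or when a subject identifier of one equals an item identifier of the other. The following Python sketch applies just those identity rules to a heavily simplified topic model. `Topic` and `merge_into` are hypothetical names, not the API of TME or any other engine, and associations, scope, reification, and datatypes are left out entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical, heavily simplified topic: identity properties plus
    (type, value) pairs for names and occurrences."""
    subject_identifiers: set[str] = field(default_factory=set)
    subject_locators: set[str] = field(default_factory=set)
    item_identifiers: set[str] = field(default_factory=set)
    names: set[tuple[str, str]] = field(default_factory=set)
    occurrences: set[tuple[str, str]] = field(default_factory=set)

    def identity_keys(self) -> set[tuple[str, str]]:
        # Subject identifiers and item identifiers share one key space: a shared
        # SI, a shared II, or an SI equal to an II all trigger a merge.
        # Subject locators only ever match other subject locators.
        iris = {("iri", i) for i in self.subject_identifiers | self.item_identifiers}
        return iris | {("sl", loc) for loc in self.subject_locators}

    def absorb(self, other: Topic) -> None:
        # Merging two topics = the union of all their properties.
        self.subject_identifiers |= other.subject_identifiers
        self.subject_locators |= other.subject_locators
        self.item_identifiers |= other.item_identifiers
        self.names |= other.names
        self.occurrences |= other.occurrences


def merge_into(base: list[Topic], incoming: list[Topic]) -> list[Topic]:
    """Merge incoming topics into base. Assumes base is already fully merged,
    i.e. no two of its topics share an identity key. Base topics are not mutated."""
    result = list(base)
    for topic in incoming:
        keys = topic.identity_keys()
        matches = [t for t in result if t.identity_keys() & keys]
        merged = Topic()
        for t in matches + [topic]:
            merged.absorb(t)
        # Replace all matching topics with the single merged topic.
        result = [t for t in result if all(t is not m for m in matches)]
        result.append(merged)
    return result
```

Keeping the stored topics free of overlaps means one pass over the incoming topics is enough: a new topic can only collapse the stored topics it directly shares an identifier with.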

Note that I don’t say anything about the complexity of the generated topic maps. They can contain anything from a single association, or a topic stub with just one occurrence, to a more complex topic map with several topics and associations.

TM syndication illustration

Let’s take a tweet as an example. I can easily create a service that creates a topic map for a tweet. It could have a person or twitter account topic, maybe some associations for tweet-mentions-account and tweet-mentions-hashtag, some metadata such as the posting date, and a subject locator pointing to the tweet itself. Such a service would be easy to set up on a Google App Engine account. Then I can use the twitter API to get a feed of all tweets from the people I follow and convert it into an ATOM feed whose items are topic maps. It might even be easy to create a Google App Engine application that generates such a personalized feed for me, but I’m not sure whether I would publish this feed (that’s a different story). By working through the most popular services, the Topic Maps community could provide small, even dynamically generated, topic maps for all kinds of information pieces. It should, for example, be easy to convert the ATOM feed of a blog into an ATOM feed of topic maps that describe or contain the blog entries.
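To make the tweet example concrete, here is a minimal Python sketch of such a wrapper, assuming a tweet arrives as a plain dict with `id`, `author`, `text`, and `created_at` keys (a hypothetical shape, not the twitter API's actual response format). `tweet_to_topic_map` builds one small topic map as plain Python data; the `VOC` type IRIs, the `posted-by` association, and the `si:`/`sl:` reference prefixes are illustrative assumptions, not an agreed vocabulary. `topic_map_feed` then wraps a batch of tweets into an Atom feed in which each entry links to the URL where the wrapper would serve that tweet's topic map.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical vocabulary IRIs for topic, role, and association types.
VOC = "http://example.org/tm/voc/"
ATOM_NS = "http://www.w3.org/2005/Atom"


def tweet_to_topic_map(tweet: dict) -> dict:
    """Turn one tweet (hypothetical dict: id, author, text, created_at) into a
    small, complete topic map. Role players are referenced as "si:"/"sl:" + IRI."""
    tweet_url = f"https://twitter.com/{tweet['author']}/status/{tweet['id']}"
    author_si = "https://twitter.com/" + tweet["author"]
    tweet_ref = "sl:" + tweet_url  # subject locator: the topic IS the tweet

    topics = [
        {"subject_locators": [tweet_url],
         "occurrences": [{"type": VOC + "posted-at", "value": tweet["created_at"]}]},
        {"subject_identifiers": [author_si], "names": [tweet["author"]]},
    ]
    associations = [{"type": VOC + "posted-by",
                     "roles": {VOC + "tweet": tweet_ref, VOC + "author": "si:" + author_si}}]

    # Naive extraction of @mentions and #hashtags from the tweet text.
    for account in re.findall(r"@(\w+)", tweet["text"]):
        si = "https://twitter.com/" + account
        topics.append({"subject_identifiers": [si], "names": [account]})
        associations.append({"type": VOC + "tweet-mentions-account",
                             "roles": {VOC + "tweet": tweet_ref, VOC + "account": "si:" + si}})
    for tag in re.findall(r"#(\w+)", tweet["text"]):
        si = VOC + "hashtag/" + tag.lower()
        topics.append({"subject_identifiers": [si], "names": ["#" + tag]})
        associations.append({"type": VOC + "tweet-mentions-hashtag",
                             "roles": {VOC + "tweet": tweet_ref, VOC + "hashtag": "si:" + si}})
    return {"topics": topics, "associations": associations}


def topic_map_feed(feed_id: str, title: str, items: list) -> str:
    """Build a minimal Atom feed whose entries each link to one topic map.
    `items` holds (tweet, topic_map_url) pairs; the URL is hypothetical and
    stands for wherever the wrapper serves tweet_to_topic_map(tweet) as JSON."""
    ET.register_namespace("", ATOM_NS)

    def add(parent, name, text=None, **attrs):
        element = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}", **attrs)
        element.text = text
        return element

    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    add(feed, "id", feed_id)
    add(feed, "title", title)
    # ISO 8601 timestamps in the same format and time zone sort as strings.
    add(feed, "updated", max((t["created_at"] for t, _ in items), default="1970-01-01T00:00:00Z"))
    for tweet, tm_url in items:
        entry = add(feed, "entry")
        add(entry, "id", f"https://twitter.com/{tweet['author']}/status/{tweet['id']}")
        add(entry, "title", tweet["text"][:60])
        add(entry, "updated", tweet["created_at"])
        add(add(entry, "author"), "name", tweet["author"])
        add(entry, "link", rel="alternate", type="application/json", href=tm_url)
    return ET.tostring(feed, encoding="unicode")
```

Mentions and hashtags are pulled out of the text with naive regular expressions; a real wrapper would more likely use whatever entity data the service's API supplies.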

What is missing now is the ability to read and combine those ATOM feeds. It doesn’t sound hard to write a little topic map stream reader that uses my favourite Topic Maps engine, TME, reads all feeds F that I’m interested in, fetches the topic map Ti for each feed item i, and merges that topic map into my personal knowledge base topic map. Et voilà: Topic Maps streaming in action!
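The reader loop itself can be sketched in a few lines of Python. This is a hypothetical `TopicMapStreamReader`, not part of TME: fetching and merging are passed in as functions so that a real Topic Maps engine could be plugged in, and the sketch assumes that each Atom entry's alternate link points to the topic map Ti for item i. It remembers the entry ids it has already merged, so repeated polling only fetches new items.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from typing import Callable, Iterable

ATOM = "{http://www.w3.org/2005/Atom}"


class TopicMapStreamReader:
    """Hypothetical reader: polls Atom feeds whose entries point to topic maps
    and merges every new topic map into one personal topic map."""

    def __init__(self,
                 fetch_feed: Callable[[str], str | bytes],
                 fetch_topic_map: Callable[[str], dict],
                 merge: Callable[[dict, dict], dict]) -> None:
        self.fetch_feed = fetch_feed            # feed URL -> Atom XML
        self.fetch_topic_map = fetch_topic_map  # topic map URL -> parsed topic map
        self.merge = merge                      # (personal TM, new TM) -> merged TM
        self.personal_tm: dict = {"topics": [], "associations": []}
        self.seen: set[str] = set()             # Atom entry ids already merged

    @staticmethod
    def topic_map_links(entry: ET.Element) -> list[str]:
        # Per the Atom spec, a link without a rel attribute counts as "alternate".
        return [link.get("href") for link in entry.findall(ATOM + "link")
                if link.get("rel", "alternate") == "alternate" and link.get("href")]

    def poll(self, feed_urls: Iterable[str]) -> int:
        """Check every feed once; return how many new topic maps were merged."""
        merged = 0
        for feed_url in feed_urls:
            root = ET.fromstring(self.fetch_feed(feed_url))
            for entry in root.findall(ATOM + "entry"):
                entry_id = entry.findtext(ATOM + "id")
                links = self.topic_map_links(entry)
                if not entry_id or not links or entry_id in self.seen:
                    continue
                topic_map = self.fetch_topic_map(links[0])
                self.personal_tm = self.merge(self.personal_tm, topic_map)
                self.seen.add(entry_id)
                merged += 1
        return merged
```

For actual polling, `fetch_feed` could be something like `lambda url: urllib.request.urlopen(url).read()`, with `poll` called periodically from a loop that sleeps between runs. Skipping seen entries mainly saves bandwidth: with a proper ISO 13250-2 merge, merging the same topic map a second time should not add anything new.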

What do you think? Would that be useful? It seems to me that such an aggregated topic map would be useful for integrating the social services that I use and for storing other personal information. The quality of the information depends a lot on how the different services are mapped to topic maps. However, it should be possible to find what you’re looking for with some custom TMQL queries. Also, there is no limit to the services that can be wrapped into such topic map streaming feeds: blog feeds, a feed of topic maps in Maiana, photos from flickr. You get the idea. A side effect of such streaming wrappers would be that many small topic maps become available on the web. I’m sure that there are many ways to link them together!

That’s it for now. I hope this gave you a basic understanding of my idea. I’ll try to put up an example of a topic map streaming feed in one of my next posts.