Taming the Norwegian Red List with Topic Maps
This is the first part in a series of blog posts about my Coleoptera topic map project. In this post I show you how easy it is to convert the Norwegian Red List of Threatened Species into a topic map. The Norwegian Red List is essentially a forecast of the risk of species becoming extinct in Norway. As you might know, 2010 is the International Year of Biodiversity, so I thought it would be a nice pet project to make the data from the Red List available to Topic Maps based applications. Beetles, by the way, are the group of insects with the largest number of species: more than 350,000 worldwide (according to Wikipedia). In Norway, a little more than 3,500 species are known, and 801 beetle species are on the Norwegian Red List 2006, which, unfortunately, makes them the largest group of species described in the Red List.
A Red List classifies threatened species into a small set of categories:
|EX||Extinct||No individuals remaining|
|RE||Regionally Extinct||Very little doubt that it is extinct in the region concerned (here: Norway)|
|CR||Critically Endangered||Extremely high risk of extinction in the wild.|
|EN||Endangered||High risk of extinction in the wild.|
|VU||Vulnerable||High risk of endangerment in the wild.|
|NT||Near Threatened||Likely to become endangered in the near future.|
|LC||Least Concern||Lowest risk. Does not qualify for a more at risk category. Widespread and abundant taxa are included in this category.|
|DD||Data Deficient||Not enough data to make an assessment of its risk of extinction.|
|NE||Not Evaluated||Has not yet been evaluated against the criteria.|
The Red List categories: (Image licensed under the Creative Commons Attribution 2.5 Generic license. Graphic credit: Peter Halasz.)
The idea is to model species and Red List categories as topics, and then to associate each species with its corresponding Red List category. According to the plan, a new Red List with updated information is going to be published every fourth year. To allow the information from several Red Lists (or even Red Lists from other countries) to exist in parallel in our topic map, we create a topic representing the Red List itself. The association between the species and the Red List category is then scoped with the topic representing the Red List. In Compact Topic Maps Notation (CTM), the Red List, its categories and the criteria can be modelled like this:
%encoding "utf-8"
%version 1.0

%prefix tmcl <http://psi.topicmaps.org/tmcl/>
%prefix lang <http://psi.oasis-open.org/geolang/iso639/#>
%prefix redlist <http://psi.entomologi.org/redlist/>
%prefix redlist-category <http://psi.entomologi.org/redlist/category#>
%prefix ent <http://psi.entomologi.org/>

shortname isa tmcl:name-type;
    = http://psi.entomologi.org/topic-name/shortname .

ent:artsdatabank_id isa tmcl:occurrence-type;
    - "Artsdatabank ID" .

redlist:criterion isa tmcl:topic-type;
    - "Criterion";
    - "Kriterium" @ lang:nno .

redlist:category isa tmcl:topic-type;
    - "Red List category";
    - "Rødlistekategori" @ lang:nno .

redlist-category:EX isa tmcl:topic-type;
    ako redlist:category;
    - shortname: "EX";
    - "Extinct";
    - "Utdødd" @ lang:nno .

# [...] corresponding topics representing the other Red List categories

# Topic type for Red Lists
ent:redlist isa tmcl:topic-type;
    - "Red List" .

redlist:redlist2006 isa ent:redlist;
    - "2006 Norwegian Red List";
    - "Norsk Rødliste 2006" @ lang:nno .

redlist:is-redlisted-as isa tmcl:association-type;
    - "Is redlisted as";
    - "Er rødlisted som" @ lang:nno .

redlist:is-possibly-redlisted-as isa tmcl:association-type;
    - "Is redlisted as (uncertain)";
    - "Er rødlisted som (usikkert)" @ lang:nno .
Artsdatabanken.no provides a search interface to the 2006 Red List. On the “Alle arter” (all species) tab, it is possible to query the database for all species listed under specified categories. The result can be exported as a comma-separated values (CSV) file. The file looks like this:
"ArtsID","Artsgruppe","Vitenskapelig artsnavn","Norsk artsnavn","Underart Kode","Kategori","Kriterier","URL"
1582,Biller,Pseudomicrodota paganetti,,A,DD,,http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=1582
19,Biller,Haliplus fulvicollis,,A,DD,,http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=19
1929,Biller,Denticollis borealis,,A,VU,B2ab(iii),http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=1929
3210,Biller,Cionus alauda,,A,NT,,http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=3210
...
OK, what have we got here? All species listed have an ID (“ArtsID”). They belong to an order (“Artsgruppe”, in this case “Biller”, which means beetles). There is a scientific name (“Vitenskapelig artsnavn”) and sometimes a Norwegian name (“Norsk artsnavn”). “Underart Kode” is a subspecies code, which we can ignore in this setting. Then we have the Red List category (“Kategori”), which is one of the categories mentioned above. “Kriterier” holds the criteria that were used to classify the species as redlisted. This value is a code that we won’t decipher any further. It might look like this example: “B1ab(iii)+2ab(iii)”. These criteria describe the parameters which, on the basis of population models, are known to be important for the risk of extinction.
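To make the column layout a bit more concrete, here is a minimal Python sketch that reads two of the rows shown above with the standard `csv` module. The helper name `read_species` and the `SAMPLE` constant are made up for this example; only the column headers and the rows come from the export.

```python
import csv
import io

# Two rows copied from the export shown above (header included).
SAMPLE = (
    '"ArtsID","Artsgruppe","Vitenskapelig artsnavn","Norsk artsnavn",'
    '"Underart Kode","Kategori","Kriterier","URL"\n'
    "19,Biller,Haliplus fulvicollis,,A,DD,,"
    "http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=19\n"
    "1929,Biller,Denticollis borealis,,A,VU,B2ab(iii),"
    "http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=1929\n"
)


def read_species(text):
    """Return one dict per species, keyed by the Norwegian column headers."""
    return list(csv.DictReader(io.StringIO(text)))


for row in read_species(SAMPLE):
    # Empty fields (no Norwegian name, no criteria) come back as empty strings.
    print(row["ArtsID"], row["Vitenskapelig artsnavn"],
          row["Kategori"], row["Kriterier"] or "(no criteria)")
```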
The file from artsdatabanken.no seems to be encoded in UTF-16LE, so we need some iconv-magic before the file can be converted into a UTF-8 encoded topic map:
iconv -f UTF-16LE -t UTF-8 artsdatabanken.csv > redlist.csv
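If iconv is not at hand, the same re-encoding can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and I am assuming that the export may start with a byte order mark, which the sketch drops so that it does not end up in front of the first column header.

```python
from pathlib import Path


def utf16le_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16LE bytes as UTF-8, dropping a leading BOM if present."""
    text = data.decode("utf-16-le")
    if text.startswith("\ufeff"):
        text = text[1:]
    return text.encode("utf-8")


def convert_file(src: str, dst: str) -> None:
    """Read src as UTF-16LE and write it to dst as UTF-8."""
    Path(dst).write_bytes(utf16le_to_utf8(Path(src).read_bytes()))
```

Calling `convert_file("artsdatabanken.csv", "redlist.csv")` would then play the role of the iconv command above.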
Unfortunately, something else is wrong with the CSV export. The problem lies in the criterion field. If you search for “i,i” with your text editor, you will see that “,” within field values is not quoted, even though “,” is also used as the field separator. I wrote a mail about this issue to artsdatabanken.no several months ago, but never got an answer. Until this is fixed, manual cleanup is needed, so
2674,Biller,Corticeus suturalis,,A,EN,B2ab(ii,iii)c(ii),http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=2674
should become
2674,Biller,Corticeus suturalis,,A,EN,"B2ab(ii,iii)c(ii)",http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=2674
(with the double quotes added around the criterion field).
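The cleanup does not have to be done by hand. Here is a minimal Python sketch of one way to automate it. It assumes that only the criterion field ever contains stray commas, so every surplus field in a row belongs to the seventh column. The names `repair_row` and `repair_csv` are made up for this example.

```python
import csv
import io

EXPECTED_FIELDS = 8   # number of columns in the export header
CRITERIA_INDEX = 6    # zero-based position of the "Kriterier" column


def repair_row(fields):
    """Glue surplus fields back into the criterion column.

    Assumption: only the criterion field contains unquoted commas, so a
    row with too many fields was split inside that field.
    """
    extra = len(fields) - EXPECTED_FIELDS
    if extra <= 0:
        return list(fields)
    end = CRITERIA_INDEX + extra + 1
    criteria = ",".join(fields[CRITERIA_INDEX:end])
    return fields[:CRITERIA_INDEX] + [criteria] + fields[end:]


def repair_csv(text):
    """Return the CSV text with every row repaired and properly quoted."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for fields in csv.reader(io.StringIO(text)):
        # csv.writer quotes any field that contains a comma.
        writer.writerow(repair_row(fields))
    return out.getvalue()
```

Rows that already have eight fields pass through unchanged, so it should be safe to run this over the whole file after the encoding conversion.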
To convert the file to CTM, download redlist2ctm from my github.com account. It is a little Perl script that I wrote some time ago. Run the script with:
./redlist2ctm redlist.csv > redlist.ctm
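The script itself is not reproduced in this post. To give an idea of what such a conversion involves, here is a minimal Python sketch that turns one CSV row into CTM. It is an approximation, not a port of redlist2ctm: the function names `topic_id` and `species_to_ctm` are made up for this example, the identifier scheme (lowercased scientific name joined with underscores) is inferred from the CTM examples in this post, and the optional criterion role is left out for brevity.

```python
import re

PSI_BASE = "http://psi.entomologi.org/species/coleoptera/"


def topic_id(scientific_name):
    """Lowercase the name and join its parts with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", scientific_name.lower()).strip("_")


def species_to_ctm(row, redlist="redlist:redlist2006"):
    """Render one CSV row (a dict keyed by column header) as CTM text.

    Names are written out as-is; escaping of quotes is not handled here.
    """
    name = row["Vitenskapelig artsnavn"]
    norwegian = row["Norsk artsnavn"]
    tid = topic_id(name)
    lines = [
        f"{tid} isa ent:species;",
        f"    {PSI_BASE}{tid};",                         # subject identifier
        f"    = {row['URL']};",                          # subject locator
        f"    ent:artsdatabank_id : {row['ArtsID']};",
    ]
    if norwegian:
        lines.append(f'    - "{norwegian}" @ lang:nno;')
    lines.append(f'    - full_species_name: "{name}" .')
    lines.append("")
    # The association is scoped with the topic for the Red List edition.
    lines.append(
        f"redlist:is-redlisted-as(ent:species : {tid}, "
        f"redlist:category : redlist-category:{row['Kategori']}) @ {redlist}"
    )
    return "\n".join(lines)
```

Feeding it the rows from `csv.DictReader` and joining the results with blank lines would give the body of a topic map; the prefix declarations and type topics still have to be written in front of it.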
And voilà, there we have our topic map containing a part of the Norwegian Red List. Every species gets its own Published Subject Identifier (PSI), based on its scientific name, e.g. http://psi.entomologi.org/species/coleoptera/agathidium%5Fbadium. The Red List category is associated with the species through an is-redlisted-as association. If a criterion for the categorization is available, the criterion is represented as a separate topic that plays a third role in that association. The URL is added as a subject locator to the species, since the referenced page can be seen as a representation of the species, containing all information about it that is registered at artsdatabanken.no. The Norwegian names, if available, are added as names in the scope lang:nno. That’s about it. To make the topic map really useful, you’ll probably need a taxonomy (a tree consisting of orders, families, genera, etc.). Also, the scientific names could be modelled in a better way, since the Red List export does not include the author name. But we will be able to fix this later, as long as all species have their unique PSI. Here are two species represented in CTM:
haliplus_fulvicollis isa ent:species;
    http://psi.entomologi.org/species/coleoptera/haliplus_fulvicollis;
    = http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=19;
    ent:artsdatabank_id : 19;
    - full_species_name: "Haliplus fulvicollis" .

redlist:is-redlisted-as(ent:species : haliplus_fulvicollis,
    redlist:category : redlist-category:DD) @ redlist:redlist2006

cassida_nebulosa isa ent:species;
    http://psi.entomologi.org/species/coleoptera/cassida_nebulosa;
    = http://www2.artsdatabanken.no/rodlistesok/Artsinformasjon.aspx?artsID=2871;
    ent:artsdatabank_id : 2871;
    - "Prikket skjoldbille" @ lang:nno;
    - full_species_name: "Cassida nebulosa" .

redlist:is-redlisted-as(ent:species : cassida_nebulosa,
    redlist:category : redlist-category:EN,
    redlist:criterion : cassida_nebulosa_crit) @ redlist:redlist2006

cassida_nebulosa_crit isa redlist:criterion;
    = "B1ab(i,ii,iii)+2ab(i,ii,iii)" .

Related work
If you are familiar with the Ontopia Topic Maps engine, there is a module to convert CSV files into a topic map: db2tm. Using this module you can map a CSV file to a topic map that can be exported in any Topic Maps format supported by Ontopia.
Outlook and lessons learned
The next blog post is where the fun really starts: I’ll show you a simple semantic mashup (great buzzword in these times) that uses this topic map and an external web service to annotate web pages with Red List information and PSIs. Until then, this is what we’ve learned so far:
- Converting data from other applications into a topic map can be really easy.
- No Topic Maps engine needed to create a topic map!
- There is more than one way to do it!
- Using Perl as scripting language keeps the Perl language from dying out, and this is a good thing(tm).
For 2010 a new Norwegian Red List is planned. It remains to be seen if the data for 2006 will still be available in CSV format. So hurry, and create your own Red List topic map now! With Topic Maps your data will probably last a little longer :-)
Edit: fixed some typos