Debian Package Tracking System now produces RDF description of source packages

Here’s a second post on the subject of RDF descriptions for Debian source packages (see the previous post for some context).

From now on, the Debian Package Tracking System (PTS) will produce, alongside HTML pages meant for humans, RDF pages meant for Linked Data / Semantic Web aware applications.

Every Debian source package that used to have only an HTML page such as http://packages.qa.debian.org/packagename now has a corresponding RDF/XML document, served when the HTTP client requests the application/rdf+xml content type (the client is redirected to the proper HTML or RDF document).

The main objective is to produce, as close as possible to the source (at debian.org), an authoritative description of Debian’s packages in an interoperable format (RDF using the ADMS.SW 1.0 ontology).

Thus, each package in Debian can be identified on the Semantic Web with a unique URI like http://packages.qa.debian.org/apache2, which is dereferenceable as RDF.

[Figure: graph representing the resources included in http://packages.qa.debian.org/apache2]

You can see an example of such an RDF representation for the apache2 source package by running the following (with the ‘raptor2-utils’ package installed):

$ rapper -o turtle http://packages.qa.debian.org/apache2

If you prefer a more visual or navigable version, use one of the Linked Data browsers referenced on Wikipedia, or see the output of the W3C RDF validator.

For more details about the underlying format, see a raw RDF/XML dump:

$ curl -L -v -H 'Accept: application/rdf+xml' http://packages.qa.debian.org/apache2
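The same content negotiation can be sketched from a script. This is a minimal illustration using only the Python standard library, not part of the PTS itself: the function names are made up for the example, and the parsing step assumes that doap:homepage is serialized as an element carrying an rdf:resource attribute, which RDF/XML allows but does not require.

```python
import urllib.request
import xml.etree.ElementTree as ET

PTS_BASE = "http://packages.qa.debian.org/"  # base URL taken from the post
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"


def build_rdf_request(package):
    """Prepare a GET request that asks for RDF/XML rather than HTML."""
    return urllib.request.Request(
        PTS_BASE + package,
        headers={"Accept": "application/rdf+xml"},
    )


def fetch_rdf(package):
    """Send the request; urlopen follows redirects, like `curl -L`."""
    with urllib.request.urlopen(build_rdf_request(package)) as response:
        return response.read()


def homepages(rdf_xml):
    """List the rdf:resource targets of every doap:homepage element.

    Assumption: the homepage is written as an element with an
    rdf:resource attribute; other RDF/XML spellings are not handled.
    """
    root = ET.fromstring(rdf_xml)
    found = []
    for element in root.iter("{%s}homepage" % DOAP_NS):
        target = element.get("{%s}resource" % RDF_NS)
        if target is not None:
            found.append(target)
    return found
```

Calling `homepages(fetch_rdf("apache2"))` needs network access and depends on the server answering as described above. For anything beyond a quick look, a real RDF parser is a better fit than walking the XML tree by hand.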

Any Semantic Web application can then reference that resource, which becomes part of the Linked Open Data cloud.

There are probably lots of uses that could emerge from this, but the most immediate one I can think of is matching Debian packages with other packages/projects (upstream, downstream, other distributions) in an interoperable way (see other projects like distromatch, etc.). At the moment, the doap:homepage of the upstream SoftwareProject resources generated by the PTS can serve as a simple matching key (it’s actually much more complex than that, but that will be the topic of further efforts).
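To make the homepage-as-matching-key idea concrete, here is a rough sketch. It is not how the PTS or distromatch work: the normalization rule (ignore the scheme, a leading "www." and a trailing slash) is an assumption chosen for the example, and the package-to-homepage mappings are expected to be supplied by the caller.

```python
from urllib.parse import urlsplit


def normalize_homepage(url):
    """Reduce a homepage URL to a crude comparison key.

    Assumed heuristic: lowercase everything, drop the scheme,
    a leading "www." and any trailing slash.
    """
    parts = urlsplit(url.strip().lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def match_by_homepage(debian, other):
    """Map each Debian package to the foreign packages sharing its homepage.

    Both arguments are dicts of package name -> homepage URL.
    """
    index = {}
    for name, homepage in other.items():
        index.setdefault(normalize_homepage(homepage), []).append(name)

    matches = {}
    for name, homepage in debian.items():
        key = normalize_homepage(homepage)
        if key in index:
            matches[name] = index[key]
    return matches
```

With illustrative input such as `{"apache2": "http://httpd.apache.org/"}` on the Debian side and `{"httpd": "https://httpd.apache.org"}` on the other side, the two entries end up under the same key. Projects that share a hosting page, or that changed homepage between releases, would defeat this simple key, which is one reason real matching is more involved.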

AFAICT, this is the first large deployment of an ADMS.SW 1.0 implementation. On my machine, I’ve counted almost 1.5 million triples generated.

Stay tuned for more installments on similar topics.

8 thoughts on “Debian Package Tracking System now produces RDF description of source packages”

  1. Any chance of also returning JSON for Linked Data document in addition to RDF/XML?

    With a JSON for Linked Data document (and a CORS header), this data could easily be used from Web apps without conversion (which, from a Web app, is a headache).

  2. @Samat I guess it could be done; it is probably just a matter of running rapper or another conversion tool on the server. When someone has a real need for it, maybe we can add this.

    Do you have a particular use case in mind?

    Anyway, feel free to file a feature request for this on the qa.debian.org pseudo-package.

  3. Nice work 🙂

    Does the RDF include dependency relationships? This could be really helpful to do graph analytics if we had a whole dump of the repository. That could be interesting to know which packages are more central, have more impact, or to assess propagation of security issues, etc.

  4. Dependencies aren’t yet there, but it could be implemented. Provided there’s an ontology suitable for such links, of course.
