Experimenting with Linked Open Data about FLOSS projects : matching Debian upstream projects

I’ve been experimenting with Linked Open Data about FLOSS projects, harvested from different sources of DOAP or ADMS.SW descriptions. I’ve tried to match the upstream projects of Debian packages with upstream projects hosted at Apache, Gnome, or Alioth.debian.org, or catalogued on Pypi.

I’m matching them on identical values of the Homepage field (comparing the Homepage Control field set by Debian packagers with the doap:homepage meta-data in the RDF documents harvested from the upstream project catalogues).
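To make the matching step concrete, here is a minimal sketch in Python. It assumes the Debian packages and the upstream projects have already been harvested into plain dictionaries; the field names (`name`, `homepage`, `catalogue`) and the sample records are hypothetical, not the actual harvested data.

```python
from collections import defaultdict


def match_by_homepage(debian_packages, upstream_projects):
    """Pair Debian source packages with upstream projects sharing a homepage."""
    # Index upstream projects by homepage so each package lookup is cheap.
    by_homepage = defaultdict(list)
    for project in upstream_projects:
        by_homepage[project["homepage"]].append(project)

    matches = []
    for package in debian_packages:
        # Only strictly identical homepage strings are considered a match.
        for project in by_homepage.get(package["homepage"], []):
            matches.append((package["name"], project["catalogue"], project["name"]))
    return matches


# Hypothetical sample records, for illustration only.
debian_packages = [
    {"name": "python-foo", "homepage": "http://example.org/foo"},
    {"name": "bar", "homepage": "http://example.org/bar"},
]
upstream_projects = [
    {"catalogue": "pypi", "name": "Foo", "homepage": "http://example.org/foo"},
    {"catalogue": "gnome", "name": "baz", "homepage": "http://example.org/baz"},
]

print(match_by_homepage(debian_packages, upstream_projects))
```

With these sample records, only `python-foo` is paired with an upstream project, since `bar` and `baz` have different homepages.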

Here are the initial results of my little experiment: the number of matched projects per catalogue, and how similar the project names are:

| Upstream catalogue | Total matching projects | Exact same project name | Same project name (case-independent) |
|---|---|---|---|
| apache | 31 | 0 (0 %) | 0 (0 %) |
| alioth | 16 | 13 (81 %) | 13 (81 %) |
| pypi | 439 | 217 (49 %) | 273 (62 %) |
| gnome | 21 | 0 (0 %) | 7 (33 %) |
| Total | 507 | 230 (45 %) | 293 (58 %) |

The data set contains tens of thousands of projects, with probably many duplicates, but from all of these, only 507 have common homepages.

As you can see, in some cases, the Debian source package names match the upstream project name (sometimes with lower/upper case variants), but in general, the project names aren’t identical, so it is interesting to try and match them by homepage.
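The name comparison behind the two rightmost columns can be sketched as follows. This is only an illustration of the idea, assuming the matched pairs are available as (Debian source package name, upstream project name) tuples; the sample pairs are made up. As in the table, the case-independent count includes the exact matches.

```python
from collections import Counter


def classify_name_match(package_name, project_name):
    """Classify how a Debian source package name relates to the upstream name."""
    if package_name == project_name:
        return "exact"
    if package_name.casefold() == project_name.casefold():
        return "case-only"
    return "different"


def summarize(pairs):
    """Count exact and case-independent name matches among matched pairs."""
    counts = Counter(classify_name_match(pkg, proj) for pkg, proj in pairs)
    exact = counts["exact"]
    return {
        "total": len(pairs),
        "exact": exact,
        # Exact matches are also equal when ignoring case.
        "same_ignoring_case": exact + counts["case-only"],
    }


# Hypothetical (package name, upstream name) pairs.
pairs = [("django", "Django"), ("requests", "requests"), ("python-foo", "foo")]
print(summarize(pairs))
```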

For the curious ones, the Apache, Gnome and Pypi project catalogues have been providing RDF meta-data for quite some time. More recently, we introduced ADMS.SW meta-data for Debian source packages, and even more recently for the Alioth projects (through the ADMS.SW exporter plugin for FusionForge).

There is still room for improvement, for instance by normalizing homepage URLs, which tend to vary (trailing slashes, or different HTTP/HTTPS schemes).
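Such a normalization could look like the sketch below. It is not what the experiment currently does, just one possible approach: it folds https into http for comparison purposes only, strips trailing slashes, and (an extra assumption of mine) lowercases the host name and drops fragments.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_homepage(url):
    """Return a comparison key for a homepage URL (not meant to be fetched)."""
    parts = urlsplit(url.strip())
    # Treat http and https as equivalent for matching purposes.
    scheme = "http" if parts.scheme in ("http", "https") else parts.scheme
    # Host names are case-insensitive, so lowercase them (assumption: safe here).
    netloc = parts.netloc.lower()
    # Ignore trailing slashes in the path.
    path = parts.path.rstrip("/")
    # Drop the fragment, keep the query string untouched.
    return urlunsplit((scheme, netloc, path, parts.query, ""))


print(normalize_homepage("https://Example.org/foo/"))  # http://example.org/foo
print(normalize_homepage("http://example.org/foo"))    # http://example.org/foo
```

Comparing the normalized keys instead of the raw Homepage values would let these two variants match, at the risk of a few false positives when a site really serves different content over HTTP and HTTPS.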

Stay tuned for more details.

New paper “Authoritative linked data descriptions of debian source packages using ADMS.SW” accepted at OSS 2013

I’ll be presenting “Authoritative linked data descriptions of debian source packages using ADMS.SW” at OSS 2013.

Here’s the abstract :

The Debian Package Tracking System is a Web dashboard for Debian contributors and advanced users. This central tool publishes the status of subsequent releases of source packages in the Debian distribution.

It has been improved to generate RDF meta-data documenting the source packages, their releases and links to other packaging artifacts, using the ADMS.SW 1.0 model. This constitutes an authoritative source of machine-readable Debian “facts” and proposes a reference URI naming scheme for Linked Data resources about Debian packages.

This should enable the interlinking of these Debian package descriptions with other ADMS.SW or DOAP descriptions of FLOSS projects available on the Semantic Web also using Linked Data principles. This will be particularly interesting for traceability with upstream projects whose releases are packaged in Debian, derivative distributions reusing Debian source packages, or with other FLOSS distributions.

Update: If you are interested, a preprint is available here in HTML form. See also previous installments on ADMS.SW in this blog.

Update: The slides of the presentation I made at Isola are here.

Dynamically querying external (sub)projects’ properties with RDF in FusionForge

I’ve been working recently on two plugins for FusionForge. The work is somewhat of a POC for a COCLICO work-package on dynamic interoperability, but some outcomes may actually be of use someday in real life, who knows 😉

The first plugin is called extsubproj, and allows the definition of links to external subprojects (i.e. hosted on another forge), in the properties of a FusionForge project. It’s basically managing a set of stored URLs and displaying them in the top project’s summary page. Nothing fancy so far, and the code is neither finished nor pushed to FusionForge’s trunk yet.

The second plugin, called doaprdf, lets the forge publish an RDF/XML description of some of the project’s metadata, using the DOAP dialect, at the same URL as the project summary page (those URLs are standardized in FusionForge in the form http://.../projects/projname). This works with content negotiation, following the principles of the Linked Data paradigm: the same URL, when requested for HTML, renders a Web page (the forge project’s summary page) meant for humans^geeks, and when queried with a specific Accept HTTP header (Accept: application/rdf+xml), returns the RDF document meant for machines.
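From the client side, the content negotiation described above boils down to setting the Accept header. Here is a minimal sketch using only Python’s standard library; the forge host name is a hypothetical placeholder, and the helper names are mine, not part of doaprdf.

```python
from urllib.request import Request, urlopen

RDF_XML = "application/rdf+xml"


def build_rdf_request(project_url):
    """Build a GET request asking for the RDF/XML variant of a project page."""
    return Request(project_url, headers={"Accept": RDF_XML})


def fetch_project_rdf(project_url, timeout=10):
    """Fetch the RDF/XML description served at the project summary URL."""
    with urlopen(build_rdf_request(project_url), timeout=timeout) as response:
        return response.read()


# Hypothetical forge URL, following the /projects/projname pattern.
request = build_rdf_request("http://forge.example.org/projects/projname")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

The same URL requested without that header (or with Accept: text/html) would get the usual HTML summary page.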

Now, when you combine these two, you gain the possibility of having extsubproj display not only the subprojects’ URLs, but also some of their meta-data, for instance a link in the form of <a href="http://.../projects/projname">fetched doap:name</a>.
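To illustrate the idea (the actual plugin runs inside FusionForge, so this is not its code), here is a rough Python sketch that extracts doap:name from a fetched RDF/XML document and renders the link. It takes a shortcut by treating the document as plain XML with a doap:Project element containing a doap:name child; a real RDF library such as rdflib would cope with other serializations. The sample document and URL are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical DOAP description, as a remote forge might return it.
SAMPLE_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:doap="http://usefulinc.com/ns/doap#">
  <doap:Project rdf:about="http://forge.example.org/projects/projname">
    <doap:name>Example Subproject</doap:name>
  </doap:Project>
</rdf:RDF>"""


def project_link(project_url, rdf_xml):
    """Render an HTML link labelled with the project's doap:name."""
    root = ET.fromstring(rdf_xml)
    name = root.find(f".//{{{DOAP_NS}}}Project/{{{DOAP_NS}}}name")
    # Fall back to the bare URL when no usable name is found.
    if name is not None and name.text and name.text.strip():
        label = name.text.strip()
    else:
        label = project_url
    return f'<a href="{escape(project_url, quote=True)}">{escape(label)}</a>'


print(project_link("http://forge.example.org/projects/projname", SAMPLE_RDF))
```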

The POC illustrates how one may then construct a hierarchy of (public, so far) projects and sub-projects across the boundaries of different forges’ databases, and display them in a similar manner to local projects (for instance, what the FusionForge plugin projects-hierarchy provides), by dynamically querying the remote projects’ properties, fetched on demand and modeled in a generic dialect (RDF with common ontologies such as DOAP).

Note that a similar FusionForge plugin, “foafprofile”, is being developed too for user profiles, using RDF and FOAF.

Stay tuned for more content in the same vein.

LD2SD, Helios and Linked Data / SemWeb extracted from development forges

We received a visitor from DERI last week (PhD student Aftab Iqbal), who’s researching the integration of facts about software development tools into Linked Data, in order to provide interesting “semantic mashups” of data in IDEs like Eclipse (see his slides). The choice of ontologies is quite interesting, as are the results, which are integrated in an Eclipse plugin made available to developers.

This approach is quite similar to the one we practice in the core of the Helios platform (still under development) to integrate data coming from different FLOSS ALM tools, in order to create dashboards offering a consolidated view of the software (maintenance) process.
Maybe the difference is that Helios does this internally, inside a “self-contained” platform, whereas the potential of LD2SD presented by Aftab is to do the same on the Web of Linked Data.

Also, in Helios, there are other quite interesting contributions made for the Mandriva distribution (linked with projects like Scribo and Nepomuk, in which Mandriva also participates), in the form of doc4.mandriva.org, which aggregates facts not at the “project” level this time, but for a meta-project (a GNU/Linux distribution). See Stéphane Laurière’s slides for details.

We’re also experimenting, in the frame of the COCLICO project, with producing RDFa data about software development projects hosted in FusionForge instances (see our progress tracked through this FusionForge feature request). The first candidates are projects’ DOAP profiles and developers’ FOAF ones, with lots of SIOC to glue it all together, and of course other information relating to a Forge ontology that we’re proposing in COCLICO.

With the recent announcements that Mylyn is investing a lot in OSLC, OSLC being based on RDF, and the Semantic Desktop starting to emerge (mainly in KDE) on top of Nepomuk, all this holds great promise for a Semantic future.

Quick visualization of the social network of the Gnome projects’ maintainers

Thanks to Frédéric Peters, I’ve discovered that Gnome is using DOAP to describe projects in its Git.

At http://git.gnome.org/repositories.doap, one will find the complete aggregated DOAP and FOAF description for the projects of Gnome and their maintainers.

My colleague Vu, who’s researching such social networks, has drawn the following graph, which shows projects in red and their maintainers in green:
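I don’t know which tools Vu used, but for the curious, here is a rough sketch of how such a graph could be derived in Python. It assumes each project appears as a doap:Project element with doap:maintainer children wrapping foaf:Person / foaf:name (the real file may be structured differently), and it emits Graphviz DOT text. The sample data and names are invented.

```python
import xml.etree.ElementTree as ET

NS = {
    "doap": "http://usefulinc.com/ns/doap#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}
DOAP_PROJECT = "{http://usefulinc.com/ns/doap#}Project"

# Invented sample, mimicking an aggregated DOAP + FOAF document.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:doap="http://usefulinc.com/ns/doap#"
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <doap:Project>
    <doap:name>project-a</doap:name>
    <doap:maintainer><foaf:Person><foaf:name>Alice</foaf:name></foaf:Person></doap:maintainer>
    <doap:maintainer><foaf:Person><foaf:name>Bob</foaf:name></foaf:Person></doap:maintainer>
  </doap:Project>
  <doap:Project>
    <doap:name>project-b</doap:name>
    <doap:maintainer><foaf:Person><foaf:name>Alice</foaf:name></foaf:Person></doap:maintainer>
  </doap:Project>
</rdf:RDF>"""


def maintainer_edges(rdf_xml):
    """Return (project name, maintainer name) pairs found in the document."""
    edges = []
    for project in ET.fromstring(rdf_xml).iter(DOAP_PROJECT):
        project_name = project.findtext("doap:name", default="", namespaces=NS).strip()
        for person in project.findall("doap:maintainer/foaf:Person/foaf:name", NS):
            edges.append((project_name, (person.text or "").strip()))
    return edges


def to_dot(edges):
    """Render the edges as an undirected Graphviz graph (no quote escaping)."""
    lines = ["graph maintainers {"]
    for name in sorted({project for project, _ in edges}):
        lines.append(f'  "{name}" [color=red];')    # projects in red
    for name in sorted({person for _, person in edges}):
        lines.append(f'  "{name}" [color=green];')  # maintainers in green
    for project, person in edges:
        lines.append(f'  "{project}" -- "{person}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot(maintainer_edges(SAMPLE)))
```

The resulting text can be fed to a Graphviz layout tool to get a picture; maintainers shared by several projects (Alice, here) are what tie the network together.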

More analysis later maybe, but the image is already nice on its own 🙂

Update: a more detailed representation is available (see comments below).