URIs for languages

I've finally given up trying to represent hubjects at all, at least for the moment. I had a serious try at it at lingvoj.org. But after discussions in the Linking Open Data forum, I eventually surrendered and published the language descriptions in a way conformant to W3C recommendations for Semantic Web architecture, with content negotiation, 303 redirects and the like. I've even removed the previous post here saying otherwise, which would now be full of dead links and would only cause confusion.
So we'll see how this flies. Feed your favourite tool with the URI http://www.lingvoj.org/lang/zh, and judge for yourself whether it provides a useful description of the Chinese language, both for humans and machines.
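
For readers who would rather try this from a script than from a browser, here is a minimal Python sketch of the client side of content negotiation. It only builds the two requests and sends nothing. The helper name build_request and the two media types are my own assumptions for illustration, not something the server is documented here to honour.

```python
from urllib.request import Request

# URI taken from the post above.
LANG_URI = "http://www.lingvoj.org/lang/zh"


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build a GET request whose Accept header asks for RDF or for HTML.

    Hypothetical helper: the media types below are assumptions.
    """
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


machine_request = build_request(LANG_URI, want_rdf=True)
human_request = build_request(LANG_URI, want_rdf=False)

# To actually dereference the URI (needs network access; what the server
# answers today is not something this sketch can promise):
#   from urllib.request import urlopen
#   with urlopen(machine_request) as response:
#       print(response.geturl())                     # final URL after any redirect
#       print(response.headers.get("Content-Type"))
```

urlopen follows a 303 redirect on its own, so comparing the final URL with the one requested is a simple way to see where each kind of client ends up.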


Using owl:sameAs in Linked Data

There has been a very long and interesting thread on the Linking Open Data forum and elsewhere about the use and semantics of owl:sameAs. I just suggested the following best practices:
  1. Assertions such as "a:foo owl:sameAs b:bar" should be grounded on some form of agreement of the owners of a:foo and b:bar, on whichever basis they both decide to agree.
  2. For outsiders (owning neither the a: nor the b: domain), such agreement could be shown by the presence of the assertion, in a symmetrical way, in both domains, each domain using its own URI/resource on the subject side and the other's on the object side, that is:
    (a) asserts "a:foo owl:sameAs b:bar"
    (b) asserts "b:bar owl:sameAs a:foo".
  3. If one side (a) pushes the assertion first, the other side (b) should at least be made aware of it by (a), and is entitled to say whether she agrees or not: (a) says that "a:foo owl:sameAs b:bar", but as the owner of (b), I do not necessarily agree. Such lack of agreement could be inferred from the absence of the reciprocal assertion on the (b) side.
Granted, from a purely logical viewpoint, those assertions are strictly equivalent since owl:sameAs is a symmetric property, but from a social/trust viewpoint, having each side declare it in a specific direction could be interpreted as a formal proof of agreement. This is what has been done, for example, between DBpedia and GeoNames. The thread in question shows once again by its sheer length, if proof were needed, that there is no universal way to ground such agreement, which belongs to the realm of language and social communication.
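
The reciprocity check suggested in points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under my own assumptions: triples are plain tuples of strings rather than anything parsed by a real RDF library, and the function names and sample resources (a:baz, b:qux) are made up for the example.

```python
SAME_AS = "owl:sameAs"


def same_as_pairs(triples):
    """Extract (subject, object) pairs from the owl:sameAs triples of one publisher."""
    return {(s, o) for (s, p, o) in triples if p == SAME_AS}


def agreed_same_as(published_by_a, published_by_b):
    """Pairs asserted by (a) for which (b) publishes the reciprocal assertion."""
    from_a = same_as_pairs(published_by_a)
    from_b = same_as_pairs(published_by_b)
    return {(s, o) for (s, o) in from_a if (o, s) in from_b}


def unconfirmed_same_as(published_by_a, published_by_b):
    """Pairs asserted by (a) alone, which (b) has not (yet) echoed."""
    return same_as_pairs(published_by_a) - agreed_same_as(published_by_a, published_by_b)


# Hypothetical data: what each domain publishes on its own side.
triples_a = [
    ("a:foo", SAME_AS, "b:bar"),
    ("a:baz", SAME_AS, "b:qux"),
]
triples_b = [
    ("b:bar", SAME_AS, "a:foo"),
]

agreed = agreed_same_as(triples_a, triples_b)
pending = unconfirmed_same_as(triples_a, triples_b)
```

With this sample data, agreed holds the a:foo / b:bar pair and pending holds the a:baz / b:qux pair. A reasoner would treat both sets identically. The distinction is purely a trust signal for outsiders.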


A journey to Data Mountains

Seems about time to revisit a famous Zen aphorism (more verbose translation here)
Morning mountains, only mountains
Noon mountains, more than mountains
Evening mountains, simply mountains
How does this apply to what we have been speaking about here? We've tried in the morning of our ignorance to pile up mountains of data, and had a hard time making sense of them.
In the broad daylight of our powerful abstract thought, both intuition and logic, we found out that useful data were data about some thing(s). Making explicit the things data are about was really the way to follow to organize, understand, search, query, in a thousand ways make data more useful, more meaningful. So we went through those mountains with this "about-ness" in mind, and they looked indeed more than mountains, they looked like information about things. We called them classes, properties, relations, and started re-engineering data in all sorts of smart ways: metadata, RDF, topic maps, ontologies and the like. Happy to bring more and more meaning into data, we called it knowledge, and we thought we had found answers to some fundamental questions, discovered the information lost in data, and the knowledge lost in information. Captured the mountain's spirit.

Then came the time to harvest, to weave it all together. I had knowledge in my information system, and so did you. Or so we figured. But looking into your system, I found only data, and so did you when you looked into mine. Where have the things gone? Where is the knowledge hidden? Only data, which we had to figure out again together how to weave.

So we'd not captured the mountain's spirit after all. But we did not travel in vain, because we've felt it blowing by. We know it's hidden somewhere beyond the data, beyond information, beyond what we have called knowledge, and that without this spirit our data would be completely meaningless and useless indeed. And what is more, we have at least found a piece of wisdom lost in knowledge : data are only data.


A bit of Chinese

I've been dreaming about learning a bit more Chinese than the few characters I've been playing with ever since I discovered Kyril Ryjik's excellent introduction, "L'Idiot Chinois", around 1980 or so (this book is unfortunately out of print now). I've decided that now is the time to learn Chinese, for all sorts of obvious reasons. Hence the extract of the original Chinese text of the now famous "wheel and hub" quote on this page. If your browser does not support Chinese characters, you should do something about it right away.
I've tried to come up with my own translation. Note that I got rid of any capitalization, because Chinese has no notion of Absolute Capital Things. The concepts represented by Chinese characters are most of the time both concrete and abstract, generic and specific. No definite or indefinite articles, no clear distinction between grammatical nature and function. Depending on the context, the same character may be translated as a noun, a verb, or an adjective.
In a way, Chinese characters are by nature ... hubjects.


Adieu to Published Subjects

I learnt recently that the OASIS Published Subjects Technical Committee, which I chaired for two years from its foundation in August 2001, has been closed. Actually it was officially closed by OASIS in November 2006, but I had not received any notification from anyone. It feels like learning of the death of an old friend months after the fact.
Actually the activity of the TC had been dormant since the publication of its first, and so far only, deliverable, Published Subjects: Introduction and Basic Requirements. This output does not seem much after two years of work, but it figures there was not much more we could achieve. In a recent private exchange about the future of Published Subjects, Patrick Durusau, who also chaired this TC after 2003, still wants to believe that it is not the end of it, and that the work has stalled mainly for lack of manpower. But maybe this TC was, anyway, a case of premature specification.
I think that the notion of a published "identification" of a subject, whatever you want to call it, is probably a good idea, so long as anyone can add their identification of the same subject. On the other hand, a notion that this *is* the identification of a subject, well, that leads to losing propositions like the stuff you find at Swoogle. How many different identifications of person are there?
I already asked this question here two years ago. Amazingly enough, the figure does not seem to have changed since (399 answers as of today).
I take the opportunity to point to this paper by Patrick. If you have not figured out what a subject can be, even after an extensive reading of this blog (or do not care to go into so much reading), this is a must. Short, clear and to the point.


Linking Open Data

This is a challenging project for Semantic Web technologies. Weaving together open public data, such as Wikipedia or GeoNames, and public ontologies and vocabularies such as WordNet, etc. Of course I had to be involved in that. But consensus will be hard to achieve. The initiators are folks from Leipzig and Berlin universities, involved in DBpedia, a project to RDF-ize Wikipedia content.
I've pushed the idea that linking concepts from different schemes should not be done on the basis of too strong ontological commitment, but of some kind of loose coupling using e.g., SKOS mapping vocabulary, and why not blank concepts. This proposal has not been well received, to say the least.

I'm in general against using bnodes for anything! They should be deleted from the RDF spec and they are especially harmful in a linked data context, where everything should be dereferencable.
And Richard Cyganiak adds:
Yes, sure ... I understand why you want to introduce the blank node. But I don't like it. Why do you generate data? You want it to be useful. How does this blank node increase its usefulness? It doesn't. It's just a fig leaf to cover up the fact that your model is just an approximation of the real world. But we know this. Every model is. And these "semantic rubber bands" don't change this fact -- they just make your data harder to work with. Be bold! People who want to re-use your data will learn to work around its quirks and idiosyncrasies. Dealing with the quirks is a part of re-using data, it always was, and it always will be.
So much for my hubjects ... but this is not the end of it. More to come.

Changing blog title

I know, changing names, like changing URIs, is a silly idea. Cool names never change. But I decided yesterday to re-activate the original univers immedia in French. And since 'The wheel and the hub' has definitely been for a while, and certainly will continue to be, the motto of this blog, it had to make its way up to the title.


Every subject is a blank node

Two discussion threads made me move a step forward towards a general theory of blank nodes. One thread, which I already mentioned, is about languages. The other one was started here by John Black on the Semantic Web list, about the representation of concepts having contextual semantics, such as "I", "You", "Here", etc. Using a blank node to represent the context is the solution I propose today here.