The Wheel and the Hub (six months later)

Six months have passed since the first release of 'The Wheel and the Hub'. Thought it was time to revisit it, and make it consistent with a parallel new release of the SPEK vocabulary. The new presentation is much shorter and more full of emptiness. Eventually I've come back to my first impression about it, and first impressions are well known to often turn out to be the best. Blank nodes are indeed the key to representing subjects, but with a subtle and important difference from the first version. Blank nodes representing subjects must be really empty: they must bear absolutely no declaration of any property whatsoever, not even links to their various descriptions (as they did in the previous versions). It's up to the various descriptions to point to the same blank node, like so many fingers pointing at the Moon, and not the other way round ...
Following this logic, the SPEK vocabulary has also been simplified to the extreme ... I don't need "views" and "aspects" any more. Any RDF description is a view and provides a specific aspect. Note also that 'hubject' is back in the SPEK vocabulary, no longer as a class, but as the property linking a description to the binding blank node.
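To make the "empty hub" rule concrete, here is a minimal sketch, assuming triples are plain Python tuples, blank nodes are strings starting with `_:`, and `spek:hubject` is written as a prefixed name (the actual namespace URI is not given here). The `ex:` view names are hypothetical. The check is simply that the blank node never appears in subject position, while every description points at it.

```python
# Triples as (subject, predicate, object) tuples; "_:" marks a blank node.
# "ex:..." names are hypothetical placeholders, "spek:hubject" is a prefixed name.
HUBJECT = "spek:hubject"

graph = {
    ("ex:glossaryTerm", HUBJECT, "_:hub1"),
    ("ex:glossaryTerm", "skos:prefLabel", "Air pollution"),
    ("ex:thesaurusDescriptor", HUBJECT, "_:hub1"),
    ("ex:thesaurusDescriptor", "skos:prefLabel", "Air pollution"),
}

def is_empty_hub(graph, node):
    """True if node is a blank node that carries no property at all."""
    return node.startswith("_:") and all(s != node for s, _, _ in graph)

def descriptions_of(graph, hub):
    """Every description pointing at the hub, like fingers pointing at the Moon."""
    return sorted(s for s, p, o in graph if p == HUBJECT and o == hub)

print(descriptions_of(graph, "_:hub1"))
```

Adding a new description only adds triples whose subject is the description itself; the hub never has to be touched.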

I guess this is now as simple as possible ...


Forging URI schemes : best or bad practice?

I've already posted about the International Virtual Observatory Alliance (IVOA). It has a forum called IVOA semantics. The current thread is about the relevance of forging new URI schemes fit for a large community of users (such as astronomers), versus using http URLs.

The year of the unique ID

Jack forwarded that one. JOHO stands for "Journal of the Hyperlinked Organization". Food for thought based on the ISBN case. What does isbn:foo identify?
  • Work (e.g., Hamlet)

  • Expression (e.g., the Folger's Hamlet with annotations and introduction)

  • Manifestation (a particular print run of Folger's Hamlet)

  • Item (a copy of Folger's Hamlet sitting on a shelf)

Work, expression, manifestation, item : four aspects, in four different perspectives, of the same hubject isbn:foo
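A small sketch of the idea, assuming we model the hubject as one identifier whose denotation is looked up per perspective. The four descriptions are taken from the list above; the data layout and the `resolve` helper are illustrative only.

```python
from enum import Enum

class Perspective(Enum):
    WORK = "work"
    EXPRESSION = "expression"
    MANIFESTATION = "manifestation"
    ITEM = "item"

# One hubject, four aspects (descriptions copied from the list above).
hubject = {
    "id": "isbn:foo",
    "aspects": {
        Perspective.WORK: "Hamlet",
        Perspective.EXPRESSION: "Folger's Hamlet with annotations and introduction",
        Perspective.MANIFESTATION: "a particular print run of Folger's Hamlet",
        Perspective.ITEM: "a copy of Folger's Hamlet sitting on a shelf",
    },
}

def resolve(hub, perspective):
    """What the identifier denotes depends on the perspective asked for."""
    return hub["aspects"][perspective]

for p in Perspective:
    print(p.value, "->", resolve(hubject, p))
```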


Tom Gruber on Tag Identity

Tom Gruber is famous for his striking and sensible aphorisms about ontologies such as "Every Ontology is a Treaty". He strikes again with this paper proposing an Ontology of Folksonomy.
Section 5, just before the conclusion, raises a bunch of questions about Tag Identity, showing that tags and URIs face similar identification issues, in fact common to any naming mechanism, and that neither provides killer answers.


Identity, Reference and the Web

A challenging workshop to be held in Edinburgh in May 2006. I've been invited today by co-chair Harry Halpin to join the Program Committee. From the "Goal and Theme" section of the description:
URIs are the primary mechanism for reference and identity on the Web. To be useful, a URI must provide access to information which is sufficient to enable someone or something to uniquely identify a particular thing and the thing identified might vary between contexts. There is no doubt that as mechanisms for identifying web pages the URI has been wildly successful. Currently, URIs can also be used to identify namespaces, ontologies, and almost anything. However, important questions are the interpretation and use and meaning of URIs have been left unquestioned ...
Exactly what we are about here, indeed ... Interesting also to see Pat Hayes on the list of co-chairs. I remember that quite a while ago, in a private communication, Pat stressed that identity issues had been "sadly overlooked" so far by current Semantic Web technologies.

Compact URIs : The CURIE syntax

I've started using this compact syntax in a project using IPTC specifications. Beyond the practical aspects well presented in the W3C note, it strikes me that CURIEs are good candidates for hubject identifiers. For example, ISBN:0201749602 is a compact URI which various systems can expand automatically into as many URIs, providing different views of the book, such as ...
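A minimal sketch of CURIE expansion as simple concatenation of a mapped base URI and the reference, assuming each system supplies its own prefix mapping. Only `urn:isbn:` is a real scheme; the two http base URIs are hypothetical. The point is that one compact identifier yields as many URIs as there are mappings.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' by concatenating the mapped base URI and reference."""
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"cannot expand {curie!r}")
    return prefixes[prefix] + reference

# Each system has its own mapping for the same prefix.
# urn:isbn: is a registered URN namespace; the http bases are hypothetical.
systems = {
    "urn": {"ISBN": "urn:isbn:"},
    "library": {"ISBN": "http://library.example.org/isbn/"},
    "bookshop": {"ISBN": "http://bookshop.example.com/item/"},
}

for name, prefixes in systems.items():
    print(name, expand_curie("ISBN:0201749602", prefixes))
```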

New SPEK release

Although the previous release has not yielded much feedback so far, I've kept hitting this nail of views, aspects and perspectives, and eventually produced a new version of the SPEK vocabulary. The new release is as simple as can be, although the RDF schema is quite weird at some points. It introduces a spek:Description which is not formally defined as a subclass of rdf:Description, but which looks really like it. RDF editors are variously happy with it. SWOOP beta 2.3 seems to yield the best results of all those I tried.
Hubject is no longer an explicit class in this version, because I've figured out that just about any resource could be used as a hub. I've kept "spoke" as the property linking a view to the resource it describes, but changed its direction: the spoke now points inward, to the hubject resource, not outward from the hub.
As an example, I picked the "Air Pollution" hubject as defined by Wikipedia, and four different views : a term in a glossary, a descriptor in a thesaurus, a category in a taxonomy, and a class in an ontology.
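The change of direction can be sketched as a simple triple rewrite, assuming triples are Python tuples and `spek:spoke` is written as a prefixed name. The Wikipedia URL is assumed to be the article's address; the four `ex:` view names are hypothetical stand-ins for the glossary term, thesaurus descriptor, taxonomy category and ontology class.

```python
# Assumed article URL; view names are hypothetical.
HUB = "http://en.wikipedia.org/wiki/Air_pollution"
SPOKE = "spek:spoke"

# Previous direction: hub -> view (outward from the hub).
outward = [
    (HUB, SPOKE, "ex:glossaryTerm"),
    (HUB, SPOKE, "ex:thesaurusDescriptor"),
    (HUB, SPOKE, "ex:taxonomyCategory"),
    (HUB, SPOKE, "ex:ontologyClass"),
]

def point_inward(triples):
    """Reverse every spoke so it runs from the view to the hubject resource."""
    return [(o, p, s) if p == SPOKE else (s, p, o) for s, p, o in triples]

def views_of(triples, hub):
    """Views are whatever points at the hub through an inward spoke."""
    return [s for s, p, o in triples if p == SPOKE and o == hub]

inward = point_inward(outward)
print(views_of(inward, HUB))
```

With inward spokes, anyone can publish a new view of the hubject without editing the hub's own description.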


SKOS in Topic Maps

I mentioned the new blog of Lars Marius Garshol here a while ago. His latest post is about using the SKOS vocabulary in Topic Maps, and is presented as a long-overdue piece of homework. Well done.
Now the crucial question is maybe not how you can do it (something Lars Marius shows quite neatly as usual), but why one would want to do that. Adding a real world use case would be cool ...

As simple as possible ...

These days Planet RDF is buzzing with a bunch of interesting responses to Danny Ayers' provocative question about alternatives to the Semantic Web (simpler than the original stuff, if possible). Getting rid of the artificial complexity gathered around the basically so-simple RDF model is of course the main preoccupation, and of course the lack of a canonical serialization in XML is seen as a major obstacle to adoption. At SemEmergence, Seth Ladd is calling for it:
Please, W3C, create a standard RDF serialization that elevates RDF as a first class citizen of XML. Everyone else has a schema, why can't we?
Having spent (too much) time these days struggling with the yet-another-serialization syndrome in the latest versions of SWOOP and Protégé, I could not agree more. But while waiting for such a (most unlikely) W3C delivery, alternative solutions pop up and are worth looking at.
Phil Jones pushes the notion of SynWeb, which he defines as a web which doesn't need "key identifiers".
The difference is that the knowledge needed to give semantics to the data resides in the programs which do the combining, rather than in a schema which has been prepared earlier.
No absolute meaning of data, no absolute identifiers, semantics in the application context? Certainly close to our current ramblings on perspectives and aspects.

The simplest and most radical alternative to date is certainly Phil Dawes' tagtriples, a simple text format for triple statements. Forget URIs, namespaces, XML and the like. Identification is local to a graph (an ordered collection of statements), as indicated in the Tagtriples Model and Semantics (don't run away, it really is as simple as can be).
All occurances [sic] of a particular symbol in a graph must denote the same meaning. [...] The same symbol used in different graphs may or may not denote the same meaning - it is up to the consumer of the information to interpret how the symbol/meanings correspond.
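The graph-local identity rule fits in a few lines. This is a minimal sketch assuming a simplified whitespace-separated "subject predicate object" line format (not the exact tagtriples grammar) and graphs named by plain strings; `same_meaning` and the consumer-supplied correspondence set are hypothetical helpers illustrating the quoted semantics.

```python
def parse_graph(text):
    """Parse one 'subject predicate object' statement per non-empty line.

    Simplified: symbols are whitespace-separated, no quoting rules.
    """
    statements = []
    for line in text.splitlines():
        if not line.strip():
            continue
        parts = line.split(maxsplit=2)
        if len(parts) != 3:
            raise ValueError(f"not a triple: {line!r}")
        statements.append(tuple(parts))
    return statements

def same_meaning(a, b, correspondences=frozenset()):
    """a and b are (graph_name, symbol) pairs.

    Inside one graph, identical symbols mean the same thing; across graphs,
    only the correspondences chosen by the consumer decide.
    """
    if a[0] == b[0]:
        return a[1] == b[1]
    return (a, b) in correspondences or (b, a) in correspondences

zoo = parse_graph("jaguar isa animal\njaguar eats capybara")
safari = parse_graph("jaguar isa bigcat")
cars = parse_graph("jaguar isa carmaker")
consumer_map = {(("zoo", "jaguar"), ("safari", "jaguar"))}

print(same_meaning(("zoo", "jaguar"), ("safari", "jaguar"), consumer_map))
print(same_meaning(("zoo", "jaguar"), ("cars", "jaguar"), consumer_map))
```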


Introducing SPEK

Follow-up of the previous post. I eventually delivered a first release of SPEK, an RDFS vocabulary leveraging SKOS to express perspectives, aspects and hubjects.
More to come ...


Perspectives and SKOS

Just thought the time was ripe to push hubjects and Michel Biezunski's perspectives on the SKOS forum. Watch this space ...

Subject classification with DITA and SKOS

DITA (Darwin Information Typing Architecture) has been developed by IBM since 2001, and introduces itself as a "topic-oriented architecture". DITA has its own definition of a topic, which is a bit different, and in a sense more restrictive, than the one(s) found in Topic Maps.
A topic is a unit of information that describes a single task, concept, or reference item.
The new publication, really worth reading, comes with a challenging academic subtitle : "Managing formal subjects", hiding in fact a very pragmatic approach:
In a topic-oriented architecture such as DITA, content is authored in small, independent units that are assembled to provide help systems, books, courses, and other deliverables. Each unit of information answers a single question for a specific purpose. That is, each topic has specific, independent subject matter -- the very reason that these units of information are called topics.
The paper then expands very neatly on how SKOS can be used to declare what the subject of a topic is, claiming that "subject" here is to be understood in the same sense as in "Published Subject Indicator".
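As a rough sketch of the idea, assume each topic is linked to a SKOS concept by `dc:subject` (an assumption here, not necessarily the property the paper uses) and concepts are organized with `skos:broader`. Finding topics about a subject then means collecting the concept and everything narrower. All `ex:` identifiers are hypothetical.

```python
# Hypothetical identifiers; dc:subject as the topic-to-concept link is an assumption.
triples = {
    ("ex:topic-install", "dc:subject", "ex:concept-setup"),
    ("ex:topic-upgrade", "dc:subject", "ex:concept-upgrade"),
    ("ex:topic-glossary", "dc:subject", "ex:concept-terms"),
    ("ex:concept-upgrade", "skos:broader", "ex:concept-setup"),
}

def narrower_closure(triples, concept):
    """The concept plus every concept reachable downward through skos:broader."""
    found, frontier = {concept}, [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if p == "skos:broader" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found

def topics_about(triples, concept):
    """Topics whose declared subject is the concept or one of its narrower concepts."""
    concepts = narrower_closure(triples, concept)
    return sorted(s for s, p, o in triples if p == "dc:subject" and o in concepts)

print(topics_about(triples, "ex:concept-setup"))
```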


Placeopedia = Google Earth + Wikipedia

You've all been waiting for it, here it is: connecting the most amazing geographical interface, Google Earth, with the ever-growing Wikipedia. Go to Placeopedia, pick an article in Wikipedia, find the place of the thing on Google Maps, and pinpoint it. That's it. You've got a cool Wikipedia + Google Earth subject indicator. I just added the Very Large Telescope and the Paranal Observatory. Better choose the "satellite" view to see something in those places, though, and don't look for accommodation around there, it is sparse ... After that, add the Placeopedia data to Google Earth, and see newly added objects in real time. Awesome!

[2015-02-23] Yet another dead project ...


The Search for the Perfect Language

Started diving into this fascinating book by Umberto Eco last weekend, after discovering the French translation in my local library. Really worth reading, to understand that what we are doing here and in many other places today is just another episode of a very long story. One quote among many, this one from Descartes in a letter to Marin Mersenne in 1629 (my own translation from the French, hope it makes sense).
I hold that such a language is possible, and that the science on which it depends can be found, by means of which farmers could grasp the truth of things better than philosophers do today. But don't hope ever to see it in use; that would suppose great changes in the order of things, and would need the world to be a heaven on earth, something worth proposing only in the world of novels.


Topic Maps for Libraries Wiki

Announced by Suellen on the Topic Maps list. Quote from the wiki home page:
Elaine Svenonius in her book The Intellectual Foundations of Information Organization states that the purpose of information organization is "to bring essentially like information together and to differentiate what is not exactly alike".
Suellen has also established a Topic Maps Interest Group within LITA (Library & Information Technology Association). I hope she will take the time to comment a little more about it here.

Lars Marius is alive and blogging at TMRA'05

If you wonder where Topic Maps folks are today, you will find some of them, including Jack giving the keynote, in Leipzig at TMRA'05. Lars Marius Garshol is there of course, and seems to be having fun feeding his brand new blog, which is simply called "Larsblog" because he's a guy who loves simplicity. But I'd suggest a sexier name to him.
What about "Beer, Topic Maps and Everything."?


Anti-SPAM measures for comments

For a month or so we've been getting a few SPAM attacks on universimmedia, in the form of random comments linking to a variety of sites generally having nothing to do with the post. I've deleted them manually so far, but their number has increased these days, so I have enabled the word verification procedure, which should stop comments generated by automated SPAM software.
Sorry for the extra inconvenience in posting comments.


Revisiting Content Negotiation

Yesterday I attended a very interesting telecon of the SWBPD Vocabulary Management Task Force. The agenda was highly technical: defining best practices for providing, through the URI of an RDF vocabulary term, both a computable RDF description for computers and a human-readable description for humans. Use cases were SKOS and FOAF, with their respective editors Alistair Miles and Dan Brickley, and Dublin Core, represented by Tom Baker. All those smart guys had already explored the subject in depth during the recent Dublin Core Conference in Madrid, and agreed that the current state of their respective vocabularies was suboptimal.
The devil is in the details there. For example, many vocabularies use #URIs, such as http://www.w3.org/2004/02/skos/core#prefLabel.
From the Topic Maps Published Subjects viewpoint, such a URI would be called a subject identifier. But the fragment identifier is never sent to the server: when your browser dereferences http://www.w3.org/2004/02/skos/core#prefLabel, it retrieves the whole document at http://www.w3.org/2004/02/skos/core, which is an RDF schema. So the subject identifier does not directly provide a human-readable HTML subject indicator, such as the one actually provided by http://www.w3.org/TR/swbp-skos-core-spec/#prefLabel
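Python's standard `urllib.parse.urldefrag` shows the split a client makes before sending a request: the part without the fragment goes to the server, while the fragment stays on the client side and is resolved inside whatever document comes back.

```python
from urllib.parse import urldefrag

uri = "http://www.w3.org/2004/02/skos/core#prefLabel"
requested, fragment = urldefrag(uri)

print(requested)  # the only part sent to the server in the HTTP request
print(fragment)   # kept by the client, looked up inside the returned document
```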
Everybody agreed that it would be good to have the subject identifier provide redirection to the subject indicator (even if this is not the terminology used so far in RDF land), at least for human users (that is, in a regular browser), whereas computers would keep being fed with the RDF description.
The consensus in this meeting was that content negotiation is the way to go. While it's unclear at this stage (at least to me) how this can be technically achieved, particularly with #URIs, it sheds new light on the Published Subjects specification, on which I expand in this post.
What is new here is that at the time of the specification (2003), we explored neither the possibilities offered nor the issues raised by content negotiation for Published Subjects.
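Here is a minimal server-side sketch of the negotiation step, assuming the server picks between an RDF and an HTML representation from the client's `Accept` header (media ranges and `q` values). The two target URLs are the ones mentioned in this post; the mapping itself and the tie-breaking rule (first listed wins) are assumptions. Since the server never sees the fragment, it can only redirect the fragment-less URI; browsers are expected to carry the original fragment over to a redirect target that has none (RFC 7231 later made this explicit).

```python
# Assumed mapping from media type to representation; ties go to the first entry.
REPRESENTATIONS = {
    "application/rdf+xml": "http://www.w3.org/2004/02/skos/core",
    "text/html": "http://www.w3.org/TR/swbp-skos-core-spec/",
}

def parse_accept(header):
    """Return (media_range, q) pairs from an HTTP Accept header."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    return prefs

def quality(media_type, prefs):
    """q value the client gives a concrete type; the most specific range wins."""
    main = media_type.split("/")[0]
    best = None  # (specificity, q)
    for media_range, q in prefs:
        if media_range == media_type:
            spec = 2
        elif media_range == main + "/*":
            spec = 1
        elif media_range == "*/*":
            spec = 0
        else:
            continue
        if best is None or spec > best[0]:
            best = (spec, q)
    return best[1] if best else 0.0

def negotiate(accept):
    """Pick (media_type, url) for the request, or None if nothing is acceptable."""
    prefs = parse_accept(accept or "*/*")  # no Accept header means anything goes
    scored = [(quality(t, prefs), t) for t in REPRESENTATIONS]
    best_q, best_type = max(scored, key=lambda s: s[0])
    if best_q <= 0:
        return None
    return best_type, REPRESENTATIONS[best_type]

# A typical browser header gets HTML; an RDF agent gets the schema.
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(negotiate("application/rdf+xml"))
```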

Thinking further about it, it strikes me that the content negotiation mechanism is very similar to hubjects. A URI managed through content negotiation defines a subject/resource which is neither this content nor that one, but a superposition of all possible contents, the actual one delivered in a given interaction depending on the client-server dialogue. It's amazing that the impact of content negotiation on URI meaning seems to have been so much overlooked. Although the specification is now quite old, it seems to have been used only as a borderline technical trick, whereas it could become a fundamental mechanism to deliver, through the same URI, a variety of views of a subject to a variety of users, humans and computers alike.


Axioms of Identity

Here is what Scott C. Lemon said:
In my research into digital identity, I created a set of 'axioms' that have molded my perspective of the subject. I developed these axioms as the foundation for how I would create a digital identity solution ... a software solution to accumulate identity, and provide controlled dissemination of that information.

The First Axiom of Identity

I posit that we humans do not have any inherent identity.

The Second Axiom of Identity

I posit that identity does not exist outside the context of a community.

The Third Axiom of Identity

I posit that identity is exchanged in transactions that occur within a context of trust and authentication.

nota bene: given the last update on these (4-3-2005), I'm guessing that Bernard hasn't already mentioned them here :)


Thinking about RDF and Topic Maps

Danny Ayers, in this blog entry, talks about issues related to representations in RDF that speak to issues I have thought about for a while now. I think that now is a good time to start a dialog between the RDF tribe and the Topic Maps tribe. It's a double-edged knife, one that cuts both ways, looking at the true nature of the inquiries of each tribe.

I like to think about it this way: the core of the topic maps inquiry is to satisfy a couple of important use cases: finding and reminding. In those two use cases lie two primitive notions: subject identity and names for things. Those are the two primitives that topic maps place front and center, whereas, it seems to me, OWL emphasizes inferencing in subsumption hierarchies, relegating subject identity to "proper use of URIs". I like to think about subject identity in the same terms a lawyer might in a court case. There, properties of the subject, more so than some URI, become all-important. A trial might turn on something as trivial as the shoes worn on some particular day. As topic maps are evolving, particularly in the case of the TMRM (Topic Maps Reference Model), we are seeing more emphasis placed on comparable subject properties than on precise URIs, which, in many cases, do not (yet) exist. We are seeing the evolution of the ability to "confer" identity on a subject according to circumstances. I think this line of inquiry can map directly into RDF work.

Topic maps (indeed, "subject maps") add one important consideration beyond subject identity and names for things: a guarantee that any proxy for any subject (aka Topic) is the one place you need to go (in *this* map) to find all that is knowable about that subject.
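That "one place" guarantee is usually delivered by merging. Below is a minimal sketch, assuming proxies are dicts with `identifiers` and `names` sets and that two proxies merge, transitively, when they share any identity key (a union-find). The default key is the set of subject identifiers, as in XTM-style merging; the `keys` parameter lets comparable properties serve instead, which is closer to the TMRM flexibility described above. The `ex:` identifiers are hypothetical.

```python
def merge_proxies(proxies, keys=lambda p: p["identifiers"]):
    """Merge proxies that share any identity key, transitively (union-find)."""
    parent = list(range(len(proxies)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner = {}
    for i, proxy in enumerate(proxies):
        for key in keys(proxy):
            if key in first_owner:
                parent[find(i)] = find(first_owner[key])
            else:
                first_owner[key] = i

    merged = {}
    for i, proxy in enumerate(proxies):
        target = merged.setdefault(find(i), {"identifiers": set(), "names": set()})
        target["identifiers"] |= set(proxy["identifiers"])
        target["names"] |= set(proxy["names"])
    return list(merged.values())

proxies = [
    {"identifiers": {"ex:a"}, "names": {"A"}},
    {"identifiers": {"ex:a", "ex:b"}, "names": {"A prime"}},
    {"identifiers": {"ex:b"}, "names": {"B"}},
    {"identifiers": {"ex:c"}, "names": {"C"}},
]
for proxy in merge_proxies(proxies):
    print(sorted(proxy["identifiers"]), sorted(proxy["names"]))
```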

The knife cutting in the other direction suggests that, at the implementation level, topic maps could evolve along lines suggested by RDF work. Indeed, some of my own work involves the use of Jena coupled with JDBM as a back end.



GeoRSS is simple proposal for RSS feeds to also be described by location or Geotagged. It standardizes the way in which "where" is encoded with enough simplicity and descriptive power to satisfy most needs to describe the location of Web content.
This article further suggests combining this geo-tagging with folkso-tagging, to provide pragmatic but efficient "what-where" identification.
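A sketch of such a "what-where" RSS item, using only the standard library. The "where" is a GeoRSS-Simple `georss:point` (latitude then longitude, space-separated); carrying the "what" folksonomy tags as plain RSS `category` elements is an assumption, not something the article prescribes. The link is hypothetical and the coordinates are an approximate location for Paranal.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"
ET.register_namespace("georss", GEORSS_NS)

def geotagged_item(title, link, lat, lon, tags):
    """Build an RSS <item> carrying both 'what' (tags) and 'where' (a point)."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # "where": GeoRSS-Simple point, latitude first, then longitude
    ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    # "what": folksonomy tags as plain RSS categories (an assumption)
    for tag in tags:
        ET.SubElement(item, "category").text = tag
    return item

def what_where(item):
    """Read the tags and the (lat, lon) pair back from an item."""
    lat, lon = map(float, item.find(f"{{{GEORSS_NS}}}point").text.split())
    tags = [c.text for c in item.findall("category")]
    return tags, (lat, lon)

# Hypothetical link; approximate coordinates for the Paranal Observatory.
item = geotagged_item("Paranal Observatory", "http://example.org/paranal",
                      -24.627, -70.404, ["astronomy", "telescope"])
print(ET.tostring(item, encoding="unicode"))
```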


Technorati Blog Finder

Cool new (beta) feature on Technorati. I have added a profile for universimmedia, with a few tags such as this one or that one. Ideas for more relevant tags are welcome.


Semapedia combines the physical annotation technology of Semacode with the availability of high quality information using the free encyclopedia Wikipedia.
A combination of 2D barcodes and bottom-up Published Subject Indicators. The bottleneck is that you need a reader on your mobile device, and have to find the physical tags. The next step is certainly to replace the Semacode tags with RFID tags.


Simile Tools

A few months ago I posted about Piggy Bank, without clearly stating that it is but one of the various tools developed by the Simile Project (Semantic Interoperability of Metadata and Information in unLike Environments). All RDF-based stuff, but user-friendly, a qualifier which does not come to mind when looking at some other RDF tools (no names, please). Have a look at the Longwell browser, for example. You can try it online to find your way through W3C specifications. Local installation requires a bit of Java logistics; I haven't tried it yet.
What people say about what they do is also interesting stuff. I had mentioned Stefano's Linotype before. Posts are not frequent, but always thoughtful. See e.g. Data First vs. Structure first.


Vocabulary, taxonomy, thesaurus, ontology ...

Still unclear about the differences between those things? This is a neat and pragmatic introduction. The kind of stuff C*Os should be able to read and make sense of.
It seems that "Taxonomy" is the trendiest word these days, but if you take the time to do a bit of shopping at the Taxonomy Warehouse you will find all kinds of resources belonging to any of those types, and many more: subject headings, classification schemes, indexing schemes, reference models, dictionaries, glossaries ...

Google Sets

Yet another Google tool. A sort of clustering of things; the results can be amazing.
Try {thing, subject, resource} or {Mondeca}.

No hierarchy revisited

Today I've been rediscovering this first exchange with Jack and others, about five years old now, amazed and quite pleased to find that we seem to have kept following the same track, and that I still basically agree with most of what I wrote at the time. Don't know if that is supposed to be good news or bad news, though...
Actually, an interesting notion was introduced then that we have unfortunately somewhat forgotten since: subjects as attractors in conversations. We should consider it again, along with the other mathematical tools linked to quantum superposition we have been discussing lately. Along the same lines, I had an exchange a few days ago with Michel Biezunski, who is currently exploring fiber bundles as a possible tool for subject representation, following on from his recent presentation at Extreme Markup 2005.


Blogos, the essence of your blog

Always quite eager to coin new words, particularly through hybridization, such as hubject or semantopic, I'm just frustrated to have been beaten to that one. So what is the blogos of univers immedia?

Grafting, crossbreeding and other taxonomy breaches

Follow-up of Jack's previous post. Biology long ago set the rules for categorization, trying to capture the elusive but critical notions of taxon and species. Of course, it's always interesting to look at breaches in this system to see how robust it is. Without looking far back in the past at still unclassified fossils, just consider common practices such as tree grafting and cattle crossbreeding. Interestingly enough, various religious traditions have been extremely touchy about them, often forbidding them merely because of the taxonomy breaches they entail, the organization of species being considered the expression of some divine order. See for example this article showing how complex those issues can get in practice when addressed by Torah experts.
Most fears linked to biotechnologies are indeed to be considered at the same level. People are both fascinated and scared by hybrids and GM organisms, as they have always been by monsters and chimaeras of any kind, more for the breaches these make in their representation of the world than for any objective danger they bring about.


Strange fossil defies grouping

I've got to hand it to paleontologists. Go look at the artist's sketch of the creature that is the subject of the linked article, then look at the image of the creature itself. That someone can imagine such a creature from such a fossil is simply amazing. Nevertheless, there exists a creature that does not readily fit current models. The story gives rise to useful points about subject identity.
The trouble is the animal, named Vetustodermis planus, did not possess a set of features, or characters, which placed it clearly within any known group.

I am interpreting the word "characters" to mean characteristics. This creature identity issue is telling in the sense that it suggests open issues for topic maps subject identification processing. How does ISO 13250 address subject identification? Section 5.2.1 "Topic Link Architectural Form" of ISO 13250 suggests this:
The optional subject identity attribute refers to one or more indications ("subject descriptors") of the identity of the subject (the organizing principle) of the topic link.

There exist numerous interpretations of 5.2.1, which are manifest in XTM, TMDM, and TMRM. Is it appropriate to revisit the assumptions inherent in those interpretations?

I am indebted to Patrick Durusau for long and productive discussions centered around the subject identity issues related to topic maps implementations. I'd like to see such discussions in greater depth, in public.


Subject Identity: Now more than ever...

Heard on the radio this morning while driving to work: a story about a woman who discovered that her 11-month-old son was a terrorist. How could this be? An airline ticket agent prevented the child from boarding a flight. That's how. It seems that the given name of the child was found on the list of people to be prevented from boarding flights. It would seem that, in the context of merging topics in a topic map, it's dangerous to rely on names for things as a valid criterion.
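The failure mode is easy to reproduce with two toy records, assuming each has a name and a set of identifiers. Both records and their identifiers are entirely hypothetical. A name-only merge criterion produces a false match, while an identifier-based one keeps the two subjects apart.

```python
# Hypothetical records: same name, two different subjects.
watch_list_entry = {"name": "J. Doe", "identifiers": {"ex:person-1970-0042"}}
passenger = {"name": "J. Doe", "identifiers": {"ex:passenger-2005-0815"}}

def same_by_name(a, b):
    """Merge criterion based on names only."""
    return a["name"] == b["name"]

def same_by_identifier(a, b):
    """Merge criterion based on shared identifiers."""
    return bool(a["identifiers"] & b["identifiers"])

print(same_by_name(watch_list_entry, passenger))        # True: a false match
print(same_by_identifier(watch_list_entry, passenger))  # False
```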


More on Quantum Semantics

Some follow-up of the previous post on Quantum Semantics ... While Justin Leavesley keeps hitting the nail in Semantic SuperPositions, I remembered an interesting presentation by Nikita Ogievetsky about Quantum Topic Maps at the Knowledge Technologies 2002 Conference in Seattle. Googling around for more, I stumbled on a bunch of interesting papers introducing the identification issues raised by Quantum Physics, and some logical or mathematical frameworks able to tackle them.
    From the latter, in the conclusion :
    We have suggested here that quantum objects are vague objects and, further, that how that vagueness is understood depends on the metaphysical package adopted with regard to their individuality. If quantum objects are taken to be individuals, as Lowe considers them, then the vagueness arises because of the existence of relations which do not supervene on monadic properties of the relata; it is because of such relations that we cannot tell which particle is which in an entangled state [...] The alternative package characterises quanta as non-individuals, where this is understood in terms of a lack of identity. [...] There are still some interesting questions to be addressed here, such as how it is that one can refer to objects for which one cannot even say that identity holds.
    Is Information Science, at the dawn of the 21st century, at a breaking point similar to the one crossed by Physics a century ago?


    Deep Web Research

    The link under the title points to an entity that is, on the surface, interesting:
    DeepWebResearch.info is a Subject Tracer™ Information Blog developed and created by the Virtual Private Library™. It is designed to bring together the latest resources and sources on an ongoing basis from the Internet for deep web research which are listed below.

    On the surface, it sounds like they are doing topic mapping of one sort or another. What is more interesting (to me) is how I landed on that site: mostly by way of a search for everything that is knowable about UIMA, IBM's Unstructured Information Management Architecture, which is being announced this week at LinuxWorld to go open source. It is already an Eclipse plugin. One of the search hits suggested that DeepWebResearch might be using UIMA in its technology.

    Whether quantum mechanics, category theory, or plain old propositional logic is at work, some form of information resource harvesting will be necessary. It seems like great news that we can start pulling together a large array of available open source products to assemble ever more powerful harvesting tools.

    Schrödinger's Web

    This is a follow-up of the previous post at Inbetween, providing yet more exciting thoughts about the co-existence of many inconsistent descriptions of the same thing as a native feature of the Semantic Web.
    It strikes me that if inconsistency is fundamental then it should be treated as such, not something to be avoided.
    There follows the idea that maybe we need something like the logic of Quantum Physics for the Semantic Web. In such a framework, subjects would be seen, as quantum objects are, as superpositions of mutually incompatible states, each one with a given probability. Pushing this concept further requires defining the notion of interaction. When you interact with a quantum object through an experiment, you get the very peculiar behavior known as wavefunction collapse, in which the probability distribution changes suddenly in such a way that one particular state is actually "observed". Very long debates and crucial experiments eventually came out rather in favor of the strictly probabilistic interpretation, which some famous founders of Quantum Physics (including Einstein) would not have been happy with.
    I wonder if we are ready to go this far. It seems to me that people will have a hard time with it, but will eventually accept living with the notion of subjects being by essence superpositions of mutually inconsistent states. Going further, and admitting that the observed properties of a subject in a given representation context are probabilistically determined, would certainly prove at least as difficult as it has been in Physics. It took about a century there.

    [Added] Thinking about it, the case certainly differs from Physics. The "semantic collapse" leading to specific representation properties is certainly not completely random, but more likely to depend on some hidden variables of the representation context.
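The two readings can be contrasted in a toy model, assuming a subject is a dict of mutually inconsistent states with weights. The states (borrowed from the planet debate) and the weights are purely illustrative. One observation draws a state at random according to the weights; the other lets a context variable decide, falling back to the most weighted state.

```python
import random

# Toy subject: mutually inconsistent states with illustrative weights.
SUBJECT = {"planet": 0.5, "dwarf planet": 0.3, "Kuiper belt object": 0.2}

def observe_randomly(subject, rng):
    """Strictly probabilistic 'collapse': draw one state according to its weight."""
    states = list(subject)
    return rng.choices(states, weights=[subject[s] for s in states], k=1)[0]

def observe_in_context(subject, context):
    """Hidden-variable reading: the representation context decides the state."""
    preferred = context.get("preferred_state")
    if preferred in subject:
        return preferred
    return max(subject, key=subject.get)  # fall back to the most weighted state

print(observe_randomly(SUBJECT, random.Random(0)))
print(observe_in_context(SUBJECT, {"preferred_state": "dwarf planet"}))
```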

    [2013-08-20] : A typical observation of the subject identified by a URI is an HTTP GET request on this URI. The state of the resource you observe depends on server and client conditions, plus the state of the network, plus content negotiation parameters, caching, etc. All of those are indeed mostly hidden variables for the regular user, but the important point is that most of the time you can't define the state of the resource independently of a specific interaction.


    Perfect or sloppy - RDF, Shirky and Wittgenstein

    Danny Ayers picked this one up. It follows Clay Shirky's post on ontologies that I mentioned earlier. Here's the snippet that, I think, ties the linked subject to univers immedia:
    It essential[ly] hinges on this, do you believe two people have ever in the history of humanity shared the same (i.e identical) concept. Do you believe that concepts exist as perfect entities that we share or infact do we say a concept is shared when we see a number of people using words in a similar enough way. i.e is the world fuzzz, sloppy and uncertain or is it perfect? Are concepts A Priori or derived?

    Quoting further:
    This is the essential error that Wittgenstein points out in his later work. There is no single shared meaning that we all can describe in our different ways. To believe so is to believe that a meaning exists A Priori and that language is just our means of describing it. Instead Wittgenstein turns it on its head and says, meaning is nothing more than the way a word is actually used by people.

    The post then goes on to describe ways in which his comments are reflected in applications of RDF. Danny Ayers adds a comment to the post which says:
    ...the vast majority of software in use today is based on similar conceptual approximations, yet somehow manages to be useful.


    What is a planet ?

    The debate has been around since Copernicus and before ...
    The claim Friday that a 10th planet has been discovered in our solar system has set off a fresh round of debate and international talks aimed at defining the most vexing term in astronomy: the word planet.
    Bottom line : The more you know about a subject, the trickier it is to define.

    [2015-02-23] Just in case you were not aware of it, the follow-up of this story was the IAU's redefinition of "planet" in 2006, by which Pluto is no longer a planet, but a dwarf planet. A white horse is not a horse...


    Fun with Hubjects

    What happens when you are standing in the shower at oh-dark-thirty in the morning and you start thinking about hubjects? It goes like this.

    A hubject is the result of a phonetic accident when two memes, subject and hub, have a translocation error performed on them. This accident is part of what we now call directed evolution. By contrast, the Philadelphia chromosome, known to be behind several cancers, the most prominent being chronic myelogenous leukemia, is a translocation error, not thought to be directed, between chromosomes 9 and 22. That error splices part of 9 with part of 22 into the famous BCR-ABL splice, characteristically referred to as "ph+" (because it was the first cancer gene discovered -- in Philadelphia -- following Watson & Crick). But there remain the other parts, which did not become famous, but which also get together. A dissertation at UCLA showed that object to be benign.

    So, what's that got to do with hubjects? That's what hits when you've got soap in your eyes. In America, we have this whole thing about suburbs. Stay with me here; don't try to guess where this is going. We speak of living in the 'burbs. Could we say that a SubjectProxy (aka: Topic) living in a topic map is, um..., living in the 'bubs? Only if we had a different phonetic accident on the same memes and came up with, brace yourself, sububs.

    You know, we can do that. Language is the longest-running open source project in the entire universe. We can make up names for things till the cows come home, and beyond. At the core, however, the identity of the subject remains the same. Go figure.

    We Are the Web

    Jack pointed me to this, no doubt to temper the mood of my previous post. Not sure we want it, but it is certainly happening.
    What will most surprise us is how dependent we will be on what the Machine knows - about us and about what we want to know. We already find it easier to Google something a second or third time rather than remember it ourselves. The more we teach this megacomputer, the more it will assume responsibility for our knowing. It will become our memory. Then it will become our identity. In 2015 many people, when divorced from the Machine, won't feel like themselves - as if they'd had a lobotomy.


    Seeking sustainable IT (not yet desperately, but still ...)

    Underlying recent debates I've been involved in here and there was a similar question: "Who cares about yet another language specification?" I found myself answering, almost at the same time, "Yes, please" on one side and "No, thanks" on the other. In the latter case, I was struck by a remark from Jim Mason, whose background in standards matters always makes me consider very carefully whatever he brings up.
    Let's face it. We're building these things for ourselves, and they're proliferating because we have fun doing it.
    And any software vendor or consultant around could have added: "... and because we hope to sell more technology built on top of it."
    It made me wonder about the relevance of some of the implicit assumptions which pushed me into Knowledge Engineering quite a while ago, and which I once summed up in a nutshell as: "Knowledge is sustainable information". Information is consumable and volatile: tomorrow it will be at best redundant, at worst obsolete. By contrast, knowledge is supposed to be sustainable and to build up over time. The more knowledge you have already gathered, the more likely you are to transform new information into more knowledge. So I envisioned "Knowledge Technologies" (KT) as another name for "Sustainable IT", in the spirit of the European IST program: "From Information Society to Knowledge Society". All of it was supported by my background, which made me consider Maths as the most impressive accumulation of (sustainable) knowledge ever, after natural languages of course.
    In stark contrast to this cumulative and patient build-up of knowledge we see in Maths and Science, the proliferation of technology for the sake of it is clearly anything but sustainable. And the current trend, in which languages, software and hardware are all tied up in technological packages, leads to the annoying conclusion that languages and specifications are bound to follow the same kind of "product life cycle" logic as their supporting software and hardware. If Knowledge Technologies keep following such a track, they clearly belong more to the technology-for-the-sake-of-it market logic than to sustainable knowledge building.
    A very pernicious trend indeed, for many obvious reasons. Beyond the sheer issues of managing semantic interoperability between current, past and future languages and formats, and the loss of critical knowledge and data embedded in obsolete formats (see e.g. the Pioneer Anomaly), there are the human aspects: working with a language always more or less formats your way of thinking. Dealing with too many different languages is difficult and confusing. Concepts are embedded in their representations, so considering a language product life cycle also means considering a concept life cycle. Not a very pleasant perspective.
    I'm not sure what the requirements for sustainable IT would be. But surely one of them would be to consider concepts the same way as life forms, with their necessary diversity, their fragility, and their need for care.


    Mission 2007

    This may be the most important KM project around. Its objective is the creation of Knowledge Centers in each of India's more than 600,000 villages by 2007.
    From the mission page :
    What is now needed is the launching of a self-propelling, self-replicating and self-sustaining model of ICT for rural regeneration and prosperity [...] The term “knowledge center” was chosen because at the village level there is need for value addition to generic information by converting it into local-specific knowledge [...] The Mission will be top-down in its approach to technological connectivity, but bottom-up in relation to content and knowledge management.
    Watch this space ...


    News Metadata Framework Technical Specification

    News is a critical test-bed for identification and categorization issues. This draft delivered by the International Press Telecommunications Council is really worth a look.

    What is a document?

    Just discovered Kevin Clarke's excellent Worklog.
    "What is a document" is a long and thoughtful entry (the blog is very verbose; many entries read like full papers), certainly relevant to recent debates about "information resource" vs "other resource", and more subtle than the recent resolution of the httpRange-14 issue.


    Ambiguity and imprecision

    I jumped on this thread on the Forthcoming blog, after a similar exchange with TMRM folks on the same subject of ambiguity. It is deep down in the comments, so here is the quote.

    I think there are two ways to consider ambiguity:

    Way 1. Subjects are ill-defined, everything is fuzzy, nothing can be asserted for sure.
    Way 2. Subjects are well defined, but in many ways, as so many views in/from different frameworks/perspectives.

    Way 1 is good for informal and cheerful conversation, like the ones you used to have in forums, and now in blogs, RSS, tagging and the like. But it is IMO pernicious: people either think they agree, though they are talking past each other, or, the other way round, think they disagree because they have no way to figure out whether they actually have different viewpoints or are talking about different things. Billions of examples are available every day.

    Way 2 is what TMRM and hubjects are about: subjects are ambiguous, contradictory, fuzzy, moving targets, OK. But each view on a subject had better be well defined, and the rules for this definition explicit (perspective disclosed). You know your view does not exhaust the subject; you can explore different views, see if their logics are compatible, if they can play nicely with each other or are too orthogonal for that, etc. So you can agree that you agree or disagree on clear grounds, and go to war if needed, but with crystal clear reasons.


    Meme tracker

    Before coining the term "hubject", I made sure it was brand new, at least on the Web. Ten days ago, this query returned only some background noise, mainly due to misspellings of "subject". Today you get a few hits, and judging by the very positive feedback I have had so far, it should spread. Before the public launch, I had pushed the term to Jack, and he sent me this page by Jon Udell on the O'Reilly Network, pointing to the final comment.
    One of the advantages of coining a word is that you can track the progress of its associated meme. Last fall, in collaboration with readers of my blog, I settled on the word screencast. A couple of months ago it drew 200 Google hits, today the number is 60,000. Screencasting may never have the mainstream appeal of podcasting, a word coined not long before that now draws 8 million Google hits. But the meme is spreading and I can't wait to see where it goes next.
    I can't wait either to see where hubjects go next, so I have set this post up as a permanent meme tracker. So far, though, I can easily track the meme's spread myself. The latest to date was freshly posted by Danny Ayers as Stuff of the Day, just after a quick mail "intercourse" triggered by Jack. Before that, I had a fruitful exchange with Patrick Durusau and Steve Newcomb, from which it appears that hubjects could be considered no more and no less than possible technical implementations of the "subject proxies" defined by the TMRM. Steve is even considering introducing hubjects into his Versavant implementation. I also got very positive feedback from Phil Tetlow, coordinator of the W3C SWBPD Software Engineering Task Force, already mentioned in these pages. More to come ...


    The Wheel and the Hub

    Jack was asking for graphics. This is the best I could find to illustrate the metaphor in this latest version of my thoughts about hubjects. I like this image both for its sheer graphical quality, and for the fact that only the hub and spokes are visible. The wheel itself you can only guess.

    Note: The Wheel and the Hub is now published under the Mondeca namespace, including logo and copyright. The new URL is http://www.mondeca.com/content/download/455/3434/file/hubjects.pdf


    Introducing Hubjects

    Hub + Subject = Hubject.

    This paper is a rough first cut of an introduction inspired by Chapter 11 of the Tao-te-King:
    Thirty spokes share the wheel's hub;
    It is the center hole that makes it useful.
    More to come, including examples and graphics.


    httpRange-14 issue "Resolved"

    The W3C Technical Architecture Group announced on Saturday that the httpRange-14 issue was "resolved" (sic). A very weird resolution indeed, which ties the kind of resource an http URI identifies ("information resource" or "any resource") to the status code returned for a GET request (2xx, 303 or 4xx). On the TAG list and on his blog, Jan Algermissen wonders about the impact of such a decision, lists a few example applications, and concludes with a very good question.
    Question: Who is going to mint and maintain all the URIs to talk about dogs?
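    The decision rule of the resolution, as summarized above, fits in a few lines. This is a minimal sketch assuming only the three cases the resolution covers; treating every other status code (301, 302, 5xx, ...) as inconclusive is this sketch's own simplification, not part of the TAG text, and the names are hypothetical.

```python
from enum import Enum


class InferredRange(Enum):
    """What a client may conclude about the thing an http URI identifies."""
    INFORMATION_RESOURCE = "information resource"
    ANY_RESOURCE = "any resource"
    UNKNOWN = "nothing can be concluded"


def infer_range(status: int) -> InferredRange:
    """Map the status code of a GET response to an httpRange-14 conclusion."""
    if 200 <= status < 300:
        # A representation came back, so the URI names an information resource.
        return InferredRange.INFORMATION_RESOURCE
    if status == 303:
        # "See Other": the URI may name anything at all, a dog included.
        return InferredRange.ANY_RESOURCE
    # 4xx errors say nothing about the resource; this sketch also lumps
    # every other code (3xx other than 303, 5xx, ...) into this bucket.
    return InferredRange.UNKNOWN


for code in (200, 303, 404):
    print(code, infer_range(code).value)
```

    Note what the rule does not tell you: a 303 only says the URI *may* name a dog. Someone still has to mint that URI and set up the redirect, which is exactly Jan's question.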


    Ontology Definition Metamodel

    Certainly a major step towards semantic interoperability of Topic Maps, RDF, OWL, UML, Common Logic ... Really worth downloading and reading at length.


    Blank Nodes continued

    Steve Newcomb took the time to write an excellent comment on my previous post. Actually, this comment would have deserved a full post, and I remind my old friend that he has a permanent invitation to appear here as a contributor; I would be very honored if he joined. That said, Steve's viewpoint is much closer to mine than he seems to think.
    SRN : I insist that subjects do have identity, but only within contexts -- within universes of discourse.
    That is not very different from what I wrote when I said that only representations can have identity. Maybe I should have put it slightly differently, and I'm sure Steve will agree with this other way to put it: whatever the subject, it has neither absolute identity, nor absolute definition, nor absolute property of any kind that would be valid in every context. Identity, and all properties bound to this identity, are always conferred through a representation, itself defined inside the context of some representation scheme, and making sense only within the framework of this scheme.
    SRN: But I don't see how it's meaningful to say that a proxy is not a proxy for something in particular. By its very nature, a proxy is always a proxy for something in particular.
    Well, here I think I disagree with Steve, if "something" is to be understood as "some thing". My view has always been that, for all practical purposes, it's the first act of "proxyfication" that brings the subject into existence, as a subject of conversation. But this debate is not really important, and we can proceed from here to the notion of subjects as blank nodes, with or without agreement on the separate existence of subjects. The original point of the debate, set by Alistair and Dan, was how to express in an efficient and meaningful way the fact that two or more representations in different schemes are somehow proxies for the same subject. My point was that this could be captured by something quite similar to an RDF blank node, let's call it a Subject Blank Node, bearing no absolute identity, and whose only properties could be: represented this way here, and that way there.

    The way I see it, a Subject Blank Node would have no logical property per se. It would not be part of any representation scheme, but would provide a hub between various schemes. Such a hub would allow applications able to make sense of several representation schemes, each with its specific structure and logical rules, to aggregate information from different schemes, such as a Topic in a TM application, a class in an OWL ontology, a concept in a SKOS scheme, a category in dmoz, a page in Wikipedia, a term in WordNet, a picture by Van Gogh, or a Chopin Nocturne.
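    Such a hub can be sketched in plain Python, with no RDF library implied. Assumptions: triples are (subject, predicate, object) tuples, each representation points to the blank node through a hypothetical predicate `ex:hubject`, and all URIs are made-up examples. The blank node itself never appears in subject position, so it carries no property of its own; applications find the representations by following the links backwards.

```python
from collections import defaultdict

HUBJECT = "ex:hubject"  # hypothetical predicate: representation -> subject blank node


class BlankNode:
    """A subject blank node: no URI, no properties, equal only to itself."""
    __slots__ = ()


def representations_by_subject(triples):
    """Group the resources that point to the same subject blank node."""
    hubs = defaultdict(set)
    for s, p, o in triples:
        if p == HUBJECT and isinstance(o, BlankNode):
            hubs[o].add(s)
    return hubs


def is_empty_hub(node, triples):
    """True if the blank node never appears in subject position (no property of its own)."""
    return all(s is not node for s, _, _ in triples)


moon = BlankNode()
graph = [
    ("http://example.org/tm#moon", HUBJECT, moon),         # a Topic in a TM application
    ("http://example.org/onto#Moon", HUBJECT, moon),       # a class in an OWL ontology
    ("http://example.org/skos#c42", HUBJECT, moon),        # a concept in a SKOS scheme
    ("http://en.wikipedia.org/wiki/Moon", HUBJECT, moon),  # a page in Wikipedia
]
print(sorted(representations_by_subject(graph)[moon]))
print(is_empty_hub(moon, graph))
```

    The emptiness check is the point of the design: as soon as someone asserts a label or a type directly on the hub, it stops being scheme-neutral and becomes one more representation.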


    Ontology Mapping, Ineffable Subjects and Blank Nodes

    In this thread on the SWAD forum, Alistair Miles and Dan Brickley reactivate an old issue: how do I express that resource X in representation scheme A (e.g. a SKOS concept scheme) and resource Y in representation scheme B (e.g. an OWL ontology) are somehow representations of the same (----)? After suggesting a suboptimal Topic Maps solution, I suddenly came up yesterday with the idea that in RDF, blank nodes could be a killer solution. Indeed, one can use blank nodes to aggregate various representations of whatever, while staying agnostic about what this whatever is. Using blank nodes to represent "ineffable subjects" is cool: since they have no URI, nobody is able to say anything directly about them (asserting a name, a type or any other property). Put this together with the recent debate on the ISO SC34 mailing list about subject locators, and consider this provocative conclusion: RDF blank nodes are better than TM topics at representing subjects since, and this is my latest thought, subjects have no identity; only representations have one. Subjects have no identity, which means no type and no property at all. Resources have identity (URIs), so the best attempt to indicate a subject is to gather various resources in a blank node, as so many fingers pointing towards the moon.
    Remember in the Topic Maps book, I wrote about an empty subject indicator ...


    Maybe ontologies aren't overrated after all...

    When Clay Shirky posted his now-famous "Ontology is Overrated" paper (see also here), a relatively new conversation started. On the sidelines, however, massive creativity continues. For instance, look at this page at del.icio.us, where a small ontology of file types is used to refine the del.icio.us tags.

    A comment on that post points here, to a page that enumerates things going on for, around, or inspired by del.icio.us. One such link points to sid.vicio.us, where a list of OWL ontologies exists, one of which is liberal.owl. Its rdf:about and rdf:resource attributes are links into del.icio.us content.


    Bloom filters

    From Wikipedia:
    The Bloom filter, conceived by Burton H. Bloom, is a space-efficient probabilistic data structure that is used to test whether or not an element is a member of a set. False positives are possible, but false negatives are not. Elements can be added to the set, but not removed (though this can be addressed with a counting filter). The more elements that are added to the set, the larger the probability of false positives.
    The article ends with a good list of papers about applications, including P2P networks. Bloom filters could certainly be applied to the automatic aggregation of Topic Maps too, and should be compared with the Subject Identity Measure methods already mentioned.
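    To see how little machinery is involved, here is a minimal Bloom filter sketch. Assumptions: the sizes `m` and `k` are arbitrary, the k hash functions are derived by salting SHA-256 with an index (a common trick, not Bloom's original construction), and the Topic Maps use case (subject identifiers as strings) is purely illustrative.

```python
import hashlib


class BloomFilter:
    """m-bit array probed by k hash functions; no false negatives."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting SHA-256 with the function index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # False: definitely never added. True: probably added (false positives possible).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical use: pre-screen subject identifiers before an exact merge.
seen = BloomFilter()
for psi in ("http://psi.example.org/moon", "http://psi.example.org/sun"):
    seen.add(psi)
candidates = ["http://psi.example.org/moon", "http://psi.example.org/mars"]
to_check = [psi for psi in candidates if psi in seen]
```

    In an aggregation setting, only the identifiers that pass the filter need the exact (and expensive) comparison. Since there are no false negatives, no true match is ever skipped, and an occasional false positive just costs one extra comparison.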


    Stumble Upon

    Stumbled upon this community tool yesterday. A quite amazing mix of FOAF and bookmark sharing. I think Jack will love it, and become a stumbler too. The link in the sidebar is to my personal stumble node.

    Are "subject locators" bogus?

    Patrick Durusau, in the title post on the Topic Maps ISO/IEC SC34 list, questions the notion of "subject locators" as defined by the TMDM. His point is that through the network you never retrieve a resource, only some representation of it, depending on many things, including the global state of the client-server system at retrieval time, the state of the resource itself, etc. Patrick quotes excerpts from Roy Fielding's dissertation supporting this view:
    The early Web architecture defined URI as document identifiers. Authors were instructed to define identifiers in terms of a document's location on the network. Web protocols could then be used to retrieve that document. However, this definition proved to be unsatisfactory for a number of reasons ...
    I increasingly agree with Patrick that the distinction Topic Maps make between "subject identifiers" and "subject locators", in other words between "subject indicator references" and "resource references", is certainly something to revisit. More on the thread ...
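    What is at stake shows up in the merging rules. Below is a simplified reading of the TMDM condition for two topic items being equal, as I recall it; reification and the rest of the merge procedure are left out, and locators are plain strings. The same URI used as a subject identifier by one topic and as a subject locator by another never triggers a merge, because one topic is about what the page indicates and the other is about the page itself.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Only the three identity properties of a TMDM topic item."""
    item_identifiers: set[str] = field(default_factory=set)
    subject_identifiers: set[str] = field(default_factory=set)
    subject_locators: set[str] = field(default_factory=set)


def should_merge(a: Topic, b: Topic) -> bool:
    """Simplified TMDM equality test for two topic items."""
    return bool(
        a.subject_identifiers & b.subject_identifiers   # same subject indicator
        or a.subject_locators & b.subject_locators      # same information resource as subject
        or a.item_identifiers & b.item_identifiers
        or a.subject_identifiers & b.item_identifiers
        or a.item_identifiers & b.subject_identifiers
    )


page = "http://example.org/hamlet.html"
about_play = Topic(subject_identifiers={page})  # the page *indicates* the subject
about_page = Topic(subject_locators={page})     # the page *is* the subject
print(should_merge(about_play, about_page))      # the two are never merged
```

    If Patrick is right that one never retrieves "the" resource, the second kind of topic rests on shakier ground than the first, which is why the distinction deserves a second look.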


    Planet Identity

    I did a quick Google search of this site to make sure no one had already mentioned this aggregation of blogs.
    Planet Identity is an aggregation of public weblogs related to Identity Management. The opinions expressed in those weblogs and hence this aggregation are those of the original authors.

    If you're not sure whether something is ambiguous, it is

    Found this while browsing the ESW Wiki, on a page called GoodURIs. There is food for thought around that one, on topics such as UniversalNames or DefineYourTerms.


    Semantics@ IVOA Forum

    I mentioned IVOA here a while ago. A few days ago I jumped into a lively debate on the IVOA semantics forum, including this thread, where the ontological status of observation, event, object, etc. is discussed in depth. As already mentioned, the definition, identification and classification of an unbounded quantity of "objects", among the tremendous flow of data carried by all wavelengths of light over distances ranging from a few kilometers to billions of light-years, captured by so many instruments and stored in so many databases ... is one of the widest, oldest and most fascinating challenges in Knowledge Management. Not to mention the task of making sense of all that stuff put together ...


    Semantic Elephant

    "Part 1 : The Elephant is Real". The first of a series by Jeff Pollock of Network Inference about the still unknown, but certainly happening, "Semantic Convergence". To be continued ... comments when I've read the whole series.


    Open ID

    Just discovered this, developed by Danga. Not sure I understand exactly how it's supposed to work.
    An OpenID identity is just a URL. You can have multiple identities in the same way you can have multiple URLs. All OpenID does is provide a way to prove that you own a URL (identity). And it does this without passing around your password, your email address, or anything you don't want it to. There's no profile exchange component at all: your profile is your identity URL, but recipients of your identity can then learn more about you from any public, semantically interesting documents linked thereunder (FOAF, RSS, Atom, vCARD, etc.).
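    As I understand the original OpenID design, the first step is discovery. The site you sign into fetches your identity URL and looks in its HTML head for `link` elements naming your identity server (`openid.server`) and, optionally, a delegated identity (`openid.delegate`). Your server then vouches that you control the URL. Here is a sketch of that discovery step only, using Python's standard `html.parser`. It assumes the page has already been fetched, the URLs are made up, and the signed-assertion exchange that follows is not shown.

```python
from html.parser import HTMLParser


class OpenIDLinkParser(HTMLParser):
    """Collect <link rel="openid.*" href="..."> declarations from an identity page."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        # rel may carry several space-separated values.
        for rel in (attrs.get("rel") or "").split():
            if rel.startswith("openid.") and href:
                self.links.setdefault(rel, href)  # keep the first declaration


def discover(identity_page_html: str) -> dict:
    """Return the openid.* links declared in the HTML served at an identity URL."""
    parser = OpenIDLinkParser()
    parser.feed(identity_page_html)
    parser.close()
    return parser.links


page = """<html><head>
<link rel="openid.server" href="https://openid.example.net/server">
<link rel="openid.delegate" href="https://alice.example.net/">
</head><body>Alice's home page</body></html>"""
print(discover(page))
```

    This is why "an identity is just a URL" works: whoever controls the page at that URL controls which server is allowed to speak for it.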


    To Tag or Not to Tag, That Is the Question

    For those who feel an overdose of hype in recent posts about folksonomies, folksologies, tagging and the like, here is a useful antidote (sort of) by John C. Dvorak in PC Magazine:

    Enter yet another more baffling attempt at tagging. This one is fascinating since it's been gussied up with a new name, and for some unknown reason been given the blessing of a bunch of brain-dead bloggers. This is because a few of the favorite sites that the bloggers love have tacitly approved of the so-called—get this—"folksonomy tags." Oh, a new term! This one is a laugh riot, since there is nothing new here except the new name: Folksonomy. I mean even in HTML there was the "metatag."

    No, no. This is different because, uh well, uh, lemme think. It just is!

    The current fave sites amongst the cognoscenti have adopted the idea of public tags, and a number of influential bloggers have jumped on board pumping up the concept and re-promoting that old rusty saw, "the semantic Web." The semantic Web is a dead duck, let me assure you.

    ... and so on.

    Piggy Bank

    Piggy Bank is an extension to the Firefox web browser that turns it into a “Semantic Web browser”, letting you make use of existing information on the Web in more useful and flexible ways.
    Haven't tried it yet, but it sounds interesting. It is based on so-called folksologies, as explained in the accompanying blog, Stefano's Linotype.


    Identification in Big Science

    In stark contrast to the friendly tagging we've been considering lately, Big Science projects are heading towards carefully engineered categorization and identification frameworks, aiming at the interoperability and sharing of raw data. In astronomy, I've already mentioned here (and if not, I should have long ago) the International Virtual Observatory Alliance, which has defined a standard format for identifiers.
    An IVOA Identifier is a globally unique name for a resource. This name can be used to retrieve a unique description of the resource from an IVOA-compliant registry. This document describes the syntax for IVOA identifiers as well as how they are created. An IVOA identifier has two separable components that can appear in two equivalent formats: an XML-tagged form and a URI-compliant form. The syntax has been defined to encourage global-uniqueness naturally and to maximize the freedom of resource providers to control the character content of an identifier.
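    To make the "two equivalent formats" concrete, here is a sketch that splits a URI-form identifier (`ivo://` followed by an authority ID and a resource key) and re-emits it in an XML-tagged form. The example identifier is illustrative, query strings and fragments are ignored, and the XML element names are my assumption, to be checked against the IVOA document.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape


def parse_ivoa_id(uri: str) -> tuple[str, str]:
    """Split a URI-form IVOA identifier into (authority ID, resource key)."""
    parts = urlsplit(uri)
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an IVOA identifier: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


def to_xml_form(uri: str) -> str:
    """Render the same identifier in an XML-tagged form.

    The element names below are an assumption; check them against the spec.
    """
    authority, key = parse_ivoa_id(uri)
    return (f"<Identifier><AuthorityID>{escape(authority)}</AuthorityID>"
            f"<ResourceKey>{escape(key)}</ResourceKey></Identifier>")


print(parse_ivoa_id("ivo://adil.ncsa/surveys/96.JC.01"))
print(to_xml_form("ivo://adil.ncsa/surveys/96.JC.01"))
```

    Splitting the identifier into an authority part and a local key is what lets each provider control its own key space, while global uniqueness comes from the authority.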
    In the life sciences domain, the Object Management Group proposes the Life Science Identifiers Specification.
    This specification addresses the need for a standardized naming schema for biological entities in the Life Sciences domains, the need for a service assigning unique identifiers complying with such naming schema, and the need for a resolving service that specifies how to retrieve the entities identified by such naming schema from repositories.
    Interestingly, LSID uses the URN scheme, yet specifies resolution mechanisms.
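    The combination of a URN with a resolution recipe can be sketched as follows. Assumptions: the layout is `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`, and, for the resolution step, a DNS SRV lookup on `_lsid._tcp.<authority>`, which is how I understand the specification locates an authority's service. The code only builds that DNS name and makes no network call. The example LSID is for illustration.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> LSID:
    """Parse urn:lsid:<authority>:<namespace>:<object>[:<revision>]."""
    parts = urn.split(":")
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {urn!r}")
    return LSID(*parts[2:])


def resolution_srv_name(lsid: LSID) -> str:
    """DNS SRV name a resolver would look up to locate the authority's service."""
    return f"_lsid._tcp.{lsid.authority}"


lsid = parse_lsid("URN:LSID:ncbi.nlm.nih.gov:GenBank:T48601:2")
print(lsid)
print(resolution_srv_name(lsid))
```

    The name itself stays location-independent. Only the authority part is used to find where to ask, so the data can move without the identifier changing.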

    Social Bookmarking Tools

    As further commentary on my previous post, the link under the title points to a review of social bookmarking tools.

    Because, to paraphrase a pop music lyric from a certain rock and roll band of yesterday, "the Web is old, the Web is new, the Web is all, the Web is you", it seems like we might have to face up to some of these stark realities [n1]. With the introduction of new social software applications such as blogs, wikis, newsfeeds, social networks, and bookmarking tools (the subject of this paper), the claim that Shelley Powers makes in a Burningbird blog entry [1] seems apposite: "This is the user's web now, which means it's my web and I can make the rules." Reinvention is revolution – it brings us always back to beginnings.

    Jon Udell; delicious; language evolution

    Click on the title and you'll hear and watch Jon Udell walk you through using "delicious". It's a lucid exposition of how tags and tagging contribute to language evolution. Tagging, as Bernard suggests in his previous univers immedia post here, appears to have great merit. Jon Udell's page ends with a thoughtful commentary on the relationship between tagging and Steven Pinker's writings on language.

    Consider this: language is the longest-running open source project on this planet.


    Ontology is Overrated: Categories, Links, and Tags

    Clay Shirky has gathered on this page two presentations from recent conferences, making the case for informal, bottom-up, folk-edited categorization of Web resources versus formal ontologies.
    What I take from this very clear and intelligent paper is the notion that, on the open Web, efficient semantics are likely to emerge from free tagging, more efficiently indeed than from predefined, well-thought-out ontologies. This matches my experience of the past few years developing ontologies and constrained topic maps: very efficient in intranet and corporate environments, they are likely to give poor results on the Web at large.


    Source Codes for Subjects

    Conal Tuohy recently pointed the topic maps mailing list to his TM4J-driven website The New Zealand Electronic Text Centre. Roaming about there, I discovered a link to MADS (Metadata Authority Description Schema).
    The Library of Congress' Network Development and MARC Standards Office, with interested experts, has developed the Metadata Authority Description Schema (MADS), an XML schema for an authority element set that may be used to provide metadata about agents (people, organizations), events, and terms (topics, geographics, genres, etc.).

    MADS points to the Source Codes for Subjects (link below the title here). Some of the codes are:
    "Asian American Studies Library subject headings" in A Guide for establishing Asian American core collections. (Berkeley, CA: Asian American Studies Library, University of California, Berkeley)
    AAT: Art & architecture thesaurus (New York, NY: Oxford University Press)
    Autoridades de la Biblioteca Nacional de España (Madrid: Biblioteca Nacional de España)
    AGRIFOREST-sanasto (Helsinki: Helsingin Yliopisto)
    AGROVOC multilingual agricultural thesaurus. (Rome: APIMONDIA)


    Identity as a "Pattern of Information"

    I just received this today from E-VERSE Radio:
    "According to Greek legend, Poseidon's son Theseus sailed to Crete to slay the monster Minotaur. After his triumphant return to Athens, his ship was preserved as a memorial. As the vessel aged, decaying planks were replaced with new ones; eventually, all the original timber was replaced. Philosophers know the story of Theseus's ship as a classic example of the problem of identity. What was the true identity of the ship, the shape or the wood? A more contemporary example may be found in the form of my first car, a 1966 Ford Mustang with a 289-cubic-inch engine and a speedometer that pegged at 140 m.p.h. As a young man high in testosterone but low in self-control, by the time I sold the car 15 years later there was hardly an original part on it. Nevertheless, my '1966' Mustang was now considered a classic, and I netted a tidy profit. Like Theseus's ship, its essence — its Mustangness — was intact. The analogy holds for human identity. The atoms in my brain and body today are not the same ones I had when I was born. Nevertheless, the patterns of information coded in my DNA and in my neural memories are still those of Michael Shermer. The human essence, the soul, is more than a pile of parts — it is a pattern of information." – Michael Shermer
    The idea of a pattern language has been revisited once more by Christopher Alexander in The Luminous Ground, the fourth book of The Nature of Order. The book is described as presenting "a new cosmology that arises from the careful study of architecture and art, and above all from the practice of the arts. It is a cosmology which places the I, our experience of self, as the linking stem that unites each individual with the whole, connecting consciousness and matter," and it suggests that it is human interpretation (a necessarily contextualized process) that provides us with a sense of identity, not anything inherent within the ever-changing cosmos.


    Situation and Identity

    Danny Ayers' blog points to a paper about ontology-driven architectures, which includes the link below the title here. In fact, that paper links to several papers relevant to this blog. I see parallels between the Situation and Identity paper and the Topic Maps Reference Model that Bernard pointed to in the previous post.

    This paper examines the notions of situation and identity on the semantic web. The authors define how identity and situation apply to the semantic web, and present methods for using Inverse Functional Properties to utilise these definitions. We present the notion of a Composite Inverse Functional Property in order to exploit the structure of data for identification, and show how these can be used to apply context specific identification.

    That, to me, sounds like conferred identity, to use the RM's terms. It also reminds me of the "seeing as" post I did here much earlier. I'm not making any value judgements on the content of the paper; rather, I think of it as grist for a lot of group thinking.
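    Here is a small illustration of identification by a combination of properties, with the combination chosen per context. This is a minimal sketch, not the paper's formalism. It treats a composite key as if it were inverse functional (same combined value, same individual). Resources are flat dicts of single-valued properties, and the property names, data and two contexts are invented for illustration.

```python
from collections import defaultdict


def identify(resources: dict, key_props: tuple) -> list:
    """Return groups of resources that share values for all key_props.

    Treating key_props jointly as an inverse functional property means:
    same combined value => same individual.
    """
    groups = defaultdict(set)
    for rid, props in resources.items():
        if all(p in props for p in key_props):
            groups[tuple(props[p] for p in key_props)].add(rid)
    return [ids for ids in groups.values() if len(ids) > 1]


people = {
    "ex:a": {"name": "J. Smith", "born": "1960-01-01", "mbox": "js@example.org"},
    "ex:b": {"name": "J. Smith", "born": "1960-01-01", "mbox": "john@example.net"},
    "ex:c": {"name": "Jane Doe", "mbox": "js@example.org"},
}

# Two hypothetical contexts, each trusting a different composite key.
print(identify(people, ("name", "born")))  # a registry where name + birth date identify
print(identify(people, ("mbox",)))         # a mail system where the mailbox identifies
```

    The same three descriptions yield different identities depending on which key the context trusts, which is what "conferred identity" looks like once it is made operational.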


    New deliverables

    The new version of the Topic Maps Reference Model definitely puts subject identification forward as the core common feature of Topic Maps applications. Its informative Annex A acknowledges:
    The problem of "subject identity" has recently been recognized as more difficult than previously thought by proponents of the Semantic Web.
    This Annex also cites several interesting papers already mentioned here in previous posts. Meanwhile, the SWBP RDFTM Task Force has delivered a rich Survey of Interoperability Proposals, with a fairly exhaustive presentation of the identity issue.

    I've been playing lately with SWRL, wondering how it could be used to express subject identification rules. We have poked around enough lately at the notions of context and identification protocol to consider moving from these general, qualitative considerations to something more operational, like a "Subject Identification Rule Language" able to capture complex identification rules involving declared or computed properties of subjects as well as context elements.


    Categorization before identification

    Over at sciencedaily.com today, I noticed an article on how the brain identifies objects which come into view (visual recognition begins with categorization). I am tossing this out for general consideration. Here's the requisite quote:
    "There are two main processing stages in object recognition: categorization and identification, with identification following categorization," the authors wrote. "Overall, these findings provide important constraints for theories of object recognition."


    Is Identity Contextualized?

    In a recent post on the Ceryle blog I commented on an article by Katharine Mieszkowski of Salon.com called Steal this Bookmark!, which is basically about the emergence of grassroots ontologies online as used on websites like 43 Things. It might seem a bit of a stretch for me to describe 43 Things this way, but I think it's accurate, probably more accurate than much of the use of the word "ontology" within the Knowledge Representation field. 43 Things becomes a map of what people know about a series of subjects as expressed in common language. It's not perfect, the language is muddy, but there's no pretended formality either. As someone says in Mieszkowski's article, "It's more the simplest thing that could possibly work, that shouldn't work, but happens to."

    My blog entry was mostly about grassroots or informal ontologies, which I think will succeed where the "Semantic Web" will fail, not so much in delivering the goods to its paying benefactors (such as DARPA and other large government and corporate entities), but in actually having any real impact on the Web as used by the Rest of Us.

    The Web community has always developed its own technologies, almost in spite of the W3C, and this is only reinforced by the open source movement. There's no particular reason why RSS or Atom or other new Web technologies need to be based on RDF; it's just a convenient (cough) XML graph syntax. GXL or XTM would do just as well, maybe even better. I've long believed people's enthusiasm for RDF is simply a misplaced enthusiasm over graphs. To those long bound to hierarchies and tree structures, graphs seem very cool, like the Che Guevara of mathematical structures. They're more like the way of the world inside and outside our heads. Some people get very passionate about such things. Others like to watch golf on TV, so go figure.

    Anyway, apart from my normal ranting, I closed with a mention of two issues near and dear to my own research: identity and context. I note that univers immedia has a reference to Chris Welty and Nicola Guarino, both of whom have done some excellent work on the former. Patrick Brézillon has for a number of years been leading conferences that focus on the latter, and maintains a web page about his work on context. While I find much of the Semantic Web stuff nearly nonsensical in its almost complete absence of issues of epistemology, identity and context, these guys have been doing some very important work for many years. I don't think we could overestimate the importance of Brézillon's conferences in pushing the issue of context into the mainstream.

    One of the things that I'm pretty convinced of is that everything is contextualized, even identity (I won't quote the first two chapters of the Tao Te Ching). In the Topic Maps model we have always talked about some kind of fixed identity point around which we hang Topic characteristics. If that identity is itself fluid (i.e., contextualized by any of a myriad of factors, human and not), it doesn't exactly break the model, but it makes it a lot more complex, though perhaps more capable of modeling real life. For those of you who speak XTM natively, we'd just need to add an optional <scope> element to the content model of <subjectIdentity>. But there's probably a way to do this without mucking with the XTM syntax.
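    To make "scoped identity" a bit more concrete, here is a minimal sketch, not XTM and not any Topic Maps engine's API: each subject identifier carries a hypothetical scope (a set of themes), and two topics merge only in a context that satisfies the scope of an identifier they share. The PSI URLs are placeholders, and the rule "a scope holds when all its themes are in the context" is an assumption chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    # Hypothetical model: each subject identifier maps to the set of themes
    # (its scope) under which it holds. An empty scope means "always valid".
    identifiers: dict[str, frozenset] = field(default_factory=dict)

    def identifiers_in(self, context: set[str]) -> set[str]:
        """Return the identifiers whose scope is satisfied by the context."""
        return {
            sid
            for sid, scope in self.identifiers.items()
            if not scope or scope <= context
        }


def same_subject(a: Topic, b: Topic, context: set[str]) -> bool:
    """Two topics are merged in a context if they share an identifier valid there."""
    return bool(a.identifiers_in(context) & b.identifiers_in(context))


VENUS = "http://example.org/psi/venus"  # placeholder PSI
morning = Topic("Morning Star", {VENUS: frozenset({"astronomy"}),
                                 "http://example.org/psi/morning-star": frozenset()})
evening = Topic("Evening Star", {VENUS: frozenset({"astronomy"}),
                                 "http://example.org/psi/evening-star": frozenset()})

print(same_subject(morning, evening, {"astronomy"}))  # True
print(same_subject(morning, evening, {"poetry"}))     # False
```

    The Morning Star and the Evening Star are one subject for the astronomer and two for the poet; the identity point stays put, only its validity moves with the context.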

    I've been digging around in the philosophical/epistemological literature (e.g., [1], [2/3], [4/5], [6]), trying to find that Copernicus-in-the-bathtub experience (no, not that one, the other one) on how identity and context mesh. It sometimes seems that the more I dig, the more complicated the issue becomes, and unfortunately my research domain isn't theoretically in philosophy (at least that's what my advisors keep advising me; they hope I'll actually finish my dissertation one day). The pile of books keeps getting higher.
    The Penumbra said to the Umbra, "At one moment you move: at another you are at rest. At one moment you sit down: at another you get up. Why this instability of purpose?"

    "Perhaps I depend," replied the Umbra, "upon something which causes me to do as I do; and perhaps that something depends in turn upon something else which causes it to do as it does. Or perhaps my dependence is like (the unconscious movements) of a snake's scales or of a cicada's wings. How can I tell why I do one thing, or why I do not do another?"
    -- Chuang Tse, (trans. Lin Yutang)
    which kinda sums up my own experience lately...


    The Concept of Subject in a Semiotic Light

    The linked paper is by Jens-Erik Mai, whose publications can be found here. Personally, I recommend studying his dissertation. At various times in the past, I have connected a C.S. Peirce scholar, Mary Keeler, to Steven Newcomb, one of the founders of the topic maps paradigm (among other important contributions). What we get from that coupling is the realization that there is, indeed, a semiotic aspect to the nature of subjects. From the linked paper:
    One of the key functions of library and information services is to provide access to information based on users' requests for knowledge. Knowledge can be stored in a wide range of information bearing objects such as text, image, sound, multimedia, and as technology develops more people gain access to the objects, through different media. We will here analyze the processes and problems associated with determining the subject matter of an information bearing object.


    A Formal Ontology of Properties

    Christopher Welty just sent me the pointer to an excellent paper he presented with Nicola Guarino at the 12th International Conference on Knowledge Engineering and Knowledge Management EKAW-2000.

    Clean and clear introduction to difficult issues, very formal but at the same time providing a very practical framework for sound ontology engineering.
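    The practical side of the paper is that its metaproperties can be checked against a taxonomy. A minimal sketch of one such check, assuming the rule as I understand it: an anti-rigid property (Student) should never subsume a rigid one (Person). The property names and the list-of-pairs taxonomy format are illustrative, not taken from the paper, and only the stated pairs are checked, not their transitive closure.

```python
from enum import Enum


class Rigidity(Enum):
    RIGID = "+R"       # essential to all of its instances, e.g. Person
    NON_RIGID = "-R"   # not essential to some of its instances
    ANTI_RIGID = "~R"  # not essential to any of its instances, e.g. Student


def rigidity_violations(rigidity, subsumptions):
    """Return (child, parent) pairs where an anti-rigid parent subsumes a rigid child."""
    return [
        (child, parent)
        for child, parent in subsumptions
        if rigidity[parent] is Rigidity.ANTI_RIGID and rigidity[child] is Rigidity.RIGID
    ]


properties = {"Person": Rigidity.RIGID, "Animal": Rigidity.RIGID,
              "Student": Rigidity.ANTI_RIGID}
taxonomy = [("Student", "Person"), ("Person", "Animal"), ("Person", "Student")]

print(rigidity_violations(properties, taxonomy))  # [('Person', 'Student')]
```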


    Identity, Context, and everything...

    The link is to a lively thread, jumping into the middle, where Peter P. Jones opens the floodgates on a discussion about context. No point in doing all the obligatory quotes here. Just go read the thread. But, hey, two quotes do stand out, following Peter's thesis on context as geometry.

    Dennis E. Hamilton:
    There's a book called "Metaphor and Reality" by Phillip Wheelwright that has a tangential bearing on this topic. The phrase that sticks in my mind is this: "machines have contexts, people have perspectives."
    and Murray Altheim:
    I'm not sure if you're aware, but there's a whole subdomain within the Knowledge Representation/Structures community devoted to issues of context, which seems to be headed up by Patrick Brezillon.

    Cybernetics and Conversation

    The previous post from Jack made me revisit old tracks about subject identity dynamically emerging and changing through conversation. In the last section of my Published Subject Indicators chapter in the Topic Maps Book, I wrote a few years ago:
    The best PSI is the one that is most likely to change its content because it is maintained at the core of the community questioning the subject, and most subjects are moving targets.
    It seems that Wikipedia pages are exactly that: places where living subjects are continually emerging through conversation. Googling conversation + identity, I stumbled on the title page written back in '96 by Paul Pangaro:
    The piece attempts to capture, in every-day language, the breadth of Conversation Theory as purveyed by Gordon Pask. Although it was not explicit in the publication, a sub-title could be, "Conversation Theory in Two Pages."
    The opening line reads: "Without conversation, there is nothing (no thing)"


    Wikipedia URLs as Subject Codes

    The link is to David Megginson's blog. This struck me as terribly interesting.
    Over in my aviation weblog, I find myself more and more linking to Wikipedia whenever I’m discussing a concept, person, place, or anything else that doesn’t have its own, canonical home page. If, as I suspect, lots of other bloggers are doing the same, then links to Wikipedia articles may soon be the blogsphere’s answer to subject codes.
    The idea follows something similar from James Tauber, who points to a tagging scheme from Technorati. In the end, it seems that subject identity lies in the realm of consensus or agreement; Wikipedia appears popular enough that its URLs might serve at least one important aspect of the subject identity issue.
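    If Wikipedia links are to work as subject codes, different spellings of the same article URL have to collapse to one key. A minimal sketch under stated assumptions: the "lang:Title" key format is made up for illustration, the first letter of a title is treated as case-insensitive (as on default MediaWiki setups), and redirects between articles are not resolved.

```python
from collections import defaultdict
from typing import Optional
from urllib.parse import unquote, urlparse


def wikipedia_subject_key(url: str) -> Optional[str]:
    """Reduce a Wikipedia article URL to a 'lang:Title' key, or None."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if not host.endswith(".wikipedia.org") or not parts.path.startswith("/wiki/"):
        return None
    lang = host.split(".")[0]  # "en" from "en.wikipedia.org"
    # urlparse already drops the #fragment; decode %20 etc. and use underscores.
    title = unquote(parts.path[len("/wiki/"):]).replace(" ", "_")
    # Assumption: first letter is case-insensitive, as on default MediaWiki setups.
    title = title[:1].upper() + title[1:]
    return f"{lang}:{title}"


def group_by_subject(links):
    """Map each Wikipedia subject key to the set of posts linking to it."""
    subjects = defaultdict(set)
    for post, urls in links.items():
        for url in urls:
            key = wikipedia_subject_key(url)
            if key is not None:
                subjects[key].add(post)
    return dict(subjects)


links = {
    "post-1": ["http://en.wikipedia.org/wiki/Cessna_172"],
    "post-2": ["https://en.wikipedia.org/wiki/Cessna%20172#History",
               "http://example.org/"],
}
print(sorted(group_by_subject(links)["en:Cessna_172"]))  # ['post-1', 'post-2']
```

    Two posts linking to differently spelled URLs end up on the same subject, which is the whole point of a shared subject code.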


    Situation and Identity

    Another paper mined from the already quoted page about the Semantic Web Software Engineering Task Force. Under the general title comes a very technical subtitle,
    "A Generalisation of Inverse Functional Properties", which does not speak much to people unaware of the arcane details of OWL terminology. "Inverse Functional Properties" would be better named "Identifying Properties", since, according to OWL semantics, two individuals sharing the same value for such a property are inferred to be the same individual. The abstract says:
    This paper examines the notions of situation and identity on the semantic web. The authors define how identity and situation apply to the semantic web, and present methods for using Inverse Functional Properties to utilise these definitions. We present the notion of a Composite Inverse Functional Property in order to exploit the structure of data for identification, and show how these can be used to apply context specific identification.
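    The IFP rule, and the composite variant the abstract mentions, can be sketched as "smushing" with a union-find. This is my own illustration, not the paper's algorithm: foaf:mbox is declared inverse functional in FOAF, but treating ("name", "birthDate") as a composite identifying key is a hypothetical choice, and the sketch only merges nodes without drawing any further OWL inferences.

```python
def smush(resources, keys):
    """Group resource IDs that share the full value tuple of any identifying key.

    A one-property key behaves like an owl:InverseFunctionalProperty;
    a multi-property key is a sketch of a composite one.
    """
    parent = {r: r for r in resources}

    def find(r):
        # Union-find root lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    for key in keys:
        seen = {}  # value tuple -> first resource carrying it
        for r, props in resources.items():
            if all(p in props for p in key):
                value = tuple(props[p] for p in key)
                if value in seen:
                    parent[find(r)] = find(seen[value])
                else:
                    seen[value] = r

    groups = {}
    for r in resources:
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


people = {
    "_:a": {"foaf:mbox": "mailto:bob@example.org"},
    "_:b": {"foaf:mbox": "mailto:bob@example.org", "name": "Bob", "birthDate": "1970-01-01"},
    "_:c": {"name": "Bob", "birthDate": "1970-01-01"},
    "_:d": {"name": "Bob", "birthDate": "1985-06-15"},
}
keys = [("foaf:mbox",), ("name", "birthDate")]

print(sorted(sorted(g) for g in smush(people, keys)))
# [['_:a', '_:b', '_:c'], ['_:d']]
```

    The mailbox alone links _:a and _:b; only the combination of name and birth date brings in _:c, while the other Bob stays apart. OWL 2 later added owl:hasKey for much the same need.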

    Context-specific identification is really the concept to highlight here.
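    One way to read "context-specific identification": the set of properties that counts as identifying depends on the context in which the question is asked. A minimal sketch with hypothetical contexts and rules, echoing the ISBN case: in a catalogue every copy of an edition is one subject, in an inventory each physical copy is its own. The rule table is an assumption for illustration, not the paper's formalism.

```python
from collections.abc import Mapping

# Hypothetical per-context identification rules: which properties, taken
# together, identify a resource in that context.
IDENTIFYING_KEYS = {
    "catalogue": ("isbn",),             # all copies of an edition are one subject
    "inventory": ("isbn", "copy_no"),   # each physical copy is its own subject
}


def same_resource(a: Mapping[str, str], b: Mapping[str, str], context: str) -> bool:
    """Decide co-identity of two descriptions under the rule for `context`."""
    key = IDENTIFYING_KEYS[context]
    if not all(p in a and p in b for p in key):
        return False  # without the full key, this rule cannot identify them
    return all(a[p] == b[p] for p in key)


copy1 = {"isbn": "foo", "copy_no": "1"}  # "foo" is a placeholder ISBN
copy2 = {"isbn": "foo", "copy_no": "2"}

print(same_resource(copy1, copy2, "catalogue"))  # True
print(same_resource(copy1, copy2, "inventory"))  # False
```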

    Object Co-identification on the Semantic Web

    Phil Tetlow and Jeff Z. Pan have published, for the new SWBPD Task Force on "Semantic Web and Software Engineering", a page called "Relevant Topics, Publications and Validated Ideas". Among those, the title paper reviews identification issues and relevant technologies in detail: names, keys, and, more interestingly, heuristic methods based on a probabilistic approach.
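    For a feel of what heuristic co-identification looks like, here is a crude stand-in, not the paper's method: a weighted string-similarity score with a threshold, using Python's difflib, rather than a proper probabilistic model such as Fellegi-Sunter record linkage. The field weights, the 0.75 threshold and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical field weights; a real system would estimate these from data.
WEIGHTS = {"name": 0.6, "affiliation": 0.4}


def similarity(a, b):
    """Weighted mean of per-field string similarity over the fields both records share."""
    total = weight_sum = 0.0
    for field, weight in WEIGHTS.items():
        if field in a and field in b:
            ratio = SequenceMatcher(None, a[field].lower(), b[field].lower()).ratio()
            total += weight * ratio
            weight_sum += weight
    return total / weight_sum if weight_sum else 0.0


def co_identify(a, b, threshold=0.75):
    """Treat two records as the same object when their score clears the threshold."""
    return similarity(a, b) >= threshold


r1 = {"name": "Jane Q. Doe", "affiliation": "Example University"}
r2 = {"name": "Jane Doe", "affiliation": "Example Univ."}
r3 = {"name": "John Smith", "affiliation": "Other College"}

print(co_identify(r1, r2))  # True
print(co_identify(r1, r3))  # False
```

    The answer is a judgment, not a proof: move the threshold and the same two records may or may not be the same object.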

    How many "person" concepts in the Semantic Web?

    Need to define "Person" in your local ontology? Unless you want to add yet another identifier for this concept as your contribution to the Semantic Mess, go ask the Swoogle Ontology Dictionary and take your pick. Unsurprisingly, the top ranks go to the classes from FOAF http://xmlns.com/foaf/0.1/Person and WordNet http://xmlns.com/wordnet/1.6/Person.
    But there are about 400 other SW resources, classes and properties using the same name. Food for thought ... Why so many already? What about re-usability? What about aggregation of data and federation of knowledge? (see previous post) How many of those resources declare equivalence with other ones? And how many are actually used?
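    A first step toward answering "how many of those resources declare equivalence with other ones?" is to group class URIs by local name and look for owl:equivalentClass links. A minimal sketch over an in-memory list of (subject, predicate, object) triples; the example.org URIs are hypothetical stand-ins for local ontologies, and a real survey would need the actual data rather than a hard-coded list.

```python
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"


def local_name(uri):
    """Text after the last '#' or '/', e.g. 'Person'."""
    return uri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def unlinked_homonyms(name, classes, triples):
    """Classes with the given local name that take part in no owl:equivalentClass triple."""
    linked = set()
    for s, p, o in triples:
        if p == OWL_EQUIVALENT_CLASS:
            linked.update((s, o))
    return [c for c in classes if local_name(c) == name and c not in linked]


classes = [
    "http://xmlns.com/foaf/0.1/Person",
    "http://xmlns.com/wordnet/1.6/Person",
    "http://example.org/onto#Person",    # hypothetical local ontologies
    "http://example.org/other/Person",
    "http://example.org/onto#Place",
]
triples = [("http://example.org/onto#Person", OWL_EQUIVALENT_CLASS,
            "http://xmlns.com/foaf/0.1/Person")]

print(unlinked_homonyms("Person", classes, triples))
# ['http://xmlns.com/wordnet/1.6/Person', 'http://example.org/other/Person']
```

    Every URI left in that list is a "Person" nobody has bothered to connect to the others: one more island in the Semantic Mess.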


    Semantic Web Ontologies: What Works and What Doesn't

    Google's director of search quality discusses challenges of automation, knowledge, spam, and even politics. One of a series of interesting articles from AlwaysOn, excerpted from SDForum's Semantic Technologies Seminar, cohosted by AlwaysOn, TopQuadrant, and Enterprise Architect.

    Glossary of terms relating to thesauri and other forms of structured vocabulary for information retrieval

    Confusion often arises when different people use terms to mean different things. This list is based on definitions drawn up by a group of four consultants who specialise in the development and use of thesauri and other forms of structured vocabulary for information retrieval, Stella Dextre Clarke, Alan Gilchrist, Ron Davies and Leonard Will. The authors do not claim that these definitions are "correct" and that other meanings are "wrong", but recommend these definitions as being a consistent and well-defined set which will aid communication by encouraging everyone to use the same words with the same meaning.


    John Sechrest, on the bluoxen.org mailing list, had this to say about LID:

    a) It is distributed and gives personal control over access
    b) It is available now
    This should be compared to the Identity Commons I-Names that I mentioned here.


    Identity Management Webcast

    This is an online webcast. I don't know that it costs anything, except all the personal information they can wring out of you to sign up for it.

    In this webcast you will learn how identity management is helping organizations to meet their key business initiatives by driving new online revenue opportunities, enabling a company to securely extend business beyond its four walls, and helping corporations to mitigate risk while complying with new regulations such as Sarbanes-Oxley.

    Topics to be Covered:

    > Identity Management defined and the stages of a successful deployment
    > The business drivers and benefits of Identity Management
    > Provisioning and Access Management technologies
    > Selecting the right Identity Management vendor
    > Do's and Don'ts - Deploying an Identity Management solution

    Latest Topic Maps Reference Model

    The latest opus by Steven R. Newcomb and Patrick Durusau, dated 2004-11-07, is found at the link. The document includes definitions of subject identity properties, which, in some sense, are the subject of this weblog.