Two cents of (natural) intelligence

Several months ago, my previous attempt to speak here about artificial intelligence, wondering if computers could participate in the invention of language, met a total lack of feedback (it's not too late for second thoughts, dear reader). I found it quite frustrating, hence another attempt to venture on this slippery debate ground.
In the follow-up to the previous post on facets, +Emeka Okoye makes strong points. When I wonder how much intelligence we want to delegate to machines, and for which tasks, the answer comes as a clear declaration of intention.
We are not delegating "intelligence" to machines rather we are delegating "tasks" ... We can have a master-slave relationship with machines ... We, humans, must be in control.
I appreciate the cautious quote marks in the above. But can it be that simple? Or is it just wishful thinking, as +Gideon Rosenblatt warns us in a post entitled Artificial Intelligence as a Force of Nature? The connected machines ecosystem, distributed agents, neural networks and the like are likely to evolve into systems (whether to call them intelligent is a moot point) which might soon escape, or have already escaped if we believe some experts on this topic, the initial purpose and tasks assigned by their human creators, and explore totally new and unexpected paths. This hypothesis, not completely new, is backed here by a comparison with the evolution of life, of which the emergent ambient intelligence would be a natural (in all meanings of the term) follow-up.

But the evolution of technologies, from primitive pots, knives and looms up to our sophisticated information systems, is difficult to compare to the evolution of life and intelligence. The latter is very slow, driven by species selection on time scales of millions of years, spanning thousands of generations. Behind each success we witness, each species whose perfect fit to its environment we admire, lie zillions of forgotten failures eliminated by the pitiless struggle for life. Nothing supports the hypothesis of an original design and intention behind such stories.
It's often said, as in this recent Tech Insider article, that comparing natural and artificial intelligence is like comparing birds to planes. I agree, but the article misses an important argument. Birds can fly, but at no moment did Mother Nature sit down at her engineering desk and decide to design animals able to fly. They just happened to evolve that way over millions of years from awkward feathered dinosaurs, jumping and flying better and better, until we now have eagles, terns and falcons. Planes, on the contrary, were designed from the beginning with the purpose of flying, and in barely half a century they were able to fly higher and faster than the above natural champions of flight.

To make it short, technology evolves based on purpose and design; life (nature) has neither predefined purpose nor design. Intelligence is no exception. Natural intelligence (ants, dolphins, you and me) is a by-product of evolution, like wings and flight. We were not designed to be intelligent; we just happened to be so, as birds happened to fly. But computers were built with a purpose, even if they now behave beyond their original design and purpose, like many other technologies, because the world is complex, open and interconnected.

Let's make a different hypothesis here. Distributed intelligent agents could escape the original purpose and design of their human creators, maybe. But in such a case, they are not likely to emerge as the single super intelligence some hope for and others fear. Rather, like the prebiotic soup more than three billion years ago, their spontaneous evolution would probably follow the convoluted and haphazard paths of natural evolution, the struggle for survival and the rest. A recipe for success over billions of years, maybe, but not for tomorrow morning.


Rage against the mobile

The conversation around the previous post about facets led me to investigate a bit more about mobile, and what it means for the web of text. This is something I'd never really considered so far, and thanks to +Aaron Bradley for drawing my attention to it. Bear in mind I'm just an old baby-boomer who has never adopted mobile devices, touchscreens drive me crazy, and I still wonder how people can write anything beyond a two-word sentence on such devices. To be honest I do have a mobile phone, but it is as dumb as can be (see below). It's a nice, light, small object, feeling a bit like a pebble in my pocket, but I actually barely use it (by today's standards), just for quick calls and messages. Most of the time I don't even carry it along with me, let alone check messages, to the despair of my family, friends and former colleagues. But they eventually got used to it.

To make it short, I do not belong to the mobile generation, and my experience of the Web has been from the beginning, is, and is bound to remain a desk activity, even if the desktop has become a laptop over the years. I'm happy with my keyboard and full screen, so why should I change? And when the desk is closed, I'm glad to be offline and unreachable. I wish and hope things can stay that way as long as I'm able to read, think and write.

With such a long disclaimer, what am I entitled to say about mobile? Only to quote what others who seem to know better have already written. In this article, among others, I read about the so-called mobile tipping point, this clear and quite depressing account of the consequences of mobile access on Web content.
The prospect for people who like to read and browse and sample human knowledge, frankly, is of a more precipitous, depressing decline into a black-and-white world without nuance [...] The smaller screens and less nimble navigation on phones lend themselves to consuming directory, video, graphic and podcast content more easily than full sentences. If the text goes much beyond one sentence, it is likely to go unread just because it looks harder to read than the next slice of information on the screen. [...] Visitors who access information via a mobile device don’t stay on sites as long as they do when using a desktop computer. So if you’re counting on people using their smartphones or tablets to take the same deep reading dive into the wonders of your printed or normal Web page messages, you’re probably out of luck.
Given the frantic efforts of Web content providers to keep their audience captive, all is ready for a demagogic vicious circle of simplification. Short sentences, more and more black-and-white so-called facts. If this is where the Web is heading, count me out. I won't write for mobile any more than I use mobile to read and write.

I still have hope, though, looking at this blog's analytics. Over 80% of the traffic seems to still come from regular (non-mobile) browsers and OS. But I guess many of you visitors also have a mobile (smart)phone you otherwise use. I wonder if and how you manage to balance which device you use for which purpose. Are you smart enough to use mobile for apps, and switch to proper desk screens to take the time to read (and write)? I'm curious to know.


In praise of facets

Follow-up of the previous post, and more on ways to escape the tyranny of entities in search results. In the quick exchange with Aldo Gangemi in the comments of this post, facets were suggested. I won't argue further with Aldo about whether facets at BabelNet are types or topics, because he would win in the end, and such a technical argument would lead us astray, far from the main point I would like to make today. You might be unsure about what facets and particularly faceted search mean, but you have certainly used them many times when searching e-commerce sites, to filter hundreds of laptop models by price, brand, screen size, memory size etc. Libraries, enterprise portals, and many more use faceted search; the example below is the search interface of Europeana for "impressionism", the results being filtered by two facets, media type "image" and providing country "Netherlands".

Faceted search is a very intuitive way to search items in a database. Using faceted search, users create at will their own algorithm of filtering, selection and possibly ranking. If you compare it with the usual general search engine results, two major advantages appear. The search is multidimensional, and the algorithm is transparent to the user. The system does not apply fancy, smart but opaque algorithms based on guesses of what the user is looking for. It provides an interface where the user's natural intelligence can be put into action. In short, faceted search provides a good collaborative environment where artificial and human intelligence work together, the former at the service of the latter.
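For readers who like to see the mechanics, here is a toy sketch of what happens behind such an interface. The catalogue, field names and values are invented for illustration; real systems precompute facet counts in an index, but the logic is the same: the user, not an opaque ranking algorithm, composes the filter.

```python
# A toy laptop catalogue; all records are made up for illustration.
laptops = [
    {"brand": "A", "price": 499,  "screen": 13, "ram": 8},
    {"brand": "B", "price": 899,  "screen": 15, "ram": 16},
    {"brand": "A", "price": 1299, "screen": 15, "ram": 16},
    {"brand": "C", "price": 699,  "screen": 13, "ram": 8},
]

def facet_counts(items, field):
    """Count how many items fall under each value of a facet."""
    counts = {}
    for item in items:
        counts[item[field]] = counts.get(item[field], 0) + 1
    return counts

def apply_facets(items, selections):
    """Keep only the items matching every selected facet value."""
    return [i for i in items
            if all(i[f] == v for f, v in selections.items())]

# The user picks the dimensions and values; the filtering is transparent.
chosen = {"brand": "A", "screen": 15}
results = apply_facets(laptops, chosen)
print(facet_counts(laptops, "brand"))  # {'A': 2, 'B': 1, 'C': 1}
print(len(results))                    # 1
```

The counts shown next to each facet value are exactly what interfaces like Europeana display, so the user always knows how many results each refinement will leave.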

Given the above, one can wonder why general search engines such as Google do not offer faceted search facilities over their results, instead of a unidimensional list of ranked results. A technical answer coming to mind is that such engines do not search items in a collection of objects whose semantic descriptions are stored in a database, but resources indexed by keywords. That used to be true, but the argument does not seem to hold anymore in the current state of affairs. The Knowledge Graph, however it's implemented, is a database where things have declared semantic types and properties which could be used for faceted search. It would be a good way to see the types and properties defined by the schema.org vocabulary put explicitly into action as facets (Creative Work, Person, Place, Event, Business, Intangible ...).

I cannot imagine that Google et al. have never thought about this. There are certainly technical hurdles, but I can't imagine they could not be solved. So I would be curious to hear what they have to say, given that the added value to the search experience would be tremendous. Above all, it would give back to the user the power to define her own filtering on results, and reinstate the habit of doing so, instead of the reductionist Q&A dialogue which in the long run leads to pernicious intellectual laziness, one-track thinking, and jumping to conclusions without further checking. Our world is more and more complex, and offering simplified, unidimensional answers (presented as facts) to any question does not help to cope with complexity. Current events show us too many examples of oversimplifications and where they lead.

I think of any of my queries to a search engine as a beam of light sent through the night of my ignorance, where possible answers are hiding like so many complex multi-faceted diamonds. I don't want any one of them, however brilliant and wonderful, to make me blind to the point of missing all the rest. Every faceted answer should reflect back a new and unexpected part of the spectrum, without exhausting the question we should always keep alight.


Search is not only for entities

The Knowledge Graph is a great achievement, but its systematic use at the top of search results is sometimes counter-productive. Knowledge Graph nodes are mostly named entities (individuals, particulars) such as people, places, works (movies, books, music tracks), products ... and rarely universals (concepts, topics, common names). And if an ambiguous search phrase can refer to either particular entities or universals, the former always seem to float to the top with their fancy Knowledge Graph display, while relevant results about universals are kicked down. The assumption underlying this default behavior is that people search mostly for particular entities (things), not for information about some universal (topic). The hijacking of common names as brand names, which we have already pointed out here in the past, adds to the issue, along with the growing number of work titles using common names. Add to this the magic of the Knowledge Graph knowing entities by various names in different languages, and you end up with examples like the following.

For a recent post I searched about the Theory of Everything. If instead of going straight to the Wikipedia article I ask Google, here is what I get.

I was searching for information about a theory in physics, and I got all about a movie which happens to have taken the name of this theory as its title. And since my browser's default language is French, the Knowledge Graph is kind enough to present me the movie under its French adaptation title "Une merveilleuse histoire du temps", which, as you can imagine even if you don't speak much French, is anything but a translation of "Theory of Everything". The silver lining is that if I search for "Théorie du tout" in French, I don't have the same problem, since the movie is not known in French under this title, which would be the correct translation of the original one. The first result for "Théorie du tout" is the Wikipedia article on this topic, as expected.
You can play the same funny game with "Gravity", "Frenzy" and many more. Given the limited supply of common names, and the exponential growth of named entities in the Knowledge Graph, all tapping into the commons for their names and titles, such ambiguities are likely to become the rule rather than the exception. Search engines should provide a simple way to opt out of entities, so that I could ask "Dear Google, give me resources about the topic called gravity, and I don't care about any individual entity with gravity in its name." And yes, Google, you can do it, I'm sure; just follow the example of BabelNet, where you can sort results by entities, concepts, music, media etc. A bit of typing goes a long way ...
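To make the wished-for opt-out concrete, here is a minimal sketch of partitioning results into named entities and concepts, the way BabelNet lets you sort them. The result records and their "kind" labels are of course invented for illustration; the hard part in a real engine is assigning that label, not the filtering itself.

```python
# Hypothetical search results, each tagged as a named entity or a concept.
results = [
    {"title": "The Theory of Everything (film)", "kind": "entity"},
    {"title": "Theory of everything (physics)",  "kind": "concept"},
    {"title": "Gravity (film)",                  "kind": "entity"},
    {"title": "Gravity (physics)",               "kind": "concept"},
]

def only_topics(items):
    """'I don't care about any individual entity': keep concepts only."""
    return [r for r in items if r["kind"] == "concept"]

topics = only_topics(results)
print([r["title"] for r in topics])
# ['Theory of everything (physics)', 'Gravity (physics)']
```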

Why do I write, really?

Teodora Petkova strikes again with her new and tiny (her word) Web Writing Guide. Her savvy recommendations on the Whys and Whats of writing on/for the Web made me wonder if I have ever applied any single one of them, in particular in this blog, which has been for years the main place I've been writing and publishing. The rest of my publication track consists of a handful of conference or journal papers and a chapter in a collaborative book, some of those still published online, but not really written "for the Web". Not to mention hundreds of messages to various community lists and comments on the social Web, but does that really count as Web writing?
It might be too late and pointless anyway to consider those recommendations, since I have no tangible reason to keep on writing at all. Retired from business for half a year, no longer participating in the discussions of various communities, not even following them, I have nothing to sell or even to give away here. I might as well forever hold my peace, instead of indulging in more wordy selfies. Nevertheless, I'll make the exercise of going through some of Teodora's recommendations, to see if I ever met them. Just for the fun of it.
  • Write for people
Of course, who else? But I've never thought of anyone in particular as the target of what I write here, although I know I write better when I think about a potential reader. Somehow, each post on this blog could (should) be read as a personal letter to some unknown reader. To make it short, I have no market, no target audience. I know I have a handful of more or less faithful followers, and hope the few serendipitous visitors will bring home some food for thought. 
  • Write for machines
Believe it or not, I really don't give a damn about that one. I've been a so-called Semantic Web evangelist because I liked the ideas behind it and the conceptual debates it triggered (not to mention I was also paid for it), but I never applied its technology to this blog. I even did everything to blur the radar of search engines by changing both URI and title several times. No semantic markup either, beyond a few (rather random) tags. I like the idea of these pages being as easy to reach as the places I love in my mountains. Not unreachable, but not much advertised either, with paths not difficult to follow, but not obvious to find either. And actually, since I'm not able to define or name what I am about, I'd rather search engines ignore these pages than index them under some silly topic.
  • Write for joy
This is certainly the only recommendation I follow. Nothing to add.

But the Whys are not the main point of difficulty. Regarding the Whats, I must admit I am completely off track.
  • What is it that you really want to say and cannot help but share?
I'm afraid most of the time I don't know before I've finished writing it.
  • What is it that your audience needs?
As said above, I've no audience, and therefore cannot possibly know what it needs. 

If I try to apply the following: The intersection of the answers to these questions is the answer to “What to write?” Well, I won't say this intersection is empty, but it looks rather undecidable.

Sorry, Teodora, but your recommendations are either useless to me, or they lead to the conclusion that I should not write at all before answering the two questions above. Unless "write for joy" is enough of an excuse to keep writing.

When I was a child, half a century ago, my school teacher (who happened also to be my father; teachers are scarce in village schools) was an adept of the texte libre (free text). This is the writing exercise I still prefer. Following the Freinet pedagogy, the original free production was selected and amended by the group and eventually published in the class journal. The final text was a collective production based on an individual original idea. Doesn't that sound quite Webby, back in the 1950s, in remote French village schools?


Backtracking signs

This image has been for some years now my avatar in various places on the Web. I chose it obviously because it's a nice image taken in my dear mountains, but also as an illustration of what Quine called the inscrutability of reference.
This image is a sign; in French we would say elle nous fait signe, it beckons to us. To each of you, depending on your experience and culture, it will evoke something different and particular - or nothing at all. But does it only evoke, or does it represent something? Could a machine figure out what it is? I would be curious to submit this image to some automatic description algorithm. Would we get something like tracks in the snow in a winter mountain landscape? That would not be bad. If it succeeded in adding several people wearing snowshoes, I would be most impressed. And I would be really baffled if it could guess how many people have passed, and in which direction.

Now let's take it as a support for an exercise in backtracking. Let's move a few steps back towards the genesis of this image, trying to figure out its deeper meaning. Someone shot this image on a fair winter day (supposing it's a genuine photograph and not one of those fancy computer-generated graphics). In either case, what you are viewing here and now is just a reconstruction, on the screen of your device, of a pack of bits: a file uploaded to Google's servers from my computer, this local file being itself a resized and trimmed copy of an original generated by a digital camera. Several copies, deconstructions and reconstructions have happened since the original shot.
Now just trust me that it's a "genuine" photograph of some "real" landscape, and imagine yourself back at the scene, along with the photographer. Given the point of view, he's certainly on the tracks himself. Is he following tracks left by another group of walkers? Does he belong to this group? Is he looking back at his own tracks? Has he followed the same track both up and down, so that the several people who seem to have passed here were actually the same person, once walking up and once down, or maybe several times up and down? Whatever. Who could answer those questions now, except the one who shot the image? Days, months, seasons and years have passed since. Later the same day, other walkers came, following the tracks or crossing them and messing up the signs. A few days later a new snowfall erased them all, and in April the winter memories vanished in the streams joyfully cascading down. And another summer, and another winter. Going back there now won't tell you anything about those tracks, even if the landscape looks much the same, even if some walker has taken the same path today, leaving similar tracks.

But figuring out the genesis of the image itself is not the end of the backtracking. I chose this image to represent me on the Web, among thousands of possible images. How can you interpret this choice? Is it a track of mine, captured by someone else, a track of someone else captured by me, my own track captured by myself, a far-fetched form of selfie? Maybe nothing of the sort. Maybe I found this image somewhere on the Web and thought it looked like me: someone who walks, and is often no longer where you expected to meet him.

I could answer all those questions, but I won't. I'd rather imagine you wondering, as you would wonder, hopefully, on finding some perfect pebble on the seashore, about the long story it silently tells: the slow cooking of rock in the depths of the Earth and its uprising over millions of years, the sudden earthquake or storm or the patient bite of ice cracking the rock, the fall off the cliff, the long rolling travel downstream to the sea, the patient work of currents, tides and waves until this unique morning when its glow on the sand caught your eye.

Think about it: just about every thing is somehow akin to this image of a track, or to that pebble. Telling stories, giving time its depth by linking us to the past with so many threads. Trees and rocks, bowls, clothes, jewels, printed words and texts. And every so-called Web resource. They are not just sitting idly here and now; they are signs worth backtracking.


Do things go wrong, or is it just me?

Consistency seems to be a universal requirement for any account of reality we are willing to consider true. This requirement seems to build slowly in childhood with the acquisition and consolidation of language, along with the notions of true and false and the underlying law of the excluded middle, a basis for all rational and scientific accounts of the world. Formal logic and mathematics underlie the growing computational power of our machines, and we also try to make consistent the laws and rules governing our daily life. But whatever the level of formality at which they are used, consistency and truth belong to the realm of discourse. Holding that a discourse is (in)consistent, and that statements are true or false in the framework of this discourse, makes sense and in many cases can be precisely defined and proven by logic. Considering that a statement is true because it seems consistent with reality, or at least with the state of affairs at hand, is more hazardous, but is still useful and is actually the basis for most of our daily decisions.

But what is more arguable is to consider consistency a characteristic of reality itself, independently of any discourse we may have about it. What could that mean? Reality simply is what it is, whether we think or speak about it or not, and there is no point in asking if reality is true or false, consistent or inconsistent: all qualifiers which should apply only to statements and discourse. Reality is the state of affairs, the mountain as we experience it; it is not a discourse, even if our discourse is part of it. What can be said to be true, false, consistent or inconsistent is what one asserts about this experience. But somehow the experience has those permanent patterns which comfort us in believing that reality is indeed internally consistent and that our language can build accounts of it we proudly call facts. Our faith in the internal logic and consistency of reality, beyond any account of it, has gone as far as considering reality the embodiment of the discourse of some perfect logos. This metaphysical stance pervades, implicitly or explicitly, all of Western thought, from Greek philosophy through the various avatars of monotheism. We can still track it in modern science, in the quest for the Theory of Everything, which in the minds of some would be a consistent account of no more, no less, than the thought of God. Of course such a theory should be globally logically consistent, since the creator could not be inconsistent without falling short of perfection.

Our philosophy should be more humble. Logic should stay where it came from and belongs: inside language. And when reality suddenly behaves in an unexpected way, inconsistent with the accounts we so far considered true, instead of thinking first that things have gone wrong, let us admit that it is our account of things which has been proven wrong. Things never go wrong, but we often do.


The moving shores of things

I would like to dedicate this post to the victims of last week's attacks in Paris, who were blindly sentenced to death without notice because they were guilty of joie de vivre, or maybe simply of humanity. I started writing those lines before the attacks, and they could seem at first sight to have nothing to do with Daesh madness. But if you are patient enough to read down to the end, I hope you will find relevant food for thought in the context of those events.

Whether things are ontological primitives or abstractions from states of affairs, as suggested in our previous post, to deny them any kind of existence would fly in the face of common sense and experience. Mountaineers know that there are mountains, rocks and streams; sailors know that there are seas, waves and storms. But ask them what a mountain or the sea is, and you're likely to get anything but a definition. They will certainly tell you awesome stories of climbing and sailing, maybe show you images, and the most sensible of them will simply propose to take you climbing or sailing, to figure it out by yourself, experience the thing, be part of it.
Why is it so? Because being is not being neatly defined. Our logicians and ontologists would like to make us believe that the world can be split neatly between this and that, day and night, land and sea, human and non-human etc., categories which could be logically defined. There are many reasons why they are wrong, the most often quoted being the arbitrary choice of such limits, since the world can be split into things in many ways. A good introduction to the current discussion of this viewpoint, called relativism, can be found in the Stanford Encyclopedia of Philosophy.
But relativism is not the primary stance I would choose to argue why trying to give a logical definition of mountain or sea or whatever else is bound to fail forever. The main point is that such things, as well as most things you can think of, have fringes, shores, edges, interfaces ... (the name depending on the kind of things you consider) through which they are less separated from each other than intertwined. If you look at a shore from far enough away, it can look like a neat line. But if you look closely, while walking on a beach or at the edge of a forest, you will discover a very complex world which belongs to neither or both of the worlds that meet there. On the shore, the sea enters the land and the land feeds the sea in the perpetual circulation of waves and tides. The shore is where land and sea communicate and exchange. At the forest edge, animals perpetually come in and out of the trees' shelter to feed in the grass and fields. And what is the forest itself, if not a shore between earth and sky, with thousands of trees as so many links and knots between the depths of the ground and the air? Great examples of such intertwining interfaces are mangroves, known to be extraordinarily rich ecosystems.

Mangrove at Cayo Levisa, Cuba.
Source Wikimedia Commons

The concepts we abstract from the world and toss in each other's faces in our endless arguments and wars are of the same nature. Between life and death, human and non-human, the limits should indeed look like the above, moving and intertwined. And thinking otherwise, that those moving shores are or should be reduced to neat lines, is the first step towards totalitarianism, exclusion, and death.

The words and acts of Daesh have of course gone very far down such an alley, but to fight them back we should be careful not to use similarly simplistic rhetoric, ignoring the complexity of fringes and shores, replacing them with edges as straight and cutting as their knives' blades. One such simplification, unfortunately still voiced aloud by too many people, broadly conflates Daesh with Islam, when 99.9% of Muslims condemn terrorism, and more than 80% of Daesh's victims are Muslims. But the defensive stance of many moderate Muslims, claiming out loud that Daesh is not Islam and that its members are not the Muslims they claim to be, is equally simplistic and counterproductive. Daesh is indeed a shore of Islam, although a very remote and dangerous one, and moderate Muslims would certainly benefit from acknowledging out loud that such a shore exists, where Islam meets and intertwines with intolerance, obscurantism, organized crime, thirst for glory and power, or sheer madness. As with any shore, you can get there from both sides: from inside Islam through fundamentalism, and from outside through social exclusion and criminality.
Beyond or inside Islam, we see many people saying or writing that the Daesh killers have, by their words and acts, deliberately put themselves beyond humanity, and therefore could and should simply be shot down like dangerous, furious animals. Calling them "monsters" or "barbarians", or whatever fits for saying "they are not like us and must be eliminated", is as simplistic and counterproductive as the above claim that "they are not Muslims". The Daesh killers certainly dwell on some strange and frightening fringe of humanity, so far off that they are able to shake the very notions we have of what makes humanity. But whether humans or barely so, barbarians, monsters, or simply mad criminals, in any case they came to these deadly shores from inside humanity, and we need to understand how they got there, to prevent more young people from following the same paths.


Mountains as states of affairs

Trying to make sense of the deep work of Jan Christoph Westerhoff about ontological categories, reality and everything, along with a slow but steady learning of the Chinese language and ancient philosophy, leads you to consider states of affairs as the ontological primitives, instead of the good old semantic web things and properties, the latter being derived artefacts of the former, not the other way round. Let's try to illustrate this as simply as possible.
Consider a mountain. On the semantic web you represent a mountain as an instance of owl:Thing or one of its more specific subclasses such as schema:Mountain. You claim to have defined a non-ambiguous individual, identified by a URI and described by an open set of property-value pairs, such as http://dbpedia.org/resource/Mount_Everest.
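As a minimal sketch of this representational stance, here is the semantic web view in miniature: plain tuples standing in for RDF triples, with the schema.org vocabulary supplying the type. The property values below are illustrative assumptions, not authoritative DBpedia data.

```python
# One "thing", identified by a URI and described by (subject, predicate, object)
# triples. Values are illustrative, not extracted from DBpedia.
EVEREST = "http://dbpedia.org/resource/Mount_Everest"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEMA = "http://schema.org/"

triples = {
    (EVEREST, RDF_TYPE, SCHEMA + "Mountain"),
    (EVEREST, SCHEMA + "name", "Mount Everest"),
    (EVEREST, SCHEMA + "elevation", "8848"),  # metres, illustrative
}

# The open-world assumption: anyone may assert more property-value pairs later.
triples.add((EVEREST, SCHEMA + "containedInPlace",
             "http://dbpedia.org/resource/Himalayas"))

# The "individual" is then just everything asserted about that URI.
description = {(p, o) for (s, p, o) in triples if s == EVEREST}
print(len(description))  # 4 statements about this one "thing"
```

The point of the post is precisely that this tidy individual, with its URI and property list, is an abstraction derived from states of affairs, not the other way round.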

But in the view of the world proposed by both Westerhoff's philosophy and the Chinese language (insofar as I understand them properly), the above are just abstractions derived from some state of affairs. The Chinese 山 (shān), which we translate in English as mountain(s), is a sign associated with certain aspects of things, or states of the world. We have to be very cautious with terms here, and not take for granted that the existence of "things" and "the world" are preconditions for the states of affairs we associate with the sign 山. In the ancient Chinese culture where this sign first emerged about three thousand years ago, the world is not divided into things before we name them. Certain states of affairs, patterns we recognize again and again, lead us to associate a sign with them. 山 is just an abstract visual representation of those states of affairs presenting peaks rising upward: a main central one and another on each side, slightly asymmetrical. A mountain is indeed generally mountains, bearing in mind that "three" has to be understood as a shortcut for "many".

The difference between considering that there are such things in the world as individual mountains, to which we just give individual names and which we put in a category, and considering mountains as states of affairs we associate with a common sign or name, might appear subtle or moot. But it is indeed a fundamental shift in our view of the world. States of affairs are not neat individuals defined by properties; they are not separated from each other; they have neither precise limits in space and time, nor definite components and properties. Of course we can try to agree, and generally agree to disagree, upon such limits and components, and argue forever on what is or is not a mountain in general, or this mountain in particular. And we actually argue upon what is a human being, or a book, or a Web resource, or democracy ... This kind of argument is interestingly called in Chinese 是非 (shì fēi), literally meaning "being - not being", hence "right - wrong", and in common language dispute, argument. There is much food for thought in this word. Dispute arises when language gets out of its original role of simply putting signs on states of affairs, going down to argue on what there is and is not behind the signs; in other words, when language mingles with ontology and meaning instead of sticking to what it's really made for - poetry.

I wish you to stay away from dispute, walk up and listen to the mountain songs.


It is so because it is so

I've been through a few mountain paths and ancient Chinese texts this summer, both demanding fitness and attention of body and mind, and quite interesting to put together.


The above image I captured yesterday; the caption is extracted from one of the most challenging pages of Zhuangzi, quite close in spirit to another taoist piece we went through in a previous post. Here too we have a symmetrical sentence conveying the idea of a parallel ontogenesis for paths and things. The following is my synthesis of various translations.

Walk the path and it is completed
Name the thing and it is like this

The second part is indeed close to Laozi (having a name is the mother of all things). The implicit assumption, in a taoist context, is that there are adequate ways for both walking and naming. Even if, as in the above image, the path is barely visible and seems to vanish at some point. Such unsteady tracks quickly disappear if not regularly trodden, but they are not as contingent as they could seem. Even without any visible track, any (good) walker would make his way through the scree towards the background pass following a more or less similar path. From a taoist viewpoint, the (good) walker's path would fit the lines of the landscape, his path in harmony with the general way of things, both being called 道 (dào). The (wise) man will do the same in naming things, following the natural lines of his current cultural landscape. Bearing in mind that this landscape is bound to change, hence neither path nor thing has any absolute and definitive existence.
And Zhuangzi in the following sentence hammers the point that there is no need for further explanation or dispute about it, through the insistent use of a word meaning here "so" or "like this". The (adequate) path is so because it is so, the (adequate) thing is so because it is so.

The translation at the Chinese Text Project adds an extra layer of interpretation.

A path is formed by (constant) treading on the ground. 
A thing is called by its name through the (constant) application of the name to it. 

This "constant" is not explicit in the original text, but it makes sense in the cultural context of Zhuangzi's writing. For the track to become a path, it has to be trodden again and again by many people, and the name gets its meaning by usage. Paths and things are polished cultural artefacts.


Weaving beyond the Web

More on this story of names (including URIs) and text (including the Web), as promised to all those who provided much appreciated feedback on the previous post. I'm still a bit amazed by the feedback coming from the SEO community, because I really did not have SEO in mind. But I must admit I'm totally naive in this domain, and tend to stick to principles such as: do what you have to do, say what you have to say, make it clear and explicit, and let search engines do their job; quality content will float towards the top. And explicit semantic markup is certainly part of content quality. Very well ... but that was not my point at all. That said, any text is likely to be read and interpreted in many ways, and there is often more in it than its author was aware of. And actually, this is akin to what I am about today: the meaning of a text beyond its original context of production.

Language is an efficient and resilient distributed memory, where names and statements can live as long as they are used. And even if not used any more, they can nevertheless live forever if they are part of some story we keep telling, reading, commenting and translating, some text we are still able to decipher. We still use, or at least are able to make sense of, texts forged by ancient languages thousands of years ago, even if the things they used to name and speak about do not exist any more. Dead people, buildings and cities returned to the ground centuries ago, obsolete tools and ways of life, forgotten deities, concepts whose usage has faded away: the names of all those we nevertheless keep in the memory of languages - the texts. Some of us still read and make sense of ancient Greek and Latin, or even ancient Egyptian hieroglyphs. The physical support of this memory has changed over time, from oral transmission to bamboo, clay tablets, papyrus, manuscripts and printed books, analog and digital supports of all kinds, today the cloud, and who knows what tomorrow. Insofar as such migrations were possible at all, we can trust the resilience of our language.

How do URIs fit in this story? URIs are a very recent kind of name, and RDF triples a new and peculiar form of weaving sentences. The people who forged the first of them are still around, and they were developed for a very specific technical context, the current architecture of the Web. Will they survive and mean something centuries from now? Do and will the billions of triples-statements-sentences we have written since the turn of the century make sense beyond the current context of the Web? Like Euclid's Elements, are they likely to keep their meaning in the long run?

Let's make a thought experiment to figure it out. We are in 2115; the current Web architecture has been superseded since 2070 by some new technological infrastructure we can barely figure out in 2015, no more no less than our grandmothers in 1915 could have figured out the current Web architecture. HTTP is obsolete, data is exchanged through whatever new protocol. Good old HTTP URIs have not dereferenced to anything for half a century. Do they still name something? Do the triples still make sense? Imagine you have saved all or part of the 2015 RDF content, and you still have software able to read it - just a text reader will do. Can you still make sense of it? Certainly, if you have a significant corpus. If you have the full download of the 2015 DBpedia or WorldCat, most of its content should be understandable if the natural language has not changed too much. Hopefully this should be the case: we read without problem in 2015 the texts written in 1915. And if you have saved a triple store infrastructure and software, you might still be able to query those data in SPARQL by 2115. Triples are triples, either on the Web or outside it.
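A toy version of this thought experiment, in plain Python with made-up URIs: triples saved as plain text remain queryable with nothing but a text reader and a loop, no HTTP and no dereferencing required.

```python
# Triples archived as plain text, in an N-Triples-like form.
# All URIs below are made up for the sketch.
archive = """\
<http://example.org/Everest> <http://example.org/type> "Mountain" .
<http://example.org/Everest> <http://example.org/label> "Mount Everest" .
<http://example.org/Seine> <http://example.org/type> "River" .
"""

def parse(text):
    """Read the archived lines back into (subject, predicate, object) tuples."""
    triples = []
    for line in text.strip().splitlines():
        s, p, o = line.rstrip(" .").split(None, 2)
        triples.append((s.strip("<>"), p.strip("<>"), o.strip('"')))
    return triples

def ask(triples, p, o):
    """Which subjects have property p with value o? (a SPARQL query in spirit)"""
    return [s for s, pp, oo in triples if pp == p and oo == o]

data = parse(archive)
print(ask(data, "http://example.org/type", "Mountain"))
```

No server answers those URIs, yet the statements still parse, still link, and still answer questions: the meaning survives the infrastructure.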

What lesson do we bring home from this travel to the future? Like any text, URIs and triples can survive and be meaningful well beyond the current Web infrastructure; they belong to the unfolding history of language and text. Of course, today the Web infrastructure allows easy navigation, querying and building of services on top of them. But when forging URIs and weaving triples, consider that beyond the current Web, what you write can live forever if it's worth it. Your text is likely to be translated into formats and languages, and read through supports and infrastructures, you just can't imagine today. Worth thinking about before publishing. Text never dies.


From names to sentences, the Web language story

Conversation about text and names and how they are interwoven within the Web architexture is going on here and there. The more it goes, the more I feel we need non-technical narratives and metaphors to help people get what the (Semantic) Web is all about. We have drowned them under technical talks and schemas of layers of architecture and protocols and data structures and ontologies and applications ... and the net result is that too many of them, smart people included, think only experts, engineers and geeks can grok it. So let me try one such - hopefully simple - narrative.

The story of the Web is just the story of language, continued by other means. Forging names to call things, and weaving those names into sentences and texts. On the Web, things have those weird names called URIs, but names all the same. As we have seen in a previous post, a name is, to begin with, a way to shout and identify people and things in the night. On the Web, to call a thing by its URI-name you will use some interface, a browser, a service, an application, and at this call something will come through the interface. Well, the thing you have called does not actually come itself to you through the network, but you get something which is hopefully a good enough representation of the thing. The deep ontological question of the relationship between the name and what is named has been discussed for ages and will continue forever. The Web does not change that issue, does not solve it; it just provides new use cases and occasions to wonder about it. But this is not my point today.

In the first ages of the Web, calling things was all you could do with those URI-names. You had the language ability of a two-year-old kid. You could say "orange" or "milk" when you were thirsty, and "dog" and "cat" and "car" and "sea" and "plane" when you saw or wanted one, and cry for everything else you could not express or the dumb Web would not understand. With no more sophisticated language constructs, you could nevertheless discover the wealth of the Web, through iterative serendipitous calls. Because the courtesy of the Web is such that when you call for a thing, the answer often comes back with a bunch of other names you can further call (a hyperlink does just that, enabling you to call another name just by a click). You would bring back home things of whose very existence you had not the faintest idea a minute before. Remember this jubilation, the magic of your first Web navigation, twenty years ago? Like a kid laughing aloud when discovering the tremendous power of names to call things.
Today in many (most) of our interactions with the Web we are no longer aware of using names. We act with our fingertips, barely guessing that under the hood, this is transformed into a client calling a server, or something on this server, by some name, and that many calls are made on the network to bring back what our fingers asked for. Only geeks and engineers know that. The youngest generations, who have not known the first ages of the Web and interact only through such interfaces, are blissfully ignorant of this whole name affair. Did you say URL, Dad? What's that? It sounds so 90's ...

Now when you grow older than two, you go beyond using names just to shout them in the face of the world; you begin to understand and build sentences yourself. That's a completely new experience, a new dimension of language unfolding. You link names together, you discover the texture, the power to understand and invent stories and to ask and answer questions. You still use the same names, you are still interested in oranges, cats, dogs and cars, and all the thousands of things which are the children of naming. But you are now able to weave them together using verbs (predicates), qualifiers and quantifiers and logical coordination. You have become a language weaver.

And that's exactly and simply what the Semantic Web is about, and how it extends the previous Web. Just growing up and learning to weave sentences, tell stories, ask questions. But using the same URI-names as before. Any URI-name of the good old Web can become part of the Semantic Web. Just write a sentence, publish a triple using it as subject or object, and here you are.
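A sketch of that promotion, with hypothetical URIs throughout: the page below is a plain old-Web name right up to the moment a sentence is written about it.

```python
# How a good old Web URI-name becomes part of the Semantic Web:
# write one sentence (triple) using it. The page URI is made up;
# dcterms:subject serves here as a plausible "is about" predicate.

page = "http://example.org/my-cat-photo"        # a plain old Web name
is_about = "http://purl.org/dc/terms/subject"   # the predicate linking them
cat = "http://dbpedia.org/resource/Cat"

# The two-year-old Web could only call the name;
# the grown-up Web weaves it into a sentence.
sentence = (page, is_about, cat)

print(sentence)
```

Nothing about the page itself had to change: the sentence lives wherever it is published, and the name is now woven.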


Text = Data + Style

We used to consider the Web as a hypertext, a smart and wonderful extension of the writing space. It is now rather viewed and used as a huge connected and distributed database. Search engines tend to become smart query interfaces for direct question-answering, rather than guides to the Web landscape. Writing-reading-browsing the hypertext, which was the main activity on the first Web, is more and more replaced by quick questions asking for quick answers in the form of data, if possible fitting the screen size of a mobile interface, and better still encapsulated in applications. Is this the slow death of the Web of Text, killed by the Web of Data?
For a data miner, text is just a sort of primitive and cumbersome way to wrap data, from which the precious content has to be painfully extracted, like a gem from a dumb bedrock. But if you are a writer, you might consider, the other way round, that data is just what you are left with when you have stripped the text of its rhythm, its flavor, the writer's eagerness to get in touch with the reader - in one word, style. Why would one bother about style? +Theodora Karamanlis puts it nicely in her blog Scripta Manum, under the title "Writing: Where and How to begin".
You want readers to be able to differentiate you from amongst a group of other writers simply by looking at your style: the “this-is-me-and-this-is-what-I-think” medium of writing.
Writing on the Web is weaving, as we have seen in the previous post, and your style in this space is the specific texture you give to it locally, in both the modern graphical sense and the old meaning, a way of weaving. The Web is indeed a unified (hyper)text space where anything can be woven to anything else, but this is achieved through many different local styles or textures. It would be a pity to see this diversity and wealth drowned in the flood of data.
We've learnt these days that Google is working on a new kind of ranking, based on the quality of data (facts, statements, claims) contained in pages. But do or will search engines include style in their ranking algorithms? Can they measure it, and take it into account in search results and personal recommendations, based on your style or the styles you seem to like? Some algorithms are able to identify writing styles the same way others identify people and cats in images, or music performers. If I am to believe I Write Like, which I just tried on some posts of this blog, I'm supposed to write like I. Asimov or H.P. Lovecraft. Not sure how I should take that. But such technologies, applied to compare blogs' styles, could yield interesting results and maybe create new links that would not be discovered otherwise.
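How might an algorithm "measure" style at all? Here is a deliberately naive sketch, not the method of I Write Like or any real tool: compare two texts by the cosine similarity of their character trigram frequencies, one of the simplest stylometric signals.

```python
# Naive stylometry: texts as bags of character trigrams, styles
# compared by cosine similarity. The sample sentences are made up.
from collections import Counter
from math import sqrt

def trigrams(text):
    """Frequency profile of overlapping 3-character sequences."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + 3] for i in range(len(text) - 2))

def cosine(a, b):
    """Cosine similarity between two frequency profiles (0.0 to 1.0)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

s1 = "Data is so boring. I prefer the metaphor of style as a texture."
s2 = "Style is a texture, and data without style is boring indeed."
s3 = "SELECT ?s WHERE { ?s a schema:Mountain }"

print(cosine(trigrams(s1), trigrams(s2)))  # prose vs prose: higher
print(cosine(trigrams(s1), trigrams(s3)))  # prose vs query: lower
```

Real style classifiers layer far richer features on top (vocabulary, rhythm, syntax), but the principle is the same: style leaves a measurable fingerprint.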
The bottom line of our data fanatics here could be that after all, style is just another data layer. I'm not ready yet to buy that. I prefer the metaphor of style as a texture. Data is so boring.


... something borrowed, something blue

I already mentioned +Teodora Petkova in a recent post. Reading her blog, you'll maybe have, as I had several times, this "exactly ... that!" feeling you get when stumbling on words looking like they have been stolen from the tip of your tongue or pen. In particular don't miss this piece, with its lovely bride's rhyme metaphor, to be applied to every text we write in order to weave it into the web of all texts.
Something old, something new, something borrowed, something blue
Something old ... how can one write without using something old, since what is older than the very words and language we use to write? And one should use them with due respect and full knowledge of their long history. Let's look at some of those venerable words. Children of the Northern European languages, web and weaving seem to come from the same ancient root, hence Weaving the Web is a kind of pleonasm. And text comes from the Latin texo, texere, textus, meaning also to weave, and cognate to the ancient Greek τέχνη, the ancestor of all our technics, technologies and architectures. In the Web technologies the northern Germanic warp of words has been interwoven with the southern Latin woof, and each new text on the Web is a knot in this amazing tapestry. Our Web of texts is not as bad as I wrote a few years ago, and with its patchy, fuzzy, furry and never-finished look, we love it and want to keep it that way.

Something new ... Text seems to be old-fashioned stuff these days; it's data and multimedia and applications all over the place. Even the Semantic Web has been redubbed Web of Data by the W3C. And what if, after Linked Open Data (2007) and Linked Open Vocabularies (2011), we were to open in 2015 the year of Linked Open Text?

Something borrowed ... Teodora encapsulates all the above with the concept of intertextuality. And that one I definitely borrow and adopt (just added it to the left menu), as well as the following from another great piece.
As every text starts and ends in and with another text and we are never-ending stories reaching out to find possible continuations…
Something blue ... The blue of links indeed, but to make Linked Open Text happen and deliver its potential, we certainly need more than one shade of blue. As Jean-Michel Maulpoix writes in his Histoire du bleu ... All this blue is not of the same ink.
All this blue is not of the same ink.
One can vaguely discern in it floors and kinds of apartments, with their numbers, their families of various conditions, their wallpapers, their photographs, their holidays in the Alps and their terraces on the Atlantic, the ordinary satisfactions and the complications of their lives. The condition of blue is not the same according to the place it occupies in the scale of beings, of hues and of beliefs. The humblest are content with the lower floors, with their greasy papers and their graffiti: they hardly climb higher than the roofs bristling with antennas. The happiest sometimes fly in an impeccable azure, and cast upon the human cities that beautiful panoramic gaze which once entertained the gods.
To fly that high, we indeed need to invent and use new shades of blue to paint the links between our texts, and the words where those links are anchored.


Could computers invent language?

Artificial intelligence is something about which not a line has been written in these pages, in close to two hundred posts over more than ten years. But I feel today like I should drop a couple of thoughts about it, after exchanges on Google+ around this post by +Gideon Rosenblatt and that one by +Bill Slawski, not to mention recent fears expressed by more famous people.
There are many definitions of artificial intelligence, and I will not quote or choose any. Likewise, there are popular issues I prefer to leave alone, like knowing whether computers are able to deal only with data and algorithms, or whether they can produce information or even knowledge, or whether they think and can, individually or collectively, attain consciousness or even wisdom. All those terms are fuzzy enough to allow anyone to write anything and its contrary on such issues. Let's rather look at some concrete applications.
Pattern recognition is one of the great and most popular achievements of artificial intelligence. Programs are now able, with quite good performance, to translate speech into written language, identify music tracks, cluster similar news, identify people and cats in photographs, etc.
Automatic translation is also quite popular and, while working not that badly on simple factual texts, still has a hard time dealing with the context needed to resolve ambiguity, understand puns and implicit references, all things generally associated with intelligent understanding of a text.
Question-answering is also making great progress, based on ever richer and more complex knowledge graphs, and on the translation of natural language questions into formal queries.
No doubt algorithms will continue to improve in those domains, with many useful applications, and some related and important issues regarding privacy and the delegation of decisions to algorithms.

All the above tasks deal more or less with the ability of computers to process our languages successfully. But, and this is where I have been bound from the start, there is a fundamental capacity of human intelligence which, as far as I know, has not even begun to be mimicked by algorithms. It's the capacity to invent language. It has been largely discussed since Wittgenstein whether a private language is possible or not, but there is no discussion that language has been and still is built collectively, through a process of continuous collective invention. Anyone can invent a new word or a new linguistic form; whether it will be integrated into the language commons depends on many criteria, akin to the ones enabling a new species to expand and survive, or disappear. This is the way our languages constantly evolve and adapt to the changing world of our communication and discourse needs. Could computers mimic such a process, take part in it, and even expand it further than humans? Could algorithms produce new and relevant words, smoothly integrated into the existing language, to name concepts not yet discovered or named? In short, are computers able to take part in the continuous invention of language, and not only make smart use of the existing one?
Such a perspective would be fascinating indeed, and certainly scary, insofar as machines collectively inventing such language extensions would not necessarily share them with humans; and even if they did, humans would not necessarily be able to understand them.

Whether such an evolution is possible at all, or in a foreseeable future, is a good question. Whether we should hope for it and work to let it happen, or fear it and prevent it, is a yet more interesting one. But at the very least, those are questions we can technically specify, making them much more valuable for the assessment and definition of artificial intelligence than vague digressions on whether computers can think, have knowledge or become conscious. We don't even really know what the latter means for humans, our shared language being the closest proxy we have for whatever is going on in our brainware. So let's assess the progress of artificial intelligence by the same criteria we generally use to assess human intelligence: its ability to deal with language, from the plain naming of things to the invention of new concepts.


Statements are only statements

A few days ago, in the comments of this post by +Teodora Petkova on Google+, I promised +Aaron Bradley a post explaining why I am uneasy with the reference to things in Tim Berners-Lee's reference document defining (in 2006) Linked Data. The challenge was to make it readable by seven-year-old kids or marketers, but I'm not sure the following meets this requirement.

When Google launched its Knowledge Graph (in 2012) with the tagline things, not strings, it was not much more than the principles of Linked Data as exposed in the above-mentioned document six years before, but implemented as a Google enclosure of mostly public source data, with neither an API nor even public reusable URIs. I ranted here about that, and nothing seems to have changed since for that matter.
But something important I missed at the time is a subtle drift between TBL's prose and Google's. The former speaks about things and information about those things. The latter also starts by using the term information, but rapidly switches to objects and facts.
[The Knowledge Graph] currently contains more than 500 million objects, as well as more than 3.5 billion facts about and relationships between these different objects.
The document uses "thing", "entity" and "object" in various places as apparent broad synonyms, conveying (maybe unwillingly) the (very naive) notion that the Knowledge Graph stands as a neat projection into data of "real-world" well-defined things-entities-objects, and of proven (true) facts about those. An impression reinforced by the use of expressions such as "Find the right thing". And actually, that's how most people are ready to buy it: "Don't be evil" implies "Don't lie, just facts". In a nutshell, if you want to know (true, proven, quality-checked) facts about things, just ask Google. It used to be just ask Wikipedia, but since the Knowledge Graph taps Wikipedia, it inherits the trust in its source. But similarly naive presentations can be found here and there, uttered by enthusiastic Linked Data supporters. Granted, TBL's discourse avoids reference to "facts", but does not close the door, and through this opening a pervasive neo-platonic view of the world has rushed in: there are things and facts out there, just represent them on the Web using URIs and RDF, et voilà. The DBpedia Knowledge Base description contains typical sentences blurring the ontological status of what is described.
All these [DBpedia] versions together describe 38.3 million things, out of which 23.8 million are localized descriptions of things that also exist in the English version of DBpedia.
It's left to everyone's guess what "existence in the English version" can mean for a thing. What should such documents say instead of "things" and "facts" to avoid such confusion? Simply what they are: databases of statements using names (URIs) and sentences (RDF triples) which just copy, translate, adapt, in one word re-present on the Web statements already present in documents and data, in a variety of more or less natural, structured, formal, shared, idiomatic languages. As often stressed here (for five years at least), this representation is just another translation.
And, as for any kind of statement in any language, to figure out whether you can trust them or not, you should be able to track their provenance, and the context and time of their utterance. That's for example how Wikidata is intended to work. Look at the image below: nothing like a real-world thing or fact is mentioned, but a statement with its claim and context.
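The shape of such a statement can be sketched in plain Python. The property ids below are real Wikidata identifiers (Q513 for Mount Everest, P2044 for elevation above sea level), but the reference source is a hypothetical placeholder: the point is only that the claim travels wrapped in its context.

```python
# A Wikidata-style statement: not a bare "fact", but a claim together
# with its provenance. The reference source below is illustrative.

statement = {
    "subject":  "Q513",       # Mount Everest
    "property": "P2044",      # elevation above sea level
    "value":    {"amount": 8848, "unit": "metre"},
    "qualifiers": {},
    "references": [{
        "stated in": "some 2015 survey",   # hypothetical source
        "retrieved": "2015-02-22",
    }],
    "rank": "normal",
}

def claim(stmt):
    """The bare claim, stripped of its context - what naive talk calls a 'fact'."""
    return (stmt["subject"], stmt["property"], stmt["value"]["amount"])

print(claim(statement))
```

Strip the references and the rank away, and what remains is just a sentence somebody uttered; the wrapping is what lets you decide how much to trust it.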
The question of the relationship of names and statements with any real-world referents is a deep question, kept open by philosophers for ages, and which should certainly remain open. In any case the Web, Linked Data and the Knowledge Graph do not, will not, and should not, insidiously or even with no evil in mind, pretend to close it. Those technologies just provide incredibly efficient ways to exchange, link, access and share statements, based on the Web architecture and a minimalist standard grammar. Which is indeed a great achievement, no less, but no more. At the end of the day, data are only data, statements are only statements.


Common names, proper usage

What follows might be, like previous posts, relevant to the raging debate in and around the W3C Shapes Working Group. If you don't care too much about Latin, Greek, French, German, etymology, translation and languages at large, you can go straight to the last paragraph. But I trust my faithful readers (whoever they are) to follow me through the long preliminary linguistic meanders.

A while ago I pointed at the enclosure of common names as trademarks. Maybe I should have written common nouns. But in French (my native language), there is a single word, nom, to translate both noun and name, all being cognates of the Latin nomen, the Greek ὄνομα, and many more avatars of the same Indo-European root. In French grammar you say "nom commun" for "common noun" and "nom propre" for "proper noun", and a French native speaker is likely to translate these into English as "common name" and "proper name", both ambiguous out of context. And my purpose today is indeed to look at what it can mean for names to be common or proper, beyond what it means for grammatical nouns.
Let's look into Latin again, where communis and proprius, as well as their ancient Greek equivalents κοινός and ἴδιος, have roughly the semantic scope they have kept in French and English. Together they split the world into what belongs to the commons and what is proprietary or private. Beyond and before their use in grammar to denote universals and particulars, further meanings have built upon the good or bad characteristics associated with each term. Typically, "common" will be used as a derogatory qualifier for whatever belongs to the vulgum pecus, those common people who do not behave, think or speak properly. The French "propre" goes even further down this path of semantic drift to mean "clean", with disambiguation by position ("c'est ma propre maison" = "it's my own house" vs "sa maison est propre" = "her house is clean"). Such extensions seem indeed characteristic of a language controlled by some aristocracy. It's worth noticing that the English "own" and its German cognate "eigen" do not seem to have suffered similar semantic drifts.
Sticking to the original meaning and forgetting the interpretations of either grammar or aristocracy, common names would simply be names belonging to the commons. Which is true, if you think about it, of just any name. A name with no community (or communality) would be useless, and actually barely a name, just a string with no shared usage and no agreed-upon denotation. Under such a definition, even proper nouns are common names. From a grammatical viewpoint, "Roma" is a proper noun, but it's common to all the people using it to denote the capital of Italy. To make it short, all names belong to the commons, otherwise they don't name anything at all.
The above analysis does not apply only to natural language names (aka nouns), but also to all those technical names handled in the internal languages of our information systems, the names used by machines to call each other in the dark (see previous post) and take actions. URIs, addresses, object and class names ... if those were not common names, we would have no open Web, and no open source code with reusable libraries.
But those common names, when used and interpreted by software, behave internally at run time as proper names, in all senses of "proper". They each call a well-defined individual object, method or whatever piece of executable code. A URI sent through the HTTP protocol eventually calls, by their internal names, specific pieces of data on one or more servers, each of them running its own, proper, often proprietary code with its idiosyncratic functional semantics.
In other words, if the declarative semantics of a technical name (the description of what it denotes) belongs to the commons, its performative semantics (what it does when called) is proper to the system in which it is used, and to its conditions at run time.

How is that relevant to the W3C Shapes debate? What this group is (maybe) seeking (or should seek) is actually a (standard) way to describe proper performative semantics for systems using RDF data. On the DC-Architecture list, +Holger Knublauch was complaining a few days ago.
Yet, there used to be a notion of a Semantic Web, in which people were able to publish ontologies together with shared semantics. On this list and also the WG it seems that this has come out of fashion, and everyone seems "obsessed" with the ability to violate the published semantics.
Violate the published semantics? Well, no, it's just about describing how the common semantics behave properly in my system. But whether that can be achieved through yet another declarative language, or through some interpretation of existing ones, without blurring the RDF landscape a bit more, is another story.


You need names on the Web, it's dark in there.

The Chinese character 名 (name), which we have seen in the previous post as the mother of all things, has an interesting origin. It is composed of the characters 夕 (night, symbolized by a crescent moon) and 口 (an open mouth). The key to this mysterious association is that you need a name either to call someone, or to identify yourself, in the dark of night. In daylight, you don't really need to know the name of your interlocutor to recognize each other and engage in conversation. You don't need the names of things to find and handle them.

Interaction through information systems, and singularly on the Web, is a conversation in the darkest of nights. You can't see your interlocutors, you can't wave or bow at them, you don't see what you are looking for, and the system does not see you. So you need names everywhere. You need names to enter the system, to log in, to send messages. You need to know names to connect to people on the social web. You need to know a name for what you seek to ask a search engine. One can argue that all of this is rapidly changing, with identification using your finger or eyeprint, and connection to stuff or people using icons and various fancy non-textual interfaces. But under the hood, the system will still exchange ids, keys, addresses, all those avatars of names used by machines. Even if our online experience gets closer and closer to daylight conversation, poor machines will keep shouting names at each other across the dark of the Web for a long time.



My conversation with good old 老子 is a neverending story, and I had to revisit him with the untranslatables paradigm in mind. I discovered long ago the extreme difficulty of translating Chinese characters, singularly in ancient writings, through the excellent introduction I already mentioned here some years ago, "L'Idiot chinois" by Kyril Ryjik. This book had sold out long ago, my copy was lost in a former life, and fortunately, a few years ago, I stumbled on some obscure blog upon a PDF copy I was preciously keeping safe ... but I can now forget about all that. After thirty years of dark ages, L'Idiot chinois has been republished, and this new edition should land on my bookshelves any day now ...
The infamous and cryptic first chapter of the 道德經 would easily make the short list in any challenge for the best untranslatables ever. It is the example Ryjik presents, precisely because it is both too well known and too often translated, and certainly deeply misunderstood by most Western translators.
Here goes the first part, which, even if you don't read Chinese, will strike you by the rhythm and sheer graphical refinement of its 24 characters. Note that the character 名 (míng, "name") is repeated five times, a hint that this story is mainly about names and naming.

道可道，非常道。名可名，非常名。無名，天地之始；有名，萬物之母。

Ryjik holds that all but a few Western translations and interpretations project a transcendental interpretation of 道 which does not make sense in the historical/political/cultural context where this text was produced. This is still the case for many available translations, in which the Dao has too much the look and feel of our Western monotheist God. If nothing else, the initial caps everywhere are suspicious: there is no upper case in Chinese. 道 should certainly be taken with a more mundane meaning: the way the world is going, and that human beings should try to follow, individually and collectively, in order to live in harmony with the general flow. Only physics, no metaphysics.
With this in mind, Ryjik posits that the negative 非 in the first sentence should certainly be read as a determinant of 常 (constant, unchanging, regular, in one word: steady), rather than of the whole group 常道.
In other words, where most translators read 非(常道), not (steady way), one should rather read (非常)道, (not steady) way. This makes the whole sentence read something like: (a) way really way is not a steady way. In other words: if you want to conform your way to the way (of the world at large), you have to adapt and change (as the world does). In the historical context, Ryjik holds that this is a moral and political recommendation not to stick to a rigid application of ancient rules while the situation is ever-changing. But this is a general consideration, just put there to introduce the main point of the story: the role of names.
Reading 名可名,非常名 in the same spirit yields: name really name is not a steady name. Since things, as the world flows, are ever-changing, the names you give to things are also bound to change to keep their accuracy. And in this spirit I just changed the title of this blog ...
As for the following two sentences, which seem more mysterious, I've not been fully convinced by any translation so far, not even the one by Ryjik. I was pushed towards proposing my own translation by a beautiful edition entitled "La Danse de l'Encre", illustrated by Lassaâd Metoui, a Tunisian calligrapher. Thomas Golsenne writes in the introduction (in French, my translation):
"To read the Tao Te King against the grain, out of context, is not only a right granted to the reader, it's a sort of duty ... Understanding or translating [it] 'faithfully' does not make any sense, because there is nothing to be faithful to, nothing but emptiness"
So be it, here goes my own unfaithful version of the two following sentences

無名天地之始  : there is no name at the origin of the universe
有名萬物之母  : having a name is the mother of all things

Which I read: the world as a whole, 天地 (sky and earth), exists before and beyond any name, and does not need any name to exist; but with names comes the separation into things, this and not-this, one, two and the ten thousand beings, as said further on in chapter 42: 道生一，一生二，二生三，三生萬物. Dao is father of one, one is father of two, two is father of three, three is father of the multitude of beings.
I'm not sure we need any subject other than 無名 and 有名 in those two sentences, a subject which would implicitly be 道, as most translations have it, like "Without name the Dao is the origin of the Universe" etc. ... here come the Holy Ghost, the Logos and the heavy monotheist capitalization. But the dao has nothing to do with the Holy Ghost. There is no metaphysics in the dao, only physics.
This is actually somewhat akin to the (too noisy) recent thesis of Markus Gabriel, "Warum es die Welt nicht gibt" (Why the World Does Not Exist). Things exist insofar as they are named, but the world cannot be named as a separate entity, because there is nothing from which it could be separated.

Amazingly enough, there is no entry for name in the Dictionary of Untranslatables. Not even a small entry in the index. This is certainly food for thought to expand in a future post.


Data Patterns, continued

A follow-up to the previous post, still trying to make sense of this pack of untranslatables: pattern vs schema vs structure vs model, and in particular how to draw the fine line between their descriptive and prescriptive aspects ... without spamming the DC-Architecture list any further with this discussion with +Holger Knublauch, which has somehow gone astray ...
Looking up pattern in the Wiktionary yields a lot of definitions, among others the following ones, broad enough to fit our purpose.
  • A naturally-occurring or random arrangement of shapes, colours etc. which have a regular or decorative effect. 
  • A particular sequence of events, facts etc. which can be understood, used to predict the future, or seen to have a mathematical, geometric, statistical etc. relationship. 
Further on in the same source, I discover that pattern can also be used as a verb (to pattern)
  • To make or design (anything) by, from, or after, something that serves as a pattern; to copy; to model; to imitate.

To discover, recognize, classify and name patterns in the world is a basic activity of our brain, and the very basis of our knowledge. Are those patterns emerging in our brains and projected onto reality? Or does the world really signify something to us (in the sense of the French faire signe) with those patterns, pointing to some internal logic and maybe meaning? I will remain agnostic here on this deep question, and rather look at an example which will bring us back to the question of patterns in data.
What do we see in this image? Objects of various shapes, sizes and colors, connected by edges which are apparently not oriented. Some would call it a graph. Can you see any pattern? A casual look might miss it: those shapes, colors and sizes seem rather random, their distribution is not really regular, although there are some vertical and horizontal alignments, groups of objects of the same color, and other groups of the same shape. A mix of order and randomness, like in the real world. Looking more closely, you will notice that connected objects share either a common color, or a common shape, or both (like the two red rectangles). This is what I will call a pattern.
We can now describe those objects in RDF data, using three predicates ex:shape, ex:color and ex:connected, and check whether the pattern is general.

    :blueMoon1
        ex:shape "moon" ;
        ex:color "blue" ;
        ex:connected :blueTriangle1 .

    :blueTriangle1
        ex:shape "triangle" ;
        ex:color "blue" ;
        ex:connected :blueMoon1, :blueEllipse1, :redTriangle1 .


The pattern can be checked over the above data using this query

  SELECT ?x ?y
  WHERE {
    ?x ex:shape ?xShape .
    ?x ex:color ?xColor .
    ?y ex:shape ?yShape .
    ?y ex:color ?yColor .
    ?x ex:connected ?y .
    FILTER (?xShape = ?yShape || ?xColor = ?yColor)
  }

This query should yield all connected pairs of objects in the graph. If there is a handful of exceptions out of thousands of objects, I will certainly consider this a general pattern, with some exceptions I will look at closely for further investigation. If the pattern is observed for, say, 60% of nodes, I will consider it a frequent pattern. If the result is less than 10%, I will tend to consider it a random structure rather than a pattern. All this activity is descriptive, with possible predictive purposes. I might have queried only a part of this graph, because it has billions of objects, and assume the pattern extends to the rest.
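The thresholds above can be sketched in plain code. This is a toy illustration, not part of any RDF toolchain: the graph is a handful of invented dicts standing in for the picture, and the check mirrors the FILTER clause of the query.

```python
# Toy graph: each node has a shape and a color; edges are undirected pairs.
nodes = {
    "blueMoon1":     {"shape": "moon",      "color": "blue"},
    "blueTriangle1": {"shape": "triangle",  "color": "blue"},
    "blueEllipse1":  {"shape": "ellipse",   "color": "blue"},
    "redTriangle1":  {"shape": "triangle",  "color": "red"},
    "redRect1":      {"shape": "rectangle", "color": "red"},
    "yellowStar1":   {"shape": "star",      "color": "yellow"},
}
edges = [
    ("blueMoon1", "blueTriangle1"),
    ("blueTriangle1", "blueEllipse1"),
    ("blueTriangle1", "redTriangle1"),
    ("redTriangle1", "redRect1"),
    ("redRect1", "yellowStar1"),  # deliberate exception to the pattern
]

def matches(x, y):
    """The pattern: connected objects share a shape, a color, or both."""
    return (nodes[x]["shape"] == nodes[y]["shape"]
            or nodes[x]["color"] == nodes[y]["color"])

ratio = sum(matches(x, y) for x, y in edges) / len(edges)

# Descriptive verdict, using the (arbitrary) thresholds from the text.
if ratio > 0.9:
    verdict = "general pattern (with a few exceptions)"
elif ratio >= 0.6:
    verdict = "frequent pattern"
elif ratio < 0.1:
    verdict = "random structure"
else:
    verdict = "weak pattern"

print(f"{ratio:.0%} of edges match: {verdict}")
```

Here 4 edges out of 5 match, so this toy graph would come out as a "frequent pattern"; nothing in the code decides whether the one exception is noise or a lead worth investigating.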

Can I turn this pattern into a prescriptive rule? Sure enough. If I want to create a new object connected to the yellow triangle at the bottom right, it has to be either a triangle (of any color), or yellow (of any shape), or both. But ... may I introduce new colors and new shapes, such as a yellow star or a purple triangle? In an open world, this is not forbidden by my pattern. But my closed system can be more restrictive, and limit the shapes and colors to those already known.
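The two prescriptive readings can be contrasted in a short sketch. The sets of "known" shapes and colors below are assumptions about what the picture contains; the rule itself is the same shared-shape-or-color pattern as before.

```python
# Turning the observed pattern into a prescriptive rule, in two flavours.
# Assumed inventory of shapes and colors already present in the picture.
known_shapes = {"moon", "triangle", "ellipse", "rectangle"}
known_colors = {"blue", "red", "yellow"}

def open_world_ok(new, anchor):
    """Open world: the new object only has to share a shape or a color."""
    return (new["shape"] == anchor["shape"]
            or new["color"] == anchor["color"])

def closed_world_ok(new, anchor):
    """Closed world: same rule, plus shapes and colors limited to known ones."""
    return (open_world_ok(new, anchor)
            and new["shape"] in known_shapes
            and new["color"] in known_colors)

# Connecting to the yellow triangle at the bottom right.
anchor = {"shape": "triangle", "color": "yellow"}

yellow_star = {"shape": "star", "color": "yellow"}
purple_triangle = {"shape": "triangle", "color": "purple"}

print(open_world_ok(yellow_star, anchor))        # shares the color
print(closed_world_ok(yellow_star, anchor))      # but "star" is a new shape
print(open_world_ok(purple_triangle, anchor))    # shares the shape
print(closed_world_ok(purple_triangle, anchor))  # but "purple" is a new color
```

The yellow star and the purple triangle pass the open-world rule and fail the closed-world one, which is exactly the hesitation described below.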

I'm pretty sure that people asked to extend this graph, even after discovering the underlying pattern, will wonder for a while whether or not they are allowed to introduce a yellow star or a purple triangle, because neither star nor purple appears in the current picture. It's likely that the most conformist of us will interpret the open pattern as a closed-world schema, where objects can have only the shapes and colors already present. Not to mention the size, which has not been discussed, and is not represented in the data. Imaginative people, and certainly many children, will take the open-world assumption and freely invent new shapes with new colors, maybe joyfully breaking the pattern in many places. Logicians will be stuck wondering which logic to use, and are likely to do nothing but argue with each other at length about why.

What lessons do we bring home from this example?
  • Patterns can be discovered in data, or checked over data. 
  • The same observed pattern can be turned into an open-world rule or included in a closed-world schema, and there is generally no single way to do either of those.
  • We should have a way to represent and expose patterns in data, independently of their further use. The current RDF pile of standards has nothing explicitly designed for such representations, but SPARQL would be a good basis.
  • Patterns are not necessarily linked to types or classes of objects. In our example, no rdf:type is declared in the data or used in the SPARQL query.
For those who read French, see also this post on Mondeca's blog, Leçons de Choses : Le toro bravo et le top model, dated April 2010, showing that those ruminations are not really new.