Encyclopédie Renvois Search/Linker

During the summer (2009), a user (UofC PhD, tenured elsewhere) wrote to ask if there was any way to search the Encyclopédie and "generate a list of all articles that cross-reference a given article". We went back and forth a bit, and I slapped a little toy together and let him play with it, to which his reply was "Oh, this is cool! Five minutes of playing with the search engine and I can tell you it shows fun stuff...". This is, of course, an excellent suggestion, and one we have talked about in the past, usually in the context of visualizing relationships among articles in various ways. At the highest level, visualizing the relationships of the renvois is what Gilles and I attempted to do in our general "cartography paper"[1] and what, more recently, Robert and Glenn (et al.) tried to do, in a radically different way, in their work on "centroids"[2].

The current implementation of the Encyclopédie under PhiloLogic will allow users to follow renvois links (within operational limits to be outlined below), but does not support searching and navigating the renvois in any kind of systematic fashion. Since this is something I think warrants further consideration, I thought it might be helpful to document this toy, give some examples, let folks play with it, outline some of the current issues, and conclude with some ideas about what might be done going forward.

To construct this toy, I wrote a recognizer to extract metadata for each article in the Encyclopédie that has one or more renvois. As part of the original development of the Encyclopédie, each cross-reference was automatically detected from certain typographic and lexical clues. This resulted in roughly 61,000 cross-references, so the extracted database has roughly 61,000 records, one per cross-reference. I loaded these into a simple MySQL database and used a standard script to support searching and reporting. The search parameters may include article headwords, authors, normalized and English classes of knowledge, as well as the term(s) being cross-referenced. For example, there are 39 cross-referenced article pairs for the headword estomac. As you can see from the output, I'm listing the headword, author, classes of knowledge, and the cross-referenced term. From each result you can get either the article of the cross-referenced term or the cross-references in that article. Thus, the second example shows the link to Digestion:

ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Digestion || renvois
[The renvois of Digestion yield 56 article pairs, including one to Intestins]
DIGESTION (Venel: Economie animale, Animal economy ) ==> Intestins || renvois
Intestins (unknown: Anatomie, Anatomy ) ==> Chyle || renvois


and so on ==> lymphe ==> sang ==> ad nauseam. No, there is no article "ad nauseam"; that is just how you might feel after going round and round.
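The kind of lookup described above can be sketched in a few lines. This is only an illustration, not the actual script: it swaps in Python's built-in sqlite3 for MySQL, and the table name, column names, and function names are all assumptions made up for the example. The three rows are the ones shown in the listing above.

```python
import sqlite3

# Hypothetical schema: one row per cross-reference (renvoi).
SCHEMA = """
CREATE TABLE renvois (
    headword TEXT,  -- headword of the article containing the renvoi
    author   TEXT,
    class_fr TEXT,  -- normalized class of knowledge
    class_en TEXT,  -- English class of knowledge
    target   TEXT   -- the term being cross-referenced
)
"""

# Sample rows copied from the listing in this post.
ROWS = [
    ("ESTOMAC, ventriculus", "Tarin", "Anatomie", "Anatomy", "Digestion"),
    ("DIGESTION", "Venel", "Economie animale", "Animal economy", "Intestins"),
    ("Intestins", "unknown", "Anatomie", "Anatomy", "Chyle"),
]


def build_db(rows=ROWS):
    """Create an in-memory database holding the sample renvois."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO renvois VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def referring_articles(conn, term):
    """Rows for every article whose renvoi term contains `term`."""
    cur = conn.execute(
        "SELECT headword, author, class_en, target FROM renvois "
        "WHERE lower(target) LIKE ? ORDER BY headword",
        (f"%{term.lower()}%",),
    )
    return cur.fetchall()


def renvois_of(conn, headword):
    """Cross-referenced terms found in articles matching `headword`."""
    cur = conn.execute(
        "SELECT target FROM renvois WHERE lower(headword) LIKE ?",
        (f"%{headword.lower()}%",),
    )
    return [row[0] for row in cur.fetchall()]
```

Calling `referring_articles(build_db(), "digestion")` answers the original question (which articles cross-reference a given article), while `renvois_of` goes the other way. The sketch does not escape `%` or `_` in the search term, which a real form handler would need to do.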

Now, there are problems, but please go ahead and play with this now using the submit form, as long as you promise to come back, read through the rest of this, and let me know about any other problems.

Problems

As noted above, the renvois were identified automatically. And as with most of these things, it worked reasonably well. But you will see link errors and other things which indicate problems. Glenn reported these to me and I was going to eliminate them. On second thought, this little toy lets us consider the renvois rather more systematically. Where you see a link error, it is (probably) a recognizer error: the recognizer either failed to get a string to link or got confused by some typography. The linking mechanism itself is based on string searches. In other words, whenever you click on a renvoi, you are in fact performing a search on the headwords. This simple heuristic works reasonably well, returning string-matched headwords. In some cases, you get nothing because there is no headword that contains the renvoi word(s), and at other times you will get quite a list of articles, which may or may not include what the authors/editors intended. It is, of course, well known that many renvois simply don't correspond to an article and many others differ in various ways from the article headwords. I am also applying a few rules to renvois searching to try to improve recall and reduce noise, which adds another level of indirection.
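To make the string-search linking concrete, here is a rough sketch of how a renvoi term might be matched against headwords, and how the outcome could be used to flag suspect links. The normalization (lowercasing and stripping accents) is an assumption chosen for illustration; it is not the set of rules the toy actually applies, and the function names are hypothetical.

```python
import unicodedata


def normalize(text):
    """Lowercase and strip accents (an assumed normalization rule)."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


def resolve_renvoi(term, headwords):
    """Return every headword that contains the renvoi term as a substring."""
    key = normalize(term)
    if not key:
        return []
    return [h for h in headwords if key in normalize(h)]


def classify(term, headwords):
    """Label a renvoi by how many headwords it matches."""
    matches = resolve_renvoi(term, headwords)
    if not matches:
        return "none"       # no such headword, or a recognizer error
    if len(matches) == 1:
        return "unique"     # probably the intended article
    return "ambiguous"      # a list that may or may not hold the target
```

Running `classify` over every extracted renvoi would give a first rough count of how many links resolve to nothing, to one article, or to a whole list, which is one way to look for recognizer errors systematically.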

Now, ideally, one would go through the entire database, examine each renvoi, and build a direct link to the one article that the authors/editors intended. But we're talking 60,000+ renvois against 72,000 (or so) articles, and it is not clear that humans could resolve this in many instances. When Gilles and I worked on this, we used a series of (long forgotten) heuristics to filter out noise and errors. So, this simple toy works within operational limits and gives us a way to more systematically identify possible errors and ways to improve it.

Future Work

Aside from being a quick and dirty way to get some notion of errors in the renvois, this toy might be made more presentable. Please feel free to play with it and suggest ways to think about this. In the long haul, I would love a totally cool visualization: a clickable directed graph, so you could click on a node and re-center the graph on another article, class of knowledge, or author. Maybe something like Tricot's representation of the classes of knowledge. Or maybe something like DocuBurst. Marti Hearst's chapter on visualizing text analysis is a treasure trove of great ideas.

For the immediate term, I would like to recast this simple model to allow the user to specify the number of steps, that is, the number of iterations to follow, so you would get something like:

ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Digestion || renvois
DIGESTION (Venel: Economie animale, Animal economy ) ==> Intestins || renvois
Intestins (unknown: Anatomie, Anatomy ) ==> Viscere || renvois
ESTOMAC, ventriculus (Tarin: Anatomie, Anatomy ) ==> Chyle || renvois
CHYLE (Tarin: Anatomie | Physiologie, Anatomy. Physiology ) ==> Sanguification || renvois
SANGUIFICATION (unknown: Physiologie, Physiology ) ==> Respiration || renvois
RESPIRATION (unknown: Anatomie | Physiologie, Anatomy | Physiology ) ==> Air || renvois


This would follow chains of renvois until you either run out or hit an iteration limit. I will try to follow this up with the multi-iteration model and also see if I can recover some of what Liz tried to do using GraphViz to generate clickable directed graphs.
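A minimal sketch of the multi-step idea, assuming the renvois have already been resolved into a simple mapping from article to cross-referenced articles (the `LINKS` dictionary below is hand-built from the listing above, and all names are hypothetical). It walks breadth-first, so edges come out level by level rather than chain by chain, and it remembers visited articles so that circular renvois cannot loop forever. A small helper also writes the edges out as GraphViz DOT text.

```python
from collections import deque

# Hand-built from the example listing; keys are simplified headwords.
LINKS = {
    "estomac": ["digestion", "chyle"],
    "digestion": ["intestins"],
    "intestins": ["viscere"],
    "chyle": ["sanguification"],
    "sanguification": ["respiration"],
    "respiration": ["air"],
}


def follow_renvois(links, start, max_steps):
    """Collect (article, target) edges within `max_steps` hops of `start`."""
    edges = []
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        article, depth = queue.popleft()
        if depth >= max_steps:
            continue  # iteration limit reached on this branch
        for target in links.get(article, []):
            edges.append((article, target))
            if target not in seen:  # do not expand an article twice
                seen.add(target)
                queue.append((target, depth + 1))
    return edges


def to_dot(edges):
    """Render edges as DOT text (names containing quotes would need escaping)."""
    lines = ["digraph renvois {"]
    for source, target in edges:
        lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

With `max_steps=1` this returns only the two renvois of estomac; raising the limit pulls in digestion and chyle, then their renvois, and so on until the chains run out. The DOT text from `to_dot` can be fed to GraphViz's `dot` command to draw the graph; making the nodes clickable would take additional node attributes not shown here.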

References

[1] Gilles Blanchard et Mark Olsen, « Le système de renvoi dans l’Encyclopédie: Une cartographie des structures de connaissances au XVIIIe siècle », Recherches sur Diderot et sur l'Encyclopédie, numéro 31-32 L'Encyclopédie en ses nouveaux atours électroniques: vices et vertus du virtuel, (2002) [En ligne], mis en ligne le 16 mars 2008.

[2] Charles Cooney, Russell Horton, Robert Morrissey, Mark Olsen, Glenn Roe, and Robert Voyer, "Re-engineering the tree of knowledge: Vector space analysis and centroid-based clustering in the Encyclopédie", Digital Humanities 2008, University of Oulu, Oulu, Finland, June 25-29, 2008