Sunday, December 18, 2016

Transparency of judgment

The Guardian, and others, have discovered that when querying Google for "did the Holocaust really happen", the top response is a Holocaust denier site. They mistakenly think that the solution is to lower the ranking of that site.

The real solution, however, is different. It begins with the very concept of the "top site" from searches. What does "top site" really mean? It means something like "the site most often pointed to by other sites that are most often pointed to." It means "popular" -- but by an unexamined measure. Google's algorithm doesn't distinguish fact from fiction, or scientific from nutty, or even academically viable from warm and fuzzy. Fan sites compete with the list of publications of a Nobel prize-winning physicist. Well, except that they probably don't, because it would be odd for the same search terms to pull up both, but nothing in the ranking itself makes that distinction.

The primary problem with Google's result, however, is that it hides the relationships that the algorithm itself uses in the ranking. You get something ranked #1 but you have no idea how Google arrived at that ranking; that's a trade secret. By not giving the user any information on what lies behind the ranking of that specific page you eliminate the user's possibility to make an informed judgment about the source. This informed judgment is not only about the inherent quality of the information in the ranked site, but also about its position in the complex social interactions surrounding knowledge creation itself.

This is true not only for Holocaust denial but every single site on the web. It is also true for every document that is on library shelves or servers. It is not sufficient to look at any cultural artifact as an isolated case, because there are no isolated cases. It is all about context, and the threads of history and thought that surround the thoughts presented in the document.

There is an interesting project of the Wikimedia Foundation called "Wikicite." The goal of that project is to make sure that specific facts culled from Wikipedia into the Wikidata project all have citations that support the facts. If you've done any work on Wikipedia you know that all statements of fact in all articles must come from reliable third-party sources. These citations allow one to discover the background for the information in Wikipedia, and to use that to decide for oneself if the information in the article is reliable, and also to know what points of view are represented. A map of the data that leads to a web site's ranking on Google would serve a similar function.

Another interesting project is CITO, the Citation Typing Ontology. This is aimed at scholarly works, and it is a vocabulary that would allow authors to do more than just cite a work - they could give a more specific meaning to the citation, such as "disputes", "extends", "gives support to". A citation index could then categorize citations so that you could see who are the deniers of the deniers as well as the supporters, rather than just counting citations. This brings us a small step, but a step, closer to a knowledge map.

All judgments of importance or even relative position of information sources must be transparent. Anything else denies the value of careful thinking about our world. Google counts pages and pretends not to be passing judgment on information, but they operate under a false flag of neutrality that protects their bottom line. The rest of us need to do better.

No comments: