Why sourcing photos matters – how misattribution is amplified on the web

I wrote an article for Computers in Libraries last week about the PicPedant account on Twitter and the odd preponderance/problem of unsourced images flying around the internet. This is just a true thing about how the internet works, and people have been misattributing things since forever. However, there’s a new wrinkle: the combination of popular blogs/Twitter accounts and some of the “secret sauce” aspects of how Google works can actually amplify misinformation more than you might expect. Here’s my example.

Hans Langseth

This man is Hans Langseth. I know this because I was a kid who read the Guinness Book of World Records a lot and I recognized him from other pictures. He had the longest beard in the world. The image on the right is a clever photoshop. However, if you Google Image search Hans Steininger, you will also find many versions of this photo. This is curious because Hans Steininger (another hirsute gentleman) died in 1567, pre-photography. His beard was also about four feet long, whereas Langseth’s beard was more like 18+ feet long.

What happened? Many websites have written little lulzy clickbait articles about Steininger and how he supposedly, ironically, died tripping over his own beard. They source other articles that themselves source actual articles at reputable-ish places like Time magazine, which are inaccessible because of paywalls. They all link to the image of Langseth and don’t really mention that the guy in the photograph is a different guy. The image and the name get hand-wavily semantically linked, and search engines can’t really do a reality check and say “Hey, we use this image for a different guy” or “Hey, we can’t have a photograph of this guy because he lived in the 1500s.”
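The second reality check is simple enough to write down. Here is a minimal sketch in Python, not anything a search engine actually runs: the function name and the cutoff are my own assumptions. I am using 1826 as a rough date for the earliest surviving photograph, and Langseth’s death year (1927) comes from my memory of the record books, not from the articles above.

```python
# Rough year of the earliest surviving photograph. This is an assumed
# cutoff for illustration, not a precise historical claim.
EARLIEST_PHOTO_YEAR = 1826


def could_be_photographed(death_year: int) -> bool:
    """Return True if someone who died in death_year could appear in a photograph."""
    return death_year >= EARLIEST_PHOTO_YEAR


# Steininger's death year is from the post; Langseth's is an assumption.
people = {"Hans Steininger": 1567, "Hans Langseth": 1927}

for name, death_year in people.items():
    if could_be_photographed(death_year):
        print(f"{name}: a photograph is plausible")
    else:
        print(f"{name}: died before photography, so check the image")
```

A check like this only catches the obvious anachronism. It says nothing about whether a plausible photo is actually of the right person, which is where the human eyeballs come in.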

Google results for Hans Steininger

Not a huge deal, the world isn’t ending, and I don’t think the heirs of Langseth are up in arms about this. However, as more and more people just presume that the search engine and the “hive mind” approach to this sort of thing produce the correct answer, it’s good to have handy counterexamples to explain why we still need human eyeballs even as “everything” is on the web.

the tools and the hammer/nail problem in the digital divide

“The way you talk about the [digital divide] changes people’s view of who is responsible for resolving it…. This issue has been around for years, but its meaning is in constant flux and is manipulated by political agendas.”

I’ve switched some of the tools I use for keeping current over the past few months. I’m finding that I use RSS less and less for keeping up on blogs and rely more on Twitter lists and searches to sort of keep my hand in. I also still read a lot of print material [some of my best “things to think about” things are still coming from the pages of Library Journal and Computers in Libraries magazines] and am trying to keep to my book-a-week plan for 2011. Oddly, I also get news from seemingly random places like other people’s Facebook walls, and I made a little image-milkshake over on a site called MLKSHK. You might like it.

I have a standing search for “digital divide” on Twitter that just auto-updates itself onto my desktop via TweetDeck. The thing that is so interesting about this, to me, is how often the term gets used and for how many different things. This morning there are discussions about the digital divide and gender, how the EU is trying to narrow the digital divide (referring to access to broadband) and a report about how switching to online social services in the UK would adversely affect people who are digitally divided already, mostly talking about seniors.

Which leads me to the paper I read recently which was really pretty interesting and on topic: Who’s Responsible for the Digital Divide? Public Perceptions and Policy Implications (pdf). It’s not long, you can read it, but the upshot is that depending on how we define the digital divide, we will develop different strategies to “solve” the problem. This is not just hypothesized in the paper but addressed scientifically. So if the problem is lack of computers, we throw computers at the problem. If the problem is broadband, we work on network infrastructure. If the problem is education, we design sites like DigitalLiteracy.gov and then wonder why a website isn’t teaching people how to use computers. Tricky stuff, endlessly fascinating, thorny problem.

would you recognize a hardware keylogger in your library?

Brian points to this article about USB keyloggers that were found attached to computers at public libraries. If I saw one of these on a library computer, I might not even be sure what it was, or that it wasn’t part of the keyboard. Know your hardware, what to expect and what not to expect, and check out the backs of your computers from time to time.

Google Books ngrams – on Hegel and Hitler and OCR

So hey, this is interesting. I’ve skipped a lot of the Google Books ebookstore stuff lately because I’m honestly not sure what to make of it. And I don’t buy books anyhow. But a friend mentioned this Google Labs Ngram viewer, a fun tool that lets you search the full corpus of the Google Books databases. Here’s a New York Times article about it, and data geeks should read the article Quantitative Analysis of Culture Using Millions of Digitized Books (free reg. required – click for PDF ILL) or nose around in the datasets. I did my own dopey search, pictured above – Hegel vs. Hitler. And here’s what’s interesting. The big jump in the late 1940s is fairly predictable, but who was talking about Hitler in 1620?

I clicked through and poked around some and here’s what I found. No one was talking about Hitler. OCR is, as you know, imperfect. So the words that Google Books’ optical character recognition thought of as “Hitler” were actually words like “Ruler” and “bitter” and “herbe.” How about that?
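If you wanted to catch this kind of thing without clicking through every result, one rough approach is to flag any hit dated before the word could plausibly exist. This is just a sketch under my own assumptions: the function name and the sample data are hypothetical, it has nothing to do with how the Ngram viewer works internally, and the birth year (1889) is my addition.

```python
# Hitler was born in 1889 (my addition, not from the Ngram data), so any
# earlier "mention" is almost certainly an OCR misread.
EARLIEST_PLAUSIBLE_YEAR = 1889


def flag_for_review(hits, earliest_year=EARLIEST_PLAUSIBLE_YEAR):
    """Return the (year, matched_text) hits dated before earliest_year."""
    return [hit for hit in hits if hit[0] < earliest_year]


# Hypothetical sample data: 1620 is the year from the chart and "Ruler" is
# one of the misread words; the 1946 row is made up for contrast.
sample_hits = [(1620, "Ruler"), (1946, "Hitler")]
print(flag_for_review(sample_hits))
```

Someone still has to look at the flagged scans to see what the word really was, but at least the list is shorter.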

librarians’ search for neutrality a precursor to debate over Google rankings

“The idea that search engines can, or should, be neutral can be traced back to a movement of leftist librarians in the 1970s. Led by Sanford Berman, one of the first to bring social rebellion into the library, radical librarians argued that the system used to organize books was inherently biased and racist because it reflected a Western perspective.”