search

The mining of the public domain

“Public.work is a search engine for public domain content.” The site claims to have over 100,000 public domain images. This in and of itself is not that special, but the interface is. It’s gorgeous, a fun and engaging discovery layer where every search becomes a URL that can be shared [example] and the page of images endlessly scrolls up, down, and even sideways. Of course, the endless scroll is a bit of a fiction because many niche searches have few results and thus you see images repeating almost immediately. As someone who has seen a lot of repositories of public domain images come and go, I realized I’ve become something of an expert in them. Here are some of my thoughts.

on casual research and a 2012 wrap-up

Posted on 10Jan13 (updated 28Nov24) by jessamyn

My year-end 2012 was pretty mellow. I’ve been doing the same technology instruction and teaching at the vocational high school and the occasional local library fill-in shift. I’ve gotten more active in VLA and in the new Rural Librarians Unite group. I had a very busy April-June speaking season which I enjoyed and didn’t do any solo talks after June. I’m upping my rates for 2013 which may seem counterintuitive. I’d like to continue to do public speaking but do fewer events (or more local events that I do for free or cheap) for the same general income. The end of the year was a quiet time to reflect on the value of the work that I do and the work that others do in getting the word out about library technology and technology culture. And there were many people having discussions about the value of libraries, and whether we (or the media) are even asking the right questions. I read these posts with interest.

Do We Still Need Libraries in the NYTimes Room for Debate section with many good responses
Do We Still Need Libraries by John Palfrey
2012: Libraries Not Dead Yet – a wrap-up by Eric Hellman who I was lucky enough to meet this year at In Re: Books

A lot of questions at the end of 2012 and we’re working towards answers. I have a more hopeful feeling at this year end than I’ve had in a while. One of the things I’ve been doing a lot of these past few months is online research types of things. I was elected a local Justice of the Peace and started a “What is a Justice of the Peace” type of blog called For Great Justice and I posted daily from the time I got elected until January first. Turns out that this JP business isn’t that fascinating and so I had to dig deep into archives and/or special collections to find stuff that was notable and would interest me as well as modern-day Tumblr readers. And it was difficult, really difficult.

It sounds funny, but if there was ever a time that I was wishing for a Digital Public Library of America, these past few months have been it. Not so much because of all the other good reasons but because I would love some standardization of query languages, results formatting, rights statements and just general user experience when I am trying to find something in an online archive. I am aware that asking people to just do things differently does not work and is a crazy thing to request. I am not asking for that. But I am aware, more than usual, that leadership is needed if we want to make the United States’ cultural content accessible in some sort of aggregated fashion.


– search for the bound phrase “justice of the peace” (the individual words return too many non-relevant results)
– return results in a way that allows me to sort by relevance or other options
– at a speed where I could browse results and easily check out 10-15 results in 10-15 minutes (or more quickly, optimally)
– in a way that let me know the format of the items in my search results (jpg, pdf, text) and optimally limit by those formats
– in a way that I could know if the item I had searched for was available to be viewed or not
– with sufficient help files that if these things were possible, I could determine it on my own

These things were things I could almost never do. I wound up doing more searching in places like Google Books and Flickr Commons than in library archives even though the library archives often had more relevant content simply because I had limited time and a limited frustration level and I had to make some choice. I am a power searcher. If this is what I am doing, knowing it’s sub-optimal, what are our less power-searcher users doing?
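As a rough illustration of the wishlist above, here is a minimal Python sketch of bound-phrase search over a handful of records, with an optional format limit, an availability flag, and results sorted by a crude relevance score. Everything in it is an assumption made up for the example: the `Item` record, its field names, the `phrase_search` function, and the sample records do not come from any real archive's API, and counting phrase occurrences is only a stand-in for real relevance ranking.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """A hypothetical archive record; the field names are illustrative only."""
    title: str
    text: str
    fmt: str        # e.g. "jpg", "pdf", "text"
    viewable: bool  # whether the item can actually be opened online


def phrase_search(items, phrase, fmt=None, viewable_only=False):
    """Return items containing the exact phrase, most relevant first.

    Relevance here is just the number of times the phrase occurs,
    a stand-in for whatever ranking a real archive would use.
    """
    needle = phrase.lower()
    hits = []
    for item in items:
        # Optional limits: by format, and by whether the item is viewable.
        if fmt is not None and item.fmt != fmt:
            continue
        if viewable_only and not item.viewable:
            continue
        haystack = f"{item.title} {item.text}".lower()
        count = haystack.count(needle)
        if count:
            hits.append((count, item))
    # Highest phrase count first.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in hits]


# Made-up sample records for the example.
records = [
    Item("Town report, 1911",
         "The justice of the peace heard two cases. "
         "A second justice of the peace was sworn in.", "pdf", True),
    Item("Peace rally photo",
         "A rally for peace and justice on the green.", "jpg", True),
    Item("Court ledger",
         "Fees paid to the justice of the peace.", "pdf", False),
]

for item in phrase_search(records, "justice of the peace", fmt="pdf"):
    status = "viewable" if item.viewable else "not viewable online"
    print(f"{item.title} [{item.fmt}, {status}]")
```

The rally photo contains the words "peace" and "justice" but not the bound phrase, so it never shows up. That is the difference between phrase search and searching on the individual words.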

So I’m back to wood shedding, reading and learning more about the digital divide and about how people learn technology and bringing forward my experiences with searching and not-finding to see if I can make something out of the experience that is helpful to other people. I wish everyone peace and joy in this bright new year.

you can’t be neutral on a moving search – skepticism about search neutrality

Posted on 18Jan11 by jessamyn

My inbox is full of little library links and it’s a snow day so I’m settling down to read some longer pieces that I’ve felt that I haven’t had time for. James Grimmelmann is a friend and one of the more readable writers talking about technology and law and the muddy areas where they overlap. He’s written a nice essay on search engine neutrality. What it is, why you might care, who is working on it and how attainable a goal it may or may not be. Specifically, what does it really mean to be neutral, and who decides and who legislates? Quite relevant to all information seeking and finding professionals.

Good reading for a snowy weekday: Some Skepticism About Search Neutrality.

Search neutrality gets one thing very right: Search is about user autonomy. A good search engine is more exquisitely sensitive to a user’s interests than any other communications technology. Search helps her find whatever she wants, whatever she needs to live a self-directed life. It turns passive media recipients into active seekers and participants. If search did not exist, then for the sake of human freedom it would be necessary to invent it. Search neutrality properly seeks to make sure that search is living up to its liberating potential.

Having asked the right question – are structural forces thwarting search’s ability to promote user autonomy? – search neutrality advocates give answers concerned with protecting websites rather than users. With disturbing frequency, though, websites are not users’ friends. Sometimes they are, but often, the websites want visitors, and will be willing to do what it takes to grab them.

why search, and search engine law, matters

Posted on 15Jul08 by jessamyn

My friend, lawyer and law professor James Grimmelmann, has written a short, interesting article called The Google Dilemma about why people should care very much about how search engines work and what regulations and laws guide them. Using a few examples which may be familiar to many librarians, he makes a great case for why corporate policy at Google matters and why people should understand how Google works generally.

If the Internet is a gigantic library, and search engines are its card catalog, then Google has let the Chinese government throw out the cards corresponding to books it doesn’t like. There may be sites with full and honest discussion of the June 4, 1989 crackdown accessible on the Internet from China. But when those sites aren’t visible in search engines, we’re back to our field full of haystacks.

when is a search engine not a search engine?

Posted on 27Jan06 by jessamyn

Is it okay to remove sites from search results in response to lawsuits? Check out this search and make sure you read the disclaimer at the bottom. Then read about Google agreeing to censor their results in China, begging the question “Are censored results better than none at all?” Gmail and Blogger will also not be available to Chinese users of Google. As a quickie example, you can see the results for Tiananmen Square searches: US Google, Chinese Google, Chinese Google search using Chinese characters. The Chinese searches have the disclaimer “据当地法律法规和政策，部分搜索结果未予显示” or “In accordance with local laws, regulations and policies, part of these search results are not displayed.” This is all in addition to other blocking strategies, commonly referred to as The Great Firewall of China. However in this case Google.cn doesn’t just block searches for keywords, it blocks selectively sometimes without saying that it’s doing so. Slightly more explanation and intrigue over at Search Engine Watch, Google Blogoscoped and Google’s own official blog.

Why does this matter to librarians? Well, it’s obvious how it matters to librarians in China. It also calls into question the very idea of objectivity in search engines everywhere. As Google spends more time and effort currying favor with librarians trying to show how simpatico they are, this move is a departure from expanding access. People who search Google.cn for topics like Tibet or Falun Gong (or possibly even other less innocuous topics) won’t just find an absence of results, they’ll find results that are skewed towards the Chinese government’s policies about those topics. That’s wrong. Pundits argue that this is a sensible move for Google from a business perspective, and I won’t debate that, but it does serve to starkly highlight the differences in saying “free access to information” if you’re a for-profit shareholder-owned company. Any librarian who has had to grapple with a filter with an unknown blacklist will be familiar with the struggles that people on the non-filtered side of Google are going through trying to figure out just what is happening. [metafilter]