Excerpt: THE TROUBLE WITH GOOGLE
A mess can be rewarding as the object of random browsing, as for example
at a flea market, through a stack of Sunday newspaper flyers, among the
boxes of a departed loved one's mementos and documents, or at a cafe
well-situated for people-watching on a busy street. The Web is a
giant mess, too. Aside from a few notable exceptions, such as Web
censorship in China and the prosecution of Web gambling promoters and
pornographers in the U.S., anyone can put up any sort of Website they
want. There is no inherent unifying or overarching structure to
the aggregate content, other than the technical rules followed in
making the content displayable and the Website reachable through an
address.
But the sort of roam-and-scan browsing that
works well in a flea market isn't really an option with the Web.
There's no easily accessible list of all Websites to browse through,
and even if there were it's hard to imagine how a list of more than
eight billion Web pages could be arranged to allow your eye to be
caught by the pages likely to hold interest. Eight billion, if
you're wondering, is the number of Web pages Google claims to
"index"--that is, sort through to allow searching for text. By
enabling you to hunt down Websites containing certain words, Google and
other search engines impose a form of order on the Web that makes
fantastically useful what would otherwise be paralyzingly cluttered and
inaccessible.
Google's impact is moving beyond Web text
searches as it throws itself into shopping, maps, books and
images. It could even spill into the physical world, as the
growing availability of tiny, dirt-cheap chips with built-in radio
transmitters is expected to create an opportunity for Google to let
people call up the exact location of their car keys or their
children. Imagine being able to simply toss things into your
attic without any concern for any form of organization, and then going
to Google when you needed one of the items and calling up its position
within the attic. In short, Google is a wonderful way to get your
hands on a specific piece of information out of a vast trove that no
one has to waste time putting in order.
Googling has become such a routine,
comfortable and seemingly effective part of everyday life that it's
easy to overlook its drawbacks. One of them is what Bret
Rappaport calls "the shade tree problem." Rappaport, as you may
recall, is the natural-landscape-championing lawyer from Chapter 2; he
tends to put things in floratic terms. Imagine, he says, a
paralegal in a law firm asked to research case law relating to a Texas
client's ire with a neighbor whose tree has grown to overhang the
client's lawn, preventing part of the lawn from getting enough sun to
survive. The paralegal would likely run to "Lexis"--the legal
world's version of Google--and enter in the keywords "tree," "lawn,"
"neighbor" and "shade." A few cases pop up, and are dutifully
handed over, wrapping up the chore in five minutes. But 30 years
ago, says Rappaport, the paralegal would have hit the Texas law books,
running his or her finger over topic listings and indexes, perhaps
intending to look up "trees," but noticing there are also sublistings
for treehouses, oaks and bushes.  As the paralegal leafs through the book to
check out some of the indicated cases, other cases leap out as
interesting and possibly relevant.  Perhaps it takes half an hour,
but in the end the paralegal uncovers what turns out to be the most
useful case in the books, one in which a vine invaded a neighbor's
swimming pool, and in which the words "tree" and "lawn" never
appear. What's more, this more prolonged and varied hunt has
imbued the paralegal with a bit of perspective and even expertise in
the subject that could come in handy in this case or another one.
Over time and many such hunts, the expertise will extend to a range of
topics.  In other words, the very imprecision and inefficiencies
of the conventional search process, compared to Googling, provide better
results and a measure of enrichment, if at a cost in time.
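The shade tree problem can be made concrete with a small sketch.  The Python below builds a toy word index over three case summaries and then runs the paralegal's four-keyword search against it.  Everything in it is an assumption made for illustration: the case summaries are invented, the names build_index and exact_match are hypothetical, and real services such as Lexis or Google are far more elaborate than this.

```python
from collections import defaultdict

# Hypothetical case summaries, invented purely for illustration.
CASES = {
    "case_a": "neighbor tree casts shade over lawn and grass dies",
    "case_b": "oak tree shade kills lawn of adjoining neighbor",
    "case_c": "vine from adjoining property invades swimming pool",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def exact_match(index, keywords):
    """Return ids of documents that contain every keyword (an AND search)."""
    sets = [index.get(word.lower(), set()) for word in keywords]
    return set.intersection(*sets) if sets else set()

index = build_index(CASES)
hits = exact_match(index, ["tree", "lawn", "neighbor", "shade"])
print(sorted(hits))  # the vine case is not among the hits
```

The search returns the two tree cases and nothing else.  The vine case, which shares none of the four keywords, can only be found by someone who happens to browse past it.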
A number of search engines have tried to get
around the flaws in Google-style exact-match searching.
Clusty.com is one of many that use "clustering," a technique that
returns not just Web pages that match your exact queries, but also
other search categories that might be fruitful. Thus a search on
"hot dog," for example, yields a suggestion to look at Web pages
related to "sausages," "eating contests," and "recipes"--the sorts of
categories that you might run across if you were browsing through a
directory for "hot dog." A company called Fast offers different
search engines--some based on technology developed to help the U.S.
military hunt down information useful in combating terrorism--that can
derive a certain amount of meaning and context from the text they
scan.  That might allow them, for example, to distinguish sites
that label certain movies as "bombs" from sites that offer recipes for
homemade bombs, and to throw in sites that include the word "explosive"
but don't mention the word "bomb." A firm called Cymfony provides
a search service that scours blogs as well as the Web to track down
information with a particular subjective content--for example, in
searching out references to a company, it can distinguish those that
praise the company from those that flame it.
There are also innumerable search engines that
specialize in a subset of the Web, or in searching specific non-Web
databases, and these engines often prompt searchers to dig deeper by
enlisting specific criteria. A prominent example is Amazon.com's
book-searching engine, which allows one to search its catalog with an
eye for books that offer an unusually large number of words per pound,
among many other odd but conceivably useful categories. A number
of companies are working to make video searchable. IBM, for one,
struck a deal with the NFL in 2005 to do so with all the NFL's game
films, allowing a coach to call up in five seconds, for example, all of
an upcoming opponent's third-down plays with four yards to go for a
first down--the sort of task that had previously taken six hours.
And then there's the search engine BananaSlug, which adds a random
search term, such as "coral," or "justice," to your specified search
terms. Strange as it sounds, you simply have to try it to
appreciate its utility.
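The idea of adding a random term is simple enough to sketch in a few lines of Python.  This is only an illustration under stated assumptions: how BananaSlug actually chooses its words is not described here, the word list is made up apart from "coral" and "justice," and the function name add_random_term is hypothetical.

```python
import random

# Hypothetical word list; only "coral" and "justice" come from the text above.
RANDOM_TERMS = ["coral", "justice", "lantern", "orbit"]

def add_random_term(query, terms=RANDOM_TERMS, rng=random):
    """Return the query with one randomly chosen extra term appended."""
    return f"{query} {rng.choice(terms)}"

# The expanded query would then be handed to an ordinary search engine.
print(add_random_term("hot dog"))
```

Each run may append a different word, so the same starting query can lead to a different corner of the Web every time.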
Some of these techniques work by trying to
impose more order on the search process than Googling does, and some
less. One manager at the search company Fast routinely draws on a
portfolio of 30 different search engines to track down
information. In the end, there probably will never be one search
technique, or one particular level of order, that will always prove
appropriate or sufficient for dealing with a giant mess, and especially
one as gigantic as the Web. And somehow, that seems fitting.