Politech
mailing list archives
Google's SafeSearch is overzealous, blocks innocuous domains [fs]
From: Declan McCullagh <declan () well com>
Date: Fri, 23 Apr 2004 09:22:52 -0400
http://news.com.com/2100-1032_3-5198125.html?tag=nefd.lede
Google's chastity belt too tight
Last modified: April 23, 2004, 4:00 AM PDT
By Declan McCullagh
Staff Writer, CNET News.com
PartsExpress.com proudly touts itself as the Net's No. 1 source for
audio, video and speaker components--but online shoppers who rely on an
optional feature in the Google search engine to block porn sites would
never know it.
By an accident of spelling, the domain name of the Ohio electronics
retailer includes an unfortunate string of letters, "sex," which is
enough to block the Web site from Google's filtered results.
PartsExpress.com is not alone. A CNET News.com investigation shows that
Google's SafeSearch filter technology incorrectly blocks many innocuous
Web sites based solely on strings of letters such as "sex," "girls" or
"porn" embedded in their domain names.
Google's SafeSearch flaws are more than academic--they can have serious
consequences for innocent Web site operators blocked out by them. Google
is the most widely used search engine on the Web, and failure to appear
in its listings can have a direct impact on sales for some companies,
particularly smaller enterprises with limited marketing budgets.
Research company WebSideStory reported last month that Google claimed an
all-time high in search referrals, 41 percent of the United States
total, and the search giant's market share is steadily expanding.
"Traffic from Google can make or break a business," said Maria Medina,
whose family-run clothing business at ALittleGirlsBoutique.com doesn't
pass the SafeSearch censor. "Here I am, a mom of four children, creating
an at-home business that sells little girl dresses and accessories, in
order to spend more time with my children, and I have been filtered out
as not being family friendly. Ridiculous."
Matt Cutts, the Google engineer who designed SafeSearch four years ago,
said his algorithm looks for a "relatively small" number of trigger
words in a Web page's address. If one of those words appears, the
SafeSearch algorithm puts the address on a block list and does not take
the next step of evaluating the content of the site. "We try to find the
best trade-off of precision, recall and safety," Cutts said. "People who
opt in to SafeSearch are mostly OK with us being on the conservative side."
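
To illustrate the behavior Cutts describes, here is a minimal sketch of an address-only filter. It is not Google's code: the trigger list contains only the three strings named in this article, the function names are hypothetical, and matching against the host name alone (rather than the full address) is an assumption made for simplicity.

```python
from urllib.parse import urlsplit

# Hypothetical trigger list: only the three strings the article mentions.
TRIGGER_WORDS = ("sex", "girls", "porn")


def hostname_of(address: str) -> str:
    """Return the lowercased host part of a URL or bare domain name."""
    if "//" not in address:
        # Prefixing "//" lets urlsplit parse a bare domain as a host.
        address = "//" + address
    return (urlsplit(address).hostname or "").lower()


def is_blocked(address: str) -> bool:
    """Block on any trigger substring in the host; page content is never read."""
    host = hostname_of(address)
    return any(word in host for word in TRIGGER_WORDS)


if __name__ == "__main__":
    for site in ("PartsExpress.com", "ALittleGirlsBoutique.com", "example.com"):
        print(site, "blocked" if is_blocked(site) else "allowed")
```

Because the check is a plain substring test, "partsexpress" matches on "sex" and "alittlegirlsboutique" matches on "girls", which is the kind of false positive the article reports. A filter that went on to evaluate page content would have a chance to clear such sites.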
Cutts would not disclose how many Web searches are done with SafeSearch
enabled, saying only that it's a small percentage of the millions of
queries handled by Google each day. But the sloppy filter stands out as
a rare black eye for a company that prides itself on superior search
technology and boasts on its payroll one of the world's highest
concentrations of computer science doctoral degrees. Google claims
SafeSearch "uses advanced proprietary technology that checks keywords
and phrases" and filters out only Web pages "containing pornography and
explicit sexual content."
"That's not very bright," said Karen Schneider, a librarian who runs the
Librarians' Index to the Internet and has made a study of filtering
software. SafeSearch is "certainly evocative of the very primitive
CyberSitter-type tools of the mid-1990s--not a tool of fairly
sophisticated development."
[...remainder snipped...]
_______________________________________________
Politech mailing list
Archived at http://www.politechbot.com/
Moderated by Declan McCullagh (http://www.mccullagh.org/)