Posted Wednesday, August 11, 2004 7:44:26 PM by Kim
Deep Web Searching: Now if this isn't geeky, I don't know what is. Marcus Zillman is the Executive Director of the Virtual Private Library, and as his life's work he has conducted extensive research into the part of the web known as "The Deep Web".
The Deep Web covers somewhere in the vicinity of 600 billion pages of information scattered throughout the World Wide Web in files and formats that search engines either cannot find or have difficulty accessing. By comparison, those same engines index only about 3.3 billion pages as of this writing.
At the link above you can find a huge number of references all about the Deep Web and the steps that are being taken to access it.
When you think about the amount of stuff stored on government, business, and private servers all over the world, the sheer quantity and variety of data you can retrieve, if you know how to look, is pretty mind-boggling. Add in pages behind registration gateways (like the one at Community MX) and you have many millions of documents that the bots can't find.
For instance, the GPO Access search engine lets you dig into every U.S. federal budget going back to 1997, and every document in the Congressional Record back to 1994. (Maybe I can find that tax refund I was sure I was supposed to get.)
There are way too many links on that page to go into them all, but some of my favorites are the massive databases you can search at the U.S. Library of Congress, the National Library of Canada, and the National Library of Australia, among others.
OK, I admit I find this kind of thing really interesting. Guess that lands me squarely in the geek corner.
Category tags: Using the Web