There’s a lot more to the web than meets the eye…
The Internet is a group of interconnected networks, and the visible World Wide Web is a network of web pages that link to each other through hyperlinks. Web crawlers like Googlebot traverse the World Wide Web by following those hyperlinks, indexing what they find. The result is the ability to type “deep dark web site:obamaconspiracy.org” and end up viewing this article.
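To make the crawling idea concrete, here is a minimal sketch in Python. It is not how Googlebot actually works; it just follows links breadth-first over a tiny made-up "web" held in memory. The `PAGES` dictionary, its URLs, and the `crawl` and `LinkExtractor` names are all assumptions for illustration.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(pages, start):
    """Follow hyperlinks breadth-first; return pages in the order visited."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # "index" the page
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            # Only follow links to pages that exist and are not yet seen.
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return order


# Hypothetical in-memory "web": URL -> HTML. No network access is needed.
PAGES = {
    "/home": '<a href="/about">About</a> <a href="/news">News</a>',
    "/about": '<a href="/home">Home</a>',
    "/news": '<a href="/about">About</a>',
    "/orphan": "<p>No page links here, so the crawler never finds it.</p>",
}

indexed = crawl(PAGES, "/home")
```

Note that `/orphan` exists but never shows up in `indexed`, because nothing links to it. That is the simplest way for a page to stay out of a search engine.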
For a search engine to index a page, three things must be true: the page must be linked from somewhere else, there must be a network connection between the search engine and the server (no firewall, VPN, etc.), and the content must not be protected by a password. Most search engines honor one additional provision: the web site must not carry instructions (typically a robots.txt file) asking the crawler not to index certain pages.
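That last provision can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration of how a polite crawler might check a site's instructions; the robots.txt contents, the `example.org` URLs, the `ExampleBot` name, and the `may_index` helper are all assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking every crawler to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made.
parser.parse(ROBOTS_TXT.splitlines())


def may_index(url, agent="ExampleBot"):
    """Return True if the site's instructions allow this crawler to fetch url."""
    return parser.can_fetch(agent, url)


public_ok = may_index("https://example.org/index.html")
private_ok = may_index("https://example.org/private/report.html")
```

Here `public_ok` is true and `private_ok` is false. Keep in mind that these instructions are only a request: a well-behaved crawler obeys them, but nothing in the file itself stops anyone from fetching the page.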
All that works pretty well for consumers, but a lot of the web gets left out. One estimate is that only 4% of the web is represented in Google’s collection of ~8 billion pages. For example, if you search for:
“OARPA” DARPA technology
you’ll end up with some interesting results, plus articles on this web site, but not everything that exists.
The Obots Advanced Research Projects Administration (OARPA), of which I am a part, is interested in accessing the dark web (aka “the deep web”) using the DARPA technology code-named Memex. The goals of Memex are:
- Development of the next generation of search technologies to revolutionize the discovery, organization and presentation of domain-specific content
- Creation of a new domain-specific search paradigm to discover relevant content and organize it in ways that are more immediately useful to specific tasks
- Extension of current search capabilities to the deep web and nontraditional content
- Improved interfaces for military, government and commercial enterprises to find and organize publicly available information on the Internet
That’s what they tell the “public.”