Searching on the Internet today can be compared to dragging a net across the surface of the ocean.
While a great deal may be caught in the net, there is still a wealth of information that is deep, and therefore, missed. The reason is simple: Most of the Web's information is buried far down on dynamically generated sites, and standard search engines never find it.
According to Wikipedia, The Deep Web (also called the Deepnet, the Invisible Web, the Undernet or the hidden Web) is World Wide Web content that is not part of the Surface Web, which is indexable by standard search engines.
The Deep Web is the set of information resources on the World Wide Web not reported by normal search engines.
According to several researches, the principal search engines index only a small portion of the overall web content, the remaining part is unknown to the majority of web users.
What do you think if you were told that under our feet, there is a world larger than ours and much more crowded? We will literally be shocked, and this is the reaction of those individual who can understand the existence of the Deep Web, a network of interconnected systems, are not indexed, having a size hundreds of times higher than the current web, around 500 times.
Very exhaustive is the definition provided by the founder of BrightPlanet, Mike Bergman, that compared searching on the Internet today to dragging a net across the surface of the ocean: a great deal may be caught in the net, but there is a wealth of information that is deep and therefore missed.
- Dynamic content: dynamic pages which are returned in response to a submitted query or accessed only through a form, especially if open-domain input elements (such as text fields) are used; such fields are hard to navigate without domain knowledge.
- Unlinked content: pages which are not linked to by other pages, which may prevent Web crawling programs from accessing the content. This content is referred to as pages without backlinks (or inlinks).
- Private Web: sites that require registration and login (password-protected resources).
- Contextual Web: pages with content varying for different access contexts (e.g., ranges of client IP addresses or previous navigation sequence).
- Limited access content: sites that limit access to their pages in a technical way (e.g., using the Robots Exclusion Standard, CAPTCHAs, or no-cache Pragma HTTP headers which prohibit search engines from browsing them and creating cached copies).
- Non-HTML/text content: textual content encoded in multimedia (image or video) files or specific file formats not handled by search engines.
- Text content using the Gopher protocol and files hosted on FTP that are not indexed by most search engines. Engines such as Google do not index pages outside of HTTP or HTTPS.
As we will see this interpretation of the Deep Web is deeply wrong, we are facing with a network definitely different from the usual web but in many ways repeats the same issues in a different sense.
Accessing the Deep Web
To access the deep web, We will need to make use of the Tor Network.
What is a Tor? How to preserve the anonymity?
Tor is the acronym of “The onion router”, a system implemented to enable online anonymity. Tor client software routes Internet traffic through a worldwide volunteer network of servers hiding user’s information eluding any activities of monitoring.
As usually happen, the project was born in military sector, sponsored the US Naval Research Laboratory and from 2004 to 2005 it was supported by the Electronic Frontier Foundation.
Actually the software is under development and maintenance of Tor Project. A user that navigate using Tor it’s difficult to trace ensuring his privacy because the data are encrypted multiple times passing through nodes, Tor relays, of the network.
Connecting to the Tor networkImagine a typical scenario where Alice desire to be connected with Bob using the Tor network. Let’s see step by step how it is possible.
She makes an unencrypted connection to a centralized directory server containing the addresses of Tor nodes. After receiving the address list from the directory server the Tor client software will connect to a random node (the entry node), through an encrypted connection. The entry node would make an encrypted connection to a random second node which would in turn do the same to connect to a random third Tor node. The process goes on until it involves a node (exit node) connected to the destination.
Consider that during Tor routing, in each connection, the Tor node are randomly chosen and the same node cannot be used twice in the same path.
To ensure anonymity the connections have a fixed duration. Every ten minutes to avoid statistical analysis that could compromise the user’s privacy, the client software changes the entry node
Up to now we have considered an ideal situation in which a user accesses the network only to connect to another. To further complicate the discussion, in a real scenario, the node Alice could in turn be used as a node for routing purposes with other established connections between other users.
A malevolent third party would not be able to know which connection is initiated as a user and which as node making impossible the monitoring of the communications.
After this necessary parenthesis on Tor network routing, we are ready to enter the Deep Web simply using the Tor software from the official web site of the project. Tor is able to work on all the existing platforms and many add-ons make simple they integration in existing applications, including web browsers. Despite the network has been projected to protect user’s privacy, to be really anonymous it’s suggested to go though a VPN.
A better mode to navigate inside the deep web is to use the Tails OS distribution which is bootable from any machine without leaving any trace on the host. Once the Tor Bundle is installed, it comes with its own portable Firefox version, ideal for anonymous navigation due an appropriate control of installed plugins, in the commercial version in fact common plugins could expose our identity.
Once inside the network, where is it possible to go and what is it possible to find?
Well once inside the deep web, we must understand that the navigation is quite different from ordinary web, every research is more complex due the absence of indexing of the content.
A user that start it’s navigation in the Deep Web have to know that a common way to list the content is to adopt collection of Wikis and BBS-like sites which have the main purpose to aggregate links categorizing them in more suitable groups of consulting. Another difference that user has to take in mind is that instead of classic extensions (e.g. .com, .net, .org) the domains in the Deep Web generally end with the .onion suffix.
Following a short list of links that have made famous the Deep Web published on Pastebin
Cleaned Hidden Wiki should be a also a good starting point for the first navigations
Be careful, some content are labeled with common used tag such as CP= child porn, PD is pedophile, stay far from them.
The Deep Web is considered the place where every thing is possible, you can find every kind of material and services for sale, most of them illegal. The hidden web offers to cybercrime great business opportunity, hacking services, malware, stolen credit cards, weapons.
The deep Web is estimated to be about 500 times larger than the surface Web, with, on average, about three times higher quality based on our document scoring methods on a per-document basis. On an absolute basis, total deep Web quality exceeds that of the surface Web by thousands of times. Total number of deep Web sites likely exceeds 200,000 today and is growing rapidly. Content on the deep Web has meaning and importance for every information seeker and market. More than 95% of deep Web information is publicly available without restriction. The deep Web also appears to be the fastest growing information component of the Web.
We all know the potentiality of the e-commerce in ordinary web and its impressive growth in last couple of years, well now imagine the Deep Web market that is more that 500 times bigger and where there is no legal limits on the odds to sell. We are facing with amazing business controlled by ciber criminal organizations.
Speaking of dark market we cannot avoid to mention Silk Road web site, an online marketplace located in the Deep Web, the majority of its products are derived from illegal activities. Of course it’s not the only one, many other markets are managed to address specify products, believe me, many of them are terrifying.
The figure below displays the distribution of deep Web sites by type of content.
More than half of all deep Web sites feature topical databases. Topical databases plus large internal site documents and archived publications make up nearly 80% of all deep Web sites. Purchase-transaction sites — including true shopping sites with auctions and classifieds — account for another 10% or so of sites. The other eight categories collectively account for the remaining 10% or so of sites.
Most transactions on the Deep Web accept BitCoinsystem for payments allowing the purchase of any kind of products preserving the anonymity of the transaction, encouraging the development of trade in respect to any kind of illegal activities. We are being faced with an autonomous system that allows the exercise of criminal activities while ensuring the anonymity of transactions and the inability to track down the criminals.
The most important findings from the analysis of the deep Web are that there is massive and meaningful content not discoverable with conventional search technology and that there is a nearly uniform lack of awareness that this critical content even exists.
I will provide more information regarding this topic in the near future. In the meantime, find below a summary and some key facts regarding the Deep Web:
· Public information on the deep Web is currently 400 to 550 times larger than the commonly defined World Wide Web.
· The deep Web contains 7,500 terabytes of information compared to 19 terabytes of information in the surface Web.
· The deep Web contains nearly 550 billion individual documents compared to the 1 billion of the surface Web.
· More than 200,000 deep Web sites presently exist.
· Sixty of the largest deep-Web sites collectively contain about 750 terabytes of information — sufficient by themselves to exceed the size of the surface Web forty times.
· The deep Web is the largest growing category of new information on the Internet.
· Deep Web sites tend to be narrower, with deeper content, than conventional surface sites.
· Total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface Web.
· Deep Web content is highly relevant to every information need, market, and domain.
· More than half of the deep Web content resides in topic-specific databases.
· A full ninety-five per cent of the deep Web is publicly accessible information — not subject to fees or subscriptions.