Greetings and Salutations: I realize that this is not specifically a Bugtraq issue, but I have posted this to Usenet to the Privacy forums and received little to no response. I also consider Bugtraq to be the haven of the most premier security analysts available on "The Internet". I would appreciate your comments on the below and request that you reply directly to my e-mail address. Thank you Ken Anonymous Anonymity - Request For Comments "I think paranoia can be instructive in the right doses. Paranoia is a skill." - John Shirley This document is available / updated at the following: http://digital.net/~gandalf/Anonymous_Anonymity.htm I would like to first ask the community to read this and comment on the "Issues" section. I am struggling with the how to fix the issues presented. If you can conceive a better way to fix the first issue I would appreciate that input. If there is a solution that is already well known, please tell me. Thanks. Table Of Contents: 1) Abstract 2) High Level Description 3) Description 4) Issues Abstract: The current state of anonymous proxies do not provide adequate protection for the entity wishing to preserve their anonymity. Anonymous remailers and their ISP's have had court orders to have their logs subpoenaed in court (i). There is also a "trust" that the anonymous proxy is truly anonymous. Given that Country "C" restricts access to certain sites on "The Internet" located in country "A". Also given that country "C" wishes to gain knowledge of which of its citizens are trying to access restricted sites, country "C" could set up anonymous proxies in country "N" to monitor its own citizens. In addition if country "C" wished to monitor already popular anonymous sites for traffic, they could install a employee in the offices of the ISP that serves the popular anonymous site and have that employee surreptitiously monitor the traffic going to / leaving that site. Proposed is a truly anonymous system wherein no one entity has a complete picture of the transaction. This system can be installed on a corporate LAN (Local Area Network) to allow anonymous access of "sensitive" data (Example Anonymous employee suggestions, Human Resources "sensitive" procedures / documentation (medical forms, complaint procedures)) or it can be installed on "The Internet". I have seen the statement "Information Wants to Be Free". I would revise that statement to "Information Will Be Free". The information does not care one way or the other. But humans, simply by their curiosity and need to explore ideas will make the information free. High Level Description: The software will facilitate the transfer of files (HTTP, FTP, etc.) between two computers using anonymous proxies. Every machine will have "the least" amount of knowledge to make the transfer possible. One computer (the end point) will have access to the data and will know the intermediary proxy but will not know what computer the file is ultimately destined for. Another computer (the intermediary server or the intermediary proxy) will know what two computers the file is being transferred between but will not know the contents of the file. The last computer (The destination / anonymous machine) will know what the file is and who the proxy is, but not where the file is coming from. When the software is launched, it decides how much bandwidth is available for the connection. If it is a low bandwidth then the machine will perform the services of an Intermediate Proxy or End Point. If high bandwidth then the machine can perform as a Intermediary Server and / or as a Intermediary Proxy. This information is only known by the machine that runs the software, it is not told to any other computer. This way nobody know if a computer is a server or just a transfer agent. Connections are made to other computers, requests are sent out for additional connections until "enough" (depending on bandwidth) connections are made. Scalability is not an issue, as connections / servers are overloaded traffic will simply be dropped or passed onto other servers that are not as loaded. Searches are passed to all connected machines. If the operator makes a selection then that data is transferred to the machine. Searches are performed via full URI Scheme (ii) request, by words or phrases contained in the file or by filename (or parts thereof). Files retrieved (either from "The Internet" or from another machine) are saved in cache on each machine. When the file cache is full, the files that haven't been accessed for the longest time are deleted. This allows for a "shadow" Internet, sites that are censored or deleted are still available via the Anonymous Anonymity network. I have looked at The Freenet Project (iii), and they deserve the credit in this project for the idea of a "shadow" Internet, but the Anonymous Anonymity Network is fundamentally different. On The Freenet Project web pages are published only on the The Freenet Project (and does not allow for searching), the Anonymous Anonymity Network allows for searching of not only files on the Anonymous Anonymity Network but also the anonymous transfer of files into the Anonymous Anonymity Network from "The Internet", thus connecting "The Internet" with the Anonymous Anonymity Network. Also, the files do not have to be passed from node to node to get to the final destination (as in The Freenet Project), they are fetched and sent (via one hop) to the final requestor. Detailed Description: There are up to five devices involved in each transaction. 1) Destination Machine - The machine that wishes to remain anonymous 2) Intermediary Server. 3) Intermediary Proxy. 4) End point - HTTP anonymous Proxy or file server 5) The (HTTP, FTP, NNTP, etc) server that the Anonymous Machine wishes to reach. With this anonymous network, as with the original design of "The Internet", there is no central server. The software is initiated on the users machine. The bandwidth is detected: 1) "Low Bandwidth" - Less than 512 kilobits / second the machine establishes itself mainly as a Intermediary Proxy / End Point. 2) "High Bandwidth" - Greater than 512 kilobits per second and TCP port 80 inbound allowed, the machine establishes itself mainly as a Intermediary Server. All connections / communications will use the HTTPEncode encoding. HTTPEncode uses the same idea as UUEncode with a slight difference. Whereas UUEncode takes binary data and encodes it into "plain text", HTTPEncode takes that binary data one step further. The binary data is not only encoded to ASCII characters, the HTTPEncode will create HTTP wrappers that add HTTP tags to the beginning and end of the data, and throw in random HTML tags inside the data. The encoding will also redistribute the character count so that the end product has approximately the same character distribution as "normal" HTML pages. This is to avoid transport layer --> application layer firewalls that look for tunneling over port 80. When the software is installed the user is asked if they have any filtering software that blocks what sites they are able to go to / monitors what sites they go to. If they do then their machine is not allowed to be an end point that fetches "fresh" web pages. Any firewall devices will have to be set up to allow inbound port 80 (or port "X", user defined (since some ISP's block port 80)) connections. If this cannot be done then this machine is primarily a outbound / Intermediary Proxy connect machine. The software then attempts connection to a Intermediary Server. The IP address of an initial intermediary server can be entered manually or downloaded from a web site. The IP addresses are checked against WhoIs or ARIN to see if they are geographically diverse. IPv6 will make this process easier because it addresses according to the location of the machine. Servers that are "farthest away" will be chosen over servers that are "close". Intermediary servers should have port 80 open as an inbound connection so that they appear to be another web server. If the machine has determined that it has the capabilities to be a Intermediary Server then it should allow connections to itself as a Intermediary Server. The machine should also search for other Intermediary Servers so that requests are distributed between many servers. Note: If inbound port 80 cannot be established then that machine can still act as a server by making the port 80 outbound connection when asked to by another machine (see next paragraph handing off to another server). Obviously when the port 80 outbound connection is made two way communications can then ensue. If a server has too many nodes, it should pass any new connections off to another server and notify the machine that is trying to connect of this handoff so that it can establish a direct connection to the other server. If an Intermediary Proxy is using more than 50% of its bandwidth proxying connections, then additional connection requests should be denied. All connections / communications should be encrypted with the exception of the request. Each connection creates a unique encryption public/private key pair for use in communication (this is so that the user cannot be identified by using the same public key over and over again). Routing - Since data is routed node to node, the routing will not allow least cost (efficient) routing. Just individual direct connections will be in the routing table (IP Address / Search Request). Data would be "routed" by each node keeping a table of incoming IP Address / search request hashes paired with outgoing IP Address / search request hashes. The route back being (of course) the path of pairs of IP Address's and search request hashes that are related. This gives each node the "least knowledge" of the source and destination. An Intermediary Server should not know whether a node that is connected is another Intermediary Server or a Anonymous Machine or an End Point. Searches can be of the form: 1) URI Scheme request (http, https, ftp, gopher, file, etc) 2) File Name (or parts of file name) 3) Data in file (words, phrases, ANDed words or ORed words) The search request is added to the public key and hashed, this is to make each search unique. This is referred to as the search hash. The search with a unique public key and search hash is passed from the Anonymous Machine to all Intermediary Servers. When a search request is seen, a lookup of the search hash is made on the server in the "already known searches" search table and if the hash of the search matches a already received search, the search is dropped (this search has already been through this machine). If the search is not dropped, the search hash is stored in a lookup table with the IP Address that the search was received from. The Intermediary Server passes the entire search to all Intermediary Servers Anonymous Machines and End Point machines it knows except for the machine the search request came from (the server doesn't know what "kind" of machine it is connected to). If an end point machine can satisfy the request / has matches for the request then that data relating to the request is encrypted using the public key and passed back to the Intermediary Server with the search response hash number. The response data from the end point machine is the URI, the URI hash, the size of the file and the date of the file (when the file was created). When positive responses are received then those responses are returned via the routing (above) to the IP Address that initiated the request. The Anonymous Machine then (by operator choice or by random) chooses a one of the hashes to act on the request. If no node has the URI available in cache then a node that can connect to the URI is chosen. If any node returns a hash indicating that they have the file, then a second search request is sent out via another connection (i.e do not send the hash request out via the server that the original search response came in from) using the hash as the search request. Nodes that have that hash return the hash and hash dictionary (see below). If the file is large, this search hash / hash dictionary will allow the Anonymous Machine to transfer parts of the file from many sources. The Anonymous Machine will also be able to offer portions of the file out when as they are received if other machines are looking for that same file. Note: To further obfuscate the "real" requests, Anonymous Machines should take random incoming requests / pick random words and send them out as fake requests to Intermediary Servers. Results from these fake requests are, of course, ignored. When the search table fills, requests are dropped in a FIFO manner for a specific IP Address. If someone tries to flood the network with requests to empty the tables, only the IP Address they are connected to will suffer, not other IP Address's. Note: The positive responses to the search may be a form of "I can act as your proxy for that URL, but I don't have the URL" or "I have the entire URL, and this is the last date that I accessed that page plus here is the hash of the data on that page". The operator can choose whether they want a copy (possibly stale) or if they want to chose a proxy that can get the current page. All links on that page are different files that are searched / requested for. Additionally (in this manner) the Anonymous Anonymity network could host its own WWW network where those pages were only accessible to someone connected to the Anonymous Anonymity network, or via a machine proxying for the Anonymous Anonymity network. When the Anonymous Requester receives a request that is acceptable, a connection request is sent along the path that is in the response data using the IP Address / search hash connection pair generated in the previous paragraph. This connection request has a new public key associated with the request. The Intermediary Proxy Server sends out a request on all connections for a proxy and randomly chooses one of those responses and requests that Intermediary Proxy IP address. The IP address of this Intermediary Proxy is sent to the path of both the End Point machine and the Anonymous Requester. The End Point machine and the Anonymous Requester set up connections with the Intermediary Proxy on TCP Port 80. Again, data is encrypted and then HTTPEncoded. The Intermediary Proxy knows the source and destination, but not what data is being exchanged. When the data exchange is complete the connection is terminated. The whole idea behind this network is for each node to know the minimum information for the system to work. The less a node knows the less information that can be pieced together to get the whole picture. In training for Security Clearances the quote goes something like "Unclassified information can easily be combined to reveal classified information." File name: The file name is retuned with a SHA-2 hash and a SHA-2 hash dictionary. The SHA-2 hash is just a SHA-2 hash of the file. The SHA-2 Hash Dictionary is a SHA-2 hash of "X" bytes of the file (where "X" is size of file / 1023 and where "X" is greater than 32 Kbytes). The Anonymous Machine would request chunk "y" of the file from the End Point. These requests would continue until the Anonymous Machine has all the chunks it needs or until the connection is broken. In this manner the Anonymous Machine could be requesting parts of a particular file while also sending out parts of a particular file to other users. If the file is less than 32 MBytes then the hash table would be 32 Kbytes chunks of the file with the number of hashes indicated in the hash table. This hash allows (in the case, for example, of large FTP URI Scheme requests) requests to be made of parts of the file being requested if it is a large file. The file hash and the hash segment of the file would be requested, therefore several machines could be sending parts of the file to the anonymous requester at the same time. Issues: 1) The issue with a party owning the server and the anonymous proxier and / or the intermediary machine. This is essentially the Man In The Middle attack. The attacker "owns" the server in the middle which directs the anonymous machine to proxies and end point devices that it also controls, therefore the server knows the anonymous machine and what they are requesting. Same thing if the attacker wants to find out what files are on the end point machine, they act like the anonymous requester and the intermediary servers / proxies and make requests. While this issue is not completely solved by the above scheme, it is mitigated by the Anonymous Machine searching on the hash after the initial responses are received. Even if the server is acting as a man in the middle, the server would need to maintain a table of URIs / hashes returned. As the network grows this table would become huge. 2) HTTPS connections. The HTTPS transfer would require several data requests that would require the end point to serve up multiple pages to the anonymous requester. the Man In The Middle attack would be mitigated by the fact that the anonymous requester would be able to verify the SSL certificate of the site that they are visiting. 3) Abuse of the anonymous system by someone who is stalking, etc. The IP address of the proxier is the address that shows up on the logs and stalking / spamming / etc. would be blamed on whoever owns the IP Proxier address. 4) Not being able to make HTTP requests that divulge the end stations IP address. (Example http://www.whatismyip.com/ ) 5) Creation of HHTPEncode algorithm that ensures even letter distribution / HTML format of data. 6) Spammers - Assuming that the this system is programmed in open source, you will (at some time) have some smart spammer figure out a way to redirect HTTP requests to them and they will serve out their own spamvertized pages. Same with data files, nodes could put out data files that have nothing to do with the request made. A local file should be kept where the user can ignore all responses from a specific connection or ignore a specific hash. The file would be only locally significant because if it became global then nefarious people could "poison" sites that are serving out good information and say that these are "bad" sites. 7) Thomas J. Boschloo wrote "The problem remains, how to download this software without drawing attention onto oneself!" i) Newman, Ron and Copeland, Frank "The Church of Scientology vs. Grady Ward" (Specifically "Scientology targets ISPs and anonymous remailers"). URL: http://www.xs4all.nl/~kspaink/cos/rnewman/grady/home.html Wednesday, July 24, 1996 (Accessed July 4, 2005) ii) IANA Registry of URI Schemes "Uniform Resource Identifier (URI) SCHEMES". URL: http://www.iana.org/assignments/uri-schemes 03 June 2005 (Accessed July 4, 2005) iii) The Freenet Project "The Freenet Project - index - beginner". URL: http://freenetproject.org, 04 July 2005 I would appreciate any and all comments on the above Anonymous Anonymity network. Specifically any solutions to the presented problems or if someone has already covered this ground I would appreciate pointers to their work. Thank you for your comments. Ken Hollis --------------------------------------------------------------- Do not meddle in the affairs of wizards for they are subtle and quick to anger. Ken Hollis - Gandalf The White - gand...@xxxxxxxxxxx - O- TINLC WWW Page - http://digital.net/~gandalf/ Trace E-Mail forgery - http://digital.net/~gandalf/spamfaq.html Trolls crossposts - http://digital.net/~gandalf/trollfaq.html Woodworking For Geeks - http://digital.net/~gandalf/woodmain.htm