Anonymous Anonymity - Request For Comments

Gandalf The White <gandalf@xxxxxxxxxxx> · Sun, 17 Jul 2005 12:46:55 -0500

Greetings and Salutations:

I realize that this is not specifically a Bugtraq issue, but I have posted
this to Usenet to the Privacy forums and received little to no response.  I
also consider Bugtraq to be the haven of the most premier security analysts
available  on "The Internet".  I would appreciate your comments on the below
and request that you reply directly to my e-mail address.

Thank you

Ken

Anonymous Anonymity - Request For Comments

"I think paranoia can be instructive in the right doses. Paranoia is a
skill." - John Shirley

This document is available / updated at the following:
http://digital.net/~gandalf/Anonymous_Anonymity.htm

I would like to first ask the community to read this and comment on the
"Issues" section.  I am struggling with the how to fix the issues presented.
If you can conceive a better way to fix the first issue I would appreciate
that input.  If there is a solution that is already well known, please tell
me.  Thanks.

Table Of Contents:
1) Abstract
2) High Level Description
3) Description
4) Issues

Abstract:
The current state of anonymous proxies do not provide adequate protection
for the entity wishing to preserve their anonymity.  Anonymous remailers and
their ISP's have had court orders to have their logs subpoenaed in court
(i).  There is also a "trust" that the anonymous proxy is truly anonymous.

Given that Country "C" restricts access to certain sites on "The Internet"
located in country "A".  Also given that country "C" wishes to gain
knowledge of which of its citizens are trying to access restricted sites,
country "C" could set up anonymous proxies in country "N" to monitor its own
citizens.  In addition if country "C" wished to monitor already popular
anonymous sites for traffic, they could install a employee in the offices of
the ISP that serves the popular anonymous site and have that employee
surreptitiously monitor the traffic going to / leaving that site.

Proposed is a truly anonymous system wherein no one entity has a complete
picture of the transaction.  This system can be installed on a corporate LAN
(Local Area Network) to allow anonymous access of "sensitive" data (Example
Anonymous employee suggestions, Human Resources "sensitive" procedures /
documentation (medical forms, complaint procedures)) or it can be installed
on "The Internet".

I have seen the statement "Information Wants to Be Free".  I would revise
that statement to "Information Will Be Free".  The information does not care
one way or the other.  But humans, simply by their curiosity and need to
explore ideas will make the information free.

High Level Description:

The software will facilitate the transfer of files (HTTP, FTP, etc.) between
two computers using anonymous proxies.  Every machine will have "the least"
amount of knowledge to make the transfer possible.  One computer (the end
point) will have access to the data and will know the intermediary proxy but
will not know what computer the file is ultimately destined for.  Another
computer (the intermediary server or the intermediary proxy) will know what
two computers the file is being transferred between but will not know the
contents of the file.  The last computer (The destination / anonymous
machine) will know what the file is and who the proxy is, but not where the
file is coming from.

When the software is launched, it decides how much bandwidth is available
for the connection.  If it is a low bandwidth then the machine will perform
the services of an Intermediate Proxy or End Point.  If high bandwidth then
the machine can perform as a Intermediary Server and / or as a Intermediary
Proxy.  This information is only known by the machine that runs the
software, it is not told to any other computer.  This way nobody know if a
computer is a server or just a transfer agent.

Connections are made to other computers, requests are sent out for
additional connections until "enough" (depending on bandwidth) connections
are made.  Scalability is not an issue, as connections / servers are
overloaded traffic will simply be dropped or passed onto other servers that
are not as loaded.

Searches are passed to all connected machines.  If the operator makes a
selection then that data is transferred to the machine.  Searches are
performed via full URI Scheme (ii) request, by words or phrases contained in
the file or by filename (or parts thereof).  Files retrieved (either from
"The Internet" or from another machine) are saved in cache on each machine.
When the file cache is full, the files that haven't been accessed for the
longest time are deleted.  This allows for a "shadow" Internet, sites that
are censored or deleted are still available via the Anonymous Anonymity
network.

I have looked at The Freenet Project (iii), and they deserve the credit in
this project for the idea of a "shadow" Internet, but the Anonymous
Anonymity Network is fundamentally different. On The Freenet Project web
pages are published only on the The Freenet Project (and does not allow for
searching), the Anonymous Anonymity Network allows for searching of not only
files on the Anonymous Anonymity Network but also the anonymous transfer of
files into the Anonymous Anonymity Network from "The Internet", thus
connecting "The Internet" with the Anonymous Anonymity Network. Also, the
files do not have to be passed from node to node to get to the final
destination (as in The Freenet Project), they are fetched and sent (via one
hop) to the final requestor.

Detailed Description:
There are up to five devices involved in each transaction.
1) Destination Machine - The machine that wishes to remain anonymous
2) Intermediary Server.
3) Intermediary Proxy.
4) End point - HTTP anonymous Proxy or file server
5) The (HTTP, FTP, NNTP, etc) server that the Anonymous Machine wishes to
reach.

With this anonymous network, as with the original design of "The Internet",
there is no central server.  The software is initiated on the users machine.
The bandwidth is detected:
1) "Low Bandwidth" - Less than 512 kilobits / second the machine establishes
itself mainly as a Intermediary Proxy / End Point.
2) "High Bandwidth" - Greater than 512 kilobits per second and TCP port 80
inbound allowed, the machine establishes itself mainly as a Intermediary
Server.

All connections / communications will use the HTTPEncode encoding.
HTTPEncode uses the same idea as UUEncode with a slight difference.  Whereas
UUEncode takes binary data and encodes it into "plain text", HTTPEncode
takes that binary data one step further.  The binary data is not only
encoded to ASCII characters, the HTTPEncode will create HTTP wrappers that
add HTTP tags to the beginning and end of the data, and throw in random HTML
tags inside the data.  The encoding will also redistribute the character
count so that the end product has approximately the same character
distribution as "normal" HTML pages.  This is to avoid transport layer -->
application layer firewalls that look for tunneling over port 80.

When the software is installed the user is asked if they have any filtering
software that blocks what sites they are able to go to / monitors what sites
they go to.  If they do then their machine is not allowed to be an end point
that fetches "fresh" web pages.  Any firewall devices will have to be set up
to allow inbound port 80 (or port "X", user defined (since some ISP's block
port 80)) connections.  If this cannot be done then this machine is
primarily a outbound / Intermediary Proxy connect machine.

The software then attempts connection to a Intermediary Server.  The IP
address of an initial intermediary server can be entered manually or
downloaded from a web site.  The IP addresses are checked against WhoIs or
ARIN to see if they are geographically diverse.  IPv6 will make this process
easier because it addresses according to the location of the machine.
Servers that are "farthest away" will be chosen over servers that are
"close".  Intermediary servers should have port 80 open as an inbound
connection so that they appear to be another web server.  If the machine has
determined that it has the capabilities to be a Intermediary Server then it
should allow connections to itself as a Intermediary Server.  The machine
should also search for other Intermediary Servers so that requests are
distributed between many servers.  Note:  If inbound port 80 cannot be
established then that machine can still act as a server by making the port
80 outbound connection when asked to by another machine (see next paragraph
handing off to another server).  Obviously when the port 80 outbound
connection is made two way communications can then ensue.

If a server has too many nodes, it should pass any new connections off to
another server and notify the machine that is trying to connect of this
handoff so that it can establish a direct connection to the other server.
If an Intermediary Proxy is using more than 50% of its bandwidth proxying
connections, then additional connection requests should be denied.

All connections / communications should be encrypted with the exception of
the request.  Each connection creates a unique encryption public/private key
pair for use in communication (this is so that the user cannot be identified
by using the same public key over and over again).

Routing - Since data is routed node to node, the routing will not allow
least cost (efficient) routing.  Just individual direct connections will be
in the routing table (IP Address / Search Request).  Data would be "routed"
by each node keeping a table of incoming IP Address / search request hashes
paired with outgoing IP Address / search request hashes.  The route back
being (of course) the path of pairs of IP Address's and search request
hashes that are related.  This gives each node the "least knowledge" of the
source and destination.  An Intermediary Server should not know whether a
node that is connected is another Intermediary Server or a Anonymous Machine
or an End Point.

Searches can be of the form:
1) URI Scheme request (http, https, ftp, gopher, file, etc)
2) File Name (or parts of file name)
3) Data in file (words, phrases, ANDed words or ORed words)

The search request is added to the public key and hashed, this is to make
each search unique.  This is referred to as the search hash.  The search
with a unique public key and search hash is passed from the Anonymous
Machine to all Intermediary Servers.  When a search request is seen, a
lookup of the search hash is made on the server in the "already known
searches" search table and if the hash of the search matches a already
received search, the search is dropped (this search has already been through
this machine).  If the search is not dropped, the search hash is stored in a
lookup table with the IP Address that the search was received from.  The
Intermediary Server passes the entire search to all Intermediary Servers
Anonymous Machines and End Point machines it knows except for the machine
the search request came from (the server doesn't know what "kind" of machine
it is connected to).  If an end point machine can satisfy the request / has
matches for the request then that data relating to the request is encrypted
using the public key and passed back to the Intermediary Server with the
search response hash number.  The response data from the end point machine
is the URI, the URI hash, the size of the file and the date of the file
(when the file was created).  When positive responses are received then
those responses are returned via the routing (above) to the IP Address that
initiated the request.

The Anonymous Machine then (by operator choice or by random) chooses a one
of the hashes to act on the request.  If no node has the URI available in
cache then a node that can connect to the URI is chosen.  If any node
returns a hash indicating that they have the file, then a second search
request is sent out via another connection (i.e do not send the hash request
out via the server that the original search response came in from) using the
hash as the search request.  Nodes that have that hash return the hash and
hash dictionary (see below).  If the file is large, this search hash / hash
dictionary will allow the Anonymous Machine to transfer parts of the file
from many sources.  The Anonymous Machine will also be able to offer
portions of the file out when as they are received if other machines are
looking for that same file.  Note: To further obfuscate the "real" requests,
Anonymous Machines should take random incoming requests / pick random words
and send them out as fake requests to Intermediary Servers.  Results from
these fake requests are, of course, ignored.

When the search table fills, requests are dropped in a FIFO manner for a
specific IP Address.  If someone tries to flood the network with requests to
empty the tables, only the IP Address they are connected to will suffer, not
other IP Address's.

Note:  The positive responses to the search may be a form of "I can act as
your proxy for that URL, but I don't have the URL" or "I have the entire
URL, and this is the last date that I accessed that page plus here is the
hash of the data on that page".  The operator can choose whether they want a
copy (possibly stale) or if they want to chose a proxy that can get the
current page.  All links on that page are different files that are searched
/ requested for.  Additionally (in this manner) the Anonymous Anonymity
network could host its own WWW network where those pages were only
accessible to someone connected to the Anonymous Anonymity network, or via a
machine proxying for the Anonymous Anonymity network.

When the Anonymous Requester receives a request that is acceptable, a
connection request is sent along the path that is in the response data using
the IP Address / search hash connection pair generated in the previous
paragraph.  This connection request has a new public key associated with the
request.  The Intermediary Proxy Server sends out a request on all
connections for a proxy and randomly chooses one of those responses and
requests that Intermediary Proxy IP address.  The IP address of this
Intermediary Proxy is sent to the path of both the End Point machine and the
Anonymous Requester.

The End Point machine and the Anonymous Requester set up connections with
the Intermediary Proxy on TCP Port 80.  Again, data is encrypted and then
HTTPEncoded.  The Intermediary Proxy knows the source and destination, but
not what data is being exchanged.  When the data exchange is complete the
connection is terminated.

The whole idea behind this network is for each node to know the minimum
information for the system to work.  The less a node knows the less
information that can be pieced together to get the whole picture.  In
training for Security Clearances the quote goes something like "Unclassified
information can easily be combined to reveal classified information."

File name:
The file name is retuned with a SHA-2 hash and a SHA-2 hash dictionary.  The
SHA-2 hash is just a SHA-2 hash of the file.  The SHA-2 Hash Dictionary is a
SHA-2 hash of "X" bytes of the file (where "X" is size of file / 1023 and
where "X" is greater than 32 Kbytes).  The Anonymous Machine would request
chunk "y" of the file from the End Point.  These requests would continue
until the Anonymous Machine has all the chunks it needs or until the
connection is broken.  In this manner the Anonymous Machine could be
requesting parts of a particular file while also sending out parts of a
particular file to other users.  If the file is less than 32 MBytes then the
hash table would be 32 Kbytes chunks of the file with the number of hashes
indicated in the hash table.  This hash allows (in the case, for example, of
large FTP URI Scheme requests) requests to be made of parts of the file
being requested if it is a large file.  The file hash and the hash segment
of the file would be requested, therefore several machines could be sending
parts of the file to the anonymous requester at the same time.

Issues:
1) The issue with a party owning the server and the anonymous proxier and /
or the intermediary machine.  This is essentially the Man In The Middle
attack.  The attacker "owns" the server in the middle which directs the
anonymous machine to proxies and end point devices that it also controls,
therefore the server knows the anonymous machine and what they are
requesting.  Same thing if the attacker wants to find out what files are on
the end point machine, they act like the anonymous requester and the
intermediary servers / proxies and make requests.  While this issue is not
completely solved by the above scheme, it is mitigated by the Anonymous
Machine searching on the hash after the initial responses are received.
Even if the server is acting as a man in the middle, the server would need
to maintain a table of URIs / hashes returned.  As the network grows this
table would become huge.

2) HTTPS connections.  The HTTPS transfer would require several data
requests that would require the end point to serve up multiple pages to the
anonymous requester.  the Man In The Middle attack would be mitigated by the
fact that the anonymous requester would be able to verify the SSL
certificate of the site that they are visiting.

3) Abuse of the anonymous system by someone who is stalking, etc.  The IP
address of the proxier is the address that shows up on the logs and stalking
/ spamming / etc. would be blamed on whoever owns the IP Proxier address.

4) Not being able to make HTTP requests that divulge the end stations IP
address. (Example http://www.whatismyip.com/ )

5) Creation of HHTPEncode algorithm that ensures even letter distribution /
HTML format of data.

6) Spammers - Assuming that the this system is programmed in open source,
you will (at some time) have some smart spammer figure out a way to redirect
HTTP requests to them and they will serve out their own spamvertized pages.
Same with data files, nodes could put out data files that have nothing to do
with the request made.  A local file should be kept where the user can
ignore all responses from a specific connection or ignore a specific hash.
The file would be only locally significant because if it became global then
nefarious people could "poison" sites that are serving out good information
and say that these are "bad" sites.

7) Thomas J. Boschloo wrote "The problem remains, how to download this
software without drawing attention onto oneself!"

i) Newman, Ron and Copeland, Frank "The Church of Scientology vs. Grady
Ward" (Specifically "Scientology targets ISPs and anonymous remailers").
URL: http://www.xs4all.nl/~kspaink/cos/rnewman/grady/home.html Wednesday,
July 24, 1996 (Accessed July 4, 2005)
ii) IANA Registry of URI Schemes "Uniform Resource Identifier (URI)
SCHEMES".  URL: http://www.iana.org/assignments/uri-schemes 03 June 2005
(Accessed July 4, 2005)
iii) The Freenet Project "The Freenet Project - index - beginner".  URL:
http://freenetproject.org, 04 July 2005

I would appreciate any and all comments on the above Anonymous Anonymity
network.  Specifically any solutions to the presented problems or if someone
has already covered this ground I would appreciate pointers to their work.

Thank you for your comments.

Ken Hollis

---------------------------------------------------------------
Do not meddle in the affairs of wizards for they are subtle and
quick to anger.
Ken Hollis - Gandalf The White - gand...@xxxxxxxxxxx - O- TINLC
WWW Page - http://digital.net/~gandalf/
Trace E-Mail forgery - http://digital.net/~gandalf/spamfaq.html
Trolls crossposts - http://digital.net/~gandalf/trollfaq.html
Woodworking For Geeks - http://digital.net/~gandalf/woodmain.htm