Introduction
I wrote this article as supplementary reading for a talk on Vuvuzela. I hope it can be read as a stand-alone post as well, but that isn't the goal.
To understand Vuvuzela, I feel it's important to understand the limitations that the earlier mechanisms suffer from. This is a short post contrasting onion routers and mix networks. It is important to note that both these protocols fail to hide metadata; I'll reiterate and give more context later to drive this point home, but for now, just knowing that they don't hide metadata is enough. I've also linked to articles I've read in the past as well as ones I stumbled upon whilst reading the Vuvuzela paper.
Prerequisites
You need to have a decent idea about the following concepts.
- Public / Private Keys and Digital Signatures. This and this provide a decent explanation as to how keys work in general.
- Proxy Servers.
Onion Router
- A detailed explanation of the protocol and its workings can be found here.
- Onion routing is used under the assumption that the adversary cannot monitor all nodes. As long as at least one server in the chain remains uncompromised, the content of your request and response stays private. Under normal circumstances either the source or the destination, or both, remain hidden. If the first server is compromised, the source of the request (the client trying to visit, say, YouTube.com) is revealed, but the end destination YouTube.com stays hidden because everything is encrypted with multiple keys. Similarly, if the last server is compromised, the attacker learns the end destination (i.e. YouTube.com) but not where the request originated.
- The chain of servers chosen normally spans diverse geographical regions. This makes it harder for perpetrators to manipulate network traffic, as rules, regulations and privileges vary across political divisions.
- Unlike mix networks, onion routers can be used for time-sensitive data as well, since requests are forwarded immediately. Mix networks normally wait for quite some time before they start forwarding requests; they are safer, but messages take longer to reach their final destination.
- The client (i.e. the user using the Onion Router protocol to protect their privacy) needs the public keys of all the servers in the chain. However, each server can only decrypt its own layer, so the content stays hidden from it. Each server just takes the message one step closer to full decryption but cannot actually see the content (see the sketch after this list).
- If the website visited by the user uses HTTPS, even the last server in the onion chain cannot see the content after removing the final layer. The last server knows the end destination of the request but knows neither its content nor its source; it only knows that the second-to-last server in the chain sent it some encrypted data. On decrypting that data, the last server finds the destination server (say YouTube.com) and whom it has to forward the response to (the second-to-last server, since onion routers send the response back along the same chain).
Mix Networks
- Please read this; it explains mix networks pretty well and draws real-life parallels to an observer in a crowd, which really clears things up.
- You could watch this video if you have 12 minutes to spare. Not required though.
- It provides an additional layer of protection: the content of the requests is protected even if the adversary monitors all the nodes in the network.
- Does not depend on route reproducibility.
- It cannot be used for time-sensitive data.
- It too is prone to metadata analysis.
- I like to think of it this way: Tor plays with space alone (the bytes you send across the network pass through various other servers, so they aren't where they are "supposed" to be), whereas mixnets play with both space and time (they add delays and shuffle requests through various servers as well). Thinking in terms of space and time helps clarify a lot of concepts, since together they make up the state in which we exist and which we can manipulate. A toy mix node illustrating this is sketched below.
Security and Metadata Abuse
- Adversaries are very powerful and have great reach. They can control ISPs and cloud providers, and assign people to monitor suspects/victims over very long periods of time. They can therefore block certain users, observe traffic-flow patterns from a user to a server, between servers, and so on.
- It is fairly easy to tell that a victim/suspect is connected to such a server (Tor, a mix net, etc.), as these servers are generally publicly known; basic traffic observation will reveal that. What is not trivial is tracking their activity and proving that they did something in particular. That's key.
- A patient adversary can easily follow patterns in the case of onion routing. Say Alice visits abcxyz.com every day at 8 PM; an adversary can block all users other than Alice and then look for activity. He will notice that Alice's connection is active and that a request flows through a particular chain. Since she is the only person using the network, it's quite evident that she is the one visiting that website.
- An adversary can monitor the network and identify individuals by looking at request sizes. Say Alice, Bob and Charlie each send a request to abcxyz.com or are engaged in a chat, and traffic is monitored at the entry and at the last server. If Alice sends abnormally large data compared to Bob and Charlie, the adversary can learn the recipient of the request (abcxyz.com or the chat server) from the last server in the mix and correlate it with the sizes of the requests sent by Alice, Bob and Charlie.
- Mix nets make it harder to pin requests to a particular user, but over many rounds a patient adversary can statistically correlate activity back to a user (see the sketch after this list).
- Adversaries can disconnect everyone except two parties and try to figure out whether they are chatting by intercepting their traffic. If every request by Alice corresponds to a request reaching Bob, it isn't far-fetched to assume that they are chatting.
Scaling
Both these approaches scale well. A simple strategy is to keep adding servers that follow the respective protocol and let clients choose a chain of servers instead of using all of them (based on the load of each server); the client then makes the appropriate requests with the given data. Say you are very paranoid in general and therefore choose a longer chain that is not limited to the boundaries of a single country. Since a client's requests do not have to pass through all the servers, adding servers simply adds capacity, and to the best of my knowledge and understanding both approaches scale linearly. A toy relay-selection sketch follows.
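As a minimal sketch of why adding servers scales: each client picks only a short chain out of the whole pool, here weighted toward lightly loaded relays. The relay names and load numbers are invented for illustration.

```python
# Each client picks a short chain from the pool, favouring idle relays, so a
# new relay just enlarges the pool. Names and load figures are hypothetical.
import random

relays = {"de-1": 0.9, "nl-1": 0.3, "us-1": 0.5, "jp-1": 0.2, "br-1": 0.6}  # name -> load (0..1)

def pick_chain(pool, length=3):
    names = list(pool)
    weights = [1.0 - pool[n] for n in names]           # favour lightly loaded relays
    chain = []
    while len(chain) < length:
        choice = random.choices(names, weights=weights, k=1)[0]
        if choice not in chain:                        # keep hops distinct
            chain.append(choice)
    return chain

print(pick_chain(relays))   # e.g. ['jp-1', 'nl-1', 'br-1']
```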