Posted on March 22, 2019 by Abhirath Mahipal
In the context of a phone call, metadata covers things like whom you called, your location, the duration of the call, the number of times you called that person and so on. Anything that describes additional parameters about the content (the recording of the phone call in this case) can be termed metadata.
Verizon Injects Custom Headers to Un-Encrypted Traffic is an interesting read. It doesn’t directly address metadata, but custom headers definitely fall under metadata. The article will give you some insight into how such info can be used.
State-of-the-art privacy mechanisms even go to the extent of hiding your screen size and resolution, as these can be used to single out a whistleblower, journalist etc. from a pool of suspects. Ganesh mentioned an interesting flaw in browsers which goes by the name Canvas Fingerprinting.
Some systems offer strong privacy guarantees but do not scale. Some of them broadcast every message to every user. Others use Private Information Retrieval (PIR) to hide metadata. The simplest PIR technique is to send the entire database to every user who requests information. This way observing parties (using man-in-the-middle attacks, for instance) will not get to know which particular piece of information (or which message) the user was trying to access.
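Here is a minimal sketch of that simplest PIR idea in Python. The names `TrivialPIRServer` and `private_lookup` are made up for illustration and do not come from any real library. The point is that every request gets the same reply (the whole database), so neither the server nor someone watching the wire can tell which entry the user actually wanted. It also shows why this does not scale: the reply grows with the database.

```python
from typing import Dict, Optional


class TrivialPIRServer:
    """Hypothetical server that answers every request with the full database."""

    def __init__(self, database: Dict[str, str]) -> None:
        self._database = dict(database)

    def fetch_all(self) -> Dict[str, str]:
        # The request carries no key, so the reply cannot depend on
        # what the user is interested in.
        return dict(self._database)


def private_lookup(server: TrivialPIRServer, wanted_key: str) -> Optional[str]:
    """Download everything, then pick the wanted entry locally."""
    snapshot = server.fetch_all()
    return snapshot.get(wanted_key)


server = TrivialPIRServer({"alice": "hi from alice", "bob": "hi from bob"})
message = private_lookup(server, "bob")
```

The selection happens entirely on the user's machine, which is exactly what keeps it hidden.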
Systems that scale do not hide metadata, and a persistent adversary can correlate activities with a particular individual. Tor and mixnets are such systems. A powerful adversary can manipulate traffic and control several nodes. Say that, amongst a thousand users, he stops Charlie from accessing a website and at that very instant notices that the website stops receiving traffic. By manipulating traffic and making the right moves over a period of time, he can bring the identity of the user to light.
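The blocking attack on Charlie can be sketched as a toy simulation. Everything here is hypothetical: `find_visitor` and `toy_observation` are names I made up, and the observation function stands in for an adversary who can cut off one user and watch the website's incoming traffic. Real attacks are noisier and need many rounds, so treat this only as an illustration of the reasoning.

```python
from typing import Callable, Iterable, Optional


def find_visitor(
    users: Iterable[str],
    site_sees_traffic_when_blocked: Callable[[str], bool],
) -> Optional[str]:
    """Block users one at a time and watch whether the site goes quiet."""
    for user in users:
        if not site_sees_traffic_when_blocked(user):
            # Traffic stopped exactly when this user was cut off.
            return user
    return None


# Toy world (an assumption for the demo): only one user visits the site.
ACTUAL_VISITOR = "charlie"


def toy_observation(blocked_user: str) -> bool:
    """True if the site still receives traffic while blocked_user is cut off."""
    return blocked_user != ACTUAL_VISITOR


suspect = find_visitor(["alice", "bob", "charlie", "dave"], toy_observation)
```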
I like to think of it this way. Tor plays with space alone (the bytes that you send across the network go through various other servers, so they aren’t where they are supposed to be) and mixnets play with both space and time (they add delays and shuffle requests through various servers as well). Thinking in terms of space and time helps clarify a lot of concepts, as they make up the state in which we exist and which we can manipulate.
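To make the "time" part concrete, here is a rough sketch of a single mix node in Python. `MixNode` and its `batch_size` parameter are my own illustrative names. The node holds messages back until it has a full batch (the delay) and then releases them in random order (the shuffle). A real mixnet also wraps messages in layers of encryption and chains several such nodes, which this sketch leaves out.

```python
import random
from typing import List, Optional


class MixNode:
    """Hypothetical mix: delays messages until a batch is full, then shuffles."""

    def __init__(self, batch_size: int, rng: Optional[random.Random] = None) -> None:
        self.batch_size = batch_size
        self._pool: List[str] = []
        self._rng = rng or random.Random()

    def receive(self, message: str) -> List[str]:
        """Return an empty list while waiting, or the shuffled batch once full."""
        self._pool.append(message)
        if len(self._pool) < self.batch_size:
            return []  # still waiting, nothing leaves the node yet
        batch, self._pool = self._pool, []
        self._rng.shuffle(batch)  # output order no longer reveals arrival order
        return batch


node = MixNode(batch_size=3, rng=random.Random(0))
first = node.receive("from alice")
second = node.receive("from bob")
released = node.receive("from charlie")
```

An observer who sees three messages go in and three come out cannot match them up by timing or order alone.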
Vuvuzela uses two protocols to operate: one to initiate conversations and another to exchange messages between parties who’ve already established a connection (i.e. by sharing keys).
The authors express its privacy guarantees in terms of differential privacy. I don’t really understand it in depth, but basically the adversary can never be sure. His suspicion of a person can increase with time, but he can never be sure whether that person was involved in the communication or not, as everything looks the same to an external observer.
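One way to picture "suspicion can grow but never reach certainty" is the textbook bound for pure ε-differential privacy: whatever the adversary observes can multiply his odds by at most e^ε per round. The sketch below assumes pure ε-differential privacy and simple additive composition across rounds. That is a simplification I am making for illustration, not the paper's exact analysis, and `max_posterior` is a made-up helper name.

```python
import math


def max_posterior(prior: float, epsilon: float, rounds: int = 1) -> float:
    """Upper bound on the adversary's belief after observing `rounds` rounds.

    Assumes pure epsilon-differential privacy per round and basic
    composition, i.e. a total privacy loss of epsilon * rounds.
    """
    prior_odds = prior / (1.0 - prior)
    bounded_odds = prior_odds * math.exp(epsilon * rounds)
    return bounded_odds / (1.0 + bounded_odds)


# Starting from a coin-flip suspicion, with epsilon = ln(3) per round.
after_one_round = max_posterior(0.5, math.log(3), rounds=1)
after_five_rounds = max_posterior(0.5, math.log(3), rounds=5)
```

The bound creeps up with every round but stays strictly below 1, which matches the intuition that the adversary gets more suspicious without ever being sure.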
Vuvuzela uses a lot of network bandwidth to operate. It also stores all the incoming and outgoing messages in volatile memory. Additionally, you need multiple servers following the Vuvuzela protocol to add enough noise or cover traffic. So hardware costs definitely lean towards the higher side.