> ... the substantial information disclosure and possible misappropriation o...

trotsky · on April 26, 2011

I agree that random collisions are an unlikely attack vector. However, there is not a general understanding that disclosing sha256 hashes is the same as disclosing the file. Imagine a social engineering attack that asked employees to run 'sha256sum ~/Documents/* > hashes.txt' and mail the results, with the explanation that this is to make sure they have no infected documents/old versions/unauthorized files on their hard drives. Many people would be willing to do something like that if it appeared to come from a legitimate source, but if they had been asked to email all their documents they'd be much less likely to comply.

Hashes are also disclosed in other ways. In certain cases security researchers will reveal a hash of a file publicly to prove they hold that file, which might contain a proof-of-concept exploit against a privately disclosed bug - with the idea that the contents of the file could be revealed at a later date. If someone the researcher shared that file with privately placed it on dropbox, that file could be revealed publicly.

Online AV systems could be another form of disclosure. Many "online scan" products report the hashes of local files back to the server for malware detection - it is faster to upload your hashes than download the hashes of the many millions of signatures a product can scan for.

Another version of this is virustotal.com or similar services that will scan a submitted file against a large number of AV products. The resulting scan reports include the sha256 hash and are often publicly accessible, while the contents of the file aren't. In the days after several recent Adobe Flash 0-days, virustotal reports on infected documents were made public days before the bug was fixed or the actual exploit was publicly revealed. Here is one such example for CVE-2011-0611, submitted on 4/9/2011 and made public on 4/11/2011, while no patch was available until 4/15/2011: http://www.virustotal.com/file-scan/report.html?id=1e677420d...

Granted, all of these presume that sensitive files are being placed on dropbox when they probably shouldn't be. But these things do happen.

As far as information disclosure goes, someone who has a legitimate copy of a file could use the hash to determine whether the file is being leaked off site or distributed inappropriately. This may be seen as a feature by some document owners, but it could also serve to detect exfiltration that one might otherwise agree with. Whistle blowers come to mind. If you suspected a leak, you might provide slightly different copies of a sensitive document to a group of employees, admonish them not to allow the file to leave the enterprise, and then see if any of the hashes appeared on dropbox.
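To make the "slightly different copies" idea concrete, here is a minimal sketch. The per-recipient marker line and the function names are my own assumptions for illustration, not anything dropbox or a real leak-tracing tool does:

```python
import hashlib


def make_canary_copies(base_text: str, recipients: list[str]) -> dict[str, str]:
    """Return a map of SHA-256 hex digest -> recipient, one variant per recipient."""
    digest_to_recipient = {}
    for name in recipients:
        # Hypothetical marker: any change that makes each copy unique would do.
        variant = f"{base_text}\nCopy issued to: {name}\n"
        digest = hashlib.sha256(variant.encode("utf-8")).hexdigest()
        digest_to_recipient[digest] = name
    return digest_to_recipient


def identify_recipient(observed_digest: str, digest_to_recipient: dict[str, str]):
    """Return the recipient whose copy has this digest, or None if it is unknown."""
    return digest_to_recipient.get(observed_digest)
```

Whoever can check whether a given hash exists on the service only needs the digest table. The document itself never has to be uploaded again.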

I understand that many of these concerns could be dismissed with "well, they already have bad document handling procedures", etc. That would be valid, but in the real world a lot of poor behavior goes on. I'm just listing these as examples of the kind of problems that could arise; I'm not trying to take a stand on how likely any of the attacks might be.

speleding · on April 26, 2011

It would still allow collision attacks though. There are probably a lot of legal and medical documents (prescriptions) that only differ in a few words, such as name and date of birth. By trying a bunch of combinations you can test if those documents exist.
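A rough sketch of what "trying a bunch of combinations" could look like, assuming the attacker knows the document template exactly. The template, the field names and the `exists` callable are all hypothetical; `exists` stands in for whatever signal reveals that a hash is already stored:

```python
import hashlib
from collections.abc import Callable, Iterator
from itertools import product


def candidate_digests(
    template: str, names: list[str], dobs: list[str]
) -> Iterator[tuple[tuple[str, str], str]]:
    """Yield ((name, dob), sha256 hex digest) for every filled-in template."""
    for name, dob in product(names, dobs):
        body = template.format(name=name, dob=dob)
        yield (name, dob), hashlib.sha256(body.encode("utf-8")).hexdigest()


def find_existing(
    template: str,
    names: list[str],
    dobs: list[str],
    exists: Callable[[str], bool],
) -> list[tuple[str, str]]:
    """Return the (name, dob) pairs whose generated document is already stored."""
    return [
        fields
        for fields, digest in candidate_digests(template, names, dobs)
        if exists(digest)
    ]
```

This only works when every other byte of the file is predictable. A single unknown character anywhere changes the hash.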

knobst · on April 27, 2011

The collision attacks outlined above still work with a regular dropbox account, no dropship needed. You can create 100,000 attack files and then upload each one. The ones that don't actually transmit bytes show you that the file already exists (e.g. a highly regular file like some health or banking record...). It's just watching whether de-duplication happens or not.

They need to patch that hole, I think by requiring everything to upload, then deduplicate on the server...

Which is another way of saying what speleding points out.
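The "upload everything, then deduplicate on the server" fix could be sketched roughly like this. It is a toy in-memory model under my own assumptions (class and method names are made up), not how dropbox actually works:

```python
import hashlib


class UploadThenDedupStore:
    """Toy store that always accepts the full payload, then dedups internally."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def upload(self, data: bytes) -> int:
        """Receive the whole file and return the number of bytes accepted."""
        digest = hashlib.sha256(data).hexdigest()
        # Deduplication happens here, after the transfer, invisible to the client.
        self._blobs.setdefault(digest, data)
        return len(data)

    def stored_blob_count(self) -> int:
        """Number of distinct blobs actually kept on the server side."""
        return len(self._blobs)
```

The client sees the same byte count whether or not the file was already known, so the "no bytes transmitted" signal goes away. This sketch does nothing about other signals such as server response time.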