An excellent point. Weighing the bandwidth savings against the storage cost savings suggests an interesting mid-tier between longer-term and shorter-term storage.
There's no bandwidth cost associated with transferring data TO Tarsnap (that is, data transfer into EC2/S3 is free of charge), and if you have to get a file out, the cost is the same whether you store files or "blocks" (I'm assuming the block disassembly is done on EC2).
So our hypothetical Tarsnap alternative could store file metadata in S3, keep recently modified files in S3 as well (for quicker retrieval), and push older files into Glacier (still retrievable when needed, and cheaper to store if, as is likely, they are never accessed).
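The storage policy proposed above boils down to an age-based rule. A minimal sketch, purely hypothetical: the `choose_tier` function, the `Tier` names and the 30-day `HOT_WINDOW` cutoff are assumptions for illustration, not anything Tarsnap or AWS defines.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class Tier(Enum):
    S3 = "s3"            # quicker retrieval, higher storage cost
    GLACIER = "glacier"  # slower retrieval, cheaper storage


# Hypothetical cutoff: files not modified within this window go to Glacier.
HOT_WINDOW = timedelta(days=30)


def choose_tier(is_metadata: bool, last_modified: datetime,
                now: Optional[datetime] = None) -> Tier:
    """Pick a storage tier for one object under the proposed mid-tier scheme."""
    now = now or datetime.now(timezone.utc)
    if is_metadata:
        # File metadata always stays in S3.
        return Tier.S3
    if now - last_modified <= HOT_WINDOW:
        # Recently modified files stay in S3 for quicker retrieval.
        return Tier.S3
    # Older files are pushed to Glacier: cheaper if never read again.
    return Tier.GLACIER
```

The single cutoff keeps the sketch simple; a real service would presumably tune it against actual S3 and Glacier pricing and retrieval fees.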
Transferring a whole file every time it changes will take marginally more time and bandwidth than transferring only the changed blocks (i.e., the parts of files that have changed), but I would bet that for 90% of users the (significant?) decrease in long-term storage cost would make that worthwhile.
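To put rough numbers on that trade-off, here is a minimal sketch that counts how many bytes one backup of a modified file would send under each strategy. It assumes fixed-size blocks compared by position; `bytes_sent` and the 4096-byte block size are illustrative choices, not how Tarsnap actually splits data.

```python
def bytes_sent(old: bytes, new: bytes, block_size: int = 4096) -> tuple[int, int]:
    """Return (whole_file_bytes, changed_block_bytes) for one modified file.

    Blocks are compared position by position, which is a simplification:
    an insertion near the start of a file shifts every later block.
    """
    whole_file = len(new)
    changed = 0
    for offset in range(0, len(new), block_size):
        new_block = new[offset:offset + block_size]
        old_block = old[offset:offset + block_size]
        if new_block != old_block:
            changed += len(new_block)  # only this block would be re-sent
    return whole_file, changed
```

For a 10,000-byte file with a single changed byte at offset 5000, whole-file transfer sends 10,000 bytes while the block approach sends one 4,096-byte block.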
If you read the article, you don't have to assume (incorrectly) that the client sends the whole file for each backup. Indeed, a significant motivation is to avoid sending very large files over client-side links whose bandwidth is constrained (or costly).
> I currently have about 1500 such archives stored. Instead of uploading the entire 38 GB — which would require a 100 Mbps uplink, far beyond what Canadian residential ISPs provide — Tarsnap splits this 38 GB into somewhere around 700,000 blocks, and for each of these blocks, Tarsnap checks if the data was uploaded as part of an earlier archive.
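The block-level check described in the quote can be sketched as follows. This is a simplification under stated assumptions: it uses fixed-size blocks and SHA-256 digests held in an in-memory set, whereas Tarsnap itself splits data into variable-size blocks; `split_blocks`, `upload_new_blocks` and `BLOCK_SIZE` are hypothetical names. As a sanity check on the quoted figures, 38 GB across roughly 700,000 blocks works out to an average of about 54 KB per block.

```python
import hashlib
from typing import Iterator

# Hypothetical fixed block size; Tarsnap uses variable-size blocks instead.
BLOCK_SIZE = 64 * 1024


def split_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> Iterator[bytes]:
    """Yield consecutive fixed-size blocks (the last one may be shorter)."""
    for offset in range(0, len(data), block_size):
        yield data[offset:offset + block_size]


def upload_new_blocks(data: bytes, stored: set[str],
                      block_size: int = BLOCK_SIZE) -> int:
    """Send only blocks whose digest is not already stored; return bytes sent."""
    sent = 0
    for block in split_blocks(data, block_size):
        digest = hashlib.sha256(block).hexdigest()
        if digest in stored:
            continue  # already uploaded as part of an earlier archive
        stored.add(digest)  # stand-in for an actual network upload
        sent += len(block)
    return sent
```

Running it twice on the same data sends everything the first time and nothing the second; changing one byte re-sends only the block containing it.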
"There's no bandwidth cost associated with transferring data TO Tarsnap"
Except for those of us whose outbound bandwidth isn't free and/or unlimited. It makes no difference what Amazon charges Tarsnap for it: I can't upload all of my 128G SSD every hour. I just don't have the bandwidth to do it, and if I _did_ have the bandwidth, it'd probably send me broke pretty quickly (or have my ISP throttle me or cut me off).
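The back-of-the-envelope arithmetic is simply sustained uplink = bytes × 8 / seconds. A small helper, assuming decimal gigabytes (10^9 bytes) and a one-hour backup interval; `required_mbps` is a hypothetical name.

```python
def required_mbps(gigabytes: float, interval_seconds: float = 3600.0) -> float:
    """Sustained uplink (megabits/s) needed to send `gigabytes` every interval."""
    bits = gigabytes * 1e9 * 8  # decimal GB -> bits
    return bits / interval_seconds / 1e6


print(f"38 GB/hour:  {required_mbps(38):.0f} Mbps")   # the article's archive size
print(f"128 GB/hour: {required_mbps(128):.0f} Mbps")  # a full 128 GB SSD
```

That prints roughly 84 Mbps and 284 Mbps: the first is in the same range as the 100 Mbps figure quoted from the article, and the second is well beyond typical residential uplinks.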
It's not a Tarsnap cost, but it can be a Tarsnap user cost, and I suspect cperciva considers that just as much a "real cost" as actual monetary expenses to Tarsnap.
The point of a dedupe-based service like Tarsnap is that while logically you upload the whole disk, physically only the changed blocks are sent to Tarsnap.
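One way to picture "logically the whole disk, physically only the changed blocks" is that each archive is just an ordered list of block digests, while the block store holds every distinct block once. A minimal sketch, assuming fixed-size blocks and an in-memory dictionary as the store; `BlockStore`, `backup` and `restore` are hypothetical names, not Tarsnap's API.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tiny, purely so the example is easy to follow


class BlockStore:
    """Maps SHA-256 digest -> block; each distinct block is kept once."""

    def __init__(self) -> None:
        self.blocks: Dict[str, bytes] = {}
        self.bytes_uploaded = 0

    def backup(self, data: bytes) -> List[str]:
        """Return the archive manifest; upload only blocks not seen before."""
        manifest = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block  # physically sent once
                self.bytes_uploaded += len(block)
            manifest.append(digest)  # logically, every block is in the archive
        return manifest

    def restore(self, manifest: List[str]) -> bytes:
        """Reassemble the full data from the manifest's block references."""
        return b"".join(self.blocks[d] for d in manifest)
```

Backing up `b"AAAABBBBCCCC"` and then `b"AAAABBBBDDDD"` yields two complete three-block manifests, but only 16 bytes are uploaded in total because the second archive reuses the first two blocks.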