
Well, 1) is zip with compression into a single file, 2) is zip without compression into multiple files. You can also combine the two. And in all cases, you need a container format.

The tasks are related enough that I don't really see the problem here.



I meant that they should be separate tools that can be piped together. For example: you have 1 directory of many files (1 GB in total)

`zip -r out.zip dir/`

This results in a single out.zip file that is, let's say, 500 MB (2:1 compression)

If you want to shard it, you have a separate tool, let's call it `shard`, that works on any byte stream:

`shard -I out.zip -O out_shards/ --shard_size 100Mb`

This results in `out_shards/1.shard, ..., out_shards/5.shard`, 100 MB each.

And then you have the opposite: `unshard` (back into 1 zip file) and `unzip`.

No need for 'sharding' to exist as a feature in the zip utility.

And... if you want only the shards from the get-go, without the intermediate single-file archive, you can do something like:

`zip -r - dir/ | shard -O out_shards/`

Now, these can be copied to floppy disks (as discussed above) or sent over the network, etc. The main thing here is that the sharding tool works on bytes only (it doesn't know whether it's an mp4 file, a zip file, a txt file, etc.) and does no compression, while the zip tool does no sharding but optimizes compression.
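
To make the "works on bytes only" point concrete, here is a minimal Python sketch of what such a tool could do. The names `shard` and `unshard` mirror the hypothetical commands above; they are not real utilities, and the 1000-byte shard size is just for the demo.

```python
import io
from typing import BinaryIO, Iterable, Iterator


def shard(stream: BinaryIO, shard_size: int) -> Iterator[bytes]:
    """Yield consecutive chunks of at most shard_size bytes.

    The stream is treated as opaque bytes: no parsing, no compression.
    Assumes a buffered stream whose read(n) returns n bytes until EOF.
    """
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    while True:
        chunk = stream.read(shard_size)
        if not chunk:
            break
        yield chunk


def unshard(shards: Iterable[bytes], out: BinaryIO) -> None:
    """Concatenate shards, in order, back into a single byte stream."""
    for piece in shards:
        out.write(piece)


# Demo on arbitrary bytes (could just as well be a zip, an mp4, a txt file).
data = bytes(range(256)) * 10  # 2560 bytes
pieces = list(shard(io.BytesIO(data), 1000))

restored = io.BytesIO()
unshard(pieces, restored)
```

Every shard except the last is exactly `shard_size` bytes, and joining them in order reproduces the input byte for byte.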


On Unix, that tool is split https://en.wikipedia.org/wiki/Split_(Unix) (and its companion cat).

The problem is that DOS (and Windows) didn't have the Unix philosophy of tools that each do one thing well, and you couldn't depend on the necessary small tools being available. Thus, each compression tool also included its own file spanning system.

https://en.wikipedia.org/wiki/File_spanning


The key thing that you get by integrating the two tools is the ability to more easily extract a single file from a multipart archive. Instead of having to reconstruct the entire file, you can look in the part/diskette with the index to find out which other part/diskette you need to get at the file you want.
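
A rough Python illustration of that lookup, under some assumptions: the archive is cut into fixed-size parts, and the index (the ZIP central directory) gives each entry's byte offset. Real spanned ZIPs record a disk number per entry instead, and Python's `zipfile` does not handle multi-disk archives, so this only mimics the idea. `SHARD_SIZE` and `first_shard_of` are made-up names.

```python
import io
import zipfile
from typing import BinaryIO

SHARD_SIZE = 64  # hypothetical part size, tiny so the demo spans several parts

# Build a small uncompressed archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("a.txt", b"A" * 100)
    zf.writestr("b.txt", b"B" * 100)
    zf.writestr("c.txt", b"C" * 100)


def first_shard_of(archive: BinaryIO, name: str, shard_size: int) -> int:
    """Return the 0-based index of the part where the entry's data begins.

    Only the central directory is read; no entry is decompressed.
    The entry may continue into later parts.
    """
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(name)
    return info.header_offset // shard_size


for entry in ("a.txt", "b.txt", "c.txt"):
    print(entry, "starts in part", first_shard_of(buf, entry, SHARD_SIZE))
```

A byte-only splitter can't offer this, because it has no idea that the last part contains an index at all.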


Don't forget that with this two-step method, you also require enough disk space to hold the entire ZIP archive before it's sharded.

AFAIK you can create a ZIP archive saved to floppy disks even if your source hard disk has low/almost no free space.
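
For what it's worth, the piped variant avoids the intermediate file too, as long as the archiver can write to a non-seekable stream. A sketch in Python, assuming a hypothetical `ShardWriter` sink that hands each full shard to a callback (here it just appends to a list; in the floppy scenario it would write to the next disk):

```python
import io
import zipfile
from typing import Callable


class ShardWriter:
    """Write-only, non-seekable sink that emits fixed-size shards as they fill."""

    def __init__(self, shard_size: int, emit: Callable[[bytes], None]) -> None:
        self.shard_size = shard_size
        self.emit = emit
        self.buffer = bytearray()

    def write(self, data: bytes) -> int:
        self.buffer += data
        # Hand off every complete shard; keep only the remainder in memory.
        while len(self.buffer) >= self.shard_size:
            self.emit(bytes(self.buffer[: self.shard_size]))
            del self.buffer[: self.shard_size]
        return len(data)

    def flush(self) -> None:
        pass  # shards are emitted only when full, or on finish()

    def finish(self) -> None:
        """Emit the final, possibly short, shard."""
        if self.buffer:
            self.emit(bytes(self.buffer))
            self.buffer.clear()


shards: list = []
sink = ShardWriter(shard_size=256, emit=shards.append)

# zipfile falls back to streaming mode because the sink has no tell()/seek().
with zipfile.ZipFile(sink, "w", zipfile.ZIP_DEFLATED) as zf:
    for i in range(5):
        zf.writestr(f"file{i}.txt", (f"line {i}\n" * 200).encode())
sink.finish()

# Joining the shards in order gives back a readable archive.
with zipfile.ZipFile(io.BytesIO(b"".join(shards))) as zf:
    names = zf.namelist()
    first = zf.read("file0.txt")
```

The sink never holds much more than one shard plus the latest write, so the full archive is never stored in one place. The trade-off is the one mentioned above: the parts are not independently meaningful.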

Phil Katz (creator of the ZIP file format) had a different set of design constraints.


The problem seems to be that each individual split part is valid in itself. This means that the archive as a whole, with its central directory at the end, can diverge from the individual entries. That is the original issue.



