
My point was that there don't need to be files anywhere to represent data at all. Computers can work entirely without files. For example, your whole hard drive could consist of an RDBMS, where you wouldn't "download" files but would instead "download" streams of tables, which would be imported directly as tables into the RDBMS.

"Files" are a very specific abstraction; thinking they're the only way to transfer chunks of data around is a symptom of lack of imagination. There are abstractions very similar to files, such as object-store objects. The only practical difference between files and objects is that updates to objects are transactional and atomic, so nobody ever sees an "in-progress" object. But an object store (backed by a block device) is a simpler system than a filesystem (backed by a block device).
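To make the "nobody ever sees an in-progress object" point concrete, here is a minimal sketch of a toy in-memory object store. It is not from this thread and not any real product's API; the class and method names (`ToyObjectStore`, `put`, `get`) are hypothetical. The only operations are whole-object put and get, so there is no equivalent of seek(2)/write(2) into a half-written object.

```python
import threading


class ToyObjectStore:
    """Hypothetical in-memory object store: whole-object put/get only."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._lock = threading.Lock()

    def put(self, key: str, data: bytes) -> None:
        # Snapshot the complete value first (bytes() copies mutable input),
        # then publish it in one step under the lock. A concurrent reader
        # sees either the old object or the new one, never a mix.
        value = bytes(data)
        with self._lock:
            self._objects[key] = value

    def get(self, key: str) -> bytes:
        # Returns the whole object; there is no partial or offset read/write API.
        with self._lock:
            return self._objects[key]
```

Usage: `store.put("notes.txt", b"v2")` replaces the entire object at once; the design choice is simply that "update" means "replace the whole value", which is what makes atomicity cheap.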

You can control the objects that represent your data. You could also control, via an RBAC-like system, the K-V or tuple-store records that represent your data. Or the Merkle-tree commits. Or the blockchain transactions. Or the graph edges. Or the live actor processes holding in-memory state. You can transfer all of these things around between computers, both by replication (copying) and by migration (moving). What do files have to do with any of this? An accident of history, that's what.



They are not the only way to transfer chunks of data.

They are the most successful and versatile mass way to transfer chunks of data and define ownership in the history of computing.

I’m sure an RDBMS or a graph DB can do those things as well. But no one has come close to doing it as effectively as files have, and many have tried. In fact, probably the greatest computer software failure of all time, Windows Longhorn, was largely a failure in trying to replace a file-based system with a graph-DB-based system.

People very much can imagine alternatives. There is no shortage of imaginable alternatives. There is a huge shortage of successful, in-use alternatives that are as versatile or effective as files.


You're focusing on "files" as they compare to things very different from them. But imagine for a moment what an OS with an object store in place of a filesystem would be like. Pretty much exactly the same, except that the scratch buffers backing temp files and databases wouldn't hang off the object-store "tree"; instead, they would either be anonymous (from mmap(2)) or be represented by a device node (i.e. a logical volume) rather than being objects themselves. All the freestanding read-only asset bundles, executables, documents, etc. would stay the same, since these were always objects being emulated under a filesystem to begin with.
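As an illustration of an "anonymous" scratch buffer, here is a minimal sketch using Python's standard `mmap` module. Passing `-1` as the file descriptor requests an anonymous mapping: memory that supports random-access reads and writes but has no name in any filesystem or object namespace. The helper name `scratch_roundtrip` is hypothetical, and this is in-memory only; a durable scratch buffer would need something like the device-node approach mentioned above.

```python
import mmap


def scratch_roundtrip(offset: int, payload: bytes, size: int = 4096) -> bytes:
    # fileno=-1 asks for an anonymous mapping: not backed by, or reachable
    # through, any path in a filesystem or object-store namespace.
    with mmap.mmap(-1, size) as scratch:
        scratch.seek(offset)       # random access, in the spirit of seek(2)
        scratch.write(payload)     # in-place overwrite, in the spirit of write(2)
        scratch.seek(offset)
        return scratch.read(len(payload))
```

Calling `scratch_roundtrip(100, b"partial state")` writes into the middle of the buffer and reads the same bytes back; when the mapping is closed, the memory simply goes away.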

And downloads would also be objects. When you think about it, at least over HTTP, downloads and uploads already are objects: the source doesn't get allocated a scratch buffer on the destination that it can then freely seek(2) around in and write(2) into. Instead, the source just streams a representation to the destination, which gets buffered until it's complete, and then a new object is constructed on the destination from that fully buffered local copy. (WebDAV introduces some file semantics into HTTP's representational object semantics, but it doesn't go all the way to letting you mount a DBMS over WebDAV.) Other protocols are similar (e.g. FTP, SMTP). Even BitTorrent works with objects, once you realize that the pieces of your files are the objects. Rsync is the only odd one out; it would really need to be reimplemented in terms of syscalls that allocate explicit, durable scratch buffers. (That, and SMB/NFS/AFP/etc., but those protocols have the explicit goal of exposing a share on a remote host as something with filesystem semantics, so you'd expect them to need filesystem support on the local machine.)
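The "buffer the stream until it's complete, then construct the object" flow can be approximated on an ordinary filesystem with the common write-to-temp-then-rename pattern. This is a minimal sketch, not how any particular HTTP client works; the function name `receive_object` and the `.incoming-` prefix are hypothetical. It relies on `os.replace`, which is an atomic rename on POSIX filesystems when source and destination are on the same filesystem (hence the temp file is created in the destination directory); Windows gives weaker guarantees.

```python
import contextlib
import os
import tempfile
from typing import Iterable


def receive_object(chunks: Iterable[bytes], dest_path: str) -> None:
    # Stage the incoming stream next to the destination so the final rename
    # stays within one filesystem.
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".incoming-")
    try:
        with os.fdopen(fd, "wb") as buf:
            for chunk in chunks:
                buf.write(chunk)        # append-only: no seeking around
            buf.flush()
            os.fsync(buf.fileno())      # make the bytes durable before publishing
        # Publish the complete object in one step; readers of dest_path see
        # either the previous version or the finished new one.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # An interrupted stream leaves no half-written object behind.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
```

If the stream fails midway (say, the connection drops), the staging file is discarded and `dest_path` is left untouched.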

Now, want to know something interesting? We already have this. Any inherently copy-on-write filesystem, like APFS or btrfs, is actually an object store masquerading as a filesystem. You get filesystem semantics, but they're layered on top of object-storage semantics, and it's more efficient to strip them away and use the object-storage semantics directly (as when using btrfs send/receive, or when telling APFS to directly clone a container). These filesystems also have exactly what I mentioned above: special syscalls (or in this case, file attributes) to allocate scratch buffers that bypass copy-on-write, for things like databases.

There's no reason that a modern ground-up OS (e.g. Google's Fuchsia) would need to use a filesystem rather than an object store. A constructive proof's already there that it can be done, just obscured a bit behind a need for legacy compatibility; a need that wouldn't be there in a ground-up OS design.

(Or, you can take as a constructive proof any "cloud native" unikernel design that just uses IaaS object/KV/tuple/document-storage service requests as its "syscalls", and has no local persistent storage whatsoever, never even bothering to understand block devices attached to it by its hypervisor.)


> "Files" are a very specific abstraction; thinking they're the only way to transfer chunks of data around is a symptom of lack of imagination.

I didn't say they were the only way to transfer chunks of data around. I said they were the units of ownership of data. If your data is somewhere in a huge RDBMS mixed together with lots of other people's data, you don't own it, because you don't control its fate; whoever owns and manages the RDBMS does. The same goes for all the other object control and storage systems you mention: individual people who have personal data don't own any of those things.



