assocfs note

While I'm still dabbling with fixing some SSS issues here and there I thought I'd post an old excerpt from assocfs design document.

It's a non-hierarchical filesystem - in other words, associative filesystem. It's basically a huge graph database. Every object is addressed by its hash (content addressable, like git), knowing the hash you can find it on disk. For more conventional searches (for those who does not know or does not care about the hash) there is metadata - attributes, drawn from an ontology and associated with a particular hashed blob.

This gives a few interesting properties:

It also has some problems:

Luckily, databases are a very well established field and building an efficient storage and retrieval on this basis is possible.

As a user you basically perform searches on attribute sets using something like humane representation of relational algebra. Blobs can have more conventional names, specified with extra attributes, for example: UNIX_PATH=/bin/bash UNIX_PATH=/usr/bin/bash allows single piece of code to be addressed by UNIX programs as both /bin/bash and /usr/bin/bash, without needing any symlinks.

You can assign absolutely any kind of attributes to blobs, the actual rules for assigning are specified in the ontology dictionary, which is part of the filesystem and grows together with it (e.g. installed programs may add attribute types to blobs). Security labels are also assigned to blobs that way.

Attributes "orient" blobs in filesystem space - without attributes the blobs are practically invisible, unless you happen to know their hash exactly. They also form a kind of semantic net between blobs giving a lot of information about their semantical meaning to the user and other subsystems.

Since recalculating hashes for entire huge files would be troublesome, the files are split up in smaller chunks, which are hashed independently and collected into a "record" object, similar to a git tree. Changing one chunk therefore requires rehashing only two much smaller objects, rather than entire huge file.

Changes to the filesystem may be recorded into a "change" object, similar to a git commit, which may be cryptographically signed and used for securely syncing filesystem changes between nodes.