Sunday, September 30, 2012

Tag based vs hierarchical file system structure

Imagine your hard disk, home drive, My Documents folder or by extension your mailbox. You probably have some kind of hierarchical structure sorting your holiday pictures and bank statements in separate folders, instead of keeping them all in the same folder.
But the hierarchy of those folders will probably look remarkably the same : a folder for each year, maybe divided into folders per month, with some kind of self-chosen structure, based on the name of your bank, the type of account, or where you went on holiday.
Now imagine that you have a savings account with two different banks. You will probably store the statements of these accounts in a hierarchical folder structure. So you will end up storing the statements of the first account with bank x in this folder :


\banks\bank_x\savings\year\month

And those of bank y in this folder :

\banks\bank_y\savings\year\month

They look remarkably the same, don't they, apart from the bank name.
Now add a folder structure for your mobile phone and land line invoices and those for gas and electricity, and you will get this kind of structure :

\invoices\telco\company_x\mobile\year\month
\invoices\telco\company_y\landline\year\month
\invoices\energy\company_w\gas\year\month
\invoices\energy\company_z\electricity\year\month

Do you see a pattern emerging?

Now imagine you are paying your invoices and are trying to get an overview of your spending and income. You start in your bank account folder and go down the year/month structure to get to this month's statements. Next you want to have a look at an invoice for the same month, but to reach this folder you need to go all the way up to the root folder, then down the structure of the invoices and companies, and finally to the folder for the current year and month.

Wouldn't it be convenient to turn this hierarchy around, and sort it this way :

\year\month\service_x\company_y\type_z

But I guess you would probably want both structures? Depending on what you are trying to find, you would prefer one structure over another. Or maybe there is a third or fourth structure that could be convenient.

Now imagine that instead of dropping a file in a folder structure, be it an invoice, bank statement, holiday picture or love letter, you would assign tags to it. In the case of an invoice from your telco for your mobile phone, it would probably get these tags :

telco_x, mobile, invoice, year, month

Tag based filtering


Now, imagine there is a filtering system. By filtering on the tags year and month, you will get an overview of the invoices and bank statements of that year/month. No need to go up and down folder structures to reach the documents of the same month.
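A minimal sketch of such a filter, assuming every file simply carries a flat set of tags. The file names, the tag values and the filter_by_tags helper are made up for illustration; a real tag based file system would store and query tags differently.

```python
# Hypothetical documents, each with a flat set of tags.
FILES = {
    "statement_bank_x.pdf": {"bank_x", "savings", "statement", "2012", "september"},
    "invoice_telco_x.pdf": {"telco_x", "mobile", "invoice", "2012", "september"},
    "invoice_energy_w.pdf": {"company_w", "gas", "invoice", "2012", "august"},
}


def filter_by_tags(files, wanted):
    """Return the names of all files that carry every tag in `wanted`."""
    wanted = set(wanted)
    # `wanted <= tags` is true when `wanted` is a subset of the file's tags.
    return sorted(name for name, tags in files.items() if wanted <= tags)


# Everything from September 2012, regardless of bank, telco or document type.
september = filter_by_tags(FILES, {"2012", "september"})
print(september)
```

Filtering on 2012 and september returns the bank statement and the telco invoice together, without walking up and down any folder tree.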

Custom tag based hierarchies

If you define a hierarchy based on these tags, you can still display your documents in a hierarchical way. Define a different hierarchy and the same documents show up in another structure, without having to copy them to a second folder structure or create symbolic links or shortcuts.
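One way this could work, as a rough sketch: a hierarchy is nothing more than an ordered list of tag categories, and the folder path of a document is computed from its tags. The sketch assumes that each tag is stored as a category/value pair (year = 2012, company = bank_x), which is a choice made here for illustration. The document names and the virtual_paths helper are hypothetical as well.

```python
# Hypothetical documents; each tag is stored as a category -> value pair so
# that a hierarchy can say which category forms which folder level.
DOCUMENTS = {
    "invoice_0912.pdf": {"year": "2012", "month": "09", "service": "telco",
                         "company": "company_x", "type": "mobile"},
    "statement_0912.pdf": {"year": "2012", "month": "09", "service": "banks",
                           "company": "bank_x", "type": "savings"},
}


def virtual_paths(documents, hierarchy):
    """Build one virtual path per document from an ordered list of categories."""
    paths = []
    for name, tags in documents.items():
        # A document lacking a category ends up in an "untagged" folder.
        folders = [tags.get(category, "untagged") for category in hierarchy]
        paths.append("\\" + "\\".join(folders + [name]))
    return sorted(paths)


# The same documents shown in two different hierarchies; nothing is copied.
by_service = virtual_paths(DOCUMENTS, ["service", "company", "type", "year", "month"])
by_date = virtual_paths(DOCUMENTS, ["year", "month", "service", "company", "type"])

for path in by_service + by_date:
    print(path)
```

Both listings are views computed from the same tags, so adding a third or fourth hierarchy only means passing a different list of categories.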

Creating extra tags

And if at some point you need an extra tag, e.g. you want to make a distinction between the mobile invoices of every family member, just add a tag with their names to the invoices. No need to create a separate folder structure for every family member.
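As a small illustration, assuming the tags of each file are kept in a plain set (the file names and the family members alice and bob are invented for this example):

```python
# Hypothetical mobile invoices, tagged with telco, type, year and month.
invoices = {
    "mobile_0912_a.pdf": {"telco_x", "mobile", "invoice", "2012", "september"},
    "mobile_0912_b.pdf": {"telco_x", "mobile", "invoice", "2012", "september"},
}

# Adding a tag only changes the tag set of a file; no file is moved or copied.
invoices["mobile_0912_a.pdf"].add("alice")
invoices["mobile_0912_b.pdf"].add("bob")

# The new tag can be combined with the existing ones right away.
alice_mobile = sorted(
    name for name, tags in invoices.items() if {"alice", "mobile"} <= tags
)
print(alice_mobile)
```

The existing tags stay untouched, so every earlier filter or hierarchy keeps working as before.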

Conclusion

So you end up with a system where you can still have your preferred hierarchical folder structure. But you can just as easily have other custom folder structures, and you have the flexibility to filter files and documents in any way you like, as long as you have assigned the relevant tags to them.

Application

Systems of this kind exist, e.g. Gmail gives you the ability to assign tags to E-mails. These tags are then used to sort your E-mails. An E-mail from a colleague about a sports activity at work could get the tags work and sports, while a mail from your club could also have the tag sports.
When looking in the tag based folder sports you will see both mails, while looking in the folder work, you will get only the mail from your colleague.
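The behaviour of such tag based folders can be sketched with an inverted index that maps every tag to the mails carrying it. This is only an illustration of the idea, not a description of how Gmail works internally; the mail descriptions and the build_tag_folders helper are made up.

```python
from collections import defaultdict

# Hypothetical mails with the tags from the example above.
MAILS = {
    "mail from colleague": {"work", "sports"},
    "mail from club": {"sports"},
}


def build_tag_folders(mails):
    """Map every tag to the set of mails carrying it (an inverted index)."""
    folders = defaultdict(set)
    for mail, tags in mails.items():
        for tag in tags:
            folders[tag].add(mail)
    return dict(folders)


folders = build_tag_folders(MAILS)
print(sorted(folders["sports"]))  # both mails
print(sorted(folders["work"]))    # only the mail from the colleague
```

A mail with two tags simply appears in two folders, while it is stored only once.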

Tag based file systems

Some examples, in no particular order :

4 comments:

isync said...

Fuse based TagLayer might be interesting for you.

Don M said...

Nice explanation. Can I have both, say a partially tag-based hierarchy?

I understand the point of driving hierarchy from tagging but:
Unlike with a hierarchical approach, you need to know the tag values to start looking. Six months after saving a file I may not remember any of the tags. A hierarchy will remind me. Sure it is not neat but I at least know where to start looking. Also there is the operational risk of the unknown unknown. If I fail to use the right keywords my results may show what appears to be a complete list with one small but crucial item missing.

Therein lies the issue. There is no hierarchy to tags. Or maybe there is but I have not been able to find a good approach in this regard. I would prefer a visual hierarchy driven, perhaps, off of required top-level tags and then have additional tags to bucket items further out.

I am speaking now of personal or corporate drives/intranet sites not of WWW.

BR
Don

Dieter Adriaenssens said...

Good point, Don.
Using a tagcloud might help? Or some kind of filter that shows the tags that are used on files that match your current set of tags?
E.g. you start searching by 'invoice', and then you start typing 'elec' and an autocomplete function suggests 'electricity'.

Uberdork said...

https://code.google.com/p/dhtfs/

My two cents....

I have to say though, that it would be difficult to do this kind of archiving for all files.
It would seem more fitting to be able to use this type of file system only for user files.
I really can't see how to manage tagging the millions of files residing in a modern PC's file system....