Tagging files
2019-04-23
Derek Zhou
In my previous blog post about homedir, I talked about how I organized my home dir into a flat tree, with many files in the same directory for easy handling with command line tools. However, to stay organized I have to be able to do some taxonomy on all the files I have, so they can be easily found for a certain purpose. Traditionally, a tree-like structure is used; however, this is very restrictive, and it is also the method I am moving away from. It would be nice to attach arbitrary tags (short strings) to each individual file, and to have an easy way to map tags to files and vice versa. The question is: how?
Survey of methods
Here are the methods that I tried, with unsatisfactory results.
Internal tagging
Internal tagging embeds the tag strings inside the file itself. This method is used in many applications:
- Images can be tagged with EXIF tags
- Music files can be tagged with ID3 tags
- A text file can be tagged with strings in a comment
The good thing is that the tags live inside the file, so you will never lose them or mess them up unless the tagged file itself is lost or messed up. However, it does have its own problems:
- There is no uniform way to access the tags
- Some files don't have an easy way to attach tags
- Sometimes you don't want to alter the content of the file at all when you add/change/remove tags
External tagging
External tagging stores the tags, and the association between files and tags, outside the files, possibly in a database or in some flat files. The content of the files is never changed, and there can be a uniform way to query the tags and the associations. On the other hand, it also has its own problems:
- Files could be moved or deleted, leaving the tagging info in the database stale
- It feels too heavyweight; some tools are required to query and manipulate the tagging info
Tagging with symlinks
The method that I settled on is a kind of external tagging. However, instead of using a database, or hacking up a database-like flat file storage, I use a simple scheme based on maintaining symlinks. In each project where I want tagging, I create a dir under the project root called "tags". Inside, I have sub dirs, each named after a tag string that I want to use for this project. In each tag dir, I have nothing but symlinks that link to the real files. Let me give you a concrete example. Say I have these files under the docs dir:
$ ls docs/
progress_report.odt resume.odt
To tag "progress_report.odt" as "work" and "resume.odt" as "personal", I do:
$ mkdir -p tags/work tags/personal
$ cd tags/work
$ ln -s ../../docs/progress_report.odt .
$ cd ../personal
$ ln -s ../../docs/resume.odt .
That's it.
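The steps above can be wrapped in a small shell function. This is only a minimal sketch: the name tag_file is hypothetical, and it assumes you run it from the project root and pass the file as a path relative to that root.

```shell
# Hypothetical helper: tag_file TAG FILE
# Assumes the current directory is the project root (the one that holds
# "tags") and that FILE is a path relative to that root, e.g. docs/resume.odt.
tag_file() {
    tag=$1
    file=$2
    # Create the tag directory on first use.
    mkdir -p "tags/$tag" || return 1
    # A tag directory sits two levels below the project root, so the
    # relative link target is "../../" followed by the file's path.
    ln -s "../../$file" "tags/$tag/$(basename "$file")"
}
```

With this, "tag_file work docs/progress_report.odt" creates the same link as the manual commands.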
Working with symlink tags
Finding the files with a given tag is super easy: you just do an "ls" in the desired tag directory.
Finding the tags that a file is associated with is also easy enough. You do a "find" command:
$ find tags -name "progress_report.odt"
tags/work/progress_report.odt
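Note that the find command above matches by file name only, so two files with the same name in different directories would be mixed up. Here is a minimal sketch of a stricter lookup. The name tags_of is hypothetical, it assumes you run it from the project root, and it relies on the "-ef" test operator, which is widely supported by shells but is not required by POSIX.

```shell
# Hypothetical helper: tags_of FILE
# Prints the name of every tag directory that holds a symlink resolving
# to FILE. Assumes the current directory is the project root.
tags_of() {
    file=$1
    name=$(basename "$file")
    for link in tags/*/"$name"; do
        # "-ef" is true only when both paths resolve to the same file,
        # so same-named files in other directories are skipped.
        if [ -L "$link" ] && [ "$link" -ef "$file" ]; then
            basename "$(dirname "$link")"
        fi
    done
}
```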
From time to time, you want to make sure the tags and files are consistent with each other, i.e. to find all dangling links. That is also easy:
$ find tags -xtype l
This command will find all dangling links for you. The inconsistency arises when you delete or move the files. However, in my new home dir organization I rarely need to move a file to another location. If I do, then I need to fix up the links by hand.
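The "-xtype" test is a GNU find extension. If your find lacks it, the following sketch should give the same list using only standard find features. The function name dangling_tags is hypothetical, and it assumes you run it from the project root.

```shell
# Hypothetical helper: list symlinks under "tags" whose target is missing.
# "test -e" follows the symlink, so it fails exactly for dangling links;
# the "!" negates that result and "-print" reports the link.
dangling_tags() {
    find tags -type l ! -exec test -e {} \; -print
}
```

This runs one "test" process per link, which is slower than "-xtype l" but fine for a few hundred tags.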
Emacs comes to help
Emacs has an excellent built-in file manager called dired. With dired, linking, deleting links, and especially fixing up links can be much easier than doing it from the command line. One especially useful feature of dired is "writable" dired, or wdired: with the hot key "C-x C-q", the dir listing is turned into an editable buffer, and you have all the Emacs editing power to change the file names or the files the links point to.
Git integration
The tag symlinks are regular symlinks, which git already handles very well. To make sure everything works, one rule has to be followed: all the symlinks have to be relative symlinks that link to files within the same project. This way the project can be moved, or cloned with "git clone" to another place or another computer, with all links still valid.
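A quick way to check the first half of that rule before committing is to look for absolute link targets. This is a minimal sketch: the name absolute_tags is hypothetical, it assumes you run it from the project root, and it uses readlink, which is widely available. It does not check whether a relative link escapes the project.

```shell
# Hypothetical helper: list symlinks under "tags" whose stored target
# is an absolute path (starts with "/"), which breaks the relative rule.
absolute_tags() {
    find tags -type l | while IFS= read -r link; do
        case $(readlink "$link") in
            /*) printf '%s\n' "$link" ;;
        esac
    done
}
```

No output means every tag link is relative.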
Stowed packages
A package can also be stowed, together with all its symlink tags, if you stick to the rule of always using relative symlinks. Many stowed packages can be virtually merged into a mega package, with all tags merged as well.
Conclusion
I manage external tag info with nothing but day-to-day command line tools. Dired can be a boost in productivity, but it is not a must. The tags can be searched, merged, and maintained with ease.