As the SNEEP project continues to take form I have begun work on the tagging component. I will avoid the various arguments about the best way to scope tags (repository-based vs third party). The nature of the SNEEP project suggests that developing tags as a repository-based plugin is the way to go. In my view this is the right approach as long as issues such as exporting to, and preferably interacting with, third party tagging sites are addressed.
Tagging has been recognised as a vital part of any self-respecting set of web2.0 extensions. We’ve already had requests for tags from the Linnean Society, who would like to see them alongside the prototype comments and bookmarks already on their site. Consensus on the importance and usefulness of tags was one of the (many) themes to arise from the EPrints & Web2.0 pow wow.
It was also suggested that tags can be and are used in varying ways (eleven apparently!). For example bookmarks can be thought of as a tag under which an arbitrary set of references to things can be stored for the use of a single person. If bookmarks are implemented as a ‘view’ of the tagging system they immediately gain all the extra good things that tags offer (sharing, etc). Other uses mentioned were ‘rating’ (by tagging an item as ‘good’, ‘bad’, ‘terrible’) or even as a way of organising tasks (‘to read’, ‘to buy’, ‘to review’).
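To make the "bookmarks as a view" idea concrete, here is a minimal sketch in Python. It is not SNEEP code: the `Tagging` record, the reserved tag name `bookmark` and both function names are assumptions made up for illustration. The point is only that a bookmark list and a shared tag listing can be two filters over the same data.

```python
from collections import namedtuple

# Hypothetical record: who applied which tag to which item.
Tagging = namedtuple("Tagging", "user tag item")

# Hypothetical reserved tag name used to mark bookmarks.
BOOKMARK_TAG = "bookmark"


def bookmarks(taggings, user):
    """A user's bookmarks: the items that this user tagged with BOOKMARK_TAG."""
    return sorted(
        t.item for t in taggings if t.user == user and t.tag == BOOKMARK_TAG
    )


def items_tagged(taggings, tag):
    """Shared view: every item carrying the tag, regardless of who applied it."""
    return sorted({t.item for t in taggings if t.tag == tag})


taggings = [
    Tagging("alice", "bookmark", "eprint/12"),
    Tagging("alice", "to read", "eprint/12"),
    Tagging("bob", "bookmark", "eprint/40"),
    Tagging("bob", "good", "eprint/12"),
]
```

With this data, `bookmarks(taggings, "alice")` gives only Alice's own list, while `items_tagged(taggings, "bookmark")` gives everything anyone has bookmarked. Rating tags ('good', 'bad') and task tags ('to read') would fall out of the same filter.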
With this in mind I want to make sure that the SNEEP.tags database structure is as flexible, extensible and scalable as possible. A bit of effort on the DB side of things should hopefully mean that when it comes to producing views and interfaces for the tags the various uses can be easily accommodated and if appropriate (as in the case of bookmarks) re-branded as a new feature.
One resource on database design for tags is this webinar. I haven’t seen it in action, as it is rather awkward to get hold of a Linux version of the Webex software needed to view it. I have attached the slides, which seem to give a good gist. It is written from the point of view of MySQL, so it is very much about how to get the most out of the DB no matter the amount of content.
The examples in the slides relate to a tagging system for a blog, so tags are related to posts. This approach, tagging elements by id in a closed system, is not difficult to map to EPrints, where tags can be related to items (eprints) or indeed anything with a unique id (documents, comments, users).
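As a rough sketch of how tagging by id might stretch to several item types, the Python below builds a small schema with the standard `sqlite3` module (standing in for MySQL so the example is self-contained). The table names, column names and helper functions are all assumptions for illustration; they are not taken from the slides or from SNEEP.

```python
import sqlite3

# Hypothetical schema: one row per (tag, item, user), where item_type says
# which kind of EPrints object item_id refers to.
SCHEMA = """
CREATE TABLE tag (
    tag_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE
);
CREATE TABLE tagging (
    tag_id    INTEGER NOT NULL REFERENCES tag(tag_id),
    item_type TEXT    NOT NULL,   -- e.g. 'eprint', 'document', 'comment', 'user'
    item_id   INTEGER NOT NULL,
    user_id   INTEGER NOT NULL,
    PRIMARY KEY (tag_id, item_type, item_id, user_id)
);
"""


def open_db():
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_tag(conn, name, item_type, item_id, user_id):
    """Attach a tag to an item, creating the tag row on first use."""
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (name,))
    (tag_id,) = conn.execute(
        "SELECT tag_id FROM tag WHERE name = ?", (name,)
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO tagging VALUES (?, ?, ?, ?)",
        (tag_id, item_type, item_id, user_id),
    )


def items_for_tag(conn, name, item_type=None):
    """List distinct (item_type, item_id) pairs for a tag, optionally one type."""
    sql = (
        "SELECT g.item_type, g.item_id FROM tagging g "
        "JOIN tag t ON t.tag_id = g.tag_id WHERE t.name = ?"
    )
    params = [name]
    if item_type is not None:
        sql += " AND g.item_type = ?"
        params.append(item_type)
    sql += " GROUP BY g.item_type, g.item_id ORDER BY g.item_type, g.item_id"
    return conn.execute(sql, params).fetchall()
```

Because the item type is just another column, a lookup across all types is the same query with the type filter left off, for example `items_for_tag(conn, "botany")` versus `items_for_tag(conn, "botany", "eprint")`.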
One concern I have with this approach is that searches which combine the various item types might be difficult. My approach so far has been to maintain an ‘all’ counter in the stats table. Any thoughts on the merits or otherwise of the approach outlined here would be welcome.
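For anyone wanting to comment, here is one way an 'all' counter could work, sketched in Python with `sqlite3`. This is a guess at the general shape rather than the actual SNEEP table: the `tag_stats` layout, the `uses` column and the function names are all hypothetical. Each use of a tag bumps two rows, one for the item type and one for a pseudo-type called `all`, so a cross-type count is a single row lookup.

```python
import sqlite3


def open_stats_db():
    """Create an in-memory database with a hypothetical per-type stats table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE tag_stats (
               tag       TEXT    NOT NULL,
               item_type TEXT    NOT NULL,  -- a real type, or the pseudo-type 'all'
               uses      INTEGER NOT NULL DEFAULT 0,
               PRIMARY KEY (tag, item_type)
           )"""
    )
    return conn


def record_use(conn, tag, item_type):
    """Increment the counter for this item type and the combined 'all' counter."""
    for bucket in (item_type, "all"):
        # Make sure the row exists, then increment it.
        conn.execute(
            "INSERT OR IGNORE INTO tag_stats (tag, item_type, uses) VALUES (?, ?, 0)",
            (tag, bucket),
        )
        conn.execute(
            "UPDATE tag_stats SET uses = uses + 1 WHERE tag = ? AND item_type = ?",
            (tag, bucket),
        )


def uses(conn, tag, item_type="all"):
    """Read one counter; defaults to the cross-type total."""
    row = conn.execute(
        "SELECT uses FROM tag_stats WHERE tag = ? AND item_type = ?",
        (tag, item_type),
    ).fetchone()
    return row[0] if row else 0
```

The trade-off is an extra write per tagging in exchange for cheap combined reads. The counter only helps with counts (tag clouds, popularity), though; listing the matching items across types still needs a query over the tagging data itself.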