Implementing Tags in a Database??

I was considering why del.icio.us doesn’t offer a cloud for tag based queries, while it does offer the cloud for User based queries, and I received the reply that it was (probably) related to query cost, which set me thinking. What would the del.icio.us database look like? The rest of this blog article looks at how one might model the bookmarks and give a clue as to how one might implement it.

How about this?

a bookmark data model

Is this right, look at the para below, the requirement that a tag is used by someone to describe the bookmark is lost in above version of the model. Surely we have only three entities, User, Tag & Bookmark (also known as URL). Both User & Tag have a many to many relationship with each Bookmark. We can resolve these many2manies in two ways, by using an allocation table, or by adopting a meta model model.

The data model must take into account the fact that a User describes a bookmark using one or more tags. i.e. DaveLevy describes http://www.guardian.co.uk as (UK web World News politics). Other people have used different tags, although only a few associated it with journalism. Each of these user, bookmark, tag relationships must be stored separately; otherwise we loose the user’s relationship and ownership of the tag set. This implies/mandates that somewhere a user/bookmark/tag database object (either table or index is required). This means that the bookmark entity must be related to the user by an allocation. This is required to implement the many-to-many, but also to allow the tags to be owned by a user. The tag must also have an allocation between the Bookmark & itself but is owned by the defining user i.e. again we must have a User/Bookmark/Tag intersection entity. The model immediately below meets this requirement, although implementing the Tag attribute as a membership of the tag – user/bookmark relationship and implementing it as a foreign key is also a possibility.

Optimised ERD for a bookmarks file

I have not documented the name of the relationship between tag, bookmark and their allocations. This becomes quite hard because we have transformed the entity into “operational masters” and they are likely to be implemented as indexes. The difficulty in naming the relationship between these entities and the allocation entities implies that we have modelled the problem well. Neither does the diagram above illustrate mandatory/optional attributes of the relationships. The Knows of relationship is optional. During the period between registration and the first bookmark, a User will have zero user/bookmark allocations. I suppose it is possible to enter a Bookmark without tagging it, which makes the Described by relationship optional as well. If either of the allocations exist though, their masters must exist.

It should be noted that any domain of definition can be applied to the bookmark entity (or at least its key). By stating that the bookmark must be (say) a roller article URL, we have a viable tag model for roller articles. In fact, I have considered opening a new del.icio.us account exclusively to act as a blog index and to provide a tag/cloud map for this blog.

I am next going to look at some queries and the relational algebra that can be applied to this model; the reason I developed the model was to examine the performance implications of different entry points.

ooOOOoo

Originally posted on my sun/oracle blog, and reposted in March 2016, the old blog doesn’t have the pictures.

1 Comments.

  1. This seems a bit long for the obviousness of the comments. The model consists of a list of bookmarks with foreign keys to their multiple tags and user. It’s an example of a problem that was the beginning of the end of the relational database.