Crowd-sourced labels, names and IA via external folksonomies

AGIMO - WPG Review Team
The Department of Finance Archive

The content on this page and other Finance archive pages is provided to assist research and may contain references to activities or policies that have no current application. See the full archive disclaimer.


We wanted to investigate the possibilities of attaching tags or folksonomies to our content. Our original Content Management System (CMS) didn’t have the ability to support internal tagging, so we decided to investigate options for externally-sourced tags. During some of our early tinkering, we explored what would be required to support a variety of ways to get tagging of content.

We examined the following four options:

  1. Extracting a 'feed' of tags associated with a given URL and using this as a way of providing additional navigation (e.g., “Your Tags”)
  2. Using a controlled, content author-selected vocabulary that encompasses most of Option one (“Our Tags”) – this required a change of CMS, which is now in scope
  3. Using a combination of Options one and two (“Our Tags” and a separate “Your Tags”)
  4. Examining existing user-generated tags from Option one to determine whether:
    • The existing classification vocabulary needs to be broadened
    • Page names and internal structures need to be refined
    • Pages need to be ‘clumped’ or ‘split’ differently
    • Query expansions need to be included (e.g. making sure searches for "Web2.0" automatically include "Web 2.0")
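The query expansion idea in the last bullet can be sketched roughly as follows. This is a minimal illustration only, not a description of our search engine's configuration: the function names (`normalise`, `build_expansions`, `expand_query`) and the variant group are hypothetical, and it assumes that variants of a phrase differ only in case, spacing, hyphens or underscores.

```python
import re


def normalise(term: str) -> str:
    """Lower-case a term and strip whitespace, hyphens and underscores,
    so that 'Web2.0', 'Web 2.0' and 'web-2.0' all compare equal."""
    return re.sub(r"[\s\-_]+", "", term.lower())


def build_expansions(variant_groups):
    """Map each normalised form to the full list of spellings to search for.

    variant_groups is a list of lists; each inner list holds spellings
    that should be treated as the same query (an assumed input format).
    """
    table = {}
    for group in variant_groups:
        spellings = sorted(set(group))
        for variant in group:
            table[normalise(variant)] = spellings
    return table


def expand_query(query: str, table) -> list:
    """Return every spelling to search for; unknown queries pass through."""
    return table.get(normalise(query), [query])


# Hypothetical expansion list containing the one example from the post.
TABLE = build_expansions([["Web2.0", "Web 2.0"]])

print(expand_query("Web2.0", TABLE))
print(expand_query("css", TABLE))
```

Normalising before the lookup means a concatenated tag only has to be listed once for every spacing variant of it to expand to the same set of search terms.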

We found that the first option, while technically straightforward, would require ongoing manual monitoring to ensure that tags comply with a moderation policy similar to the one on this blog. During our research, we discovered that most tags for our site were for the front page, with only a few tags for internal pages. With such a limited set of tags, we questioned how much value this option would offer users.

The second option would also require ongoing monitoring (checking vocabulary against terms actually used), although this could be scheduled to occur several times a year, with findings fed back into controlled, distributed vocabularies. It would also require a non-trivial once-off effort from content authors to tag existing content and to ensure that tags were applied consistently to new content.

Option three – seeking to distinguish between ‘official’ and ‘unofficial’ categories – would require all of the above to occur.

The last option was the most involved in the short term and the one we used as part of the testing process. It did increase our understanding of how our users perceive and define the content on our site.

Gleaning value from tags created by users

We primarily used Delicious to generate reports on its own holdings related to our content. The initial scan of the home page gave us an overview of which tags we could safely ignore:

  • web
  • guidelines
  • government
  • standards
  • australia
  • webdesign
  • agimo
  • accessibility

These terms – already widely used across the site (often directly in the domain name) – give effective results when used in our internal search as well as external search services. The middle and lower parts of the tail revealed a few more gems (remember, this is only from tags applied to the home page):

  • css
  • html
  • methodology
  • webdev
  • webstandards

These are tags that only generate a handful of results from the internal search. In the case of 'css', none of our content makes any reference to this very important web technology. Concatenated phrases may also be candidates for inclusion in our query expansion lists. Scanning through some of the tags applied to our frequently visited content revealed a few more previously unidentified tags:

Not all the tags we discovered were quite as useful. We chose to class some tags as "probably very important to the individual who tagged it, but unlikely to be helpful for our general audience". Some of these included:

  • net25
  • 4.2
  • 5.2
  • govx

In the end, we felt that implementing Option one would provide only minimal value. The limited amount of extra information gleaned was not really worth the effort required in the long run. However, tapping into existing external resources and comparing their holdings against our own proved to be a valuable exercise in refining our own terms and our handling of 'fuzzy' phrasings (Option four).

For the moment, we’ve decided to further explore Option two – using content author-generated tags. Option two will give content authors a chance to group the content in multiple locations – unlike the current 'single home' imposed by a strict hierarchical taxonomy. This in turn should give users a content discovery mechanism, but without the burden of continual moderation.

The harder question that persists is: "How do we ensure that we continue to use terms that are being used by our audience?" Scheduled folksonomy comparisons (i.e., performing Option four semi-regularly) would require fewer ongoing resources, but may not result in any increases in user-generated tags. We’re considering whether to recommend it as an ongoing BAU activity and how often it would be worthwhile.

Have you faced this issue? Please feel free to share any similar experiences (or dilemmas) you have had below.

Comments (6)

I suggest you look at fixing the problem not "tinkering" (around the edges).
Get a proper CMS that will do the job!

Glad to see that you're looking to approach this from both a top down (AGIMO defined tag/keyword library) and bottom up (user generated tagging, or folksonomic) perspective. The CMS that you use is certainly going to have a significant bearing on what will be possible. In terms of ongoing monitoring, my vote is that yes, it should be a BAU activity to help you continually refine the IA and keyword schema that you're applying to your content. If you structure your CMS/site to take account of linkbacks and follow the threads forward, then you'll have a chance to see how your links are being tagged at the target site (if they are at all). I'd place it as a monthly site hygiene task..will be watching this thread with some interest, see what else emerges..

I just received an email saying there was a new comment on this page (because I am tracking this page). I visited the page only to find no new comments; in fact, the last one was posted on 20th May. Was the new post rejected or deleted after I was notified about it?

Darren, that comment was a piece of spam which passed through our automated filter and made it onto the blog. The team pulled it down as soon as we became aware of it.

We've changed the settings on the email notifications so that they'll now include a brief excerpt of the comment. The idea behind this is to allow our subscribers to quickly see if the comment is of interest or if it's just spam, and to make it easier to track down the comment when they visit the blog.

Thanks for this. Probably helps explain the situation to others also.
This one looked a bit "sus" from minute 1 as it was from the Ugg Boot Co or similar!

Comments on this post are now closed. Please let us know if you would like to discuss this post.

Last updated: 27 July 2016