Metadata – Part II

Author: 
Jacinta - Web Guide Team
Category: 
The Department of Finance Archive

The content on this page and other Finance archive pages is provided to assist research and may contain references to activities or policies that have no current application. See the full archive disclaimer.

 

There were a few questions from the previous post on the metadata we’re planning on using for the Web Guide, so we thought we’d break it down a little and go into the various decisions we made along the way.

Metadata framework

In the previous post, we explained how we chose metadata elements, some of which were mandatory and some of which we had to develop for our own purposes. Once we had chosen what we wanted, we had to organise the metadata into a framework. We put together a spreadsheet to help us map it all out. Listed below is an explanation of each column:

  • Element / Term Name – the name of the metadata tag
  • Definition – what it does
  • Scheme – what listing the metadata tag comes from. This is where we differentiate between existing published schemes (such as AGLS) and our own schemes (WG-something). Do we want to control the list of acceptable entries for that metadata? Controlled vocabularies provide standardised terms, but reduce flexibility
  • Source – how will the tag be filled in? Either it will be common, something we can pull from the content management system (CMS), or something an author will have to complete manually (we tried to reduce the burden here so that there is a greater chance of useful compliance)
  • Page type metadata used for – we have a number of page types on the site – the home page, a menu page and a content page. Not all of them need the same types of metadata, so this field makes that explicit
  • Repeatable field – sometimes we’ll need more than one instance of a tag. For example, when multiple agencies contribute to a piece of content, we’ll have multiple authors, and this field indicates that the tag can be repeated
  • Comment – comments on the tag
  • Example – the syntax of the tag (in both RDFa and [X]HTML)
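
As a rough sketch only, one row of a spreadsheet like this could be modelled as a small record. The class and function names below are hypothetical, the three Source values are one reading of the Source column, and the two sample rows are illustrative rather than copied from the real spreadsheet.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Source(Enum):
    """How a tag gets filled in (assumed reading of the Source column)."""
    COMMON = "common"   # same value everywhere
    CMS = "cms"         # pulled from the content management system
    AUTHOR = "author"   # completed manually by an author


@dataclass(frozen=True)
class MetadataElement:
    """One row of the framework spreadsheet (hypothetical model)."""
    name: str                 # Element / Term Name
    definition: str           # Definition
    scheme: Optional[str]     # Scheme; None means no controlled list
    source: Source            # Source
    page_types: frozenset     # Page type metadata used for
    repeatable: bool          # Repeatable field
    comment: str = ""         # Comment
    example: str = ""         # Example


# Illustrative rows only; the values are assumptions, not the real framework.
ROWS = [
    MetadataElement(
        name="DCTERMS.creator",
        definition="Agency responsible for the content",
        scheme=None,
        source=Source.CMS,
        page_types=frozenset({"home", "menu", "content"}),
        repeatable=True,
    ),
    MetadataElement(
        name="WPGTERMS.Compliance",
        definition="Whether the guidance is mandatory or better practice",
        scheme="WPG",
        source=Source.AUTHOR,
        page_types=frozenset({"content"}),
        repeatable=False,
    ),
]


def elements_for(page_type, rows):
    """Return the rows that apply to the given page type."""
    return [row for row in rows if page_type in row.page_types]


def manual_fields(rows):
    """Return the names of tags an author must complete by hand."""
    return [row.name for row in rows if row.source is Source.AUTHOR]
```

Filtering by page type and by source makes it easy to see what each page needs and how much of it falls on authors, which is the burden the Source column is trying to keep small.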

We ended up with a subset of what you would find at http://www.agls.gov.au/documents/aglsterms/#AGLSMetaTerms, plus a few additional elements.

Specialised metadata

One of the questions was why we would want to create metadata tags and schemas when there are so many already published. Why would we want to alter the standard? As David Bromage from the National Archives says, AGLS is extensible. Where a tag or schema doesn’t meet our needs, we can create our own. The trick is not to start with any particular metadata in mind, but to start with the end in mind.

We want to point our information towards a particular set of roles, so we needed to tag content pages with the role they were associated with. The example we discussed in the last post (AGLS audience vs. making up our own) is one where we took an existing element and created our own scheme. For others we needed to create the element from scratch.

One of the major things people coming to the site want to know is whether a piece of guidance is mandatory or just a good idea. On the current site, we manually generate a list of mandatory items, but we wanted to change that by adding a tag. Because no existing element is close to what we wanted (it’s an element fairly specific to government policy advice sites), we had to create WPGTERMS.Compliance from scratch. Using AGLS.mandate would not be appropriate for identifying whether or not a page was mandatory for agencies, but it would be useful for holding references to the mandates related to any guidance page.

We show that we created the element ourselves by giving it the prefix WPGTERMS (which stands for Web Publishing Guide TERMS), and then made up a name for the element (Compliance). We have to make up the scheme in much the same way as we did for the audience types, and lastly we need to create the controlled vocabulary to use with that element. There’s a lot there, but it means we can finely control an element that exactly suits our needs. The final tag will look like: <meta name="WPGTERMS.Compliance" scheme="WPG" content=" mandatory, better practice " />
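
A minimal sketch of how a controlled vocabulary might be enforced when the tag is generated is shown below. It assumes the two values in the example tag (mandatory, better practice) are the whole vocabulary and that each page carries exactly one of them. The function name is hypothetical, and none of this is the Web Guide's actual implementation.

```python
from html import escape

# Assumed controlled vocabulary, taken from the two values in the example tag.
COMPLIANCE_VOCAB = frozenset({"mandatory", "better practice"})


def compliance_meta(value):
    """Build a WPGTERMS.Compliance <meta> tag, rejecting unknown terms."""
    term = value.strip().lower()
    if term not in COMPLIANCE_VOCAB:
        raise ValueError(f"{value!r} is not in the WPG compliance vocabulary")
    return (
        '<meta name="WPGTERMS.Compliance" scheme="WPG" '
        f'content="{escape(term, quote=True)}" />'
    )


print(compliance_meta("Mandatory"))
```

Rejecting anything outside the list is the trade-off mentioned in the framework: authors lose flexibility, but every page that carries the tag can be found with the same term.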

Close the loop

The last thing we plan to do with all of these specialisations is to release them back to the internet, adding to the extensible nature of the standard. That way, should another site need similar tags, we have already created them and they can copy ours. We’d like to provide an RDF Schema declaration on webguide.gov.au or xml.gov.au, referenced by pages on our own site, possibly in the following fashion: <link rel="schema.WPGTERMS" href="http://www.webguide.gov.au/wpgterms/terms/" />

Have you ever needed to extend AGLS or existing metadata standards? If so, why? How did you share the results of your extensions?
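
One way to keep such declarations honest is to check that every prefix used in a page's meta names has a matching schema link. The sketch below is an assumption about how that check could look, with hypothetical function names; the URL is the proposed one from this post, which may never have been published.

```python
from html import escape


def schema_link(prefix, href):
    """Render a <link rel="schema.PREFIX"> declaration for a term set."""
    return f'<link rel="schema.{prefix}" href="{escape(href, quote=True)}" />'


def undeclared_prefixes(meta_names, declared):
    """Return prefixes used in meta names that have no schema declaration."""
    used = {name.split(".", 1)[0] for name in meta_names if "." in name}
    return sorted(used - set(declared))


# Proposed declaration from the post; the other names are illustrative.
declared = {"WPGTERMS": "http://www.webguide.gov.au/wpgterms/terms/"}
names = ["WPGTERMS.Compliance", "DCTERMS.title"]

for prefix, href in declared.items():
    print(schema_link(prefix, href))
print(undeclared_prefixes(names, declared))
```

Here the check would report DCTERMS as undeclared, a reminder that a page borrowing terms from several sets needs a declaration for each of them.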

Comments (6)

It would be very useful to hear from the Web Guide Team how the AGLS metadata is being used to support the identification, findability and reuse of government content.

I am now often reporting organic search engine referrals driving in excess of 40% of traffic to some agency sites, but little or no reuse of content based on AGLS scraping. That makes it hard to justify to management the value of committing resources to tweaking CMSes, or of having authors take on extra work, to support extended AGLS use beyond the mandatory terms.

Cheers,

Craig

(I occasionally work with the Web Guide team, and frequently work on whole-of-government sites within AGIMO)

In addition to generating search engine referrals, agencies may choose to use metadata within their own sites to:

Express content in both human- and machine-readable forms (enabling reuse by other govt. and non-govt. entities)
Provide customisable views of the corpus (tailored by Audience or Agency Function, for example)
Provide dynamic slices of their site (show me only media releases, or only faqs, or only records from this geographic location or jurisdiction)

data.gov.au search – datasets in the NSW Jurisdiction

australia.gov.au search – services for migrants

For some government scenarios, this type of richness is not only useful, it’s required for full functionality. AGLS metadata ROI should be considered on a per-project basis - minimum mandatory application is usually just the homepage.

‘Hidden’ metadata is difficult to generate enthusiasm for – if your AGLS usage is exposed, and helps your users answer questions like:

Who is this service for?
Which agency provides this service?
How do I access this service?
What jurisdiction is the service provided to?
What type of document is this?

...and then goes the extra mile to help them find similar documents / services ‘on the fly’, I’d say that’s a reasonable and desirable outcome.

The other typical what, why, when and where questions are often best handled by Dublin Core.

Example ‘bigger picture’ .gov.au questions that may be trivial to answer if widespread adoption of AGLS/DC were to occur in the .gov.au space:

Show me all the media releases issued today
Show me all grants for non-government organisations
Show me all statistics mentioning jobseekers
Show me all services for the gay and lesbian community near postcode 2000
Show me all government datasets provided under a Creative Commons license

These are all queries that cross agency silos - providing a vertical search across government. ‘All’ is never likely to be truly all – only those who are using the same vocabularies and expressions of their metadata.
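
As a rough illustration of the first query above, the sketch below filters a handful of harvested records by document type and date. The records, field names and agencies are entirely made up, and a real harvest would read AGLS/DC elements from each page rather than a Python list.

```python
from datetime import date

# Hypothetical harvested records from three agencies.
RECORDS = [
    {"agency": "Agency A", "document_type": "media release",
     "issued": date(2010, 3, 1), "title": "New grants round opens"},
    {"agency": "Agency B", "document_type": "media release",
     "issued": date(2010, 3, 1), "title": "Service centre relocates"},
    # Same kind of document, but tagged with a term outside the shared vocabulary.
    {"agency": "Agency C", "document_type": "Press statement",
     "issued": date(2010, 3, 1), "title": "Minister visits region"},
    {"agency": "Agency A", "document_type": "policy statement",
     "issued": date(2010, 3, 1), "title": "Accessibility policy"},
]


def search(records, criteria):
    """Return records whose metadata matches every criterion exactly."""
    return [
        record for record in records
        if all(record.get(key) == value for key, value in criteria.items())
    ]


todays_releases = search(
    RECORDS, {"document_type": "media release", "issued": date(2010, 3, 1)}
)
for record in todays_releases:
    print(record["agency"], "-", record["title"])
```

Agency C's item is missed because it uses a different term for the same thing, which is the sense in which ‘all’ only ever covers the sites sharing a vocabulary.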

It may often be the case that a government site does not target more than one AGLS.Audience, is not provided by more than one AGLS.Agent, does not conduct more than one AGLS.Function, and does not operate over more than one AGLS.Jurisdiction – in that case, it’s probably not worth manually re-entering the value on subsequent pages.

The AGLS.Document type often comes in handy for ‘chunking’ pages or sections within a site – internal searches may then be able to include or exclude document types on the fly (exclude media releases and policy statements, for example).

Commercial search engines are increasingly relying on more metadata to add value to their results (Google’s Rich Snippets and Yahoo!’s SearchMonkey, for example).

In short - adding rich metadata (AGLS or otherwise) would seem to be a good habit to get into, now that users are becoming increasingly sophisticated in their interactions with search.

Brilliant comment Gordon. You have shone light on the difference between the somewhat esoteric value of hidden metadata and the true usefulness of exposed metadata.

I have been tangentially involved in a project over at DHS that is drawing on this sort of thinking.

The most exciting thing is that metadata is being discussed in this depth. I can smell gov 3.0 coming along. So, claps for Jacinta for your post.

Hi Jacinta, I'm interested in how metadata could be extended to not only include web resources, but how it could be applied in a consistent whole of government approach to all products and services that we work with.

I believe there is a future where taxonomies, dictionaries, thesauri, government specific language and government structures, products and services could all be delivered as a structured metadata language construct via a web services interface.

This integration could provide a backbone for future web publishing and semantic markup that could potentially be auto applied to new online content.

Auto application of metadata to resources (or at least auto suggestion) could alleviate the current problem where the majority of online professionals and content authors are not applying metadata properly or in some cases, like where I currently work, not at all.

I really appreciate and admire your commitment to metadata as I see it as part of the fundamental foundation for which the semantic web, or the semantic government, can possibly exist.

Also (I forgot to tick ‘receive email notifications’ for this post), I just wanted to agree with everything Gordon has said and to add that we, as Government, could assist agencies in registering their divisions, branches, sections, products and services with a range of probability-based metadata that could be extremely useful in a tagging and searching environment – not only for the use of Government, but also for the public's understanding of the machinery and structure of Government.


Last updated: 27 July 2016