The UKRISS project is working on representing information about research projects and their outcomes in CERIF (the Common European Research Information Format), a standard for describing research activities and their outputs. The output of this work is a proposal for a harmonised reporting model – a common way of representing research outputs and related pieces of information important to multiple stakeholders in Higher Education.
One of the key areas Cottage Labs’ technical work has focused on is validation – the process of ensuring compliance with the model and the quality of the data represented.
In a previous post on the Cottage Labs blog, we discussed the process and technicalities of validation; but what makes validation so important?
- It can catch errors in metadata at the point it is created or stored, preventing incorrect values from propagating
- It can improve your metadata quality at source
- It can help ensure consistency of metadata across the whole community, which in turn will be of value in other contexts, such as reporting and analytics
Catching errors and preventing them from spreading further
As we learned in Richard’s post on validation, the prototype tools we have built can validate fields in the following ways:
- Format validation – for example, checking that an ISSN is in fact an ISSN by checking that it has the correct form (nnnn-nnnn), and that its check digit is legitimate.
- Reality validation – if we can find the value of the field in an external dataset, then we can be more confident that it is genuine. For example, if we have a DOI (Digital Object Identifier) which fits with the overall structure of a DOI (for example, by applying a regular expression), we might then go and look it up in the Crossref database, or follow it to the digital object that it identifies, in order to check that it actually exists in the real world.
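The format check described above can be sketched in a few lines of Python. The ISSN check digit uses the standard modulus-11 scheme: the first seven digits are weighted 8 down to 2, and the final character (a digit, or X for ten) must make the weighted sum divisible by 11. The function name here is our own, not part of the prototype tools:

```python
import re

# An ISSN is four digits, a hyphen, three digits, and a check character.
ISSN_PATTERN = re.compile(r"^(\d{4})-(\d{3})([\dX])$")

def is_valid_issn(issn: str) -> bool:
    """Check both the form (nnnn-nnnn) and the check digit of an ISSN."""
    match = ISSN_PATTERN.match(issn)
    if not match:
        return False
    digits = match.group(1) + match.group(2)  # the first seven digits
    # Weighted sum: weights 8 down to 2 over the first seven digits.
    total = sum(int(d) * w for d, w in zip(digits, range(8, 1, -1)))
    remainder = total % 11
    check = 0 if remainder == 0 else 11 - remainder
    expected = "X" if check == 10 else str(check)
    return match.group(3) == expected

is_valid_issn("0378-5955")  # → True (a registered ISSN)
is_valid_issn("0378-5954")  # → False (check digit does not match)
```

A reality check would then go one step further and look the well-formed ISSN up in an external registry.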
Improving metadata quality at source
The process of checking whether the data is real has the further advantage that we may discover other metadata about that fieldset – for example, a bibliographic reference, or the data embedded in the HTML meta headers of the publisher's website. This information can help improve the quality of our own metadata.
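Many publisher pages carry bibliographic data in `citation_*` meta tags (the Highwire convention used by Google Scholar, among others). A minimal sketch of harvesting such tags with Python's standard-library HTML parser – the class name and the sample page are illustrative, not part of the prototype:

```python
from html.parser import HTMLParser

class MetaHarvester(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from a publisher page."""

    def __init__(self):
        super().__init__()
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name, content = attrs.get("name"), attrs.get("content")
            if name and content:
                self.metadata[name] = content

# A toy publisher page carrying Highwire-style citation tags:
page = (
    '<html><head>'
    '<meta name="citation_title" content="An Example Paper">'
    '<meta name="citation_doi" content="10.1234/example">'
    '</head></html>'
)
parser = MetaHarvester()
parser.feed(page)
parser.metadata  # → {'citation_title': 'An Example Paper', 'citation_doi': '10.1234/example'}
```

Fields harvested this way can then be used to fill gaps in, or corroborate, the record being validated.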
Ensuring community-wide consistency
In addition to checking whether the data conforms to the model and making sure that it exists in reality, we can also cross-reference the data against other sources to find out whether the sets of fields from our document and the external data are coherent as a whole. For example, if we request information about a DOI from Crossref, we also get back a bibliographic metadata record, which we can compare with the one we are validating. This helps make metadata coherent across the whole community.
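Crossref exposes a public REST API (api.crossref.org) that returns a bibliographic record for a DOI as JSON. A sketch of the comparison step, assuming we match titles after simple normalisation – the helper names are our own, and a real tool would compare more fields than just the title:

```python
import json
import re
from urllib.request import urlopen

CROSSREF_API = "https://api.crossref.org/works/"  # public Crossref REST API

def fetch_crossref_record(doi: str) -> dict:
    """Look up a DOI in Crossref and return its bibliographic metadata."""
    with urlopen(CROSSREF_API + doi) as response:
        return json.load(response)["message"]

def normalise(text: str) -> str:
    """Lower-case and strip punctuation so trivial differences don't count."""
    return re.sub(r"[^a-z0-9 ]", "", text.lower()).strip()

def titles_agree(local_title: str, crossref_record: dict) -> bool:
    """Compare our record's title with the one Crossref returns."""
    crossref_titles = crossref_record.get("title", [])
    return any(normalise(local_title) == normalise(t) for t in crossref_titles)

# Comparison against a Crossref-shaped record (no network needed):
record = {"title": ["Validation of Research Metadata"]}
titles_agree("validation of research metadata!", record)  # → True
```

A disagreement here does not prove our record is wrong, but it flags the fieldset for closer inspection.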