CERIF elements and vocabularies landscape survey

Landscape survey spreadsheet

Brief notes 

As part of WP4, and particularly in preparation for its core component of CERIF mapping, it was decided to conduct a ‘landscape’ survey of existing practices and technologies. Specifically, this study examines the element sets currently in use in a number of major projects (particularly any use they make of CERIF elements) and the dictionaries or other vocabularies they use to support the semantics of their CERIF applications. This document provides brief contextual information on the results of this survey; the results of the survey of data elements and fields are recorded in a spreadsheet (available here).

Element sets

The spreadsheet documents the data elements used in the following projects, products and resources:-

  • RIOXX
  • Gateway to Research
  • CASRAI
  • Pure
  • CERIF for Datasets
  • IRIOS

In addition, we looked at Symplectic, although no public information on its CERIF-XML export was available.

Column A in the spreadsheet gives the project or product name, and the next five columns (B to F) give the hierarchical arrangement of elements, starting with the highest level in column B. Column G gives element definitions where available, and column H any notes.

Of these, Gateway to Research, CERIF for Datasets and IRIOS use CERIF; RIOXX use Simple Dublin Core supplemented by a small number of extra elements (RIOXX Terms); and CASRAI use their own element set. Pure also use CERIF, although the exact elements used are not published.

These resources vary considerably in the level of detail they record and in the way they are structured. The simplest is the flat representation used by RIOXX, which is based on Simple Dublin Core fields plus DC Terms for audience, issue date and references, supplemented by two RIOXX elements for project ID and funder.
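To make "flat" concrete, here is a minimal Python sketch of a RIOXX-style record as a single-level mapping from element name to values. It is an illustration under assumptions, not the RIOXX schema: the field values are invented, the `rioxxterms:funder` and `rioxxterms:projectid` element names are assumed rather than taken from the RIOXX documentation, and `is_flat` and `elements_in_namespace` are hypothetical helpers.

```python
from typing import Dict, List

# A flat record maps each namespace-prefixed element name straight to a list
# of string values; no element is nested inside another.
FlatRecord = Dict[str, List[str]]

# Illustrative values only. The rioxxterms element names are assumptions.
record: FlatRecord = {
    "dc:title": ["An example research output"],         # Simple Dublin Core
    "dc:creator": ["Smith, J."],
    "dcterms:issued": ["2014-06-01"],                    # DC Terms issue date
    "dcterms:audience": ["Researchers"],                 # DC Terms audience
    "rioxxterms:funder": ["Example Research Council"],   # assumed name
    "rioxxterms:projectid": ["EX/123/456"],              # assumed name
}


def is_flat(rec: FlatRecord) -> bool:
    """True if every value is a list of plain strings (no nested structure)."""
    return all(
        isinstance(values, list) and all(isinstance(v, str) for v in values)
        for values in rec.values()
    )


def elements_in_namespace(rec: FlatRecord, prefix: str) -> List[str]:
    """Sorted element names whose namespace prefix equals `prefix`, e.g. 'dc'."""
    return sorted(name for name in rec if name.split(":", 1)[0] == prefix)
```

For example, `elements_in_namespace(record, "rioxxterms")` returns the two project and funder elements. Because nothing is nested, the record can be checked or mapped field by field, which is what makes this the simplest of the surveyed structures.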

All of the other projects use more complex, multi-granular levels of description.

Of the three CERIF-based projects, Gateway to Research has the most extensive element set. Element sets are defined for:-

  • projects (cfProj)
  • persons (cfPers)
  • research publications (cfResPubl)
  • organisational units (cfOrgUnit)
  • funding (cfFund)
  • measures (cfMeas)
  • postal addresses (cfPAddr)
  • research patents (cfResPat)
  • research products (cfResProd)

These are supplemented by the CERIF classification elements (cfClassScheme and cfClass). A standard set of sub-elements and linking elements is used for all of these entities (recorded as second- and third-level elements in the spreadsheet, respectively).
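The shared pattern of base entities plus linking elements can be sketched in Python as below. This is a simplified model under stated assumptions, not Gateway to Research's actual schema: the `Entity` and `Link` classes and the `linked` helper are hypothetical. It keeps only the general CERIF idea that a link between two entities (for example cfProj_Pers) carries a role drawn from a classification scheme (cfClassSchemeId, cfClassId) and an optional validity period (cfStartDate, cfEndDate).

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    """A CERIF-style base entity, e.g. a project (cfProj) or person (cfPers)."""
    entity_type: str  # e.g. "cfProj", "cfPers"
    entity_id: str    # e.g. the value of cfProjId or cfPersId


@dataclass(frozen=True)
class Link:
    """A CERIF-style link entity (e.g. cfProj_Pers) joining two base entities.

    The role is a (class scheme, class) pair rather than a fixed field, and the
    link is valid between optional start and end dates.
    """
    source: Entity
    target: Entity
    class_scheme_id: str           # cf. cfClassSchemeId
    class_id: str                  # cf. cfClassId
    start: Optional[date] = None   # cf. cfStartDate
    end: Optional[date] = None     # cf. cfEndDate

    def active_on(self, day: date) -> bool:
        """True if the link's validity period covers `day` (open ends allowed)."""
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


def linked(links: List[Link], source: Entity, class_id: str, day: date) -> List[Entity]:
    """Targets linked from `source` with role `class_id` that are active on `day`."""
    return [
        lk.target
        for lk in links
        if lk.source == source and lk.class_id == class_id and lk.active_on(day)
    ]
```

For example, with a made-up project `Entity("cfProj", "P1")`, person `Entity("cfPers", "A1")` and a link classed as "Principal Investigator" running from 2013 to 2015, querying the project on 1 June 2014 returns that person. Keeping roles in a class scheme rather than in fixed fields is what lets one linking structure serve every entity type.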

CERIF for Datasets do not have such an extensive element set. They concentrate more on research outputs and also incorporate more Dublin Core elements into their descriptions. Beyond standard description and identification information for outputs, they also include geographic bounding information and spatial and temporal coverage metadata. A number of Dublin Core elements are also used for rights information.

IRIOS use a smaller element set, providing basic information on projects, funding, persons, organisational units, and postal and email addresses. Relatively few sub-elements are used within each of these.

Pure state in their documentation that they use CERIF elements internally, although the public documentation does not detail exactly how these are implemented. The spreadsheet records the elements listed in this documentation, arranged into the broad categories it gives. Most categories state that further elements are available beyond the main ones listed. All of these elements would map neatly into CERIF.

CASRAI has an extensive element set arranged over three nested levels, which is detailed in the spreadsheet. All of these should map into CERIF given the use of appropriate semantics.

Dictionaries

In addition to compiling this element set survey, we also examined which dictionaries (if any) are being used to support CERIF data infrastructures.

Researchfish employs a small dictionary, with controlled terms mainly limited to types of publication. Content rules (for instance on the format of author names) apply to a number of other fields, such as Title. PubMed is the primary source of publication IDs. Controlled lists are also used for staff roles, sectors, qualifications, engagement activities, audiences for engagement activities, types of influence on policy etc., impact types, types of research methodology, types of product output, product development stages and types of award/recognition.

ROS employs more detailed dictionaries, covering publications, other research outputs, collaboration and partners, further funding, staff development, dissemination, intellectual property and exploitation, awards/recognition and impact. Publication types and other research outputs are the most detailed (the latter divided into biological, creative, electronic, physical, research materials and other).

Further dictionaries are employed for languages, roles, beneficiaries of outcomes, broadcast media and coverage, and venue types for events (together with the coverage of events, such as national or international, and audience size). Dictionaries for collaboration information include organisation types and sectors. Further funding has lists of funding organisations; staff development has lists for project roles, destination roles and destination sectors; dissemination for types of activity, nature of briefings to government advisers and target audience; intellectual property for other sectors’ involvement, stage of disclosure, exploitation types and employee numbers in spin-out companies; impact for type of impact, the ways in which influence is brought about, and target audience; and key findings for sector.

Gateway to Research publishes a CERIF class dictionary with 236 terms including terms covering the status of a grant (active, closed etc), types of grant, funding schemes and so on.

euroCRIS have published a CERIF vocabulary, currently at version 1.5, which contains 450 terms, each associated with a classification scheme (Organisation Types, Funder Types, Output Types, Person Contact Details, Electronic Address Types, Organisation Contact Details, Media Relations, Research Infrastructure Types, Identifier Types, Person Names, Education Domain Terms, Activity Funding Types, Activity Finance Categories, Activity Finance Category Amounts, Publication Statuses, Peer Reviews, Output Quality Levels, Open Science Costs and Verification Statuses). This provides a useful core vocabulary for any CERIF application and should be employed as the primary scheme.
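Using the euroCRIS vocabulary as the primary scheme implies checking each classification against the scheme it claims to belong to. A minimal Python sketch of that check follows. The scheme names come from the list above, but the terms in `CORE_VOCABULARY` are hypothetical placeholders, not the published euroCRIS terms, and `check_classifications` is an illustrative helper rather than part of any CERIF tooling.

```python
from typing import Dict, List, Set, Tuple

# Placeholder vocabulary: scheme names are from the euroCRIS list, but the
# terms below are hypothetical and NOT the published vocabulary.
CORE_VOCABULARY: Dict[str, Set[str]] = {
    "Output Types": {"Journal Article", "Dataset"},
    "Funder Types": {"Government", "Charity"},
    "Publication Statuses": {"Published", "In Press"},
}


def check_classifications(
    values: List[Tuple[str, str]], vocab: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Return the (scheme, term) pairs that the core vocabulary does not define.

    A term is accepted only under the scheme it is listed in, so a valid label
    used under the wrong scheme is still reported, as is an unknown scheme.
    """
    return [
        (scheme, term)
        for scheme, term in values
        if term not in vocab.get(scheme, set())
    ]
```

For example, checking `("Output Types", "Dataset")`, `("Funder Types", "Dataset")` and `("Local Scheme", "X")` reports only the last two. Anything reported this way is a candidate for a local supplementary scheme alongside the core one.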

RIOXX have produced a controlled vocabulary of international and UK funders (available at http://docs.rioxx.net/funders/). This could be very useful, although it is not clear whether it will continue to be updated.

This entry was posted in Background Research, International Perspective, Technical Review.