Representation information

Definition


The information that maps a data object into more meaningful concepts. An example of Representation Information for a bit sequence which is a FITS file might consist of the FITS standard which defines the format plus a dictionary which defines the meaning in the file of keywords which are not part of the standard. Another example is JPEG software which is used to render a JPEG file; rendering the JPEG file as bits is not very meaningful to humans but the software, which embodies an understanding of the JPEG standard, maps the bits into pixels which can then be rendered as an image for human viewing.

Source: Consultative Committee on Space Data Systems. Recommended practice for an Open Archival Information System (CCSDS 650.0-M-2).

Introduction


Information models described in OAIS and expanded in PAIS adopt the general principle that "data interpreted using its representation information yields information."


Image source

Figure 2-2: Obtaining information from data. Consultative Committee on Space Data Systems. Open Archival Information System (OAIS) recommended practice. CCSDS 650.0-M-2.
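
The principle that data interpreted using its representation information yields information can be illustrated with a minimal Python sketch. The byte layout, field names, and the "structure"/"semantics" keys below are hypothetical and chosen only for illustration; they are not drawn from OAIS or PAIS.

```python
import struct

# Data object: a bare bit sequence. On its own it carries no meaning.
data_object = bytes.fromhex("07e6000001a4")

# Representation information (hypothetical): how the bits are laid out
# (structure) and what each decoded value means (semantics).
representation_info = {
    "structure": ">HI",  # big-endian: unsigned 16-bit, then unsigned 32-bit
    "semantics": ("observation_year", "exposure_seconds"),
}

def interpret(data: bytes, rep_info: dict) -> dict:
    """Interpret a data object using its representation information."""
    values = struct.unpack(rep_info["structure"], data)
    return dict(zip(rep_info["semantics"], values))

information = interpret(data_object, representation_info)
print(information)  # {'observation_year': 2022, 'exposure_seconds': 420}
```

Without the representation information, the same six bytes could be decoded in many other ways, which is why an archive needs to preserve both.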

Representation information can be any information that maps data into more meaningful concepts, but the term most often refers to file format specifications, standards, and software needed to render a data object into information that the designated community can understand. The representation information required to preserve data objects varies greatly with the nature of the Producer-Archive Project. As noted in CCSDS 652.0-M-1, Section 4.2.5.4:

Sometimes there is both general Representation Information (e.g., format information) and specific Representation Information (e.g., meanings of individual fields within a dataset). Often the general information will be available in an external repository, but the local repository may need to maintain the instance-specific information.

Examples of general representation information made available by external repositories and trusted partners include the PRONOM registry maintained by the UK National Archives and the Format Policy Registry (FPR) maintained by Artefactual Systems. 

Representation information must be understandable using the recipient's knowledge base. An example of representation information for a bit sequence which is a FITS file might consist of the FITS standard which defines the format plus a dictionary which defines the meaning in the file of keywords which are not part of the standard. 

Another example is JPEG software which is used to render a JPEG file; rendering the JPEG file as bits is not very meaningful to humans but the software, which embodies an understanding of the JPEG standard, maps the bits into pixels which can then be rendered as an image for human viewing. 
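
The FITS example above combines general representation information (the standard) with specific representation information (a local keyword dictionary). The sketch below shows one way the two could work together. It assumes a simplified reading of the FITS header card layout and does not handle quoted string values that contain "/". The keyword SKYTEMP, its meaning, and all helper names are hypothetical.

```python
# General representation information, simplified from the FITS standard:
# a header card holds a keyword in columns 1-8, "= " in columns 9-10,
# then a value optionally followed by "/" and a comment.
def parse_card(card: str) -> tuple:
    keyword = card[:8].strip()
    value = card[10:].split("/", 1)[0].strip()
    return keyword, value

# Meanings of two keywords that the standard itself defines.
standard_keywords = {
    "BITPIX": "Number of bits per data value",
    "NAXIS": "Number of data axes",
}

# Specific representation information (hypothetical): a locally maintained
# dictionary for keywords that are not part of the standard.
local_dictionary = {
    "SKYTEMP": "Sky temperature at exposure start, in kelvin",
}

def describe(card: str) -> str:
    """Turn one header card into a statement a person can understand."""
    keyword, value = parse_card(card)
    meaning = standard_keywords.get(keyword) or local_dictionary.get(keyword)
    if meaning is None:
        return f"{keyword} = {value} (no representation information available)"
    return f"{meaning}: {value}"

def make_card(keyword: str, value: str, comment: str = "") -> str:
    """Build an 80-character card for the demonstration below."""
    card = f"{keyword:<8}= {value:>20}"
    if comment:
        card += f" / {comment}"
    return card.ljust(80)

cards = [
    make_card("BITPIX", "16", "bits per data value"),
    make_card("SKYTEMP", "268.5", "hypothetical local keyword"),
    make_card("FOOBAR", "1"),
]
for card in cards:
    print(describe(card))
```

The last card shows what happens when neither the standard nor the local dictionary covers a keyword: the value can be read but not understood.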

Related terms


Data object

File format

Information object

Knowledge base

How to meet requirements for representation information


  1. Conduct surveys of the designated community to ensure that the preserved information remains understandable and usable by the designated community.

  2. Conduct informal and/or formal technology watches to gather information that can support the implementation and maintenance of a preservation strategic plan.

  3. Develop a preservation policy that includes a preservation strategic plan.

  4. Develop a preservation strategic plan that includes strategies to specifically address the obsolescence or inadequacy of representation information (including formats) as the knowledge base of the designated community changes.

  5. Develop a policy and procedures for the periodic review and updating of all documentation relevant to the management and operation of the trustworthy digital repository.

  6. Generate and preserve file format metadata using file identification tools that incorporate PRONOM file signatures (e.g., DROID, FIDO, Siegfried).

  7. Use file validation tools (e.g., JHOVE) to check that files conform to their format specifications.

  8. Develop policies and procedures for the use and maintenance of format registries that include general representation information such as file format specifications.

  9. Use PREMIS to record preservation metadata, including format information, for each preserved object.

  10. Share locally held representation information via trusted format registries.

  11. Develop a model of object for transfer (MOT) for each Producer-Archive Project that specifies the representation information of each SIP component. 
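
Item 6 in the list above relies on signature-based file identification. The following sketch shows the general idea using a small, hand-written table of well-known leading byte sequences. It is an illustration only: it does not use PRONOM data and does not reflect how DROID, FIDO, or Siegfried are implemented or invoked. The function names and the shape of the metadata record are assumptions.

```python
# Hypothetical, hand-written signature table. Real identification tools
# load their signatures from PRONOM rather than hard-coding them.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "Portable Network Graphics"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"%PDF-", "Portable Document Format"),
    (b"GIF87a", "Graphics Interchange Format"),
    (b"GIF89a", "Graphics Interchange Format"),
]

def identify(data: bytes) -> str:
    """Return a format name based on the leading bytes of a file."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unidentified"

def format_metadata(filename: str, data: bytes) -> dict:
    """Build a minimal format metadata record for preservation."""
    return {
        "filename": filename,
        "size_bytes": len(data),
        "format": identify(data),
        "identification_method": "signature",
    }

record = format_metadata("report.pdf", b"%PDF-1.4\n% example content")
print(record)
```

Note that the file extension plays no part in the result; identification depends on the bytes themselves, which is what makes the resulting metadata useful when filenames are missing or wrong.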

Related technical standards

ISO standard | CCSDS recommendation | Description
ISO 20652:2006 (PAIMAS) | CCSDS 651.0-M-1, Action P-4 | Identify the complementary information: The Representation Information and Preservation Description Information (PDI). Draw up an inventory of the available data and information and those which must be created or gathered, and if necessary identify those that are mandatory for the preservation and those that are only useful.
ISO 20652:2006 (PAIMAS) | CCSDS 651.0-M-1, Action P-9 | Make a preliminary identification of the Data Objects: This enables a first list of object categories to be drawn up. These include the Content Data Objects, which contain the primary information to be preserved, the Data Objects containing Representation Information on the primary Data Objects, and the Data Objects describing the context and source of the primary information.
ISO 20652:2006 (PAIMAS) | CCSDS 651.0-M-1, Actions P-10 through P-13 | Define the rules, standards and tools. Identify standards applicable to Data Objects containing the Representation Information of Content Information: simple reference to a standard that should also be archived or use of a syntactic data description language (e.g., EAST), semantic description language (DEDSL, SGML, PVL, XML), etc.
ISO 20652:2006 (PAIMAS) | CCSDS 651.0-M-1, Action F-3 | Define the general project context: At this stage the Producer and Archive must agree on all the information elements to be preserved and on the following content to be delivered: Content Information (Data Object and Representation Information, syntactic and semantic); Preservation Description Information (provenance, context, reference, fixity); Descriptive Information.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.1.3 | The repository shall have adequate specifications enabling recognition and parsing of the SIPs. Repositories can meet this requirement, in part, by maintaining Representation Information for the SIP Content Data, including documented file format specifications.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.2.1.2 | The repository shall have a definition of each AIP that is adequate for long-term preservation, enabling the identification and parsing of all the required components within that AIP. Documentation should identify each class of AIP and clearly show that AIP components such as Representation Information and Provenance can be managed and kept up to date.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.2.5 | The repository shall have access to necessary tools and resources to provide authoritative Representation Information for all of the digital objects it contains. Sub-sections include: 4.2.5.1 The repository shall have tools or methods to identify the file type of all submitted Data Objects. 4.2.5.2 The repository shall have tools or methods to determine what Representation Information is necessary to make each Data Object understandable to the Designated Community. 4.2.5.3 The repository shall have access to the requisite Representation Information. 4.2.5.4 The repository shall have tools or methods to ensure that the requisite Representation Information is persistently associated with the relevant Data Objects.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.2.7.3 | The repository shall bring the Content Information of the AIP up to the required level of understandability if it fails the understandability testing.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.3.1 | The repository shall have documented preservation strategies relevant to its holdings. Preservation strategies should address the obsolescence or inadequacy of Representation Information (including formats) as the knowledge base of the Designated Community changes, and safeguards against accidental or intentional digital corruption. The repository should have mechanisms in place for monitoring and notification when Representation Information (including formats) approaches obsolescence or is no longer viable, and it should be able to show that it has mechanisms to address such notifications.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.3.2 | The repository shall have mechanisms in place for monitoring its preservation environment. The repository should show that it has some active mechanism to ensure that the preserved information remains understandable and usable by the Designated Community and that it has mechanisms in place for monitoring and notification when Representation Information (including formats) approaches obsolescence or is no longer viable.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.3.2.1 | The repository shall have mechanisms in place for monitoring and notification when Representation Information is inadequate for the Designated Community to understand the data holdings.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.3.3.1 | The repository shall have mechanisms for creating, identifying or gathering any extra Representation Information required.
ISO 16363:2012 (TDR) | CCSDS 652.0-M-1, Section 4.4.1 | The repository shall have specifications for how the AIPs are stored down to the bit level. The repository should specify the Representation Information down to the bit level of each AIP component and must specify how the separate components are packaged together. The Representation Information must be available for each AIP and must be appropriately linked to the AIP.
ISO 16919:2014 (requirements for certification bodies) | CCSDS 652.1-M-2, Annex A | Certification body personnel involved with selecting the auditing team, performing auditing activities, and evaluating auditors must possess the knowledge to assess the trustworthy digital repository's procedures and processes when creating Archival Information Packages (AIPs), and its ability to assess ways of defining Designated Communities and how the appropriate amount of Representation Information may be obtained.
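
Several of the requirements above call for monitoring and notification when representation information approaches obsolescence. A minimal sketch of such a check follows. It assumes the repository keeps an inventory of holdings by format and that a technology watch produces a list of at-risk formats. The class, function, and format names are all hypothetical, and the standards do not prescribe any particular mechanism.

```python
from dataclasses import dataclass

@dataclass
class Holding:
    aip_id: str
    file_format: str

# Hypothetical output of a technology watch: formats judged to be at risk
# for the designated community, each with a recorded reason.
AT_RISK = {
    "ExampleFormat 1.0": "no maintained rendering software identified",
}

def obsolescence_notifications(holdings, at_risk):
    """Return one notification per holding whose format is flagged."""
    notes = []
    for holding in holdings:
        reason = at_risk.get(holding.file_format)
        if reason:
            notes.append(
                f"{holding.aip_id}: {holding.file_format} flagged ({reason})"
            )
    return notes

holdings = [
    Holding("aip-001", "ExampleFormat 1.0"),
    Holding("aip-002", "ExampleFormat 2.0"),
]
for note in obsolescence_notifications(holdings, AT_RISK):
    print(note)
```

In practice the notifications would feed the preservation strategic plan, for example by triggering a review of the affected AIPs.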

References

Consultative Committee on Space Data Systems. Open Archival Information System (OAIS) recommended practice, Section 2.2.1. CCSDS 650.0-M-2.

Consultative Committee on Space Data Systems. Producer-archive interface -- methodology abstract standard (PAIMAS) recommended practice. CCSDS 651.0-M-1.

Consultative Committee on Space Data Systems. Requirements for bodies providing audit and certification of candidate trustworthy digital repositories, recommended practice. CCSDS 652.1-M-2.