Digital Preservation

Selection and Appraisal in the OAIS Model

Following our recent podcast about our free OAIS course, here are some further thoughts about the OAIS Model. We’re aware that there is a process of discussion underway hosted by the DPC, and the sceptical view that follows might be another contribution to that process.

Recently I attended the ARA Conference. On 31 August 2016 we heard three very useful presentations in the digital preservation strand from Matthew Addis of Arkivum, Sarah Higgins and Sally McInnes from Wales, and Mike Quinn from Preservica. I recall asking a question about the OAIS model, which was prompted by another question from a fellow archivist in the audience. I was asking something about the skills of selection and appraisal. Can the OAIS Model accommodate them? My worry is that it cannot, and that the Model tends to present an over-simplified view where the Submission Information Package (SIP) arrives in a “perfect state” all ready to preserve, and the process of transforming it into an Archival Information Package (AIP) can begin. Any archivist or records manager who’s ever handled a deposit or transfer of records will tell you that real life isn’t like that. As a result, the OAIS Model alienates the archivist.

I’m aware of those in our community who have advocated a stronger pre-ingest stage in OAIS. Some call it the “long tail” before Ingest. I believe there is a body of work underway to formalise the process as part of the standard: the Producer-Archive Interface Specification. And I’m aware of those contributions to the DPC OAIS wiki where suggestions are made for how to instigate it, and even automate it to some degree.

But that’s not quite what’s worrying me. Let’s get back to the basics of what we mean by Selection and Appraisal. I think these are very strong archivist skills, which could have tremendous value in the field of digital preservation.

The Record / Archive Series

When I worked as an archivist at the General Synod with paper records and paper archives, we would often appraise and select on a Series basis. What that means to me is that we could assess the value of the content in a contextual framework, based on other records which we knew were being created, or other archival series which we had already selected and kept in the archive. The collections strategy would be based on this approach, looking for a Series in the context of provenance. For instance, the originating body might be the Board for Social Responsibility (BSR); the record series could be “Minute Books”. We would always know to accept deposits of BSR Minutes, because we could trust these as being accurate records of the Board’s work. Likewise, if the BSR collected copies of another Board’s Minutes and Documents (e.g. The Central Board of Finance), we could apply a rule that excluded that series from accessioning, on the grounds that BSR were only receiving “copies for information”.

This process I’m describing is second nature to any archives or records management professional. An understanding of context, provenance, record series: all of these things help us identify the potential value of content. Indeed, a Series model is the foundation for all Archival arrangement, and is the cornerstone of our profession. It’s extremely efficient; it saves you from having to examine every single document.
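
To make the idea concrete, the series-level rule described above can be written down as a small lookup. This is only a sketch: the `Deposit` structure, the `ACCEPTED_SERIES` table, the `appraise` function and the decision labels are all hypothetical names invented for illustration, and they do not describe any real system used at the General Synod.

```python
from typing import NamedTuple


class Deposit(NamedTuple):
    depositor: str   # body transferring the records, e.g. "BSR"
    originator: str  # body that actually created the records
    series: str      # record series, e.g. "Minute Books"


# Hypothetical series-level rules, keyed on (originator, series).
ACCEPTED_SERIES = {
    ("BSR", "Minute Books"),
}


def appraise(deposit: Deposit) -> str:
    """Return a decision for a whole series, not for each document in it."""
    # Copies received "for information" from another body are excluded,
    # because the originating body's own deposit is the record copy.
    if deposit.depositor != deposit.originator:
        return "exclude"
    if (deposit.originator, deposit.series) in ACCEPTED_SERIES:
        return "accept"
    # Anything not covered by a rule is referred to an archivist.
    return "refer"


own_minutes = Deposit("BSR", "BSR", "Minute Books")
copied_minutes = Deposit("BSR", "Central Board of Finance", "Minute Books")
print(appraise(own_minutes))     # accept
print(appraise(copied_minutes))  # exclude
```

The point of the sketch is that the decision is taken once per series and provenance, which is why nobody has to open every single document.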

Appraisal in OAIS

I wonder to myself how Series are expressed in the OAIS Model. I often think the Model is predisposed to favour the individual digital object, rather than a record series. To put it another way, a Submission Information Package is not an ideal unit on which to carry out an appraisal. At which point you could tell me “here’s 100 related SIPs, there’s your record series”. Or “we’re putting all the PDFs of our Minutes into this single SIP”. But I would still worry. Through the basic action of ingesting a SIP, we’re starting a process where all subsequent preservation actions continue to centre around the individual digital object – checksums, file format identification, file format characterisation, technical metadata extraction, and preservation metadata. And of course, the temptation is strong to automate these AIP-building actions, which has led us into building scripts that are entirely focused on a single characteristic – most commonly, the file format.

Where’s the record / archival series in all this? It’s difficult to make it out. Maybe it gets reinstated or reconstructed at the point of cataloguing. Even so, it’s not hard to see why archivists can feel alienated by this view of what constitutes digital preservation. The integrity and contextual meaning of a collection are being overlooked, in favour of this atomised digital-object view. OAIS, if strictly interpreted, could bypass the Series altogether in favour of an assembly line workflow that simply processes one digital object after another.

I believe we need to rediscover the value of Appraisal and Selection; I call on all archivists to come forward and re-assert its importance in the digital realm.

In the meantime, some questions: Can anyone show me a way that Appraisal and Selection can truly be incorporated in an OAIS Model workflow? Is there room for considering a new “Series Information Package”, or something similar? Am I over-stressing the atomisation of OAIS?

Disclaimer: this blog post represents the personal views of Ed Pinsent, not the DPTP or UoL.

DPTP Online – new teaching course launched

What is DPTP Online?

DPTP Online is a new online course that offers paying customers an introduction to digital preservation. It aims to teach students about strategies they can use to make digital preservation possible.

This new offering from ULCC is an online learning version of the award-winning face-to-face Course which we have been teaching since 2005. In terms of the content it offers, it’s pretty much the basic two-Day Course which we have been calling “An Introduction to Digital Preservation”.

However, we took the opportunity to reinstate content and case studies from modules which we’ve always had in reserve, but had retired from the Course in order to keep it under two days. We’ve also added quizzes, case studies, videos, exercises, and forums. The entire contents of the Reading List, which used to be a 16-page PDF, have been added as live links and attachments, under “Further Reading”. All of this means DPTP Online is quite a rich experience.

The AOR toolkit

Preserving Digital Content – Taking first steps with the AOR toolkit

The ART team at ULCC has long had an interest in promoting and selling our digital preservation expertise, in the form of the Digital Preservation Training Programme, and as a consultancy service, and most recently with the relaunch of the AIDA toolkit as the AOR toolkit. However, in our work we meet a lot of people in a lot of organisations, for whom “preservation” – in perhaps the traditional archival sense – isn’t necessarily their sole or principal interest.

Self-assessment as digital preservation training aid

On the Digital Preservation Training Programme, we always like to encourage students to assess their organisation and its readiness to undertake digital preservation. It’s possible that AIDA and the new AOR Toolkit could continue to have a small part in this process.

Self-assessment in DPTP

We have used self-assessment exercises as a training aid in the DPTP course for many years. We don’t do it much lately, but we used to get students to map themselves against the OAIS Reference Model. The idea was they could identify gaps in the Functional Entities, information package creation, and who their Producers / Consumers were. We would ask them to draw it up as a flipchart sketch, using dotted lines to express missing elements or gaps.
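
For anyone who would rather type than draw, the same gap-spotting exercise can be sketched in a few lines. The six Functional Entities listed are those named in the OAIS Reference Model; everything else here (the `find_gaps` function, the sample answers, the solid and dotted markers) is a hypothetical illustration rather than anything we actually used on the course.

```python
# The six Functional Entities named in the OAIS Reference Model.
OAIS_FUNCTIONAL_ENTITIES = [
    "Ingest",
    "Archival Storage",
    "Data Management",
    "Administration",
    "Preservation Planning",
    "Access",
]


def find_gaps(in_place: set[str]) -> list[str]:
    """List the Functional Entities the organisation has not yet covered."""
    return [e for e in OAIS_FUNCTIONAL_ENTITIES if e not in in_place]


# Hypothetical self-assessment answers from one student.
in_place = {"Ingest", "Archival Storage", "Access"}

for entity in OAIS_FUNCTIONAL_ENTITIES:
    # Mimic the flipchart: a solid box if present, a dotted one if missing.
    marker = "[====]" if entity in in_place else "[....]"
    print(marker, entity)

print("Gaps:", ", ".join(find_gaps(in_place)))
```

A fuller version would also record information package creation and the Producers and Consumers, which the flipchart exercise covered as well.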

AIDA’s new name: AOR Toolkit

The hardest part of any project is devising a name for the output. The second hardest thing is devising a name that can also be expressed as a memorable acronym.

I think one of the most successful instances I encountered was the CAMiLEON Project. This acronym unpacks into Creative Archiving at Michigan and Leeds Emulating the Old on the New. It brilliantly manages to include the names of both sponsoring Institutions, accurately describes the work of the project, and still ends up as a memorable one-word acronym.

The AIDA toolkit: use cases

There are a few isolated uses of the old AIDA Toolkit. In this blog post I will try and recount some of these AIDA toolkit use cases.

In the beginning…

In the toolkit’s first phase, in 2009, I was greatly aided by five UK HE Institutions who volunteered to act as guinea pigs and do test runs, but this was mainly to help me improve the structure and the wording. However, Sarah Jones of HATII was very positive about its potential in 2010.

Reworking AIDA: Storage

In the fourth of our series of posts on reworking the AIDA self-assessment toolkit, we look at a technical element – Managed Storage.

In reworking the toolkit, we are now looking at the 11th Technology Element. In the “old” AIDA, this was called “Institutional Repository”, and it pretty much assessed whether the University had an Institutional Repository (IR) system and the degree to which it had been successfully implemented, and was being used.

For the 2009 audience, and given the scope of what AIDA was about, an IR was probably just the right thing to assess. In 2009, Institutional Repository software was the new thing and a lot of UK HE & FE institutions were embracing it enthusiastically. Of course your basic IR doesn’t really do storage by itself; certainly it enables sharing of resources, it does managed access, perhaps some automated metadata creation, and allows remote submission of content. An IR system such as EPrints can be used as an interface to storage – as a matter of fact it has a built-in function called “Storage Manager” – but it isn’t a tool for configuring the servers where content is stored.

Storage in 2016

In 2016, a few things occurred to me while thinking about the storage topic.

  1. I doubt I shall ever understand everything to do with storage of digital content, but since working on the original AIDA my understanding has improved somewhat. I now know that it is at least technically possible to configure IT storage in ways that match the expected usage of the content. Personally, I’m particularly interested in such configuration for long-term preservation purposes.
  2. I’m also aware that it’s possible for a sysadmin – or even a digital archivist – to operate some kind of interface with the storage server, using for instance an application like “storage manager”, that might enable them to choose suitable destinations for digital content.
  3. Backup is not the same as storage.
  4. Checksums are an essential part of validating the integrity of stored digital objects.

I have thus widened the scope of Element TECH 11 so that we can assess more than the limited workings of an IR. I also went back to two other related elements in the TECH leg, and attempted to enrich them.

To address (1), the capability that is being assessed is not just whether your organisation has a server room or network storage, but rather whether you have identified your storage needs correctly and have configured the right kind of storage to keep your digital content (and deliver it to users). We might add that this capability has nothing to do with the quantity, number, or size of your digital materials.

To address (2), we’ve identified the requirement for an application or mechanism that helps put things into storage, take them out again, and assist with access while they are in storage. We could add that this interface mechanism is not doing the same job as metadata, the capability for which is assessed elsewhere.

To address (3), I went back to TECH 03 and changed its name from “Ensuring Availability” to “Ensuring Availability / Backing Up”. The element description was then improved with more detailed descriptions concerning backup actions. We’re trying to describe the optimum backup scenario, based on actual organisational needs, and to provide caveats for when multiple copies can cause syncing problems. Work done on the CARDIO toolkit was very useful here.

To incorporate (4), I thought it best to include checksums in element TECH 04, “Integrity of Information”. Checksum creation and validation is now explicitly suggested as one possible method to ensure integrity of digital content.
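
For readers who have not met checksums in practice, here is a minimal sketch of what creation and validation can look like. The toolkit itself does not prescribe an algorithm or a tool; the choice of SHA-256, the function names `sha256_of` and `verify`, and the sample file are all my assumptions for the sake of illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, recorded: str) -> bool:
    """Return True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded


# Demonstration on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "minutes.pdf"
    sample.write_bytes(b"example content")
    recorded = sha256_of(sample)       # checksum creation, e.g. at ingest
    print(verify(sample, recorded))    # True: content unchanged
    sample.write_bytes(b"example c0ntent")
    print(verify(sample, recorded))    # False: content has been altered
```

In a real setting the recorded value would be kept somewhere separate from the file itself, and the validation repeated on a schedule.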

Managed storage as a whole is thus distributed among several measurable TECH elements in the new toolkit.

In this way I’m hoping to arrive at a measurable capability for managed storage that does not pre-empt the use the organisation wishes to make of such storage. The wording is such that even a digital preservation strategy could be assessed in the new toolkit – as could many other uses. If I can get this right, it would be an improvement on simply assessing the presence of an Institutional Repository.

Reworking AIDA: Legal Compliance

Today we’re looking briefly at legal obligations concerning management of your digital content.

The original AIDA had only one section on this, and it covered Copyright and IPR. These issues were important in 2009 and are still important today, especially in the context of research data management when academics need to be assured that attribution, intellectual property, and copyright are all being protected.

Legal Compliance – widening the scope

For the new toolkit, in keeping with my plan for a wider scope, I wanted to address additional legal concerns. The best solution seemed to be to add a new component to assess them.

What we’re assessing under Legal Compliance:

  1. Awareness of responsibility for legal compliance.
  2. The operation of mechanisms for controlling access to digital content, such as by licenses, redaction, closure, and release (which may be timed).
  3. Processes of review of digital content holdings, for identifying legal and compliance issues.

Legal Compliance – Awareness

The first one is probably the most important of the three. If nobody in the organisation is even aware of their own responsibilities, this can’t be good. My view would be that any effective information manager – archivist, librarian, records manager – is probably handling digital content with potential legal concerns regarding its access, and has a duty of care. But a good organisation will share these responsibilities, and embed awareness into every role.

Legal Compliance – Mechanisms & Procedures

Secondly, we’d assess whether the organisation has any means (policies, procedures, forms) for controlling access and closure; and thirdly, whether there’s a review process that can seek out any legal concerns in certain digital collections.

Legislation regimes vary across the world, of course, and this makes it challenging to devise a model that is internationally applicable. The new version of the model name-checks specific acts in UK legislation, such as the Data Protection Act and Freedom of Information. On the other hand, other countries have their own versions of similar legislation; and copyright laws are widespread, even when they differ on detail and interpretation.

The value of the toolkit, if indeed it proves to have any, is not that we’re measuring an organisation’s specific point-by-point compliance with a certain Statute; rather, we’re assessing the high-level awareness of legal compliance, and what the organisation does to meet it.

Interestingly, the high-level application of legal protection across an organisation is something which can appear somewhat undeveloped in other assessment tools.

The ISO 16363 code of practice refers to copyright implications, intellectual property and other legal restrictions on use only in the context of compiling good Content Information and Preservation Description Information.

The expectation is that “An Archive will honor all applicable legal restrictions. These issues occur when the OAIS acts as a custodian. An OAIS should understand the intellectual property rights concepts, such as copyrights and any other applicable laws prior to accepting copyrighted materials into the OAIS. It can establish guidelines for ingestion of information and rules for dissemination and duplication of the information when necessary. It is beyond the scope of this document to provide details of national and international copyright laws.”

Personally I’ve always been disappointed by the lack of engagement implied here. To be fair though, the Code does cite many strong examples of “Access Rights” metadata, when it describes instances of what exemplary “Preservation Description Information” should look like for Digital Library Collections.

The DPCMM maturity model likewise doesn’t see fit to assess legal compliance as a separate entity, and it is not singled out as one of its 15 elements. However, the concept of “ensuring long‐term access to digital content that has legal, regulatory, business, and cultural memory value” is embedded in the model.

Reworking the AIDA toolkit: why we added new sections to cover Depositors and Users

Why are we reworking the AIDA toolkit?

The previous AIDA toolkit covered digital content in an HE & FE environment. As such, it made a few basic assumptions about usage; one assessment element was not really about the users at all, but about the Institutional capability for measuring use of resources. To put it another way, an Institution might be maintaining a useless collection of material that nobody looks at (at some cost). What mechanism do you have to monitor and measure use of assets?

