Digitisation

Announcing the DPTP Digitisation Course

We’re pleased to announce the DPTP Course on Digitisation, which will take place in London on 27th September 2016. It is a one-day course, taught through a combination of slides, exercises and discussions, and costs £312.30.

The course will cover the basics of digitisation, from the initial planning through project management to protecting and preserving the resulting digital assets for the long term. It explores preparation, project management, equipment/outsourcing, workflows and policies. It will also look at metadata, copyright and licensing, and managing access to the digitised content.

I first taught this Course in Salford in June 2013. It was at the Working Class Movement Library and organised by Elinor Taylor, who wanted a block of learning as part of her two day digital humanities training event for humanities postgraduate researchers. We were keen to help with this project, for several reasons:

  • Postgraduate researchers represent a strand of learners different to the archivists and librarians who usually attend the DPTP
  • Digital Humanities was then, and still is, a growth area that we need to engage with
  • It was an opportunity to create learning materials on a subject which previously had been taught as a single 90-minute module on the DPTP. As we discovered with web-archiving, the subject is too rich for a single module, and requires a day to be understood properly.

Elinor had some very specific expectations from this offering, which we tried our best to meet as we like to keep our customers happy. She had the intention that “those who attend will acquire skills to design their own project and start work”. Her learners wanted to know about:

  • How to get from print resources to digitized resources
  • What are the basic principles of designing a digitisation project
  • Discussion of compliance and copyright issues
  • Integration with existing catalogues
  • Integration with existing digitization strategies
  • Criteria for selection of file formats

In this Library’s case, the students did have digitisation projects planned, but I learned they would be outsourcing the actual scanning to a professional company. This confirmed a view I still hold: scanning is only a small part of digitisation.

In offering this Course as part of the 2016 DPTP programme, Steph Taylor and I have updated the content and included more information to address certain key specialisms and concerns in this field. For instance, we have incorporated what we learned in the last 12 months about DAM systems, metadata, and image libraries, through our liaison with Sarah Saunders of the IPTC Metadata Group. We have also upgraded our workflow model to include strands on OCR, manuscript digitisation, and crowd-sourcing. Watch this space for a forthcoming podcast where we discuss these improvements and additions.

In the podcast below, hear Steph Taylor and Ed Pinsent discuss the new Digitisation Course.

The tranScriptorium project puts HTR into practice

We are very enthusiastic about our ongoing work on tranScriptorium and thought it was time to share this with you.

tranScriptorium is one of the Specific Targeted Research Projects (STReP) of the Seventh Framework Programme (FP7) created by the European Commission – Research and Innovation.

Digital libraries hold and publish huge quantities of handwritten historical documents. For typical text images of such documents, currently available text image recognition technologies are not effective. Traditional Optical Character Recognition (OCR) is simply not usable, since characters cannot be isolated automatically in these images. Therefore, holistic, segmentation-free Handwritten Text Recognition (HTR) techniques are needed. Current HTR approaches still lack the required accuracy, mainly due to the poor quality, degradation and writing-style variability of historical document images.

Hence tranScriptorium aims to develop innovative, efficient and cost-effective solutions for the indexing, search and full transcription of historical handwritten document images, using modern, holistic HTR technology.

The project will turn HTR into a mature technology by addressing the following objectives:

  1. Enhancing HTR technology for efficient transcription
  2. Bringing the HTR technology to users: individual researchers with experience in transcribing handwritten documents, and volunteers who collaborate in large transcription projects.
  3. Integrating the HTR results in public web portals: the outcomes of the tranScriptorium tools will be attached to the published handwritten document images.

 

The project will have an important impact on transcribers for whom HTR technology is not well known, as well as on new non-specialist users who gain the possibility of transcribing complex historical documents. Projects like Transcribe Bentham can certainly make good use of this technology. A great impact is also expected on content provision for cultural heritage digital collections. tranScriptorium might even help locate sunken ships once the information stored in the General Archive of the Indies has been processed!

The tranScriptorium project runs from 1 January 2013 to 31 December 2015.

Digitisation Course at Salford

Recently, I delivered a one-day training course on digitisation to Digital Humanities postgraduates in Salford. Elinor Taylor of Salford University won an AHRC grant for a Research Skills Enrichment project, called Issues in the Digital Humanities: A Key Skills Package for Postgraduate Researchers, and one of the strands was about improving digitisation skills; more specifically, how best to manage a digitisation project.

Elinor was unable to find anyone who could deliver the course they wanted, and commissioned ULCC to create a bespoke course. They approached us through our Digital Preservation Training Programme, which recently won an award for training and communication. Elinor at first thought a workshop / hands-on event might be best, where a digitisation workflow could be aligned with a real-world case processing papers from the Working Class Movement Library which they were scanning. In the end we agreed that an overview of management principles would be better. I was asked not to dwell on scanners and cameras, since the audience for the course would mostly be outsourcing their origination work to commercial providers. Audio-visual conversion was also out of scope.

My course was structured to follow a start-to-finish narrative. Inevitably this meant spending one-third of the time discussing the planning and preparation. I’m a great believer in asking the question “why” about 15 times before beginning any project, and the same applies to digitisation. Who wants this stuff? Why digitise it? Will it improve their lives if you do?

The House of Books: Manuscripts and religious identity in Iraq

Father Najeeb Michaeel examines a manuscript

Father Najeeb Michaeel is an Iraqi Christian priest who speaks Arabic, English, French, Aramaic and Syriac, not to mention being able to read Latin and Greek. In the garden of Zaytun library, Erbil, I hear this gentle man tell me how his community of friars used to live in Mosul, a traditional centre for Christianity in Iraq, with the highest proportion of Assyrian Christians of all the Iraqi cities. Father Najeeb’s community has had to leave Mosul due to persecution. Later on, during The House of Books workshop, he gives us a presentation of the magnificent early Christian manuscripts they are digitising. Over coffee he gives us a moving rendition of the ‘Our Father’ sung in Aramaic. I wasn’t expecting to feel so moved by a religion I have become increasingly frustrated by, and in Iraq.

Early Christian manuscript, Centre Numerique des Manuscrits Orientaux, Mosul, Iraq.

Iraq has often been compared to a mosaic in terms of its religious diversity. Iraq is a Shia-majority country and contains the sacred Shia cities of Najaf and Karbala. Most sources estimate that around 65% of Iraqis follow Shia Islam, and around 35% follow Sunni Islam. What is not so well known is that Christians have inhabited what is modern-day Iraq for about 2,000 years. The person traditionally held responsible for bringing Christianity to Iraq is St Thomas the Apostle. Assyrians (also called Syriacs and Chaldeans), most of whom are adherents of the Chaldean Catholic Church, the Syriac Orthodox Church and the Assyrian Church of the East, account for most of Iraq’s Christian population, along with Armenians. Tariq Aziz was born to an Assyrian family and is a member of the Chaldean Catholic Church. There are also small populations of Mandaeans, Shabaks, Yarsan and Yazidis. The Iraqi Jewish community, numbering around 150,000 in 1941, has almost entirely left the country. There are also Gnostics in the form of Mandaeans and sub-sects thereof, Yazidis who believe in a god but have a blue peacock angel in their pantheon, and of course the Zoroastrians which the ancient Babylonians followed.

Scanning is different from digitisation

If you haven’t seen it, can I recommend Kristen Snawder’s recent post on the Library of Congress Digital Preservation blog, Digitization is different than digital preservation. Kristen reiterates familiar points about the long-term commitment necessary for serious digital preservation, contrasted with the quick hit of a scanning project. “In the hurry to meet user expectations, institutions may scan large quantities of materials without having a solid plan for preserving the digital images into the future.”

However another recent find on the Web compels me to make an additional point, namely that we might do equally well to differentiate between scanning and digitisation. Anyone can set to work with a scanner and create a bunch of digital images – but that barely scratches the surface of what I think we should be expecting of a digitisation project in 2011.

First and foremost, we need metadata: the more the merrier, but something at least. Even if we expect to come back later and polish it up (once the images can be browsed and examined on screen). In the absence of any established metadata profiles for a project, at least try to cover as many Dublin Core elements as possible – title, creator, date, subject/keywords… Images, in particular, may prove tricky or time-consuming to find again, especially once there are thousands of them on a disk. We should probably keep the metadata in a database, and perhaps additionally store metadata with the objects. This can be as XML or plain text files stored alongside the digital images, or embedded in the files we create (many common file formats – TIFF, JPEG, MPEG, PDF – support metadata embedding, and there are many free tools available to help).
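
As a rough illustration of the "plain XML file stored alongside the image" option, here is a minimal Python sketch that writes a handful of Dublin Core elements to a sidecar file. The function names, the `metadata` wrapper element and the `<image name>.xml` naming convention are assumptions made for this example, not an established profile; a real project would agree its own schema and file layout.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def build_sidecar(fields):
    """Return an XML string holding one dc: element per supplied value.

    `fields` maps a Dublin Core element name (e.g. "title") to a single
    value or to a list of values (e.g. several subject keywords).
    """
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for name, value in fields.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            element = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            element.text = str(item)
    return ET.tostring(root, encoding="unicode")


def sidecar_path(image_path):
    """Name the sidecar after the full image file name, plus '.xml'."""
    return Path(str(image_path) + ".xml")


def write_sidecar(image_path, fields):
    """Write the metadata next to the image and return the sidecar path."""
    path = sidecar_path(image_path)
    path.write_text(build_sidecar(fields), encoding="utf-8")
    return path
```

Calling `write_sidecar("scans/img_0001.tif", {"title": "...", "creator": "...", "date": "...", "subject": ["...", "..."]})` would leave `scans/img_0001.tif.xml` beside the image. Keeping the image extension in the sidecar name avoids a clash when a TIFF master and a JPEG access copy share the same base name.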

There is yet more, though, that we should be doing, particularly when we are scanning text-based objects (articles, books, magazines, reports, etc). Most importantly, we really should try and extract the text from the image if possible. [1]

My recent web find was the teaching blog of Dr Toine Bogers at the Royal School of Library and Information Science (RSLIS) in Copenhagen, Denmark. One fascinating post describes a Lab Session exercise, From OCR To NER, a set of comparatively simple command-line processes to get the most out of a scanned-text project.
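
To give a flavour of what becomes possible once text has been extracted, here is a deliberately crude Python sketch that counts runs of capitalised words in OCR output as candidate names. This is not Dr Bogers’s exercise and not real named-entity recognition; the pattern and function name are my own assumptions, and a proper NER tool would do far better.

```python
import re
from collections import Counter

# Two or more consecutive capitalised words, e.g. "Jeremy Bentham".
# A crude stand-in for NER: it misses single-word names and acronyms.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_names(ocr_text):
    """Count capitalised word runs in OCR text as rough name candidates."""
    # OCR output often breaks lines mid-phrase, so collapse whitespace first.
    normalised = re.sub(r"\s+", " ", ocr_text)
    return Counter(NAME_PATTERN.findall(normalised))
```

Expect false positives, for instance a capitalised word at the start of a sentence followed by a proper noun. Even so, a frequency list like this can be a quick way to suggest subject keywords for the metadata.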

Transcribing Bentham

Jeremy Bentham, Bloomsbury WC1 by Ewan-M on Flickr (CC:BY)

Did I mention that we are very excited to be contributing to UCL’s Bentham Transcription Initiative? This is an AHRC-funded project to complete the digitisation of the manuscripts of 18th Century philosopher Jeremy Bentham, and transcribe them using a wiki-based collaborative approach. It is being run by the Bentham Project at UCL, with support from ourselves and UCL’s newly-launched Centre for Digital Humanities. You can read an overview of the project on Melissa Terras’s blog.

Obviously, transcription of manuscript materials is an important digitisation activity that can rarely, if ever, be left to computers, in the way that printed texts can be, using OCR. But it’s painstaking and laborious work, and anything that eases the burden is welcome.

The project is already throwing up some very interesting conversations about transcription.  At ULCC we have thought about transcription before, particularly with regard to our ongoing work for the Linnean Society archives, and we hope that there will yet be synergies to exploit. It is a great feeling to be so closely involved with disseminating the work of two such seminal figures as Linnaeus and Bentham.

We’re not naïve enough to think that collaborative web-based transcription is new, but we’ve yet to find any substantial comparable examples. A comment on UCL’s Digital Humanities blog teases us with the prospect of information about other similar projects, but fails to provide even a single link or hint, so is effectively useless: hardly in the collaborative spirit! A more useful lead was Joanne Evans’ link to the National Library of Australia’s Australian Newspapers project, which is crowdsourcing the proof-reading and correcting of OCR outputs, and has an impressive-looking site – I’m sure we’ll be borrowing some ideas from there.

Another useful lead has been from Ben Brumfield of Austin, Texas, directing us to his blog about collaborative manuscript transcription which has been going even longer than DA Blog, and looks like it’s going to make interesting reading. Ben’s recent blog post about a distributed transcription exercise of the US Geological Survey’s Bird Phenology Program includes a link to a training video for volunteers (it even sounds like it’s been recorded in a birdhouse).  In the video we can see a database-form approach to transcription, which is particularly appropriate for transcribing data already entered on structured forms.

For more heterogeneous and free-form texts, such as the Bentham manuscripts, wikis seem to me much more appropriate, being in essence discrete hypertext engines. As for collaborative features, MediaWiki in particular has strong and proven features: there can be few better advertisements for effective virtual, global collaboration and crowdsourcing than Wikipedia.

One thing that is particularly compelling about the BPP video is that it is an excellent example of a thorough approach to online collaboration, giving clear and unequivocal guidance to contributors. Now that screencast tools are so readily available, it’s clear that for many activities like this, video-based instruction is the ideal tool, and often preferable to any number of written instructions. No less than for online teaching and learning environments, the need for effective induction and inclusive management of the online community must never be overlooked.

Farewell ‘TASI’, Hello ‘JISC Digital Media’

Photo by Chad Miller

On 5 March I attended the London launch of the rebranding of ‘TASI’ as ‘JISC Digital Media’. Tables were decked with everything from canapés and wine to a variety of AV and photographic media on display (on separate tables of course!). Although the former ‘TASI’ was always a JISC-funded venture, that fact is now made plain in its new name.

As of August this year, JISC Digital Media will become part of a consortium of JISC advisory services that aim to provide joined-up solutions for clients. Other aligned services include JISC InfoNet, JISC TechDis, JISC Legal Information, Procureweb and JISC Netskills.

JISC Digital Media’s official brief is “to ensure that digital media resources being created, used and managed within the further and higher education community meet the teaching, learning and research needs of individuals and institutions within the UK.” The recently expanded service now also provides expertise in moving images and sound. (In fact, as I blog, a couple of members of our very own Digitisation team are attending their new training course on Audio Production).

Speakers at the launch touched upon some specific aspirations for the Service, and a few points of interest stood out:

  • JISC Digital Media are keen for the HE and FE sector to use the JISC Digital Media blog and share expertise across the sector;
  • They would like to adopt more web 2.0 technologies, for example Skype-based e-learning that could support some aspects of practical training;
  • Their emphasis will be on helping the HE/FE sector to use images for teaching;
  • They recognise that more must be done to help the FE sector.

Of course, with any newly rebranded organisation comes a new-look website: http://www.jiscdigitalmedia.ac.uk/ (nice bold colours, and user-friendly too!). So farewell dear ‘TASI’ [now a dirty word that incurs a fine if spoken out loud by its own staff], and ‘hello’ to the new and improved Advisory Service: ‘JISC Digital Media’.

BT Archives: Digitisation of Historic Posters

We’ve just completed digitisation of a small series of interesting General Post Office posters for the BT Archives, all of which date from around the WW2 period (1930s to 1950s).

These were the six remaining items that still required digitisation and generation of suitable access copies, in a larger batch of posters that will be made available through BT Archives’ online catalogue.

“Celebration: Send a greetings telegram” poster, circa 1951 (British Telecommunications Archive reference: PRD 981). Approximate dimensions: 38x25cm.

© 2009 “BT” British Telecommunications plc

All rights reserved.

100% crop showing part of greetings telegram example in the centre

Even though tiny at well under two numerals, this is nevertheless a varied series

ULCC/Portico/DPC consortium to undertake JISC preservation study

We’ve just heard that a consortium of ULCC, Portico and the Digital Preservation Coalition has been awarded the contract by JISC to undertake a Preservation Study of recent digitisation activities.

The JISC Digitisation Programme has made a wide variety of valuable resources digitally accessible, including:

  • British Newspapers (1620-1900)
  • Newsfilm Online
  • First World War Poetry
  • Newspaper Cartoons
  • Welsh Periodicals
  • Pre Raphaelite drawings
  • East London Theatre Archive

More information about these, and other projects, is available on the JISC Digitisation web page.

The project will review the preservation plans and processes of the sixteen projects funded under Phase 2 of the JISC Digitisation Programme, and identify any medium or long-term access risks to the digitised content. It will also produce recommendations – for individual projects and for JISC as a whole – for processes and strategies to mitigate the risks, and case studies which would be helpful to the broader community.

This is an exciting opportunity for us to apply and extend the experience we have gained working on a range of projects in the field, including the European Visual Archive Market-validation Project (EVAMP) and risk assessments for the recently launched Newsfilm Online project. We will shortly be creating an online home for the project collaboration and development, and will use DA Blog and the DigiPresSurvey Blog (on JISCInvolve) to keep you updated.

Digitising William Morris’s Lantern Slides (Part 3)

We stumbled upon fascinating facets of lantern slide creation, assembly and ageing processes during digitisation of the William Morris collection. Use of forensic resolutions, true colour and high bit depths in the capture process (2400ppi true optical/RGB 48bit) allowed us to pick out what we think are some singularly remarkable hand painted slides (slides that were generated by direct application of ink to the glass), and unearth an array of decay and fading patterns.
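
Capture settings like these have a storage cost that is easy to estimate. The Python sketch below works out the uncompressed size of one RGB image from its physical dimensions, resolution and bit depth. The function name is my own, and the 3.25 x 3.25 inch slide size used in the note that follows is an assumed example, since this post does not give the dimensions of the Morris slides.

```python
def uncompressed_size_bytes(width_in, height_in, ppi, bits_per_pixel):
    """Estimate the raw size of a scan, ignoring headers and compression.

    width_in, height_in: physical size of the scanned area in inches
    ppi: optical capture resolution in pixels per inch
    bits_per_pixel: total bit depth, e.g. 48 for 16 bits per RGB channel
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8
```

For an assumed 3.25 x 3.25 inch slide at 2400ppi and 48 bits per pixel, `uncompressed_size_bytes(3.25, 3.25, 2400, 48)` gives 7,800 x 7,800 pixels at 6 bytes each, or roughly 365 MB per master image before any compression.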

This first example of a hand-coloured slide depicts the Tudor Kelmscott Manor, “The Country Home of William Morris”, and surrounding scenery.

 

Of note are the rudimentary nature of the colouring work and the two occurrences highlighted in pink (enlarged below).
