At OR11 I gave a presentation, jointly with Imma Subirats of UN FAO in Rome, which we called Changing Platforms. Its aim was to discuss migrating repositories between different software platforms.
In addition to her work at FAO, Imma is Chief Executive for the E-LIS repository, a major international and multilingual repository of articles about Library and Information Science. E-LIS ran on EPrints from 2003, but last year migrated to DSpace, because CILEA in Italy, who generously donate support and hosting, now focuses exclusively on working with DSpace. The E-LIS migration has been largely successful; however, a number of EPrints features on which the E-LIS editors and users depended have been difficult to replicate in DSpace, or have had to be put on ice. This is no reflection on the specialists at CILEA, but is perhaps indicative of more profound differences between EPrints and DSpace that aren’t always reflected in the usual comparisons of repository platforms, such as the otherwise informative JISC RSP Repository Software survey.
ULCC, of course, has just completed a repository migration from DSpace to EPrints for the School of Advanced Study. Our motivation was in many respects the same as CILEA’s: our expertise lies firmly in the EPrints camp. But I think the outcomes for our end-user community are more demonstrably positive: in fact I don’t think there’s a single feature of the new SAS-Space-on-EPrints that isn’t a major improvement over its previous incarnation.
Migrating the metadata and data (at least from DSpace to EPrints) presented few issues (other than those of my own making!): export, transform, import. Here the similarities between the data models of the two platforms were extremely valuable. But we did encounter other significant differences, some of which are set out in more detail below.
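To make the "transform" step a little more concrete, here is a minimal sketch of mapping one item's metadata. It assumes the DSpace export is a Simple Archive Format `dublin_core.xml` file (a `dublin_core` root holding `dcvalue` elements with `element` and `qualifier` attributes) and targets a simplified EPrints-style `<eprint>` element. The field mapping, the function name `dspace_dc_to_eprint` and the "Family, Given" name convention are all illustrative assumptions, not the mapping actually used for SAS-Space.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from DSpace (element, qualifier) pairs to EPrints-style
# field names; a real migration needs a far fuller mapping (types, files, etc.).
FIELD_MAP = {
    ("title", "none"): "title",
    ("description", "abstract"): "abstract",
    ("date", "issued"): "date",
}


def dspace_dc_to_eprint(dc_xml: str, eprint_id: int) -> ET.Element:
    """Transform a DSpace dublin_core.xml document into a simplified <eprint> element."""
    source = ET.fromstring(dc_xml)
    eprint = ET.Element("eprint")
    # Carry the item ID across unchanged so old links can still be mapped to it.
    ET.SubElement(eprint, "eprintid").text = str(eprint_id)
    creators = None
    for dcvalue in source.iter("dcvalue"):
        key = (dcvalue.get("element"), dcvalue.get("qualifier", "none"))
        value = (dcvalue.text or "").strip()
        if key == ("contributor", "author"):
            # Assumes author names are stored as "Family, Given".
            if creators is None:
                creators = ET.SubElement(eprint, "creators")
            family, _, given = value.partition(",")
            name = ET.SubElement(ET.SubElement(creators, "item"), "name")
            ET.SubElement(name, "family").text = family.strip()
            ET.SubElement(name, "given").text = given.strip()
        elif key in FIELD_MAP:
            ET.SubElement(eprint, FIELD_MAP[key]).text = value
        # Any other fields are silently skipped in this sketch.
    return eprint
```

Serialising the result with `ET.tostring(element, encoding="unicode")` gives XML that could then be wrapped for import; the closer the two platforms' models are, the smaller `FIELD_MAP` and its special cases need to be.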
Issues in EPrints
Perhaps the most significant issue we encountered in re-implementing SAS-Space on EPrints was the absence of built-in support for Handle persistent identifiers. Handle support comes out of the box with DSpace, but not with EPrints, so the choice we faced was between re-implementing Handle support and dropping it. We chose the latter, since the benefits of Handles to a relatively small IR like SAS-Space were not obvious, and so it was hard to justify the extra cost and effort. By ensuring that items kept the same ID when migrated from DSpace to EPrints, and implementing a simple rewrite rule, we have ensured that Handle URIs created while DSpace was operational continue to point to the same item; but for items added since EPrints went live, no new Handle URIs are coined.
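The logic of such a rewrite rule can be sketched in a few lines. This assumes legacy DSpace URLs of the form `/handle/<prefix>/<suffix>` and that each Handle suffix became the item's EPrints ID, so the EPrints abstract page is at `/<id>/`; the pattern and the function name `rewrite_handle_path` are hypothetical, not the rule actually deployed for SAS-Space.

```python
import re
from typing import Optional

# Legacy DSpace paths look like /handle/<prefix>/<suffix>. Both the path shape and
# the assumption that <suffix> equals the migrated EPrints ID are illustrative.
LEGACY_HANDLE_PATH = re.compile(r"^/handle/(\d+(?:\.\d+)*)/(\d+)/?$")


def rewrite_handle_path(path: str) -> Optional[str]:
    """Return the EPrints abstract-page path for a legacy Handle path, or None."""
    match = LEGACY_HANDLE_PATH.match(path)
    if match is None:
        return None  # Not a legacy Handle path: leave the request alone.
    # Redirect to the abstract page of the item with the same ID.
    return f"/{int(match.group(2))}/"
```

In practice this would more naturally live in the web server configuration; with Apache's mod_rewrite, for instance, something along the lines of `RewriteRule ^/handle/[0-9.]+/([0-9]+)/?$ /$1/ [R=301,L]` in the server context would express the same mapping as a permanent redirect.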
(Shortly after we returned from OR11, an extended discussion broke out on Twitter, amongst several well-respected gentlemen in our field, about the benefits of using Handles. A considerable amount of scepticism was expressed about their usefulness.)
Issues in DSpace
Imma described some workflow issues encountered with the new implementation of her repository. The E-LIS team is accustomed to a very flexible EPrints-based workflow that allows an item’s workflow status to be changed quite freely. DSpace, by contrast, has a unidirectional workflow model, so that items cannot (for example) be reverted from Live to Pending if an error is spotted, but effectively need to be deleted and resubmitted. This is a significant divergence between two superficially similar repository platforms.
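The difference can be pictured as two transition tables over the same statuses. The following is a deliberately simplified, hypothetical model of the two behaviours described above, with only the Pending and Live states; the names `UNIDIRECTIONAL`, `FLEXIBLE`, `can_move` and `correction_steps` are illustrative and do not come from either code base.

```python
from enum import Enum
from typing import Dict, List, Set


class Status(Enum):
    PENDING = "pending"  # awaiting editorial review
    LIVE = "live"        # publicly visible in the repository


# Simplified, hypothetical transition tables modelling the two behaviours
# described above; neither is taken from the EPrints or DSpace code base.
Model = Dict[Status, Set[Status]]
UNIDIRECTIONAL: Model = {Status.PENDING: {Status.LIVE}, Status.LIVE: set()}
FLEXIBLE: Model = {Status.PENDING: {Status.LIVE}, Status.LIVE: {Status.PENDING}}


def can_move(model: Model, current: Status, target: Status) -> bool:
    """True if the workflow model allows a direct change from current to target."""
    return target in model.get(current, set())


def correction_steps(model: Model) -> List[str]:
    """Steps an editor takes to fix an error spotted in a live item."""
    if can_move(model, Status.LIVE, Status.PENDING):
        return ["revert to pending", "correct metadata", "approve"]
    # No way back to Pending: the item must be removed and submitted again.
    return ["delete", "resubmit", "correct metadata", "approve"]
```

The single missing edge from Live back to Pending is enough to turn a quick correction into a delete-and-resubmit cycle.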
Another example Imma gave of a perplexing feature of the default DSpace UI is the button on each abstract page that says “View Full Item Record”. It leads to a rather intimidating web page displaying the item metadata as Qualified Dublin Core. It’s not a very attractive display, nor is it actually a “data” rendering of the metadata (as you would get by explicitly choosing to Export As XML, or from some new-fangled Linked Data features). It’s not clear why this view would be of interest to general users of the repository: why is it there?
At OR11 I talked to several people working with DSpace, and all agreed that there’s room for improvement in the default Web UI. In some cases they have completely reimplemented the web templates. It’s also worth noting that the page layout in the default JSP UI is implemented entirely with HTML tables, and doesn’t pass W3C validation. For a Web application that’s nearly 10 years old, this is disappointing. (The alternative Manakin XML UI implements an attractive vision of UI abstraction using XSLT, but reports suggest that configuring and maintaining it is not for the faint-hearted.)
Quite a few Web design infelicities are perpetrated in the default Community, Collection and Abstract page templates. (During the conference, many of us enjoyed and applauded Simeon Warner’s timely rant, “Don’t bold the field name”.) Of course we can change them (it’s Open Source, isn’t it?), but is it unreasonable to expect default Web templates that are at least potentially usable as is? The natural and reasonable response of the DSpace community is to ask that we report the issue to the development team as a bug or feature request, or fix it ourselves and share the fix. But where a missing feature is really important to a user (by which I probably mean a repository manager), the choice they face is between “getting by” until it’s implemented in the core distribution, or doing it themselves (which probably means hiring a specialist developer to implement it for them).
At the DSpace User Group meeting in 2007, I described how we considered that, back in 2005, DSpace offered a better “out-of-the-box” experience than EPrints. I never thought it was anything to write home about – in fact I remember being disappointed by the very UI issues I’ve described above – but to my untrained eye it did seem better than EPrints, at the time. But, as I’ve mentioned elsewhere, EPrints has improved remarkably since.
Of course a lot of people we admire have proved that you can create impressive repository systems using DSpace. It performs well and provides a lot of essential repository functionality. Its Lucene search engine is certainly better than anything EPrints currently offers. But I’m still surprised how much more work seems to be necessary to make a DSpace installation as readily useful and usable as EPrints, which seems to represent a considerable additional cost in setting up DSpace.
I’ve sometimes heard it argued, in both the EPrints and DSpace camps, that repository setup shouldn’t be too easy, lest repository managers get in a mess and endanger the integrity of their system. In my opinion, as developers and solution providers, our job is to provide as many features and tools as possible to enable repository managers to manage their collections effectively and easily, not to act as gatekeepers to their systems and data.
By way of contrast, we have recently supported the Institute of Education (IOE) in setting up an EPrints repository of UK government publications, and we were pleased that the repository manager called on us very little, other than to answer some questions and apply a few small configuration changes. The experience with SAS-Space has also confirmed to me that EPrints now has strong out-of-the-box appeal, and a rich set of features available through the Web UI that enables a reasonably confident repository manager to get to work without needing to initiate a major technical project.
In the current climate of straitened library budgets, this could make a considerable difference to the viability of a repository startup project. For a growing number of libraries and information services, not least at smaller research institutions or in developing countries, that could be the difference between having a repository or not.