We deployed a major update to the CDR this morning that includes enhancements we’ve been working on for the past several months. Here is a list of the highlights.
Created iRODS rule for delayed replication, improving ingest performance
Automated iRODS rule for quarterly fixity check of all objects preserved in the CDR
In consultation with the Digital Preservation Steering Committee, developed and implemented policies for event logging of ongoing preservation actions
Designed and implemented improved navigation within collections in the administrative interface
Integrated ingest and status monitors into the admin interface
Integrated searching and faceting within the admin interface
Added sorting in the admin interface by title, author, date submitted, and date updated
Users can assign access controls and embargoes
Users can filter by access control attributes
Users can move objects between containers via intuitive drag and drop actions
Users can add/create/update descriptive metadata for any object or container
Users can upload single objects and containers via the admin interface
Users can upload METS submissions via the admin interface
The interface provides feedback on the progress of operations, and whether operations succeeded
Created standardized, cross-platform staging locations on networked storage resources
Streamlined the process of staging materials in place via the curator’s workbench
Added the ability to switch staging locations while working on a project
Added the ability to export, share, and import projects across Mac, PC, and Linux
Indicate in collection browse view which collections have access restrictions
Structure browse facet shows structure back to the nearest collection to maintain context
Cleaned up search query syntax and urls as the groundwork for API work
Upgraded to Solr 4.3
Implemented partial updates in Solr indexing pipeline for more efficient publishing and access control updating
Eliminated previous version of CDR administrative interface
Removed the requirement that depositors and owners be objects in Fedora, simplifying user management and streamlining the ingest pipeline
Removed the requirement to prepare a MODS record when ingesting collections or single objects via the admin interface, lowering the barrier for users to create simple objects in the repository.
Reduced browser latency through fewer file requests by adding a build script for combining css and js files
Applied metadata editor bug fixes
Resolved a bug affecting faceting at the collection view level
Fixed concurrency issues with access control cache
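To give a sense of what the quarterly fixity check listed above involves, here is a minimal Python sketch. It is not the iRODS rule itself: the function names, the `recorded` mapping, and the choice of SHA-256 are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, read in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(recorded):
    """Compare stored checksums against the files currently on disk.

    `recorded` maps a file path to the checksum recorded at ingest
    (a hypothetical structure). Returns a list of (path, status) pairs
    for every object that fails the check.
    """
    failures = []
    for path, expected in recorded.items():
        if not Path(path).is_file():
            failures.append((path, "missing"))
        elif file_checksum(path) != expected:
            failures.append((path, "checksum mismatch"))
    return failures
```

An empty result means every object still matches the checksum recorded for it. Anything else would be a candidate for a preservation event log entry.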
The newest version of the Curator’s Workbench is now available at our download site. Mac users will see that we now distribute the workbench as a bundled .app application, which includes Java 7. This means that Mac OS X users can treat the workbench just like any other .app on their computer. Previously we had quite a few Mac install issues, especially in finding the locally installed Java runtime environment (JRE).
This new version also includes a pre-configured connection to our update site. Within the workbench you can select “Check for Updates” from the Help menu. Then a wizard will guide you through the install of any available updates. This makes it possible to keep the workbench fully updated without replacing the install directory. You therefore keep all your settings intact.
Please contact me if you have questions or feedback.
We have just posted the latest release of the Curator’s Workbench to our download site. This release focuses mostly on metadata mapping tools, with enhancements to deposit forms, crosswalks, and dictionaries. At UNC we are beginning to use deposit forms in our production workflow for theses and dissertations. This created demand for new features to support review, aggregate works, and email notifications. You will also see better support for the reuse of whole crosswalks and dictionary blocks.
Lastly, we now record the file create date for everything that is captured in METS. Though not yet recorded in our own repository, this was a widely requested feature.
- Support for “Save As...” for crosswalks, dictionaries and forms.
- Original file create date is recorded in METS upon capture.
- Fixed various UI issues for recent versions of Mac OS X.
- Sanitized the developer setup and documentation.
- Add or remove a dictionary preference without restarting.
- Arrangement container type radio buttons work correctly and reflect current setting.
- Support for upload of multiple files, i.e. deposit of aggregate works.
- Forms can include contact information.
- Forms application sends deposit receipts and notifications.
- Forms application provides more user friendly error reporting and admin notification.
- Fix for obscure form file corruption issue when dictionary element dropped into an empty form.
- You can now change the delimited file data source for a crosswalk via a file dialog.
- Support for mapping text to mixed content elements.
- Trimming of any empty output XML elements.
- Fixed a record matching issue that occurred when folders were added manually to arrangement.
- Editor redesigned and various bugs fixed
- Publication control added
Please give the new version a spin and let us know what you think. You can discuss the workbench in our Google™ group.
The jquery.xmleditor is an open source plugin designed to enable simple editing of metadata records for digital objects in the user’s web browser. In the CDR we primarily use it for MODS editing, but the XML editor works with most schema-based metadata standards, including METS and EAD. The editor automatically provides basic XML structure validation and only allows use of defined elements, subelements, and attributes.
Try the jquery.xmleditor demo.
The display is strongly visual, providing both a graphical representation laid out in blocks and a syntax highlighted text editor. Elements, subelements and attributes are easily added via a menu on the right in either view. The graphical editor also provides many of the standard tools expected in an editor, such as click and drag rearrangement of elements, undo, and keyboard navigation.
It can be easily embedded into an existing web page, and is able to work with existing documents either provided in-line in the page it is embedded in or from a separate address. For example, in the CDR we embed it in our administrative interface with it pointed to a SWORD 2 endpoint to provide the starting metadata record. The editor submits the modified documents back to the SWORD repository to perform updates, but can also export the document to a file, browser willing.
The source code and documentation for the editor are available from the jquery.xmleditor GitHub repository.
We deployed a new version of the Carolina Digital Repository software yesterday that includes many enhancements we’ve been working on for the past several months. Here is a list of the highlights.
- A completely redesigned access control scheme is the centerpiece of this release. We call our new access control scheme FRACAS (Fedora Role-based Access Control and Security). It takes advantage of Fedora’s Enhanced Security Layer (FeSL) to apply role-based access controls to individual datastreams in the repository. With FRACAS we can consistently enforce sophisticated permissions at the Fedora API level, making it possible to expand access to administrative functionalities to collection managers, and to open up repository data to third-party applications.
- Enhancements to the public interface include the ability to display collection-level metadata for restricted collections. Along with this we have created a form that enables users to request access to these collections from collection owners.
- We improved UI support for aggregate objects. The interface now rolls up a default object and its child objects into a unified display.
- We added the ability to support custom forms for user-initiated deposit of materials into the repository. We piloted this service with UNC’s School of Information and Library Science in the fall, and are opening up this feature to the Art department and the Undergraduate Honors Program this spring.
This release marks the beginning of a transformed administrative interface that is much more stable, intuitive, and functional than our early administrative application.
- To support the creation and updating of descriptive MODS metadata, we built an XML editor that is capable of supporting any schema-based XML format. We will release this tool to the community as a standalone open-source application. Look for more information about the jQuery MODS editor in the coming weeks.
- We created a review and publication tool to facilitate the review of user-initiated deposit by collection managers.
- We built a set of status monitors to allow collection managers to track the progress of large batch ingest processes, from initial acceptance through technical metadata extraction and indexing.
- The new administrative interface takes advantage of our Solr index, enabling more efficient searching and navigation for administrative users.
Solr index improvements
- We updated our Solr schema to support future interface enhancements, including scoped views into the CDR, and custom data elements for specific collections.
- We modified our Solr schema to support rollup for aggregate objects.
- Our Solr ingest pipeline is more efficient and modular. It now offers support for partial index updates, when only some of the information about an object must change.
- The index has enhanced support for access control. It is aware of the current user’s authorizations and displays only what the user is authorized to see.
- We have refined our reindexing process and are now able to complete reindexes with very little downtime.
- Upgraded to Fedora 3.6.2
- Upgraded to Solr 4
- iRODS now uses PAM authentication to interact with UNC’s LDAP service.
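As an illustration of the partial index updates mentioned above, here is a minimal sketch that assumes Solr 4's atomic update syntax, in which each changed field is wrapped in a `set` modifier. The field names and the identifier are hypothetical, and this is not a description of how the CDR indexing pipeline is actually written.

```python
import json


def partial_update(doc_id, changes):
    """Build a Solr atomic-update document that touches only `changes`.

    Each changed field is wrapped in a {"set": value} modifier, so Solr
    replaces that field and keeps the rest of the stored document.
    """
    doc = {"id": doc_id}
    for field, value in changes.items():
        doc[field] = {"set": value}
    return doc


# Hypothetical field names; the CDR schema may differ.
payload = json.dumps([
    partial_update("uuid:1234", {"status": "Published", "readGroup": ["public"]})
])
print(payload)
```

A payload like this would be sent as JSON to the index's update handler. The appeal is that a publication or access control change does not require rebuilding the whole record from the repository.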
General improvements to code base
- We created code libraries for common components to support reuse of software components between the public and administrative applications.
I’d like to extend a special thanks to those who helped us diagnose and resolve several issues on Mac OS X. This point release addresses the problems we had launching the software via the “workbench.app” file. It will no longer be necessary to use the separate startup script. It also addresses a problem related to File Dialog windows. In certain cases these interactions with the Mac file picker resulted in an immediate software crash.
As it turns out, both issues were actually fixed upstream, in the Eclipse project. I was able to pull updated versions of the Eclipse plugins and rebuild the workbench with the fixes. At this point I’ll tip my hat to the Eclipse Project folks. It is wonderful to be part of such a large and vibrant developer community.
These fixes are in the current stable release, available at the usual download site:
Articles by UNC researchers that are available in BioMed Central are now available in the Carolina Digital Repository. The collection contains nearly 700 articles, and will grow over time.
UNC Libraries worked with BioMed Central to automate the deposit into the CDR of BioMed Central articles written by UNC researchers. The transfer is made possible through the use of the Simple Web-Service Offering Repository Deposit (SWORD) protocol. Once established, the SWORD implementation allows the deposit to take place without intervention on the part of the researcher, the Library, or the publisher.
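For readers unfamiliar with SWORD, the sketch below shows roughly what a deposit request looks like. It assumes the SWORD 2.0 profile and its SimpleZip packaging identifier; the collection URL, file name, and function name are hypothetical, and the actual BioMed Central transfer may use a different SWORD version or packaging.

```python
def sword_deposit_request(collection_url, filename, payload,
                          packaging="http://purl.org/net/sword/package/SimpleZip"):
    """Describe a SWORD 2.0 binary deposit as (method, url, headers, body).

    Nothing is sent over the network; the tuple can be handed to any
    HTTP client.
    """
    headers = {
        "Content-Type": "application/zip",
        "Content-Disposition": f"attachment; filename={filename}",
        "Packaging": packaging,
        # "false" tells the server the deposit is complete.
        "In-Progress": "false",
    }
    return "POST", collection_url, headers, payload


# Hypothetical collection endpoint and package contents.
method, url, headers, body = sword_deposit_request(
    "https://repository.example.edu/sword/collection/articles",
    "article.zip",
    b"zip bytes go here",
)
```

Because the whole exchange is a single HTTP request, a publisher can schedule it on their side without anyone at the receiving repository taking action.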
BioMed Central is a publisher of over 200 open access journals in Science, Technology, and Medicine.
We recently released an updated version of the Curator’s Workbench with a number of significant additions to functionality (available for download here). The new version includes the ability to reuse crosswalks, create data dictionaries, and create mapped metadata ingest forms, among other things. Below is a summary of the major changes.
In previous releases each crosswalk was a separate effort, involving deep knowledge of the MODS schema and the user supplied metadata. Now crosswalks can be copied between projects and used again. They can also take advantage of common MODS mappings from a shared data dictionary. This means that not everyone creating crosswalks needs to be a MODS expert. It also makes building crosswalks far less time consuming.
Dictionaries streamline the process of mapping custom metadata to objects in crosswalks and forms. Dictionaries can conveniently package up the most common mappings and patterns for MODS elements for a set of users, allowing users and groups to share and reuse those standard patterns without having to build complex crosswalks for each project. Dictionaries can be designed for blocks of metadata and for the crosswalk connections that are used to create them. Dictionaries include labels and descriptive text to guide their use. They can be stored on network drives and shared by teams.
Deposit Form Designer
The forms feature allows the creation of web deposit forms suitable for a particular content stream. The forms use dictionary and crosswalk mapping components to map the input fields to the MODS schema or dictionary elements. Form designs also include explanatory text and designation of required fields. The forms work in tandem with a server-side form-hosting application, which can be configured to put uploads and MODS records into a folder or to deposit them into the repository via SWORD. The forms feature simplifies the creation of deposit forms, shifting form design from software developers to curators, who have greater familiarity with both the depositor community and with descriptive standards.
Originals and Drives
We’ve rebuilt the originals part of the workbench to better track drives and maintain the connection between groups of originals and original media. This means we have a clearer interface and better support for removable drives.
When you highlight an image file, either in the originals tree or the arrangement, you can see a preview of the image in the workbench. This saves a lot of time for those working with image collections. The preview window can be arranged and sized according to your workflow needs.
We added export support for a variety of identified workflow needs. A BagIt ZIP export function delivers project-based packages that meet this widely used standard. The project arrangement can now also be exported as a comma-separated file. This was implemented to support a local EAD authoring scenario, but may be useful to others. You can also now export entire projects to a ZIP file or a shared drive. This was done to support multi-user workflows and projects with a significant wait between the capture step and arrangement or description. Individual files, such as crosswalks and data dictionaries, can also be exported and imported across projects and workbench installations.
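For those who have not worked with BagIt, the sketch below shows the basic shape of a zipped bag: a `bagit.txt` declaration, a `data/` directory holding the payload, and a checksum manifest. It is an illustration written in Python, not the workbench's own export code, and the function name, the `bag_name` default, and the choice of MD5 manifests are assumptions.

```python
import hashlib
import zipfile
from pathlib import Path


def export_bag(source_dir, zip_path, bag_name="bag"):
    """Write the files under source_dir into a zipped BagIt bag.

    Files are read fully into memory, which is fine for a sketch but not
    for very large payloads. zip_path should be outside source_dir.
    """
    source = Path(source_dir)
    manifest_lines = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(p for p in source.rglob("*") if p.is_file()):
            relative = path.relative_to(source).as_posix()
            content = path.read_bytes()
            # Payload files live under data/ inside the bag.
            archive.writestr(f"{bag_name}/data/{relative}", content)
            checksum = hashlib.md5(content).hexdigest()
            manifest_lines.append(f"{checksum}  data/{relative}")
        # Bag declaration required by the BagIt specification.
        archive.writestr(
            f"{bag_name}/bagit.txt",
            "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n",
        )
        # One line per payload file: checksum, then the path within the bag.
        archive.writestr(
            f"{bag_name}/manifest-md5.txt",
            "\n".join(manifest_lines) + "\n",
        )
    return zip_path
```

The manifest is what makes the package useful for hand-offs: the recipient can recompute each checksum and confirm that nothing changed in transit.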
The CDR development team has been busy this spring and summer building enhancements to both back-end and public-facing components of the repository. The result is a much stronger CDR this fall. Here’s a list of the major new features we’ve released since June.
- The repository’s user interface has been completely redesigned to be more visually appealing and to make it easier for users to preview and download the content they find
- Integrated dynamic content on the CDR home page, such as featured collections, news items from the blog, and a feed of the most recently added materials
- Enabled the display of thumbnail images throughout the site and introduced JPEG 2000 previews on full record pages
- Implemented inline viewer for JPEG 2000 images using the Djatoka image server
- Implemented inline .mp3 and .mp4 playback using incremental download
- All available descriptive information now displayed on an object’s full record page
- Enabled search engine crawling, so CDR materials are available through web search engines
- Improved local indexing returns more relevant search results
- Created advanced search interface
- Enabled faceted browsing of repository materials. Users can browse by repository or collection structure, or limit results by collection, academic department, format, language, or subject
- Group-based access control applied at the object level
- Integrated with public user interface to display only content that is available to a given user
- Support for embargoes at the object level
- Uses Fedora’s internal access controls
- Integrates with campus Shibboleth and Grouper services
- Support for access control in metadata, including embargoes
- Improved schema-based METS and MODS support
- Includes support for multiple networked staging areas, including iRODS and the Library’s digital archive, reducing the amount of time spent moving large files over the network for ingest staging
- Derivative image generation and processing on ingest
- Technical metadata extraction using FITS
- Indexing of new or modified metadata is faster
- Mid-tier processing “catch-up” services are now on a scheduler that runs nightly instead of having to be manually invoked
Solr index redesign
- Index schema completely re-designed to drive enhanced access through the new UI
- Solr ingest is multi-threaded and several orders of magnitude faster than previous implementation.
- Virtually all reindexes happen without system downtime
- More data is indexed, improving retrieval
- Search algorithm is improved, resulting in fewer false hits in search results
- Consolidated multiple build files into one master build file. Reorganized code to streamline code integration process.
- Audited code for sensitive information and placed configuration files in a private repository to prepare for release of CDR on GitHub.
- Implemented Jenkins for nightly integration of code updates into the repository. This reduces the problems that can occur when multiple developers are working on divergent local code bases.
- CDR code released on GitHub in September 2011.
Well, I have finally resolved that last tricky bug in the new version and posted the new ZIP files for download. You can get your brand new 1.2 workbench software here:
A complete list of all the enhancements and fixes is on the GitHub site under the Release 1.2 Milestone. Please finish all current projects *before* you upgrade your workbench. Not all the features added in this release are backward compatible.
This release adds a great deal of flexibility to the crosswalk editor, but at the cost of making it somewhat more complex. The crosswalk editor now requires more familiarity with the output schema, i.e. MODS. The next release will attempt to address this complexity in the interface by providing reusable output templates of some sort.
In crosswalks the output elements now map directly to XML elements and attributes. You can arrange XML elements within each other to any depth you like. This allows you to use compound MODS elements, such as titleInfo and subject, in exactly the way you wish. The editor will prompt you with element and attribute choices, based on the schema and the context element (drop target). This means that when you drop a new XML element inside of a titleInfo, you will be asked if you want the title, subTitle, partName, partNumber, or nonSort sub-element. The same schema-driven behavior applies to XML attributes.
- Set default values for crosswalk XML elements and attributes
- Date makers now support multiple date formats, using the first one to match.
- ISO 8601 date outputs retain only the precision of the format that is recognized.
- Staging is faster and has better progress bars.
- Various interface improvements
- Support for adding access controls within the arrangement
- Support for linking folders and collections to surrogate images
- Staging can be disabled and re-enabled on any project
- More standard save and delete behavior
- Auto-detection of character encoding for delimited files (in crosswalks)
- Open, close and delete projects
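The date handling in the list above (several input formats, the first match wins, and the output keeps only the precision that was recognized) can be sketched in a few lines of Python. This is an illustration of the behavior rather than the workbench's code; the format list and the function name are assumptions.

```python
from datetime import datetime

# Each input pattern is paired with the ISO 8601 pattern of matching
# precision. The patterns listed here are illustrative assumptions.
DATE_FORMATS = [
    ("%m/%d/%Y", "%Y-%m-%d"),  # full date
    ("%m/%Y", "%Y-%m"),        # month and year only
    ("%Y", "%Y"),              # year only
]


def to_iso8601(text, formats=DATE_FORMATS):
    """Return an ISO 8601 date using the first format that matches, else None."""
    for input_pattern, output_pattern in formats:
        try:
            parsed = datetime.strptime(text.strip(), input_pattern)
        except ValueError:
            continue  # this pattern did not match; try the next one
        return parsed.strftime(output_pattern)
    return None


print(to_iso8601("03/15/2011"))  # 2011-03-15
print(to_iso8601("03/2011"))     # 2011-03
print(to_iso8601("1999"))        # 1999
```

Pairing each input pattern with its own output pattern is what keeps a year-only value from being padded out to a false January 1 date.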