[We’re currently preparing a Technical Whitepaper on our new DC-X product, and will write parts of it on this blog. Feedback is welcome. Please note: While we wouldn’t usually edit regular blog posts, we’ll probably modify the parts of this series regularly to mirror the evolving whitepaper.]

This chapter summarizes the most important features and the technical foundation of DC-X:

Digital Collections DC-X is a Digital Asset Management system that can be used to store just one department’s image, PDF and video files, but is also designed to be the central content repository for the whole company, holding millions of files and text documents.

A powerful role-based permission system allows multiple user groups to share or exclusively access content within a single installation. Licensing information can be attached to any asset so that users are aware of usage restrictions for rights-managed content. Publication data can be stored in DC-X, making it the central reference point for published content.

Users can keep track of and structure content through private or group-level tagging, and can comment on any asset. Stored searches can be run automatically by a “search agent”, which notifies the user when new content arrives. DC-X also displays related content automatically, which makes discovery easier.
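To illustrate the search agent idea, here is a minimal sketch in Python. It is not DC-X code: the `SearchAgent` class, its `poll` method and the `run_search` callback are hypothetical names chosen for this example. The sketch only shows the principle of re-running a stored search and reporting hits that were not seen before.

```python
from typing import Callable, Iterable, List


class SearchAgent:
    """Re-runs one stored search and reports hits it has not seen yet."""

    def __init__(self, query: str, run_search: Callable[[str], Iterable[str]]):
        self.query = query
        self._run_search = run_search  # hypothetical search backend
        self._seen = set()

    def poll(self) -> List[str]:
        # Run the stored search and keep only asset IDs that are new.
        current = list(self._run_search(self.query))
        new_ids = [asset_id for asset_id in current if asset_id not in self._seen]
        self._seen.update(new_ids)
        return new_ids
```

On the first call, `poll()` treats every hit as new; later calls return only assets that arrived in the meantime. A real agent would run on a schedule and send the notification instead of returning a list.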

The theme planning module supports planning content across publications and publication channels (cross-media), assigning and monitoring tasks, and selecting matching assets.

DC-X is backed by a database (Oracle or MySQL), which holds all data (textual, metadata, administrative data) except for the actual asset files – images, previews, videos and other files are stored in the filesystem.

Asset text and metadata (IPTC, EXIF, XMP etc.) are read during the import process (ingestion) and copied into the database record representing the asset. The original asset file always remains unchanged in the filesystem; edits within DC-X are made on the database record. During export, the modified metadata can be written into the exported copy of the file.
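The ingest, edit and export model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how DC-X is implemented: `AssetRecord`, `ingest`, `edit`, `export` and `demo` are hypothetical names, the "database record" is a plain in-memory object, and the export writes the metadata to a sidecar text file as a stand-in for embedding it into the exported copy.

```python
import shutil
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class AssetRecord:
    """Stand-in for the database record that represents one asset."""
    source_path: Path
    metadata: dict = field(default_factory=dict)


def ingest(path, embedded_metadata):
    # Copy the embedded metadata into the record; the file is only read.
    return AssetRecord(source_path=Path(path), metadata=dict(embedded_metadata))


def edit(record, key, value):
    # Edits change the record only, never the stored file.
    record.metadata[key] = value


def export(record, target_dir):
    # Export works on a copy; the edited metadata travels with that copy.
    target = Path(target_dir) / record.source_path.name
    shutil.copy2(record.source_path, target)
    sidecar = target.with_name(target.name + ".meta.txt")  # simplification
    lines = [f"{key}={value}" for key, value in sorted(record.metadata.items())]
    sidecar.write_text("\n".join(lines), encoding="utf-8")
    return target


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        store, out = Path(tmp) / "store", Path(tmp) / "export"
        store.mkdir()
        out.mkdir()
        original = store / "photo.jpg"
        original.write_bytes(b"fake image bytes")
        record = ingest(original, {"caption": "old caption"})
        edit(record, "caption", "new caption")
        exported = export(record, out)
        sidecar = exported.with_name(exported.name + ".meta.txt")
        return original.read_bytes(), exported.read_bytes(), sidecar.read_text(encoding="utf-8")
```

The point of the separation is that the stored original is never rewritten: after the edit, the file in the store still has its original bytes, and only the exported copy carries the new caption.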

DC-X is not limited to handling file-based assets; text records (like blog postings, newspaper articles or news agency texts) are also supported. Text can be formatted (the internal storage format is XHTML), and multi-lingual text and metadata are supported. All textual content is stored as Unicode (in UTF-8 encoding). Any kind of file can be imported into DC-X. For many file formats, it can automatically extract text and metadata and render preview images.

The user interface is web-based and requires a current version of Mozilla Firefox, Microsoft Internet Explorer or Apple Safari on Windows, Mac OS X or Linux. The server side is implemented using PHP and the Apache web server, running on Linux (Windows servers are not supported). Searches within DC-X are performed by Solr (based on Lucene) with a simple, Google-like query syntax. User accounts and user group definitions are not held within DC-X; they’re read from an LDAP-compliant directory service such as Microsoft Active Directory or OpenLDAP. (We will install OpenLDAP if there’s no existing directory service.)
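For readers who haven’t worked with Solr: a search is an HTTP request to Solr’s select handler, with the query in the `q` parameter. The Python sketch below builds such a URL. The host, port and the core name `assets` are assumptions made up for this example, as is the `build_solr_select_url` helper; how DC-X translates user queries before handing them to Solr is not shown here.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, user_query, rows=10):
    # q: the query text, rows: page size, wt: response format.
    params = {"q": user_query, "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical Solr location; adjust to the actual installation.
url = build_solr_select_url("http://localhost:8983/solr/assets", "harbour dawn")
```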

A small but capable workflow engine is built into DC-X. It allows for mixed human and automated workflows (like routing an import error onto an administrator’s to-do list), parallel processing of import and export jobs that scales easily, and flexible configuration and tracking of workflows.
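The combination of parallel jobs and a human fallback can be sketched as follows. This is a toy illustration in Python, not the DC-X workflow engine: `import_asset`, `run_import_jobs` and the rule that files ending in `.broken` fail are all invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def import_asset(name):
    # Hypothetical import step; names ending in ".broken" simulate a failure.
    if name.endswith(".broken"):
        raise ValueError(f"cannot parse {name}")
    return f"imported {name}"


def run_import_jobs(names, max_workers=4):
    done, admin_todo = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit all jobs first so they can run in parallel.
        futures = [(name, pool.submit(import_asset, name)) for name in names]
        for name, future in futures:
            try:
                done.append(future.result())
            except ValueError as exc:
                # Automated step failed: hand the job over to a human.
                admin_todo.append(f"{name}: {exc}")
    return done, admin_todo
```

Calling `run_import_jobs(["a.jpg", "b.broken", "c.pdf"])` imports the two good files and puts one entry for `b.broken` on the to-do list. Raising `max_workers` is the simplest form of scaling in this sketch.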

DC-X is made to operate 24/7 in production environments. Nagios monitoring is supported out of the box. DC-X is a very open environment, based on standards (XML, XSLT, Unicode, LDAP), providing Atom feeds (similar to RSS feeds), a full web service API (based on the Atom Publishing Protocol) and powerful Unix command line tools. Custom fields, forms and workflows are configurable, and DC-X is architected to be extensible: it will be possible to program plug-ins that provide additional functionality or automate common operations.
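Because Atom is plain XML, a feed can be consumed with standard tools. The Python sketch below reads entry IDs and titles from an Atom document using only the standard library. The sample feed is invented for this example and does not show the actual structure or URLs of DC-X feeds; `list_entries` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample data; a real client would fetch the feed over HTTP.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example asset feed</title>
  <entry>
    <id>urn:example:asset:1</id>
    <title>Harbour at dawn</title>
  </entry>
  <entry>
    <id>urn:example:asset:2</id>
    <title>Council meeting</title>
  </entry>
</feed>"""


def list_entries(feed_xml):
    # Return (id, title) pairs for every Atom entry in the feed.
    root = ET.fromstring(feed_xml)
    return [
        (
            entry.findtext("atom:id", namespaces=ATOM_NS),
            entry.findtext("atom:title", namespaces=ATOM_NS),
        )
        for entry in root.findall("atom:entry", ATOM_NS)
    ]
```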

Integration of DC-X with other software (editorial systems, web content management systems, Adobe InDesign) is already available or being worked on.

Differences compared to DC5 [this section is provided for those who know our Digital Collections DC5 asset management product]: The permission system is much more powerful than in DC5 (details follow in a dedicated post). Publication data and licensing information handling are now available out of the box. Collections have been replaced by tagging. Comments, search agents and related content display are new, as is the theme planning module. The PostgreSQL database is not supported; MySQL can be used instead if you’re looking for a less expensive solution. Support for formatted and multi-lingual text has been added. Servers must run Linux (Solaris support will probably be added). We’ve moved from Oracle Text to Solr for search. LDAP is now always required. A workflow engine has been embedded. The new web service API is built on AtomPub (but most of DC5’s REST API will be available for legacy integration as well).

Tim Strehle
About Tim Strehle

Tim was part of Digital Collections' Research & Development team from 1999 to 2017. He is an expert in metadata and thesauri.

