C-Span Video Archives

16 03 2010

Researchers, political satirists and partisan mudslingers, take note: C-Span has uploaded virtually every minute of its video archives to the Internet. The archives, at C-SpanVideo.org, cover 23 years of history and five presidential administrations and are sure to provide new fodder for pundits and politicians alike. The network will formally announce the completion of the C-Span Video Library on Wednesday. Read more from the NYTimes.


Installing EPrints on Windows

12 03 2010

This is an updated manual for installing EPrints on your Windows system. The current manuals seem to be lacking in detail, so here is a consolidation of instructions that have worked. Total install time should be around 30-45 minutes, depending on your technical experience. So if you ever wanted to play with a digital repository system – have fun.

Required Software

Apache 2.2.15-win32

ActivePerl 5.10.1 1007-MSWIN32-x86-291969 http://downloads.activestate.com/ActivePerl/releases/

MySQL 5.1.44-win32 http://dev.mysql.com/downloads/mysql/

EPrints v3.2.0 Windows Installer http://files.eprints.org/494/1/eprints-3.2.0.tar.gz

Optional Software

GhostScript 8.60 http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/gs861/gs861w32.exe

Catdoc 0.94.2 http://hpux.connect.org.uk/hppd/hpux/Text/catdoc-0.94.2/

ImageMagick 6.3.5-6 http://linux.wareseeker.com/Multimedia/imagemagick-6.3.5-6.zip/321889

Install Apache

Run the Apache .msi file that you downloaded. The .msi is a self-installer and will guide you through the process. Install Apache on port 80 as a service for all users. When prompted, enter the server name (localhost), the domain name (localhost) and an administrator email address (any address will do). Apache installs in the C:\Program Files\Apache Software Foundation\Apache2.2 directory by default. Change the directory to C:\EPrints\Apache2.

After installation, Apache starts automatically. The Apache Monitor icon in the system tray shows whether Apache has started. Note that the icon can also indicate that the Apache Monitor is running but Apache itself is not.

Install ActivePerl

Run the ActivePerl.msi. Install into the C:\EPrints\Perl directory. When the installation of ActivePerl is complete, you will need to install 2 additional ppd components (DBD-mysql.ppd and mod_perl.ppd) from the command line. Open a command prompt (Command line 101: Start Menu – Run – type “cmd”) and enter:

ppm install http://capn.uwinnipeg.ca/PPMPackages/10xx/DBD-mysql.ppd
ppm install http://capn.uwinnipeg.ca/PPMPackages/10xx/mod_perl.ppd

The mod_perl installer will prompt you for the Apache module path. Enter:


You will now need to add mod_perl support to Apache. Locate and edit the Apache configuration file, C:\EPrints\Apache2\conf\httpd.conf. Open the file in a text editor and add the following lines:

LoadFile "C:/EPrints/Perl/bin/perl510.dll"
LoadModule perl_module modules/mod_perl.so

Configuring Apache and Perl

Configuring Apache and Perl requires you to set environment variables so EPrints can find Perl and its libraries. To set environment variables, use Control Panel – System – Advanced System Settings – Advanced – Environment Variables…

Locate the Path variable and edit it. Make sure both C:\EPrints\Perl\bin and C:\EPrints\Apache2\bin are included in the Path variable. Use a semicolon (;) to separate the entries.

Create a new variable PERL5LIB, with the value C:/EPrints/EPrints/perl_lib (note the forward slashes).
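If you want to double-check the environment variables before moving on, here is a minimal Python sketch. It is not part of EPrints; the function name check_environment is my own, and it assumes the install directories used in this guide (C:\EPrints\Perl and C:\EPrints\Apache2).

```python
import os

# Directories this guide expects on the Path (assumed install locations).
REQUIRED_PATH_DIRS = [r"C:\EPrints\Perl\bin", r"C:\EPrints\Apache2\bin"]
EXPECTED_PERL5LIB = "C:/EPrints/EPrints/perl_lib"


def _norm(path):
    """Normalise a Windows path so that slashes and case do not matter."""
    return path.strip().strip('"').replace("/", "\\").rstrip("\\").lower()


def check_environment(env):
    """Return a list of problems found in an environment mapping."""
    problems = []
    raw_path = env.get("Path", env.get("PATH", ""))
    entries = {_norm(e) for e in raw_path.split(";") if e.strip()}
    for required in REQUIRED_PATH_DIRS:
        if _norm(required) not in entries:
            problems.append(f"Path is missing {required}")
    perl5lib = env.get("PERL5LIB")
    if perl5lib is None:
        problems.append("PERL5LIB is not set")
    elif _norm(perl5lib) != _norm(EXPECTED_PERL5LIB):
        problems.append(f"PERL5LIB is {perl5lib}, expected {EXPECTED_PERL5LIB}")
    return problems


if __name__ == "__main__":
    # Open a NEW command prompt first so the updated variables are visible.
    for message in check_environment(os.environ) or ["Environment looks fine"]:
        print(message)
```

An empty list means both directories are on the Path and PERL5LIB has the expected value.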

Install MySQL

Now run the MySQL installer and choose a Custom installation in the directory C:\EPrints\MySQL. You will need to set the following options:

Install the server and client programs. The C include and library files are not needed. Skip the registration.

Configure MySQL

When the installation of MySQL completes, you will be prompted to configure the server. The configuration is simple and straightforward. You should accept most of the default settings.

When MySQL configuration has finished, you will need to set an option manually in MySQL’s configuration file by editing C:\EPrints\MySQL\my.ini in a text editor.

Remove the option NO_AUTO_CREATE_USER from the my.ini file.

Now restart MySQL so the change takes effect: go to Control Panel – Administrative Tools – Services, select MySQL and choose Restart.
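If you would rather script the my.ini edit than do it by hand, here is a rough Python sketch. It assumes the option appears as one entry in a comma-separated sql-mode line, which is what I saw in my own my.ini; yours may differ, so check the file first and keep a backup. The function name remove_sql_mode is made up for this example.

```python
import re

# Matches a line such as: sql-mode="A,B,C" (quotes optional).
SQL_MODE_LINE = re.compile(r"""^(\s*sql[-_]mode\s*=\s*)(["']?)(.*?)\2\s*$""",
                           re.IGNORECASE)


def remove_sql_mode(ini_text, mode="NO_AUTO_CREATE_USER"):
    """Return ini_text with `mode` removed from any sql-mode line."""
    result = []
    for line in ini_text.splitlines():
        match = SQL_MODE_LINE.match(line)
        if match:
            prefix, quote, value = match.groups()
            modes = [m.strip() for m in value.split(",") if m.strip()]
            kept = [m for m in modes if m.upper() != mode.upper()]
            line = f"{prefix}{quote}{','.join(kept)}{quote}"
        result.append(line)
    text = "\n".join(result)
    # Preserve a trailing newline if the original had one.
    return text + "\n" if ini_text.endswith("\n") else text


if __name__ == "__main__":
    sample = 'sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER"\n'
    print(remove_sql_mode(sample), end="")
```

The function only transforms text; reading and writing C:\EPrints\MySQL\my.ini is left to you so that nothing is overwritten by accident.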

Install optional components

Install GhostScript, ImageMagick, and catdoc. These tools are not essential to EPrints, but provide extra functionality.

Run the GhostScript executable and install in C:\EPrints\GhostScript.

Catdoc is a zip file. Unzip the file and place the contents into the EPrints directory. The file path should be C:\EPrints\catdoc-0.94.2.

Run the ImageMagick executable and install in C:\EPrints\ImageMagick. Select the options “Update executable search path” and “Install PerlMagick for ActiveState Perl”. Other options can be deselected.

Install EPrints 3

Run the EPrints installer. This will install files into C:\EPrints\EPrints.

When the installer has finished copying files, it will prompt you for server SMTP information.

Configure EPrints 3

First open a command prompt and change directory to C:\EPrints\EPrints. Now you can run epadmin to configure the archive.

cd \EPrints\EPrints

To start the EPrints creation process, run:

perl bin/epadmin create

Note: Whenever you need to run an EPrints command line tool, it must be prefixed with perl.

Run epadmin and fill out the prompts. You will get the following prompts (note that when you see something in [square brackets], it’s the default value and can be selected by simply hitting Enter).

Archive ID – the system name for your archive. Once entered, an archive/<archive_id> directory will be created where the configuration files will be copied.

Configure vital settings – Hit enter to say ‘yes’. This will lead to more prompting about core settings:

Hostname – Since I am testing EPrints on my laptop, I chose to run it locally, so my hostname is my computer’s default IP address. If you are pointing to a live web server, make sure your IT department can set up the DNS entry.

Webserver Port – Which port do you want to serve the archive on? The default is 80, so unless you can think of a good reason not to, just hit Enter to accept the default.

Alias – I created no aliases. You can enter any number of aliases that will take users to this archive. Enter a ‘#’ when you don’t want to enter any more. You could have your archive served on eprints.myorganisation.org and eprints.myorg.org. As with the Hostname, your systems team need to be informed about these aliases too.

Administrator Email – Enter the email address of the repository administrator.

Archive Name – The full name of your archive. By default, this will be used on the header of the webpage and in the title bar of the browser.

Write these core settings – Enter ‘yes’.

Configure database – Enter ‘yes’.

Database Name – epadmin will create the database for you. By default, epadmin uses your Archive ID for database name.

MySQL Host – The address of the server that the database is running on. If the database is on the same machine as the EPrints installation, enter ‘localhost’.

MySQL Port – You probably don’t need to enter a value.

MySQL Socket – As with MySQL Port, it’s unlikely that you need to enter anything.

Database User – The username with which to log into the MySQL database. You don’t need to create this user; epadmin will do it for you. If you enter a MySQL username that already exists, it will be overwritten by epadmin.

Database Password – The password for the Database User.

Write these database settings – Choose ‘yes’.

Create database <Database Name> – Choose ‘yes’, and epadmin can create the database.

MySQL Root Password – To create the database and the user, epadmin needs the MySQL Root Password.

Create database tables – say yes to have epadmin create all the database tables.

Create an initial user – Choose ‘yes’.

Enter a username – The username you will use to log into EPrints in your browser. Epadmin defaults to admin.

Select a user type (user|editor|admin) – There are three levels of user in EPrints. You probably want to be an administrator, so enter ‘admin’.

Enter Password – Enter a password

Email – Enter your email address.

Important: Although you are prompted to build the static web pages, import the LOC subject headings and update the Apache config files, epadmin will FAIL to run these steps. Look above the message “That seemed to more or less work…” and you will see the error messages “…not recognized as an internal or external command…”.

You must run generate_static *Archive ID*, import_subjects *Archive ID*, and generate_apacheconf manually from the command prompt according to the standard instructions. *Archive ID* should match the Archive ID you entered when you ran epadmin.

perl bin/generate_static *Archive ID*
perl bin/import_subjects *Archive ID*
perl bin/generate_apacheconf
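If you expect to repeat these three commands (for example, when creating more than one test archive), they can be wrapped in a small script. This is only a sketch: the function names are my own, "myarchive" is a placeholder ID, and the default working directory assumes EPrints lives in C:\EPrints\EPrints as in this guide.

```python
import subprocess


def post_create_commands(archive_id):
    """Build the three commands that epadmin fails to run on Windows."""
    if not archive_id or any(ch.isspace() for ch in archive_id):
        raise ValueError("archive_id must be non-empty and contain no spaces")
    return [
        ["perl", "bin/generate_static", archive_id],
        ["perl", "bin/import_subjects", archive_id],
        ["perl", "bin/generate_apacheconf"],
    ]


def run_post_create(archive_id, eprints_root=r"C:\EPrints\EPrints"):
    """Run each command from the EPrints directory, stopping on failure."""
    for command in post_create_commands(archive_id):
        print("Running:", " ".join(command))
        subprocess.run(command, cwd=eprints_root, check=True)


if __name__ == "__main__":
    # "myarchive" is a placeholder; use the ID you gave to epadmin.
    for command in post_create_commands("myarchive"):
        print(" ".join(command))
```

Running the file as-is only prints the commands; call run_post_create with your own ID to execute them.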

Finally you need to add the EPrints configuration file to Apache. Edit C:\EPrints\Apache2\conf\httpd.conf and add at the bottom of the file:

PerlPassEnv PERL5LIB
Include C:/EPrints/EPrints/cfg/apache.conf
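It is easy to add these two lines twice if you rerun the install. A minimal sketch of an idempotent edit, assuming the exact directive text shown above (the function name add_eprints_directives is hypothetical):

```python
# The two lines this guide adds at the bottom of httpd.conf.
EPRINTS_DIRECTIVES = [
    "PerlPassEnv PERL5LIB",
    "Include C:/EPrints/EPrints/cfg/apache.conf",
]


def add_eprints_directives(conf_text):
    """Append any missing EPrints directive to the end of httpd.conf text."""
    existing = {line.strip() for line in conf_text.splitlines()}
    missing = [d for d in EPRINTS_DIRECTIVES if d not in existing]
    if not missing:
        return conf_text  # nothing to do, so a second run changes nothing
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join(missing) + "\n"


if __name__ == "__main__":
    print(add_eprints_directives("Listen 80\n"), end="")
```

Like the manual edit, this only covers the text; you still need to save the result to C:\EPrints\Apache2\conf\httpd.conf yourself.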

Starting Apache

Control Apache from the Services panel. Stop and start the service before testing, to reload the configuration file.


EPrints should now be accessible from your browser at the hostname you specified in epadmin (localhost, if you are running it locally).
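For a quick check without opening a browser, something like the following Python sketch should do. The helper names are my own, and the hostname and port are assumptions; substitute whatever you entered in epadmin.

```python
import urllib.error
import urllib.request


def site_url(hostname="localhost", port=80):
    """Build the archive URL from the hostname and port given to epadmin."""
    if port == 80:
        return f"http://{hostname}/"
    return f"http://{hostname}:{port}/"


def is_reachable(url, timeout=5):
    """Return True if the server answers the request without an error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        return False


if __name__ == "__main__":
    url = site_url()
    state = "reachable" if is_reachable(url) else "NOT reachable"
    print(url, "is", state)
```

A "NOT reachable" result usually just means the Apache service has not been restarted yet.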

Digitizing Dissertations for an Institutional Repository: A Process and Cost Analysis

10 03 2010

A very nice article published in 2008 about digitization, workflow, policy development, cost analysis, and end user access by Mary Piorun, Associate Director for Technology Initiatives and Resource Management; email: mary.piorun@umassmed.edu.


This paper describes the Lamar Soutter Library’s process and costs associated with digitizing 300 doctoral dissertations for a newly implemented institutional repository at the University of Massachusetts Medical School.


Project tasks included identifying metadata elements, obtaining and tracking permissions, converting the dissertations to an electronic format, and coordinating workflow between library departments. Each dissertation was scanned, reviewed for quality control, enhanced with a table of contents, processed through an optical character recognition function, and added to the institutional repository.


Three hundred and twenty dissertations were digitized and added to the repository for a cost of $23,562, or $0.28 per page. Seventy-four percent of the authors who were contacted (n=282) granted permission to digitize their dissertations. Processing time per title was 170 minutes, for a total processing time of 906 hours. In the first 17 months, full-text dissertations in the collection were downloaded 17,555 times.


Locally digitizing dissertations or other scholarly works for inclusion in institutional repositories can be cost effective, especially if small, defined projects are chosen. A successful project serves as an excellent recruitment strategy for the institutional repository and helps libraries build new relationships. Challenges include workflow, cost, policy development, and copyright permissions.
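The figures in the abstract can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above; the page count and the cost per dissertation are my own derived values, not figures reported by the authors.

```python
# Figures quoted in the abstract.
dissertations = 320
total_cost = 23562.00        # dollars
cost_per_page = 0.28         # dollars
minutes_per_title = 170

# Derived values (not stated in the abstract).
total_hours = dissertations * minutes_per_title / 60
cost_per_title = total_cost / dissertations
implied_pages = round(total_cost / cost_per_page)
pages_per_title = implied_pages / dissertations

print(f"Total processing time: {total_hours:.1f} hours")
print(f"Cost per dissertation: ${cost_per_title:.2f}")
print(f"Implied page count: {implied_pages}")
print(f"Implied pages per dissertation: {pages_per_title:.0f}")
```

The processing time works out to just under 907 hours, which is consistent with the 906 hours reported.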


All About Repositories Webinar

10 03 2010

Stuart Lewis, the DSpace 1.6 community release manager, will offer an overview of the latest release. This free DuraSpace/SPARC “All About Repositories” web seminar will highlight new Fedora 3.3 and DSpace 1.6 features on March 17, 2010 at 2:00 p.m. ET. You may register for the web seminar here: http://www.arl.org/sparc/meetings/event_registration.shtml.

Digital Repository Management Uncovered

4 03 2010

Digital Repository Management Uncovered is a WEBWISE 2010 preconference presentation by Jessica Branco Colati and Sarah Shreeves. Colati and Shreeves provide a great primer for understanding digital repositories (DRs). They discuss the components of a DR management framework, including the key areas, functions, and policies that provide for the drive and sustainability of DRs. The 6 key components of DRs are (1) Hardware, (2) Software, (3) Content, (4) Relationships, (5) Controls and (6) Trust. The abstract of the presentation reads:

“More and more libraries are establishing repository manager positions – either full time or as a piece of another position, but because of the newness of this area, the responsibilities of a repository manager are sometimes not well defined. This session will give an overview of the major areas of repository management institutions should be aware of and offer strategies and tools for participants. This session is platform agnostic and focuses on issues around preservation policies and activities, access and dissemination, and intellectual property of repository management, as well supporting sustainability and growth. The session will be useful whether or not your repository is in-house or hosted elsewhere.”

JISC Digital Repositories InfoKit covers the same ground as Colati & Shreeves’ presentation. It also contains information on a broad range of topics running from the initial idea of a digital repository and the planning process to the maintenance and ongoing management of the repository. The main focus is on institutional repositories.

Thanks to IDEALS for providing access to the presentation. IDEALS collects, disseminates, and provides persistent and reliable access to the research and scholarship of faculty, staff, and students at the University of Illinois at Urbana-Champaign. Faculty, staff, and graduate students can deposit their research and scholarship – unpublished and, in many cases, published – directly into IDEALS. Departments can use IDEALS to distribute their working papers, technical reports, or other research material. Contact Sarah Shreeves, IDEALS Coordinator, for more information.