MARC Content Designation Utilization

MCDU Home

February 22nd, 2005

PROJECT NEWS

As of December 31, 2007, the MARC Content Designation Utilization (MCDU) Project has come to completion. This site, however, will remain available. New information may be added to the site. We appreciate the generous funding from the Institute of Museum and Library Services through a National Leadership Grant that made this research study possible.

Amy Eklund, former member of the MCDU Project Team and now employed at Georgia Perimeter College as a cataloger, presented an analysis of Georgia catalogers’ use of MARC at the 2007 Georgia Council of Media Organizations (GaCOMO) Conference. The presentation and handout are available in Published Papers and Presentations.


This research study is dedicated to the memory of
Henriette D. Avram, 1919-2006


Welcome to the MCDU project. The U.S. Federal Institute of Museum and Library Services (IMLS) awarded a National Leadership Grant to the Texas Center for Digital Knowledge at the University of North Texas to carry out an empirical investigation of catalogers’ use of the MARC bibliographic format’s content designation. This empirical data will contribute to the broader library community’s current discussion of bibliographic control and information access.

Empirical evidence regarding the utilization of MARC content designation in our current library information retrieval systems can contribute to discussions regarding the future of MARC and its place in the rapidly evolving networked information environment. The absence of any solid empirical analysis in the past 30 years, beyond that of frequency of MARC tag use, is a major motivation for this study.

Findings revealed in a preliminary analysis of 400,000 MARC records as part of the IMLS-funded Z39.50 Interoperability Testbed Project under the direction of Dr. Moen motivate us to take a deeper look by focusing exclusively on catalogers’ use of MARC. It is the cataloger’s use of the structures in MARC records that affects both the perceived performance of the vendors’ systems and end users’ success in searching, selecting, and retrieving information.

Contributing resources for the project are the School of Library and Information Sciences at the University of North Texas and OCLC. We have invited a group of national and international experts to serve as an advisory group to the project.

The project began in December 2004 and is scheduled for completion in August 2006. We have received a one-year extension from IMLS to continue our work. Go to Documents to access project documents, including the project proposal. Questions and communications about the project should be addressed to Dr. William E. Moen or Dr. Shawne D. Miksa.

Identifying and Understanding Factors Affecting Catalogers’ Utilization of MARC

January 1st, 2008

The MCDU Project investigated catalogers’ use of MARC fields, subfields, and other content designation structures (CDS) in over 56 million MARC bibliographic records from OCLC’s WorldCat database. As we confirmed through our analysis that a relatively small percentage of available CDS are typically used, the recurring question was: Why do catalogers do what they do? The MCDU Project did not study this question, but the Project provides a wealth of empirical data about the records created in the cataloging enterprise. The results and findings about catalogers’ use of MARC CDS can be further informed by studying the factors that affect catalogers’ utilization of MARC.

In our original proposal to the Institute of Museum and Library Services for the MCDU Project, we included as one of the project goals: Investigate a methodological approach to understand the factors contributing to current levels of MARC content designation use and relationships with the cataloging enterprise. A specific objective of the MCDU Project was: Develop a methodological approach to identify and understand factors contributing to catalogers’ use of MARC content designation. We have addressed this objective by producing the following document:

MARC Content Designation Use over Time

January 1st, 2008

The MCDU Project focused on analyzing catalogers’ use of MARC fields, subfields, and other content designation structures (CDS). The first phase of that analysis examined the use of MARC CDS based on format of material and source of cataloging. We know, however, that over time the CDS available to catalogers changed as MARC evolved. New CDS were added, some were deleted, and format integration brought together similar CDS from various formats.

The MCDU Project Team carried out a second phase of analysis that attempted to take into account the dynamic and evolving nature of MARC and the CDS available to catalogers over time. A document listed below describes the methodology for this analysis. In addition, the frequency counts and tabulation upon which this summary is based are available in a set of data reports listed below.

Before this analysis could be carried out, it was first necessary to document when new fields, subfields, and other CDS were added, deleted, and/or changed. No database or resource existed that documented the evolution of MARC CDS. The MCDU Project created HistoriMARC, a database that stores in structured form all available information about MARC bibliographic CDS from 1972 through 2004. HistoriMARC was the enabling tool for carrying out the analysis of MARC use over time since it allowed us to manipulate and reuse data about CDS additions, deletions, and other changes in our analysis. A summary of the results and findings is provided in the document, Catalogers’ Use of MARC Content Designation over Time: An Analysis of MARC Records from 1972 to 2004.

Catalogers’ Use of MARC Content Designation over Time: An Analysis of MARC Records from 1972 to 2004.
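The role that a history of CDS changes plays in an over-time analysis can be illustrated with a minimal Python sketch. This is not the HistoriMARC design: the `CdsEntry` class, its attributes, the helper functions, and the sample history are all hypothetical, and the sample uses placeholder tags in the locally defined 9XX range so that no claim is made about when any real MARC field was added or deleted. The only idea shown is that each CDS carries the years in which it was valid, so that use can be measured against what was available at the time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CdsEntry:
    """One content designation structure with an assumed validity window."""
    tag: str                        # field tag (placeholder values below)
    subfield: Optional[str]         # subfield code, or None for the field itself
    added: int                      # first year the CDS was available
    deleted: Optional[int] = None   # year it was removed; None if still valid

    def available_in(self, year: int) -> bool:
        if year < self.added:
            return False
        return self.deleted is None or year < self.deleted


# Hypothetical placeholder history; NOT data from HistoriMARC.
HISTORY = [
    CdsEntry("901", "a", added=1972),
    CdsEntry("902", "a", added=1985, deleted=1995),
    CdsEntry("903", "u", added=1993),
]


def available_cds(history, year):
    """Return the set of (tag, subfield) pairs that were valid in `year`."""
    return {(e.tag, e.subfield) for e in history if e.available_in(year)}


def utilization(history, year, used):
    """Share of the CDS available in `year` that appear in `used`."""
    available = available_cds(history, year)
    if not available:
        return 0.0
    return len(available & set(used)) / len(available)
```

With this placeholder history, `available_cds(HISTORY, 1990)` contains 901 $a and 902 $a but not 903 $u, so a record created in 1990 would not be counted as failing to use a CDS that did not yet exist.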

Extent of Support for FRBR User Tasks

December 19th, 2006

The frequency count analysis and the identification of commonly used elements provide the basis for addressing another project objective, namely, examining the extent to which the four user tasks identified in the Functional Requirements for Bibliographic Records (FRBR) are supported by catalogers’ utilization of MARC elements in the MCDU dataset.

We first developed a methodology for comparing the MCDU results with the data elements of the MARC record shown by Delsey’s research to support the four resource discovery tasks: find, identify, select, and obtain. We then carried out the analysis and developed a report of our findings. These documents, along with a spreadsheet used for analyzing the frequency data, are listed below. The documents are still in draft form, as we continue working on this project activity.
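One way to picture this kind of comparison is the Python sketch below. It is an illustration under stated assumptions, not the project's methodology: the `task_support` function is hypothetical, the measure it computes (the share of elements mapped to a task that occur at least once) is an assumed simplification, and any mapping passed to it would have to come from the actual mapping of MARC elements to user tasks, which is not reproduced here.

```python
TASKS = ("find", "identify", "select", "obtain")


def task_support(mapping, counts):
    """Summarize, per user task, how many mapped elements are actually used.

    mapping: dict of task name -> list of element keys such as "901$a"
    counts:  dict of element key -> number of occurrences in the dataset
    """
    report = {}
    for task in TASKS:
        elements = mapping.get(task, [])
        used = [e for e in elements if counts.get(e, 0) > 0]
        share = len(used) / len(elements) if elements else 0.0
        report[task] = {
            "mapped": len(elements),
            "used": len(used),
            "share_used": share,
        }
    return report
```

A call such as `task_support({"find": ["901$a", "902$a"]}, {"901$a": 10})` reports that one of the two placeholder elements mapped to the find task occurs in the data.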

Commonly Used MARC Elements

December 19th, 2006

One of the research goals of the project, to provide empirical evidence to document MARC21 content designation use by catalogers, has been partially met by the objective of conducting frequency counts of all fields and subfields used in the OCLC WorldCat database. Another research objective is to identify commonly used elements in bibliographic records based on the analysis of format-specific record sets and to compare these elements with existing recommendations by Library of Congress agencies for national level, core level, and minimal level records.

In support of the research goals, we carried out a set of analyses to address the following research questions: What are the sets of commonly used elements per format, and how do these compare with the elements prescribed in current national, core, and minimal level recommendations or guidelines for cataloging? Conversely, are there elements which are frequently used by catalogers but are not prescribed in current national, core, and minimal level recommendations or guidelines for cataloging?

We first analyzed and compared the utilization of MARC content designation in the MCDU dataset with two bibliographic record standards, Program for Cooperative Cataloging (PCC) BIBCO Core Record Standards, and CONSER Record Standards. We then analyzed and compared the utilization with National and Minimal Level Requirements for Bibliographic Records. The methodology and reports of the analysis are linked below. In addition, we provide the spreadsheet file used in the comparison of MCDU utilization with the National and Minimal Level Requirements for Bibliographic Records. The first worksheet in the file gives instructions on use of the spreadsheet and its embedded filters.

Finally, we identified the most commonly occurring elements, or those that occur within a defined threshold, in the MCDU dataset. These commonly occurring elements are given context by comparing them with the National and Minimal Level Bibliographic Record Requirements, and the Program for Cooperative Cataloging’s (PCC) BIBCO Core Record Standards and CONSER Record Requirements for Full, Minimal and Core Level Records for Serials.

Through this comparison of recommendations of elements with actual usage of elements by catalogers, we hope the results can illuminate the intersection of standards and practice in cataloging and inform future development of standards.
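The two research questions above amount to a set comparison, which the following Python sketch illustrates. It rests on assumptions: the function names are hypothetical, the threshold is expressed here as a minimum share of records containing an element (the threshold actually used is defined in the linked methodology, not here), and the element keys are placeholders rather than elements from any guideline.

```python
def commonly_used(records_with, total_records, threshold):
    """Return elements present in at least `threshold` (0 to 1) of all records.

    records_with: dict of element key -> number of records containing it
    """
    return {
        element
        for element, n in records_with.items()
        if n / total_records >= threshold
    }


def compare_with_guideline(common, prescribed):
    """Split elements into the three groups the research questions ask about."""
    common, prescribed = set(common), set(prescribed)
    return {
        "prescribed_and_common": sorted(common & prescribed),
        "prescribed_not_common": sorted(prescribed - common),
        "common_not_prescribed": sorted(common - prescribed),
    }
```

The `common_not_prescribed` group corresponds to the converse question: elements that catalogers use frequently even though a guideline does not call for them.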

Results of Frequency Counts Analysis

December 18th, 2006

The first phase of our analyses focused on determining the frequency of occurrences of all the MARC content designation structures. We profiled the entire dataset, and the results are listed in the first document below.

The entire dataset was separated into 20 databases based on format of material described (e.g., electronic resources, sound recordings, and graphic materials) and the original cataloging source (either Library of Congress or OCLC member libraries). We then carried out frequency count analysis on each of the twenty subsets of the dataset. The links below connect to the resulting 20 data reports. The first link is to a spreadsheet that contains much of the data used in the analysis. The spreadsheet has embedded filters that allow various views of the data, including the content designation in the various subsets of records that are within the calculated threshold we use to determine commonly occurring elements.

  • Format Content Designation Analysis: Data Report, General Profiles (PDF)

    This document provides an overall picture of the MCDU dataset of 56,177,383 MARC 21 records. The profiles of the entire set of records as well as some of the subsets of records (based on either source of cataloging or format of material described) offer a view of some key characteristics such as length of records, date of records, type of records, etc.
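A frequency count of the kind described in this section can be sketched in a few lines of Python. This is a simplified illustration, not the project's SQL-based procedure: the `frequency_counts` function and the in-memory record representation (a list of `(tag, [subfield codes])` pairs per record) are assumptions made for the example. It keeps two tallies, because the number of times an element occurs and the number of records that contain it can differ for repeatable fields.

```python
from collections import Counter


def frequency_counts(records):
    """Count field and subfield use across records.

    records: iterable of records, each a list of (tag, [subfield codes]).
    Returns (occurrences, records_with), both Counters keyed by "245" for a
    field or "245$a" for a subfield.
    """
    occurrences = Counter()    # every occurrence, including repeats in a record
    records_with = Counter()   # records containing the element at least once
    for record in records:
        seen = set()
        for tag, subfields in record:
            occurrences[tag] += 1
            seen.add(tag)
            for code in subfields:
                key = f"{tag}${code}"
                occurrences[key] += 1
                seen.add(key)
        records_with.update(seen)
    return occurrences, records_with
```

For a record with two 650 fields, `occurrences["650"]` increases by two while `records_with["650"]` increases by one.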

Frequency Count Results for Records Created by the Library of Congress

The dataset contains 8,713,665 records (approximately 15.5% of the total MCDU dataset) that can be attributed to catalogers at the Library of Congress or cooperative cataloging projects supervised by the Library of Congress. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).

Frequency Count Results for Records Created by the OCLC Member Libraries

The dataset contains 47,463,718 records (approximately 84.5% of the total MCDU dataset) that can be attributed to catalogers at OCLC member libraries. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).
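The record counts reported on this page can be checked against each other with a short Python calculation. The figures are the ones stated above; the function name is only for illustration.

```python
TOTAL = 56_177_383          # all records in the MCDU dataset
LC = 8_713_665              # records attributed to the Library of Congress
OCLC_MEMBERS = 47_463_718   # records attributed to OCLC member libraries


def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


# The two subsets partition the dataset.
assert LC + OCLC_MEMBERS == TOTAL

print(share(LC, TOTAL))            # 15.5
print(share(OCLC_MEMBERS, TOTAL))  # 84.5
```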

Preparation of the Data for Analyses

December 18th, 2006

The MCDU Project Team’s first challenge was to prepare the 56 million MARC records for analysis. This page has links to documents that provide information about the decomposition of the MARC records, the design of a MySQL database to hold the decomposed records, the database loading, and the validation of the parsing software that decomposed the records. It also includes a document that details the procedures for creating different subsets of the records for analysis by format. The final document presents the analytical questions that guided our analysis.

Decomposing the Records, Database Structure, Database Loading, Validation Procedures

  • MCDU Project MARC Records Dataset: Decomposition Specification, Database Design, and Parser Software (PDF)
    This document provides information about the MARC dataset, the specifications for decomposing the MARC record, the design of the database to hold the decomposed records, and the parsing software that was designed to decompose the records and load the data to the database. Date Posted: July 15, 2005.

  • Validation Procedures for MARC Record Parsing Software (PDF)
    This document describes the procedures for testing the parsing scripts used to decompose the MARC records. A sample of the raw MARC records from the dataset and the resulting parsed records are subjected to the validation procedures detailed below to verify the integrity of the software and ascertain that the data from the MARC records are correctly represented prior to loading into the database. Date Posted: July 15, 2005.

The MCDU MARC Parser and Database Loader

  • The MCDU Project Team developed a custom application using the PHP scripting language to decompose the MARC records and to load the data into the MySQL database. The parsing software reads MARC records from the ASCII text file, parses the records in memory according to specifications described in the first document listed above, and inserts the records into the MySQL database. The following is a brief description of the software functionality. A user selects the source file containing MARC records, the number of records to skip at the beginning of the file, the number of records to process in a group, and the number of records to process from the file. The user then selects the output destination, which can be either Browser or Database. Choosing Browser displays the results of the parsing in the browser user interface. Browser output is used to validate the parsing function. If Database is chosen for the output, the system launches the database loader and moves the parsed data into the MySQL database. After a user clicks the Start Processing button, the parser performs processing, and the results are output to the browser or inserted into the database. For demonstration of this tool, we have disabled writing the data to the database. Click on the link below to see a demonstration of the parser tool. Date Posted: July 15, 2005.
  • Access the MCDU MARC Parser
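The decomposition and the skip, group, and limit options described above can be sketched in Python as follows. This is not the project's PHP parser. It assumes the records are in the ISO 2709 exchange structure that MARC 21 implements (a 24-character leader, a directory of 12-character entries, then the field data), which may differ from the text file layout the project actually parsed; the function names and the dictionary output are hypothetical.

```python
from itertools import islice

FIELD_END = b"\x1e"    # field terminator
RECORD_END = b"\x1d"   # record terminator
SUBFIELD = b"\x1f"     # subfield delimiter


def parse_record(raw):
    """Decompose one ISO 2709 record into its leader and fields."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])        # base address of data (Leader/12-16)
    directory = raw[24:base - 1]     # directory ends with a field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length].rstrip(FIELD_END)
        if tag < "010":              # control fields: no indicators/subfields
            fields.append({"tag": tag, "data": data.decode("utf-8", "replace")})
        else:
            parts = data.split(SUBFIELD)
            fields.append({
                "tag": tag,
                "indicators": parts[0].decode("ascii", "replace"),
                "subfields": [
                    (p[:1].decode("ascii"), p[1:].decode("utf-8", "replace"))
                    for p in parts[1:] if p
                ],
            })
    return {"leader": leader, "fields": fields}


def read_records(data, skip=0, limit=None):
    """Yield parsed records from `data`, skipping `skip` and stopping after `limit`."""
    chunks = (c + RECORD_END for c in data.split(RECORD_END) if c.strip())
    stop = None if limit is None else skip + limit
    return (parse_record(c) for c in islice(chunks, skip, stop))


def in_groups(records, size):
    """Yield lists of up to `size` records, mirroring group-wise processing."""
    iterator = iter(records)
    while True:
        group = list(islice(iterator, size))
        if not group:
            return
        yield group
```

A caller would read the file as bytes and iterate, for example `for group in in_groups(read_records(data, skip=100, limit=1000), 50): ...`, sending each group either to a display routine or to a database loader.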

Preparation of the Databases Containing MARC Records for Analyses

  • Format Content Designation Analysis: Set Definition and Extraction Queries (PDF)
    This document contains the procedures in the form of structured query language (SQL) queries designed to create the 20 format-specific MCDU project databases containing approximately 56 million MARC records from the OCLC WorldCat database. Natural language queries were developed and translated into SQL. Test queries were initially run against samples of the data and analyzed to ensure that the queries were properly formed and produced expected results.

  • Format Content Designation Analysis: Set Profiling and Analysis Queries (PDF)
    This document describes the questions we are asking of the data to address project research questions. Questions are transformed into SQL queries that result in reports produced by the MySQL database management system. The basic questions asked of the sets are similar, and the organization of the basic analysis queries for the MCDU record sets is detailed in this document. The queries are organized into three categories: General Profile Queries, Frequency Counts Queries, and Second Level Analysis Queries.
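The step from a natural language question to an SQL query, as described in the two documents above, can be illustrated with a small Python sketch. Everything in it is an assumption made for the example: the two-table schema, the column names, and the sample rows are hypothetical and are not the project's database design, and Python's built-in `sqlite3` module stands in for MySQL so the example runs without a server. The question translated here is "how often does each field and subfield occur in records of a given type from a given cataloging source?"

```python
import sqlite3

# Hypothetical schema; not the MCDU database design.
SCHEMA = """
CREATE TABLE record (id INTEGER PRIMARY KEY, record_type TEXT, source TEXT);
CREATE TABLE field  (record_id INTEGER, tag TEXT, subfield TEXT);
"""

# Frequency count restricted to one format and source subset.
FREQUENCY = """
SELECT f.tag, f.subfield,
       COUNT(*)                    AS occurrences,
       COUNT(DISTINCT f.record_id) AS records_with
FROM field AS f
JOIN record AS r ON r.id = f.record_id
WHERE r.record_type = ? AND r.source = ?
GROUP BY f.tag, f.subfield
ORDER BY occurrences DESC, f.tag, f.subfield
"""


def demo_db():
    """Build a tiny in-memory database with made-up sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO record VALUES (?, ?, ?)",
        [(1, "a", "LC"), (2, "a", "LC"), (3, "j", "OCLC")],
    )
    con.executemany(
        "INSERT INTO field VALUES (?, ?, ?)",
        [(1, "245", "a"), (1, "650", "a"), (1, "650", "a"),
         (2, "245", "a"), (3, "245", "a")],
    )
    return con


def frequency(con, record_type, source):
    """Run the frequency query for one subset and return all rows."""
    return con.execute(FREQUENCY, (record_type, source)).fetchall()
```

Running the same parameterized query once per subset mirrors the idea of asking similar basic questions of every record set.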

Analysis Reports & Results

December 15th, 2006

Throughout the MCDU Project, we have published on this site reports and other documents from the various analyses completed. This page links to documents that contain results of our analysis and other project deliverables. During the project we addressed a variety of project objectives and activities. We have organized documents and reports into several categories (follow the links to find the documents and reports associated with these categories):

Preparation of the Data for Analyses
Results of Frequency Count Analyses
Commonly Used MARC Elements
Extent of Support for FRBR User Tasks
MARC Content Designation Use over Time
Identifying and Understanding Factors Affecting Catalogers’ Utilization of MARC

Preparation of the Data for Analyses addresses the various methods and procedures used in preparing the MCDU dataset of more than 56 million MARC bibliographic records for analyses.

Results of Frequency Count Analyses lists the reports that contain the fundamental data upon which subsequent analyses were conducted. One report provides an overview profile of the entire MCDU dataset, and 20 separate reports present the frequency count on subsets of the entire dataset based on format of material described and the source of cataloging.

Commonly Used MARC Elements reports on our analysis that determined commonly used elements (based on the frequency count analyses) for each of the ten formats. We provide context for these commonly occurring elements by comparing the elements with guidelines and recommendations from the Program for Cooperative Cataloging BIBCO core records, CONSER records, and also recommendations published by the Library of Congress for national level (full and minimal) records.

Extent of Support for FRBR User Tasks examines the extent to which MARC elements used in the MCDU dataset support the four user tasks identified in the Functional Requirements for Bibliographic Records. We use a mapping by Tom Delsey of MARC elements that support each task, along with our frequency count analyses, as the basis for the results reported.

MARC Content Designation Use over Time reports on a second phase of analysis that attempted to take into account the dynamic and evolving nature of MARC and the CDS available to catalogers over time. We know that the CDS available to catalogers changed as MARC evolved. New CDS were added, some were deleted, and format integration brought together similar CDS from various formats.

Identifying and Understanding Factors Affecting Catalogers’ Utilization of MARC presents the final deliverable for the MCDU Project. We have developed a research plan that addresses the stakeholders involved in the cataloging enterprise to gain an understanding of the factors that influence and shape catalogers’ decisions regarding the use/nonuse of available MARC CDS. The MCDU Project only looked at the artifacts of the cataloging enterprise (i.e., the actual MARC records). An important next step in a future project will be to engage with the people involved in the cataloging enterprise to understand the behaviors, technologies, policies, and other factors that produced the MARC records examined in the current project.

Published Papers and Presentations

December 15th, 2006

The project team is committed to disseminating information about the MCDU Project. This page has links to presentations and published papers.

Presentations

Published Papers

In Memory of Henriette Avram, 1919-2006

June 23rd, 2006

The MCDU Project is dedicated to Henriette D. Avram, fondly known as the Mother of MARC. During the 1960s, Henriette led a team of librarians at the Library of Congress in a development effort that resulted in the Machine-Readable Cataloging (MARC) record. An obituary from the New York Times provides a wonderful portrayal of this amazing librarian and information scientist.

MCDU Press Release, May 2005

May 12th, 2005

Contact:
William E. Moen, Ph.D. (wemoen@unt.edu)
Shawne D. Miksa, Ph.D. (smiksa@unt.edu)
Corrie Marsh (cmarsh@unt.edu)
Texas Center for Digital Knowledge
PO Box 311068
Denton TX 76203
Tel: 940.565.4552
Fax: 940.565.3101

FOR IMMEDIATE RELEASE (11 May 2005)

LIBRARY CATALOG RECORDS UNDER THE MICROSCOPE

DENTON, TX: University of North Texas School of Library and Information Science (SLIS) Professors Dr. William E. Moen and Dr. Shawne D. Miksa are studying library catalog records, but not for the purpose of finding books. They are examining how books and other library materials are represented through electronic codes in online library catalogs. The project, entitled MARC Content Designation Utilization: Inquiry and Analysis, is the largest scientifically-based study of coding practices in electronic library catalogs. During the course of the 2-year project, Drs. Moen and Miksa, Fellows at the Texas Center for Digital Knowledge (TxCDK), will investigate the extent of catalogers’ use of MARC 21, the mark-up language used by catalogers worldwide to create electronic catalog records. SLIS Ph.D. student Serhiy Polyakov and Master’s students Amy Eklund and Gregory Snyder serve as Research Assistants on the project.

The Institute of Museum and Library Services (IMLS), an independent Federal grant-making agency dedicated to creating and sustaining a nation of learners by helping libraries and museums serve their communities, is funding the project with a National Leadership Grant of $233,115 to the University of North Texas-TxCDK.

MARC (Machine Readable Cataloging) records provide bibliographic information and descriptions of items in library collections, including books, sound recordings, computer files, and visual materials. The data elements of MARC records form the foundation of most electronic library catalogs used in North America today, as well as in libraries around the world. Development of the MARC format was begun almost forty years ago by an initiative of the Library of Congress and evolved into the current format, MARC 21, which emerged in the late 1990s. The MARC 21 format for bibliographic data is maintained by the Library of Congress’s Network Development and MARC Standards Office and the National Library of Canada’s Standards and Support Office.

To obtain catalog records for the study, the researchers turned to the Online Computer Library Center (OCLC, www.oclc.org), the largest online catalog record source in the world. This nonprofit computer library service and research organization maintains the WorldCat database, which contains unique bibliographic records shared and contributed by more than 50,000 libraries in 84 countries and territories around the world. OCLC initially agreed to supply 1 million library catalog records from the WorldCat database for the researchers. After recent discussions, OCLC agreed to provide the researchers with all of its approximately 55 million catalog records. This new development will significantly increase the accuracy of the research results.

Dr. Moen explains that current MARC 21 specifications define nearly 2000 fields and subfields available to library catalogers working to create catalog records. In the 2003 IMLS-funded study entitled the Z-Interoperability project, Moen discovered that very few of these fields are being used. In fact, Moen discovered that only 36 of the available MARC fields accounted for 80% of all utilization. These preliminary findings have important implications for library catalogers and other library and information science professionals, and form the basis for the current study.

An important goal of the project is to create tools for the future study of catalog records. Dr. Miksa describes how this project will provide research strategies to examine MARC records as artifacts of the cataloging process. She emphasizes that resulting data will greatly inform cataloging education and curricula, which are critical to the continued development and improvement of information retrieval systems in libraries worldwide.

The project’s findings ultimately will lead to improved access to information in library catalogs. Dr. Samantha Hastings, SLIS Interim Dean, describes this project as just the type of funded research that leads to core developments in our field, the first of its kind that will be an important contribution to what we know about how content in library catalogs is actually being coded for organization and access. How the information is organized directly influences how people get the information they need.

Details of the project can be found at http://www.mcdu.unt.edu, a website created and maintained by SLIS Master’s student Bryce Benton.