Archive for December, 2006

Extent of Support for FRBR User Tasks

Tuesday, December 19th, 2006

The frequency count analysis and the identification of commonly used elements provide the basis for addressing another project objective, namely, examining the extent to which the four user tasks identified in the Functional Requirements for Bibliographic Records (FRBR) are supported by catalogers’ utilization of MARC elements in the MCDU dataset.

We first developed a methodology for comparing the MCDU results with the data elements of the MARC record shown by Delsey’s research to support the four resource discovery tasks: find, identify, select, and obtain. We then carried out the analysis and developed a report of our findings. These documents, along with a spreadsheet used for analyzing the frequency data, are listed below. The documents are still in draft form, as we continue working on this project activity.


Commonly Used MARC Elements

Tuesday, December 19th, 2006

One of the research goals of the project, to provide empirical evidence documenting catalogers’ use of MARC21 content designation, has been partially met by the objective of conducting frequency counts of all fields and subfields used in the OCLC WorldCat database. Another research objective is to identify commonly used elements in bibliographic records by analyzing format-specific record sets and comparing these elements with existing recommendations by Library of Congress agencies for national level, core level, and minimal level records.

In support of the research goals, we carried out a set of analyses to address the following research questions: What are the sets of commonly used elements per format, and how do these compare with the elements prescribed in current national, core, and minimal level recommendations or guidelines for cataloging? Conversely, are there elements which are frequently used by catalogers but are not prescribed in current national, core, and minimal level recommendations or guidelines for cataloging?

We first analyzed and compared the utilization of MARC content designation in the MCDU dataset with two bibliographic record standards, Program for Cooperative Cataloging (PCC) BIBCO Core Record Standards, and CONSER Record Standards. We then analyzed and compared the utilization with National and Minimal Level Requirements for Bibliographic Records. The methodology and reports of the analysis are linked below. In addition, we provide the spreadsheet file used in the comparison of MCDU utilization with the National and Minimal Level Requirements for Bibliographic Records. The first worksheet in the file gives instructions on use of the spreadsheet and its embedded filters.

Finally, we identified the most commonly occurring elements, or those that occur within a defined threshold, in the MCDU dataset. These commonly occurring elements are given context by comparing them with the National and Minimal Level Bibliographic Record Requirements, and the Program for Cooperative Cataloging’s (PCC) BIBCO Core Record Standards and CONSER Record Requirements for Full, Minimal and Core Level Records for Serials.

Through this comparison of recommended elements with the elements catalogers actually use, we hope the results can illuminate the intersection of standards and practice in cataloging and inform the future development of standards.
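The two research questions above amount to set comparisons between the elements catalogers commonly use and the elements a standard prescribes. The following is a minimal sketch of that idea only. The tag lists are invented placeholders, not taken from the MCDU reports or from any of the standards named above.

```python
# Hypothetical inputs: neither set comes from the MCDU data or a real standard.
prescribed = {"245", "260", "300"}   # elements a guideline might prescribe
common = {"245", "260", "650"}       # elements found to be commonly used

# Commonly used and prescribed.
used_and_prescribed = sorted(common & prescribed)
# Commonly used by catalogers but not prescribed.
used_not_prescribed = sorted(common - prescribed)
# Prescribed but not commonly used.
prescribed_not_common = sorted(prescribed - common)
```

Keeping the three results separate mirrors the way the research questions are phrased: agreement between standards and practice, practice that goes beyond the standards, and prescriptions that practice does not follow.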


Results of Frequency Counts Analysis

Monday, December 18th, 2006

The first phase of our analyses focused on determining the frequency of occurrences of all the MARC content designation structures. We profiled the entire dataset, and the results are listed in the first document below.

The entire dataset was separated into 20 databases based on format of material described (e.g., electronic resources, sound recordings, and graphic materials) and the original cataloging source (either Library of Congress or OCLC member libraries). We then carried out frequency count analysis on each of the twenty subsets of the dataset. The links below connect to the resulting 20 data reports. The first link is to a spreadsheet that contains much of the data used in the analysis. The spreadsheet has embedded filters that allow various views of the data, including the content designation in the various subsets of records that are within the calculated threshold we use to determine commonly occurring elements.

  • Format Content Designation Analysis: Data Report—General Profiles (PDF)

    This document provides an overall picture of the MCDU dataset of 56,177,383 MARC 21 records. The profiles of the entire set of records as well as some of the subsets of records (based on either source of cataloging or format of material described) offer a view of some key characteristics such as length of records, date of records, type of records, etc.
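As an illustration of the frequency count analysis described above, the sketch below counts field and subfield occurrences over a handful of decomposed records and then applies a cutoff. The record structure, the function names, and the 0.6 threshold are assumptions made for this example. The project's own threshold calculation is described in its reports and is not reproduced here.

```python
from collections import Counter

# Hypothetical decomposed records: each record is a list of (tag, subfield code) pairs.
records = [
    [("245", "a"), ("245", "c"), ("260", "a"), ("650", "a")],
    [("245", "a"), ("260", "a"), ("260", "b")],
    [("245", "a"), ("650", "a"), ("650", "a")],
]

def count_elements(records):
    """Return total occurrences and the number of records containing each element."""
    occurrences = Counter()
    record_counts = Counter()
    for record in records:
        occurrences.update(record)        # every occurrence counts
        record_counts.update(set(record)) # each record counts once per element
    return occurrences, record_counts

def commonly_occurring(record_counts, total_records, threshold):
    """Elements present in at least `threshold` (a fraction) of the records."""
    return sorted(e for e, n in record_counts.items() if n / total_records >= threshold)

occurrences, record_counts = count_elements(records)
# The 0.6 cutoff is an arbitrary placeholder, not the MCDU threshold.
common = commonly_occurring(record_counts, len(records), threshold=0.6)
```

The sketch tracks two numbers per element because a repeatable field can occur several times in one record, so total occurrences and the number of records using an element can differ.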

Frequency Count Results for Records Created by the Library of Congress

The dataset contains 8,713,665 records (approximately 15.5% of the total MCDU dataset) that can be attributed to catalogers at the Library of Congress or cooperative cataloging projects supervised by the Library of Congress. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).

Frequency Count Results for Records Created by the OCLC Member Libraries

The dataset contains 47,463,718 records (approximately 84.5% of the total MCDU dataset) that can be attributed to catalogers at OCLC member libraries. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).
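The record counts and percentages quoted on this page can be checked with a few lines of arithmetic. The variable and function names below are illustrative only.

```python
TOTAL = 56_177_383          # all records in the MCDU dataset
LC = 8_713_665              # records attributed to the Library of Congress
OCLC_MEMBERS = 47_463_718   # records attributed to OCLC member libraries

def share(part, whole):
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)

lc_share = share(LC, TOTAL)
member_share = share(OCLC_MEMBERS, TOTAL)
subsets_cover_total = (LC + OCLC_MEMBERS == TOTAL)
```

The two subsets add up exactly to the full dataset, which is consistent with every record being assigned to one of the two cataloging sources.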


Preparation of the Data for Analyses

Monday, December 18th, 2006

The MCDU Project Team’s first challenge was to prepare the 56 million MARC records for analysis. This page has links to documents that provide information about the decomposition of the MARC records, the design of a MySQL database to hold the decomposed records, the database loading, and the validation of the parsing software that decomposed the records. It also includes a document that details the procedures for creating different subsets of the records for analysis by format. The final document presents the analytical questions that guided our analysis.

Decomposing the Records, Database Structure, Database Loading, Validation Procedures

  • MCDU Project MARC Records Dataset: Decomposition Specification, Database Design, and Parser Software (PDF)
    This document provides information about the MARC dataset, the specifications for decomposing the MARC record, the design of the database to hold the decomposed records, and the parsing software that was designed to decompose the records and load the data to the database. Date Posted: July 15, 2005.

  • Validation Procedures for MARC Record Parsing Software (PDF)
    This document describes the procedures for testing the parsing scripts used to decompose the MARC records. A sample of the raw MARC records from the dataset and the resulting parsed records are subjected to the validation procedures detailed below to verify the integrity of the software and ascertain that the data from the MARC records are correctly represented prior to loading into the database. Date Posted: July 15, 2005.

The MCDU MARC Parser and Database Loader

  • The MCDU Project Team developed a custom application in the PHP scripting language to decompose the MARC records and load the data into the MySQL database. The parsing software reads MARC records from the ASCII text file, parses them in memory according to the specifications described in the first document listed above, and inserts them into the MySQL database. The software works as follows. A user selects the source file containing MARC records, the number of records to skip at the beginning of the file, the number of records to process in a group, and the number of records to process from the file. The user then selects the output destination, either Browser or Database. Choosing Browser displays the parsing results in the browser user interface; this output is used to validate the parsing function. Choosing Database launches the database loader, which moves the parsed data into the MySQL database. After the user clicks the Start Processing button, the parser processes the records and either displays the results in the browser or inserts them into the database. For this demonstration of the tool, we have disabled writing the data to the database. Click on the link below to see a demonstration of the parser tool. Date Posted: July 15, 2005.
  • Access the MCDU MARC Parser
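The parser workflow described above (skip some records, then process a fixed number of records in groups) can be sketched as follows. This is not the project's PHP application. It is a minimal Python illustration that assumes the source file holds ASCII-only records in the ISO 2709 transmission format, and the function names `parse_record`, `read_records`, and `process` are hypothetical.

```python
from itertools import islice

FIELD_END, RECORD_END, SUBFIELD = "\x1e", "\x1d", "\x1f"

def parse_record(raw):
    """Decompose one ISO 2709 record (ASCII only) into its leader and fields."""
    leader = raw[:24]
    base = int(leader[12:17])        # leader/12-16: base address of data
    directory = raw[24:base - 1]     # 12-character entries: tag, length, start
    fields = []
    for i in range(0, len(directory), 12):
        tag = directory[i:i + 3]
        length = int(directory[i + 3:i + 7])
        start = int(directory[i + 7:i + 12])
        data = raw[base + start:base + start + length].rstrip(FIELD_END)
        if tag < "010":              # control fields have no indicators or subfields
            fields.append((tag, None, data))
        else:
            indicators, *chunks = data.split(SUBFIELD)
            subfields = [(c[0], c[1:]) for c in chunks if c]
            fields.append((tag, indicators, subfields))
    return leader, fields

def read_records(text):
    """Yield raw records from a string of concatenated ISO 2709 records."""
    for raw in text.split(RECORD_END):
        if raw.strip():
            yield raw + RECORD_END

def process(text, skip=0, group_size=100, limit=None):
    """Skip `skip` records, then yield parsed records in groups of `group_size`."""
    stop = None if limit is None else skip + limit
    selected = islice(read_records(text), skip, stop)
    while True:
        group = [parse_record(raw) for raw in islice(selected, group_size)]
        if not group:
            return
        yield group
```

Yielding one group at a time means a caller can either print each group for validation or hand it to a database loader, which corresponds to the Browser and Database output choices. Real MARC data with non-ASCII characters would need byte-level offsets rather than the character offsets used here.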

Preparation of the Databases Containing MARC Records for Analyses

  • Format Content Designation Analysis: Set Definition and Extraction Queries (PDF)
    This document contains the procedures in the form of structured query language (SQL) queries designed to create the 20 format-specific MCDU project databases containing approximately 56 million MARC records from the OCLC WorldCat database. Natural language queries were developed and translated into SQL. Test queries were initially run against samples of the data and analyzed to ensure that the queries are properly formed and produce expected results.

  • Format Content Designation Analysis: Set Profiling and Analysis Queries (PDF)
    This document describes the questions we are asking of the data to address project research questions. Questions are transformed into SQL queries that result in reports produced by the MySQL database management system. The basic questions asked of the sets are similar, and the organization of the basic analysis queries for the MCDU record sets is detailed in this document. The queries are organized into three categories: “General Profile Queries”, “Frequency Counts Queries”, and “Second Level Analysis Queries”.
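To give a sense of what a set extraction query and a frequency count query might look like, here is a small sketch. It uses Python's built-in SQLite module in place of MySQL so that it runs without a server. The table and column names (`records`, `fields`, `record_type`, `source`, `lc_sound`) and the sample rows are assumptions for illustration and do not reflect the project's actual database design or queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical schema: one row per record, one row per field occurrence.
CREATE TABLE records (record_id INTEGER PRIMARY KEY, record_type TEXT, source TEXT);
CREATE TABLE fields  (record_id INTEGER, tag TEXT);
INSERT INTO records VALUES (1, 'j', 'LC'), (2, 'a', 'LC'), (3, 'i', 'LC'), (4, 'j', 'OCLC');
INSERT INTO fields VALUES (1, '245'), (1, '650'), (1, '650'),
                          (2, '245'), (3, '245'), (3, '511'), (4, '245');

-- Set extraction: sound recordings (Leader/06 'i' or 'j') with source 'LC'.
CREATE TABLE lc_sound AS
  SELECT record_id FROM records
  WHERE record_type IN ('i', 'j') AND source = 'LC';
""")

# Frequency count: total occurrences and number of records per tag in the set.
rows = conn.execute("""
SELECT f.tag,
       COUNT(*) AS occurrences,
       COUNT(DISTINCT f.record_id) AS records_with_tag
FROM fields f JOIN lc_sound s ON s.record_id = f.record_id
GROUP BY f.tag
ORDER BY f.tag
""").fetchall()
```

Materializing each subset once and running the same analysis query against every subset matches the observation that the basic questions asked of the sets are similar.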


Analysis Reports & Results

Friday, December 15th, 2006

Throughout the MCDU Project, we have published on this site reports and other documents from the various analyses completed. This page links to documents that contain results of our analysis and other project deliverables. During the project we addressed a variety of project objectives and activities. We have organized documents and reports into several categories (follow the links to find the documents and reports associated with these categories):

Preparation of the Data for Analyses
Results of Frequency Count Analyses
Commonly Used MARC Elements
Extent of Support for FRBR User Tasks
MARC Content Designation Use over Time
Identifying and Understanding Factors Affecting Catalogers’ Utilization of MARC

Preparation of the Data for Analyses addresses the various methods and procedures used in preparing the MCDU dataset of more than 56 million MARC bibliographic records for analyses.

Results of Frequency Count Analyses lists the reports that contain the fundamental data upon which subsequent analyses were conducted. One report provides an overview profile of the entire MCDU dataset, and 20 separate reports present the frequency count on subsets of the entire dataset based on format of material described and the source of cataloging.

Commonly Used MARC Elements reports on our analysis that determined commonly used elements (based on the frequency count analyses) for each of the ten formats. We provide context for these commonly occurring elements by comparing the elements with guidelines and recommendations from the Program for Cooperative Cataloging BIBCO core records, CONSER records, and also recommendations published by the Library of Congress for national level (full and minimal) records.

Extent of Support for FRBR User Tasks examined the extent to which MARC elements used in the MCDU dataset support the four user tasks identified in the Functional Requirements for Bibliographic Records. We use a mapping by Tom Delsey of MARC elements that support each task, along with our frequency count analyses as the basis for the results reported.

MARC Content Designation Use over Time reports on a second phase of analysis that attempted to take into account the dynamic and evolving nature of MARC and the content designation structures (CDS) available to catalogers over time. We know that the CDS available to catalogers changed as MARC evolved. New CDS were added, some were deleted, and format integration brought together similar CDS from various formats.

Identifying and Understanding Factors Affecting Catalogers’ Utilization of MARC presents the final deliverable for the MCDU Project. We have developed a research plan that addresses the stakeholders involved in the cataloging enterprise to gain an understanding of the factors that influence and shape catalogers’ decisions regarding the use/nonuse of available MARC CDS. The MCDU Project only looked at the artifacts of the cataloging enterprise (i.e., the actual MARC records). An important next step in a future project will be to engage with the people involved in the cataloging enterprise to understand the behaviors, technologies, policies, and other factors that produced the MARC records examined in the current project.

Published Papers and Presentations

Friday, December 15th, 2006

The project team is committed to disseminating information about the MCDU Project. This page has links to presentations and published papers.

Presentations

Published Papers