The first phase of our analysis focused on determining the frequency of occurrence of every MARC content designation structure. We profiled the entire dataset, and the results are listed in the first document below.
The entire dataset was separated into 20 databases based on the format of material described (e.g., electronic resources, sound recordings, and graphic materials) and the original cataloging source (either the Library of Congress or OCLC member libraries). We then carried out a frequency count analysis on each of these 20 subsets of the dataset. The links below connect to the resulting 20 data reports. The first link is to a spreadsheet that contains much of the data used in the analysis. The spreadsheet has embedded filters that allow various views of the data, including, for each subset of records, the content designation that falls within the calculated threshold we use to determine commonly occurring elements.
- Format Content Designation Analysis: Data Report—General Profiles
This document provides an overall picture of the MCDU dataset of 56,177,383 MARC 21 records. The profiles of the entire set of records, as well as of some subsets of records (based on either source of cataloging or format of material described), offer a view of key characteristics such as length, date, and type of records.
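The per-subset frequency count described above can be illustrated with a minimal sketch. It is not the project's actual tooling: the record layout (a dict with `format`, `source`, and `tags` keys), the function name `frequency_counts`, and the toy records are all assumptions made for illustration. For each (format, source) subset it tallies how many times each field tag occurs and in how many records it appears at least once.

```python
from collections import Counter, defaultdict


def frequency_counts(records):
    """Tally field tags per (format, source) subset.

    `records` is a hypothetical, simplified stand-in for parsed MARC records:
    each item is a dict with "format", "source", and a list of field "tags".
    """
    subsets = defaultdict(
        lambda: {"records": 0, "occurrences": Counter(), "records_with_tag": Counter()}
    )
    for rec in records:
        stats = subsets[(rec["format"], rec["source"])]
        stats["records"] += 1
        # Every occurrence of a tag counts, including repeats within one record.
        stats["occurrences"].update(rec["tags"])
        # Each record contributes at most once per tag here.
        stats["records_with_tag"].update(set(rec["tags"]))
    return dict(subsets)


# Toy data only; these are not records from the MCDU dataset.
sample = [
    {"format": "B", "source": "LC", "tags": ["245", "650", "650"]},
    {"format": "B", "source": "LC", "tags": ["245", "260"]},
    {"format": "SR", "source": "nonLC", "tags": ["245", "511"]},
]

counts = frequency_counts(sample)
for (fmt, src), stats in sorted(counts.items()):
    for tag, n_records in sorted(stats["records_with_tag"].items()):
        share = 100 * n_records / stats["records"]
        print(
            f"{fmt}_{src} {tag}: {stats['occurrences'][tag]} occurrences "
            f"in {n_records} records ({share:.1f}%)"
        )
```

Keeping total occurrences separate from the number of records containing a tag matters for repeatable fields, where the two figures differ. The threshold used to flag commonly occurring elements is not specified here, so the sketch stops at the raw counts.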
Frequency Count Results for Records Created by the Library of Congress
The dataset contains 8,713,665 records (approximately 15.5% of the total MCDU dataset) that can be attributed to catalogers at the Library of Congress or cooperative cataloging projects supervised by the Library of Congress. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).
- Frequency Counts for Books, Pamphlets, and Printed Sheets (Set 01_B_LC)

- Frequency Counts for Cartographic Materials Records (Set 02_CM_LC)

- Frequency Counts for Electronic Resources Records (Set 03_ER_LC)

- Frequency Counts for Continuing Resources Records (Set 04_CR_LC)

- Frequency Counts for Manuscripts (including Collections) (Set 05_MS_LC)

- Frequency Counts for Music Records (Set 06_MU_LC)

- Frequency Counts for Sound Recording Records (Set 07_SR_LC)

- Frequency Counts for Projected Media Records (Set 08_PM_LC)

- Frequency Counts for Graphic Materials Records (Set 09_GM_LC)

- Frequency Counts for Three Dimensional Objects and Realia (Set 10_3D_LC)

Frequency Count Results for Records Created by OCLC Member Libraries
The dataset contains 47,463,718 records (approximately 84.5% of the total MCDU dataset) that can be attributed to catalogers at OCLC member libraries. The following reports present the results of frequency count analyses on these sets of MARC records distinguished by the type of material being described. These are final versions of these data reports (March 2006).
- Frequency Counts for Books, Pamphlets, and Printed Sheets (Set 01_B_nonLC)

- Frequency Counts for Cartographic Materials Records (Set 02_CM_nonLC)

- Frequency Counts for Electronic Resources Records (Set 03_ER_nonLC)

- Frequency Counts for Continuing Resources Records (Set 04_CR_nonLC)

- Frequency Counts for Manuscripts (including Collections) (Set 05_MS_nonLC)

- Frequency Counts for Music Records (Set 06_MU_nonLC)

- Frequency Counts for Sound Recordings (Set 07_SR_nonLC)

- Frequency Counts for Projected Media (Set 08_PM_nonLC)

- Frequency Counts for Graphic Materials (Set 09_GM_nonLC)

- Frequency Counts for Three Dimensional Objects and Realia (Set 10_3D_nonLC)
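As a quick consistency check on the record counts quoted above, the following sketch confirms that the Library of Congress and OCLC member library subsets add up to the full MCDU dataset and recomputes their approximate shares. The variable names are ours; the three figures are taken directly from the text.

```python
# Record counts as stated in the report descriptions above.
lc_records = 8_713_665
non_lc_records = 47_463_718
total_records = 56_177_383

# The two cataloging-source subsets should partition the whole dataset.
assert lc_records + non_lc_records == total_records

lc_share = round(100 * lc_records / total_records, 1)
non_lc_share = round(100 * non_lc_records / total_records, 1)
print(f"LC: {lc_share}%  OCLC member libraries: {non_lc_share}%")
```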
