DAHR Now Utilizing Linked Data

DAHR is taking advantage of a growing feature of the Internet known as “Linked Data.” The advantage that users will notice first is a richer talent record for many names in the database, with contextual data about artists and groups, such as Wikipedia biographies and photographs, and links to many other databases. Benny Carter’s record is a good example. A revamped name browse feature also provides entry to all the DAHR names, with added faceting for the availability of streaming audio. But what is Linked Data?

At its core, Linked Data is a mechanism for identifying relationships between different online data sets to establish that an entity in dataset A is the same as (or has a relationship to) an entity in dataset B. These relationships constitute the building blocks of what is called the “Semantic Web.” DAHR uses the Library of Congress Name Authority File (LCNAF) number as an entry point to establish these identities and linkages. Of the 60,000 unique names in DAHR, nearly 18,000 now have established linkages to the LCNAF, and by extension, all the interlinked databases.
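As a rough illustration of the idea (not DAHR’s actual implementation), the "same as" relationships described above can be sketched in Python as pairs of identifiers that are followed from one dataset to the next. Every identifier and prefix below is invented for the example, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical identifiers, invented for illustration only.
# Each pair asserts that the two identifiers refer to the same entity.
SAME_AS_LINKS = [
    ("dahr:name/1001", "lcnaf:n0000001"),
    ("lcnaf:n0000001", "viaf:0000001"),
    ("lcnaf:n0000001", "wikidata:Q0000001"),
    ("dahr:name/1002", "lcnaf:n0000002"),
]


def build_graph(links):
    """Index the links in both directions so they can be followed either way."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def equivalent_ids(graph, start):
    """Return every identifier reachable from `start` through same-as links."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    seen.discard(start)
    return sorted(seen)


graph = build_graph(SAME_AS_LINKS)
print(equivalent_ids(graph, "dahr:name/1001"))
# ['lcnaf:n0000001', 'viaf:0000001', 'wikidata:Q0000001']
```

In this sketch, the single link from a DAHR name to its LCNAF number is enough to reach the other identifiers, which is what "by extension, all the interlinked databases" means in practice. A name with no link at all simply returns an empty list.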

The LCNAF is part of the larger Semantic Web, which connects to other authority files and to datasets, databases, and vocabularies online that utilize Linked Data, including Wikipedia and Wikidata, VIAF (Virtual International Authority File), MusicBrainz, and Getty ULAN (Union List of Artist Names), as well as commercial services like Discogs, Spotify, AllMusic, Apple Music, and others. DAHR is now providing links to the URI (Uniform Resource Identifier) for names in these datasets as well as harvesting data from Wikipedia to bring in the biographies, photographs, and birth and death dates mentioned above.

To do this, our old data structure had to be replaced. In that structure, each manifestation of a name (e.g., Duke Ellington the composer and Duke Ellington the pianist) had its own entry, and these entries had to be collapsed into a single “Master Talent” record. (The roles are preserved, so users can still do complex searching and filtering on them.) The LCNAF number for each Master Talent record was then used to automatically harvest data from other databases and to bring in the corresponding URIs from other Linked Data sources.
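The collapsing step can be pictured with a short Python sketch. This is only an assumption about the general shape of the change, not a description of DAHR’s real schema: the `MasterTalent` class, its fields, and the `collapse` function are hypothetical, and matching entries by exact name string is a simplification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MasterTalent:
    """Hypothetical merged record: one per name, with all roles preserved."""
    name: str
    roles: list = field(default_factory=list)
    lcnaf_id: Optional[str] = None  # filled in later, once a match is established


def collapse(entries):
    """Merge (name, role) entries into one MasterTalent per name."""
    masters = {}
    for name, role in entries:
        master = masters.setdefault(name, MasterTalent(name=name))
        if role not in master.roles:
            master.roles.append(role)
    return masters


# Old structure: one entry per manifestation of a name.
old_entries = [
    ("Duke Ellington", "composer"),
    ("Duke Ellington", "pianist"),
    ("Example Orchestra", "performer"),  # invented placeholder name
]

masters = collapse(old_entries)
print(masters["Duke Ellington"].roles)
# ['composer', 'pianist']
```

Because the roles are kept as a list on the merged record, searching and filtering by role remains possible, while identifiers such as the LCNAF number only need to be attached once per name.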

This provides several specific advantages for DAHR users. For one, it can help contextualize our data through the inclusion of biographies and other data. Links to other known authorities also positively identify a person or group in our database that may be difficult to distinguish by name alone. Researchers can also click links to other databases that have information outside the scope of DAHR, such as LP reissues in Discogs, record reviews on AllMusic, or streaming audio on Spotify.

Catalogers can also traverse the datasphere to almost any library authority file in the world through the VIAF link. If DAHR has an artist’s VIAF number (and it does for over 18,000 names), a cataloger can determine who that person is in the authority files used by nearly all other libraries around the world.

Finally, making use of Linked Data in this way establishes DAHR as its own formal and specialized authority file, which we will maintain. There is now a DAHR Name URI, a permanent numerical identifier for a name entity in DAHR. We have an established Wikidata property that we will populate with these identifiers to complete the linkages of the Semantic Web and state the relationship between DAHR and many other databases. And for the other 42,000 names not linked to other datasets, our new URI can be used to cite DAHR’s authority file. As research continues on these people or groups, they will be pulled into the web too as Wikipedia articles are written or their names are added to other datasets.

We look forward to your feedback on these new features. Send us a message.

(UCSB is grateful to the Library of Congress National Recording Preservation Board for funding this project.)