Metadata (Coursera, Fall 2013)
Taught by Jeffrey Pomerantz
Unit 1: Organizing Information
metadata -- "data about data"; description
world divided into natural and artificial objects; physical and digital
describing: make a statement about something -- subject, object, and predicate (relationship between subject and object)
data and information are not interchangeable terms
metadata is description
instructions are not necessarily descriptive
what is description?
access points to materials: title, author, subjects
administrative metadata: how to manage or care for something
subject analysis: figuring out the subject ("significant characteristics") of the thing you're describing
- how to describe something that doesn't have a subject, like music?
aboutness: word used sometimes instead of "subject"
item: single object
collection: collection of objects
LCSH: Library of Congress Subject Headings; data about subject headings on copyright page; attempts to be comprehensive; changes over time; thesaurus or controlled vocabulary; includes relationships, but not synonymy and antonymy
subject headings, index term, descriptor: all mean "subject"
LOC classification used outside the US; call number of books
medical subject headings (MeSH): used in medicine
BT: broader term
NT: narrower term
UF: "used for"
USE: "use"
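A minimal sketch of how BT/NT/UF/USE entries could be held in code. The terms and the `resolve` helper are made up for illustration and are not taken from LCSH or MeSH:

```python
# Toy thesaurus: each preferred term lists its broader (BT), narrower (NT),
# and non-preferred "used for" (UF) terms. All terms are illustrative.
thesaurus = {
    "Dogs": {"BT": ["Mammals"], "NT": ["Terriers"], "UF": ["Canines"]},
    "Mammals": {"BT": [], "NT": ["Dogs"], "UF": []},
    "Terriers": {"BT": ["Dogs"], "NT": [], "UF": []},
}

# USE is the inverse of UF: it points a non-preferred term at the preferred one.
use = {
    alt: preferred
    for preferred, entry in thesaurus.items()
    for alt in entry["UF"]
}

def resolve(term):
    """Return the preferred term for `term` (follows USE if needed)."""
    return use.get(term, term)

print(resolve("Canines"))
```

Looking up a non-preferred term lands on the preferred heading, which is the point of a controlled vocabulary: one concept, one access point.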
faceted classification: can describe using multiple controlled vocabularies
ontology: formal representation of a set of concepts within a domain; defining categories and relationships (including inferences -- one relationship implies another, as in parent-child)
relationships are more complicated in ontologies than in thesauri -- an ontology is more than a thesaurus; it is also about relationships and inferences
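A rough sketch of what "one relationship implies another" could look like in code. The names, the predicates, and the two rules are assumptions chosen for illustration; a real ontology language would express these rules declaratively:

```python
# Asserted facts as (subject, predicate, object) tuples; names are made up.
facts = {
    ("Alice", "parent_of", "Bob"),
    ("Bob", "parent_of", "Carol"),
}

def infer(facts):
    """Apply two toy rules and return the asserted plus inferred facts."""
    inferred = set(facts)
    # Rule 1: X parent_of Y implies Y child_of X.
    for s, p, o in facts:
        if p == "parent_of":
            inferred.add((o, "child_of", s))
    # Rule 2: X parent_of Y and Y parent_of Z implies X grandparent_of Z.
    for s1, p1, o1 in facts:
        for s2, p2, o2 in facts:
            if p1 == p2 == "parent_of" and o1 == s2:
                inferred.add((s1, "grandparent_of", o2))
    return inferred

for fact in sorted(infer(facts)):
    print(fact)
```

Only the two parent_of facts are stated; the child_of and grandparent_of facts are derived, which a plain thesaurus cannot do.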
uncontrolled vocabulary: no thesaurus exists
hashtags: ride the line between metadata and content itself
- tagdef, defining hashtags
vocabularies as maps -- simplify the world
Alfred Korzybski: "The map is not the territory." -- but the map is more useful under certain conditions
types of metadata:
- descriptive: information about a resource
- structural: how an object is organized (often used for compound objects, like a book [chapters, sections, pages])
- administrative: how an object should be stored or cared for (copyright, access permission, origin)
distinctions in metadata record:
- item vs collection
- embedded vs linked metadata records; copyright page in printed book is embedded metadata; library card catalogue is linked metadata
- human-readable vs machine-readable audience; MARC records, machine-readable cataloguing
'data' has flexible definition
information science: intersection of information, technology, people
what is information?
"Where is the wisdom we have lost in knowledge / Where is the knowledge we have lost in information?" -- T. S. Eliot, "The Rock"
"Information as Thing," by Michael Buckland
- three types of information: information-as-thing; information-as-knowledge; information-as-process
- information as thing is evidence
is information subjective or objective?
- DNA example
Michael Buckland, "What is a Document?"
- antelope is not a document, but becomes a document when the subject of research
perception generates metadata
Gregory Bateson, information is "a difference that makes a difference"
Unit 2: Dublin Core
Dublin Core; named after Dublin, OH, where OCLC is; first developed in 1995; developed to be simple and have a low cost of adoption
goals: simple, shared semantics, extensible, international
characteristics of a DC record: all elements are optional; all elements are repeatable; elements may be displayed in any order
15 elements of DC:
- contributor
- coverage
- creator
- date
- description
- format
- identifier
- language
- publisher
- relation
- rights
- source
- subject
- title
- type
elements: category of statement (like 'creator'); also "field"
value: data provided in the statement (like 'William Shakespeare')
record: set of element-value pairs
metadata scheme: controls the kinds of statements you can make; it is a "formally defined set of metadata elements. The meaning (semantics) of the elements are pre-defined, constraining the kinds of statements that can be made about a resource."
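A small sketch of a DC record as element-value pairs, with the scheme constraining which elements are allowed. The `make_record` helper and the sample values are hypothetical:

```python
# The 15 Dublin Core elements listed above.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

def make_record(pairs):
    """Build a record from (element, value) pairs, rejecting unknown elements.

    A list of pairs (rather than a dict) keeps elements repeatable
    and lets them appear in any order; none are required.
    """
    pairs = list(pairs)
    for element, _ in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
    return pairs

record = make_record([
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "Tragedy"),
    ("subject", "Denmark"),  # repeated element
])
print(record)
```

An empty record is still valid here, since every element is optional; a statement with an element outside the scheme (say "author") is rejected.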
principles of DC:
- "dumb-down principle": a qualifier can be ignored and the value still used as if the element were unqualified
- "one-to-one": one record per object
HTML "meta" tag: name = element, content = value
"DC.element", i.e. "DC.creator" -- standard method of representing DC metadata in meta tag in HTML
qualified DC -- modifying elements through element refinement (e.g. adding "created" to date) or encoding schemes (adding "scheme" to the meta tag, e.g. name="DC.subject" scheme="MESH" content="Posterior Eye Segment")
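A sketch of generating these meta tags from element-value pairs. The `dc_meta_tag` function is a hypothetical helper, not part of any DC tooling:

```python
from html import escape

def dc_meta_tag(element, value, scheme=None):
    """Render one DC statement as an HTML <meta> tag.

    `scheme` is optional and names the encoding scheme (qualified DC).
    """
    attrs = [f'name="DC.{escape(element)}"']
    if scheme is not None:
        attrs.append(f'scheme="{escape(scheme)}"')
    attrs.append(f'content="{escape(value)}"')
    return "<meta " + " ".join(attrs) + ">"

print(dc_meta_tag("creator", "William Shakespeare"))
print(dc_meta_tag("subject", "Posterior Eye Segment", scheme="MESH"))
```

Values are escaped so that quotes or angle brackets in a title or description cannot break the tag.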
DCMI communities: working on extending DC for their schemas
terms -- expand on the basic 15 core elements
DCMI Abstract Model
- independent of any particular encoding syntax
- shows all the things needed to be included in any metadata scheme
- written to be a generic model, but it is the model upon which DC is built
- resource model
- diamond arrow = "has a"; regular triangle arrow = "is a"; line arrow = "described using"
- property-value pair has both property (element) and value, which can be literal and/or non-literal
- element-value or property-value pairs make up a statement
- how resources are described; description is encoded in a vocabulary; how terms in a vocabulary are encoded
- models are a way of determining basic ontological categories that can be used for any metadata scheme
Namespace -- conceptual space in which a set of identifiers are defined
four levels of interoperability
- level 1: use shared term definitions
- level 2: use shared vocabularies based on formal semantics -- implicit or explicit use of RDF
- level 3: use shared formal vocabularies in exchangeable records
- level 4: shared formal vocabularies and constraints in records
- each level leans more heavily on RDF -- more readable by machines, less readable by humans
Unit 3: How to Build a Metadata Schema
eXtensible Markup Language (XML)
Document Type Definition (DTD)
Resource Description Framework (RDF)
HTML is metadata, in that it describes the formatting of the page
XML provides information about the structure of the document -- gives metadata about the content
XML, like DC, made up of elements and values
elements can have child elements -- data structure is a tree
elements can have attributes -- so a child element can instead be written as an attribute of its parent
e.g.: <ingredient><fooditem>milk</fooditem></ingredient> vs. <ingredient fooditem="milk"></ingredient>
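A quick check of the two encodings, written as well-formed XML and read with Python's standard-library `xml.etree.ElementTree` (the variable names are just for this sketch):

```python
import xml.etree.ElementTree as ET

# Same fact, two encodings: as a child element and as an attribute.
as_child = ET.fromstring("<ingredient><fooditem>milk</fooditem></ingredient>")
as_attribute = ET.fromstring('<ingredient fooditem="milk"></ingredient>')

# Child element: the value is the text of a node one level down the tree.
child_value = as_child.find("fooditem").text

# Attribute: the value hangs directly off the parent element.
attribute_value = as_attribute.get("fooditem")

print(child_value, attribute_value)
```

Both forms carry the same value; the difference is only where it sits in the tree, so the choice is a schema design decision.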
in HTML, all elements and attributes are predefined; in XML, you can create any elements or attributes you want
thus, must create a DTD to declare elements and attributes
<?xml version="1.0" encoding="UTF-8"?>
possible to put DTD in XML file itself, however usually you'll be using a schema that already exists, so you point elsewhere to its DTD
child elements declared at top usually, parent at bottom
<!ELEMENT recipe (child element, child element)>
? = 0 or 1
* = 0+
+ = 1+
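The occurrence indicators work much like regex quantifiers. A sketch that maps each one to a minimum and maximum count; the `OCCURRENCE` table and `count_ok` helper are hypothetical, and this is not a DTD validator:

```python
# Occurrence indicators in a DTD content model, as (minimum, maximum) counts.
# None means "no upper limit"; "" stands for an element with no indicator.
OCCURRENCE = {
    "": (1, 1),      # exactly once
    "?": (0, 1),     # zero or one
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}

def count_ok(indicator, count):
    """Return True if `count` occurrences satisfy the given indicator."""
    minimum, maximum = OCCURRENCE[indicator]
    return count >= minimum and (maximum is None or count <= maximum)

print(count_ok("?", 2), count_ok("+", 3))
```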
#PCDATA -- parsed character data; any block of text can be used
HTML5 does not have a DTD
entity declaration ex. -- <!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
attribute list -- <!ATTLIST
CDATA -- character data
RDF -- Resource Description Framework; data model for describing resources
resource: anything with an address on a network
descriptive metadata is made up of statements describing the object or resource
triple: subject (resource) - predicate - object (value); e.g. Mona Lisa painting - creator - Leonardo da Vinci
can create complex networks of triples
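A sketch of a small network of triples, treating the resource being described as the subject. Identifiers are plain strings here (real RDF would use URIs), and the `objects` and `subjects` helpers are made up for illustration:

```python
# Each statement is a (subject, predicate, object) triple.
triples = [
    ("Mona Lisa", "creator", "Leonardo da Vinci"),
    ("Mona Lisa", "type", "painting"),
    ("The Last Supper", "creator", "Leonardo da Vinci"),
    ("Leonardo da Vinci", "birthplace", "Vinci"),
]

def objects(subject, predicate):
    """Values stated for `subject` under `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def subjects(predicate, obj):
    """Resources whose `predicate` has the value `obj`."""
    return [s for s, p, o in triples if p == predicate and o == obj]

# Triples chain: the object of one statement can be the subject of another.
creator = objects("Mona Lisa", "creator")[0]
print(objects(creator, "birthplace"))
print(subjects("creator", "Leonardo da Vinci"))
```

Following a value into its own statements is what turns a flat list of descriptions into a graph.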
RDF file: declare xml; declare rdf and two namespaces; describe object
RDFS: RDF Schema
DC doesn't care what your container element is
prefix:element; e.g. "dc:subject"
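A sketch of the RDF file layout described above, built with Python's standard-library `xml.etree.ElementTree`. It assumes the two namespaces are rdf and dc, uses `rdf:Description` as the container element, and the resource URI is made up:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Register prefixes so the output uses rdf: and dc: rather than ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)

root = ET.Element(f"{{{RDF_NS}}}RDF")
# rdf:Description is the container; rdf:about names the resource described.
description = ET.SubElement(
    root,
    f"{{{RDF_NS}}}Description",
    {f"{{{RDF_NS}}}about": "http://example.org/mona-lisa"},  # made-up URI
)
ET.SubElement(description, f"{{{DC_NS}}}title").text = "Mona Lisa"
ET.SubElement(description, f"{{{DC_NS}}}creator").text = "Leonardo da Vinci"

# Prepend the XML declaration by hand, then serialize the tree.
rdf_xml = '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
print(rdf_xml)
```

The `dc:` prefix in the output is only shorthand for the namespace URI declared on the root element; the same elements could sit inside a different container.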
HTML is for human consumption; goal of semantic web is to enable automated algorithms to sort material
XML --> RDF --> DTDs & Namespaces --> Metadata schemes
Unit 4: Alphabet Soup
descriptive metadata
structural & administrative metadata
descriptive example:
Categories for the Description of Works of Art (CDWA)
administrative metadata:
PREMIS; core set of elements for the preservation of digital objects
preservation metadata is "the information a repository uses to support the digital preservation process"; requires viability, renderability, understandability, authenticity, identity
PREMIS doesn't describe intellectual entities, but objects that instantiate them
provenance: record describing entities and processes involved in producing and delivering that resource
OPM: Open Provenance Model
METS: Metadata Encoding and Transmission Standard; metadata about metadata
crosswalks: translate between metadata schemas
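A sketch of a crosswalk as a simple field mapping. The local schema, its field names, and the `to_dublin_core` helper are all hypothetical; published crosswalks are considerably more detailed:

```python
# Hypothetical crosswalk: field names of a made-up local schema -> DC elements.
CROSSWALK = {
    "artist": "creator",
    "work_title": "title",
    "date_made": "date",
    "medium": "format",
}

def to_dublin_core(local_record):
    """Translate a local record; return (dc_pairs, unmapped_fields)."""
    dc_pairs, unmapped = [], []
    for field, value in local_record.items():
        if field in CROSSWALK:
            dc_pairs.append((CROSSWALK[field], value))
        else:
            unmapped.append(field)  # no equivalent element in the target
    return dc_pairs, unmapped

pairs, unmapped = to_dublin_core({
    "artist": "Leonardo da Vinci",
    "work_title": "Mona Lisa",
    "inventory_no": "779",
})
print(pairs, unmapped)
```

Fields with no counterpart are reported rather than silently dropped, since translating between schemas can lose information.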