Metadata (Coursera, Fall 2013)
Metadata, Coursera; taught by Jeffrey Pomerantz
https://class.coursera.org/metadata-001
Unit 1: Organizing Information
metadata -- "data about data"; description
world divided into natural and artificial objects; physical and digital
describing: make a statement about something -- subject, object, and predicate (relationship between subject and object)
data and information are not interchangeable terms
metadata is description
instructions are not necessarily descriptive
what is description?
access points to materials: title, author, subjects
administrative metadata: how to manage or care for something
subject analysis: figuring out the subject ("significant characteristics") of the thing you're describing
- how to describe something that doesn't have a subject, like music?
aboutness: word used sometimes instead of "subject"
item: single object
collection: collection of objects
LCSH: Library of Congress Subject Headings; data about subject headings on copyright page; attempts to be comprehensive; changes over time; thesaurus or controlled vocabulary; includes relationships, but not synonymy and antonymy
subject headings, index term, descriptor: all mean "subject"
LOC classification used outside the US; call number of books
medical subject headings (MeSH): used in medicine
BT: broader term
NT: narrower term
UF: "used for"
USE: "use"
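The four relationship codes above can be sketched as a small lookup structure. This is only an illustration in Python, not how LCSH or MeSH is actually stored; the terms and the helper names `preferred_term` and `broader_terms` are made up for the example.

```python
# Hypothetical mini-thesaurus; terms and layout are illustrative only.
THESAURUS = {
    "Animals": {"NT": ["Cats", "Dogs"]},
    "Cats": {"BT": ["Animals"], "UF": ["Felines", "Kitties"]},
    "Dogs": {"BT": ["Animals"], "UF": ["Canines"]},
    # Non-preferred entry terms point at the preferred heading via USE.
    "Felines": {"USE": "Cats"},
    "Kitties": {"USE": "Cats"},
    "Canines": {"USE": "Dogs"},
}


def preferred_term(term):
    """Follow a USE reference to the authorized heading, if there is one."""
    entry = THESAURUS.get(term, {})
    return entry.get("USE", term)


def broader_terms(term):
    """Walk BT links upward from a term (resolving USE first)."""
    result = []
    todo = list(THESAURUS.get(preferred_term(term), {}).get("BT", []))
    while todo:
        current = todo.pop(0)
        if current not in result:
            result.append(current)
            todo.extend(THESAURUS.get(current, {}).get("BT", []))
    return result


print(preferred_term("Felines"))
print(broader_terms("Kitties"))
```

UF and USE are two ends of the same link: the preferred heading lists the terms it is used for, and each of those terms says which heading to use instead.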
faceted classification: can describe using multiple controlled vocabularies
ontology: formal representation of a set of concepts within a domain; defines categories and relationships (including inferences -- one relationship implies another, as in parent-child)
relationships are more complicated in ontologies than in thesauri -- an ontology is more than a thesaurus: it also supports inferences over its relationships
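A rough sketch of what "one relationship implies another" can mean in practice. The two rules, the relationship names (`parent_of`, `child_of`, `ancestor_of`), and the people are all assumptions made for illustration; real ontology languages express such rules declaratively rather than in hand-written loops.

```python
# Hypothetical facts; the names are placeholders.
facts = {("Alice", "parent_of", "Bob"), ("Bob", "parent_of", "Carol")}


def infer(triples):
    """Apply two rules until no new statements appear:
    parent_of(x, y) implies child_of(y, x) and ancestor_of(x, y);
    ancestor_of(x, y) and ancestor_of(y, z) imply ancestor_of(x, z)."""
    known = set(triples)
    while True:
        new = set()
        for (s, p, o) in known:
            if p == "parent_of":
                new.add((o, "child_of", s))
                new.add((s, "ancestor_of", o))
        for (s1, p1, o1) in known:
            for (s2, p2, o2) in known:
                if p1 == "ancestor_of" and p2 == "ancestor_of" and o1 == s2:
                    new.add((s1, "ancestor_of", o2))
        if new <= known:
            return known
        known |= new


inferred = infer(facts)
for statement in sorted(inferred - facts):
    print(statement)
```

Only two statements were asserted; everything else printed was derived, which is the step a plain thesaurus does not take.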
uncontrolled vocabulary: no thesaurus exists
hashtags: ride the line between metadata and content itself
- tagdef, defining hashtags
vocabularies as maps -- simplify the world
Alfred Korzybski: "The map is not the territory." -- but the map is more useful under certain conditions
types of metadata:
- descriptive: information about a resource
- structural: how an object is organized (often used for compound objects, like a book [chapters, sections, pages])
- administrative: how an object should be stored or cared for (copyright, access permission, origin)
distinctions in metadata record:
- item vs collection
- embedded vs linked metadata records; copyright page in printed book is embedded metadata; library card catalogue is linked metadata
- human-readable vs machine-readable (intended audience); e.g. MARC records (MAchine-Readable Cataloging)
'data' has flexible definition
information science: intersection of information, technology, people
what is information?
"Where is the wisdom we have lost in knowledge / Where is the knowledge we have lost in information?" -- T. S. Eliot, "The Rock"
"Information as Thing," by Michael Buckland
- three types of information: information-as-thing; information-as-knowledge; information-as-process
- information as thing is evidence
is information subjective or objective?
- DNA example
Michael Buckland, "What is a Document?"
- antelope is not a document, but becomes a document when the subject of research
perception generates metadata
Gregory Bateson, information is "a difference that makes a difference"
Unit 2: Dublin Core
Dublin Core; named after Dublin, OH, where OCLC is; first developed in 1995; developed to be simple and have a low cost of adoption
goals: simple, shared semantics, extensible, international
characteristics of a DC record: all elements are optional; all elements are repeatable; elements may be displayed in any order
15 elements of DC:
- contributor
- coverage
- creator
- date
- description
- format
- identifier
- language
- publisher
- relation
- rights
- source
- subject
- title
- type
elements: category of statement (like 'creator'); also "field"
value: data provided in the statement (like 'William Shakespeare')
record: set of element-value pairs
metadata scheme: controls the kinds of statements you can make; it is a "formally defined set of metadata elements. The meaning (semantics) of the elements are pre-defined, constraining the kinds of statements that can be made about a resource."
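The element / value / record vocabulary above maps naturally onto a list of pairs. A minimal sketch, assuming a record is modelled as a Python list of (element, value) tuples; the helper names and the sample values other than 'William Shakespeare' are invented for the example.

```python
# The 15 Dublin Core elements act as the scheme: they constrain which
# statements a record may contain.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# A record is a list of (element, value) pairs: no element is required,
# any element may repeat, and the order carries no meaning.
record = [
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "Tragedy"),
    ("subject", "Denmark"),
]


def invalid_elements(rec):
    """Return the element names in a record that the scheme does not define."""
    return [element for element, _ in rec if element not in DC_ELEMENTS]


def values_of(rec, element):
    """Collect every value given for one (possibly repeated) element."""
    return [value for el, value in rec if el == element]


print(invalid_elements(record))
print(values_of(record, "subject"))
print(invalid_elements(record + [("author", "W. S.")]))
```

A list of pairs is used instead of a dictionary because a dictionary would silently drop repeated elements such as the two subject statements.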
principles of DC:
- "dumb-down principle": a qualifier (refinement or encoding scheme) can be ignored, and the value should still make sense for the unqualified element
- "one-to-one": one record per object
HTML "meta" tag: name = element, content = value
"DC.element", i.e. "DC.creator" -- standard method of representing DC metadata in meta tag in HTML
qualified DC -- modifying elements; through element refinement (adding "created" to date -- dc.date.created) or encoding schemes (add "scheme" to meta tag; e.g. name="dc.subject" scheme="MESH" content="Posterior Eye Segment")
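A sketch of writing and reading DC statements as HTML meta tags, using only the Python standard library. The function `dc_meta_tag` and the class `DCMetaReader` are hypothetical helpers written for these notes, not part of any Dublin Core tooling; the reader treats the "DC." prefix case-insensitively as an assumption.

```python
from html import escape
from html.parser import HTMLParser


def dc_meta_tag(element, value, scheme=None):
    """Render one DC statement as an HTML <meta> tag."""
    parts = ['name="DC.%s"' % escape(element, quote=True)]
    if scheme:
        parts.append('scheme="%s"' % escape(scheme, quote=True))
    parts.append('content="%s"' % escape(value, quote=True))
    return "<meta %s>" % " ".join(parts)


class DCMetaReader(HTMLParser):
    """Collect (element, value) pairs from meta tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if tag == "meta" and name.lower().startswith("dc."):
            # Drop the three-character "DC." prefix to recover the element.
            self.pairs.append((name[3:], attributes.get("content")))


tag = dc_meta_tag("subject", "Posterior Eye Segment", scheme="MESH")
print(tag)

reader = DCMetaReader()
reader.feed("<head>" + tag + dc_meta_tag("creator", "William Shakespeare") + "</head>")
print(reader.pairs)
```

The name attribute carries the element and the content attribute carries the value, so a page's meta tags can be read back into the same element-value pairs that make up a record.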
DCMI communities: working on extending DC for their schemas
terms -- expand on the basic 15 core elements
DCMI Abstract Model
- independent of any particular encoding syntax
- shows all the things needed to be included in any metadata scheme
- i.e. written to be a generic model, but it is the model upon which DC is built
- resource model
- diagram notation: diamond arrow = "has a"; regular triangle arrow = "is a"; line arrow = "described using"
- property-value pair has both property (element) and value, which can be literal and/or non-literal
- element-value or property-value pairs make up a statement
- how resources are described; description is encoded in a vocabulary; how terms in a vocabulary are encoded
- models are a way of determining basic ontological categories that can be used for any metadata scheme
Namespace -- conceptual space in which a set of identifiers are defined
four levels of interoperability
- level 1: use shared term definitions
- level 2: use shared vocabularies based on formal semantics -- implicit or explicit use of RDF
- level 3: use shared formal vocabularies in exchangeable records
- level 4: shared formal vocabularies and constraints in records
- each level leans more heavily on RDF -- more readable by machines, less readable by humans
Unit 3: How to Build a Metadata Schema
eXtensible Markup Language
Document Type Definition (DTD)
Resource Description Framework (RDF)
HTML in XML --> XHTML
HTML is metadata, in that it describes the formatting of the page
XML provides information about the structure of the document -- gives metadata about the content
XML, like DC, made up of elements and values
elements can have child elements -- data structure is a tree
elements can have attributes -- so that child elements become attributes of parent
e.g.: <ingredient><fooditem>milk</fooditem></ingredient> vs. <ingredient fooditem="milk"></ingredient>
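The difference between the two forms shows up in how the value is read back. A small sketch with Python's standard `xml.etree.ElementTree`, assuming the child-element form carries "milk" as the text of a fooditem element.

```python
import xml.etree.ElementTree as ET

as_child = ET.fromstring("<ingredient><fooditem>milk</fooditem></ingredient>")
as_attribute = ET.fromstring('<ingredient fooditem="milk"></ingredient>')

# Child element: the value is the text of a node one level down the tree.
print(as_child.find("fooditem").text)

# Attribute: the value is stored on the parent element itself.
print(as_attribute.get("fooditem"))

# The tree shapes differ: one child node versus none.
print(len(as_child), len(as_attribute))
```

Both documents carry the same information; the choice mostly affects whether the value can later grow child elements of its own, which an attribute cannot.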
in HTML, all elements and attributes are predefined; in XML, you can create any elements or attributes you want
thus, must create a DTD to declare elements and attributes
<?xml version="1.0" encoding="UTF-8"?>
possible to put DTD in XML file itself, however usually you'll be using a schema that already exists, so you point elsewhere to its DTD
child elements declared at top usually, parent at bottom
<!ELEMENT recipe (child elements, child elements,)>
? = 0 or 1
* = 0+
+ = 1+
#PCDATA -- parsed character data; any block of text can be used
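The occurrence indicators ?, * and + mean the same thing as the regular-expression quantifiers of the same name, which allows a toy check of a content model. This is not a DTD validator (Python's standard library does not validate against DTDs); it is a sketch that handles only flat, comma-separated sequences, and the recipe model and function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def model_to_regex(content_model):
    """Translate a flat, comma-separated content model such as
    'title, ingredient+, note?' into a regex over space-terminated tag names.
    Only plain sequences with ?, * and + are handled (no groups or choices)."""
    pattern = ""
    for item in content_model.split(","):
        item = item.strip()
        indicator = item[-1] if item[-1] in "?*+" else ""
        name = item.rstrip("?*+")
        pattern += "(?:%s )%s" % (re.escape(name), indicator)
    return re.compile(pattern)


def children_match(xml_text, content_model):
    """Check the root element's children against the content model."""
    root = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in root)
    return model_to_regex(content_model).fullmatch(child_names) is not None


MODEL = "title, ingredient+, note?"  # hypothetical recipe model
good = "<recipe><title>Toast</title><ingredient>bread</ingredient></recipe>"
bad = "<recipe><title>Toast</title><note>no ingredients</note></recipe>"
print(children_match(good, MODEL))
print(children_match(bad, MODEL))
```

The second document fails because the model requires at least one ingredient (+), while the missing note in the first is allowed (?).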
HTML5 does not have a DTD
entity declaration ex. -- <!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
attribute list -- <!ATTLIST
CDATA -- character data
RDF -- Resource Description Framework; data model for describing resources
resource: anything with an address on a network
descriptive metadata is made up of statements describing the object or resource
triple: subject (resource) - predicate - object (value or another resource); e.g. Mona Lisa painting - creator - Leonardo da Vinci
can create complex networks of triples
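A minimal sketch of triples as plain tuples, following the usual RDF convention in which the subject is the resource being described and the object is the value. Identifiers are bare strings here, whereas real RDF would use URIs; the predicate names `type` and `birthplace` and the helper functions are assumptions for illustration.

```python
# Each triple is (subject, predicate, object).
triples = [
    ("Mona Lisa", "creator", "Leonardo da Vinci"),
    ("Mona Lisa", "type", "painting"),
    ("Leonardo da Vinci", "birthplace", "Vinci"),
]


def objects(graph, subject, predicate):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def describe(graph, subject):
    """Every (predicate, object) pair stated about a subject."""
    return [(p, o) for s, p, o in graph if s == subject]


# The object of one triple can be the subject of another,
# which is how triples chain into a network.
creator = objects(triples, "Mona Lisa", "creator")[0]
print(creator)
print(objects(triples, creator, "birthplace"))
print(describe(triples, "Mona Lisa"))
```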
RDF file: declare xml; declare rdf and two namespaces; describe object
RDFS: RDF Schema
DC doesn't care what your container element is
prefix:element; e.g. "dc:subject"
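A sketch of an RDF/XML file with Dublin Core statements, and of how the prefixes resolve to namespaces when parsed. It uses Python's standard `xml.etree.ElementTree`; the rdf:about address and the sample values are placeholders, and reading RDF this way (as plain XML) is a simplification compared with a real RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Declare XML, declare the rdf and dc namespaces, then describe the object.
RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/mona-lisa">
    <dc:title>Mona Lisa</dc:title>
    <dc:creator>Leonardo da Vinci</dc:creator>
    <dc:subject>Portraits</dc:subject>
  </rdf:Description>
</rdf:RDF>
"""

root = ET.fromstring(RDF_XML.encode("utf-8"))

statements = []
for description in root.findall("{%s}Description" % RDF_NS):
    about = description.get("{%s}about" % RDF_NS)
    for child in description:
        # ElementTree expands "dc:title" to "{namespace-uri}title".
        namespace, _, element = child.tag[1:].partition("}")
        if namespace == DC_NS:
            statements.append((about, element, child.text))

for statement in statements:
    print(statement)
```

The prefix itself (dc, rdf) is only local shorthand; what identifies an element is the namespace it expands to, which is why the parser reports the full address rather than the prefix.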
HTML is for human consumption; goal of semantic web is to enable automated algorithms to sort material
XML --> RDF --> DTDs & Namespaces --> Metadata schemes
Unit 4: Alphabet Soup
descriptive metadata
structural & administrative metadata
crosswalks
descriptive example:
Categories for the Description of Works of Art (CDWA)
EXIF
administrative metadata:
PREMIS; core set of elements for the preservation of digital objects
preservation metadata is "the information a repository uses to support the digital preservation process"; requires viability, renderability, understandability, authenticity, identity
PREMIS doesn't describe intellectual entities, but objects that instantiate them
provenance: record describing the entities and processes involved in producing and delivering a resource
OPM: Open Provenance Model
METS: Metadata Encoding and Transmission Standard; metadata about metadata
crosswalks: translate between metadata schemas
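At its simplest, a crosswalk is a mapping table from the fields of one schema to the elements of another. A minimal sketch, assuming a made-up local catalogue schema being translated to Dublin Core; the field names and the `crosswalk` function are hypothetical, and real crosswalks also have to deal with one-to-many mappings and differing value formats.

```python
# Hypothetical mapping from a local catalogue schema to Dublin Core elements.
CROSSWALK = {
    "artist": "creator",
    "work_title": "title",
    "medium": "format",
    "keywords": "subject",
}


def crosswalk(record, mapping):
    """Translate (field, value) pairs; return the converted pairs and the
    fields that have no equivalent in the target schema."""
    converted, unmapped = [], []
    for field, value in record:
        if field in mapping:
            converted.append((mapping[field], value))
        else:
            unmapped.append(field)
    return converted, unmapped


local_record = [
    ("work_title", "Mona Lisa"),
    ("artist", "Leonardo da Vinci"),
    ("gallery_room", "711"),  # no counterpart in the mapping above
]

dc_record, lost = crosswalk(local_record, CROSSWALK)
print(dc_record)
print(lost)
```

Reporting the unmapped fields makes the usual cost of a crosswalk visible: anything the target schema cannot express is lost in translation.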