Metadata (Coursera, Fall 2013)

Metadata, Coursera; taught by Jeffrey Pomerantz

Unit 1: Organizing Information

metadata -- "data about data"; description

world divided into natural and artificial objects; physical and digital

describing: make a statement about something -- subject, object, and predicate (relationship between subject and object)

data and information are not interchangeable terms

metadata is description

instructions are not necessarily descriptive

what is description?

access points to materials: title, author, subjects

administrative metadata: how to manage or care for something

subject analysis: figuring out the subject ("significant characteristics") of the thing you're describing

  • how to describe something that doesn't have a subject, like music?

aboutness: word used sometimes instead of "subject"

item: single object

collection: collection of objects

LCSH: Library of Congress Subject Headings; data about subject headings on copyright page; attempts to be comprehensive; changes over time; thesaurus or controlled vocabulary; includes relationships but not synonymy and antonymy

subject headings, index term, descriptor: all mean "subject"

LOC classification used outside the US; call number of books

medical subject headings (MeSH): used in medicine

BT: broader term

NT: narrower term

UF: "use for"

USE: "use"
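The four relationship codes above can be sketched as a small data structure. This is only an illustration: the terms ("Animals", "Cats", "Felines") and the helper names are made up, not taken from LCSH or MeSH.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Term:
    """One preferred heading in a toy thesaurus (hypothetical data)."""
    label: str
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    use_for: list[str] = field(default_factory=list)   # UF: non-preferred synonyms


THESAURUS = {
    "Animals": Term("Animals", narrower=["Cats"]),
    "Cats": Term("Cats", broader=["Animals"], use_for=["Felines", "House cats"]),
}

# USE is the inverse of UF: it points from a non-preferred term to the preferred one.
USE = {alt: term.label for term in THESAURUS.values() for alt in term.use_for}


def preferred(label: str) -> str:
    """Return the preferred heading, following a USE reference if there is one."""
    return USE.get(label, label)


def ancestors(label: str) -> list[str]:
    """Collect every broader term reachable by walking BT links upward."""
    found: list[str] = []
    queue = list(THESAURUS[label].broader)
    while queue:
        current = queue.pop(0)
        if current not in found:
            found.append(current)
            queue.extend(THESAURUS[current].broader)
    return found
```

Looking up preferred("Felines") follows the USE reference to "Cats", and ancestors("Cats") walks BT to "Animals".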

faceted classification: can describe using multiple controlled vocabularies

ontology: formal representation of a set of concepts within a domain; defining categories and relationships (including inferences -- one relationship implies another, as in parent-child)

relationships are more complicated in ontologies than in thesauri -- an ontology is more than a thesaurus; it is also about relationships and inferences
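A minimal sketch of what "one relationship implies another" could look like, using the parent-child case. The names and the two rules (inverse and composition) are assumptions chosen for illustration; real ontology languages express such rules declaratively.

```python
# Asserted facts: (parent, child) pairs. All names are made up.
parent_of = {("Alice", "Bob"), ("Bob", "Carol")}

# Inference 1: parent-of implies the inverse relationship child-of.
child_of = {(child, parent) for parent, child in parent_of}

# Inference 2: chaining two parent-of links implies grandparent-of.
grandparent_of = {
    (a, c)
    for a, b in parent_of
    for b2, c in parent_of
    if b == b2
}
```

Neither child_of nor grandparent_of was stated directly; both follow from parent_of, which is the kind of inference a thesaurus does not support.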

uncontrolled vocabulary: no thesaurus exists

hashtags: ride the line between metadata and content itself

  • tagdef, defining hashtags

vocabularies as maps -- simplify the world

Alfred Korzybski: "The map is not the territory." -- but the map is more useful under certain conditions

types of metadata:

  • descriptive: information about a resource
  • structural: how an object is organized (often used for compound objects, like a book [chapters, sections, pages])
  • administrative: how an object should be stored or cared for (copyright, access permission, origin)

distinctions in metadata record:

  • item vs collection
  • embedded vs linked metadata records; copyright page in printed book is embedded metadata; library card catalogue is linked metadata
  • human-readable vs machine-readable audience; MARC records, machine-readable cataloguing

'data' has flexible definition

information science: intersection of information, technology, people

what is information?

"Where is the wisdom we have lost in knowledge / Where is the knowledge we have lost in information?" -- T. S. Eliot, "The Rock"

"Information as Thing," by Michael Buckland

  • three types of information: information-as-thing; information-as-knowledge; information-as-process
  • information as thing is evidence

is information subjective or objective?

  • DNA example

Michael Buckland, "What is a Document?"

  • antelope is not a document, but becomes a document when the subject of research

perception generates metadata

Gregory Bateson, information is "a difference that makes a difference"

Unit 2: Dublin Core

Dublin Core; named after Dublin, OH, where OCLC is; first developed in 1995; developed to be simple and have a low cost of adoption

goals: simple, shared semantics, extensible, international

characteristics of a DC record: all elements are optional; all elements are repeatable; elements may be displayed in any order

15 elements of DC:

  • contributor
  • coverage
  • creator
  • date
  • description
  • format
  • identifier
  • language
  • publisher
  • relation
  • rights
  • source
  • subject
  • title
  • type

elements: category of statement (like 'creator'); also "field"

value: data provided in the statement (like 'William Shakespeare')

record: set of element-value pairs
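A rough sketch of a record as a set of element-value pairs, assuming a plain dictionary is an acceptable stand-in. The function name make_record and the sample values other than 'William Shakespeare' are illustrative only. Each element maps to a list because elements are repeatable, and nothing is required because all elements are optional.

```python
# The 15 Dublin Core elements listed above.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def make_record(pairs):
    """Build a record from (element, value) pairs.

    Unknown element names are rejected; repeated elements accumulate
    their values in the order given.
    """
    record = {}
    for element, value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        record.setdefault(element, []).append(value)
    return record


record = make_record([
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "Tragedy"),
    ("subject", "Denmark"),   # same element used twice
])
```

Here record["subject"] holds both values, and elements that were never supplied (publisher, rights, and so on) are simply absent.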

metadata scheme: controls the kinds of statements you can make; it is a "formally defined set of metadata elements. The meaning (semantics) of the elements are pre-defined, constraining the kinds of statements that can be made about a resource."

principles of DC:

  • "dumb-down principle": a qualifier that an application does not understand can be ignored, and the value treated as if it belonged to the unqualified element
  • "one-to-one": one record per object

HTML "meta" tag: name = element, content = value

"DC.element", i.e. "DC.creator" -- standard method of representing DC metadata in meta tag in HTML

qualified DC -- modifying an element, either through element refinement (adding "created" to date) or through encoding schemes (adding "scheme" to the meta tag, e.g. name="DC.subject" scheme="MESH" content="Posterior Eye Segment")
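The meta tag convention described above (name = "DC." plus the element, content = the value, optional scheme) could be generated along these lines. The function dc_meta_tags and its schemes parameter are hypothetical helpers for this sketch, not part of any DC tooling.

```python
from html import escape


def dc_meta_tags(record, schemes=None):
    """Render element -> [values] pairs as HTML meta tags.

    `schemes` optionally maps an element name to an encoding scheme,
    which is emitted as a scheme="..." attribute.
    """
    schemes = schemes or {}
    lines = []
    for element, values in record.items():
        for value in values:
            attrs = [f'name="DC.{escape(element)}"']
            if element in schemes:
                attrs.append(f'scheme="{escape(schemes[element])}"')
            attrs.append(f'content="{escape(value)}"')
            lines.append("<meta " + " ".join(attrs) + ">")
    return lines


tags = dc_meta_tags(
    {"creator": ["William Shakespeare"], "subject": ["Posterior Eye Segment"]},
    schemes={"subject": "MESH"},
)
```

Values are passed through html.escape so that quotes or angle brackets in a value cannot break the attribute.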

DCMI communities: working on extending DC for their schemas

terms -- expand on the basic 15 core elements

DCMI Abstract Model

  • independent of any particular encoding syntax
  • shows all the things needed to be included in any metadata scheme
  • i.e. written to be generic model, but is model upon which DC is built
  • resource model
    • diamond arrows, "has a," regular triangle arrow, "is a," line arrow, "described using"
    • property-value pair has both property (element) and value, which can be literal and/or non-literal
  • element-value or property-value pairs make up a statement
  • how resources are described; description is encoded in a vocabulary; how terms in a vocabulary are encoded
  • models are a way of determining basic ontological categories that can be used for any metadata scheme

Namespace -- conceptual space in which a set of identifiers are defined

four levels of interoperability

  • level 1: use shared term definitions
  • level 2: use shared vocabularies based on formal semantics -- implicit or explicit use of RDF
  • level 3: use shared formal vocabularies in exchangeable records
  • level 4: shared formal vocabularies and constraints in records
  • each level leans more heavily on RDF -- more readable by machines, less readable by humans

Unit 3: How to Build a Metadata Schema

eXtensible Markup Language

Document Type Definition (DTD)

Resource Description Framework (RDF)


HTML is metadata, in that it describes the formatting of the page

XML provides information about the structure of the document -- gives metadata about the content

XML, like DC, made up of elements and values

elements can have child elements -- data structure is a tree

elements can have attributes -- so that child elements become attributes of parent

e.g.: <ingredient><fooditem>milk</fooditem></ingredient> vs. <ingredient fooditem="milk"></ingredient>
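To see the child-element and attribute forms side by side, here is a small sketch using Python's standard xml.etree.ElementTree parser. The variable names are arbitrary; the two snippets carry the same information in different places in the tree.

```python
import xml.etree.ElementTree as ET

# Same fact, two encodings: as a child element and as an attribute.
as_child = ET.fromstring("<ingredient><fooditem>milk</fooditem></ingredient>")
as_attribute = ET.fromstring('<ingredient fooditem="milk"></ingredient>')

# A child element is a node in the tree; its value is the node's text.
child_value = as_child.find("fooditem").text

# An attribute lives on the parent element itself.
attribute_value = as_attribute.get("fooditem")
```

Both lookups yield "milk", but the first tree has one child node under ingredient and the second has none.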

in HTML, all elements and attributes are predefined; in XML, you can create any elements or attributes you want

thus, must create a DTD to declare elements and attributes

<?xml version="1.0" encoding="UTF-8"?>

possible to put DTD in XML file itself, however usually you'll be using a schema that already exists, so you point elsewhere to its DTD

child elements declared at top usually, parent at bottom

<!ELEMENT recipe (child elements, child elements,)>

? = 0 or 1

* = 0+

+ = 1+
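The occurrence indicators can be read as minimum and maximum counts for a child element. The sketch below encodes them that way; the table and the function name count_allowed are illustrative, and the empty-string entry reflects the DTD rule that a child with no indicator must appear exactly once.

```python
# indicator -> (minimum, maximum); None means "no upper limit".
OCCURRENCE = {
    "": (1, 1),      # no indicator: exactly once
    "?": (0, 1),     # zero or one
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
}


def count_allowed(indicator, count):
    """Return True if `count` occurrences satisfy the given indicator."""
    low, high = OCCURRENCE[indicator]
    return count >= low and (high is None or count <= high)
```

For example, count_allowed("?", 2) is False because "?" permits at most one occurrence, while count_allowed("*", 0) is True.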

#PCDATA -- parsed character data; any block of text can be used

HTML5 does not have a DTD

entity declaration ex. -- <!ENTITY % fontstyle "TT | I | B | BIG | SMALL">

attribute list -- <!ATTLIST

CDATA -- character data

RDF -- Resource Description Framework; data model for describing resources

resource: anything with an address on a network

descriptive metadata is made up of statements describing the object or resource

triple: subject (resource) - predicate - object (value); e.g. Mona Lisa painting - creator - Leonardo da Vinci

can create complex networks of triples
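A network of triples can be sketched with plain tuples, assuming the usual RDF reading in which the subject is the resource being described and the object is the value. The "type" and "birthplace" predicates and the helper objects() are made up for the example; a real RDF graph would use URIs for resources and predicates.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

graph = [
    Triple("Mona Lisa", "creator", "Leonardo da Vinci"),
    Triple("Mona Lisa", "type", "painting"),
    # The object of one triple can be the subject of another,
    # which is what turns a list of statements into a network.
    Triple("Leonardo da Vinci", "birthplace", "Vinci"),
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [
        t.object
        for t in graph
        if t.subject == subject and t.predicate == predicate
    ]


# Follow two links: painting -> its creator -> the creator's birthplace.
creator = objects(graph, "Mona Lisa", "creator")[0]
birthplace = objects(graph, creator, "birthplace")
```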

RDF file: declare xml; declare rdf and two namespaces; describe object

RDFS: RDF Schema

DC doesn't care what your container element is

prefix:element; e.g. "dc:subject"
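The layout of an RDF file and the prefix:element notation can be sketched together. The sample below declares XML, then the rdf and dc namespaces, then describes one resource. The example.org URI is a placeholder, and reading the file with xml.etree.ElementTree is just one convenient way to show how prefixes resolve to namespaces.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/mona-lisa">
    <dc:title>Mona Lisa</dc:title>
    <dc:creator>Leonardo da Vinci</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

# Prefixes are shorthand; the parser expands each one to its namespace URI.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(RDF_XML.encode("utf-8"))
description = root.find("rdf:Description", NS)

# Attributes are stored under their expanded "{namespace}name" form.
about = description.get("{%s}about" % NS["rdf"])
creator = description.find("dc:creator", NS).text
title_tag = description.find("dc:title", NS).tag
```

After parsing, the dc:title element's tag is the full namespace URI in braces followed by "title", which shows that "dc" is only a local abbreviation.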

HTML is for human consumption; goal of semantic web is to enable automated algorithms to sort material

XML --> RDF --> DTDs & Namespaces --> Metadata schemes

Unit 4: Alphabet Soup

descriptive metadata

structural & administrative metadata


descriptive example:

Categories for Description of Works of Art (CDWA)


administrative metadata:

PREMIS; core set of elements for the preservation of digital objects

preservation metadata is "the information a repository uses to support the digital preservation process"; requires viability, renderability, understandability, authenticity, identity

PREMIS doesn't describe intellectual entities, but objects that instantiate them

provenance: record describing entities and processes involved in producing and delivering that resource

OPM: Open Provenance Model

METS: Metadata Encoding and Transmission Standard; metadata about metadata

crosswalks: translate between metadata schemas
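At its simplest, a crosswalk is a table mapping each field of one schema to a field of another. The sketch below translates a record from an invented local schema ("name", "maker", "keyword", "shelf") to Dublin Core element names. The mapping and the function are illustrative only; published crosswalks also have to handle fields with no clean equivalent, which this sketch merely reports.

```python
# Hypothetical local schema -> Dublin Core element.
CROSSWALK = {
    "name": "title",
    "maker": "creator",
    "keyword": "subject",
}


def crosswalk(record, mapping):
    """Translate field -> [values] pairs using `mapping`.

    Returns the translated record plus a list of source fields
    that had no target and were therefore dropped.
    """
    translated, dropped = {}, []
    for source_field, values in record.items():
        target = mapping.get(source_field)
        if target is None:
            dropped.append(source_field)
            continue
        translated.setdefault(target, []).extend(values)
    return translated, dropped


local_record = {
    "name": ["Mona Lisa"],
    "maker": ["Leonardo da Vinci"],
    "shelf": ["Room 711"],   # no Dublin Core equivalent in this mapping
}
dc_record, lost = crosswalk(local_record, CROSSWALK)
```

Returning the dropped fields makes the information loss visible rather than silent.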

Unit 5: Metadata for the Web