Metadata (Coursera, Fall 2013)

From Whiki

Metadata, Coursera; taught by Jeffrey Pomerantz

https://class.coursera.org/metadata-001

Unit 1: Organizing Information

metadata -- "data about data"; description

the world can be divided into natural and artificial objects, and into physical and digital ones

describing: make a statement about something -- subject, object, and predicate (relationship between subject and object)

data and information are not interchangeable terms

metadata is description

instructions are not necessarily descriptive

what is description?

access points to materials: title, author, subjects

administrative metadata: how to manage or care for something

subject analysis: figuring out the subject ("significant characteristics") of the thing you're describing

  • how to describe something that doesn't have a subject, like music?

aboutness: word used sometimes instead of "subject"

item: single object

collection: collection of objects

LCSH: Library of Congress Subject Headings; subject headings appear in the cataloging data on a book's copyright page; attempts to be comprehensive; changes over time; a thesaurus or controlled vocabulary; includes relationships between terms, but not synonymy and antonymy

subject heading, index term, descriptor: all mean "subject"

Library of Congress Classification (LCC): also used outside the US; provides the call numbers of books

medical subject headings (MeSH): used in medicine

BT: broader term

NT: narrower term

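UF: "used for"; lists the non-preferred terms that this heading is used in place of

USE: points from a non-preferred term to the preferred term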
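A minimal Python sketch of how BT/NT/UF/USE links can be navigated, assuming the thesaurus is stored as a plain dictionary. The terms and links are invented for illustration and are not taken from LCSH or MeSH; USE is derived here as the inverse of UF.

```python
# Hypothetical mini-thesaurus: terms and links are invented for illustration.
THESAURUS = {
    "Cats": {"BT": ["Mammals"], "NT": ["Kittens"], "UF": ["Felines"]},
    "Mammals": {"BT": [], "NT": ["Cats"], "UF": []},
    "Kittens": {"BT": ["Cats"], "NT": [], "UF": []},
}

# USE is the inverse of UF: it points from a non-preferred (lead-in) term
# to the preferred heading.
USE = {
    lead_in: preferred
    for preferred, links in THESAURUS.items()
    for lead_in in links["UF"]
}


def preferred_term(term):
    """Follow a USE reference, or return the term if it is already preferred."""
    return USE.get(term, term)


def broader_terms(term):
    """Collect every broader term by walking BT links upward."""
    result = []
    for bt in THESAURUS[preferred_term(term)]["BT"]:
        result.append(bt)
        result.extend(broader_terms(bt))
    return result


print(preferred_term("Felines"))  # Cats
print(broader_terms("Kittens"))   # ['Cats', 'Mammals']
```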

faceted classification: can describe using multiple controlled vocabularies

ontology: formal representation of a set of concepts within a domain; defines categories and relationships (including inferences -- one relationship implies another, as in parent-child)

relationships are more complex in ontologies than in thesauri; an ontology is more than a thesaurus, since it is also about relationships and the inferences they support
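A minimal sketch of what inference adds, assuming facts stored as (subject, relation, object) tuples and three hand-written rules. The relation names parentOf, childOf, and ancestorOf are hypothetical; a real ontology would express such rules in a language like OWL rather than in Python code.

```python
# Hypothetical facts stored as (subject, relation, object) tuples.
FACTS = {("Alice", "parentOf", "Bob"), ("Bob", "parentOf", "Carol")}


def infer(facts):
    """Apply the rules repeatedly until no new facts appear (a fixed point)."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p == "parentOf":
                new.add((o, "childOf", s))     # rule 1: inverse relationship
                new.add((s, "ancestorOf", o))  # rule 2: a parent is an ancestor
        ancestry = [(s, o) for s, p, o in known if p == "ancestorOf"]
        for a, b in ancestry:
            for c, d in ancestry:
                if b == c:
                    new.add((a, "ancestorOf", d))  # rule 3: transitivity
        if new <= known:
            return known
        known |= new


facts = infer(FACTS)
print(("Carol", "childOf", "Bob") in facts)       # True
print(("Alice", "ancestorOf", "Carol") in facts)  # True
```

None of the inferred facts were stated explicitly; a thesaurus stores only the links someone wrote down.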

uncontrolled vocabulary: no thesaurus exists

hashtags: ride the line between metadata and content itself

  • tagdef: a site for defining hashtags

vocabularies as maps -- simplify the world

Alfred Korzybski: "The map is not the territory." -- but the map is more useful under certain conditions

types of metadata:

  • descriptive: information about a resource
  • structural: how an object is organized (often used for compound objects, like a book [chapters, sections, pages])
  • administrative: how an object should be stored or cared for (copyright, access permission, origin)

distinctions in metadata record:

  • item vs collection
  • embedded vs linked metadata records; copyright page in printed book is embedded metadata; library card catalogue is linked metadata
  • human-readable vs machine-readable audience; e.g. MARC (MAchine-Readable Cataloging) records

'data' has a flexible definition

information science: intersection of information, technology, people

what is information?

"Where is the wisdom we have lost in knowledge? / Where is the knowledge we have lost in information?" -- T. S. Eliot, "The Rock"

"Information as Thing," by Michael Buckland

  • three types of information: information-as-thing; information-as-knowledge; information-as-process
  • information as thing is evidence

is information subjective or objective?

  • DNA example

Michael Buckland, "What is a Document?"

  • an antelope is not a document, but becomes one when it is made the subject of research

perception generates metadata

Gregory Bateson, information is "a difference that makes a difference"

Unit 2: Dublin Core

Dublin Core: named after Dublin, OH, where OCLC is headquartered; first developed in 1995; designed to be simple and to have a low cost of adoption

goals: simple, shared semantics, extensible, international

characteristics of a DC record: all elements are optional; all elements are repeatable; elements may be displayed in any order

15 elements of DC:

  • contributor
  • coverage
  • creator
  • date
  • description
  • format
  • identifier
  • language
  • publisher
  • relation
  • rights
  • source
  • subject
  • title
  • type

element: category of statement (like 'creator'); also called a "field"

value: data provided in the statement (like 'William Shakespeare')

record: set of element-value pairs
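A sketch of a DC record as a list of element-value pairs, in Python. A list of pairs (rather than a dictionary keyed by element) fits the rules above: any element may be absent, any element may repeat, and no order is required. The Hamlet values and the helper names make_record and values are illustrative.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def make_record(*pairs):
    """Build a record from (element, value) pairs.

    Elements are optional and repeatable, so a list of pairs suffices;
    the only check is that each element is one of the 15.
    """
    for element, _value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
    return list(pairs)


def values(record, element):
    """Return every value recorded for one element (possibly none)."""
    return [v for e, v in record if e == element]


hamlet = make_record(
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "Revenge"),
    ("subject", "Denmark"),
)
print(values(hamlet, "subject"))  # ['Revenge', 'Denmark']
print(values(hamlet, "date"))     # [] (optional elements may be absent)
```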

metadata scheme: controls the kinds of statements you can make; it is a "formally defined set of metadata elements. The meaning (semantics) of the elements are pre-defined, constraining the kinds of statements that can be made about a resource."

principles of DC:

  • "dumb-down principle": a client should be able to ignore any qualifier and use the value as if it were unqualified (e.g. read dc.date.created as dc.date)
  • "one-to-one": one record per object

HTML "meta" tag: name = element, content = value

"DC.element", e.g. "DC.creator" -- standard method of representing DC metadata in an HTML meta tag

qualified DC -- refines the elements through element refinement (adding "created" to date: dc.date.created) or encoding schemes (adding a "scheme" attribute to the meta tag, e.g. <meta name="dc.subject" scheme="MESH" content="Posterior Eye Segment">)
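A minimal Python sketch of both directions, assuming the standard-library html and html.parser modules: to_meta_tags renders element-value pairs using the DC.element naming convention, and DCMetaParser reads them back following DCMI's dumb-down principle (a client may ignore a qualifier and use the value as if it were unqualified), so refinements are stripped and any scheme attribute is ignored. The function and class names and the date value are hypothetical.

```python
from html import escape
from html.parser import HTMLParser


def to_meta_tags(record):
    """Render (element, value) pairs as <meta name="DC.element" content="value">."""
    return "\n".join(
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}">'
        for element, value in record
    )


def dumb_down(name):
    """Drop element refinements: 'DC.date.created' -> 'dc.date'."""
    return ".".join(name.lower().split(".")[:2])


class DCMetaParser(HTMLParser):
    """Collect unqualified DC statements from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.statements = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if tag == "meta" and name.lower().startswith("dc."):
            # The "scheme" attribute is never read: that is the dumbing down.
            self.statements.append((dumb_down(name), attributes.get("content") or ""))


page = to_meta_tags([("date.created", "2013-09-01"), ("title", "Metadata")])
page += '\n<meta name="DC.subject" scheme="MESH" content="Posterior Eye Segment">'
parser = DCMetaParser()
parser.feed(page)
parser.close()
print(parser.statements)
# [('dc.date', '2013-09-01'), ('dc.title', 'Metadata'), ('dc.subject', 'Posterior Eye Segment')]
```

A client that does understand refinements could keep the full name instead; the point is that the unqualified reading is always available.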

DCMI communities: working on extending DC for their schemas

DCMI terms -- expand on the 15 core elements

DCMI Abstract Model

  • independent of any particular encoding syntax
  • shows all the things needed to be included in any metadata scheme
  • i.e. written to be a generic model, but it is the model upon which DC is built
  • resource model
    • diagram notation: diamond-headed arrow = "has a"; triangle-headed arrow = "is a"; plain line arrow = "described using"
    • a property-value pair has both a property (element) and a value, which can be literal or non-literal
  • element-value or property-value pairs make up a statement
  • how resources are described; description is encoded in a vocabulary; how terms in a vocabulary are encoded
  • models are a way of determining the basic ontological categories that can be used for any metadata scheme

Namespace -- conceptual space in which a set of identifiers are defined

four levels of interoperability

  • level 1: use shared term definitions
  • level 2: use shared vocabularies based on formal semantics -- implicit or explicit use of RDF
  • level 3: use shared formal vocabularies in exchangeable records
  • level 4: shared formal vocabularies and constraints in records
  • each level leans more heavily on RDF -- more readable by machines, less readable by humans

Unit 3: How to Build a Metadata Schema

eXtensible Markup Language (XML)

Document Type Definition (DTD)

Resource Description Framework (RDF)

HTML in XML --> XHTML

HTML is metadata, in that it describes the formatting of the page

XML provides information about the structure of the document -- gives metadata about the content

XML, like DC, made up of elements and values

elements can have child elements -- data structure is a tree

elements can have attributes -- so information that could be a child element can instead be an attribute of the parent

e.g.: <ingredient><fooditem>milk</fooditem></ingredient> vs. <ingredient fooditem="milk"></ingredient>
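A quick check with Python's standard xml.etree.ElementTree that both forms carry the same value, stored in different places in the tree:

```python
import xml.etree.ElementTree as ET

as_child = ET.fromstring("<ingredient><fooditem>milk</fooditem></ingredient>")
as_attribute = ET.fromstring('<ingredient fooditem="milk"/>')

# Child element: the value is the text of a node one level down the tree.
print(as_child.find("fooditem").text)  # milk
# Attribute: the value is stored on the parent element itself.
print(as_attribute.get("fooditem"))    # milk
```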

in HTML, all elements and attributes are predefined; in XML, you can create any elements or attributes you want

thus, you must create a DTD to declare your elements and attributes

XML declaration: <?xml version="1.0" encoding="UTF-8"?>

it's possible to put the DTD in the XML file itself; however, usually you'll be using a schema that already exists, so you point to its external DTD

child elements declared at top usually, parent at bottom

<!ELEMENT recipe (child-element, child-element, ...)>

? = 0 or 1

* = 0+

+ = 1+

#PCDATA -- parsed character data; any block of text can be used
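The DTD occurrence indicators mean the same thing as regular-expression quantifiers, which suggests a minimal sketch for checking an element's children against a content model. The recipe model (title, ingredient+, note*, source?) is hypothetical, and real validation would use a validating XML parser rather than this hand-rolled check.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model, written in DTD notation as:
#   <!ELEMENT recipe (title, ingredient+, note*, source?)>
RECIPE_MODEL = [("title", ""), ("ingredient", "+"), ("note", "*"), ("source", "?")]


def model_to_regex(model):
    """Turn a DTD sequence into a regex; ?, * and + carry over unchanged."""
    return "".join(f"(?:{re.escape(name)},){indicator}" for name, indicator in model)


def conforms(element, model):
    """Check the element's child tags, in order, against the content model."""
    children = "".join(f"{child.tag}," for child in element)
    return re.fullmatch(model_to_regex(model), children) is not None


good = ET.fromstring(
    "<recipe><title>Pancakes</title>"
    "<ingredient>milk</ingredient><ingredient>flour</ingredient></recipe>"
)
bad = ET.fromstring("<recipe><title>Pancakes</title></recipe>")  # no ingredient
print(conforms(good, RECIPE_MODEL))  # True
print(conforms(bad, RECIPE_MODEL))   # False
```

The trailing comma after each tag name keeps names like "note" and "notes" from running together in the match.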

HTML5 does not have a DTD

entity declaration example -- <!ENTITY % fontstyle "TT | I | B | BIG | SMALL">

attribute list declaration -- <!ATTLIST

CDATA -- character data

RDF -- Resource Description Framework; data model for describing resources

resource: anything with an address on a network

descriptive metadata is made up of statements describing the object or resource

triple: subject (resource) - predicate - object (value); e.g. Mona Lisa painting - creator - Leonardo da Vinci

can create complex networks of triples
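A minimal sketch of a small network of triples, using plain Python tuples with string labels. Real RDF identifies subjects and predicates with URIs, so the short names here are stand-ins, and match is a hypothetical helper. Two works share one creator node, which is what turns separate statements into a network.

```python
# Each statement is a (subject, predicate, object) triple.
TRIPLES = [
    ("Mona Lisa", "creator", "Leonardo da Vinci"),
    ("Mona Lisa", "type", "Painting"),
    ("The Last Supper", "creator", "Leonardo da Vinci"),
    ("The Last Supper", "type", "Painting"),
]


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the given pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in TRIPLES
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Follow the shared "Leonardo da Vinci" node back to every work linked to it.
works = [s for s, _, _ in match(predicate="creator", obj="Leonardo da Vinci")]
print(works)  # ['Mona Lisa', 'The Last Supper']
```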

RDF file: declare xml; declare rdf and two namespaces; describe object
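A sketch of those three steps with Python's standard xml.etree.ElementTree. The rdf and dc namespace URIs are the standard ones; the resource address http://example.org/mona-lisa is a placeholder.

```python
import xml.etree.ElementTree as ET

# Step 2: the RDF and Dublin Core namespaces, bound to their usual prefixes.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)

# Step 3: describe one resource (placeholder URI) with DC elements.
root = ET.Element(f"{{{RDF}}}RDF")
description = ET.SubElement(
    root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": "http://example.org/mona-lisa"}
)
ET.SubElement(description, f"{{{DC}}}title").text = "Mona Lisa"
ET.SubElement(description, f"{{{DC}}}creator").text = "Leonardo da Vinci"

# Step 1 comes first in the output: the XML declaration, then the tree.
document = '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(root, encoding="unicode")
print(document)
```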

RDFS: RDF Schema; a vocabulary for defining classes and properties of RDF resources

DC doesn't care what your container element is

prefix:element; e.g. "dc:subject"
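A sketch of what the prefix stands for, in Python: the prefix is a local shorthand for a namespace URI, and ElementTree reports the expanded name in {uri}local notation. The expand helper and the <record> container element are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI; the prefix itself is only a local shorthand.
NAMESPACES = {"dc": "http://purl.org/dc/elements/1.1/"}


def expand(qname):
    """Expand 'prefix:element' into the full URI it abbreviates."""
    prefix, local = qname.split(":", 1)
    return NAMESPACES[prefix] + local


doc = ET.fromstring(
    '<record xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<dc:subject>Metadata</dc:subject></record>"
)
print(expand("dc:subject"))                     # http://purl.org/dc/elements/1.1/subject
print(doc[0].tag)                               # {http://purl.org/dc/elements/1.1/}subject
print(doc.find("dc:subject", NAMESPACES).text)  # Metadata
```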

HTML is for human consumption; the goal of the semantic web is to enable automated algorithms to sort material

XML --> RDF --> DTDs & Namespaces --> Metadata schemes

Unit 4: Alphabet Soup

descriptive metadata

structural & administrative metadata

crosswalks

descriptive examples:

Categories for the Description of Works of Art (CDWA)

Exif: Exchangeable image file format; metadata embedded in digital photographs

administrative metadata:

PREMIS (PREservation Metadata: Implementation Strategies): core set of elements for the preservation of digital objects

preservation metadata is "the information a repository uses to support the digital preservation process"; it supports viability, renderability, understandability, authenticity, and identity

PREMIS doesn't describe intellectual entities, but objects that instantiate them

provenance: record describing the entities and processes involved in producing and delivering a resource

OPM: Open Provenance Model

METS: Metadata Encoding and Transmission Standard; metadata about metadata, i.e. a container that packages an object's descriptive, administrative, and structural metadata

crosswalks: translate between metadata schemas
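A minimal crosswalk sketch in Python, assuming a mapping from DC elements to schema.org property names. The mapping is an illustrative assumption, not an official crosswalk; the function also reports elements with no target, since crosswalks are often lossy.

```python
# Illustrative mapping from DC elements to schema.org property names.
# This is an assumption for the sketch, not an official crosswalk.
DC_TO_SCHEMA_ORG = {
    "title": "name",
    "creator": "creator",
    "subject": "keywords",
    "description": "description",
    "publisher": "publisher",
    "date": "dateCreated",
    "language": "inLanguage",
}


def crosswalk(record, mapping):
    """Translate (element, value) pairs and report what could not be mapped."""
    mapped, unmapped = [], []
    for element, value in record:
        if element in mapping:
            mapped.append((mapping[element], value))
        else:
            unmapped.append((element, value))
    return mapped, unmapped


record = [("title", "Hamlet"), ("creator", "William Shakespeare"), ("coverage", "Denmark")]
mapped, lost = crosswalk(record, DC_TO_SCHEMA_ORG)
print(mapped)  # [('name', 'Hamlet'), ('creator', 'William Shakespeare')]
print(lost)    # [('coverage', 'Denmark')]
```

Returning the unmapped pairs, rather than silently dropping them, makes the loss visible so someone can decide whether the target schema needs another mapping.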

Unit 5: Metadata for the Web