Metadata (Coursera, Fall 2013)
Latest revision as of 15:43, 16 February 2014
Metadata (Coursera), taught by Jeffrey Pomerantz
https://class.coursera.org/metadata-001
Unit 1: Organizing Information
metadata -- "data about data"; description
world divided into natural and artificial objects; physical and digital
describing: make a statement about something -- subject, object, and predicate (relationship between subject and object)
data and information are not interchangeable terms
metadata is description
instructions are not necessarily descriptive
what is description?
access points to materials: title, author, subjects
administrative metadata: how to manage or care for something
subject analysis: figuring out the subject ("significant characteristics") of the thing you're describing
- how to describe something that doesn't have a subject, like music?
aboutness: word used sometimes instead of "subject"
item: single object
collection: collection of objects
LCSH: Library of Congress Subject Headings; data about subject headings on copyright page; attempts to be comprehensive; changes over time; thesaurus or controlled vocabulary; includes relationship but not synonymy and antonymy
subject headings, index term, descriptor: all mean "subject"
LOC classification used outside the US; call number of books
medical subject headings (MeSH): used in medicine
BT: broader term
NT: narrower term
UF: "used for"
USE: "use"
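The BT/NT/UF/USE relationships above can be sketched as a small lookup structure. This is only an illustration: the terms are invented, not taken from LCSH or MeSH, and the helper names (`build_use_index`, `preferred_term`, `broader_chain`) are hypothetical.

```python
# Toy thesaurus; the terms and relationships are invented for illustration.
thesaurus = {
    "Dogs": {"BT": ["Mammals"], "NT": ["Poodles"], "UF": ["Canines"]},
    "Mammals": {"NT": ["Dogs"]},
    "Poodles": {"BT": ["Dogs"]},
}


def build_use_index(thesaurus):
    """Invert UF entries: each non-preferred term points (USE) to its preferred term."""
    return {
        variant: preferred
        for preferred, relations in thesaurus.items()
        for variant in relations.get("UF", [])
    }


def preferred_term(term, thesaurus):
    """Return the authorized heading for a term, following USE if needed."""
    if term in thesaurus:
        return term
    return build_use_index(thesaurus).get(term)


def broader_chain(term, thesaurus):
    """Follow the first BT link upward until a term with no BT is reached."""
    chain = []
    while thesaurus.get(term, {}).get("BT"):
        term = thesaurus[term]["BT"][0]
        chain.append(term)
    return chain
```

UF and USE are two directions of the same link, so the sketch stores only UF and derives USE from it.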
faceted classification: can describe using multiple controlled vocabularies
ontology: formal representation of a set of concepts within a domain; defining categories and relationships (including inferences -- one relationship implies another, as in parent-child)
relationships are more complicated in ontologies than in thesauri -- an ontology is more than a thesaurus; it is also about relationships and inferences
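A rough sketch of what "one relationship implies another" can mean in practice. The names and the predicates (`parentOf`, `childOf`, `ancestorOf`) are invented for illustration and do not come from any particular ontology language.

```python
# Asserted facts as (subject, predicate, object); all names are invented.
facts = {("Alice", "parentOf", "Bob"), ("Bob", "parentOf", "Carol")}


def infer(facts):
    """Apply two rules until no new facts appear:
    parentOf(x, y) implies childOf(y, x) and ancestorOf(x, y);
    ancestorOf(x, y) and ancestorOf(y, z) imply ancestorOf(x, z)."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p == "parentOf":
                new.add((o, "childOf", s))
                new.add((s, "ancestorOf", o))
        for s1, p1, o1 in known:
            for s2, p2, o2 in known:
                if p1 == p2 == "ancestorOf" and o1 == s2:
                    new.add((s1, "ancestorOf", o2))
        if new <= known:
            return known
        known |= new
```

Only the two parentOf statements are asserted; the childOf and ancestorOf statements are derived, which is the part a plain thesaurus does not offer.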
uncontrolled vocabulary: no thesaurus exists
hashtags: ride the line between metadata and content itself
- tagdef, defining hashtags
vocabularies as maps -- simplify the world
Alfred Korzybski: "The map is not the territory." -- but the map is more useful under certain conditions
types of metadata:
- descriptive: information about a resource
- structural: how an object is organized (often used for compound objects, like a book [chapters, sections, pages])
- administrative: how an object should be stored or cared for (copyright, access permission, origin)
distinctions in metadata record:
- item vs collection
- embedded vs linked metadata records; copyright page in printed book is embedded metadata; library card catalogue is linked metadata
- human-readable vs machine-readable audience; MARC records, machine-readable cataloguing
'data' has flexible definition
information science: intersection of information, technology, people
what is information?
"Where is the wisdom we have lost in knowledge / Where is the knowledge we have lost in information?" -- T. S. Eliot, "The Rock"
"Information as Thing," by Michael Buckland
- three types of information: information-as-thing; information-as-knowledge; information-as-process
- information as thing is evidence
is information subjective or objective?
- DNA example
Michael Buckland, "What is a Document?"
- an antelope is not a document, but becomes a document when it is the subject of research
perception generates metadata
Gregory Bateson, information is "a difference that makes a difference"
Unit 2: Dublin Core
Dublin Core; named after Dublin, OH, where OCLC is; first developed in 1995; developed to be simple and have a low cost of adoption
goals: simple, shared semantics, extensible, international
characteristics of a DC record: all elements are optional; all elements are repeatable; elements may be displayed in any order
15 elements of DC:
- contributor
- coverage
- creator
- date
- description
- format
- identifier
- language
- publisher
- relation
- rights
- source
- subject
- title
- type
elements: category of statement (like 'creator'); also "field"
value: data provided in the statement (like 'William Shakespeare')
record: set of element-value pairs
metadata scheme: controls the kinds of statements you can make; it is a "formally defined set of metadata elements. The meaning (semantics) of the elements are pre-defined, constraining the kinds of statements that can be made about a resource."
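A minimal sketch of a DC record as a set of element-value pairs, with the scheme constraining which elements may be used. The function name `make_record` and the sample values are assumptions made for illustration; this is not an official DCMI data structure.

```python
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}


def make_record(pairs):
    """Build a record from (element, value) pairs.

    Elements are optional and repeatable, so values are kept in lists
    and no element is required to be present.
    """
    record = {}
    for element, value in pairs:
        if element not in DC_ELEMENTS:
            raise ValueError(f"{element!r} is not one of the 15 DC elements")
        record.setdefault(element, []).append(value)
    return record


record = make_record([
    ("title", "Hamlet"),
    ("creator", "William Shakespeare"),
    ("subject", "Tragedy"),
    ("subject", "Denmark"),
])
```

Here `subject` is repeated and `date` is simply absent; a pair such as `("author", "...")` is rejected because the scheme does not define that element.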
principles of DC:
- "dumb-down principle": a client that doesn't understand a qualifier can ignore it and treat the value as if the element were unqualified
- "one-to-one": one record per object
HTML "meta" tag: name = element, content = value
"DC.element", e.g. "DC.creator" -- standard method of representing DC metadata in meta tag in HTML
qualified DC -- modifying; through element refinement (adding "created" to date -- dc.date.created) or encoding schemes (add "scheme" to meta tag; e.g. name="dc.subject" scheme="MESH" content="Posterior Eye Segment")
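The meta tag conventions above can be sketched as a small generator. The function name `dc_meta_tags` and the input shape (a list of element, value, scheme triples) are assumptions made for this example.

```python
from html import escape


def dc_meta_tags(statements):
    """Render (element, value, scheme) triples as HTML meta tags.

    element may carry a refinement, e.g. "date.created";
    scheme names an encoding scheme and may be None.
    """
    tags = []
    for element, value, scheme in statements:
        attrs = f'name="DC.{escape(element)}"'
        if scheme:
            attrs += f' scheme="{escape(scheme)}"'
        attrs += f' content="{escape(value)}"'
        tags.append(f"<meta {attrs}>")
    return "\n".join(tags)


html_head = dc_meta_tags([
    ("creator", "William Shakespeare", None),
    ("date.created", "1603", None),
    ("subject", "Posterior Eye Segment", "MESH"),
])
```

The values are sample data only. The `scheme` attribute comes from HTML 4 and is obsolete in HTML5, so pages written for HTML5 would need a different way to name the encoding scheme.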
DCMI communities: working on extending DC for their schemas
terms -- expand on the basic 15 core elements
DCMI Abstract Model
- independent of any particular encoding syntax
- shows all the things needed to be included in any metadata scheme
- i.e. written to be generic model, but is model upon which DC is built
- resource model
  - diamond arrow: "has a"; regular triangle arrow: "is a"; line arrow: "described using"
  - property-value pair has both property (element) and value, which can be literal and/or non-literal
- element-value or property-value pairs make up a statement
- how resources are described; description is encoded in a vocabulary; how terms in a vocabulary are encoded
- models are a way of determining basic ontological categories that can be used for any metadata schemes
Namespace -- conceptual space in which a set of identifiers is defined
four levels of interoperability
- level 1: use shared term definitions
- level 2: use shared vocabularies based on formal semantics -- implicit or explicit use of RDF
- level 3: use shared formal vocabularies in exchangeable records
- level 4: shared formal vocabularies and constraints in records
- each level leans more heavily on RDF -- more readable by machines, less readable by humans
Unit 3: How to Build a Metadata Schema
eXtensible Markup Language
Document Type Definition (DTD)
Resource Description Framework (RDF)
HTML in XML --> XHTML
HTML is metadata, in that it describes the formatting of the page
XML provides information about the structure of the document -- gives metadata about the content
XML, like DC, made up of elements and values
elements can have child elements -- data structure is a tree
elements can have attributes -- so that child elements become attributes of parent
e.g.: <ingredient><fooditem>milk</fooditem></ingredient> vs. <ingredient fooditem="milk"></ingredient>
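A short sketch of the two encodings, read with Python's standard `xml.etree.ElementTree`. The enclosing `recipe` element is an assumption added so each document has a single root.

```python
import xml.etree.ElementTree as ET

# The same fact encoded as a child element...
as_child = ET.fromstring(
    "<recipe><ingredient><fooditem>milk</fooditem></ingredient></recipe>"
)
# ...and as an attribute of the parent element.
as_attribute = ET.fromstring(
    '<recipe><ingredient fooditem="milk"></ingredient></recipe>'
)

# Child element: walk down the tree and read the text node.
child_value = as_child.find("ingredient/fooditem").text

# Attribute: read it directly off the parent element.
attribute_value = as_attribute.find("ingredient").get("fooditem")
```

Both forms carry the same value; the difference is whether `fooditem` is a node in the tree or a property of the `ingredient` node.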
in HTML, all elements and attributes are predefined; in XML, you can create any elements or attributes you want
thus, must create a DTD to declare elements and attributes
<?xml version="1.0" encoding="UTF-8"?>
possible to put DTD in XML file itself, however usually you'll be using a schema that already exists, so you point elsewhere to its DTD
child elements declared at top usually, parent at bottom
<!ELEMENT recipe (child element, child element, ...)>
? = 0 or 1
* = 0+
+ = 1+
#PCDATA -- parsed character data; any block of text can be used
HTML5 does not have a DTD
entity declaration ex. -- <!ENTITY % fontstyle "TT | I | B | BIG | SMALL">
attribute list -- <!ATTLIST
CDATA -- character data
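The DTD occurrence indicators ?, * and + mean the same thing as the matching symbols in regular expressions, which gives a quick way to experiment with them. The sketch below is not a DTD validator: it handles only flat, comma-separated content models, and the function names and the sample model are invented for illustration.

```python
import re


def content_model_to_regex(model):
    """Translate a flat DTD content model such as "title, ingredient+, note?"
    into a regex over a string of space-terminated tag names.

    Only plain sequences with optional ?, * or + suffixes are handled
    (no nested groups, no | alternatives, no empty items).
    """
    parts = []
    for item in model.split(","):
        item = item.strip()
        if item[-1] in "?*+":
            name, suffix = item[:-1], item[-1]
        else:
            name, suffix = item, ""
        parts.append(f"(?:{re.escape(name)} ){suffix}")
    return re.compile("".join(parts))


def children_match(model, child_tags):
    """True if the ordered list of child tag names satisfies the model."""
    sequence = "".join(f"{tag} " for tag in child_tags)
    return content_model_to_regex(model).fullmatch(sequence) is not None
```

For example, with the model "title, ingredient+, note?" a recipe needs exactly one title, at least one ingredient, and at most one note, in that order.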
RDF -- Resource Description Framework; data model for describing resources
resource: anything with an address on a network
descriptive metadata is made up of statements describing the object or resource
triple: subject (resource) - predicate - object (value); e.g. Mona Lisa painting - creator - Leonardo da Vinci
can create complex networks of triples
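A minimal sketch of a network of triples, using plain Python tuples rather than an RDF library. It takes the described resource as the subject, which is the usual RDF reading. The example.org URIs and the helper names are made up for illustration; `dc:` and `foaf:` are used only as familiar prefixes.

```python
# Each statement is a (subject, predicate, object) tuple.
# The URIs are made-up examples, not real identifiers.
MONA_LISA = "http://example.org/painting/mona-lisa"
LEONARDO = "http://example.org/person/leonardo"

triples = [
    (MONA_LISA, "dc:title", "Mona Lisa"),
    (MONA_LISA, "dc:creator", LEONARDO),
    (LEONARDO, "foaf:name", "Leonardo da Vinci"),
]


def objects(subject, predicate, triples):
    """All objects of statements with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_names(resource, triples):
    """Follow two links in the network: resource -> creator -> name."""
    return [
        name
        for creator in objects(resource, "dc:creator", triples)
        for name in objects(creator, "foaf:name", triples)
    ]
```

The object of one triple (the creator) is the subject of another, which is how separate statements join into a network.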
RDF file: declare xml; declare rdf and two namespaces; describe object
RDFS: RDF Schema
DC doesn't care what your container element is
prefix:element; e.g. "dc:subject"
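A sketch of an RDF/XML file with the shape described above (XML declaration, namespace declarations, then the description), read with the standard library. The resource URI and values are invented; the two namespace URIs are the standard ones for RDF syntax and the 15 DC elements.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/painting/mona-lisa">
    <dc:title>Mona Lisa</dc:title>
    <dc:creator>Leonardo da Vinci</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

# Prefix -> namespace URI; the prefix is shorthand, the URI is the real name.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

root = ET.fromstring(RDF_XML.encode("utf-8"))
description = root.find("rdf:Description", NS)

# Attributes are looked up by {namespace-uri}localname.
about = description.get(f"{{{NS['rdf']}}}about")
title = description.find("dc:title", NS).text
creator = description.find("dc:creator", NS).text
```

The parser resolves `dc:title` to the full namespace URI plus `title`, so a document that chose a different prefix for the same URI would be read the same way.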
HTML is for human consumption; goal of semantic web is to enable automated algorithms to sort material
XML --> RDF --> DTDs & Namespaces --> Metadata schemes
Unit 4: Alphabet Soup
descriptive metadata
structural & administrative metadata
crosswalks
descriptive example:
Categories for the Description of Works of Art (CDWA)
EXIF
administrative metadata:
PREMIS; core set of elements for the preservation of digital objects
preservation metadata is "the information a repository uses to support the digital preservation process"; requires viability, renderability, understandability, authenticity, identity
PREMIS doesn't describe intellectual entities, but objects that instantiate them
provenance: record describing entities and processes involved in producing and delivering that resource
OPM: Open Provenance Model
METS: Metadata Encoding and Transmission Standard; metadata about metadata
crosswalks: translate between metadata schemas
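A crosswalk can be sketched as a field-to-field mapping applied to a record. The local field names and the mapping below are hypothetical, invented for this example; they are not taken from any published crosswalk.

```python
# Hypothetical local schema -> Dublin Core mapping, invented for illustration.
LOCAL_TO_DC = {
    "artist": "creator",
    "work_title": "title",
    "medium": "format",
    "keywords": "subject",
}


def crosswalk(record, mapping):
    """Translate a record into the target schema.

    Returns the translated record (values kept in lists, since target
    elements may repeat) and the source fields with no equivalent.
    """
    translated, unmapped = {}, []
    for field, value in record.items():
        if field in mapping:
            translated.setdefault(mapping[field], []).append(value)
        else:
            unmapped.append(field)
    return translated, unmapped


local_record = {
    "artist": "Leonardo da Vinci",
    "work_title": "Mona Lisa",
    "medium": "oil on poplar panel",
    "gallery_room": "711",
}
dc_record, dropped = crosswalk(local_record, LOCAL_TO_DC)
```

Fields with no counterpart in the target schema end up in `dropped`, which shows that a crosswalk can lose information when the two schemas do not line up.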