
A Research Assistant’s Guide to Metadata Creation

Assignments are based on DataONE Education Modules found here: https://dataoneorg.github.io/Education/.


Audience: This is a hands-on, informative presentation for new undergraduate and graduate research assistants who create and describe metadata for research data.

Script:

Today’s agenda will:

  • Define what metadata is.
  • Illustrate the value of metadata to data users, researchers, and the organization.
  • Examine what information should be included in a metadata record.
  • Provide an overview of existing metadata standards and the core elements they share.
  • Introduce best practices for writing high-quality metadata records.
  • And, finally, give a few tips to get you started on writing your own records.

Learning Objectives

After completing this lesson, participants will be able to:

  • Identify the types of information typically included in metadata records for datasets, so that they can create such records themselves.
  • Understand and analyze the value of metadata to data users, researchers, and the organization.
  • Understand and recall uses for metadata.
  • Identify metadata standards and evaluate the most appropriate ones for any given dataset.
  • Prepare to apply what you learned and create a quality metadata record.

So, what is metadata?

Metadata is sometimes called “data about data.” It is documentation that describes data and provides context for its content. Metadata does not include the full text of the content itself, only information about the content.

It describes the content, quality, condition, and other characteristics of a dataset. Metadata allows data to be discovered, accessed, reused, and understood. By creating and providing good descriptive metadata for our data, we enable others to efficiently discover and use the data products of our research. In short, good metadata is the key to data sharing and reuse. Many standards and software tools are available to help you describe data.

Metadata answers the questions…

  • WHO created and collected the data?
  • WHAT is the content of the data? What do the gaps in the data mean?
  • WHEN was the data created?
  • WHERE was the data collected?
  • HOW was the data developed? How should it be attributed?
  • WHY was the data developed?

So, you may ask, what makes metadata so valuable?

Metadata helps data creators, data users, and organizations.

  • For Creators: it helps avoid duplication, share reliable information, and promote contributions to their field. Metadata provides accountability. It allows you to repeat a scientific process, provided the methodologies, variables, and analytical parameters are defined. It also allows you to defend your scientific process, because the process is documented, and an increasingly data-savvy public requires metadata as consumer information.
  • For Users: it makes data easier to find and its content and applicability easier to evaluate, and it provides information on how to acquire, process, and use the data.
  • For Organizations: it safeguards the investment in data by allowing later reuse for other purposes, offers data permanence, creates institutional memory, and advertises research. Further, policy decisions based on data can only be defended if the metadata is of good quality; regulatory decisions based on undocumented data are not defensible. Accurate, detailed metadata is important supporting evidence for science and policy, and controversies arise when metadata is incomplete or absent. Creating robust metadata is in your best interest! It will help you find what you need more quickly and easily.

Metadata is also useful for:

  • Data distribution, including discovery, catalogs, publication, and data portals. You can enable data discovery based on metadata keywords, geographic location, time period, and other attributes.
  • Data management, including maintenance and updating, accountability, liability, and reuse.
  • Project management, including project planning, monitoring, coordination, and deliverables.
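To make the discovery point concrete, here is a minimal Python sketch of how a catalog can filter records by keyword, geographic location, and time period. The `Record` class, its field names, and the two catalog entries are hypothetical illustrations, not part of any real portal or standard; real catalogs index many more fields.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A hypothetical, minimal metadata record used only for illustration."""
    title: str
    keywords: set[str] = field(default_factory=set)
    location: str = ""
    start_year: int = 0
    end_year: int = 0


def discover(records, keyword=None, location=None, year=None):
    """Return the records whose metadata matches every filter that was supplied."""
    hits = []
    for r in records:
        # Keyword match is case-insensitive.
        if keyword is not None and keyword.lower() not in {k.lower() for k in r.keywords}:
            continue
        if location is not None and location.lower() != r.location.lower():
            continue
        # The requested year must fall inside the record's time period.
        if year is not None and not (r.start_year <= year <= r.end_year):
            continue
        hits.append(r)
    return hits


# Hypothetical catalog entries.
catalog = [
    Record("Stream temperature, 2010-2015", {"water temperature", "streams"}, "Oregon", 2010, 2015),
    Record("Bird counts, 2018", {"birds", "abundance"}, "Oregon", 2018, 2018),
]
print([r.title for r in discover(catalog, keyword="streams", year=2012)])
# ['Stream temperature, 2010-2015']
```

None of these searches can succeed unless the keywords, location, and dates were actually recorded in the metadata.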

And now it's your turn… What would you want others to know about data you PROVIDE to them? What would you want to know about data you RECEIVE from others?

Here are some examples to get you started.

  • Why was the data created?
  • What does the data mean?
  • What are the gaps in the data?
  • What processes were used in data creation?
  • Is your dataset clear? What do the values in the tables mean?
  • What software is needed to read the data?
  • What projection was used for GIS data?
  • Can these data be given to someone else?
  • How should the data be cited?

These questions are meant to help you think about what a record needs. Record all the information needed for you and others to understand, evaluate, and reuse the data in the future.

Now, metadata standards. A metadata standard provides a specific, consistent structure for describing data. Standards and tools vary depending on data type, guidance from your organization, and available resources. Some aspects of standards include…

  • Common terms to allow consistency between records
  • Common definitions for easier interpretation
  • Common language for ease of communication
  • Common structure to quickly locate information

Standards provide:

  • A structure that records documentation in a reliable and predictable format that computers can interpret.
  • A uniform summary description of the dataset.

A metadata standard is made up of defined elements, each of which specifies the type of information the user should enter. Examples of these elements include title, abstract, keywords, usage rights, variables, numbers, and dates.

The metadata schemas used in our research include the Metadata Object Description Schema (MODS), which is closely related to MARC and is a more robust alternative to Dublin Core, which we will also use. The Dublin Core Metadata Element Set is a set of fifteen core elements for describing resources.
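For reference, the fifteen Dublin Core elements (version 1.1) are listed below as a Python tuple, with a small helper that flags field names that are not Dublin Core elements. The helper and the sample `draft` record are hypothetical teaching aids; a common slip it catches is using "author" where Dublin Core uses "creator".

```python
# The fifteen elements of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def unknown_elements(record: dict) -> list[str]:
    """Return the field names in `record` that are not Dublin Core elements."""
    return sorted(name for name in record if name.lower() not in DC_ELEMENTS)


# Hypothetical draft record with one non-Dublin Core field name.
draft = {
    "title": "Campus tree survey photographs",
    "creator": "Example University Library",
    "author": "J. Doe",
    "rights": "CC BY 4.0",
}
print(len(DC_ELEMENTS))         # 15
print(unknown_elements(draft))  # ['author']
```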

You will also use tools to clean up data, such as OpenRefine and Excel, and tools to create metadata, such as Oxygen XML Editor and DC Assist.

Now let’s get started. What does a metadata record look like?

Each record must include at least the five required metadata fields: title; link to material/resource/image; link or DOI to site; content provider (institution name, author name); and rights statement.
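Before a record goes to review, the five required fields can be checked automatically. The Python sketch below assumes a record stored as a dictionary; the keys (`resource_link`, `site_link_or_doi`, and so on) are hypothetical names chosen for illustration, not field names from our schema or from any tool mentioned here.

```python
# Hypothetical keys standing in for the five required fields.
REQUIRED_FIELDS = {
    "title": "Title",
    "resource_link": "Link to material/resource/image",
    "site_link_or_doi": "Link or DOI to site",
    "content_provider": "Content provider (institution name, author name)",
    "rights": "Rights statement",
}


def missing_required(record: dict) -> list[str]:
    """List the labels of required fields that are absent or blank in `record`."""
    return [
        label
        for key, label in REQUIRED_FIELDS.items()
        if not str(record.get(key) or "").strip()
    ]


# Hypothetical draft: the site link is blank and the rights statement is missing.
draft = {
    "title": "Campus tree survey photographs",
    "resource_link": "https://example.org/images/001.jpg",
    "site_link_or_doi": "",
    "content_provider": "Example University Library",
}
print(missing_required(draft))
# ['Link or DOI to site', 'Rights statement']
```

Treating whitespace-only values as missing catches fields that look filled in but are not.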

You will also use our standard terminology to enable discovery (name authority files, controlled vocabularies, subject and genre terms), describe the extent of the data appropriately, and, later, begin adding to the data dictionary we have implemented.

Here is an example record for Dublin Core that I created. Note the title, subject terms, web links and resources, contributor, and rights statement fields.
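To see how such a record can be serialized as XML, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The field values and the generic `metadata` wrapper element are hypothetical placeholders. In practice, the platform you publish to usually defines the container (OAI-PMH, for example, uses `oai_dc:dc`), and tools such as Oxygen XML Editor can validate the result.

```python
import xml.etree.ElementTree as ET

# Namespace URI for the Dublin Core elements (version 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the familiar "dc:" prefix


def dublin_core_xml(fields: list[tuple[str, str]]) -> str:
    """Serialize (element, value) pairs as Dublin Core elements under a generic root."""
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical record covering title, subject, link, contributor, and rights.
record = [
    ("title", "Campus tree survey photographs"),
    ("subject", "Trees"),
    ("identifier", "https://example.org/images/001.jpg"),
    ("contributor", "Example University Library"),
    ("rights", "This work is licensed under CC BY 4.0."),
]
print(dublin_core_xml(record))
```

A list of pairs is used instead of a dictionary because Dublin Core elements may repeat (several `subject` terms, for example).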

Here are a few tips for creating high-quality metadata, along with a few tricks you can use when you need help.

  1. Organize the information. Always pay attention to the details.
  2. Write your metadata using a metadata tool. This will help you spot errors.
  3. Review for accuracy and completeness.
  4. Have someone else read your record in a Quality Control check.
  5. Revise the record, based on comments from your reviewer.
  6. Review once more before you publish.

Provide all of the critical information for discovery, understanding, and reuse.

Metadata is developed continuously throughout the entire data lifecycle. Consistency with commonly used fields is key.

Write titles carefully, as they are critical in helping readers find your data. When people search for the most appropriate datasets, they will most likely use the title as the first criterion to determine whether a dataset meets their needs.

Write clearly and descriptively in text-based fields. Provide the information exactly as it appears in the original document. Don’t use jargon or slang.

Fully document geographic locations if applicable.

Select keywords wisely. Use controlled vocabulary for keywords whenever possible.
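A quick way to enforce a controlled vocabulary is to compare each keyword against the approved term list before the record is finalized. In the Python sketch below, `VOCABULARY` is a tiny hypothetical list. A real check would load the subject and genre terms your project actually uses.

```python
# A tiny, hypothetical controlled vocabulary used only for illustration.
VOCABULARY = {"water temperature", "streams", "birds", "soil moisture"}


def check_keywords(keywords):
    """Split keywords into (matched, unmatched) against the controlled vocabulary."""
    matched, unmatched = [], []
    for kw in keywords:
        # Compare case-insensitively and collapse repeated whitespace.
        normalized = " ".join(kw.lower().split())
        (matched if normalized in VOCABULARY else unmatched).append(kw)
    return matched, unmatched


print(check_keywords(["Streams", "water  temperature", "H2O temp"]))
# (['Streams', 'water  temperature'], ['H2O temp'])
```

Unmatched terms such as "H2O temp" should be replaced with the closest approved term rather than added as free text.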

Be detailed, but not unnecessarily so.

Considerations

  • Spell out acronyms with first use. Many acronyms have multiple meanings.
  • Use widely known acronyms only when they correspond to specific metadata fields, such as file formats.
  • Keep legal considerations for sharing data in mind: always include a rights statement and a Creative Commons statement.
  • Remember a computer will read your metadata. Avoid using special characters or symbols that could be misinterpreted by software.
  • When copying and pasting from other sources, use a text editor (e.g., Notepad) to eliminate hidden characters.
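If you want to script the clean-up step instead of round-tripping through a text editor, the Python sketch below removes common invisible characters from pasted text. It is one reasonable rule set, not the only one. It converts non-breaking spaces to ordinary spaces, keeps newlines and tabs, and drops the other Unicode "format" and "control" characters, such as zero-width spaces and byte-order marks. Note that it also drops carriage returns.

```python
import unicodedata


def clean_pasted_text(text: str) -> str:
    """Remove invisible characters that often come along with copy-and-paste."""
    cleaned = []
    for ch in text:
        if ch in "\u00a0\u2007\u202f":  # non-breaking space variants -> ordinary space
            cleaned.append(" ")
        elif ch in "\n\t":              # keep line breaks and tabs
            cleaned.append(ch)
        elif unicodedata.category(ch) in ("Cf", "Cc"):
            continue                    # drop format (zero-width, BOM) and control chars
        else:
            cleaned.append(ch)
    return "".join(cleaned)


pasted = "Stream\u00a0temperature\u200b data\ufeff"
print(repr(clean_pasted_text(pasted)))  # 'Stream temperature data'
```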

When doing quality control checks on others’ records…

  • Look for the necessary author or institution information.
  • Ensure that entities and attributes are correct.
  • Data Quality (Accuracy, Consistency, Completeness)
    • Look for complete and consistent information, explanation of limitations to support reuse, persistent identifiers like DOIs
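Parts of this checklist can be automated so that the human reviewer can focus on accuracy. The Python sketch below flags blank core fields and a missing or malformed DOI. The field names are hypothetical. The DOI pattern approximates the form of modern DOIs (`10.` + registrant code + `/` + suffix), and some older, valid DOIs will not match it, so treat a flag as "look closer," not as proof of an error.

```python
import re

# Approximates modern DOIs (10.<registrant>/<suffix>); some older DOIs will not match.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)


def qc_flags(record: dict) -> list[str]:
    """Return QC issues for a metadata record (field names are hypothetical)."""
    flags = []
    for name in ("creator", "title", "description", "rights"):
        if not str(record.get(name) or "").strip():
            flags.append(f"missing {name}")
    doi = str(record.get("doi") or "").strip()
    doi = doi.removeprefix("https://doi.org/")  # accept the resolver URL form
    if not doi:
        flags.append("no persistent identifier (DOI)")
    elif not DOI_PATTERN.match(doi):
        flags.append(f"DOI looks malformed: {doi!r}")
    return flags


# Hypothetical record with an empty description.
record = {
    "creator": "Example University Library",
    "title": "Campus tree survey photographs",
    "description": "",
    "rights": "CC BY 4.0",
    "doi": "https://doi.org/10.1234/example.5678",
}
print(qc_flags(record))  # ['missing description']
```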

Next steps

Be specific and quantify when you can! The goal of a metadata record is to give the user enough information to know if they can use the data without contacting the dataset owner.

Think… Could someone search for this information and locate the data? Can others assess the usefulness of the data? Could a novice understand it? Is it specific enough? Is there enough here to facilitate re-use? Is the information unambiguous? Is it all explained thoroughly?

Remember…

Metadata is documentation of data, not the actual data. A metadata record captures critical information about the content of a dataset.

Metadata allows data to be discovered, accessed, and re-used. It is a vital part of the data management process. Metadata completes a dataset.

A metadata standard provides structure and consistency to data documentation.

Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources.

Metadata is of critical importance to data developers, data users, and organizations.

Thank you for coming!