A readme file provides information about a data file and is intended to help ensure that the data can be correctly interpreted, by yourself at a later date or by others when sharing or publishing data. Standards-based metadata is generally preferable, but where no appropriate standard exists, for internal use, writing “readme” style metadata is an appropriate strategy.
Want a template? Download one and adapt it for your own data!
Markdown is a lightweight markup language with plain-text formatting syntax, created in 2004 by John Gruber with Aaron Swartz. Markdown is often used to format readme files, to write messages in online discussion forums, and to create rich text using a plain text editor.
Best practices
Create readme files for logical 'clusters' of data. In many cases it will be appropriate to create one document for a dataset that has multiple, related, similarly formatted files, or files that are logically grouped together for use (e.g. a collection of Matlab scripts). Sometimes it may make sense to create a readme for a single data file.
Name the readme so that it is easily associated with the data file(s) it describes.
Write your readme document as a plain text file, avoiding proprietary formats such as MS Word whenever possible. Format the readme document so it is easy to understand (e.g. separate important pieces of information with blank lines, rather than having all the information in one long paragraph).
Format multiple readme files identically. Present the information in the same order, using the same terminology.
Use standardized date formats. Suggested format: W3C/ISO 8601 date standard, which specifies the international standard notation of YYYY-MM-DD or YYYY-MM-DDThh:mm:ss.
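As a minimal sketch of the suggested notation, Python's standard `datetime` module can produce both forms directly. The dates and variable names below are made up for illustration.

```python
from datetime import date, datetime

# A calendar date in YYYY-MM-DD form (illustrative value).
collection_date = date(2021, 3, 9)
print(collection_date.isoformat())  # 2021-03-09

# A date and time in YYYY-MM-DDThh:mm:ss form (illustrative value).
collected_at = datetime(2021, 3, 9, 14, 5, 30)
print(collected_at.isoformat(timespec="seconds"))  # 2021-03-09T14:05:30

# Dates written this way can be read back without ambiguity.
parsed = date.fromisoformat("2021-03-09")
print(parsed == collection_date)  # True
```

Writing dates this way also means they sort chronologically when sorted as plain text.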
Follow the scientific conventions for your discipline for taxonomic, geospatial and geologic names and keywords. Whenever possible, use terms from standardized taxonomies and vocabularies, a few of which are listed below.
| Source | Content | URL |
| --- | --- | --- |
| Getty Research Institute Vocabularies | geographic names, art & architecture, cultural objects, artist names | http://www.getty.edu/research/tools/vocabularies/ |
| Integrated Taxonomic Information System | taxonomic information on plants, animals, fungi, microbes | http://www.itis.gov/ |
| NASA Thesauri | engineering, physics, astronomy, astrophysics, planetary science, Earth sciences, biological sciences | https://www.sti.nasa.gov/nasa-thesaurus/ |
| GCMD Keywords | Earth & climate sciences, instruments, sensors, services, data centers, etc. | https://earthdata.nasa.gov/earth-observation-data/find-data/gcmd/gcmd-keywords |
| The Gene Ontology Vocabulary | gene product characteristics, gene product annotation | http://amigo.geneontology.org/amigo/dd_browse |
| USGS Thesauri | agriculture, forest, fisheries, Earth sciences, life sciences, engineering, planetary sciences, social sciences, etc. | https://www1.usgs.gov/csas/biocomplexity_thesaurus/index.html |
| IUPAC Gold Book | compendium of chemical terminology from the International Union of Pure and Applied Chemistry (IUPAC) | https://goldbook.iupac.org |
Recommended content
Recommended minimum content for data re-use is in bold.
General information
- Provide a title for the dataset
- Name/institution/address/email information for:
  - Principal investigator (or person responsible for collecting the data)
  - Associate or co-investigators
  - Contact person for questions
- Date of data collection (can be a single date, or a range)
- Information about geographic location of data collection
- Keywords used to describe the data topic
- Language information
- Information about funding sources that supported the collection of the data
Data and file overview
- For each filename, a short description of what data it contains
- Format of the file if not obvious from the file name
- If the data set includes multiple files that relate to one another, the relationship between the files or a description of the file structure that holds them (possible terminology might include 'dataset' or 'study' or 'data package')
- Date that the file was created
- Date(s) that the file(s) was updated (versioned) and the nature of the update(s), if applicable
- Information about related data collected but that is not in the described dataset
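The per-file entries above can be drafted automatically and then annotated by hand. The following is a rough sketch, assuming the data files sit under a single directory. The function name `file_inventory` is hypothetical, and it reports the last-modified date because a true creation date is not reliably available on every platform.

```python
from datetime import datetime, timezone
from pathlib import Path


def file_inventory(data_dir):
    """Return one 'name | size | last modified' line per file under data_dir."""
    root = Path(data_dir)
    lines = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        # Last-modified time, reported in UTC as an ISO 8601 date.
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        lines.append(
            f"{path.relative_to(root).as_posix()} | {stat.st_size} bytes | "
            f"{modified.strftime('%Y-%m-%d')}"
        )
    return lines
```

Calling `file_inventory("my_dataset")` on a directory of your own (the name here is a placeholder) gives a starting list. The short description of what each file contains still has to be written by someone who knows the data.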
Sharing and access information
- Licenses or restrictions placed on the data
- Links to publications that cite or use the data
- Links to other publicly accessible locations of the data (see best practices for sharing data for more information about identifying repositories)
- Recommended citation for the data (see best practices for data citation)
Methodological information
- Description of methods for data collection or generation (include links or references to publications or other documentation containing experimental design or protocols used)
- Description of methods used for data processing (describe how the data were generated from the raw or collected data)
- Any software or instrument-specific information needed to understand or interpret the data, including software and hardware version numbers
- Standards and calibration information, if appropriate
- Describe any quality-assurance procedures performed on the data
- Definitions of codes or symbols used to flag low-quality or questionable values and outliers that people should be aware of
- People involved with sample collection, processing, analysis and/or submission
Data-specific information
*Repeat this section as needed for each dataset (or file, as appropriate)*
- Number of variables, and number of cases or rows
- Variable list, including full names and definitions (spell out abbreviated words) of column headings for tabular data
- Units of measurement
- Definitions for codes or symbols used to record missing data
- Specialized formats or other abbreviations used
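For tabular data, the variable count, row count, and missing-value codes can be gathered with a short script before the definitions and units are written by hand. This is a minimal sketch using only the standard library. The function name `summarize_csv`, the sample columns, and the choice of `""` and `"NA"` as missing-data codes are assumptions for illustration.

```python
import csv
import io


def summarize_csv(text, missing_codes=("", "NA")):
    """Summarize CSV text: variable names, row count, missing values per variable."""
    reader = csv.DictReader(io.StringIO(text))
    variables = list(reader.fieldnames or [])
    missing = {name: 0 for name in variables}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in variables:
            value = row[name]
            # A short row yields None; otherwise compare against the missing codes.
            if value is None or value.strip() in missing_codes:
                missing[name] += 1
    return {"variables": variables, "rows": row_count, "missing": missing}


# Made-up sample data with two missing values.
sample = "site_id,temp_c,depth_m\nA1,12.5,3\nA2,NA,4\nA3,11.0,\n"
summary = summarize_csv(sample)
print(summary["rows"])       # 3
print(summary["variables"])  # ['site_id', 'temp_c', 'depth_m']
print(summary["missing"])    # {'site_id': 0, 'temp_c': 1, 'depth_m': 1}
```

The script only counts. Full variable names, units (for example, that `temp_c` is in degrees Celsius), and the meaning of each code still belong in the readme text itself.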
References
The preceding guidelines have been adapted from several sources, including:
Best practices for creating reusable data publications. Dryad. 2019. https://datadryad.org/stash/best_practices
Introduction to Ecological Metadata Language (EML). The Knowledge Network for Biocomplexity. 2012. https://web.archive.org/web/20120424124714/http://knb.ecoinformatics.org/eml_metadata_guide.html
Related information
Document and Store Data Using Stable File Formats. DataONE. http://www.dataone.org/best-practices/document-and-store-data-using-stable-file-formats. Useful information about file formats.
File formats. Cornell Research Data Management Service Group. http://data.research.cornell.edu/content/file-formats
File management. Cornell Research Data Management Service Group. http://data.research.cornell.edu/content/file-management
Introduction to Intellectual Property Rights in Data Management. Cornell Research Data Management Service Group. http://data.research.cornell.edu/content/intellectual-property
Metadata and Describing Data. Cornell Research Data Management Service Group. http://data.research.cornell.edu/content/writing-metadata
Azure Repos | Azure DevOps Server 2020 | Azure DevOps Server 2019 | TFS 2018 | TFS 2017 | TFS 2015
Your Git repo should have a readme file so that viewers know what your code does and how they can get started using it. Your readme should speak to the following audiences:
- Users that just want to run your code
- Developers that want to build and test your code. Developers are also users.
- Contributors that want to submit changes to your code. Contributors are both developers and users.
Write your readme in Markdown instead of plain text. Markdown makes it easy to format text, include images, and link as needed to additional documentation from your readme.
Create an intro
Start your readme off with a short explanation describing your project. Add a screenshot or animated GIF in your intro if your project has a user interface. If your code relies on another application or library, make sure to state those dependencies in the intro or right below it. Apps and tools that run only on specific platforms should have the supported operating system versions noted in this section of the readme.
Help your users get started
Guide users through getting your code up and running on their own system in the next section of your readme. Stay focused on the essential steps to get started with your code. Link to the required versions of any prerequisite software so users can get to them easily. If you have complex setup steps, document those outside your readme and link to them.
Point out where to get the latest release of your code. A binary installer or instructions on using your code through packaging tools is best. If your project is a library or an interface to an API, put a code snippet showing basic usage and show sample output for the code in that snippet.
Provide build steps for developers
Use the next section of your readme to show developers how to build your code from a fresh clone of the repo and run any included tests.
Give details about the tools needed to build the code and document the steps to configure them to get a clean build.
Break out dense or complex build instructions into a separate page in your documentation and link to it if needed.
Run through the instructions as you write them in order to verify the instructions would work for a new contributor.
Remember, the developer relying on these instructions could be yourself after not working on a project for some time.
Provide the commands to run any test cases provided with the source code after the build is successful. Developers lean on these test cases to ensure that they don't break your code as they make changes. Good test cases also serve as samples developers can use to build their own test cases when adding new functionality.
Help users contribute
The last section of your readme helps users and developers get involved to report problems and suggest ideas to make your code better. Users should be linked to channels where they can open up bugs, request features, or get help using your code.
Developers need to know what rules they need to follow to contribute changes, such as coding/testing guidelines and pull request requirements. If you require a contributor agreement to accept pull requests or enforce a community code of conduct, this process should be linked to or documented in this section. State what license the code is released under and link to the full text of the license.
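One way to keep a readme aligned with the sections described above is a small check that looks for a heading for each one. The sketch below is only an illustration. The section names and keywords in `EXPECTED_SECTIONS` are assumptions rather than a required layout, and it recognizes only `#`-style Markdown headings.

```python
import re

# Hypothetical mapping from each expected section to heading keywords.
EXPECTED_SECTIONS = {
    "getting started": ("getting started", "install", "usage"),
    "build and test": ("build", "test"),
    "contributing": ("contribut",),
    "license": ("license",),
}


def missing_sections(readme_text):
    """Return the expected sections that have no matching Markdown heading."""
    headings = [
        match.group(1).lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", readme_text, flags=re.MULTILINE
        )
    ]
    return [
        section
        for section, keywords in EXPECTED_SECTIONS.items()
        if not any(keyword in heading for heading in headings for keyword in keywords)
    ]


# Made-up readme text that lacks a contributing section.
example = (
    "# My project\n\nShort intro.\n\n"
    "## Getting started\n\n## Build and test\n\n## License\n"
)
print(missing_sections(example))  # ['contributing']
```

A check like this finds only missing headings. Whether the instructions under each heading actually work still has to be verified by running through them.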