Guide to self-preservation
Learn how to manage your data, keep it accessible and usable, and extend its lifespan for as long as possible.
About the guide
There are many risks that can negatively impact your digital records. These include:
- Software and hardware obsolescence or failure
- File degradation or corruption
- Physical damage to storage media, such as flash drives or hard drives
- Human error, such as accidental deletion of files
Considering these risks, your digital records have an average lifespan of 5-15 years, depending on the software and hardware you choose to maintain your data and a number of other variables. You can help extend the life of your digital records through a variety of general and format-specific strategies. Useful strategies for creating and keeping good data are included in this guide.
Who should use this guide?
The Libraries’ guide to self-preservation aims to support your data management at work, at home, and in your research. It may be especially useful for the following groups:
- Researchers who are creating data with the end-goal of depositing it in a repository to comply with requirements set by their funder, publisher, institution, or other party should review General strategies and the Research data section under Content-specific considerations.
- University faculties, offices, departments and units who are creating records that need to be retained for over 5 years, or that are defined as archival in the University’s Common Records Schedule, should review the General strategies section.
- Private donors who are creating records that they intend to transfer to the University of Manitoba Archives & Special Collections should review General strategies and the Personal archiving section under Content-specific considerations.
This guide may be of further interest for general purposes, such as managing and preserving your own personal records.
Storage and back-ups
Consider where all of your records are stored: on your computer, your phone, your tablet, social media accounts, as e-mail attachments, in Dropbox, Google Drive, and other file storage services, or on external media such as flash drives, optical media, or external hard drives.
The copies you maintain and back up should consolidate important records from across these storage areas. File storage services and social media have changed their policies in the past to limit how much users can upload and have even deleted user content in response to these policy changes.
Third party platforms are not required to maintain your data. External media has a shelf life and will become unreadable and inaccessible over time. Make sure to have a local copy of all of your important or meaningful records and refresh your hardware as needed.
Having copies of your data will ensure you have a back-up in the event of data loss. The 3-2-1 rule is a good approach to follow: Keep 3 copies, on 2 different types of storage media (example: 1 laptop, 2 external hard drives) and store one copy in a separate location.
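The 3-2-1 rule above can be expressed as a simple check. The following is a minimal sketch, not a prescribed tool: the `Copy` structure, its field names, and the example labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of your records (hypothetical structure for illustration)."""
    label: str        # e.g. "laptop"
    media_type: str   # e.g. "internal-ssd", "external-hdd"
    location: str     # e.g. "home", "office"


def meets_3_2_1(copies):
    """Return True if the copies satisfy the 3-2-1 rule."""
    enough_copies = len(copies) >= 3
    enough_media = len({c.media_type for c in copies}) >= 2
    # "One copy in a separate location" implies at least two distinct locations.
    separate_location = len({c.location for c in copies}) >= 2
    return enough_copies and enough_media and separate_location


copies = [
    Copy("laptop", "internal-ssd", "home"),
    Copy("drive-a", "external-hdd", "home"),
    Copy("drive-b", "external-hdd", "office"),
]
print(meets_3_2_1(copies))  # True
```

Removing any one of the three copies, or keeping all of them in the same place, would make the check fail.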
Copies should be checked routinely, including after files are transferred and/or backed up. If you open a copy of your files only to view them, close the files without saving to avoid unintentional changes to these copies. Consider the type of back-up to perform:
- Full back-ups facilitate comprehensive data recovery by creating a copy of all data, but they may require significant storage space.
- Incremental back-ups involve creating a copy of all data that is new or modified since the most recent back-up, whether full or incremental. They require less storage space, but a full back-up may still be required periodically to ensure comprehensive data recovery.
- Differential back-ups involve creating a copy of all data that is new or modified since the last full back-up. They require more storage space than incremental back-ups. For comprehensive data recovery, the initial full back-up and the last differential back-up are required.
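The difference between the three back-up types comes down to which files are selected each time. The sketch below illustrates this under stated assumptions: files are represented as a dictionary of hypothetical paths and modification times (plain numbers), and an incremental back-up is measured from the most recent back-up of any type, which is a common convention. It only selects files; it does not copy anything.

```python
def select_for_backup(files, kind, last_full, last_backup):
    """Return the paths a back-up of the given kind would copy.

    files       -- dict mapping path to modification time (any comparable number)
    kind        -- "full", "incremental", or "differential"
    last_full   -- time of the last full back-up
    last_backup -- time of the most recent back-up of any type
    """
    if kind == "full":
        return sorted(files)  # everything, regardless of modification time
    if kind == "incremental":
        cutoff = last_backup  # only changes since the most recent back-up
    elif kind == "differential":
        cutoff = last_full    # all changes since the last full back-up
    else:
        raise ValueError(f"unknown back-up type: {kind}")
    return sorted(path for path, modified in files.items() if modified > cutoff)


# Hypothetical example: full back-up at time 10, incremental back-up at time 20.
files = {"a.txt": 5, "b.txt": 15, "c.txt": 25}
print(select_for_backup(files, "full", last_full=10, last_backup=20))
print(select_for_backup(files, "incremental", last_full=10, last_backup=20))
print(select_for_backup(files, "differential", last_full=10, last_backup=20))
```

In this example the incremental selection contains only `c.txt`, while the differential selection contains both `b.txt` and `c.txt`, which is why differential back-ups grow larger until the next full back-up.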
Hardware, software, and format sustainability
Keeping your software and hardware up to date will help keep your data accessible by improving your ability to open and read your files. It also makes your data more easily accessible to others who may require access.
Replace your storage media every 5-7 years to stay ahead of technological changes and keep your data accessible.
Choosing commonly used and accessible formats will help ensure your files remain accessible. Where possible, avoid proprietary formats as copyright protections limit the accessibility of such formats outside of their designed software. Overall, file formats should be non-proprietary and/or commonly used, either more broadly or within a specific discipline or research community. File formats should also be uncompressed and unencrypted.
It’s also important to think about your long-term plans before choosing a format. Lossless formats, such as TIFFs, are more stable, but can use up more storage space and be less portable, meaning that they are not ideal formats if you intend to share files as e-mail attachments or through other methods. JPEGs are lossy, but are still relatively stable and commonly used and will be better suited for file sharing.
Using software that runs on different operating systems, and that is regularly upgraded to the latest stable version, will further ensure that your files remain usable and can be more widely accessed.
Metadata and file naming
Metadata, or data about data, is descriptive information that tells you what a file is and helps you better manage, store, and find your data. It’s important to have a folder system and file naming system that is clear and intuitive and that is applied consistently across your records.
If you are organizing your research files, documentation such as readme files, data dictionaries, and code books can also be a useful method of recording metadata about the research data.
File names should be no longer than 25 characters, and should facilitate file identification and retrieval. How you name your files will also impact the order in which they appear in any given directory, as files typically sort alphabetically by file name.
Avoid non-alphanumeric characters such as punctuation, ampersands or asterisks in your file names. Spaces can also sometimes be problematic in file and folder names. Consider the following alternatives:
- kebab-case: where hyphens are used instead of spaces. Example: file-name
- camelCase: where the first letter of each word, with the exception of the first word, is capitalized. Example: fileName
- snake_case: where underscores are used instead of spaces. Example: file_name
- PascalCase: where the first letter of each word is capitalized. Example: FileName
You can also use a combination of these alternatives to separate values in a file name. For example, if your file name includes the author name and a date, snake case could be used as a separator between name and date, while kebab case could be used as a separator within each separate value. Example: smith-jane_2021-09-28
If you are keeping different versions of a document, you should also make sure the file name clearly expresses the version. Example: fileName-v1
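The naming advice above can be applied consistently with a small helper. This is a sketch under assumptions: the function names are hypothetical, the version is appended as a third underscore-separated value (one possible choice), and the 25-character limit is taken from the guidance in this section.

```python
import re
import unicodedata
from datetime import date


def kebab(text):
    """Lowercase the words in text and join them with hyphens (kebab-case)."""
    # Strip accents so only plain alphanumeric characters remain.
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    words = re.findall(r"[A-Za-z0-9]+", ascii_text)
    return "-".join(word.lower() for word in words)


def build_file_name(author, created, version=None, max_length=25):
    """Join values with underscores; use hyphens within each value."""
    parts = [kebab(author), created.isoformat()]  # ISO dates sort chronologically
    if version is not None:
        parts.append(f"v{version}")
    name = "_".join(parts)
    if len(name) > max_length:
        raise ValueError(f"{name!r} is longer than {max_length} characters")
    return name


print(build_file_name("Smith, Jane", date(2021, 9, 28)))             # smith-jane_2021-09-28
print(build_file_name("Smith, Jane", date(2021, 9, 28), version=1))  # smith-jane_2021-09-28_v1
```

Writing dates as year-month-day means that files with the same author sort in chronological order when a directory is sorted alphabetically.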
Selection and retention
Periodically clean your files to ensure you’re using your space efficiently. Some questions that can help you decide what should be kept include:
- Is there a business or research purpose that requires me to keep these records?
- Are these records unique or meaningful?
- What resources in terms of time and money would be involved in replacing these files if they were lost?
- Are these files duplicated or available elsewhere?
- Do I have similar content, such as an earlier draft of the same record, or multiple photographs of the same thing? If yes, which is the best version to keep?
Keeping everything may cost you more over time in terms of storage costs, and may make it more difficult to find and retrieve relevant data if you are intending to deposit it with an institution for long-term preservation in the future.
Avoid creating dependencies between files that might be lost when transferring records over. For example, if you hyperlink text in one document so that it links to another document on your computer, this link will only be functional on your local computer, and only as long as the files remain in the same location. If you save a copy of those files to a separate storage device, or move the file to a different directory on your computer, these links will break.
As an alternative, use a descriptive hyperlink so that you, or other potential users of your data know how to retrieve the content if the link breaks. For example, include the title of the file and other contextual information so that you or a secondary user can retrieve the file.
Files can be altered inadvertently, both through human error, and through other processes that may be operating in the background without your knowledge. You can monitor these changes in numerous ways.
Anti-virus software can alert you to malicious actions that may cause changes to your files. You can also install software that monitors and reports on file integrity. Free tools, such as Fixity by AVP, can be installed on your storage devices. Routinely running this software will help alert you to unintentional changes to your files.
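The general idea behind file integrity monitoring is to record a checksum for each file and compare it against a fresh checksum later. The sketch below illustrates that idea with Python's standard library; it is not how Fixity or any other specific tool is implemented, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_manifests(old, new):
    """Report files whose checksum changed, that went missing, or that were added."""
    return {
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
        "missing": sorted(old.keys() - new.keys()),
        "added": sorted(new.keys() - old.keys()),
    }
```

A typical routine would be to call `build_manifest` on a folder once, save the result, and later compare it with a newly built manifest of the same folder or of a back-up copy. Any entry under "changed" or "missing" is a file worth investigating.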
Ensuring your data is secure is particularly important for those working with sensitive and/or private data that could cause harm to others if unintentionally released.
Consider data security not only when storing your data on your computer or other storage devices, but also when sharing your data with relevant parties. If multiple people need access to the data for various reasons, make sure their access permissions are specific to their needs. For example, if a person only needs to view a file, make sure they do not have editing permissions.
Encrypting your data with a strong password can also help keep your data secure, but it is important to remember your encryption key to avoid being locked out of your own data.
For University of Manitoba researchers, the Libraries supports an instance of Dataverse, which can be used as a secure file share system, where specific user permissions can be assigned. For the wider University of Manitoba community, the Access and Privacy Office has a Quick Reference guide for Data Sharing and Storage Guidelines.
Alongside general considerations, you should also consider content-specific issues that might impact the lifespan and preservability of your digital records. The resources below may help you to better format your content.
Research data includes raw data, curated data, published data, as well as metadata related to each of these record types.
Depending on your research, the University, funders, publishers, and other parties may have expectations or requirements about how you manage your data across the research lifecycle. Such requirements should be addressed in a Data Management Plan (DMP) and may include:
File and folder organization
Consistent file naming strategies are an essential part of managing and documenting data. These strategies include:
- As part of a data management plan (DMP) and/or at the beginning of the project, spend time considering both the folder hierarchy and file naming conventions for the project. Consider how you or others will look for and access the files at a later date: would you think about them by type, location, study name or something else?
- Create a folder hierarchy that aligns with the project, considering all aspects of the project.
- Develop a file naming scheme that includes important metadata.
- Use a readme file to document the meaning of any codes or abbreviations used in file names, so that others can adopt the naming scheme easily.
- Check and adopt established file naming conventions. Many disciplines have recommendations.
Documentation and metadata
Good metadata helps you to understand your data in detail and helps other researchers discover, use and properly cite your data.
Your metadata should help you and all those accessing your data understand how the data is organized, how it was generated (including software and equipment used), and how the data has been altered or processed.
Your documentation should provide an explanation of codes, abbreviations, or variables used in the data or in the file naming structure in a readme file or code book that accompanies the data management plan. It's also important to keep notes about sources of data so that you and others can find and cite it. Metadata should be documented in accessible file formats, such as txt or csv files. Some research software applications may auto-create metadata and record it in an xml format.
Some metadata elements apply broadly, while others are discipline-specific. The following are recommended elements for research data across disciplines:
- Title: Name of the project or collection of data
- Creator: Names of the data creator(s) and collector(s)
- Dates: Dates associated with the data such as creation, modification and transfer
- Description: An overview of the research project (methodology, instruments, sample(s), validation, etc.)
- Keywords: Keywords that describe the content of the data or datasets
- Identifier(s): Unique alphanumeric codes used to identify data collections
- Location: The physical and digital location(s) of the research data
- Language(s): Languages of the data
- Repository: Identification of the repository where your data is stored or will be stored
- Funding agencies: Agencies that have funded the research project
- Access restrictions: Information about who can access the data and at what stage in the research project
- Format(s): The format(s) in which your data resides
In addition to these elements, file formats and discipline-specific data repositories often dictate which metadata standard is used. Factors such as data volume and complexity, as well as financial, human, and material resources, may also influence which metadata standard is chosen for a research project. Additional documentation may include data dictionaries, field notes, code/lab books, and record layouts.
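The recommended elements listed above could be recorded in a plain-text readme that travels with the data. The following is a minimal sketch, assuming a simple "Element: value" layout; the function names and layout are illustrative and do not follow any particular metadata standard.

```python
# Element names taken from the recommended list in this guide.
ELEMENTS = [
    "Title", "Creator", "Dates", "Description", "Keywords", "Identifier(s)",
    "Location", "Language(s)", "Repository", "Funding agencies",
    "Access restrictions", "Format(s)",
]


def missing_elements(metadata):
    """List the recommended elements that have not been recorded yet."""
    return [name for name in ELEMENTS if not metadata.get(name)]


def render_readme(metadata):
    """Render recorded elements as 'Element: value' lines in the recommended order."""
    lines = []
    for name in ELEMENTS:
        value = metadata.get(name)
        if not value:
            continue  # leave unrecorded elements out rather than guessing
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(item) for item in value)
        lines.append(f"{name}: {value}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only.
metadata = {
    "Title": "Example project",
    "Creator": ["Smith, Jane"],
    "Format(s)": ["csv", "txt"],
}
print(render_readme(metadata), end="")
print(missing_elements(metadata))
```

Saving the rendered text as a txt file keeps the metadata in an accessible format, and the list of missing elements shows what still needs to be documented.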
Metadata standards vary, but many data repositories, disciplines and organizations have developed specific metadata standards such as:
- Darwin Core, for the biological sciences, which contains data elements specific to the discipline
- QuDEx (Qualitative Data Exchange Format), an exchange model for the archiving and interchange of data
Data storage and back-ups
For storage and processing of your data during the active phase of research, there are several options available via Information Services and Technology's Research Computing.
For secure data capture, the Centre for Healthcare Innovation provides REDCap to meet the need for cloud-based data capture in research involving highly sensitive data. The University of Manitoba Libraries supports UM Dataverse, a secure data repository that can be used as a file share system for research teams.
A singular and/or irregular back-up of research data will not satisfy the requirements of a Data Management Plan, nor will it support the recovery of lost or corrupted data. For research data, a reliable back-up strategy involves regularly scheduled full back-ups occurring on a weekly to monthly basis, with daily incremental or differential back-ups. Solutions for backing up your research data include:
- Hard drives (either external to or built into your desktop workstation)
- Institutional server storage
- Office/lab Network Attached Storage (NAS)
For more information, review IST's "Using SharePoint and OneDrive for Research".
Data security is a critical element of any University research project and it is a major component of all Data Management Plans to protect your research from unauthorized access, modification, and disclosure. Researchers should be aware of their institutional policies on data security. More information is available through IST.
For research data, data security happens at three levels:
- Online: Keep confidential data offline. Do not upload sensitive data to cloud storage or file transfer applications. Ensure your research network is secure.
- Hardware: Consider encrypting your data on your workstation, and on other storage devices (at rest and in transit) whenever possible. Implement Two-Factor Authentication for your research desktop(s) and laptop(s). Acknowledge the inherent risks associated with using multiple removable storage media and adapt your security measures accordingly.
- Physical spaces: Restrict or limit access to physical research areas such as offices and labs.
Research data should remain accessible after project completion. Data preservation is different from data storage in that preservation involves the ongoing maintenance of data over time (5 to 10 years). This maintenance includes: fixity checking (ensuring data remains unchanged), data repair when needed, content and system auditing and monitoring, and file format migration.
To ensure long-term access, it is best practice to choose open file formats, whether or not you intend, or are required, to make your data open. Because the specifications for these formats are publicly available, the open-source software community can ensure that data stored in them remains accessible into the future.
CARE principles for Indigenous Data Governance
CARE principles provide guiding principles for research data related to Indigenous Peoples and are intended to complement FAIR principles (see below). Note that data specific to First Nations should comply both with CARE principles and OCAP® principles.
Data curation guides
The Data Curation Network has created a series of peer-reviewed, format-specific guides on how to curate data to ensure it remains usable and understandable over time. These guides cover a wide range of formats and file types used in various subject and disciplinary areas, including:
Data Curation Network CURATED checklist
The Data Curation Network has created a list of steps that help those creating datasets to ensure that their data will remain findable, accessible, interoperable and re-usable over time in accordance with FAIR principles.
Dataverse Curation Guide
The Curation Expert Group's Dataverse Curation Guide Working Group for the New Digital Research Infrastructure Organization has created a guide adapted from the Data Curation Network's CURATED steps. The guide provides instructions on curating data at three different levels: unmediated, semi-mediated, and mediated curation.
FAIR principles provide guidance on how to improve your data's Findability, Accessibility, Interoperability, and Reusability.
First Nations principles of OCAP®
Research data about or related to First Nations should comply with OCAP® principles of ownership, control, access, and possession.
Get support from the Libraries
The Libraries offers research data management support to help you in this process.
When planning a digitization project, using best practices can help ensure you produce high quality images. These practices include:
- Scanning documents and photographs at 600 dpi
- Scanning negatives, slides, and smaller content at 2400 dpi
- Choosing 24-bit colour
- Formatting digitized images as TIFFs for preservation and/or JPEGs for access. If you are digitizing textual images, you may want to select PDFs so that the content may be OCR’d (optical character recognition), which will allow you to search for key words in the text more easily.
These guidelines are useful when digitizing images for publication purposes, to reduce wear and tear on the records you are digitizing, or for other long-term uses; but there may be cases where scanning at a lower resolution is acceptable depending on the purpose of your digitization work.
Low resolution JPEGs are acceptable for short-term reference or working copies. If digitizing at a lower resolution, you should consider all future uses, as a high resolution image can be converted to a lower resolution later on as needed, but a low resolution image cannot be converted to a higher resolution.
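When deciding on a resolution, it can help to estimate how large the resulting images will be. The arithmetic below is a rough sketch: it counts uncompressed pixel data only, so actual TIFF or JPEG files will differ depending on compression and embedded metadata. The function names and the letter-size page are assumptions chosen for illustration.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan, given physical size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_in, height_in, dpi, bit_depth=24):
    """Size of the raw pixel data: pixels multiplied by bytes per pixel."""
    width_px, height_px = pixel_dimensions(width_in, height_in, dpi)
    return width_px * height_px * bit_depth // 8


# A letter-size page (8.5 x 11 inches) scanned in 24-bit colour.
print(pixel_dimensions(8.5, 11, 600))    # (5100, 6600)
print(uncompressed_bytes(8.5, 11, 600))  # 100980000 bytes, roughly 96 MiB
print(uncompressed_bytes(8.5, 11, 300))  # 25245000 bytes, one quarter of the above
```

Halving the resolution reduces the pixel data to one quarter, which is why lower resolutions are attractive for working copies even though the lost detail cannot be recovered later.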
The Library of Congress Personal Archiving Guide includes tips on how to preserve digital photographs, audio, video, e-mail, websites, and other records.
Some file formats better support long-term preservation, or access. Archivematica’s format policies provide a list of preservation and access formats for various types of content such as images, audio, video, and textual records. The formats outlined by Archivematica are widely used in archival institutions to support long-term digital preservation and access of the records they maintain.