Organising your data and archiving

Working with data

Organising data correctly is easily overlooked when ideas are flowing. However, well-organised data is easier to access when needed, for both you and others on your project.

Be clear about where you will store the data, and agree early on how it will be catalogued. This will make things easier in the long term and prevent duplication. Even if someone leaves the project, you will still be able to retrieve the data you need with ease.

Records Management have provided good practice guides for organising your data, including guidance on version control to help you avoid confusion and ensure everyone is working from the same document. The filing systems and naming conventions document contains useful guidance on creating a comprehensive filing system. JISC provides further information on choosing file names that are compatible with different operating systems.
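As an illustration of a naming convention that behaves the same on Windows, macOS and Linux, the following is a minimal sketch. The pattern (project, description, ISO date, zero-padded version) and the helper names `slug` and `build_filename` are assumptions made for this example; they are not the convention set out by Records Management or JISC.

```python
import re
from datetime import date


def slug(text: str) -> str:
    """Lower-case the text and replace every run of characters other than
    a-z and 0-9 (spaces, punctuation, accented letters) with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def build_filename(project: str, description: str, when: date,
                   version: int, extension: str) -> str:
    """Join the parts with underscores: project_description_date_version.ext"""
    parts = [slug(project), slug(description), when.isoformat(), f"v{version:02d}"]
    return "_".join(parts) + "." + extension.lower().lstrip(".")


# Hypothetical project and file description, for illustration only.
print(build_filename("Soil Survey", "Site A: pH readings", date(2024, 3, 5), 2, "CSV"))
# prints soil-survey_site-a-ph-readings_2024-03-05_v02.csv
```

Putting the date in year-month-day order and padding the version number means that files sort chronologically and by version in an ordinary directory listing.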

Whilst actively working with a dataset, you should use a file format that best suits the way you work. This will be dictated by the software that you prefer to use and the formats used in your particular discipline. Where you have flexibility, because your software supports several formats or you are writing your own software, it is best practice to use archive-suitable formats (see ‘Ready to archive data?’ below).

It is best practice to agree on an anonymisation protocol with your colleagues and collaborators whilst you are collating and working with data. Doing so early avoids additional time and expense near the end of a project. Anonymisation is particularly important when working with personal and sensitive data.
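One step that such a protocol might include is replacing direct identifiers with consistent codes before data is shared within the team. The sketch below assumes that approach. Note that it is pseudonymisation rather than full anonymisation, because anyone holding the key can regenerate the codes, and other fields may still identify a person. The field names, the key and the `pseudonymise` helper are hypothetical.

```python
import hashlib
import hmac


def pseudonymise(identifier: str, secret_key: bytes, length: int = 12) -> str:
    """Return a stable code for an identifier using a keyed hash (HMAC-SHA256)."""
    digest = hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length]


# Hypothetical key: in practice, store it securely and separately from the data.
key = b"replace-with-a-project-secret"

# Made-up records for illustration only.
records = [
    {"name": "Alex Example", "score": 41},
    {"name": "Sam Sample", "score": 37},
    {"name": "Alex Example", "score": 45},
]

# Replace the 'name' field with a code; the same name always maps to the same code.
shared = [
    {"participant": pseudonymise(r["name"], key), "score": r["score"]}
    for r in records
]

for row in shared:
    print(row)
```

Because the code is derived from the name and the key, colleagues who apply the same agreed key produce matching codes, so records can still be linked across files without the names being present.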

Have you ever lost your data?

How to avoid a data management nightmare

Further guidance from UK Data Archive


What to Keep

You do not need to archive everything. Preserving data costs time and money, as does reproducing it, so it is essential to consider carefully which data should be kept and which can be discarded.

You will need to consider the following:

  • what is needed to validate findings in your thesis/publications?
  • what might others conceivably find useful?
  • how expensive will it be to reproduce?
  • how expensive will it be to preserve?
  • are you obliged to destroy anything?

How-to guide "Five steps to decide what data to keep" (DCC)

Selecting what data to keep and what to bin

University of Cambridge FAQ about Data Formats


Ready to archive data?

Finished working with a particular dataset? Then you should convert it to a more stable, standard format for archiving. A common problem is finding that old files are now unreadable because the software that created them is no longer available.

Your archival format should be at least one of the following:

  • readable using free tools (ideally plain text): so it can be accessed without a potentially expensive licence
  • a well-documented standard: so a wide variety of software is available to access it
  • a de facto standard in your research area: so the majority of researchers you share it with can be expected to have access to the right software
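For tabular data, one way to meet the first criterion above is to export it as plain-text CSV. The following is a minimal sketch using only the Python standard library. The column names, values and the `export_csv` helper are hypothetical, and CSV is only one of several formats that could be suitable.

```python
import csv
import io


def export_csv(rows: list[dict], fieldnames: list[str], stream) -> None:
    """Write rows as CSV with a header line, so the columns are self-describing."""
    writer = csv.DictWriter(stream, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)


# Made-up measurements for illustration only.
rows = [
    {"sample_id": "S001", "depth_cm": 10, "ph": 6.8},
    {"sample_id": "S002", "depth_cm": 20, "ph": 7.1},
]

# An in-memory buffer keeps the sketch self-contained. For a real archive copy,
# open a file instead: open("data.csv", "w", newline="", encoding="utf-8").
buffer = io.StringIO(newline="")
export_csv(rows, ["sample_id", "depth_cm", "ph"], buffer)
print(buffer.getvalue())
```

Including units in the column names (as in `depth_cm`) keeps that information with the data even if accompanying documentation is lost.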

The UK Data Archive provides guidance about the quality assurance of data


Metadata & Keywords

When depositing your data into a repository, ensure that you complete the necessary fields. This will provide the minimum standard metadata for your record and aid discoverability.

The Liverpool Data Catalogue and other repositories also include a field for keywords that describe your data. Keywords associated with your data and project help others to find your deposit.

To ensure that related datasets are discoverable, add the same relevant keywords to all deposits.

If you have a large project coming to an end and no protocol has been agreed, then adding the project name and any reference number as keywords will ensure that all related items (often deposited by different sub-groups/projects and collaborators) can be connected.
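Consistent keywords across related deposits can be checked before depositing by comparing each record's keyword list against the agreed project keywords. The following is a minimal sketch. The record structure, titles, project name and reference number are hypothetical and do not reflect the actual fields of the Liverpool Data Catalogue.

```python
# Keywords that every deposit from the (hypothetical) project should carry.
PROJECT_KEYWORDS = {"example coastal project", "ref-0000"}

# Made-up deposit records for illustration only.
deposits = [
    {"title": "Tide gauge readings",
     "keywords": ["Example Coastal Project", "REF-0000", "tides"]},
    {"title": "Sediment cores",
     "keywords": ["sediment", "example coastal project"]},
]


def missing_keywords(deposit: dict, required: set[str]) -> set[str]:
    """Return the required keywords absent from a deposit,
    ignoring letter case and surrounding spaces."""
    present = {k.strip().lower() for k in deposit["keywords"]}
    return required - present


for d in deposits:
    gaps = missing_keywords(d, PROJECT_KEYWORDS)
    status = "ok" if not gaps else "missing: " + ", ".join(sorted(gaps))
    print(f"{d['title']}: {status}")
# prints:
# Tide gauge readings: ok
# Sediment cores: missing: ref-0000
```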

Data from large projects may also be deposited elsewhere. In this case, creating a metadata-only record in the Liverpool Data Catalogue, with the appropriate keywords, will also ensure that all related items are discoverable.