Data Storage Facilities

The DCCN has several data storage facilities, each with its own advantages and disadvantages. Knowing where data can and should be stored, and when it should be stored there, is crucial to being an effective and efficient researcher.

Local Storage

About

Local storage includes the storage on any local device, such as the C:/ drive on your DCCN-issued PC or personal laptop, as well as the D:/ drive on lab computers.

Advantages

Local storage can be helpful at several stages of the research cycle. One common use case is a software package that cannot be installed on High Performance Storage; in this case it is best to work with the data on the local storage of your personal PC or the PC issued to you by the Technical Group. Similarly, if you are collecting data in the lab and the data is written while the experiment is running, you may wish to write it to the local storage of the lab computer. Some advantages are:

  • Easy to access and work with data

  • Can change and update software freely

  • Requires no training

Disadvantages

To run analyses on your personal PC or the PC issued to you by the Technical Group, you need to download your research data onto the local storage of that device. In such cases the data MUST already be anonymized, in case of a data leak from your local storage. Downloading all of the research data may also take a long time, depending on the size of the data set. Moreover, such analyses can generally be run much faster on the HPC cluster and may require more RAM (i.e. working memory) than your PC has. Finally, your PC can crash at any moment, in which case all data in local storage can be lost; you must therefore constantly re-upload your data to High Performance Storage to mitigate potential data loss. Some disadvantages are:

  • Constant risk of data loss

  • Potential for privacy breach

  • Downloading data is time-consuming

  • Requires constant re-uploading to mitigate potential data loss

  • Less RAM than the HPC cluster

  • Less storage space than High Performance Storage

  • Files are not visible to any other research team members

Ultimately, local storage is risky because the data is not backed up anywhere, and it is inefficient for the reasons listed above. Nonetheless, it has its use cases, as long as you are always careful to prevent data loss and breaches of privacy.
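The re-uploading step described above can be partly automated. The following is a minimal sketch, not a DCCN-provided tool: it assumes the High Performance Storage drive is mounted on your PC and reachable as an ordinary folder, and the function names `sha256_of` and `back_up` are hypothetical. It copies new or changed files and verifies each copy with a checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, target_dir: Path) -> list[Path]:
    """Copy new or changed files from source_dir to target_dir.

    Returns the relative paths of the files that were copied.
    Both folder locations are supplied by the caller (assumption:
    the target is a mounted network drive).
    """
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(source_dir)
        target = target_dir / relative
        # Skip files whose content is already identical on the target.
        if target.exists() and sha256_of(target) == sha256_of(source):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Verify the copy before trusting it as a backup.
        if sha256_of(target) != sha256_of(source):
            raise IOError(f"Checksum mismatch after copying {relative}")
        copied.append(relative)
    return copied
```

Calling `back_up(local_folder, project_folder)` at the end of each working session, with both arguments set to your own folder locations, would copy only what has changed since the previous run. The script never deletes anything on either side.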

High Performance Storage

About

High Performance Storage includes several drives, most notably the Home drive, where your personal files may be kept; the Groupshare drive, where your lab group’s shared files may be kept; and the Project drive, where your project files (including research data) are kept. These drives are mounted on the network PCs in Trigon, such as those in the Instruction and Trainee rooms, as well as on all lab PCs. High Performance Storage is also accessible from the HPC cluster.

Advantages
  • Larger storage space than local storage on PCs

  • Easily accessible via both network PCs and the HPC cluster

  • Easy to access and work with data

  • Set up to work with parallelization on the HPC cluster, which can make analyses many times faster

  • Much more working memory available through the HPC cluster than on a local PC

Disadvantages
  • Some analysis packages or software cannot be installed by users

  • Not suitable for long-term storage

  • Can only be accessed by research team members who are checked into the DCCN

High Performance Storage is the workhorse of data analysis at the DCCN: for the vast majority of use cases it is the ideal place to store data that you will analyze, since it offers easy access to files and is set up to work with the other storage infrastructure. However, because space is limited, you cannot leave data on High Performance Storage for the long term.
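Because space is limited, it can help to check which folders in a project take up the most room before deciding what to move elsewhere. The sketch below is only an illustration under the assumption that the drive is mounted as an ordinary folder; `folder_sizes` and `format_report` are hypothetical helper names, and the numbers are plain file sizes, which may differ from what any quota system reports.

```python
from pathlib import Path


def folder_sizes(root: Path) -> dict[str, int]:
    """Return the total size in bytes of each top-level entry under root."""
    sizes = {}
    for entry in root.iterdir():
        if entry.is_dir():
            # Sum the sizes of all files below this folder.
            sizes[entry.name] = sum(
                f.stat().st_size for f in entry.rglob("*") if f.is_file()
            )
        elif entry.is_file():
            sizes[entry.name] = entry.stat().st_size
    return sizes


def format_report(sizes: dict[str, int]) -> str:
    """List entries from largest to smallest, with sizes in MiB."""
    ordered = sorted(sizes.items(), key=lambda item: item[1], reverse=True)
    return "\n".join(
        f"{size / 2**20:10.1f} MiB  {name}" for name, size in ordered
    )
```

Running `print(format_report(folder_sizes(project_folder)))`, with `project_folder` set to your own project location, lists the largest folders first.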

Radboud Data Repository

About

The Radboud Data Repository is where data is backed up. It includes three types of data collection, each serving a different purpose:

  • Data Acquisition Collections for raw data

  • Research Documentation Collections for scripts and logs documenting what you intend to do in your analyses

  • Data Sharing Collections for all data and analysis scripts used in creating the results reported in your manuscript

Advantages
  • Can store a lot of data

  • Secure

  • Complies with all funder requirements and privacy laws (unlike non-approved storage solutions)

Disadvantages
  • Cannot read/write files directly

  • Sometimes there are technical difficulties or services are down

The Radboud Data Repository is the DCCN’s vault, where data that is no longer in active use is stored.

Microsoft Teams

About

Microsoft Teams is a collaboration platform, recently adopted by Radboud University, that also offers cloud storage. Unlike the other storage locations, it is not endorsed or supported by the DCCN.

Advantages
  • Offers much more storage than is available with High Performance Storage

  • You can read and write files on Teams from local storage

  • External collaborators can read and write files

Disadvantages
  • You may not be in compliance with privacy and security regulations

  • Files are less easily accessible than local storage or High Performance Storage

  • RAM is still limited to what is available on your local PC, so running analyses is likely to take longer than on High Performance Storage

Microsoft Teams is Radboud University’s data storage solution for use during data analysis. It is less useful than High Performance Storage, but it has certain use cases.

Take Home Messages
  • Different storage locations have different pros and cons, around which DCCN policies are built

  • High Performance Storage and the Radboud Data Repository are the main storage locations we will use, but Local Storage and Microsoft Teams have certain use cases