Data Storage Facilities

The DCCN has many data storage facilities, each with its own advantages and disadvantages. It also relies on the Radboud Data Repository, which is used by the entire RU. Knowing where data can and should be stored, and when it should be stored in each location, is crucial for being an effective and efficient researcher.

Local Storage

About

Local storage includes the storage on any local device, such as the C:/ drive on your DCCN-issued PC or the D:/ drive on lab computers.

Advantages

Using local storage can be helpful in multiple stages of the research cycle. One common use case is a software package that cannot be installed on High Performance Storage; in this case it is best to work with the data on the local storage of the PC issued to you by the Technical Group. Similarly, if you are collecting data in the lab and writing data while the experiment is running, you may wish to write the data to the local storage of the lab computer. Thus, some advantages are:

  • Easy to access and work with data

  • Can change and update software freely

  • Requires no training

Disadvantages

When conducting analyses on the PC issued to you by the Technical Group, you need to download your research data onto the local storage of your device. In such cases, the data MUST already be anonymized, so that a leak from your local storage cannot expose identifiable data. Downloading all of the research data may also take a long time, depending on the size of the data set you are analyzing. Moreover, such analyses can generally be run much faster on the HPC cluster, and they may require more RAM (i.e., working memory) than your PC has. Finally, your PC can crash at any moment, in which case all data in local storage can be lost; you must therefore constantly re-upload your data to High Performance Storage to mitigate potential data loss. Thus, some disadvantages are:

  • Constant risk of data loss

  • Potential for privacy breach

  • Downloading data is time-consuming

  • Requires constant re-uploading to mitigate potential data loss

  • Less RAM

  • Less storage space

  • Files are not visible to any other research team members

Ultimately, local storage is risky because data is not backed up anywhere, and it is inefficient for the reasons listed above. Nonetheless, it has its use cases, as long as you always take care to prevent data loss and privacy breaches.
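As an illustration of the re-uploading step, the following is a minimal Python sketch that copies every file from a local folder to another location and verifies each copy with a SHA-256 checksum. It is not an official DCCN tool. The function names (sha256_of, backup_tree) are made up for this example, and it assumes the destination (for instance a folder on your Project drive) is reachable as an ordinary path from your PC.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source: Path, destination: Path) -> list[Path]:
    """Copy every file under `source` to `destination` and verify each copy.

    Returns the relative paths of the copied files. Raises IOError if a
    copied file does not have the same checksum as its original.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src_file, dst_file)  # copies content and timestamps
        if sha256_of(src_file) != sha256_of(dst_file):
            raise IOError(f"Checksum mismatch for {relative}")
        copied.append(relative)
    return copied
```

A call could look like backup_tree(Path("D:/my_experiment"), Path("P:/my_project/raw")), where both paths are hypothetical placeholders for your own local folder and project folder.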

High Performance Storage

About

High Performance Storage includes several different drives, most notably the Home drive, where your personal files may be kept; the Groupshare drive, where your lab group’s shared files may be kept; and the Project drive, where your project files (including research data) are kept. These drives are mounted on the Network PCs in Trigon, such as those in the Instruction and Trainee rooms, as well as on all Lab PCs. High Performance Storage can also be used from the HPC cluster.

Advantages

  • Larger storage space than local storage on PCs

  • Easily accessible via both Network PCs and the HPC cluster

  • Easy to access and work with data

  • Works with parallelization on the HPC cluster, which can make analyses many times faster

  • Access to much more working memory (RAM) on the HPC cluster than on a local PC

  • Another layer of protection against data loss

Disadvantages

  • Some analysis packages or software cannot be installed by users (the TG may need time to make such software available)

  • Storage is limited to the duration of the research project

  • Can only be accessed by research team members who are checked into the DCCN

High Performance Storage is the workhorse of data analysis at the DCCN: for the vast majority of use cases it is the ideal place to store data that you will analyze, since it offers easy access to files and is set up to work with the other storage infrastructure. However, because space is limited, you cannot leave data on High Performance Storage indefinitely.
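Because space is limited, it can help to know how much a project folder currently holds. The following is a minimal Python sketch, not an official DCCN tool, that sums the file sizes under a folder. The function names (tree_size_bytes, format_size) are made up for this example, and it assumes the folder is reachable as an ordinary path.

```python
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under `root`."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units (KiB, MiB, GiB, TiB)."""
    units = ("B", "KiB", "MiB", "GiB", "TiB")
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{size:.1f} {units[index]}"
```

For example, print(format_size(tree_size_bytes(Path("P:/my_project")))) reports the size of a hypothetical project folder; the path is a placeholder.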

Radboud Data Repository

About

The Radboud Data Repository is where data is backed up and ultimately archived or published. It includes three types of data collections, which serve different purposes:

  • Data Acquisition Collections for raw data

  • Research Documentation Collections for scripts and logs outlining your intentions with your analyses

  • Data Sharing Collections for all data and analysis scripts used in creating the results reported in your manuscript

The endpoint of a Data Acquisition Collection (DAC) or a Research Documentation Collection (RDC) is archiving, which is intended only for internal use (i.e., amongst members of the project). The endpoint of a Data Sharing Collection (DSC) is publishing.

Advantages

  • Practically unlimited storage

  • Secure

  • Facilitates compliance with Findable and Accessible principles of FAIR, thereby meeting funder requirements, many journal requirements, and University guidelines

  • Data for publication is reviewed for compliance with FAIR principles and privacy laws by a data steward

Disadvantages

  • Cannot read/write files directly

  • Services are sometimes down for routine maintenance

  • Time investment needed to become familiar with the platform and to upload, archive, and publish data for a project

The Radboud Data Repository is the DCCN’s vault: it is where data that is no longer in active use is stored.

Take Home Messages

  • Different storage locations have different advantages and disadvantages, and DCCN policies are built around these

  • High Performance Storage and the Radboud Data Repository are the main storage locations we will use, but Local Storage has certain use cases