Data Storage Facilities
The DCCN has many data storage facilities, each with its own advantages and disadvantages. Knowing where data can and should be stored, and when it should be stored in a given location, is crucial for being an effective and efficient researcher.
Local Storage
About
Local storage includes the storage on any local device, such as the C:/ drive on your DCCN-issued PC or your personal laptop, as well as the D:/ drive on lab computers.
Advantages
Using local storage can be helpful in multiple stages of the research cycle. One common use case is if you have a software package that cannot be downloaded on High Performance Storage - in this case it is best to work with data on the local storage of your personal PC or on the PC issued to you by the Technical Group. Similarly, if you are collecting data in the lab and you are writing data while the experiment is running, you may wish to write the data to the local storage of the lab computer. Thus, some advantages are:
Easy to access and work with data
Can change and update software freely
Requires no training
Disadvantages
When conducting analyses on your personal PC or the PC issued to you by the Technical Group, you will need to download your research data onto the local storage of your device. In such cases, the data MUST already be anonymized in case of a data leak from your local storage. Downloading all of the research data may also take a long time, depending on the size of the data set you are analyzing. In addition, such analyses can generally be run much faster on the HPC cluster and may require more RAM (i.e. working memory) than your PC has. Finally, your PC can crash at any moment, in which case all data in local storage can be lost; you must therefore constantly re-upload your data to High Performance Storage to mitigate potential data loss. Thus, some disadvantages are:
Constant involuntary risk of data loss
Potential for privacy breach
Downloading data is time-consuming
Requires constant re-uploading to mitigate potential data loss
Less RAM
Less storage space
Files are not visible to any other research team members
Ultimately, local storage is risky because data is not backed up anywhere, and it is inefficient for the reasons listed above. Nonetheless, it does have its use cases, though you must always be careful to prevent data loss and breaches of privacy.
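The re-uploading step described above can be scripted so that it is less error-prone. The following is a minimal sketch, not a DCCN-provided tool: it assumes the High Performance Storage drive is mounted as an ordinary directory on your machine, and the function names (sha256_of, back_up) and any paths you pass in are hypothetical. It copies new or changed files and verifies each copy with a SHA-256 checksum.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source_dir: Path, target_dir: Path) -> list[Path]:
    """Copy new or changed files from source_dir into target_dir.

    Both directories are assumptions supplied by the caller, e.g. a local
    data folder and a folder on a mounted project drive.
    Returns the list of target files that were written.
    """
    copied = []
    for source in sorted(source_dir.rglob("*")):
        if not source.is_file():
            continue
        target = target_dir / source.relative_to(source_dir)
        # Skip files whose content is already identical at the target.
        if target.exists() and sha256_of(target) == sha256_of(source):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)
        # Verify the copy before reporting it as backed up.
        if sha256_of(target) != sha256_of(source):
            raise OSError(f"Checksum mismatch after copying {source}")
        copied.append(target)
    return copied
```

Calling back_up with your local data folder and a folder on the mounted drive, for example at the end of each acquisition session, returns the files that were written; a second call with no changes returns an empty list. The sketch never deletes anything from either location.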
High Performance Storage
About
High Performance Storage includes several different drives, most notably the Home drive, where your personal files may be kept, the Groupshare drive, where your lab group’s shared files may be kept, and the Project drive, where your project files (including research data) are kept. These drives are mounted on Network PCs in Trigon, such as those in the Instruction and Trainee rooms, as well as on all Lab PCs. High Performance Storage is also compatible with the HPC cluster.
Advantages
Larger storage space than local storage on PCs
Easily accessible via both Network PCs and the HPC cluster
Easy to access and work with data
Set up to work with parallelization, making analysis many times faster
Much more working memory available for analyses than on a local PC
Disadvantages
Some analysis packages or software cannot be installed by users themselves
Not suitable for long-term storage
Can only be accessed by research team members who are checked into the DCCN
High Performance Storage is the workhorse of data analysis at the DCCN: for the vast majority of use cases it is the ideal place to store data that you will analyze, since it offers easy access to files and is set up to function with other storage infrastructure. However, due to limited space, you cannot leave data on High Performance Storage indefinitely.
Radboud Data Repository
About
The Radboud Data Repository is where data is backed up. It includes three types of data collections which serve different purposes:
Data Acquisition Collections for raw data
Research Documentation Collections for scripts and logs outlining your intentions with your analyses
Data Sharing Collections for all data and analysis scripts used in creating the results reported in your manuscript
Advantages
Can store a lot of data
Secure
Complies with all funder requirements and privacy laws (compared to non-approved storage solutions)
Disadvantages
Cannot read/write files directly
Sometimes there are technical difficulties or services are down
The Radboud Data Repository is the DCCN’s vault where data that is no longer being used is stored.
Microsoft Teams
About
Microsoft Teams is a new storage solution adopted by Radboud University. Microsoft Teams is a collaboration platform which also has a storage feature that functions as cloud storage. Unlike the other storage locations, it is not endorsed or supported by the DCCN.
Advantages
Offers much more storage than is available with High Performance Storage
You can read and write files on Teams from local storage
External collaborators can read and write files
Disadvantages
You may not be in compliance with privacy and security regulations
Files are less easily accessible than local storage or High Performance Storage
RAM is still limited to what is available on your local PC, so running analyses is likely to take longer compared to High Performance Storage
Microsoft Teams is Radboud University’s data storage solution during data analysis. It is less useful than High Performance Storage, but it has certain use cases.
Take Home Messages
Different storage locations have different pros and cons, and DCCN policies are built around them
High Performance Storage and the Radboud Data Repository are the main storage locations we will use, but Local Storage and Microsoft Teams have certain use cases