Restoring Lost Data
As an example of how you might analyze data on the HPC Cluster, we’re going to analyze the data we ‘collected’ earlier using the MakeDataUp.m program.
However, before doing so we want to make sure that our data set is complete!
Let’s walk through an example of how you can restore data that was lost from the Project Folder but is still available in the DAC.
Practice Restoring Data with Stager
Since everyone needs to be able to analyze the data, let’s first duplicate the files in /project/3010000.05/raw/ into a new folder that you can change without affecting others’ work.
Copy the raw directory from the /project/3010000.05/ folder (the keyboard shortcut for copying is Ctrl + C).
Now, we assigned you a project number called XXXXXXX.XX: use this number to create a folder called /project/3010000.05/XXXXXXX.XX/.
Open this folder and paste /project/3010000.05/raw/ into it.
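The same copy can also be made from a terminal instead of the file explorer. The following is a minimal sketch, not part of the original instructions: the function name copy_raw is hypothetical, and it assumes you have read access to the shared raw folder and write access to your own folder.

```shell
#!/bin/bash
# Copy the shared raw data into a personal working folder.
# copy_raw is a hypothetical helper name used only for this sketch.
# Usage (paths from this tutorial; replace XXXXXXX.XX with your own number):
#   copy_raw /project/3010000.05/raw /project/3010000.05/XXXXXXX.XX
copy_raw() {
    src_dir=$1    # directory to copy, e.g. /project/3010000.05/raw
    dest_dir=$2   # personal folder that will receive the copy
    # Create the personal folder if it does not exist yet
    mkdir -p "$dest_dir" || return 1
    # -r copies the directory together with everything inside it
    cp -r "$src_dir" "$dest_dir/"
}
```

After running it, the copy is located at the destination folder followed by raw/, and the shared original is left untouched.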
Check for data loss between subjects number 1 to 10
Open Applications and go to File Explorer in the dropdown menu
At the file directory where it says /home/groupname/firlas/, replace it with /project/3010000.05/XXXXXXX.XX/raw/
Notice that there are no folders for sub-002, sub-005, and sub-006: this data has been accidentally deleted
Note
If you are attempting to follow these instructions for your own project, just delete /XXXXXXX.XX from /project/3010000.05/XXXXXXX.XX/raw/
Establish a Network Connection to Trigon (either eduVPN or hardwired)
Go to https://stager.dccn.nl
Log in to the Stager service
After login, the folders in the DCCN Project Storage are displayed on the left side of the screen.
Input your RDR data access credentials in the fields under the Radboud Data Repository section (revisit this page if you don’t remember where to find these)
Select the Radboud Data Repository directories to download
Double-click on dccn on the Radboud Data Repository side
Double-click on DAC_3010000.05_873 on the Radboud Data Repository side
Double-click on raw on the Radboud Data Repository side
Check the boxes next to the sub-002, sub-005, and sub-006 directories
Select the Project Storage directory to download the data into
Double-click on the 3010000.05/ directory on the Project Storage side
Double-click on the XXXXXXX.XX directory on the Project Storage side
Double-click on the raw directory on the Project Storage side
Press the Download button
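To confirm from a terminal which subject folders are missing, both before the download and after it has finished, a short check such as the one below can help. It is a sketch under assumptions: the function name list_missing_subjects is hypothetical, and it assumes subject folders are named sub-001 to sub-010 with three-digit zero padding, as the names sub-002, sub-005, and sub-006 suggest.

```shell
#!/bin/bash
# Print the name of every expected subject folder that does not exist inside
# the given raw directory. The range defaults to subjects 1 to 10.
# list_missing_subjects is a hypothetical helper name used only for this sketch.
# Usage:
#   list_missing_subjects /project/3010000.05/XXXXXXX.XX/raw
list_missing_subjects() {
    raw_dir=$1
    first=${2:-1}    # first subject number to check
    last=${3:-10}    # last subject number to check
    i=$first
    while [ "$i" -le "$last" ]; do
        # Assumed naming scheme: sub- followed by a zero-padded 3-digit number
        sub=$(printf 'sub-%03d' "$i")
        if [ ! -d "$raw_dir/$sub" ]; then
            echo "$sub"
        fi
        i=$((i + 1))
    done
}
```

Before the restore, this should list the three deleted folders. Once the download has completed it should print nothing.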
Practice Restoring Data with Repocli
Check for data loss between subjects number 1 to 10
Open a session in TigerVNC
Open Applications and go to File Explorer in the dropdown menu
At the file directory where it says /home/groupname/firlas/, replace it with /project/3010000.05/XXXXXXX.XX/raw/
Notice that there are no folders for sub-002, sub-005, and sub-006: this data has been accidentally deleted
Establish a Network Connection to Trigon (either eduVPN or hardwired)
Open a TigerVNC session (read how to do that here)
Log in to the Radboud Data Repository
Open the terminal application
Type repocli shell and then press Enter
Type config and then press Enter
Enter the username of the RDR data access credentials (u1234567@ru.nl) and then press Enter
Enter the password of the RDR data access credentials you retrieved in step 2, then press Enter
Download the Missing Subject Folders to Your Project Folder
Type get dccn/DAC_3010000.05_873/raw/sub-002 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter
Type get dccn/DAC_3010000.05_873/raw/sub-005 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter
Type get dccn/DAC_3010000.05_873/raw/sub-006 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter
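The three get commands differ only in the subject name, so they can be written as one loop. The sketch below is an illustration rather than an official recipe: the function name get_subjects is hypothetical, and it assumes that the RDR credentials have already been stored with config and that repocli get can be called directly from the terminal with the same source and destination arguments as inside repocli shell.

```shell
#!/bin/bash
# Download the named subject folders from a collection into a local raw folder,
# issuing one repocli call per subject.
# get_subjects is a hypothetical helper name used only for this sketch.
# Usage:
#   get_subjects dccn/DAC_3010000.05_873/raw /project/3010000.05/XXXXXXX.XX/raw \
#       sub-002 sub-005 sub-006
get_subjects() {
    collection_raw=$1   # e.g. dccn/DAC_3010000.05_873/raw
    local_raw=$2        # e.g. /project/3010000.05/XXXXXXX.XX/raw
    shift 2             # the remaining arguments are the subject names
    for sub in "$@"; do
        echo "Restoring $sub"
        # Assumption: same source/destination arguments as the get commands above
        repocli get "$collection_raw/$sub" "$local_raw/" || return 1
    done
}
```

Stopping at the first failed download (the `|| return 1`) is a design choice for this sketch, so that a network or login problem is noticed immediately rather than after all subjects have been attempted.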
Snapshot
If you accidentally delete one or more files, you may be able to retrieve them from a snapshot by simply copying and pasting. Snapshots are sporadic captures of the state of a computer system at a point in time. To read more about snapshots and how you can restore deleted data, visit this link on the intranet.
Advanced Example: Restoring All Missing Subject Directories
In the above exercise, we saw how to restore data from a DAC to your Project Folder. However, with many folders and subfolders to check, this can be tedious, inefficient, and prone to user error. So in this advanced example we will automate the process by creating a Bash script that runs on the HPC cluster.
Start a TigerVNC session
Run /project/3010000.05/scripts/makeMissing.sh
Open the terminal emulator and run the following code
cd /project/3010000.05/scripts/
chmod +x makeMissing.sh
./makeMissing.sh /project/3010000.05/XXXXXXX.XX/raw/
Create /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh
Open the text editor and write code that compares all DAC folders to Project Folder folders, restoring folders that are in the DAC but not the Project Folder.
Save the file as /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh
Hint 1: Enumerate all folders in the DAC
#!/bin/bash
repocli ls dccn/DAC_3010000.05_873/raw/
Hint 2: Go through each folder in the DAC
#!/bin/bash
# Loop over every entry that repocli lists in the DAC's raw folder
for sub_dir in $(repocli ls dccn/DAC_3010000.05_873/raw/); do
    echo "dccn/DAC_3010000.05_873/raw/${sub_dir}"
done
Inside the for loop, we’re just printing the subject’s directory
Answer
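#!/bin/bash
# Restore every DAC subject folder that is missing from the Project Folder
for sub_dir in $(repocli ls dccn/DAC_3010000.05_873/raw/); do
    # Download only when the folder does not already exist locally
    if [ ! -d "/project/3010000.05/XXXXXXX.XX/raw/${sub_dir}" ]; then
        repocli get "dccn/DAC_3010000.05_873/raw/${sub_dir}" "/project/3010000.05/XXXXXXX.XX/raw/${sub_dir}"
    fi
done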
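Before letting a script download anything, it can be useful to preview which folders it would restore. The variant below only prints the missing folder names and downloads nothing. It is a sketch: the function name preview_missing is hypothetical, and, like the answer above, it assumes that repocli ls prints the folder names as whitespace-separated words with no extra columns.

```shell
#!/bin/bash
# Print the DAC subject folders that are absent from the local raw folder,
# without downloading anything.
# preview_missing is a hypothetical helper name used only for this sketch.
# Usage:
#   preview_missing dccn/DAC_3010000.05_873/raw /project/3010000.05/XXXXXXX.XX/raw
preview_missing() {
    collection_raw=$1   # e.g. dccn/DAC_3010000.05_873/raw
    local_raw=$2        # e.g. /project/3010000.05/XXXXXXX.XX/raw
    # Assumption: each word of the listing is one folder name
    for sub_dir in $(repocli ls "$collection_raw/"); do
        if [ ! -d "$local_raw/$sub_dir" ]; then
            echo "$sub_dir"
        fi
    done
}
```

If the printed list matches what you expect to be missing, the restore script can then be run with more confidence.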
Run /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh
cd /project/3010000.05/XXXXXXX.XX/scripts/
chmod +x restoreMissing.sh
./restoreMissing.sh