Restoring Lost Data

To show how you might analyze data on the HPC Cluster, we’re going to analyze the data we ‘collected’ earlier using the MakeDataUp.m program. Before doing so, however, we want to make sure that our data set is complete! Let’s walk through an example of how you can restore data that was lost from the Project Folder but is still in the DAC.

Practice Restoring Data with Stager

Because everyone needs to be able to analyze the data, let’s first duplicate the files in /project/3010000.05/raw/ into a new folder that you can change without affecting others’ work. Copy the raw directory from the /project/3010000.05/ folder (the keyboard shortcut to select everything is Ctrl + A). We assigned you a project number, written here as XXXXXXX.XX: use this number to create a folder called /project/3010000.05/XXXXXXX.XX/. Open this folder and paste the copied raw directory into it.
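The same copy can also be made from a terminal. The following is a minimal sketch, assuming a Bash shell on the cluster; the function name copy_raw is hypothetical and not part of any installed tool, and XXXXXXX.XX stands for your own project number.

```shell
#!/bin/bash
# copy_raw SRC_PROJECT WORK_DIR   (hypothetical helper, for illustration)
# Creates WORK_DIR if needed and copies SRC_PROJECT/raw into it,
# so that WORK_DIR/raw becomes a private working copy.
copy_raw() {
    local src_project="$1"
    local work_dir="$2"
    if [ ! -d "$src_project/raw" ]; then
        echo "No raw directory found in $src_project" >&2
        return 1
    fi
    mkdir -p "$work_dir" || return 1
    # -r copies the whole directory tree; -p preserves timestamps and permissions
    cp -rp "$src_project/raw" "$work_dir/"
}

# Example call (replace XXXXXXX.XX with your assigned project number):
# copy_raw /project/3010000.05 /project/3010000.05/XXXXXXX.XX
```

Wrapping the two commands in a function keeps the paths in one place, so a mistyped project number fails early instead of creating a stray folder.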

  1. Check for data loss among subjects 1 to 10

  • Open Applications and go to File Explorer in the dropdown menu

  • Where the file path reads /home/groupname/firlas/, replace it with /project/3010000.05/XXXXXXX.XX/raw/

  • Notice that there are no folders for sub-002, sub-005, and sub-006: this data has been accidentally deleted

Note

If you are attempting to follow these instructions for your own project, just remove /XXXXXXX.XX from the path /project/3010000.05/XXXXXXX.XX/raw/

  2. Establish a network connection to Trigon (either eduVPN or hardwired)

  3. Go to https://stager.dccn.nl

  4. Log in to the Stager service

  • After login, the folders in the DCCN Project Storage are displayed on the left side of the screen.

  • Enter your RDR data access credentials in the fields under the Radboud Data Repository section (revisit this page if you don’t remember where to find these)

  5. Select the Radboud Data Repository directories to download

  • Double-click on dccn on the Radboud Data Repository side

  • Double-click on DAC_3010000.05_873 on the Radboud Data Repository side

  • Double-click on raw on the Radboud Data Repository side

  • Check the boxes next to the sub-002, sub-005, and sub-006 directories

  6. Select the Project Storage directory to download the data into

  • Double-click on the 3010000.05/ directory on the Project Storage side

  • Double-click on the XXXXXXX.XX directory on the Project Storage side

  • Double-click on the raw directory on the Project Storage side

  7. Press the Download button
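
The check for missing subject folders can also be done from a terminal, for example to confirm after the download that nothing is still missing. The following is a minimal sketch, assuming the subject folders follow the sub-NNN naming seen above (sub-002, sub-005, ...); the function name list_missing_subjects is hypothetical.

```shell
#!/bin/bash
# list_missing_subjects RAW_DIR [FIRST] [LAST]   (hypothetical helper)
# Prints the name of every sub-NNN folder in the range FIRST..LAST
# (default 1..10) that does not exist inside RAW_DIR.
list_missing_subjects() {
    local raw_dir="$1"
    local first="${2:-1}"
    local last="${3:-10}"
    local i="$first"
    local name
    while [ "$i" -le "$last" ]; do
        # Subject folders are zero-padded to three digits, e.g. sub-002
        name=$(printf 'sub-%03d' "$i")
        if [ ! -d "$raw_dir/$name" ]; then
            echo "$name"
        fi
        i=$((i + 1))
    done
}

# Example call (replace XXXXXXX.XX with your assigned project number):
# list_missing_subjects /project/3010000.05/XXXXXXX.XX/raw
```

Before the restore, this should print the three deleted folders; once the download has finished, it should print nothing.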

Practice Restoring Data with Repocli

  1. Check for data loss among subjects 1 to 10

  • Open a session in TigerVNC

  • Open Applications and go to File Explorer in the dropdown menu

  • Where the file path reads /home/groupname/firlas/, replace it with /project/3010000.05/XXXXXXX.XX/raw/

  • Notice that there are no folders for sub-002, sub-005, and sub-006: this data has been accidentally deleted

  2. Establish a network connection to Trigon (either eduVPN or hardwired)

  3. Open a TigerVNC session (read how to do that here)

  4. Log in to the Radboud Data Repository

  • Open the terminal application

  • Type repocli shell and then press Enter

  • Type config and then press Enter

  • Enter the username of the RDR data access credentials (u1234567@ru.nl) and then press Enter

  • Enter the password of the RDR data access credentials that you retrieved earlier, then press Enter

  5. Download the missing data from the DAC to your Project Folder

  • Type get dccn/DAC_3010000.05_873/raw/sub-002 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter

  • Type get dccn/DAC_3010000.05_873/raw/sub-005 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter

  • Type get dccn/DAC_3010000.05_873/raw/sub-006 /project/3010000.05/XXXXXXX.XX/raw/ and then press Enter
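
The three get commands differ only in the subject name, so they are easy to mistype. The following is a minimal sketch that prints the commands for a list of subjects, ready to be pasted at the repocli shell prompt. It does not call repocli itself; the function name print_get_commands is hypothetical.

```shell
#!/bin/bash
# print_get_commands DAC_RAW LOCAL_RAW NAME...   (hypothetical helper)
# Prints one `get` command per NAME, in the same form as the commands
# typed at the repocli shell prompt in the steps above.
print_get_commands() {
    local dac_raw="${1%/}"      # strip a trailing slash, if present
    local local_raw="${2%/}"
    shift 2
    local name
    for name in "$@"; do
        echo "get $dac_raw/$name $local_raw/"
    done
}

# Example call (replace XXXXXXX.XX with your assigned project number):
# print_get_commands dccn/DAC_3010000.05_873/raw /project/3010000.05/XXXXXXX.XX/raw sub-002 sub-005 sub-006
```

Printing the commands rather than running them lets you review each line before anything is downloaded.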

Snapshot

If you accidentally delete one or more files, you may be able to retrieve them from a snapshot by simply copying and pasting. Snapshots are captures of the state of a file system at particular points in time. To read more about snapshots and how you can restore deleted data, visit this link on the intranet.

Advanced Example: Restoring All Missing Subject Directories

In the exercise above, we saw how to restore data from a DAC to your Project Folder. However, with many folders and subfolders to check, doing this by hand is tedious, inefficient, and prone to user error. In this advanced example we will automate the process by creating a Bash script that runs on the HPC cluster.

  1. Start a TigerVNC session

  2. Run /project/3010000.05/scripts/makeMissing.sh

Open the terminal emulator and run the following code:

cd /project/3010000.05/scripts/
chmod +x makeMissing.sh
./makeMissing.sh /project/3010000.05/XXXXXXX.XX/raw/

  3. Create /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh

Open the text editor and write code that compares the folders in the DAC with the folders in your Project Folder and restores every folder that is in the DAC but not in the Project Folder. Save the file as /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh

Hint 1: Enumerate all folders in the DAC

#!/bin/bash
# List the subject folders stored in the DAC
repocli ls dccn/DAC_3010000.05_873/raw/

Hint 2: Go through each folder in the DAC

#!/bin/bash
# Loop over every subject folder listed in the DAC
for sub_dir in $(repocli ls dccn/DAC_3010000.05_873/raw/); do
    echo "dccn/DAC_3010000.05_873/raw/$sub_dir"
done

Inside the for loop, we’re just printing the subject’s directory.

Answer

#!/bin/bash
# Restore every subject folder that is in the DAC but missing locally
for sub_dir in $(repocli ls dccn/DAC_3010000.05_873/raw/); do
    # Download only if the folder does not exist in the Project Folder
    if [ ! -d "/project/3010000.05/XXXXXXX.XX/raw/$sub_dir" ]; then
        repocli get "dccn/DAC_3010000.05_873/raw/$sub_dir" "/project/3010000.05/XXXXXXX.XX/raw/$sub_dir"
    fi
done

  4. Run /project/3010000.05/XXXXXXX.XX/scripts/restoreMissing.sh

cd /project/3010000.05/XXXXXXX.XX/scripts/
chmod +x restoreMissing.sh
./restoreMissing.sh
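
Before letting a script download data, it can help to preview what it would do. The following is a sketch of a slightly more defensive variant of the same idea, not a replacement for the answer above. It uses only the repocli ls and repocli get calls shown earlier, and it assumes that repocli ls prints one folder name per line and that your RDR credentials are already configured. The function name restore_missing and the DRY_RUN variable are hypothetical.

```shell
#!/bin/bash
# restore_missing DAC_RAW LOCAL_RAW   (hypothetical helper)
# For every entry listed under DAC_RAW, download it with `repocli get`
# unless a folder of the same name already exists in LOCAL_RAW.
# Set DRY_RUN=1 to print what would be restored without downloading.
restore_missing() {
    local dac_raw="${1%/}"      # strip a trailing slash, if present
    local local_raw="${2%/}"
    local sub_dir
    # Assumption: `repocli ls` prints one folder name per line.
    repocli ls "$dac_raw/" | while IFS= read -r sub_dir; do
        sub_dir="${sub_dir%/}"              # drop a trailing slash, if any
        [ -z "$sub_dir" ] && continue       # skip blank lines
        if [ ! -d "$local_raw/$sub_dir" ]; then
            if [ "${DRY_RUN:-0}" = "1" ]; then
                echo "would restore $sub_dir"
            else
                # </dev/null stops the download from reading the folder list
                repocli get "$dac_raw/$sub_dir" "$local_raw/$sub_dir" < /dev/null
            fi
        fi
    done
}

# Example calls (replace XXXXXXX.XX with your assigned project number):
# DRY_RUN=1 restore_missing dccn/DAC_3010000.05_873/raw /project/3010000.05/XXXXXXX.XX/raw
# restore_missing dccn/DAC_3010000.05_873/raw /project/3010000.05/XXXXXXX.XX/raw
```

Reading the listing line by line instead of word by word keeps the loop working if a folder name ever contains a space, and the dry run shows the list of folders before any transfer starts.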