Scp files from cluster to NYBlue

From Rizzo_Lab

Revision as of 14:35, 4 September 2009

This sets up ssh keypair authentication so that you can scp files between cluster.bnl.gov and fen.bluegene.bnl.gov. Note that the Blue Gene machine is behind an additional firewall so that NYBlue can connect to other machines at BNL, but other BNL machines cannot connect to NYBlue. Therefore all scp commands must be issued from NYBlue and not from cluster. Follow the steps below.

Generate an SSH key pair for cluster

Log into fen.bluegene.bnl.gov. Generate an SSH key pair to authenticate cluster.

ssh-keygen -q -b 2048 -t rsa -f ~/.ssh/cluster

When prompted for a passphrase, press Return to leave it empty. This creates a 2048-bit RSA key pair in your '.ssh' directory: a public key (named 'cluster.pub') and a private key (named 'cluster').
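ssh generally refuses to use a private key file that other users can read, so it may be worth tightening permissions after generating the pair. The following is a minimal sketch; secure_ssh_key is a hypothetical helper name, not part of OpenSSH.

```shell
# secure_ssh_key SSH_DIR KEY_NAME
# Hypothetical helper: restrict the .ssh directory and the private key
# to the owner, and leave the public key world-readable.
secure_ssh_key() {
    chmod 700 "$1" &&
    chmod 600 "$1/$2" &&
    chmod 644 "$1/$2.pub"
}

# Example (not run here): secure_ssh_key ~/.ssh cluster
```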

Install public key on cluster

Copy your public key to cluster.bnl.gov

scp ~/.ssh/cluster.pub cluster.bnl.gov:~/.ssh

Now log into cluster.bnl.gov as usual with your Active Directory password. Append the public key to your authorized_keys file.

ssh cluster.bnl.gov
cd .ssh 
cat cluster.pub >> authorized_keys

Note: Make sure that your authorized_keys file does not already contain a public key from cluster. If so, delete that line in the file and add the new public key instead.
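If you repeat these steps, the plain cat >> above can add the same key twice. One possible way to avoid that is sketched below; add_key_once is a hypothetical helper name. It only skips an exact duplicate line and does not remove an older, different key, which you would still need to delete by hand.

```shell
# add_key_once PUBKEY_FILE AUTHORIZED_KEYS_FILE
# Hypothetical helper: append the public key only if that exact line
# is not already present in the authorized_keys file.
add_key_once() {
    key=$(cat "$1") || return 1
    touch "$2" && chmod 600 "$2" || return 1
    # -x matches whole lines, -F treats the key as a fixed string
    if ! grep -qxF -- "$key" "$2"; then
        printf '%s\n' "$key" >> "$2"
    fi
}

# Example (not run here), on cluster:
#   add_key_once ~/.ssh/cluster.pub ~/.ssh/authorized_keys
```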

Logging in to cluster with your key

Log out of cluster back to fen.bluegene. Now try logging back into cluster using the following command:

exit
ssh -i ~/.ssh/cluster cluster.bnl.gov
exit

After the '-i' option you provide the path to your private key file. This command will log you into cluster without a password. This sets up your passwordless login.
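To confirm that the key really is being used, rather than ssh silently falling back to a password prompt, you can run ssh in batch mode, which disables password prompts. This is a sketch only; check_key_login is a hypothetical helper name and the 10 second timeout is an arbitrary choice.

```shell
# check_key_login KEY_FILE HOST
# Hypothetical helper: BatchMode=yes makes ssh fail instead of asking
# for a password, so success means the key was accepted.
check_key_login() {
    if ssh -o BatchMode=yes -o ConnectTimeout=10 -i "$1" "$2" true; then
        echo "key login OK"
    else
        echo "key login FAILED"
        return 1
    fi
}

# Example (not run here): check_key_login ~/.ssh/cluster cluster.bnl.gov
```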

Create an ssh config file on fen

cd .ssh
vi config

Create the file called "config" in your .ssh folder.

Host cluster.bnl.gov cluster
  User username
  Hostname cluster.bnl.gov
  IdentityFile ~/.ssh/cluster
  Protocol 2
  StrictHostKeyChecking no

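Replace username with your own username. The IdentityFile line tells ssh to use the key generated above, since 'cluster' is not one of the default key file names. Change the permissions of the "config" file to -rw-r--r--: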

chmod 644 config

Copying files

You can now log in to cluster from fen with just

ssh cluster

You can now copy files from fen to cluster as

scp file.mol2 cluster:/path/in/cluster

You can also copy files from cluster using

scp cluster:/path/in/cluster/file.txt /path/in/fen

To copy multiple files with wildcards, escape the * with a \ so that it is expanded on cluster rather than by your local shell on fen.

scp cluster:/path1/file.\* cluster:/path2/file2 /path/in/fen
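The effect of the backslash can be seen locally without touching the network. The sketch below uses a hypothetical show_args function and a throwaway directory to print exactly which arguments the local shell would hand to a command such as scp.

```shell
# show_args prints each argument it receives on its own line, in brackets.
show_args() { for a in "$@"; do printf '[%s]\n' "$a"; done; }

# Throwaway directory containing two files that match file.*
demo_dir=$(mktemp -d)
touch "$demo_dir/file.a" "$demo_dir/file.b"
cd "$demo_dir" || exit 1

unescaped=$(show_args file.*)    # local shell expands: file.a and file.b
escaped=$(show_args file.\*)     # passed through literally as file.*
quoted=$(show_args 'file.*')     # single quotes have the same effect

printf 'unescaped:\n%s\nescaped:\n%s\nquoted:\n%s\n' \
    "$unescaped" "$escaped" "$quoted"
```

With the escaped or quoted form, the pattern reaches the remote side intact and is expanded there.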

If you give a remote destination without a path (for example cluster:), the file is saved in your home directory there. Every time you copy a file, the cluster login notice will print. Note that you can use CTRL-D to autocomplete pathnames on cluster even when using scp on fen. However, this is slow and will cause the cluster login notice to print every time.

Using rsync

rsync allows you to sync the contents of two folders. Only the differences between the two folders are transferred, making this much faster for updating your backups. If you are backing up a large folder of scripts and data files to ringo, you should use rsync rather than tar+scp. -a is archive mode, which recurses into folders and preserves file permissions and timestamps, -v is verbose, and -e ssh tells rsync to use ssh as the remote shell.

rsync -ave ssh /some/folder1 ringo.ams.sunysb.edu:/media/backup
rsync -ae ssh large_file.tar.bz2 ringo.ams.sunysb.edu:/media/backup
md5sum large_file.tar.bz2

This will cause /media/backup/folder1 to be created on ringo. md5sum can be used to verify the contents of the transferred file: run it on both the source and the destination copy and compare the hashes. I have tested this for a large 76GB backup archive from cluster to ringo, and the hashes matched. If you set up keypair authentication, you can put rsync in a script for automated offsite backups.

rsync -av --delete --bwlimit=1000 reorg_testset ringo.ams.sunysb.edu:/media/sdb1

Note that the source folder name must not end with a /; otherwise rsync copies the contents of the folder directly into the target folder instead of creating a new folder there. This command also uses the --delete option, which removes files at the destination that do not exist in the source folder. -v is the verbose option, so the files copied are listed. Adding the -n option performs a dry run, showing the list of files that would be copied and deleted without actually changing anything. --bwlimit=1000 limits bandwidth usage to about 1000 KB/s to keep from saturating the network connection.
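For an automated backup script, a small wrapper can guard against an accidental trailing slash and make a dry run easy. This is a sketch under those assumptions, not a tested production script; normalize_src and backup_folder are hypothetical helper names.

```shell
# normalize_src PATH: strip trailing slashes (except for "/" itself) so that
# rsync creates the folder at the destination instead of dumping its contents.
normalize_src() {
    src=$1
    while [ "$src" != "/" ] && [ "${src%/}" != "$src" ]; do
        src=${src%/}
    done
    printf '%s\n' "$src"
}

# backup_folder SRC DEST [extra rsync options, e.g. -n for a dry run]
backup_folder() {
    src=$(normalize_src "$1")
    dest=$2
    shift 2
    rsync -av --delete --bwlimit=1000 "$@" "$src" "$dest"
}

# Example (not run here): dry run first, then the real transfer.
#   backup_folder reorg_testset/ ringo.ams.sunysb.edu:/media/sdb1 -n
#   backup_folder reorg_testset/ ringo.ams.sunysb.edu:/media/sdb1
```

Because --delete removes files at the destination, running with -n first is a cheap way to check what would be deleted.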