Difference between revisions of "Scp files from cluster to NYBlue"

From Rizzo_Lab
==Using rsync==
 
rsync allows you to sync the contents of two folders. Only the differences between the two folders are transferred, making this much faster for updating your backups. If you are backing up a large folder of scripts and data files to ringo, you should use rsync rather than tar+scp. The -a option (archive mode) recurses into folders and preserves file permissions and timestamps, -v is verbose, and -e ssh runs the transfer over ssh.
 
 
 
rsync -ave ssh /some/folder1 ringo.ams.sunysb.edu:/media/backup
 
rsync -ae ssh large_file.tar.bz2 ringo.ams.sunysb.edu:/media/backup
 
md5sum large_file.tar.bz2
 
 
 
The first command will cause /media/backup/folder1 to be created on ringo. md5sum can be used to verify the contents of the file transferred: run it on both machines and compare the hashes. I have tested this for a large 76GB backup archive from cluster to ringo, and the hashes matched up. If you set up keypair authentication, you can put rsync in a script for automated offsite backups.
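The md5sum check described above can be scripted. The following is a minimal sketch that runs entirely on local temporary files, with cp standing in for the rsync transfer; the file names and the verify_result variable are hypothetical. For a real transfer you would run md5sum once on each machine and compare the two hashes.

```shell
# Sketch: verify a copy by comparing MD5 hashes (local stand-in for a transfer).
workdir=$(mktemp -d)
src="$workdir/large_file.tar.bz2"          # hypothetical source file
dst="$workdir/copy_of_large_file.tar.bz2"  # hypothetical transferred copy

printf 'example payload\n' > "$src"
cp "$src" "$dst"                           # stands in for the rsync transfer

# md5sum prints "<hash>  <filename>"; keep only the hash field.
src_hash=$(md5sum "$src" | awk '{print $1}')
dst_hash=$(md5sum "$dst" | awk '{print $1}')

if [ "$src_hash" = "$dst_hash" ]; then
    verify_result="match"
else
    verify_result="MISMATCH"
fi
echo "checksum $verify_result"
```

A mismatch means the copy is incomplete or corrupted and the transfer should be repeated.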
 
 
 
rsync -av --delete reorg_testset ringo.ams.sunysb.edu:/media/sdb1
 
 
 
Note that there should not be a trailing / after the source folder name; otherwise rsync will dump the contents of the folder into the target folder instead of creating a new folder there. This command also uses the --delete option, which removes files at the destination that do not exist in the source folder. -v is the verbose option, so the files copied are listed. The -n option performs a dry run, showing a list of the files that would be copied and deleted without actually changing anything.
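The trailing-slash and dry-run behaviour can be tried safely on local folders before touching a real backup. The sketch below uses temporary directories only; the names demo, dest_a, dest_b and dest_c are hypothetical, and the demo is skipped if rsync is not installed.

```shell
# Sketch: effect of a trailing slash on the rsync source, using local temp dirs.
demo=$(mktemp -d)
mkdir -p "$demo/folder1" "$demo/dest_a" "$demo/dest_b" "$demo/dest_c"
echo "data" > "$demo/folder1/file.txt"

if command -v rsync >/dev/null 2>&1; then
    # No trailing slash: folder1 itself is created inside dest_a.
    rsync -a "$demo/folder1" "$demo/dest_a"
    # Trailing slash: only the contents of folder1 land in dest_b.
    rsync -a "$demo/folder1/" "$demo/dest_b"
    # -n (dry run): lists what would be copied, but changes nothing in dest_c.
    rsync -avn "$demo/folder1" "$demo/dest_c"
    ls -R "$demo/dest_a" "$demo/dest_b" "$demo/dest_c"
else
    echo "rsync is not installed; skipping demo"
fi
```

Running with -n first is a cheap way to check what --delete would remove before running the real command.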
 

Latest revision as of 13:56, 11 November 2009

This sets up ssh keypair authentication so that you can scp files between cluster.bnl.gov and fen.bluegene.bnl.gov. Note that the Blue Gene machine is behind an additional firewall so that NYBlue can connect to other machines at BNL, but other BNL machines cannot connect to NYBlue. Therefore all scp commands must be issued from NYBlue and not from cluster. Follow the steps below.

Generate an SSH key pair for cluster

Log into fen.bluegene.bnl.gov. Generate an SSH key pair to authenticate cluster.

ssh-keygen -q -b 2048 -t rsa -f ~/.ssh/cluster

When prompted for a passphrase, hit return to leave it empty. This will create a 2048-bit RSA key pair in your '.ssh' directory, one public (named 'cluster.pub') and one private (named 'cluster').

Install public key on cluster

Copy your public key to cluster.bnl.gov

scp ~/.ssh/cluster.pub cluster.bnl.gov:~/.ssh

Now log into cluster.bnl.gov as usual with your Active Directory password. Append the public key to your authorized_keys file.

ssh cluster.bnl.gov
cd .ssh 
cat cluster.pub >> authorized_keys

Note: Make sure that your authorized_keys file does not already contain a public key from cluster. If so, delete that line in the file and add the new public key instead.
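One way to avoid adding the same key twice is to append it only when that exact line is missing. The sketch below illustrates the idea on a temporary directory standing in for ~/.ssh; the add_key function and the placeholder key text are hypothetical. It only detects an identical line, so an older, different key still has to be removed by hand as the note says.

```shell
# Sketch: append a public key to authorized_keys only if it is not already there.
sshdir=$(mktemp -d)                 # stand-in for ~/.ssh on cluster
pubkey="$sshdir/cluster.pub"
authkeys="$sshdir/authorized_keys"

# Placeholder key text; a real cluster.pub comes from ssh-keygen.
echo "ssh-rsa AAAAEXAMPLEKEY user@fen" > "$pubkey"
touch "$authkeys"

add_key() {
    # -q quiet, -x whole-line match, -F fixed string, -f read the pattern from a file
    if grep -qxFf "$pubkey" "$authkeys"; then
        echo "key already present, not appending"
    else
        cat "$pubkey" >> "$authkeys"
        echo "key appended"
    fi
}

add_key   # first call appends the key
add_key   # second call finds the existing line and does nothing

key_count=$(grep -c . "$authkeys")
echo "authorized_keys now holds $key_count key(s)"
```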

Logging in to cluster with your key

Log out of cluster back to fen.bluegene. Now try logging back into cluster using the following command:

exit
ssh -i ~/.ssh/cluster cluster.bnl.gov
exit

After the '-i' option you provide the path to your private key file. If the key is installed correctly, this command will log you into cluster without asking for a password.

Create an ssh config file on fen

cd .ssh
vi config

Create the file called "config" in your .ssh folder.

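Host cluster.bnl.gov cluster
  User username
  Hostname cluster.bnl.gov
  # Use the key generated above; ssh does not try a key named 'cluster' by default
  IdentityFile ~/.ssh/cluster
  Protocol 2
  StrictHostKeyChecking no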

Replace username with your own username. Change permissions of the "config" file to be -rw-r--r--

chmod 644 config
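To confirm that the mode was applied, the permission string can be read back from ls -l. This is a small sketch on a temporary file standing in for ~/.ssh/config; the cfgdir and perms names are hypothetical.

```shell
# Sketch: set and confirm the permissions on an ssh config file.
cfgdir=$(mktemp -d)                 # stand-in for ~/.ssh on fen
cfg="$cfgdir/config"
touch "$cfg"

chmod 644 "$cfg"
# The first 10 characters of `ls -l` output are the file type and mode bits.
perms=$(ls -l "$cfg" | cut -c1-10)
echo "config permissions: $perms"   # prints: config permissions: -rw-r--r--
```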

Copying files

You can now log in to cluster from fen with just

ssh cluster

You can now copy files from fen to cluster as

scp file.mol2 cluster:/path/in/cluster

You can also copy files from cluster using

scp cluster:/path/in/cluster/file.txt /path/in/fen

To copy multiple files with wildcards you have to escape the * with a \

scp cluster:/path1/file.\*  cluster:/path2/file2 /path/in/fen
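The reason for the backslash is that the shell on fen processes the * before scp ever sees it. The sketch below shows the difference using echo on local temporary files (the file names are hypothetical). With a remote path, an unescaped pattern is matched against local files, and some shells report an error when nothing matches.

```shell
# Sketch: how the local shell treats an escaped vs. unescaped wildcard.
globdir=$(mktemp -d)
touch "$globdir/file.a" "$globdir/file.b"

# Unescaped: the local shell expands the pattern before the command runs.
unescaped=$(cd "$globdir" && echo file.*)
# Escaped: the backslash keeps the * literal, so it is passed on unchanged.
escaped=$(cd "$globdir" && echo file.\*)

echo "unescaped: $unescaped"   # prints: unescaped: file.a file.b
echo "escaped:   $escaped"     # prints: escaped:   file.*
```

Wrapping the remote path in single quotes is another way to pass the pattern through untouched.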

If you copy the file without specifying a path, it will be saved in your home directory on the destination machine. Every time you copy a file, the cluster login notice will print. Note that you can use CTRL-D to autocomplete pathnames on cluster even when using scp on fen. However, this is slow and will cause the cluster login notice to print every time. For transferring a large number of files, consider using rsync.