BASH scripting

From Rizzo_Lab

Bourne-Again Shell (Bash)

Bash, short for "Bourne-Again Shell", is both a command interpreter and a high-level programming language, and it is a must-know tool in Computational Chemistry and Biology. You can use Bash on Unix/Linux computers through a terminal. When you start a login shell, i.e., the interpreter, Bash runs the first initialization file it finds among ~/.bash_profile, ~/.bash_login, and ~/.profile (where ~/ points to your home directory), but we do not recommend changing these files unless you really know what you are doing. In most cases, you can change the ~/.bashrc file instead, which Bash reads when it starts an interactive non-login shell and which allows you to customize the shell according to your needs.
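
As a minimal sketch of the kind of customization that usually goes into ~/.bashrc, the lines below define a shortcut and a default editor. The alias name ll and the choice of vi are illustrative assumptions, not settings required by any of our tools.

```shell
# Hypothetical additions to ~/.bashrc; adjust names and values to your needs.

# Shortcut: typing "ll" in an interactive shell runs "ls -l".
alias ll='ls -l'

# Default text editor used by programs that honor the EDITOR variable.
export EDITOR=vi
```

After saving the file, open a new terminal or run source ~/.bashrc to apply the changes to the current shell.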

A bash script is a text file containing a series of instructions written in the bash language. You can create one by typing the following command in the terminal:

 touch my_first_script.sh

which creates an empty file in which you can write the instructions to be executed by the shell. You can use the Vi text editor to write your code; just remember to add the following line at the beginning of the file:

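 #!/bin/bash

This line, known as the shebang, tells the system to run the script with the Bash interpreter. Use /bin/bash rather than /bin/sh, because on some systems /bin/sh is a different shell that does not support Bash-only syntax. You can run your script by passing it to the interpreter: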

 bash my_first_script.sh

or you can change the permissions of the file to make it an executable by typing:

 chmod +x my_first_script.sh

and then running:

 ./my_first_script.sh

Suppose your my_first_script.sh contains the following lines:

 #!/bin/bash
 
 number=6
 for ((i=0;i<${number};i++))
 do
     echo "Hello world ${i}"
 done

If you run ./my_first_script.sh, the output will be:

 Hello world 0
 Hello world 1
 Hello world 2
 Hello world 3
 Hello world 4
 Hello world 5

For more on commands, see Unix.

Environment Variables

Declaring and accessing variables

Bash allows you to assign values to variables on the command line, but it is more common to set variables inside your scripts or your ~/.bashrc file. In Bash, you define a variable using the following syntax:

 my_variable=value

Do not leave spaces around the = sign; with spaces, the shell would try to run my_variable as a command. You can check the value of a variable by typing the following command in your terminal:

 echo $my_variable

The shell will print the following result on your screen:

 value

Remember to put the $ sign before my_variable to tell the shell that you want its value; otherwise, the shell will print the literal string my_variable on the screen.

In the previous section, the variable number contained the value 6, and the variable i was an iteration counter used inside the for loop. It is good practice to enclose the variable name in curly braces, as in ${number}, to avoid ambiguity about where the name ends.
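
The ambiguity that curly braces prevent can be sketched as follows. The variable name system and the .mol2 file name are hypothetical, chosen only for illustration.

```shell
#!/bin/bash
# Hypothetical variable name and value, for illustration only.
system=1abc

# Without braces, the shell looks up a variable named "system_lig",
# which is unset, so the expansion is empty.
without_braces="$system_lig.mol2"

# With braces, the name ends at "}", so "_lig.mol2" is appended literally.
with_braces="${system}_lig.mol2"

echo "without braces: '${without_braces}'"   # prints: without braces: '.mol2'
echo "with braces:    '${with_braces}'"      # prints: with braces:    '1abc_lig.mol2'
```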


Paths and Environment variables

Every file has a path, i.e., a location within the file system. Paths can be specified in an absolute way -- with respect to the root of the file system -- or in a relative way -- with respect to the working directory. HOME and PATH are variables containing path information, i.e., the "addresses" within the directory tree that allow the user to find their files and executables.

Variables such as HOME or PATH are inherited from the environment. They have short, easily remembered names that describe important features of the environment. HOME, for instance, is the variable that identifies your home directory (e.g., /gpfs/home/your_username on Seawulf). You can point HOME to whichever path you find more suitable, but it is advisable not to do so if you have scripts and programs that depend on its existing value.

PATH is the variable that determines in which directories the shell looks for the programs that you run. If you use the Python programming language, you might need to set a PYTHONPATH variable so that Python can import your own modules and classes. Similarly, if you use the Amber software, all Amber-related programs can be found under the path defined by AMBERHOME. We, DOCK6 developers and users, define DOCKHOME as the path to the most stable release, or as we see fit.

If you type the name of a file as if it were a command, the shell searches for this program in certain directories defined in the PATH variable. PATH specifies the order in which these directories should be searched by the shell. You can add more directories to the variable by typing:

 export PATH=/new/path/to/directory:${PATH}

In the command above, you prepended /new/path/to/directory to the PATH variable, so the shell searches it before the directories that were already listed.
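
To see the PATH search in action, the sketch below creates a throwaway directory containing a small script and puts that directory first in PATH. The names tool_dir and hello_tool are hypothetical, and the example assumes that mktemp is available and that the temporary directory allows execution.

```shell
#!/bin/bash
# Hypothetical setup: a temporary directory holding a small script named "hello_tool".
tool_dir=$(mktemp -d)
printf '#!/bin/sh\necho "hello from tool_dir"\n' > "${tool_dir}/hello_tool"
chmod +x "${tool_dir}/hello_tool"

# Put the directory first, so it is searched before everything already in PATH.
export PATH="${tool_dir}:${PATH}"

# command -v prints the full path that the shell would run for this name.
found=$(command -v hello_tool)
echo "${found}"

# The script can now be called by name, without ./ or a full path.
tool_output=$(hello_tool)
echo "${tool_output}"
```

A change made with export lasts only for the current shell session; adding the same export line to ~/.bashrc applies it to every new interactive shell.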

Basic commands

Most basic commands can be found at Unix. Here are some tricks of the trade:

Iterations

If you have an iterative task to run, you can use a for loop. The simplest kind of for loop was already shown in the Bourne-Again Shell section. Suppose, however, that each line of the file systems.txt contains the name of a system that you need to work on. The most straightforward way of looping over the names in bash is shown below; it splits the file on any whitespace, so it assumes that the names themselves contain no spaces:

 for line in $(cat systems.txt)
 do
     ## Commands
     echo "${line}"
 done

while loops can be used in a similar way:

 while IFS= read -r line
 do
     ## Commands
 done < systems.txt

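Every character after a # is a comment and is ignored by the interpreter. Two kinds of comment lines still carry meaning: the #! line that selects the interpreter, and the directives that cluster management/job scheduling systems read from the script (see SLURM). IFS stands for internal field separator, and the shell uses it for word splitting; setting it to an empty value for read keeps the leading and trailing whitespace of each line, and the -r option keeps backslashes from being treated as escape characters.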
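
The two loop styles behave differently when a line contains a space. The sketch below illustrates this with a hypothetical two-line input file written to a temporary location; the system names are made up.

```shell
#!/bin/bash
# Hypothetical input file; the second name deliberately contains a space.
list=$(mktemp)
printf '1abc\n2xyz apo\n' > "${list}"

# for + $(cat ...): the shell splits on any whitespace, so this counts words.
word_count=0
for item in $(cat "${list}")
do
    word_count=$((word_count + 1))
done

# while + read: each iteration receives one whole line.
line_count=0
while IFS= read -r line
do
    line_count=$((line_count + 1))
    echo "line ${line_count}: ${line}"
done < "${list}"

echo "for loop iterations:   ${word_count}"   # prints 3
echo "while loop iterations: ${line_count}"   # prints 2
rm -f "${list}"
```

For files whose entries may contain spaces, the while loop is therefore the safer choice.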

Conditionals

Conditionals are written as if...then or if...then...else statements. You can nest if and else clauses as many times as necessary, but try to keep the decision-making process simple: nested if clauses can be a great source of headaches.

 if conditional_expression
 then
     ## commands
 elif another_conditional_expression
 then
     ## other commands
 else
     ## more commands
 fi

Bash does not care about code indentation, but it is useful to indent your code to increase its readability.
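
As a concrete instance of the template above, the sketch below checks whether a path is a regular file, a directory, or neither. It relies on the [ -f ... ] and [ -d ... ] file tests, which are not covered in the text above, and the function name classify_path and the file name are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: report whether a path is a file, a directory, or missing.
classify_path() {
    if [ -f "$1" ]
    then
        echo "file"
    elif [ -d "$1" ]
    then
        echo "directory"
    else
        echo "missing"
    fi
}

# Temporary directory with one empty file, used only for this demonstration.
tmp_dir=$(mktemp -d)
touch "${tmp_dir}/receptor.mol2"

classify_path "${tmp_dir}/receptor.mol2"   # prints: file
classify_path "${tmp_dir}"                 # prints: directory
classify_path "${tmp_dir}/no_such_file"    # prints: missing
```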

Creating files during runtime

You will frequently run bash scripts to prepare your input files before a simulation. A script can write such a file with a here-document:

 cat <<EOF > filename
 text
 EOF

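EOF here is only a delimiter, conventionally read as "end of file"; any word works as long as the opening and closing markers match. The sequence above writes every line between the two markers, here text, to filename.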
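
When preparing input files, it often matters whether variables inside the text are expanded. The sketch below shows both behaviors; the variable system, the keyword system_name, and the file names are hypothetical and do not correspond to the input format of any particular program.

```shell
#!/bin/bash
# Hypothetical variable and file names, for illustration only.
system=1abc
out_dir=$(mktemp -d)

# Unquoted delimiter: ${system} is expanded before the text is written.
cat <<EOF > "${out_dir}/expanded.in"
system_name ${system}
EOF

# Quoted delimiter: the text is written exactly as typed.
cat <<'EOF' > "${out_dir}/literal.in"
system_name ${system}
EOF

expanded=$(cat "${out_dir}/expanded.in")
literal=$(cat "${out_dir}/literal.in")
echo "${expanded}"   # prints: system_name 1abc
echo "${literal}"    # prints: system_name ${system}
```

The unquoted form is the usual choice when a script generates one input file per system inside a loop.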