Stupid awk tricks

From Rizzo_Lab

Latest revision as of 06:31, 6 May 2011

Grep the wall-clock time out of the NAMD output files, convert it to hours, and add the values together.

grep "WallClock:" *.out | awk '{sum+=$2/3600} END {print "Hours="sum}'
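The grep can also be folded into awk's own pattern match. A minimal sketch on made-up sample lines (the file names and timings are illustrative, not real NAMD output, and it is written for sh rather than csh; the awk part is the same under either shell):

```shell
# Hypothetical sample lines in the "WallClock: <seconds> ..." layout the one-liner assumes.
tmpdir=$(mktemp -d)
printf 'WallClock: 3600.0  CPUTime: 3500.0\n' > "$tmpdir/run1.out"
printf 'WallClock: 5400.0  CPUTime: 5300.0\n' > "$tmpdir/run2.out"

# Match the line inside awk instead of piping through grep; $2 is the time in seconds.
total_hours=$(awk '/WallClock:/ {sum += $2/3600} END {print "Hours=" sum}' "$tmpdir"/*.out)
echo "$total_hours"
rm -r "$tmpdir"
```

Because awk reads the files directly, no file name prefix is added to the lines, so the seconds value is always field 2.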

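Same thing, but also report the mean and standard deviation (population SD, computed as sqrt(mean of squares - square of mean)):

grep "WallClock:" *.out | awk '{h=$2/3600; sum+=h; sumsq+=h^2; n+=1} END {mean=sum/n; print "SD="sqrt(sumsq/n-mean^2),"Mean="mean}'

The same formula applied to column 94 of a comma-separated file with exactly 5000 rows (the row count is hardcoded in the divisions):

awk -F, '{mean+=$94/5000; n+=1; meansq+=(($94^2)/5000)} END {print "SD="sqrt(meansq - mean^2),"Mean="mean}'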
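As a quick sanity check of the population SD formula sqrt(mean of squares - square of mean), here is a sketch on made-up numbers, with n counted from the input rather than hardcoded. The values are purely illustrative.

```shell
# Made-up values purely for illustration: 2, 4, 4, 4, 5, 5, 7, 9.
stats=$(printf '%s\n' 2 4 4 4 5 5 7 9 |
  awk '{sum += $1; sumsq += $1*$1; n += 1}
       END {mean = sum/n; print "SD=" sqrt(sumsq/n - mean*mean), "Mean=" mean}')
echo "$stats"
```

For these eight values the mean is 5 and the mean of squares is 29, so the SD is sqrt(29 - 25) = 2.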

Theoretical success rate, i.e. whether at least one pose in the mol2 file has RMSD <= 2 (prints 1 if so, 0 otherwise)

awk 'BEGIN{ths=0}/RMSD:/{if($3<=2.0)ths=1}END{print ths}' scored.mol2

Return the lowest RMSD in a mol2 file

awk 'BEGIN{lrm=99}/RMSD:/{if($3<=lrm)lrm=$3}END{printf"%.2f",lrm}' mol2file
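The starting value of 99 acts as a sentinel, so a file whose RMSDs are all above 99 would report 99.00. One possible variant tracks whether a value has been seen instead. This is a sketch on hypothetical header lines, written for sh, and assumes the RMSD is field 3 as in the one-liner above.

```shell
# Hypothetical "RMSD:" header lines; both values are deliberately above 99.
mol2file=$(mktemp)
printf '########## RMSD: 150.25\n########## RMSD: 101.50\n' > "$mol2file"

# Keep the first value seen, then replace it whenever a smaller one appears.
lowest=$(awk '/RMSD:/ {if (!seen || $3 + 0 < lrm) {lrm = $3 + 0; seen = 1}}
              END {if (seen) printf "%.2f\n", lrm}' "$mol2file")
echo "$lowest"
rm "$mol2file"
```

If no RMSD line is present, this variant prints nothing rather than a placeholder value.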

Lowest DOCK RMSDs in MOL2 output (sorted ascending; head shows the ten lowest, so the first line is the minimum)

grep RMSD mol_scored.mol2 | awk '{print $3}' | sort -n | head 

Using a shell variable

Here the shell variable rmsdcut sets the RMSD cutoff used when calculating the theoretical success rate for a multi-mol2 file of docked poses. The command prints 1 if any pose in the mol2 file has an RMSD <= $rmsdcut, and 0 otherwise.

awk 'BEGIN{ths=0}/RMSD:/{if($3<='$rmsdcut')ths=1}END{print ths}' $mol2file
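An alternative to splicing the shell variable into the program text is awk's -v option, which assigns the value to an awk variable before the program runs. A minimal sketch, written for sh, with a hypothetical two-pose file and the awk variable name cut chosen for illustration:

```shell
# Hypothetical stand-in for a multi-mol2 file; only the "RMSD:" header lines matter here.
mol2file=$(mktemp)
printf '########## RMSD: 3.10\n########## RMSD: 1.85\n' > "$mol2file"
rmsdcut=2.0

# -v copies the shell value into the awk variable "cut" before the program runs.
ths=$(awk -v cut="$rmsdcut" 'BEGIN{ths=0} /RMSD:/{if ($3 <= cut) ths=1} END{print ths}' "$mol2file")
echo "$ths"
rm "$mol2file"
```

With -v the awk program stays inside a single pair of quotes, which tends to be easier to read and edit.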

Formal charge of a LIG mol2 file

This only sums atom lines whose residue name is LIG (matched as LIG with two spaces on each side); the charge is taken from column 9.

awk '/  LIG  /{sum+=$9}END{print sum}' lig.mol2  
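The pattern above depends on the exact spacing around LIG. A possible variant tests the residue-name column directly. This sketch assumes the standard nine-column mol2 atom record (id, name, x, y, z, type, subst_id, subst_name, charge) and uses made-up atoms.

```shell
# Hypothetical mol2 atom records: id name x y z type subst_id subst_name charge
ligfile=$(mktemp)
printf '%s\n' \
  '1 N1 0.000 0.000 0.000 N.4 1 LIG 0.5' \
  '2 C1 1.000 0.000 0.000 C.3 1 LIG 0.25' \
  '3 O1 2.000 0.000 0.000 O.3 2 WAT -0.5' \
  '4 H1 3.000 0.000 0.000 H 1 LIG 0.25' > "$ligfile"

# Test the residue-name column directly instead of relying on the spacing around "LIG".
charge=$(awk '$8 == "LIG" {sum += $9} END {print sum + 0}' "$ligfile")
echo "$charge"
rm "$ligfile"
```

The WAT line is skipped, so only the three LIG charges are summed.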

Average # of rotatable bonds from dock.out

awk 'BEGIN{tot=0;sum=0}/Number of rotatable bonds/{sum+=$6;tot+=1}END{print tot,sum/tot}' rot.out

Average size of a whole bunch of files

du ????/003_grid/$spacing/????.rec.nrg | awk '{sum+=$1;n+=1}END{print sum/n,n}'
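Note that du reports disk usage in blocks, so the average above is in block units, and the block size can differ between systems. If byte sizes are wanted, something along these lines may be closer. This is a sketch using wc -c on two hypothetical files, written for sh.

```shell
# Two hypothetical files of known size.
tmpdir=$(mktemp -d)
printf '0123456789' > "$tmpdir/a.nrg"                      # 10 bytes
printf '012345678901234567890123456789' > "$tmpdir/b.nrg"  # 30 bytes

# wc -c < file prints only the byte count, one line per file.
avg=$(for f in "$tmpdir"/*.nrg; do wc -c < "$f"; done |
  awk '{sum += $1; n += 1} END {print sum/n, n}')
echo "$avg"
rm -r "$tmpdir"
```

As in the du version, the output is the average followed by the number of files.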

csh variable in awk script

A variable can be passed from a csh script to a nested awk script as follows:

#! /bin/csh

set stringval = "text"

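# the following two lines produce the same thing
awk '{print "text"$2}'
awk '{print "'$stringval'"$2}'

If you want to use a variable defined in the shell, you must break the string passed to awk into pieces. The pieces are shown here with extra spaces only to make them visible; the real command must have no spaces between them, or the shell passes them to awk as separate arguments.

awk '{print "' $stringval '"$2}'
awk string1     string2    string3

The shell joins the three strings and awk executes the result. string2 is replaced by the contents of the variable.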
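When the variable's value contains a space, the spliced piece needs double quotes of its own so the shell keeps the program as one argument. A minimal sketch, written for sh (csh quoting differs in some details), comparing that form with awk's -v assignment; the value and input line are made up:

```shell
stringval="two words"
line='a b c'

# Quote splicing: close the single quotes, add the double-quoted variable, reopen.
spliced=$(echo "$line" | awk '{print "'"$stringval"'"$2}')
# -v assignment: awk receives the value as variable s; the program text is fixed.
assigned=$(echo "$line" | awk -v s="$stringval" '{print s $2}')
echo "$spliced"
echo "$assigned"
```

Both forms print the variable's value followed by the second field of the input line.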

References

[http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_3.html awk Tutorial]