Stupid awk tricks
Latest revision as of 06:31, 6 May 2011
Grep the wall-clock time out of the NAMD .out files, convert it to hours, and add the values together.
grep "WallClock:" *.out | awk '{sum+=$2/3600} END {print "Hours="sum}'
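awk can do the pattern matching itself, so the grep is optional. A minimal sketch, assuming (as the pipeline above does) that the wall-clock seconds are the second field of each WallClock: line; the function name namd_hours is hypothetical and the sample lines only imitate the NAMD log layout.

```shell
#!/bin/sh
# namd_hours [FILE...]: sum the WallClock: seconds from NAMD logs, in hours.
# Hypothetical helper; assumes the seconds are field 2 of each WallClock: line.
# With no FILE arguments it reads standard input.
namd_hours() {
    awk '/WallClock:/ { sum += $2 / 3600 }
         END          { print "Hours=" sum + 0 }' "$@"
}

# Usage with made-up log lines on standard input (prints Hours=1.5):
printf 'WallClock: 3600.0  CPUTime: 3590.0\nWallClock: 1800.0  CPUTime: 1795.0\n' | namd_hours
```

The "+ 0" makes the command print Hours=0 rather than an empty value when no file contains a WallClock: line.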
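The same, but also computing the mean and the (population) standard deviation of the per-run hours:
grep "WallClock:" *.out | awk '{h=$2/3600; sum+=h; sumsq+=h*h; n+=1} END {mean=sum/n; print "SD="sqrt(sumsq/n-mean^2),"Mean="mean}'
The same formulas applied to column 94 of a comma-separated file with exactly 5000 rows (the row count is hard-coded as the divisor):
awk -F, '{mean+=$94/5000; n+=1; meansq+=(($94^2)/5000)} END {print "SD="sqrt(meansq - mean^2),"Mean="mean}'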
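Hard-coding the row count (5000 above) silently gives wrong numbers if the file has a different length. A minimal sketch that counts the rows instead, still reporting the population SD, sqrt(E[x^2] - E[x]^2); col_stats and its argument order are hypothetical. For the sample SD, use (sumsq - n*mean^2)/(n-1) as the variance.

```shell
#!/bin/sh
# col_stats COL [FILE...]: mean and population SD of comma-separated column COL.
# Hypothetical helper; the row count n is counted rather than hard-coded.
col_stats() {
    col=$1; shift
    awk -F, -v col="$col" '
        { x = $col; n++; sum += x; sumsq += x * x }
        END {
            if (n == 0) { print "no rows"; exit 1 }
            mean = sum / n
            var = sumsq / n - mean * mean
            if (var < 0) var = 0          # guard against rounding just below zero
            print "SD=" sqrt(var), "Mean=" mean
        }' "$@"
}

# Usage: column 2 of some made-up rows (prints SD=1 Mean=2)
printf 'a,1\nb,3\nc,1\nd,3\n' | col_stats 2
```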
Theoretical success, i.e. at least one pose in the mol2 file with RMSD <= 2 (prints 1 if so, 0 otherwise)
awk '/RMSD:/{if($3<=2.0)ths=1}END{print ths+0}' scored.mol2
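Averaging this per-file 0/1 flag over a set of docked systems gives a success rate for the whole set. A minimal sketch that does it in one awk pass, assuming one mol2 file per system and the RMSD value in field 3 of each RMSD: line as above; ths_rate and the example path are hypothetical.

```shell
#!/bin/sh
# ths_rate FILE...: fraction of mol2 files with at least one pose at RMSD <= 2.
# Hypothetical helper; assumes the RMSD value is field 3 of each RMSD: line.
ths_rate() {
    awk '
        FNR == 1 { nfiles++ }                          # first line of each new file
        /RMSD:/ && $3 + 0 <= 2.0 { hit[FILENAME] = 1 } # remember files with a good pose
        END {
            for (f in hit) nhit++
            k = nhit + 0
            if (nfiles) print k "/" nfiles, k / nfiles
        }' "$@"
}

# Usage (hypothetical layout): ths_rate system*/scored.mol2
```

It prints the count of successful files over the total, then the fraction (for example 1/2 0.5).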
Print the lowest RMSD in a mol2 file (prints 99.00 if the file has no RMSD lines)
awk 'BEGIN{lrm=99}/RMSD:/{if($3<=lrm)lrm=$3}END{printf "%.2f\n",lrm}' mol2file
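The starting value 99 acts as a sentinel, so a file with no RMSD lines, or with every RMSD above 99, still prints 99.00. A minimal sketch that instead starts from the first RMSD it sees and prints nothing when there is none; lowest_rmsd is a hypothetical name.

```shell
#!/bin/sh
# lowest_rmsd [FILE...]: smallest value on the RMSD: lines, or nothing if none.
# Hypothetical helper; the first RMSD seen initializes the minimum, so no
# sentinel value is needed.
lowest_rmsd() {
    awk '/RMSD:/ { r = $3 + 0; if (!seen || r < min) { min = r; seen = 1 } }
         END     { if (seen) printf "%.2f\n", min }' "$@"
}

# Usage with made-up DOCK-style comment lines (prints 0.87):
printf '##### RMSD: 3.10\n##### RMSD: 0.87\n##### RMSD: 1.25\n' | lowest_rmsd
```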
Lowest DOCK RMSD in MOL2 output
grep RMSD mol_scored.mol2 | awk '{print $3}' | sort -n | head
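To see which pose is the closest, not just its RMSD, the pose number can be carried along. A minimal sketch, assuming every pose in the multi-mol2 file has exactly one RMSD: line, so counting those lines numbers the poses; best_poses is a hypothetical helper.

```shell
#!/bin/sh
# best_poses FILE [K]: the K lowest RMSDs (default 1), each followed by its pose number.
# Hypothetical helper; assumes one RMSD: line per pose, in file order.
best_poses() {
    awk '/RMSD:/ { n++; print $3, n }' "$1" | sort -n | head -n "${2:-1}"
}

# Usage (hypothetical file name): best_poses mol_scored.mol2 3
```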
Using a shell variable
Here the shell variable rmsdcut sets the RMSD cutoff used to calculate the theoretical success rate from a multi-mol2 file of docked poses. The command prints 1 if any pose in the file has an RMSD <= $rmsdcut, and 0 otherwise. The single quotes are closed just before $rmsdcut and reopened just after it, so the shell substitutes the value into the awk program.
awk 'BEGIN{ths=0}/RMSD:/{if($3<='$rmsdcut')ths=1}END{print ths}' $mol2file
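Splicing $rmsdcut into the program text works, but an empty value turns into an awk syntax error and any other value is pasted in as awk code. awk's standard -v option passes the value as an awk variable instead. A minimal sketch; any_within is a hypothetical name, and it also stops reading at the first qualifying pose (an exit in a main rule still runs the END block).

```shell
#!/bin/sh
# any_within CUTOFF [FILE...]: print 1 if any RMSD: value is <= CUTOFF, else 0.
# Hypothetical helper; the cutoff enters awk through -v instead of quote splicing.
any_within() {
    cut=$1; shift
    awk -v cut="$cut" '
        /RMSD:/ && $3 + 0 <= cut + 0 { found = 1; exit }   # stop at the first hit
        END { print found + 0 }' "$@"
}

# Usage (rmsdcut and mol2file as in the command above):
# any_within "$rmsdcut" "$mol2file"
```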
Formal charge of a LIG mol2 file
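This sums the charge column ($9) over every line containing " LIG ", so it only works when the ligand's residue name is LIG. For a correctly charged ligand, the sum of the partial charges should equal its formal charge, up to rounding.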
awk '/ LIG /{sum+=$9}END{print sum}' lig.mol2
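The pattern / LIG / matches any line containing that word, not only atom records. A stricter sketch, assuming the standard Tripos ATOM record layout (substructure name in field 8, charge in field 9, consistent with the $9 above); lig_charge is a hypothetical name.

```shell
#!/bin/sh
# lig_charge [FILE...]: sum the partial charges of LIG atoms in a mol2 file.
# Hypothetical helper; only lines inside the @<TRIPOS>ATOM section are used, and
# the residue (substructure) name must be exactly field 8, so LIG appearing in a
# molecule name or in the SUBSTRUCTURE section is not counted.
lig_charge() {
    awk '/^@<TRIPOS>/ { in_atoms = ($0 ~ /^@<TRIPOS>ATOM/); next }
         in_atoms && $8 == "LIG" { sum += $9 }
         END { printf "%.3f\n", sum }' "$@"
}

# Usage: lig_charge lig.mol2
```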
Average number of rotatable bonds from dock.out
Prints the number of matching lines, then the average number of rotatable bonds.
awk 'BEGIN{tot=0;sum=0}/Number of rotatable bonds/{sum+=$6;tot+=1}END{print tot,sum/tot}' rot.out
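If no line matches, sum/tot divides by zero, which is a fatal error in gawk. A minimal guarded sketch; avg_rotbonds is a hypothetical name, and like the command above it assumes the count is field 6 (the sample lines are made up to have that layout, not copied from a real dock.out).

```shell
#!/bin/sh
# avg_rotbonds [FILE...]: count and average of the "Number of rotatable bonds" values.
# Hypothetical helper; takes the value from field 6 and prints "0 n/a" instead of
# dividing by zero when no line matches.
avg_rotbonds() {
    awk '/Number of rotatable bonds/ { sum += $6; tot++ }
         END { if (tot) print tot, sum / tot; else print 0, "n/a" }' "$@"
}

# Usage with made-up lines laid out so that the count is field 6 (prints 2 4.5):
printf 'Number of rotatable bonds = 3\nNumber of rotatable bonds = 6\n' | avg_rotbonds
```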
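Average size of many files
du reports disk usage in blocks whose size depends on the system (du -k forces KiB), so this prints the mean usage per file followed by the number of files.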
du ????/003_grid/$spacing/????.rec.nrg | awk '{sum+=$1;n+=1}END{print sum/n,n}'
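Because du counts allocated blocks, small files round up to whole blocks. If the average apparent size in bytes is wanted instead, wc -c can supply it. A sketch; avg_bytes is a hypothetical helper, not part of the original recipe.

```shell
#!/bin/sh
# avg_bytes FILE...: average apparent size in bytes, followed by the file count.
# Hypothetical helper; wc -c counts bytes, whereas du counts allocated blocks.
avg_bytes() {
    for f in "$@"; do
        wc -c < "$f"          # one byte count per line (may be space-padded)
    done | awk '{ sum += $1; n++ } END { if (n) print sum / n, n }'
}

# Usage (same glob as above): avg_bytes ????/003_grid/$spacing/????.rec.nrg
```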
csh variable in awk script
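A variable from a csh script can be passed into an embedded awk program as follows:
#! /bin/csh
set stringval = "text"
# the following two lines produce the same thing
awk '{print "text"$2}'
awk '{print "'$stringval'"$2}'
To use a variable defined in the shell, break the string passed to awk into pieces: close the single quotes just before the variable and reopen them just after it.
awk '{print "'$stringval'"$2}'
awk string1string2string3
The shell joins the three adjacent pieces into one argument, and awk executes the combined string as its program; string2 is the value of the variable. There must be no spaces around $stringval: with spaces, the shell passes three separate arguments, awk takes only the first one, {print ", as its program, and fails with a syntax error. If the value itself may contain spaces, write "'"$stringval"'" instead.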
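Another way to get a shell value into awk without splitting the program string is the environment: awk's ENVIRON array holds exported variables, and a value containing spaces or quotes arrives intact. A sketch in sh; in csh, setenv stringval text replaces the two sh lines that set and export the variable. prefix_field2 is a hypothetical name.

```shell
#!/bin/sh
# Pass a shell value through the environment instead of splicing it into the
# awk program text. ENVIRON is a standard awk array of exported variables.
# csh equivalent of the next two lines:  setenv stringval text
stringval="text"
export stringval

# prefix_field2 [FILE...]: print the value of stringval glued to field 2.
prefix_field2() {
    awk '{ print ENVIRON["stringval"] $2 }' "$@"
}

# Usage (prints textB):
echo "A B C" | prefix_field2
```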