class: center, middle, inverse, title-slide

# Tutorial 2 — Bash & scripting
## MICB 405 — Bioinformatics 2021W1
### Axel Hauduc
### University of British Columbia
### February 17, 2022

---

## The Unix Philosophy

1. Make each program do one thing well. To do a new job, build a new program rather than complicate old programs.
1. The output of any program should easily become the input for another.

---

## Utilities review

`cd` change directory

`ls` list files (`-l` long)

`pwd` print working directory

`cp` copy

`rm` remove (`-r` recursive)

`mv` move

`cat` concatenate

`echo` print the text you provide as arguments

---

## More utilities...

`grep` global regular expression print

`less` read text files

`head`/`tail` print first/last lines

`wc` word count: `-l` lines, `-w` words, `-m` characters

`sort` sort lines

`uniq` collapse adjacent duplicate lines (run `sort` first to get unique lines)

`chmod` change mode a.k.a. permissions

`mkdir` make directory

---

## `awk` command

Less Unix-y than previous commands

More of a multi-tool for dealing with data files

Named after its authors (Aho, Weinberger, Kernighan); processes files by lines & columns

Follows the format: `awk '<filter> { <action> }' <file>`

e.g. `awk 'NR >= 2 { print $0 }' mtcars.tsv` prints every line from the second onward (i.e. skips the header)

---

## Special characters

`>` redirect output to a file (overwrites)

`>>` append output to a file

`|` pipe (not `1`, `l`, or `I`)

`;` separate commands

`&&` AND: run the next command only if the previous one succeeded

`||` OR: run the next command only if the previous one failed

`*` wildcard

---

## In-terminal text editing

`nano` = easiest to use & sufficient for most purposes

Example: `nano <filename>` creates a new file, or edits an existing one of that name

<kbd>ctrl</kbd> + <kbd>o</kbd> write out (i.e.
save)

<kbd>ctrl</kbd> + <kbd>x</kbd> exit

---

## Scripting

Helps automate repetitive tasks

Scripts take the form of a text file that can be "executed" when you need them

---

## Script headers

All script files need a header to indicate how the script file should be run

First line (the "shebang") specifies the program that will interpret the rest of the script

```bash
#!/bin/bash
```

- Program should interpret following text using Bash
- Other programs = other headers, if run by calling the script directly

---

## Script variables

We can create a variable and assign it a value with

```bash
results_dir="results/"
```

Note that spaces matter when setting Bash variables. Do **NOT** use spaces around the equal sign `=`.

---

## Variables

To access a variable's value, we use a dollar sign in front of the variable's name.

Suppose we want to create a directory for a sample's data, called `<sample>_data/`, where `<sample>` is replaced by the sample's name.

```bash
sample="Individual_2A"
mkdir "${sample}_data/"
```

This will create a directory with the name `Individual_2A_data`.

Curly braces `{ }` indicate where the variable name starts and ends.

---

## Command line arguments

```bash
grep -c "<string>" <file>
bash <script> <path/to/file>
```

Arguments given after the script name are assigned to the default variables `$1`, `$2`, `$3`, and so forth within the script.

---

## `for` loops

In bioinformatics, most of our data is split across multiple files.

Many processing pipelines need a way to apply the same workflow to each file, taking care to keep track of sample names.

Looping over files with Bash's `for` loop = simplest way to accomplish this

Three essential parts to creating a pipeline to process a set of files:

1. Selecting which files to apply the commands to
1. Looping over the data and applying the commands
1. Keeping track of the names of any output files created

---

## `for` loops template

```bash
#!/bin/bash
for a_file in </path/to/dir/>*
do
    <action on> "$a_file"
    <another action on> "$a_file"
done
```

---

## `for` looping through directories

Creating `for` loops that loop through an entire directory

```bash
#!/bin/bash
for foo in /home/ahauduc_mb20/test_directory/*
do
    # Overwrite (>) with the first lines, then append (>>) the last lines
    head "$foo" > "${foo}.head.and.tail.txt"
    tail "$foo" >> "${foo}.head.and.tail.txt"
done
```

---

## `if` statements

Use `if` to perform commands on only a subset of files, or only when a condition is met.

The basic syntax is:

```bash
if <condition is true>
then
    <DO THIS>
else
    <DO THAT>
fi
```

---

## `if` statements

```bash
#!/bin/bash
# cat exits with 0 if it could read the file named in the first argument
if cat "$1"
then
    echo "The file exists!"
else
    echo "The file doesn't exist!"
fi
```

---

## Return codes

`0` Command executed successfully

`1`+ Command did not execute successfully

There can be many different error types denoted by specific numbers

---

## Return codes are invisible

```bash
#!/bin/bash
if cat "$1"
then
    echo "The file exists!"
else
    echo "The file doesn't exist!"
fi
```

---

## `test` statements: `[[ <condition> ]]`

Like other programs, `test` exits with either `0` (condition true) or `1` (condition false).
Test statements can be placed at the start of an `if` statement, as its condition, to make producing the right return code easier

`test` supports numerous helpful comparisons you might need

---

## `test` summary comparisons

<table style="width:100%;">
<thead>
<tr> <th style="text-align:left;"> Condition </th> <th style="text-align:left;"> Description </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;width: 25%; "> -z &lt;str&gt; </td> <td style="text-align:left;"> String &lt;str&gt; is empty (zero length) </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;str1&gt; = &lt;str2&gt; </td> <td style="text-align:left;"> &lt;str1&gt; and &lt;str2&gt; are identical </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;str1&gt; != &lt;str2&gt; </td> <td style="text-align:left;"> &lt;str1&gt; and &lt;str2&gt; are different </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -eq &lt;int2&gt; </td> <td style="text-align:left;"> Integers &lt;int1&gt; and &lt;int2&gt; are equal </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -ne &lt;int2&gt; </td> <td style="text-align:left;"> &lt;int1&gt; and &lt;int2&gt; are not equal </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -lt &lt;int2&gt; </td> <td style="text-align:left;"> &lt;int1&gt; is less than &lt;int2&gt; </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -gt &lt;int2&gt; </td> <td style="text-align:left;"> &lt;int1&gt; is greater than &lt;int2&gt; </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -le &lt;int2&gt; </td> <td style="text-align:left;"> &lt;int1&gt; is less than or equal to &lt;int2&gt; </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> &lt;int1&gt; -ge &lt;int2&gt; </td> <td style="text-align:left;"> &lt;int1&gt; is greater than or equal to &lt;int2&gt; </td> </tr>
</tbody>
</table>

---

## `test` summary for files/directories

<table style="width:100%;">
<thead>
<tr> <th style="text-align:left;"> Condition </th> <th style="text-align:left;"> Description </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;width: 25%; "> -d &lt;dir&gt; </td> <td style="text-align:left;"> &lt;dir&gt; is a directory
</td> </tr>
<tr> <td style="text-align:left;width: 25%; "> -f &lt;file&gt; </td> <td style="text-align:left;"> &lt;file&gt; is a regular file </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> -e &lt;file&gt; </td> <td style="text-align:left;"> &lt;file&gt; exists </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> -r &lt;file&gt; </td> <td style="text-align:left;"> &lt;file&gt; is readable </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> -w &lt;file&gt; </td> <td style="text-align:left;"> &lt;file&gt; is writable </td> </tr>
<tr> <td style="text-align:left;width: 25%; "> -x &lt;file&gt; </td> <td style="text-align:left;"> &lt;file&gt; is executable </td> </tr>
</tbody>
</table>

---

## `if` + `test` statements

Combining `test` with `if` statements is simple:

```bash
#!/bin/bash
if [[ <condition> ]]
then
    <this action>
else
    <that action>
fi
```

Note the spaces around and within the brackets `[[ ]]`: these are required.

---

class: middle

## Putting it all together...

---

## `for` looping through directories

```bash
#!/bin/bash
for foo in /home/axel/Documents/data/*
do
    # Only summarize regular files; skip directories and anything else
    if [[ -f "${foo}" ]]
    then
        head "${foo}" > "${foo}.head.and.tail.txt"
        tail "${foo}" >> "${foo}.head.and.tail.txt"
    else
        echo "${foo} is not a file to summarize!"
    fi
done
```
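
---

## Sketch: arguments + output names

The final loop hard-codes its directory and writes each summary next to its input. A minimal sketch of one way to combine command line arguments, `test`, return codes, and output-name tracking is given here. The function name `summarize_dir`, the `.summary.txt` suffix, and the two-line `head`/`tail` length are illustrative assumptions, not part of the tutorial.

```bash
#!/bin/bash
# Hypothetical helper (name, suffix, and line counts are assumptions).
# Usage: summarize_dir <input_dir> <output_dir>
summarize_dir() {
    local in_dir="$1"
    local out_dir="$2"

    # Stop with a non-zero return code if the input is not a directory
    if [[ ! -d "${in_dir}" ]]
    then
        echo "${in_dir} is not a directory!" >&2
        return 1
    fi

    mkdir -p "${out_dir}"
    for a_file in "${in_dir}"/*
    do
        if [[ -f "${a_file}" ]]
        then
            # Track the output name: drop the directory part, add a suffix
            local name
            name="$(basename "${a_file}")"
            head -n 2 "${a_file}" > "${out_dir}/${name}.summary.txt"
            tail -n 2 "${a_file}" >> "${out_dir}/${name}.summary.txt"
        else
            echo "${a_file} is not a file to summarize!"
        fi
    done
}
```

If a script file ends with the line `summarize_dir "$1" "$2"`, it could be run as `bash <script> <input_dir> <output_dir>`. Writing outputs to a separate directory means re-running the loop does not pick up earlier summaries as new inputs.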