Introduction to awk programming 2016
Welcome to the web page for the course "Introduction to awk
programming".
The lecture notes as well as a list of the material covered can be found at the end of the page. An abstract of the course can also be found further down.
Course structure
The course will take place from 15th to 17th August 2016 at Heidelberg University. We will meet in room 3.103 (PC-Pool 1), Mathematikon (INF 205), on the third floor. The course is structured as a full-day course running from 9:30am until about 5pm each day (with a one-hour lunch break in between).
Abstract
Dealing with large numbers of plain text files is quite frequent when
making scientific calculations or simulations. For example, one wants to
read a part of a file, do some processing on it and send the result off
to another program for plotting. Often these tasks are very similar, but
at the same time highly specific to the particular application or
problem in mind, such that writing a single-use program in a high-level
language like C++ or Java hardly ever makes much sense: The
development time is just too high. On the other end of the scale are
simple shell scripts. But with them sometimes even simple data
manipulation becomes extremely complicated or the script simply does not
scale up and takes forever to work on bigger data sets.
Data-driven languages like awk sit on a middle ground here: awk
scripts are as easy to code as plain shell scripts, but are well-suited
for processing textual data in all kinds of ways. One should note,
however, that awk is not extremely general. Following the UNIX
philosophy it can do only one thing, but this it can do right. To make
proper use of awk one hence needs to consider it in the context of a
UNIX-like operating system.
In the first part of the course we will thus start with revising some
concepts, which are common to many UNIX programs and also prominent in
awk, like regular expressions. Afterwards we will discuss the basic
structure of awk scripts and core awk features like
- ways to define how data sits in the input file
- extracting and printing data
- control statements (if, for, while, ...)
- awk functions
- awk arrays
If there is time left we will also look at some advanced topics, like
performing calculations with arbitrary precision using awk.
This course is a subsidiary to the bash course which was offered in August 2015.
Learning objectives
After the course you will be able to
- enumerate different ways to define the structure of an input file in awk,
- parse a structured input file and access individual values for post-processing,
- use regular expressions to search for text in a file,
- find and extract a well-defined part of a large file without relying on the exact position of this part,
- use awk to perform simple checks on text (like checking for repeated words) in less than 5 lines of code.
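To give a rough idea of the last objective, a repeated-word check could look something like the sketch below. It is not taken from the course material: the helper name `repeated_words` and the sample text are made up for illustration, and the check deliberately ignores punctuation and compares words case-insensitively.

```shell
# repeated_words is a hypothetical helper name, not part of the course material.
# It prints the line number and the word whenever a word directly repeats the
# previous one, also across line breaks (assumption: words are split on whitespace).
repeated_words() {
    awk '{
        for (i = 1; i <= NF; i++) {
            word = tolower($i)
            if (word == prev) print NR ": " $i
            prev = word
        }
    }' "$@"
}

# Made-up sample input, fed in via standard input.
printf 'this is the the first line\nand the second\nsecond line is is here\n' | repeated_words
```

The function passes its arguments on to awk, so it can be given file names instead of reading from a pipe.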
Prerequisites
- Familiarity with a UNIX-like operating system like GNU/Linux and the terminal is assumed.
- Basic knowledge of the UNIX command grep is assumed. You should for example know how to use grep to search for a word in a file.
- It is not assumed, but highly recommended, that participants have had previous experience with programming or scripting in a UNIX-like operating system.
Files
Link |
---|
Course abstract |
Lecture notes |
Course files (including notes, resources and example files) |
Solutions to the exercises (pdf with comments) |
Solution script files |
Both the lecture notes and the script examples are managed
in a public git repository on GitHub.
For the most recent version of the material (including corrected errors
and other updates) you should refer to this repository
or to the DOI https://doi.org/10.5281/zenodo.1038521.
Feel free to cite this DOI in case you find the course material useful for your work.