Introduction to awk programming 2016

Welcome to the web page for the course "Introduction to awk programming".

The lecture notes as well as a list of the material covered can be found at the end of the page. An abstract of the course can also be found further down.

Course structure

The course will take place from the 15th to the 17th of August 2016 at Heidelberg University. We will meet in room 3.103 (PC-Pool 1) on the third floor of the Mathematikon (INF 205). The course is structured as a full-day course running from 9:30am until about 5pm each day (with a one-hour lunch break in between).

Abstract

Dealing with large numbers of plain text files is quite common when performing scientific calculations or simulations. For example, one wants to read a part of a file, do some processing on it and send the result off to another program for plotting. Often these tasks are very similar, but at the same time highly specific to the particular application or problem at hand, such that writing a single-use program in a high-level language like C++ or Java hardly ever makes much sense: the development time is just too high. On the other end of the scale are simple shell scripts. But with them even simple data manipulation sometimes becomes extremely complicated, or the script simply does not scale up and takes forever to work on bigger data sets.

Data-driven languages like awk occupy a middle ground here: awk scripts are as easy to code as plain shell scripts, but are well-suited for processing textual data in all kinds of ways. One should note, however, that awk is not extremely general. Following the UNIX philosophy, it does only one thing, but does it well. To make proper use of awk, one hence needs to consider it in the context of a UNIX-like operating system.

In the first part of the course we will thus start by revisiting some concepts that are common to many UNIX programs and also prominent in awk, like regular expressions. Afterwards we will discuss the basic structure of awk scripts and core awk features like

If there is time left we will also look at some advanced topics, like performing calculations with arbitrary precision using awk.

This course complements the bash course, which was offered in August 2015.

Learning objectives

After the course you will be able to

Prerequisites

Files

Link
Course abstract
Lecture notes
Course files (including notes, resources and example files)
Solutions to the exercises (pdf with comments)
Solution script files

Both the lecture notes and the script examples are managed in a public git repository on GitHub. For the most recent version of the material (including corrected errors and other updates) you should refer to this repository or to the DOI https://doi.org/10.5281/zenodo.1038521. Feel free to cite this DOI if you find the course material useful for your work.

Links and references