AWK Programming Language

The awk programming language is the scripting language interpreted by the awk command-line tool. The basics of awk are covered in the other awk guide; this page goes over some of the features of the language itself.

Separators

By default awk treats whitespace (runs of spaces and tabs) as the field separator between columns, but you can specify any character you like. Let's play around with the contents of our passwd file:

cat /etc/passwd

root:x:0:0::/root:/usr/bin/bash
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
ftp:x:14:11::/srv/ftp:/usr/bin/nologin
http:x:33:33::/srv/http:/usr/bin/nologin
nobody:x:65534:65534:Kernel Overflow User:/:/usr/bin/nologin
dbus:x:81:81:System Message Bus:/:/usr/bin/nologin
systemd-coredump:x:980:980:systemd Core Dumper:/:/usr/bin/nologin
systemd-network:x:979:979:systemd Network Management:/:/usr/bin/nologin
systemd-oom:x:978:978:systemd Userspace OOM Killer:/:/usr/bin/nologin
systemd-journal-remote:x:977:977:systemd Journal Remote:/:/usr/bin/nologin
systemd-resolve:x:976:976:systemd Resolver:/:/usr/bin/nologin
systemd-timesync:x:975:975:systemd Time Synchronization:/:/usr/bin/nologin
tss:x:974:974:tss user for tpm2:/:/usr/bin/nologin
uuidd:x:68:68::/:/usr/bin/nologin
polkitd:x:102:102:User for polkitd:/:/usr/bin/nologin
avahi:x:973:973:Avahi mDNS/DNS-SD daemon:/:/usr/bin/nologin
git:x:972:972:git daemon user:/:/usr/bin/git-shell
brltty:x:970:970:Braille Device Daemon:/var/lib/brltty:/usr/bin/nologin
colord:x:969:969:Color management daemon:/var/lib/colord:/usr/bin/nologin
flatpak:x:968:968:Flatpak system helper:/:/usr/bin/nologin
gdm:x:120:120:Gnome Display Manager:/var/lib/gdm:/usr/bin/nologin
geoclue:x:967:967:Geoinformation service:/var/lib/geoclue:/usr/bin/nologin
gnome-remote-desktop:x:966:966:GNOME Remote Desktop:/var/lib/gnome-remote-desktop:/usr/bin/nologin
rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/nologin
saned:x:965:965:SANE daemon user:/:/usr/bin/nologin
usbmux:x:140:140:usbmux user:/:/usr/bin/nologin
named:x:40:40:BIND DNS Server:/:/usr/bin/nologin
alpm:x:963:963:Arch Linux Package Management:/:/usr/bin/nologin
monero:x:962:962::/var/lib/monero:/usr/bin/nologin
unbound:x:961:961:unbound:/etc/unbound:/usr/bin/nologin
mysql:x:960:960:MariaDB:/var/lib/mysql:/usr/bin/nologin
gluster:x:959:959:GlusterFS daemons:/var/run/gluster:/usr/bin/nologin
qemu:x:958:958:QEMU user:/:/usr/bin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/usr/bin/nologin
libvirt-qemu:x:956:956:Libvirt QEMU user:/:/usr/bin/nologin
dnsmasq:x:955:955:dnsmasq daemon:/:/usr/bin/nologin

Here we can see the file is clearly split into columns, but this time the columns are separated by a colon (:). Let's pass this into awk:

# -F is used to specify our field separator
cat /etc/passwd | awk -F ":" '{print $1}'

root
bin
daemon
mail
ftp
http
nobody
dbus
systemd-coredump
systemd-network
systemd-oom
systemd-journal-remote
systemd-resolve
systemd-timesync
tss
uuidd
polkitd
avahi
git
brltty
colord
flatpak
gdm
geoclue
gnome-remote-desktop
rtkit
saned
usbmux
named
alpm
monero
unbound
mysql
gluster
qemu
rpc
libvirt-qemu
dnsmasq

This gives us a list of all the users on our system. Now let's say we also wanted each user's home directory and default shell. We can grab those columns with awk as before, but what if we wanted nicely formatted output? With awk we can specify not only the field separator to look for in the input (FS) but also an output field separator (OFS):

# You will notice this time that we aren't piping the output of cat into awk
# but rather just telling awk which file we want to run this on. This is just
# another way to use awk and I wanted to show it off.
awk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd

root-/root-/usr/bin/bash
bin-/-/usr/bin/nologin
daemon-/-/usr/bin/nologin
mail-/var/spool/mail-/usr/bin/nologin
ftp-/srv/ftp-/usr/bin/nologin
http-/srv/http-/usr/bin/nologin
nobody-/-/usr/bin/nologin
dbus-/-/usr/bin/nologin
systemd-coredump-/-/usr/bin/nologin
systemd-network-/-/usr/bin/nologin
systemd-oom-/-/usr/bin/nologin
systemd-journal-remote-/-/usr/bin/nologin
systemd-resolve-/-/usr/bin/nologin
systemd-timesync-/-/usr/bin/nologin
tss-/-/usr/bin/nologin
uuidd-/-/usr/bin/nologin
polkitd-/-/usr/bin/nologin
avahi-/-/usr/bin/nologin
git-/-/usr/bin/git-shell
brltty-/var/lib/brltty-/usr/bin/nologin
colord-/var/lib/colord-/usr/bin/nologin
flatpak-/-/usr/bin/nologin
gdm-/var/lib/gdm-/usr/bin/nologin
geoclue-/var/lib/geoclue-/usr/bin/nologin
gnome-remote-desktop-/var/lib/gnome-remote-desktop-/usr/bin/nologin
rtkit-/proc-/usr/bin/nologin
saned-/-/usr/bin/nologin
usbmux-/-/usr/bin/nologin
named-/-/usr/bin/nologin
alpm-/-/usr/bin/nologin
monero-/var/lib/monero-/usr/bin/nologin
unbound-/etc/unbound-/usr/bin/nologin
mysql-/var/lib/mysql-/usr/bin/nologin
gluster-/var/run/gluster-/usr/bin/nologin
qemu-/-/usr/bin/nologin
rpc-/var/lib/rpcbind-/usr/bin/nologin
libvirt-qemu-/-/usr/bin/nologin
dnsmasq-/-/usr/bin/nologin

You will see that now we have the three fields we wanted and they are being separated by a - like we specified in our command.
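The same separators can also be set from the command line instead of a BEGIN block. The following is a minimal sketch, assuming two sample lines in passwd format held in a hypothetical variable named sample so the result doesn't depend on your system. It also shows that OFS is only inserted where the print statement uses commas:

```shell
# Hypothetical sample data in passwd format (two lines borrowed from the listing above).
sample='root:x:0:0::/root:/usr/bin/bash
git:x:972:972:git daemon user:/:/usr/bin/git-shell'

# -F sets the input separator; -v OFS=... sets the output separator
# before the program starts, so no BEGIN block is needed.
printf '%s\n' "$sample" | awk -F ':' -v OFS=' | ' '{print $1, $6, $7}'
# root | /root | /usr/bin/bash
# git | / | /usr/bin/git-shell

# Without commas, print concatenates its arguments and OFS is not used.
printf '%s\n' "$sample" | awk -F ':' -v OFS=' | ' '{print $1 $6}'
# root/root
# git/
```

Whether to use -v or a BEGIN block is mostly a matter of taste; -v keeps the script itself shorter when the separators change between runs.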

String parsing

For our next challenge, let's say we wanted to know each of the shells installed on our system. We can just view what's in /etc/shells:

cat /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

/bin/sh
/bin/bash
/usr/bin/sh
/usr/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish

This works, but what if we wanted just the names of the shells themselves? NF holds the number of fields on the current line, so $NF is the last field, and we can use it to print just the last column of our text:

awk -F "/" '{print $NF}' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

sh
bash
sh
bash
git-shell
fish
fish

This is almost what we wanted, but the first few lines of the shells file also stuck around. They don't contain our / separator, so awk treats each of them as a single field and $NF prints the whole line. Let's talk about how to tell awk exactly which lines we want from our file. Inside the single quotes of our awk command we can do more than specify what to print; everything inside the single quotes is our awk script. Earlier, in the separators section, we set the input and output field separators inside those same quotes. Now let's tell awk what type of line we want to grab from our shells file. We can put any search pattern we want to match inside / /:

# awk uses regex inside of the '/ /' to define what it is searching for.
# For more information on regex see my regex guide.
awk -F "/" '/^\// {print $NF}' /etc/shells

sh
bash
sh
bash
git-shell
fish
fish

We used a regex to match only the lines that start with a /. Now let's pipe the output of our awk command into sort so the names are sorted alphabetically, and pipe that into uniq to remove the duplicate shells:

awk -F "/" '/^\// {print $NF}' /etc/shells | sort | uniq

bash
fish
git-shell
sh
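If the order of first appearance matters more than alphabetical order, awk can drop the duplicates by itself. This is a sketch rather than a replacement for sort and uniq; it assumes the sample paths below (copied from the shells file above into a hypothetical variable named shells) and uses a hypothetical array name, seen:

```shell
# Hypothetical sample data mirroring the /etc/shells listing above.
shells='# Pathnames of valid login shells.
/bin/sh
/bin/bash
/usr/bin/sh
/usr/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish'

# seen[$NF]++ evaluates to 0 (false) the first time a name appears and to a
# positive count afterwards, so !seen[$NF]++ is true only once per name.
printf '%s\n' "$shells" | awk -F '/' '/^\// && !seen[$NF]++ {print $NF}'
# sh
# bash
# git-shell
# fish
```

The names come out in the order they were first seen, not sorted, which is the main difference from the sort and uniq pipeline.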

This time let's search our bashrc for any lines whose first word starts with an a or a b:

awk '$1 ~ /^[ab]/ {print $0}' ~/.bashrc

alias ls='ls --color=auto'
alias grep='grep --color=auto'
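Two details of this kind of match are easy to miss: matching against $1 ignores leading whitespace, and a bracket expression lists its characters with no separator, so a comma inside the brackets is matched literally. The sketch below illustrates both, assuming a few made-up lines in a hypothetical variable named lines rather than a real bashrc:

```shell
# Hypothetical sample lines; the fourth one is indented by two spaces.
lines='alias ls
bind x
,comma first
  alias indented
export PATH'

# Match on the first field: leading whitespace is skipped before $1 is taken.
printf '%s\n' "$lines" | awk '$1 ~ /^[ab]/'
# alias ls
# bind x
#   alias indented

# Match on the whole line: the indented line no longer starts with a or b.
printf '%s\n' "$lines" | awk '/^[ab]/'
# alias ls
# bind x

# A comma inside the brackets is just another character to match.
printf '%s\n' "$lines" | awk '/^[a,b]/'
# alias ls
# bind x
# ,comma first
```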

Scripting

One of the things that makes awk so powerful is that it is a scripting language in its own right. As an example, we will pick on the shells file again. Let's say we only wanted to print lines that are more than 12 characters long. A condition with no action after it prints every line for which it is true:

awk 'length($0) > 12' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.
/usr/bin/bash
/usr/bin/git-shell
/usr/bin/fish

We also have if statements available to us:

# ps -ef prints all of the processes running on our machine
ps -ef | awk '{ if($NF == "/usr/bin/fish") print $0 }'

epost       4820    4730  0 Feb14 pts/0    00:00:02 /usr/bin/fish

We used a simple if statement to check whether the last column ($NF) was equal to /usr/bin/fish, and if so we printed the whole line ($0).
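The same test can also be written as a bare pattern, and an if statement can take an else branch. A small sketch, assuming two made-up process lines in a hypothetical variable named procs instead of real ps output:

```shell
# Hypothetical, simplified process lines: user, PID, parent PID, command.
procs='epost 4820 4730 /usr/bin/fish
epost 5001 4820 vim notes.txt'

# An if statement inside an action...
printf '%s\n' "$procs" | awk '{ if ($NF == "/usr/bin/fish") print $0 }'
# epost 4820 4730 /usr/bin/fish

# ...behaves the same as using the comparison as the pattern.
printf '%s\n' "$procs" | awk '$NF == "/usr/bin/fish"'
# epost 4820 4730 /usr/bin/fish

# if/else: label each PID by whether its last field is the fish binary.
printf '%s\n' "$procs" | awk '{ if ($NF == "/usr/bin/fish") print $2, "fish"; else print $2, "other" }'
# 4820 fish
# 5001 other
```

The pattern form is shorter for simple filters; the if form is handy once an action needs more than one branch.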

We also have for loops available to us:

awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'

The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100

Our for loop is laid out just like it is in C-style languages: we initialize our loop variable, set our stopping condition, and set our increment. You may have also noticed that we can do arithmetic in our awk script, which is another powerful aspect of awk scripting.
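A for loop becomes more useful when it walks over the fields of each input line. As a rough sketch, assuming two made-up lines of numbers and a hypothetical variable named sum, the loop below adds up every field on a line:

```shell
# NF is the number of fields on the current line, so i runs over every field.
# sum is reset for each line; NR ": " sum joins the pieces by concatenation.
printf '3 4 5\n10 20\n' | awk '{ sum = 0; for (i = 1; i <= NF; i++) sum += $i; print NR ": " sum }'
# 1: 12
# 2: 30
```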

Line numbers

A feature of awk worth noting is the line number variable. Say we had a big block of output from a command and only wanted to see a specific line, or a specific range of lines. Let's try this on the df command:

df | awk 'NR==7, NR==11 {print NR, $0}'

7 tmpfs                1024         0       1024   0% /run/credentials/systemd-journald.service
8 tmpfs            32465572     49004   32416568   1% /tmp
9 /dev/nvme0n1p2 1952463960 918893540 1033148156  48% /.snapshots
10 /dev/nvme0n1p2 1952463960 918893540 1033148156  48% /var/cache/pacman/pkg
11 /dev/nvme0n1p2 1952463960 918893540 1033148156  48% /var/log

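NR holds the number of records (by default, lines) that awk has read so far, so it acts as the current line number. The pattern NR==7, NR==11 is a range pattern: it matches from the line where NR is 7 through the line where NR is 11. Above, we used it to grab lines 7-11 and print both the line number and the line itself. If we didn't want the line number, we could drop NR from the print statement: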

df | awk 'NR==7, NR==11 {print $0}'

tmpfs                1024         0       1024   0% /run/credentials/systemd-journald.service
tmpfs            32465572     49016   32416556   1% /tmp
/dev/nvme0n1p2 1952463960 918893572 1033148124  48% /.snapshots
/dev/nvme0n1p2 1952463960 918893572 1033148124  48% /var/cache/pacman/pkg
/dev/nvme0n1p2 1952463960 918893572 1033148124  48% /var/log
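Because df output differs from machine to machine, here is the same idea on fixed input. This is only a sketch, assuming seven single-letter lines generated with printf; it compares the range pattern with an equivalent pair of comparisons:

```shell
# Range pattern: starts matching when NR==3 is true and stops after NR==5.
printf '%s\n' a b c d e f g | awk 'NR==3, NR==5 {print NR, $0}'
# 3 c
# 4 d
# 5 e

# The same lines selected with two comparisons joined by &&.
printf '%s\n' a b c d e f g | awk 'NR >= 3 && NR <= 5 {print NR, $0}'
# 3 c
# 4 d
# 5 e
```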

We can also use NR to get the line count of a file. Let's pick on /etc/shells again:

awk 'END {print NR}' /etc/shells
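In the END block NR still holds the number of the last line read, which is why it works as a line count. A short sketch, assuming four sample lines in a hypothetical variable named text and a hypothetical counter named n, shows the total alongside two filtered counts:

```shell
# Hypothetical sample: one comment, one blank line, two shell paths.
text='# comment

/bin/sh
/bin/bash'

# Total number of lines.
printf '%s\n' "$text" | awk 'END {print NR}'
# 4

# Lines with at least one field, i.e. non-blank lines.
# n + 0 forces a numeric 0 if nothing matched.
printf '%s\n' "$text" | awk 'NF > 0 {n++} END {print n + 0}'
# 3

# Lines that start with a /.
printf '%s\n' "$text" | awk '/^\// {n++} END {print n + 0}'
# 2
```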

We can also use NR to do the job of the Linux tool head:

awk 'NR < 6' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

/bin/sh
/bin/bash

We can see that we grabbed the first 5 lines of our shells file, because NR < 6 is true only for lines 1 through 5.
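On a large input, a comparison on NR alone still reads every remaining line. One possible variation, sketched here with a hypothetical variable n passed in through -v, stops reading as soon as the requested number of lines has been printed:

```shell
# n is a hypothetical variable holding how many lines to keep.
# Once NR goes past n, exit stops awk; before that, every line is printed.
printf '%s\n' a b c d e f g | awk -v n=3 'NR > n {exit} {print}'
# a
# b
# c
```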