Awk

Separators

By default awk splits each line into fields (columns) on whitespace, but you can specify any separator you like. Let's play around with the contents of our passwd file:

cat /etc/passwd

root:x:0:0::/root:/bin/fish
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
ftp:x:14:11::/srv/ftp:/usr/bin/nologin
http:x:33:33::/srv/http:/usr/bin/nologin
nobody:x:65534:65534:Nobody:/:/usr/bin/nologin
dbus:x:81:81:System Message Bus:/:/usr/bin/nologin
systemd-journal-remote:x:982:982:systemd Journal Remote:/:/usr/bin/nologin
systemd-network:x:981:981:systemd Network Management:/:/usr/bin/nologin
systemd-resolve:x:980:980:systemd Resolver:/:/usr/bin/nologin
systemd-timesync:x:979:979:systemd Time Synchronization:/:/usr/bin/nologin
systemd-coredump:x:978:978:systemd Core Dumper:/:/usr/bin/nologin
uuidd:x:68:68::/:/usr/bin/nologin
avahi:x:977:977:Avahi mDNS/DNS-SD daemon:/:/usr/bin/nologin
colord:x:976:976:Color management daemon:/var/lib/colord:/usr/bin/nologin
dhcpcd:x:975:975:dhcpcd privilege separation:/:/usr/bin/nologin
git:x:974:974:git daemon user:/:/usr/bin/git-shell
lightdm:x:973:973:Light Display Manager:/var/lib/lightdm:/usr/bin/nologin
polkitd:x:102:102:PolicyKit daemon:/:/usr/bin/nologin
epost:x:1000:1000::/home/epost:/bin/fish
brltty:x:970:970:Braille Device Daemon:/var/lib/brltty:/usr/bin/nologin
dnsmasq:x:969:969:dnsmasq daemon:/:/usr/bin/nologin
nvidia-persistenced:x:143:143:NVIDIA Persistence Daemon:/:/usr/bin/nologin
rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/nologin
usbmux:x:140:140:usbmux user:/:/usr/bin/nologin
systemd-oom:x:966:966:systemd Userspace OOM Killer:/:/usr/bin/nologin

Here we can see the file is clearly split into columns, but this time the columns are separated by : instead of whitespace. Let's pass this into awk now:

# -F is used to specify our field separator
cat /etc/passwd | awk -F ":" '{print $1}'

root
bin
daemon
mail
ftp
http
nobody
dbus
systemd-journal-remote
systemd-network
systemd-resolve
systemd-timesync
systemd-coredump
uuidd
avahi
colord
dhcpcd
git
lightdm
polkitd
epost
brltty
dnsmasq
nvidia-persistenced
rtkit
usbmux
systemd-oom

This gives us a list of all the users on our system. Now let's say we want not only the user names but also each user's home directory and default shell. We can grab those columns with awk just like before, but what if we want the output formatted nicely? With awk we can specify not only the field separator to look for in the input but also an output field separator:

# You will notice this time that we aren't piping the output of cat into awk
# but rather just telling awk which file we want to run this on. This is just
# another way to use awk and I wanted to show it off.
awk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd

root-/root-/bin/fish
bin-/-/usr/bin/nologin
daemon-/-/usr/bin/nologin
mail-/var/spool/mail-/usr/bin/nologin
ftp-/srv/ftp-/usr/bin/nologin
http-/srv/http-/usr/bin/nologin
nobody-/-/usr/bin/nologin
dbus-/-/usr/bin/nologin
systemd-journal-remote-/-/usr/bin/nologin
systemd-network-/-/usr/bin/nologin
systemd-resolve-/-/usr/bin/nologin
systemd-timesync-/-/usr/bin/nologin
systemd-coredump-/-/usr/bin/nologin
uuidd-/-/usr/bin/nologin
avahi-/-/usr/bin/nologin
colord-/var/lib/colord-/usr/bin/nologin
dhcpcd-/-/usr/bin/nologin
git-/-/usr/bin/git-shell
lightdm-/var/lib/lightdm-/usr/bin/nologin
polkitd-/-/usr/bin/nologin
epost-/home/epost-/bin/fish
brltty-/var/lib/brltty-/usr/bin/nologin
dnsmasq-/-/usr/bin/nologin
nvidia-persistenced-/-/usr/bin/nologin
rtkit-/proc-/usr/bin/nologin
usbmux-/-/usr/bin/nologin
systemd-oom-/-/usr/bin/nologin

You will see that we now have the three fields we wanted, separated by a - as we specified in our command.
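To experiment with the separators without touching /etc/passwd, here is a small sketch that feeds awk two made-up records. The sample records and the variable names sample and out are assumptions for illustration only. Note that OFS is only inserted where print has a comma between its arguments.

```shell
# Two made-up passwd-style records (not from a real system).
sample='alice:x:1001:1001::/home/alice:/bin/bash
bob:x:1002:1002::/home/bob:/bin/sh'

# FS splits the input on ":"; OFS is placed wherever print has a comma.
out=$(printf '%s\n' "$sample" | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $6, $7 }')
printf '%s\n' "$out"
```

If you write print $1 $6 $7 without the commas, awk concatenates the fields directly and OFS is not used.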

String parsing

For our next challenge let's say we want to know each of the shells installed on our system. We can just view what's in /etc/shells:

cat /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

/bin/sh
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish

This works, but what if we want just the names of the shells themselves? With awk we can print just the last field of each line with $NF:

awk -F "/" '{print $NF}' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

sh
bash
git-shell
fish
fish

This is almost what we wanted, but you will see that the comment lines and the blank line at the top of the shells file also stuck around. They contain no /, so awk treats each of them as a single field, and $NF is the whole line.

Let's talk about how we can tell awk exactly which lines we want from our file. Inside the single quotes of our awk command we can do more than just specify what we want to print. In fact, everything inside the single quotes is our awk script. Earlier in the separators section you saw that we told awk what we wanted our input and output field separators to be; this was done inside the single quotes of our awk command. Now let's tell awk what type of line we want to grab from our shells file. We can specify any search pattern we want to look for inside of / /:

# awk uses regex inside of the '/ /' to define what it is searching for.
# For more information on regex see my regex guide.
awk -F "/" '/^\// {print $NF}' /etc/shells

sh
bash
git-shell
fish
fish
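The pattern-then-action idea can be tried on inline data too. This is a minimal sketch; the sample text is a made-up stand-in for /etc/shells and the variable names are only for illustration.

```shell
# Made-up stand-in for /etc/shells: one comment, one blank line, three paths.
sample='# a comment line

/bin/sh
/usr/bin/zsh
/bin/zsh'

# Only lines matching /^\// (starting with a slash) reach the print action,
# so the comment and the blank line are skipped.
names=$(printf '%s\n' "$sample" | awk -F "/" '/^\// { print $NF }')
printf '%s\n' "$names"
```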

We used a regex to select only lines that start with a /. Now let's pipe the output of our awk command into sort -u, which sorts the shells alphabetically and removes the duplicates. (uniq on its own only removes duplicates that sit on adjacent lines, so it has to come after sort, not before it.)

awk -F "/" '/^\// {print $NF}' /etc/shells | sort -u

bash
fish
git-shell
sh
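One thing to watch when de-duplicating: uniq only collapses repeated lines that are next to each other. The sketch below uses a made-up list in which the duplicates are separated, to show why sorting has to happen first.

```shell
# Made-up list where the duplicates are not adjacent.
names='fish
bash
fish'

# uniq runs before the lines are sorted, so both "fish" lines survive.
uniq_first=$(printf '%s\n' "$names" | uniq | sort)

# sort -u sorts and de-duplicates in one step.
sort_first=$(printf '%s\n' "$names" | sort -u)

printf 'uniq | sort:\n%s\n\nsort -u:\n%s\n' "$uniq_first" "$sort_first"
```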

This time let's search our bashrc for any lines whose first word starts with a b or a c:

awk '$1 ~ /^[bc]/ {print $0}' ~/.bashrc
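A note on the bracket expression: the characters inside [ ] are listed with no separator, and a comma placed between them is itself treated as a character to match. The sketch below compares the two forms on made-up input (the sample lines are assumptions, not taken from a real bashrc).

```shell
# Made-up lines; the second one starts with a literal comma.
sample='bind x
,oops
cd /tmp
alias ll=ls'

# [b,c] matches b, a comma, or c at the start of the first field.
with_comma=$(printf '%s\n' "$sample" | awk '$1 ~ /^[b,c]/')

# [bc] matches only b or c.
without_comma=$(printf '%s\n' "$sample" | awk '$1 ~ /^[bc]/')

printf '[b,c]:\n%s\n\n[bc]:\n%s\n' "$with_comma" "$without_comma"
```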

Scripting

One of the things that makes awk so powerful is that it is a scripting language in its own right. What do I mean by that? Let's look at an example, picking on the shells file again. Say we only want to print lines that are more than 8 characters long:

awk 'length($0) > 8' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish

We also have if statements available to us:

# ps -ef prints every process running on our machine
ps -ef | awk '{ if($NF == "/bin/fish") print $0 }'

epost       5675    5674  0 06:11 pts/0    00:00:00 /bin/fish
epost       7201    7200  0 06:30 pts/1    00:00:01 /bin/fish

We used a simple if statement to see if the last column ($NF) was equal to /bin/fish and if so we printed the whole line ($0).
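The same test can also be written as a bare pattern, like the length example earlier, because a pattern with no action prints the matching line. The sketch below runs both forms on made-up input in place of real ps output; the sample rows are assumptions for illustration.

```shell
# Made-up rows standing in for ps output; the last column is the command.
sample='alice 101 /bin/fish
bob 102 /bin/bash
carol 103 /bin/fish'

# Explicit if statement inside an action block.
with_if=$(printf '%s\n' "$sample" | awk '{ if ($NF == "/bin/fish") print $0 }')

# Equivalent pattern with the default action (print the line).
with_pattern=$(printf '%s\n' "$sample" | awk '$NF == "/bin/fish"')

printf '%s\n' "$with_if"
```

The if form is handy once the action needs an else branch or more than one statement.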

We also have for loops available to us:

awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'

The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100

Our for loop is laid out just like it is in C and many other languages: we initialize our loop variable, we set our stopping condition, and we set the increment. You may have also noticed that we can do arithmetic in our awk script, which is another powerful aspect of awk scripting.
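A for loop becomes more useful when it walks over the fields of each input line. As a small sketch, the built-in variable NF holds the number of fields on the current line, so we can loop from 1 to NF and add the fields up. The input numbers and the variable name totals are made up for illustration.

```shell
# Sum the fields of each line; NF is the number of fields on that line.
totals=$(printf '1 2 3\n10 20\n' | awk '{
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    print "line", NR, "sums to", sum
}')
printf '%s\n' "$totals"
```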

Line numbers

A feature of awk worth noting is the line number specifier. Say we had a big block of output from a command and we only wanted to see a specific line of the output, or even a specific range of lines. Let's try this on the df command:

df | awk 'NR==7, NR==11 {print NR, $0}'

7 /dev/loop0        225280   225280         0 100% /var/lib/snapd/snap/multipass/4458
8 /dev/loop2         33152    33152         0 100% /var/lib/snapd/snap/snapd/12159
9 /dev/loop3        225280   225280         0 100% /var/lib/snapd/snap/multipass/4861
10 /dev/loop5         56832    56832         0 100% /var/lib/snapd/snap/core18/2066
11 /dev/loop4         56832    56832         0 100% /var/lib/snapd/snap/core18/2074

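NR is awk's built-in variable for the number of the current record, which by default means the current line number. Above, the two conditions separated by a comma form a range pattern: awk starts printing at the line where NR==7 is true and stops after the line where NR==11 is true, so we get lines 7-11 with both the line number and the line itself. Of course if we didn't want to print the line number we could just drop NR from our print statement: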

df | awk 'NR==7, NR==11 {print $0}'

/dev/loop0        225280   225280         0 100% /var/lib/snapd/snap/multipass/4458
/dev/loop2         33152    33152         0 100% /var/lib/snapd/snap/snapd/12159
/dev/loop3        225280   225280         0 100% /var/lib/snapd/snap/multipass/4861
/dev/loop5         56832    56832         0 100% /var/lib/snapd/snap/core18/2066
/dev/loop4         56832    56832         0 100% /var/lib/snapd/snap/core18/2074
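Since df output differs from machine to machine, here is a sketch of the same range pattern on fixed input. The six words are made up, and the second command shows an equivalent way to write the selection as a single condition.

```shell
# Range pattern: start at line 2, stop after line 4.
picked=$(printf '%s\n' one two three four five six | awk 'NR == 2, NR == 4 { print NR, $0 }')

# The same selection written as one condition.
picked_cond=$(printf '%s\n' one two three four five six | awk 'NR >= 2 && NR <= 4 { print NR, $0 }')

printf '%s\n' "$picked"
```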

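We can also use NR to get a line count of a file. The END block runs after the last line has been read, so at that point NR holds the total number of lines. Let's pick on /etc/shells again: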

awk 'END {print NR}' /etc/shells
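As a sketch on made-up input, the first command below counts every line, and the second counts only non-blank lines by incrementing a counter whenever the line has at least one field. The counter name n and the sample text are assumptions for illustration.

```shell
# Three lines of made-up input, the middle one blank.
all_lines=$(printf 'a\n\nb\n' | awk 'END { print NR }')

# NF is 0 on a blank line, so the pattern NF skips it; n + 0 prints 0 if nothing matched.
nonblank=$(printf 'a\n\nb\n' | awk 'NF { n++ } END { print n + 0 }')

printf 'all: %s, non-blank: %s\n' "$all_lines" "$nonblank"
```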

We can also use the line number feature of awk to mimic the Linux tool head:

awk 'NR < 13' /etc/shells

# Pathnames of valid login shells.
# See shells(5) for details.

/bin/sh
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish

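The condition NR < 13 selects lines 1 through 12. Our shells file only has 8 lines, so the whole file is printed; on a longer file we would get the first 12 lines, and NR <= 13 would give the first 13.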
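A closer imitation of head stops reading as soon as it has printed enough lines instead of scanning the rest of the input. The sketch below wraps that in a shell function; the name first_lines and the sample input are hypothetical, chosen only for this example.

```shell
# Print the first N lines of standard input, then stop reading.
# Usage: some_command | first_lines N
first_lines() {
    awk -v n="$1" 'NR > n { exit } { print }'
}

top=$(printf '%s\n' a b c d e | first_lines 3)
printf '%s\n' "$top"
```

The -v option sets an awk variable from the shell before the script runs, which avoids splicing the number into the quoted script.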