Awk
Separators
By default awk treats empty space as a field separator for columns but you
can specify any character you like. Lets play around with the contents of our
passwd file:
cat /etc/passwd
root:x:0:0::/root:/bin/fish bin:x:1:1::/:/usr/bin/nologin daemon:x:2:2::/:/usr/bin/nologin mail:x:8:12::/var/spool/mail:/usr/bin/nologin ftp:x:14:11::/srv/ftp:/usr/bin/nologin http:x:33:33::/srv/http:/usr/bin/nologin nobody:x:65534:65534:Nobody:/:/usr/bin/nologin dbus:x:81:81:System Message Bus:/:/usr/bin/nologin systemd-journal-remote:x:982:982:systemd Journal Remote:/:/usr/bin/nologin systemd-network:x:981:981:systemd Network Management:/:/usr/bin/nologin systemd-resolve:x:980:980:systemd Resolver:/:/usr/bin/nologin systemd-timesync:x:979:979:systemd Time Synchronization:/:/usr/bin/nologin systemd-coredump:x:978:978:systemd Core Dumper:/:/usr/bin/nologin uuidd:x:68:68::/:/usr/bin/nologin avahi:x:977:977:Avahi mDNS/DNS-SD daemon:/:/usr/bin/nologin colord:x:976:976:Color management daemon:/var/lib/colord:/usr/bin/nologin dhcpcd:x:975:975:dhcpcd privilege separation:/:/usr/bin/nologin git:x:974:974:git daemon user:/:/usr/bin/git-shell lightdm:x:973:973:Light Display Manager:/var/lib/lightdm:/usr/bin/nologin polkitd:x:102:102:PolicyKit daemon:/:/usr/bin/nologin epost:x:1000:1000::/home/epost:/bin/fish brltty:x:970:970:Braille Device Daemon:/var/lib/brltty:/usr/bin/nologin dnsmasq:x:969:969:dnsmasq daemon:/:/usr/bin/nologin nvidia-persistenced:x:143:143:NVIDIA Persistence Daemon:/:/usr/bin/nologin rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/nologin usbmux:x:140:140:usbmux user:/:/usr/bin/nologin systemd-oom:x:966:966:systemd Userspace OOM Killer:/:/usr/bin/nologin
Here we can see the file is clearly split into columns, however this time
columns are separated by :. Lets pass this into awk now:
# -F is used to specify our field separator cat /etc/passwd | awk -F ":" '{print $1}'
root bin daemon mail ftp http nobody dbus systemd-journal-remote systemd-network systemd-resolve systemd-timesync systemd-coredump uuidd avahi colord dhcpcd git lightdm polkitd epost brltty dnsmasq nvidia-persistenced rtkit usbmux systemd-oom
This will give us a list of all the users on our system. Lets say now that we
wanted to not only grab all the users on our system but we wanted to know each
user's home directory and default shell. We can simply grab all of those
columns with awk like we have before but what if we wanted our output to be
formatted pretty? With awk we not only specify which field separator we want
to look for in our input but we can specify an output field separator:
# You will notice this time that we aren't piping the output of cat into awk # but rather just telling awk which file we want to run this on. This is just # another way to use awk and I wanted to show it off. awk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd
root-/root-/bin/fish bin-/-/usr/bin/nologin daemon-/-/usr/bin/nologin mail-/var/spool/mail-/usr/bin/nologin ftp-/srv/ftp-/usr/bin/nologin http-/srv/http-/usr/bin/nologin nobody-/-/usr/bin/nologin dbus-/-/usr/bin/nologin systemd-journal-remote-/-/usr/bin/nologin systemd-network-/-/usr/bin/nologin systemd-resolve-/-/usr/bin/nologin systemd-timesync-/-/usr/bin/nologin systemd-coredump-/-/usr/bin/nologin uuidd-/-/usr/bin/nologin avahi-/-/usr/bin/nologin colord-/var/lib/colord-/usr/bin/nologin dhcpcd-/-/usr/bin/nologin git-/-/usr/bin/git-shell lightdm-/var/lib/lightdm-/usr/bin/nologin polkitd-/-/usr/bin/nologin epost-/home/epost-/bin/fish brltty-/var/lib/brltty-/usr/bin/nologin dnsmasq-/-/usr/bin/nologin nvidia-persistenced-/-/usr/bin/nologin rtkit-/proc-/usr/bin/nologin usbmux-/-/usr/bin/nologin systemd-oom-/-/usr/bin/nologin
You will see that now we have the three fields we wanted and they are being
separated by a - like we specified in our command.
String parsing
For our next challenge lets say we wanted to know each of the shells installed
on our system. We can just view whats in /etc/shells:
cat /etc/shells
# Pathnames of valid login shells. # See shells(5) for details. /bin/sh /bin/bash /usr/bin/git-shell /usr/bin/fish /bin/fish
This will work but what if we wanted just the name of the shells themselves.
With awk we can print just the last column of our supplied text with $NF:
awk -F "/" '{print $NF}' /etc/shells
# Pathnames of valid login shells. # See shells(5) for details. sh bash git-shell fish fish
This is almost what we wanted but you will see that the first few lines of the
shells file also stuck around since it didn't use our / separator. Lets talk
about how we can tell awk exactly what kind of text we want to look for from
our file. Inside of our single quotes in our awk command we can do more than
just specify what we want to print. In fact anything inside our single quotes
is actually our awk script if you want to think about it that way. Earlier
in the separators section you saw that told awk what we wanted our input and
output field separators to be; this was done inside the single quotes of our
awk command. Now lets tell awk what type of line we want to grab for our
shells file. We can specify any search pattern we want to look for inside of
/ /:
# awk uses regex inside of the '/ /' to define what it is searching for. # For more information on regex see my regex guide. awk -F "/" '/^\// {print $NF}' /etc/shells
sh bash git-shell fish fish
We used regex to define that we only wanted to look for lines that started
with a /. Now lets just pipe the output of our awk command into uniq to
remove the duplicate shells, and lets pipe that into sort so they are sorted
alphabetically:
awk -F "/" '/^\// {print $NF}' /etc/shells | uniq | sort
bash fish git-shell sh
This time lets search our bashrc for any lines starting a b or a c:
awk '$1 ~ /^[b,c]/ {print $0}' ~/.bashrc
Scripting
One of the things that makes awk so powerful is that it in itself is a
scripting language. What do I mean by that? Lets think of an example, we will
be picking on the shells file again. Lets say we only wanted to print lines
that are over 8 characters long:
awk 'length($0) > 8' /etc/shells
# Pathnames of valid login shells. # See shells(5) for details. /bin/bash /usr/bin/git-shell /usr/bin/fish /bin/fish
We also have if statements available to us:
# ps -ef prints all of the resources running on our machine ps -ef | awk '{ if($NF == "/bin/fish") print $0 }'
epost 5675 5674 0 06:11 pts/0 00:00:00 /bin/fish epost 7201 7200 0 06:30 pts/1 00:00:01 /bin/fish
We used a simple if statement to see if the last column ($NF) was equal to
/bin/fish and if so we printed the whole line ($0).
We also have for loops available to us:
awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'
The square of 1 is 1 The square of 2 is 4 The square of 3 is 9 The square of 4 is 16 The square of 5 is 25 The square of 6 is 36 The square of 7 is 49 The square of 8 is 64 The square of 9 is 81 The square of 10 is 100
Our for loop is layed out just like it is in any other language; We specify
our incrementing variable and initialize it, we set our stopping point, and we
set our incrementing amount. You may have also noticed that we can do
arithmetic in our awk script which is another powerful aspect of awk
scripting.
Line numbers
A feature of awk worth noting is the line number specifier. Say we had a big
block of output from a command and we only wanted to see a specific line
number of the output, or even a specific range of line numbers. Lets try this
on the df command:
df | awk 'NR==7, NR==11 {print NR, $0}'
7 /dev/loop0 225280 225280 0 100% /var/lib/snapd/snap/multipass/4458 8 /dev/loop2 33152 33152 0 100% /var/lib/snapd/snap/snapd/12159 9 /dev/loop3 225280 225280 0 100% /var/lib/snapd/snap/multipass/4861 10 /dev/loop5 56832 56832 0 100% /var/lib/snapd/snap/core18/2066 11 /dev/loop4 56832 56832 0 100% /var/lib/snapd/snap/core18/2074
NR is what we use in awk to signify line number. Above you can see we were
able to grab lines 7-11 using NR and print both the line number and the line
itself. Of course if we didn't want to print the line number we could just
drop NR from our print statement:
df | awk 'NR==7, NR==11 {print $0}'
/dev/loop0 225280 225280 0 100% /var/lib/snapd/snap/multipass/4458 /dev/loop2 33152 33152 0 100% /var/lib/snapd/snap/snapd/12159 /dev/loop3 225280 225280 0 100% /var/lib/snapd/snap/multipass/4861 /dev/loop5 56832 56832 0 100% /var/lib/snapd/snap/core18/2066 /dev/loop4 56832 56832 0 100% /var/lib/snapd/snap/core18/2074
We can also use NR to get a line count of a file, lets pick on /etc/shells
again:
awk 'END {print NR}' /etc/shells
8
We can also use the line number feature of awk to replace the linux tool
head:
awk 'NR < 13' /etc/shells
# Pathnames of valid login shells. # See shells(5) for details. /bin/sh /bin/bash /usr/bin/git-shell /usr/bin/fish /bin/fish
We can see that we grabbed the first 13 lines of our shells file.