Awk
Separators
By default, awk splits each line into fields (columns) on whitespace, meaning runs of spaces and tabs, but you can specify any other separator you like. Let's play around with the contents of our passwd file:
cat /etc/passwd
root:x:0:0::/root:/bin/fish
bin:x:1:1::/:/usr/bin/nologin
daemon:x:2:2::/:/usr/bin/nologin
mail:x:8:12::/var/spool/mail:/usr/bin/nologin
ftp:x:14:11::/srv/ftp:/usr/bin/nologin
http:x:33:33::/srv/http:/usr/bin/nologin
nobody:x:65534:65534:Nobody:/:/usr/bin/nologin
dbus:x:81:81:System Message Bus:/:/usr/bin/nologin
systemd-journal-remote:x:982:982:systemd Journal Remote:/:/usr/bin/nologin
systemd-network:x:981:981:systemd Network Management:/:/usr/bin/nologin
systemd-resolve:x:980:980:systemd Resolver:/:/usr/bin/nologin
systemd-timesync:x:979:979:systemd Time Synchronization:/:/usr/bin/nologin
systemd-coredump:x:978:978:systemd Core Dumper:/:/usr/bin/nologin
uuidd:x:68:68::/:/usr/bin/nologin
avahi:x:977:977:Avahi mDNS/DNS-SD daemon:/:/usr/bin/nologin
colord:x:976:976:Color management daemon:/var/lib/colord:/usr/bin/nologin
dhcpcd:x:975:975:dhcpcd privilege separation:/:/usr/bin/nologin
git:x:974:974:git daemon user:/:/usr/bin/git-shell
lightdm:x:973:973:Light Display Manager:/var/lib/lightdm:/usr/bin/nologin
polkitd:x:102:102:PolicyKit daemon:/:/usr/bin/nologin
epost:x:1000:1000::/home/epost:/bin/fish
brltty:x:970:970:Braille Device Daemon:/var/lib/brltty:/usr/bin/nologin
dnsmasq:x:969:969:dnsmasq daemon:/:/usr/bin/nologin
nvidia-persistenced:x:143:143:NVIDIA Persistence Daemon:/:/usr/bin/nologin
rtkit:x:133:133:RealtimeKit:/proc:/usr/bin/nologin
usbmux:x:140:140:usbmux user:/:/usr/bin/nologin
systemd-oom:x:966:966:systemd Userspace OOM Killer:/:/usr/bin/nologin
Here we can see the file is clearly split into columns, but this time the columns are separated by a colon (:). Let's pass this into awk now:
# -F is used to specify our field separator
cat /etc/passwd | awk -F ":" '{print $1}'
root
bin
daemon
mail
ftp
http
nobody
dbus
systemd-journal-remote
systemd-network
systemd-resolve
systemd-timesync
systemd-coredump
uuidd
avahi
colord
dhcpcd
git
lightdm
polkitd
epost
brltty
dnsmasq
nvidia-persistenced
rtkit
usbmux
systemd-oom
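To see the difference between the default whitespace splitting and -F in isolation, here is a small sketch that feeds made-up lines to awk with printf instead of reading a real file. The user alice and the numbers are hypothetical:

```shell
# Default splitting: runs of spaces and tabs separate fields,
# and leading whitespace is ignored, so $2 is "1000".
printf '  alice   1000\t/home/alice\n' | awk '{print $2}'
# prints: 1000

# Colon-separated data has no whitespace, so by default the whole line is $1.
printf 'alice:x:1000:1000::/home/alice:/bin/sh\n' | awk '{print $1}'
# prints: alice:x:1000:1000::/home/alice:/bin/sh

# With -F ":" each colon-separated column becomes its own field.
printf 'alice:x:1000:1000::/home/alice:/bin/sh\n' | awk -F ":" '{print $1}'
# prints: alice
```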
This gives us a list of all the users on our system. Now let's say that we wanted not only every user on the system but also each user's home directory (field 6) and default shell (field 7). We could grab those columns with awk just like before, but what if we wanted the output formatted nicely? With awk we can specify not only the field separator for our input (FS) but also an output field separator (OFS), which awk places between the comma-separated items of a print statement:
# You will notice this time that we aren't piping the output of cat into awk
# but rather just telling awk which file we want to run this on. This is just
# another way to use awk and I wanted to show it off.
awk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}' /etc/passwd
root-/root-/bin/fish
bin-/-/usr/bin/nologin
daemon-/-/usr/bin/nologin
mail-/var/spool/mail-/usr/bin/nologin
ftp-/srv/ftp-/usr/bin/nologin
http-/srv/http-/usr/bin/nologin
nobody-/-/usr/bin/nologin
dbus-/-/usr/bin/nologin
systemd-journal-remote-/-/usr/bin/nologin
systemd-network-/-/usr/bin/nologin
systemd-resolve-/-/usr/bin/nologin
systemd-timesync-/-/usr/bin/nologin
systemd-coredump-/-/usr/bin/nologin
uuidd-/-/usr/bin/nologin
avahi-/-/usr/bin/nologin
colord-/var/lib/colord-/usr/bin/nologin
dhcpcd-/-/usr/bin/nologin
git-/-/usr/bin/git-shell
lightdm-/var/lib/lightdm-/usr/bin/nologin
polkitd-/-/usr/bin/nologin
epost-/home/epost-/bin/fish
brltty-/var/lib/brltty-/usr/bin/nologin
dnsmasq-/-/usr/bin/nologin
nvidia-persistenced-/-/usr/bin/nologin
rtkit-/proc-/usr/bin/nologin
usbmux-/-/usr/bin/nologin
systemd-oom-/-/usr/bin/nologin
You can see that we now have the three fields we wanted, separated by a - just as we specified with OFS in our command.
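Here is a self-contained sketch of the same idea using two made-up passwd-style lines (the users alice and bob are hypothetical). The second command sets the same separators from the command line with -F and -v instead of a BEGIN block, and should produce identical output:

```shell
# Two hypothetical passwd-style lines (not from a real system).
users='alice:x:1000:1000::/home/alice:/bin/bash
bob:x:1001:1001::/home/bob:/bin/sh'

# FS splits the input on ":"; OFS is placed between the comma-separated print items.
printf '%s\n' "$users" | awk 'BEGIN{FS=":"; OFS="-"} {print $1, $6, $7}'
# prints:
# alice-/home/alice-/bin/bash
# bob-/home/bob-/bin/sh

# Same separators, set with command-line options instead of a BEGIN block.
printf '%s\n' "$users" | awk -F ":" -v OFS="-" '{print $1, $6, $7}'
```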
String parsing
For our next challenge, let's say we wanted to know which shells are available on our system. We can just view what's in /etc/shells:
cat /etc/shells
# Pathnames of valid login shells.
# See shells(5) for details.
/bin/sh
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish
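This works, but what if we wanted just the names of the shells themselves? If we split each line on /, the shell's name is the last field, and awk lets us print the last field with $NF. NF holds the number of fields on the current line, so $NF is the last one: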
awk -F "/" '{print $NF}' /etc/shells
# Pathnames of valid login shells.
# See shells(5) for details.
sh
bash
git-shell
fish
fish
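To make NF and $NF concrete, here is a small sketch on a single hypothetical path. Because the path starts with the separator, the first field is empty, which is why NF comes out one larger than you might expect. Arithmetic inside $( ) also lets us count fields from the end:

```shell
# The leading "/" creates an empty first field: "", "usr", "bin", "git-shell".
printf '%s\n' '/usr/bin/git-shell' | awk -F "/" '{print NF, $NF}'
# prints: 4 git-shell

# $(NF-1) is the second-to-last field.
printf '%s\n' '/usr/bin/git-shell' | awk -F "/" '{print $(NF-1)}'
# prints: bin
```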
This is almost what we wanted, but the two comment lines at the top of the shells file also stuck around. They don't contain a /, so each one is read as a single field and $NF prints the whole line. Let's talk about how we can tell awk exactly which lines we want from our file. Inside the single quotes of our awk command we can do more than just specify what to print; in fact, everything inside the single quotes is our awk script, if you want to think about it that way. Earlier, in the Separators section, you saw that we told awk what our input and output field separators should be, and that was also done inside the single quotes of the awk command. Now let's tell awk which lines of the shells file we want to grab. We can put any search pattern we like between a pair of slashes, / /:
# awk uses regex inside of the '/ /' to define what it is searching for.
# For more information on regex see my regex guide.
awk -F "/" '/^\// {print $NF}' /etc/shells
sh
bash
git-shell
fish
fish
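We used a regex to say that we only want lines that start with a / (the slash is escaped as \/ so that it doesn't end the pattern early). Now let's pipe the output of our awk command into sort so the shells are sorted alphabetically, and then into uniq to remove the duplicate shells. The order matters here: uniq only removes duplicate lines that are next to each other, so its input needs to be sorted first:
awk -F "/" '/^\// {print $NF}' /etc/shells | sort | uniq
bash
fish
git-shell
sh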
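Here is a self-contained sketch of the same pipeline, using a few made-up lines shaped like /etc/shells rather than the real file, that also shows why sort has to come before uniq. The last command is an awk-only alternative that keeps the first occurrence of each shell name; the array name seen is just a name chosen for this example:

```shell
# Hypothetical input shaped like /etc/shells: a comment plus three paths.
shells='# comment line
/bin/fish
/bin/bash
/usr/bin/fish'

# Keep lines starting with "/", print the last "/"-separated field, then dedupe.
printf '%s\n' "$shells" | awk -F "/" '/^\// {print $NF}' | sort | uniq
# prints: bash, fish (one per line)

# Without sorting, the two "fish" lines are not adjacent, so uniq keeps both.
printf '%s\n' "$shells" | awk -F "/" '/^\// {print $NF}' | uniq
# prints: fish, bash, fish

# awk-only dedupe: seen[$NF]++ evaluates to 0 the first time a name appears,
# so !seen[$NF]++ is true once per distinct name (input order is kept).
printf '%s\n' "$shells" | awk -F "/" '/^\// && !seen[$NF]++ {print $NF}'
# prints: fish, bash
```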
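This time let's search our bashrc for any lines whose first field starts with a b or a c. The ~ operator checks whether the value on its left matches the regex on its right, and because we test $1 (the first field) rather than the whole line, leading whitespace is ignored. Note that the bracket expression is [bc]; writing [b,c] would also match a first field that starts with a comma:
awk '$1 ~ /^[bc]/ {print $0}' ~/.bashrc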
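Since everyone's bashrc is different, here is a sketch on a few made-up lines (hypothetical input, not a real bashrc) showing what $1 ~ /^[bc]/ selects, and what the comma in [b,c] changes:

```shell
# Hypothetical bashrc-style lines; the second one is indented.
lines='bind -x
  cd /tmp
alias ll=ls
,comma'

# The indented "cd" line matches because $1 skips leading whitespace.
printf '%s\n' "$lines" | awk '$1 ~ /^[bc]/ {print $0}'
# prints: "bind -x" and "  cd /tmp"

# [b,c] is a set of three characters: b, comma, and c.
printf '%s\n' "$lines" | awk '$1 ~ /^[b,c]/ {print $0}'
# prints: "bind -x", "  cd /tmp" and ",comma"
```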
Scripting
One of the things that makes awk so powerful is that it is a scripting language in its own right. What do I mean by that? Let's think of an example; we'll pick on the shells file again. Say we only wanted to print lines that are more than 8 characters long:
awk 'length($0) > 8' /etc/shells
# Pathnames of valid login shells.
# See shells(5) for details.
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish
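To see why /bin/sh was dropped, here is a sketch that prints each line's length next to it, followed by the filter itself, on two hypothetical input lines:

```shell
# length($0) is the number of characters in the current line.
printf '%s\n' '/bin/sh' '/bin/bash' | awk '{print length($0), $0}'
# prints:
# 7 /bin/sh
# 9 /bin/bash

# A pattern with no action prints every line that matches it.
printf '%s\n' '/bin/sh' '/bin/bash' | awk 'length($0) > 8'
# prints: /bin/bash
```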
When a pattern has no action, as in the command above, awk prints every line that matches it. We also have if statements available to us:
# ps -ef lists every process running on our machine
ps -ef | awk '{ if($NF == "/bin/fish") print $0 }'
epost 5675 5674 0 06:11 pts/0 00:00:00 /bin/fish
epost 7201 7200 0 06:30 pts/1 00:00:01 /bin/fish
We used a simple if statement to check whether the last column ($NF) was equal to /bin/fish, and if so we printed the whole line ($0).
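Because ps output differs on every machine, here is a sketch on two made-up "user shell" lines (hypothetical input). It shows that a bare pattern does the same job as the if statement above, and how an else branch lets each line take a different path:

```shell
# Hypothetical input standing in for real ps output.
input='epost /bin/fish
root /bin/bash'

# Pattern form: equivalent to { if($NF == "/bin/fish") print $0 }.
printf '%s\n' "$input" | awk '$NF == "/bin/fish"'
# prints: epost /bin/fish

# if/else form: every line prints something, depending on its last field.
printf '%s\n' "$input" | awk '{ if ($NF == "/bin/fish") print $1, "uses fish"; else print $1, "uses another shell" }'
# prints:
# epost uses fish
# root uses another shell
```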
We also have for loops available to us:
awk 'BEGIN{for(i=1; i<=10; i++) print "The square of", i, "is", i*i;}'
The square of 1 is 1
The square of 2 is 4
The square of 3 is 9
The square of 4 is 16
The square of 5 is 25
The square of 6 is 36
The square of 7 is 49
The square of 8 is 64
The square of 9 is 81
The square of 10 is 100
Our for loop is laid out just like it is in C and many other languages: we declare and initialize our loop variable, set the condition that stops the loop, and set how the variable is incremented. Because the loop lives in a BEGIN block and there are no other rules, awk runs it without reading any input at all. You may also have noticed that we can do arithmetic (i*i) in our awk script, which is another powerful aspect of awk scripting.
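Arithmetic and loops become more useful once they work on real input. Here is a sketch on hypothetical data: the first command keeps a running total of the first column and reports it in the END block (an unset awk variable starts out as 0), and the second uses a for loop over the fields of each line to print them in reverse:

```shell
# Hypothetical input: a count in the first column of each line.
printf '%s\n' '10 apples' '5 pears' '7 plums' |
  awk '{ total += $1 } END { print "total:", total; printf "average: %.2f\n", total / NR }'
# prints:
# total: 22
# average: 7.33

# Loop from the last field down to the first, separating with spaces.
printf '%s\n' 'a b c' | awk '{ for (i = NF; i >= 1; i--) printf "%s%s", $i, (i > 1 ? " " : "\n") }'
# prints: c b a
```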
Line numbers
A feature of awk worth noting is selecting lines by their line number. Say we had a big block of output from a command and we only wanted to see a specific line of that output, or even a specific range of lines. Let's try this on the df command:
df | awk 'NR==7, NR==11 {print NR, $0}'
7 /dev/loop0 225280 225280 0 100% /var/lib/snapd/snap/multipass/4458
8 /dev/loop2 33152 33152 0 100% /var/lib/snapd/snap/snapd/12159
9 /dev/loop3 225280 225280 0 100% /var/lib/snapd/snap/multipass/4861
10 /dev/loop5 56832 56832 0 100% /var/lib/snapd/snap/core18/2066
11 /dev/loop4 56832 56832 0 100% /var/lib/snapd/snap/core18/2074
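NR is the built-in variable awk uses for the current line number (awk calls each line a record). Writing two patterns separated by a comma, as in NR==7, NR==11, makes a range pattern: it matches every line from the first one where the first condition is true through the next one where the second condition is true. Above, you can see we were able to grab lines 7-11 and print both the line number and the line itself. Of course, if we didn't want to print the line number, we could just drop NR from our print statement: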
df | awk 'NR==7, NR==11 {print $0}'
/dev/loop0 225280 225280 0 100% /var/lib/snapd/snap/multipass/4458
/dev/loop2 33152 33152 0 100% /var/lib/snapd/snap/snapd/12159
/dev/loop3 225280 225280 0 100% /var/lib/snapd/snap/multipass/4861
/dev/loop5 56832 56832 0 100% /var/lib/snapd/snap/core18/2066
/dev/loop4 56832 56832 0 100% /var/lib/snapd/snap/core18/2074
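Since df output depends on the machine, here is a sketch on twelve hypothetical lines (a through l) that shows the range pattern, an equivalent boolean condition, and picking out a single line:

```shell
# Range pattern: from the line where NR==3 through the line where NR==5.
printf '%s\n' a b c d e f g h i j k l | awk 'NR==3, NR==5 {print NR, $0}'
# prints:
# 3 c
# 4 d
# 5 e

# The same lines selected with an ordinary boolean condition.
printf '%s\n' a b c d e f g h i j k l | awk 'NR >= 3 && NR <= 5'
# prints: c, d, e (one per line)

# A single line.
printf '%s\n' a b c d e f g h i j k l | awk 'NR == 4'
# prints: d
```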
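We can also use NR to get a line count of a file. The END block runs after the last line has been read, so at that point NR holds the total number of lines. Let's pick on /etc/shells again: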
awk 'END {print NR}' /etc/shells
8
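Here is a sketch of the same line count on hypothetical input, compared with wc -l. One difference worth knowing: awk still counts a final line that has no trailing newline, while wc -l counts newline characters, so the two can disagree by one:

```shell
printf '%s\n' a b c | awk 'END { print NR }'
# prints: 3

# The last line "b" has no trailing newline here.
printf 'a\nb' | awk 'END { print NR }'
# prints: 2
printf 'a\nb' | wc -l
# prints: 1 (some systems pad the number with spaces)
```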
We can also use awk's line numbers to stand in for the Linux tool head:
awk 'NR < 13' /etc/shells
# Pathnames of valid login shells.
# See shells(5) for details.
/bin/sh
/bin/bash
/usr/bin/git-shell
/usr/bin/fish
/bin/fish
NR < 13 is true for lines 1 through 12, so this behaves like head -n 12. Our shells file is shorter than that, so we got the whole file back.
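Here is a sketch on six hypothetical lines comparing awk with head. The second awk command exits as soon as it passes the lines we want, so it stops reading early instead of scanning the rest of the input, which may matter on very large files:

```shell
# Print lines 1 through 3, like head -n 3.
printf '%s\n' a b c d e f | awk 'NR <= 3'
# prints: a, b, c (one per line)

# Same output, but awk quits at line 4 instead of reading to the end.
printf '%s\n' a b c d e f | awk 'NR > 3 { exit } { print }'
# prints: a, b, c

printf '%s\n' a b c d e f | head -n 3
# prints: a, b, c
```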