The awk Command in Linux



awk is a powerful programming language used for text processing and manipulation in Unix/Linux environments. It's particularly well-suited for tasks involving structured text files, especially when those files are data files or CSV files. It gets its name from the initials of its creators: Aho, Weinberger, and Kernighan.


How awk Works

When you run awk, you specify an awk program that tells awk what to do. The program consists of a series of rules. Each rule specifies one pattern to search for and one action to perform upon finding the pattern.

Syntactically, a rule consists of a pattern followed by an action. The action is enclosed in braces to separate it from the pattern. Newlines usually separate rules. Therefore, an awk invocation looks like this:

awk [options] 'program' input-file(s)

OR

awk [options] 'pattern { action }' input-file(s)
  • pattern: Specifies when the action should be performed. If omitted, the action is applied to every line.

  • action: What to do when a line matches the pattern. Actions are enclosed in braces {}.

Single quotes around program prevent the shell from interpreting any awk characters as special shell characters. The quotes also cause the shell to treat all of program as a single argument for awk, and allow program to be more than one line long.
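As a minimal sketch of the forms described above, the commands below run a pattern-only rule, an action-only rule, and a multi-line program with two rules. The three-line input is made up for illustration and is not part of the exercise file.

```shell
# Hypothetical three-line input, generated inline so the example is self-contained.
input='apple 3
banana 12
cherry 7'

# Pattern only: the default action prints every matching line.
printf '%s\n' "$input" | awk '$2 > 5'

# Action only: with no pattern, the action runs for every line.
printf '%s\n' "$input" | awk '{ print $1 }'

# Pattern and action, spread over several lines inside one pair of single quotes.
printf '%s\n' "$input" | awk '
    /an/   { print "matched:", $1 }   # rule 1: lines containing "an"
    $2 < 5 { print "small:", $1 }     # rule 2: second field below 5
'
```

Because the whole program sits inside single quotes, the shell passes $1 and $2 to awk untouched instead of expanding them as shell variables.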

Benefits of AWK

  1. Supports complex pattern-matching and processing.

  2. Designed for efficient text processing on both small and large files.

  3. Easy to write and understand one-liners for basic tasks.

  4. Well suited to transforming data files.

  5. Handy for producing formatted reports.

  6. Available on virtually all Unix-like systems, usually with no installation or setup.

Variables in AWK

Variables in AWK play a crucial role in processing text and data. They are used to store temporary data, manipulate fields, control the flow of the program, and customize output. AWK variables can be user-defined or built-in, with the latter providing access to various useful pieces of information or functionality. Some of the most commonly used built-in variables are:

| Variable | Description |
| --- | --- |
| FS (Field Separator) | Controls how fields in a record (line) are separated. The default is whitespace. You can change it to parse CSV or other formats. |
| OFS (Output Field Separator) | Specifies the separator to use when printing multiple fields with print. |
| RS (Record Separator) | Determines how records are separated in the input data. By default, it's a newline character, so if you do not change it, a record is one line of the input file. |
| ORS (Output Record Separator) | The separator used when printing output records. By default, it's a newline. |
| NF (Number of Fields) | Contains the number of fields in the current record. |
| NR (Number of Records) | The total number of input records processed so far. |
| FILENAME | The name of the current input file. |
| $0 | Represents the entire current record. |
| $1, $2, ..., $n | Represents the first, second, ..., nth field in the current record. |
  • BEGIN and END - these are not variables but special patterns. The BEGIN block is executed before any input is read, and the END block is executed after all input has been processed. These blocks are useful for initialization and summary tasks, respectively.

  • Arrays - AWK supports associative arrays, which can be indexed by string or number. Arrays are useful for collecting and organizing data dynamically during execution.
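The following sketch shows several of these pieces working together: FS and OFS set in a BEGIN block, NR and NF read per record, and an associative array summarized in an END block. The colon-separated input and the names count and data are made up for illustration.

```shell
# Hypothetical colon-separated input; the names and codes are invented.
data='ann:F:34
bob:A:27
cat:F:41'

# BEGIN sets the input and output separators before any record is read.
# awk updates NR and NF for each record.
printf '%s\n' "$data" | awk '
    BEGIN { FS = ":"; OFS = " | " }
    { print NR, NF, $1, $3 }
'

# An associative array indexed by the second field counts records per code.
# The END rule runs once after the last record and prints the totals.
printf '%s\n' "$data" | awk -F: '
    { count[$2]++ }
    END { print "A=" count["A"], "F=" count["F"], "records=" NR }
'
```

The first command prints lines such as "1 | 3 | ann | 34", and the second prints "A=1 F=2 records=3". The -F: option in the second command is a command-line shorthand for setting FS.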

Hands-on Exercise Overview

The main goal of this hands-on exercise is to learn how to use the awk utility to manipulate data.

The input file for the examples below is mail-list.txt, a list of people's names together with their contact details. Each record contains a person's name, phone number, email address, and a code for that person's relationship with the author of the list. The columns are aligned using spaces. An ‘A’ in the last column means the person is an acquaintance, an ‘F’ means the person is a friend, and an ‘R’ means the person is a relative.
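To follow along without the original file, a small stand-in with the same four-column layout can be created as sketched below. Every name, phone number, and address in it is invented for illustration, so the output of the exercises will differ from what the real file would produce.

```shell
# Create a small stand-in for mail-list.txt in the current directory.
# Every name, number, and address below is invented for illustration.
cat > mail-list.txt <<'EOF'
Alice        555-0101     alice.example@example.edu       F
Bolivar      555-0102     bolivar.example@example.com     A
Chen         555-0103     chen.example@example.org        R
Delia        555-0104     delia.example@example.edu       A
EOF

# Sanity check: every record should split into exactly four fields.
awk 'NF != 4 { bad++ } END { print NR " records, " (bad + 0) " malformed" }' mail-list.txt
```

With this stand-in the check prints "4 records, 0 malformed". Adding 0 to bad forces a numeric 0 when no malformed line was seen, since an unset awk variable would otherwise print as an empty string.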

Hands-on Exercise

  1. Print the 1st and 3rd columns:

     awk '{ print $1 "\t" $3}' mail-list.txt
    

  2. Print lines that match a certain pattern. Search the input file mail-list.txt for the character string li:

     awk '/li/ { print $0 }' mail-list.txt
    

    When lines containing ‘li’ are found, they are printed because print $0 means print the current line.

    The slashes indicate that ‘li’ is the pattern to search for. This type of pattern is called a regular expression. The pattern is allowed to match parts of words.

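  3. Print a single field from lines that match a pattern:

     awk '/\.edu/ { print $3 }' mail-list.txt
    

    When awk finds a match and no action is given, it prints the whole record. Here the action overrides that default and prints only the third field (the email address). The dot is escaped because an unescaped . in a regular expression matches any character, so /.edu/ would also match any other character followed by edu.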

  4. Print every line that is longer than 55 characters:

     awk 'length($0) > 55' mail-list.txt
    

    awk has a built-in length function that returns the length of a string. $0 holds the entire current line, and because the rule has no action, awk applies the default action, which is to print the line. Therefore, any line of mail-list.txt that is longer than 55 characters makes the comparison true and is printed.

  5. Print the total number of bytes used by mail-list.txt:

     ls -l mail-list.txt | awk '{ x += $5 }
                            END { print "total bytes: " x }'
    

  6. Count the lines in mail-list.txt:

     awk 'END { print NR }' mail-list.txt
    
    • END specifies that the action should be performed after all lines are processed.

    • NR is a built-in variable that keeps track of the number of records (lines) read.

  7. Print the even-numbered lines in the mail-list.txt file:

     awk 'NR % 2 == 0' mail-list.txt
    

    • If you used the expression ‘NR % 2 == 1’ instead, the program would print the odd-numbered lines.
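The techniques from the exercises can be combined into a small report. The sketch below counts records per relationship code with an associative array, counts addresses ending in .edu with a field-level regular expression match, and formats the result with printf. The sample records and the names sample, report, label, n, and edu are invented for illustration.

```shell
# Hypothetical records in the mail-list.txt layout (name, phone, email, code).
# All values are invented for illustration.
sample='Alice        555-0101     alice.example@example.edu       F
Bolivar      555-0102     bolivar.example@example.com     A
Chen         555-0103     chen.example@example.org        R
Delia        555-0104     delia.example@example.edu       A'

report=$(printf '%s\n' "$sample" | awk '
    BEGIN { label["A"] = "acquaintance"; label["F"] = "friend"; label["R"] = "relative" }
    $3 ~ /\.edu$/ { edu++ }    # count addresses ending in .edu
    { n[$4]++ }                # tally records per relationship code
    END {
        printf "%-14s %d\n", label["A"], n["A"]
        printf "%-14s %d\n", label["F"], n["F"]
        printf "%-14s %d\n", label["R"], n["R"]
        printf "%-14s %d of %d\n", ".edu addresses", edu, NR
    }
')

printf '%s\n' "$report"
```

The codes are printed in a fixed order rather than with a for (key in array) loop, because awk does not guarantee the order in which such a loop visits array indices.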
