How to read column data from a text file in a bash shell script

Last updated on February 23, 2021 by Dan Nanni

One common task in day-to-day shell scripting is to read data line by line from a file, parse it, and process it. The input file can be either a regular text file (e.g., a log or config file) where each line contains multiple fields separated by whitespace, or a CSV file with delimiter-separated values in each row. In bash, you can easily read columns from a file and store them in separate variables for further processing. In this tutorial, let me demonstrate with examples how to write a shell script that reads columns into variables in bash.

Method One: Read Space-Separated Columns in Bash

Let's consider the following text file (input.txt) as an example input.

1   sophia      22      'love pie'
2   charlotte   25     'plum cake'
3   elizabeth   19      'monkey baby'
4   sophia      30      'sleeping beauty'
5   avery       29      'woofy'
6   wendy       28      'smarty pants'

To read column data from a file in bash, you can use read, a built-in command that reads a single line from the standard input or a file descriptor. When combined with a while loop, the read command can read the content of a file line by line until end-of-file is reached. If each line contains multiple fields, read can split the line into individual fields and store them in the supplied variables. By default, read treats whitespace (spaces and tabs) as the separator between fields.
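As a minimal sketch of the "file descriptor" case mentioned above, the snippet below opens a file on descriptor 3 and reads from it with read -u. The temporary file and its two sample rows are assumptions made up for this illustration; they are not part of the tutorial's input files.

```shell
#!/usr/bin/env bash
# Hypothetical sample data written to a temporary file for this sketch.
tmpfile=$(mktemp)
printf '%s\n' "1 sophia 22" "2 charlotte 25" > "$tmpfile"

# Open the file on descriptor 3, leaving standard input free for other use.
exec 3< "$tmpfile"

names=()
# -u 3 tells read to take its input from descriptor 3.
# read returns non-zero at end-of-file, which ends the loop.
while read -r -u 3 index name age; do
    names+=("$name")
done

exec 3<&-          # close descriptor 3
rm -f "$tmpfile"

echo "read ${#names[@]} names: ${names[*]}"
```

Run as is, this should print "read 2 names: sophia charlotte". Reading from a separate descriptor can be handy when the loop body itself needs standard input, for example to prompt the user.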

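The following while loop reads four columns from each line of the input file and stores them in four separate variables. If a line has fewer columns than the number of supplied variables, the column values are stored in the leftmost variables, and any extra variables are assigned an empty value. If a line has more columns than variables, the last variable receives the rest of the line, which is why nickname can hold a value containing spaces such as 'love pie' (the quote characters are kept as part of the value).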

while read -r index name age nickname; do
    echo "$index : $name, $age, $nickname"
done < "input.txt"

# Alternatively, pipe the file into the loop (the loop then runs in a subshell):
cat input.txt | while read -r index name age nickname; do
    echo "$index : $name, $age, $nickname"
done
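To make the rule about mismatched column counts concrete, here is a small sketch that feeds read a single line through a here-string instead of a file. The sample lines and the variable names index2 and rest are made up for this illustration.

```shell
#!/usr/bin/env bash
# Fewer fields than variables: the extra variable is left empty.
read -r index name age nickname <<< "7 zoe 31"
echo "index=$index name=$name age=$age nickname='$nickname'"

# More fields than variables: the last variable receives the rest of the line.
read -r index2 rest <<< "1 sophia 22 'love pie'"
echo "index=$index2 rest=$rest"
```

The first echo shows an empty nickname, and the second shows rest holding everything after the first field, quotes included.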

Method Two: Read Columns from CSV File in Bash

If you are working with a CSV file which uses a non-whitespace character (e.g., "," or ";" or "|") as a delimiter for columns, you can easily read the columns using the same read command. In this case, though, you need to specify the chosen delimiter in the IFS variable. The IFS variable (short for "Internal Field Separator") is a special shell variable that holds the character or characters used to separate fields.
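The sketch below shows IFS in action on a single pipe-delimited record. The record and the variable names user, dept and id are hypothetical, chosen only for this illustration. It also shows that writing the assignment directly in front of read limits the change to that one command.

```shell
#!/usr/bin/env bash
# Hypothetical pipe-delimited record, used only for illustration.
record="alice|engineering|42"

# Prefixing the assignment to read limits the IFS change to that one command.
IFS='|' read -r user dept id <<< "$record"
echo "user=$user dept=$dept id=$id"

# The shell's own IFS is untouched afterwards (by default: space, tab, newline).
printf 'IFS is now: %q\n' "$IFS"
```

Scoping IFS this way avoids having to save and restore it around the loop.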

Let's consider the following CSV file (employee.csv).

John,Doe,500 Mountain Ave.,Riverside, NJ, 08075
Jack,McGinnis,220 Main St.,Philadelphia, PA,09119
"Anne","Hoffman",120 Jefferson St.,Chatham, NJ,08070
Stephen,King,"7452 Terrace Rd",New York,NY, 91234
Dan,Nann,,San Francisco, CA, 00298

The following while loop reads the CSV file and stores its columns in the specified variables. As you can see, IFS=, placed before the read command instructs read to use , as the field separator. Also note that we can check whether a particular column is empty by using the -z operator.

while IFS=, read -r first last address city state zipcode; do
    if [ -z "$address" ]; then
        echo "$first $last has no address"
    else
        echo "$first $last lives at $address"
    fi
done < "employee.csv"

This while loop will produce the following output.

John Doe lives at 500 Mountain Ave.
Jack McGinnis lives at 220 Main St.
"Anne" "Hoffman" lives at 120 Jefferson St.
Stephen King lives at "7452 Terrace Rd"
Dan Nann has no address
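Note that the output keeps the double quotes around "Anne" and "Hoffman", because read splits on the delimiter only and does not interpret CSV quoting. One possible cleanup, sketched below, trims surrounding whitespace and one pair of surrounding double quotes with parameter expansion. The strip_field helper is a hypothetical name introduced here, not part of the original script, and this approach still does not handle commas inside quoted fields.

```shell
#!/usr/bin/env bash
# strip_field is a hypothetical helper, not part of the original script.
# It trims leading/trailing whitespace and removes one pair of surrounding double quotes.
strip_field() {
    local s=$1
    s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
    s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
    s="${s#\"}"                      # drop a leading double quote, if any
    s="${s%\"}"                      # drop a trailing double quote, if any
    printf '%s' "$s"
}

# One row taken from employee.csv.
line='"Anne","Hoffman",120 Jefferson St.,Chatham, NJ,08070'
IFS=, read -r first last address city state zipcode <<< "$line"

first=$(strip_field "$first")
last=$(strip_field "$last")
state=$(strip_field "$state")
echo "$first $last lives at $address ($state)"
```

For this row the script should print: Anne Hoffman lives at 120 Jefferson St. (NJ). For CSV data with embedded delimiters or escaped quotes, a dedicated CSV parser is likely a safer choice than read.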

Method Three: Read Column Data with Regular Expression

A more general and thus more complicated scenario is when you have to read column data from less-structured files such as logs. A typical log file is not as structured as a CSV file, and may use neither a fixed delimiter character nor a fixed number of columns. Let's consider the following snippet of auth.log as an example.

Feb 21 21:42:51 ubuntu sudo: dan : TTY=pts/1 ; PWD=/home/dan/download/shc ; USER=root ; COMMAND=/usr/bin/apt-get install autotools-dev
Feb 21 21:42:51 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 21:42:52 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 21 22:40:20 ubuntu sudo:    alice : TTY=pts/1 ; PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/apt remove nginx
Feb 21 22:40:20 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 22:40:25 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 21 22:41:57 ubuntu sudo:    alice : TTY=pts/1 ; PWD=/home/alice ; USER=root ; COMMAND=/bin/cp bin/dummy /usr/bin
Feb 21 22:41:57 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 21 22:41:57 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 22 10:50:43 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/hosts
Feb 22 10:50:43 ubuntu sudo: pam_unix(sudo:session): session opened for user root by (uid=0)
Feb 22 10:50:49 ubuntu sudo: pam_unix(sudo:session): session closed for user root
Feb 22 10:51:56 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/resolv.conf

In auth.log, let's say we want to extract two pieces of column data from each sudo entry: the user login (the name that follows "sudo:") and the sudo command run by the user (the text that follows "COMMAND=").

In this case, you can use the read command to read each line in its entirety, and then extract the necessary column data by using bash's built-in regular expression matching. The shell script below gets this job done. The necessary regular expression is stored in pattern, which contains two capture groups: one for the user login, and the other for the command entered. The two captured values can be retrieved from a special bash array variable called BASH_REMATCH (${BASH_REMATCH[1]} for the first group, and ${BASH_REMATCH[2]} for the second).

while read -r line; do
    pattern='ubuntu sudo:[[:space:]]+([^[:space:]]+).*COMMAND=(.*)'
    if [[ $line =~ $pattern ]]; then
        echo "${BASH_REMATCH[1]} : ${BASH_REMATCH[2]}"
    fi
done < "auth.log"

This shell script will produce the following output.

dan : /usr/bin/apt-get install autotools-dev
alice : /usr/bin/apt remove nginx
alice : /bin/cp bin/dummy /usr/bin
dan : /usr/bin/vi /etc/hosts
dan : /usr/bin/vi /etc/resolv.conf
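The same technique extends to more than two columns. As a hedged sketch, the snippet below matches a single line from the auth.log sample and adds a third capture group for the working directory (the PWD= field). The variable names user, dir and cmd, and the extra capture group, are assumptions introduced here for illustration.

```shell
#!/usr/bin/env bash
# One line taken from the auth.log sample.
line='Feb 22 10:50:43 ubuntu sudo: dan : TTY=pts/0 ; PWD=/home/dan/abc ; USER=root ; COMMAND=/usr/bin/vi /etc/hosts'

# Three capture groups: user login, working directory, and command.
# The pattern is kept in a variable and used unquoted so that =~ treats it as a regex.
pattern='sudo:[[:space:]]+([^[:space:]]+).*PWD=([^[:space:]]+).*COMMAND=(.*)'

if [[ $line =~ $pattern ]]; then
    user=${BASH_REMATCH[1]}
    dir=${BASH_REMATCH[2]}
    cmd=${BASH_REMATCH[3]}
    echo "$user ran '$cmd' in $dir"
fi
```

For this line the script should print: dan ran '/usr/bin/vi /etc/hosts' in /home/dan/abc. Element 0 of BASH_REMATCH holds the whole matched text, which can help when debugging a pattern.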

If you find this tutorial helpful, I recommend you check out the series of bash shell scripting tutorials provided by Xmodulo.

Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.
