How to manipulate strings in bash

Last updated on March 7, 2021 by Dan Nanni

Without explicit support for variable types, all bash variables are by default treated as character strings. Therefore more often than not, you need to manipulate string variables in various fashions while working on your bash script. Unless you are well-versed in this department, you may end up constantly coming back to Google and searching for tips and examples to handle your specific use case.

In the spirit of saving your time and thus boosting your productivity in shell scripting, I compile in this tutorial a comprehensive list of useful string manipulation tips for bash scripting. Where possible I will try to use bash's built-in mechanisms (e.g., parameter expansion) to manipulate strings instead of invoking external tools such as awk, sed or grep.

If you find that any tips are missing, feel free to suggest them in the comments. I will be happy to incorporate them into the article.

String Variables in Bash

First of all, when you work with string variables, it is good practice to wrap them in double quotes (e.g., "$var1"). That is because bash applies word splitting (and filename expansion) to the result of an unquoted variable expansion. If the string stored in an unquoted variable contains whitespace, it may be split into multiple words, depending on context (e.g., when the variable is passed as an argument to a function or command).
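To see word splitting in action, here is a minimal sketch. The count_args function and the sample string are made up for illustration; the function simply reports how many arguments it received.

```bash
# count_args is a hypothetical helper that prints the number of arguments it was given.
count_args() {
    echo "$#"
}

var1="hello   big world"

unquoted=$(count_args $var1)      # unquoted: the value is split into three words
quoted=$(count_args "$var1")      # quoted: the value stays a single argument

echo "unquoted: $unquoted"
echo "quoted: $quoted"
```

With the default IFS, the unquoted call should see three arguments and the quoted call one.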

Concatenate Two Strings in Bash

In bash, there is no dedicated operator that concatenates two strings. To combine two string variables, simply put one after the other with nothing in between. When a variable is followed directly by a string literal, enclose the variable name in curly braces {} so that bash can tell where the name ends. The braces are strictly required only when the literal starts with a character that is valid in a variable name (a letter, digit or underscore), but using them consistently does no harm. See the following example for string concatenation in bash.

base=http://www.abc.com
api="/devices/"
deviceid=1024
url="$base$api$deviceid"           # concatenate string variables
url2="$base$api${deviceid}/ports"  # concatenate a string variable with a string literal

echo "URL: $url"
echo "URL2: $url2"

URL: http://www.abc.com/devices/1024
URL2: http://www.abc.com/devices/1024/ports
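The curly braces matter most when the text that follows could be read as part of the variable name. The sketch below uses made-up names (prefix, and an archive file name) to show the difference.

```bash
# Illustrative values only; "prefix" and the file name are assumptions.
prefix="backup"

wrong="$prefix_2021.tar"      # bash looks up a variable named "prefix_2021" (unset, so empty)
right="${prefix}_2021.tar"    # braces mark where the variable name ends

echo "without braces: $wrong"
echo "with braces: $right"
```

Without braces, the result is just ".tar" because the variable prefix_2021 does not exist.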

Append a String to a Variable in Bash

This scenario is similar to string concatenation. Thus you can use the same method described above to add a string to an existing variable. Another (easier) way is to use a built-in operator +=. When used with string operands, the += operator appends a string to a variable, as illustrated below.

var1="Hello"
var2=" !"
var1+=" World"  # append a string literal
var1+="$var2"   # append a string variable

echo "$var1"

Hello World !
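The += operator is handy when you build up a string piece by piece, for example in a loop. The following is a small sketch with made-up names (fruits, csv) that joins array elements with commas.

```bash
# "fruits" and "csv" are illustrative names, not part of the example above.
fruits=(apple banana cherry)
csv=""

for fruit in "${fruits[@]}"; do
    # add a separator before every item except the first
    if [ -n "$csv" ]; then
        csv+=","
    fi
    csv+="$fruit"
done

echo "$csv"
```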

Compare Two Strings in Bash

You can use the '==' or '!=' operators to check equality or inequality of two strings (or string variables) in bash. Inside single brackets [ ], '=' is the POSIX-standard equality operator, and bash accepts '==' as a synonym. Both also work inside double square brackets [[ ]]. Do not use double round brackets (( )) for string comparison: they evaluate arithmetic expressions, so the operands are not compared as strings.

# The following formats are all valid.
if [ "$var1" == "apple" ]; then
    echo "This is good"
fi
if [ "$var1" = "apple" ]; then
    echo "This is good"
fi
if [ "$var1" != "$var2" ]; then
    echo "This is bad"
fi
if [[ "$var1" == "apple" ]]; then
    echo "This is okay"
fi
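One detail worth knowing about double square brackets: an unquoted right-hand side of '==' is treated as a glob pattern, while a quoted one is compared literally. Here is a minimal sketch with hypothetical values.

```bash
# Hypothetical values chosen for illustration.
var1="apple pie"
pattern="apple*"

if [[ $var1 == $pattern ]]; then      # unquoted: "apple*" is used as a glob pattern
    glob_match="yes"
else
    glob_match="no"
fi

if [[ $var1 == "$pattern" ]]; then    # quoted: compared literally, "*" included
    literal_match="yes"
else
    literal_match="no"
fi

echo "glob match: $glob_match"
echo "literal match: $literal_match"
```

The first test succeeds and the second fails, so quote the right-hand side whenever you want an exact comparison.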

Find the Length of a String in Bash

There are several ways to find the length of a string in bash. Of course you can use wc or awk to get string length information, but you don't need an external tool for a simple task like this. The following example shows how to find string length using bash's built-in parameter expansion, with the external expr command shown for comparison.

my_var="This is my example string"
len=${#my_var}                  # built-in parameter expansion
len=$(expr length "$my_var")    # external command; "length" is not specified by POSIX for expr
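A typical use of the string length is input validation. The sketch below is only an illustration; the variable names (password, min_len) and the threshold are made up.

```bash
# "password" and "min_len" are made-up names for this sketch.
password="s3cret"
min_len=8

if [ "${#password}" -lt "$min_len" ]; then
    result="too short (${#password} of $min_len characters)"
else
    result="ok"
fi

echo "$result"
```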

Remove a Trailing Newline Character from a String in Bash

If you want to remove a trailing newline or carriage return character from a string, you can use bash's parameter expansion in the following form.

${string%$var}

If "string" ends with the character stored in "var", this expression evaluates to "string" without that trailing character. Otherwise it evaluates to "string" unchanged. For example:

# input string with a trailing newline character
input_line=$'This is my example line\n'
# define a trailing character.  For carriage return, replace it with $'\r' 
character=$'\n'

echo -e "($input_line)"
# remove a trailing newline character
input_line=${input_line%$character}
echo -e "($input_line)"

(This is my example line
)
(This is my example line)
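The same form works for a trailing carriage return, which you often run into when reading files with Windows (CRLF) line endings. The following is a small sketch; the sample line is made up.

```bash
# A line as it might appear in a file with CRLF line endings; the content is made up.
line=$'value=42\r'
cr=$'\r'

clean=${line%"$cr"}    # drop one trailing carriage return, if present

echo "before: ${#line} characters"
echo "after: ${#clean} characters"
```

The length drops from 9 to 8 characters once the carriage return is gone.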

Trim Leading/Trailing Whitespaces from a String in Bash

If you want to remove whitespace at the beginning or at the end of a string (also known as leading/trailing whitespace), you can use the sed command.

my_str="   This is my example string    "

# original string with leading/trailing whitespaces
echo -e "($my_str)"

# trim leading whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/^[[:space:]]*//")
echo -e "($my_str)"

# trim trailing whitespaces in a string
my_str=$(echo "$my_str" | sed -e "s/[[:space:]]*$//")
echo -e "($my_str)"

(   This is my example string    )
(This is my example string    )      ← leading whitespaces removed
(This is my example string)          ← trailing whitespaces removed

If you want to stick with bash's built-in mechanisms, the following bash function can get the job done.

trim() {
    local var="$*"
    # remove leading whitespace characters
    var="${var#"${var%%[![:space:]]*}"}"
    # remove trailing whitespace characters
    var="${var%"${var##*[![:space:]]}"}"   
    echo "$var"
}

my_str="   This is my example string    "
echo "($my_str)"

my_str=$(trim "$my_str")
echo "($my_str)"

(   This is my example string    )
(This is my example string)

Remove a Fixed Prefix, Suffix or Any Substring from a String in Bash

This is a generalization of the previous whitespace/newline character removal. Again, you can use the sed command to remove any substring from a string. The following example illustrates how you can remove a pre-defined prefix/suffix, or remove all occurrences of a substring from a string variable. One thing to note is that if the substring contains any special character (e.g., '[' and ']' in this example), the character needs to be escaped with '\' in sed.

my_str="[DEBUG] Device0 is not a valid input [EOL]"
prefix="\[DEBUG\]"
suffix="\[EOL\]"
substring="valid"
echo "$my_str"

# remove a prefix from a string
my_str=$(echo "$my_str" | sed -e "s/^$prefix//")
echo "$my_str"

# remove a suffix from a string
my_str=$(echo "$my_str" | sed -e "s/$suffix$//")
echo "$my_str"

# remove all occurrences of a substring from a string
my_str=$(echo "$my_str" | sed -e "s/$substring//g")
echo "$my_str"

[DEBUG] Device0 is not a valid input [EOL]
 Device0 is not a valid input [EOL]
 Device0 is not a valid input 
 Device0 is not a  input

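Another way to remove a prefix or a suffix from a string is to use bash's built-in pattern matching mechanism. In this case, special characters do not need to be escaped, as long as the variable is double-quoted inside the expansion so that its value is matched literally rather than as a pattern.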

my_str="[DEBUG] Device0 is not a valid input [EOL]"
prefix="[DEBUG]"
suffix="[EOL]"

# remove a prefix string
my_str=${my_str#"$prefix"}
echo "$my_str"

# remove a suffix string
my_str=${my_str%"$suffix"}
echo "$my_str"

 Device0 is not a valid input [EOL]
 Device0 is not a valid input
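The double quotes around "$prefix" inside the expansion are what make the match literal. As a quick sketch of what happens without them, using a shortened version of the string above:

```bash
my_str="[DEBUG] Device0"
prefix="[DEBUG]"

# Unquoted: [DEBUG] is read as a bracket expression that matches ONE character
# out of D, E, B, U, G. The string starts with "[", so nothing is removed.
unquoted=${my_str#$prefix}

# Quoted: the value of $prefix is matched literally.
quoted=${my_str#"$prefix"}

echo "unquoted: ($unquoted)"
echo "quoted: ($quoted)"
```

Only the quoted form strips the "[DEBUG]" prefix here.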

Check if a String Starts with a Substring in Bash

If you want to check whether or not a given string variable starts with a prefix, there are multiple ways to do it, as illustrated below.

var1="This is my text"
prefix="This"

# $prefix is quoted so that it is matched literally, not as a pattern
case $var1 in "$prefix"*)
    echo "1. \"$var1\" starts with \"$prefix\""
esac

if [[ $var1 =~ ^"$prefix" ]]; then
    echo "2. \"$var1\" starts with \"$prefix\""
fi

if [[ $var1 == "$prefix"* ]]; then
    echo "3. \"$var1\" starts with \"$prefix\""
fi

if [[ $var1 == This* ]]; then
    echo "4. \"$var1\" starts with \"This\""
fi

1. "This is my text" starts with "This"
2. "This is my text" starts with "This"
3. "This is my text" starts with "This"
4. "This is my text" starts with "This"

Note that the first approach is the most portable, POSIX-compliant one (which works not just for bash, but also for other shells).
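If you need this check in several places, the case-based approach can be wrapped in a small function. The following is a minimal sketch; starts_with is a hypothetical helper name, not a bash built-in.

```bash
# starts_with STRING PREFIX
# Hypothetical helper: returns 0 if STRING begins with PREFIX, 1 otherwise.
starts_with() {
    case $1 in
        "$2"*) return 0 ;;   # the prefix is quoted, so it is matched literally
        *)     return 1 ;;
    esac
}

if starts_with "This is my text" "This"; then
    check1="yes"
else
    check1="no"
fi

# A prefix containing "*" is not treated as a wildcard.
if starts_with "This is my text" "*"; then
    check2="yes"
else
    check2="no"
fi

echo "starts with 'This': $check1"
echo "starts with '*': $check2"
```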

Check if a String Ends with a Substring in Bash

Similarly, if you want to check whether or not a string ends with a specific suffix, you can try one of these methods shown below.

var1="This is my text"
suffix="text"

# $suffix is quoted so that it is matched literally, not as a pattern
case $var1 in *"$suffix")
    echo "1. \"$var1\" ends with \"$suffix\""
esac

if [[ $var1 =~ "$suffix"$ ]]; then
    echo "2. \"$var1\" ends with \"$suffix\""
fi

if [[ $var1 == *"$suffix" ]]; then
    echo "3. \"$var1\" ends with \"$suffix\""
fi

if [[ $var1 == *text ]]; then
    echo "4. \"$var1\" ends with \"text\""
fi

1. "This is my text" ends with "text"
2. "This is my text" ends with "text"
3. "This is my text" ends with "text"
4. "This is my text" ends with "text"

Check if a String Matches a Regular Expression in Bash

In bash, you can use the =~ operator inside [[ ]] to check whether a string contains a substring that is matched by a (POSIX extended) regular expression. Checking whether a string contains a fixed substring is a simpler special case that needs no regular expression at all.

pattern="length[[:space:]]+[0-9]+"   # regular expression for a substring ([[:space:]] is more portable than \s)
var1="This data has length 1000"
var2="This data is not valid"
 
if [[ $var1 =~ $pattern ]]; then
    echo "$var1: length found"
else
    echo "$var1: length not found"
fi

if [[ $var2 =~ $pattern ]]; then
    echo "$var2: length found"
else
    echo "$var2: length not found"
fi

This data has length 1000: length found
This data is not valid: length not found
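Two related techniques are worth a quick sketch: checking for a fixed substring with a glob pattern, and extracting the matched part of a regular expression through the BASH_REMATCH array. The variable names (re, contains, length_value) are made up for this illustration.

```bash
var1="This data has length 1000"

# Fixed substring: a quoted literal surrounded by wildcards.
if [[ $var1 == *"length"* ]]; then
    contains="yes"
else
    contains="no"
fi

# Regular expression with a capture group; storing it in a variable
# avoids quoting problems inside [[ ]].
re='length[[:space:]]+([0-9]+)'
if [[ $var1 =~ $re ]]; then
    length_value=${BASH_REMATCH[1]}   # text matched by the first parenthesized group
fi

echo "contains 'length': $contains"
echo "length value: $length_value"
```

After a successful match, BASH_REMATCH[0] holds the whole match and BASH_REMATCH[1] the first captured group.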

Split a String in Bash

When you need to split a string in bash, you can use bash's built-in read command. This command reads a single line from stdin and splits it on a delimiter. The split elements are then stored in either an array or separate variables supplied with the read command. The default delimiters are the whitespace characters in IFS (space, tab and newline). If you want to split a string on a custom delimiter, you can set the IFS variable for the read command.

# strings to split
var1="Harry Samantha Bart   Amy"
var2="green:orange:black:purple"

# split a string by one or more whitespaces, and store the result in an array
read -r -a my_array <<< "$var1"

# iterate the array to access individual split words
for elem in "${my_array[@]}"; do
    echo "$elem"
done

echo "----------"
# split a string by a custom delimiter
IFS=':' read -r -a my_array2 <<< "$var2"
for elem in "${my_array2[@]}"; do
    echo "$elem"
done

Harry
Samantha
Bart
Amy
----------
green
orange
black
purple
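When the number of fields is known in advance, read can also store them in separate variables instead of an array. Here is a minimal sketch; the record and the variable names (user, pass, uid, home) are made up.

```bash
# A made-up colon-separated record.
record="alice:x:1001:/home/alice"

# IFS is changed only for this read command; -r keeps backslashes intact.
IFS=':' read -r user pass uid home <<< "$record"

echo "user: $user"
echo "uid: $uid"
echo "home: $home"
```

If the line has more fields than variables, the last variable receives the remainder of the line.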

Replace a String with Another String in Bash

If you want to replace a string with another string in bash, you can use bash's parameter expansion feature. The form shown below replaces only the first occurrence of the substring.

var1="This is a very bad guide"
substring="bad"         # string to be replaced
replacement="useful"    # substitute string

var2="${var1/$substring/$replacement}"
echo "$var2"

This is a very useful guide
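Note that a single slash replaces only the first match. Doubling the first slash replaces every occurrence, as in this small sketch with a made-up input string.

```bash
# Made-up input containing the substring twice.
var1="bad input leads to bad output"

first_only=${var1/bad/good}    # replaces the first "bad" only
all=${var1//bad/good}          # replaces every "bad"

echo "$first_only"
echo "$all"
```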

Remove All Text After a Character in Bash

Let's say you want to remove from a string all text that appears after a specific character (e.g., a delimiter character), along with the character itself. In this case you can use bash's parameter expansion in the following format.

${string%word}

The above expression means that if the pattern "word" matches a trailing portion of the "string", the result is the "string" with the shortest such match removed. If nothing matches, the "string" is returned unchanged. For example:

url="http://www.mysite.com:50001"
delimiter=":"

# remove all text starting from the last delimiter
result=${url%"$delimiter"*}
echo "$result"

http://www.mysite.com
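The single % removes the shortest matching suffix, which is why only the port is stripped here. Doubling it to %% removes the longest matching suffix instead. A quick sketch of the difference, using the same URL:

```bash
url="http://www.mysite.com:50001"

shortest=${url%:*}     # removes from the LAST ":" to the end
longest=${url%%:*}     # removes from the FIRST ":" to the end

echo "$shortest"
echo "$longest"
```

The first line printed is "http://www.mysite.com" and the second is just "http".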

Remove All Text Before a Character in Bash

Let's say you want to delete everything preceding and including a specific character (e.g., a delimiter character). The following form of bash's parameter expansion can get it done.

${string##*word}

The above expression means that if a leading portion of the "string" matches the pattern "*word" (any text followed by "word"), the result is the "string" with the longest such match removed. The longest match means that if the "string" contains multiple instances of "word", the removed portion extends through the last of them. For example:

# remove all text preceding and including the last delimiter
url="http://www.mysite.com:50001"
delimiter=":"

result=${url##*"$delimiter"}
echo "$result"

50001

In the above example, the original string contains two instances of the delimiter ':'. Since we use the longest matching pattern, the matched content is "http://www.mysite.com:", not "http:", and hence the result ("50001") is what remains after the matched content is removed.
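For comparison, a single # removes the shortest matching prefix rather than the longest. The following sketch shows both forms side by side on the same URL.

```bash
url="http://www.mysite.com:50001"

shortest=${url#*:}     # removes up to and including the FIRST ":"
longest=${url##*:}     # removes up to and including the LAST ":"

echo "$shortest"
echo "$longest"
```

The first line printed is "//www.mysite.com:50001" and the second is "50001".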

If you find this tutorial helpful, I recommend you check out the series of `bash` shell scripting tutorials provided by Xmodulo.

Please note that this article is published by Xmodulo.com under a Creative Commons Attribution-ShareAlike 3.0 Unported License. If you would like to use the whole or any part of this article, you need to cite this web page at Xmodulo.com as the original source.