How to Automate Common Tasks with Shell Scripts

We’re Earthly.dev. We make building software simpler and therefore faster using containerization. This article is about how to automate common tasks with shell scripts. If you’re interested in a different approach to building and packaging software then check us out.

While performing various software development practices, it is common to repeat specific tasks. This repetition can lead to human error, which introduces the task to bugs and other security vulnerabilities. Also, doing repetitive tasks manually can be time-consuming and inefficient in software development. However, automation of tasks can help solve these problems.

In software development, automation involves using tools and scripts to execute repetitive tasks, including testing, monitoring, and analyzing logs that do not require human intervention. With automation, you don’t have to worry about human error or running multiple commands several times. Instead, a single line of code can execute a script to run the entire task for you.

Shell scripts are an excellent way to automate repetitive tasks. Shell scripts are programs written in a shell language such as bash, csh, or sh that can be executed from the command line. As a result of their flexibility and power, shell scripts allow developers to automate tasks according to their needs. Implementing changes to an existing script is also very easy, making it a fast and more effective tool for software development.

In this article, I will walk you through automating everyday tasks using bash scripts. You will learn various fundamental software development techniques, such as loops, variables, regular expressions, etc., and how they are incorporated into shell scripts.

The repository for the code used in this article can be found on GitHub.

Prerequisites

To fully utilize this article, you should have basic knowledge of software development and the command line interface. In particular, you should understand the following:

Bash commands while using the command line interface, such as navigating directories, creating, and editing files, etc.
Bash scripting basics, such as writing and executing scripts

Task Selection

Task

Developers can automate many typical software development tasks with shell scripts. These tasks range from simple file backups to complicated data processing and system maintenance. This article focuses on five tasks developers often perform and can be automated using shell scripts.

Common software development tasks include:

File backup: File backups preserve data and prevent data loss. In this task, you will learn how to automate backups with shell scripts and understand concepts such as variables, conditional statements, and loops.
Data processing: Software development requires data processing. It can take a lot of time and energy to process data manually. This task will teach you how to automate data processing using shell scripts. In addition, you will be introduced to a core programming concept such as functions.
Log Analysis: Log analysis is a crucial component of software development, and when dealing with large amounts of data, it can be challenging to do manually. This task demonstrates how shell scripts can automate log analysis and introduce concepts like regular expressions and command-line tools.
System maintenance: Performing backups, cleaning up disk space, and monitoring system resources are all time-consuming and tedious system maintenance duties. This task introduces concepts such as user input, error handling, and how to automate system maintenance tasks with shell scripts.
Local Application Deployment with Docker: Deploying software to production environments can be complex and error-prone when done manually. This task introduces concepts such as version control, environment variables, and containerization to automate deployment using shell scripts.

The tasks above were chosen because they are common procedures in software development and can be performed using shell scripts. By automating these tasks, developers can save time, reduce errors, and increase productivity. The following section is a step-by-step guide on automating each task using shell scripts.

Task 1: Automating File Backup

Backup

File backups prevent data loss due to unexpected incidents. Performing a backup manually can expose data to the possibility of human errors, such as the omission of important files, incorrect selection, inconsistent backup schedules, or failure to configure the backup system correctly. This can also be time-consuming, especially when you regularly back up large data volumes. Automating file backup can simplify this task.

Concepts Covered

In this task, you will learn and utilize the following concepts to achieve an automated file backup:

Variables: Variables refer to storage locations where you store data. You can reference and manipulate this data anywhere in the shell script.
Conditional statements: Conditional statements allow you to execute code blocks based on a specific condition.
Loops: With loops, you can execute a block of code several times while the condition remains true.

Step-by-Step Guide

The automated backup script will perform the following tasks:

Define two variables called source and destination. The location of the source and destination directories will be stored in these variables.
Check if a destination directory exists using a conditional statement.
Copy the file from the source to the destination directory.
Additionally, the script includes error handling that actively detects and promptly displays any issues encountered during the backup process. This approach ensures that errors are not overlooked, providing the user with an opportunity to address and resolve them. This error-handling mechanism increases the chances of successful backup execution, as users can take appropriate corrective actions before reattempting the backup.

To automate the backup of a file using a bash script, follow the steps below:

Create a file with the .sh extension in your terminal and open it using any text editor such as Vim or Nano. For example, you can call this file backup.sh:
```
   vim backup.sh
```
Define the source and destination variables:
```
   #!/bin/bash

   SRC_DIR=/path/to/source/directory
   DST_DIR=/path/to/backup/directory
```
The code assigns values to two variables SRC_DIR and DST_DIR. These variables are used to store the source directory and backup directory paths, respectively.

To ensure that the script works correctly, make sure to replace the placeholder paths /path/to/source/directory and /path/to/backup/directory with the appropriate paths to your source and backup directories.

Note: For your script, add either a relative or full path for the variables SRC_DIR and DST_DIR. If the script is in the same directory as the source and destination directory, use a relative path; otherwise, use a full path.
Check if the destination directory exists using a conditional statement. If the directory doesn’t exist, create one using the mkdir command:
```
   if [ ! -d "$DST_DIR" ]; then
      mkdir -p "$DST_DIR"
   fi
```
The code block above is a conditional statement. The explanation of each component is as follows:
- The square brackets [ ] denote the start and end of a conditional expression.
- The exclamation mark ! is the logical NOT operator. It negates the result of the expression that follows it.
- The -d flag is used to check if a directory exists.
- The dollar sign $ before the DST_DIR variable is used to expand the variable and retrieve its value.
- The variable name is enclosed in double quotes "" to handle cases where the variable value contains spaces or special characters. The quotes ensure that the variable is treated as a single entity.
- The semicolon; terminates the conditional statement.
- The fi closes the conditional statement. It is the reverse of if and indicates the end of the conditional block.

Add error handling to the script:

   if [ ! -d "$SRC_DIR" ]; then
      echo "Error: Source directory does not exist"
      exit 1
   fi

Copy the files from the source directory to the destination directory using a loop:
```
   for file in "$SRC_DIR"/*; do # Loops over each file in the SRC_DIR.
      cp "$file" "$DST_DIR" # Copy the files to the DST_DIR.
   done
```
The following is an explanation of the code block above:
- The asterisk * after SRC_DIR is a wildcard character that matches any file within the directory.
- The for file in "$SRC_DIR"/*; do line initiates a loop that iterates over each file in the directory specified by the SRC_DIR variable. The $SRC_DIR"/* is a [glob pattern](https://www.malikbrowne.com/blog/a-beginners-guide-glob-patterns/] that expands to all files in the SRC_DIR directory.
- The do keyword indicates the start of the loop’s body, which contains the code to be executed for each iteration.
- The cp "$file" "$DST_DIR": line inside the loop copies each file to the destination directory specified by the DST_DIR variable. The $file represents the current file in each iteration of the loop, and $DST_DIR specifies the destination directory to which the file will be copied.
- The done keyword marks the end of the loop.
Save and close the file.
Update the script permissions using chmod. The command chmod u+x backup.sh grants the user of the backup.sh file execute permission. It allows the user to execute the script as a program. The u stands for user, and +x adds the execute permission to the file for the user:
```
   chmod u+x backup.sh
```

Run the script to backup files:

   ./backup.sh

When you’re done, your script should look like this:

   #!/bin/bash

   SRC_DIR=/path/to/source/directory
   DST_DIR=/path/to/backup/directory

   if [ ! -d "$DST_DIR" ]; then
      mkdir -p "$DST_DIR"
   fi

   for file in "$SRC_DIR"/*; do
      cp "$file" "$DST_DIR"
   done

   if [ ! -d "$SRC_DIR" ]; then
      echo "Error: Source directory does not exist"
      exit 1
   fi

You can confirm that the backup has occurred by checking the content of the backup destination.

With this procedure, you can back up files faster and more efficiently.

Task 2: Automating Data Processing

Processing

Processing large amounts of data can be time-consuming and error-prone if done manually. However, shell scripts enable automated data processing, reducing mistakes and saving time.

Concepts Covered

For this task, you will learn the following key concepts related to creating an automated data processing script:

Functions: These are blocks of code that perform a particular task. They can accept inputs (arguments) and return values.

Step-by-Step Guide

To create an automated data processing script, follow the steps below:

Write a function to process the data. This function should take the input data as an argument and return the processed data:
```
   #!/bin/bash

   # Define the processing function
   process_data() {
      # Take the input file as an argument
      input_file=$1

      # Process the data using command-line tools
      output_file=$(cut -f 1,3 $input_file | grep 'foo' | sort -n)

      # Return the processed data
      echo $output_file
   }
```
The code sample above demonstrates a simple processing pipeline. The process_data function performs data processing by extracting specific columns, filtering lines based on a pattern, sorting the data, and returning the processed output.

Below is an explanation of the processing steps.
- cut -f 1,3 $input_file: This uses the cut command to extract specific fields from each line of the input file. In this case, -f 1,3 specifies that we want to extract the first and third fields from each line. The result of this operation is a subset of the original data.
- grep 'foo': This uses the grep command to search for the foo is the pattern. Only the lines that contain the word foo will be retained in the output. In this code block, foo is a placeholder which represents a specific word or string that you want to search for in each line of the input file.
- sort -n: This uses the sort command to sort the data in ascending order. The -n option specifies that the sorting should be done numerically, ensuring that numeric values are sorted correctly.
To define a function in shell scripting, you use the following syntax: function_name() { ... }. In the provided code, the process_data function is defined using this syntax.

The function expects an input data file as an argument. Within the function, the input file is accessed through the $1 variable. In shell scripting, arguments passed to a function are accessed using positional parameters, where $1 represents the first argument, $2 represents the second argument, and so on.

Here’s an example of how you would call the process_data function and pass an input file as an argument:
```
input_file="path/to/input/file.txt"
process_data "$input_file"
```
In this example, the variable input_file is set to the path of the input file. Then, the process_data function is invoked with $input_file as the argument. Inside the function, the value of $input_file can be accessed as $1.

By using this approach, you can pass different input files to the process_data function, allowing you to process various data sets within your script.

Write a loop to process multiple files. This loop should call the processing function on each file:


   # Loop through each file and call the processing function
   for file in /path/to/files/*.txt; do
      # Call the processing function on the current file
      processed_data=$(process_data $file)

      # Write the processed data to a new file
      echo $processed_data > "${file}_processed.txt"
   done

Here’s an explanation of the code block above:

for file in /path/to/files/*.txt; do: A loop that iterates over each file in the /path/to/files/ directory with a .txt extension. The loop assigns each file’s path to the variable file.
processed_data=$(process_data $file): This line calls the process_data function and passes the current file as an argument. The output of the function is stored in the processed_data variable. It uses the $() syntax for command substitution, which captures the output of a command.
echo $processed_data > "${file}_processed.txt": This line writes the value of processed_data to a new file with a name derived from the original file name. The ${file}_processed.txt notation appends _processed.txt to the original file name.

When you’re done, your script should look like this:


    #!/bin/bash

    # Define the processing function
    process_data() {
        # Take the input file as an argument
        input_file=$1

        # Process the data using command-line tools
        output_file=$(cut -f 1,3 $input_file | grep 'foo' | sort -n)

        # Return the processed data
        echo $output_file
    }

    # Loop through each file and call the processing function
    for file in /path/to/files/*.txt; do
        # Call the processing function on the current file
        processed_data=$(process_data $file)

        # Write the processed data to a new file
        echo $processed_data > "${file}_processed.txt"
    done

This task demonstrates how shell scripts can efficiently process data. It introduces core bash scripting concepts, such as functions, and the use of command-line tools such as grep, cut, and sort.

Check out this article to learn more about text-processing commands in Linux.

Task 3: Automating Log Analysis

Analysis

Any software system needs log files to keep track of significant events and diagnose problems. In this task, you will automate log file analysis using shell scripts.

Concepts Covered

To accomplish this task, you will need to understand the following:

Regular expressions: These are characters you can use to define a search pattern. You can use regular expressions for data validation, searching and manipulating text, and extracting specific information from files.
Command line tools: These are programs or libraries that accomplish a specific task. You can use command line tools to perform text processing and file manipulation.

Step-by-Step Guide

The script to perform log analysis automatically will perform the following tasks:

Search for errors in the syslog file.
Remove irrelevant information.
Count the number of errors found.
Saves errors to a newly created file.

The following steps will guide you in creating an automated log analysis script:

Define the log file to be analyzed using a variable:


   #!/bin/bash

   # Define the log file
   LOG_FILE="/var/log/syslog"

Use the [grep](https://man7.org/linux/man-pages/man1/grep.1.html) command to search for specific keywords or patterns in the log file:
```
   # Search for errors in the log file
   ERRORS=$(grep "error" "$LOG_FILE")
```
The code above searches for the word error in the log file specified by $LOG_FILE and assigns the output to the ERRORS variable. Below is a detailed explanation:
- ERRORS=: This assigns the result of the command to the variable ERRORS. It prepares the variable to store the output of the grep command.
- grep "error" "$LOG_FILE": This command searches for the word error within the contents of the file stored in the variable $LOG_FILE.
- grep: This is a command-line tool used for searching patterns within files.
- "error": This is the pattern or keyword we are searching for. In this case, it is the word error.
- "$LOG_FILE": This is the variable that contains the path to the log file we want to search in. The double quotes ensure that the variable is expanded to its value.
Use the [sed](https://earthly.dev/blog/sed-find-replace/) command to modify or filter log file data. The code below aims to remove any leading text before the string error and any trailing text after the string at:
```
   # Filter out irrelevant data
   ERRORS=$(echo "$ERRORS" | sed -e "s/.*error: //" -e "s/ at .*$//")
```
Note: From the code above, the irrelevant data refers to the leading and trailing text that is not part of the actual error message.
Use the wc command to count the number of errors:
```
   # Count the number of errors
   ERROR_COUNT=$(echo "$ERRORS" | wc -l)
```
The wc command is used in conjunction with the -l option to count the number of lines in the input.

Here’s a breakdown of the syntax:
- echo "$ERRORS": The echo command is used to print the content of the $ERRORS variable.
- |: The pipe symbol (|) is used to redirect the output of the previous command (echo “$ERRORS”) as input to the next command (wc -l).
- wc -l: The wc command is used to count lines in the input. The -l option tells wc to count only the lines and not other elements like words or characters.

Write the log analysis output to the file or display it in the terminal:


   # Display the results
   echo "Found $ERROR_COUNT errors:"
   echo "$ERRORS" > errors.txt

In the end, your script should look like this:


   #!/bin/bash

   # Define the log file
   LOG_FILE="/var/log/syslog"

   # Search for errors in the log file
   ERRORS=$(grep "error" "$LOG_FILE")

   # Filter out irrelevant data
   ERRORS=$(echo "$ERRORS" | sed -e "s/.*error: //" -e "s/ at .*$//")

   # Count the number of errors
   ERROR_COUNT=$(echo "$ERRORS" | wc -l)

   # Display the results
   echo "Found $ERROR_COUNT errors:"
   echo "$ERRORS" > errors.txt

This task introduces regular expressions and command-line tools like grep, sed, and awk, commonly used in shell scripting.

Using sed for find and replace, is a great article to properly understand text processing.

Task 4: Automating System Maintenance

Maintenance

Among the most critical aspects of software development is system maintenance. System maintenance involves a set of tasks and processes that are performed regularly to keep the system running smoothly and efficiently.

Automating system maintenance processes involves using tools, scripts, and technologies to streamline and simplify the execution of routine maintenance tasks. Instead of manually performing each task, automation helps to get rid of repetitive and time-consuming tasks, reducing human error and increasing efficiency. In this task, you’ll learn how to automate system maintenance processes.

Concepts Covered

This task covers the following concepts:

User input: The user provides information to a program while it runs.
Error handling: This is detecting and resolving errors when running a program.

Step-by-Step Guide

The script to perform an automatic system maintenance task will include the following:

Prompt the user for confirmation before proceeding. If the user fails to respond, an error code is displayed. -Create a log file if it doesn’t exist -Check for system errors and act -Send the log file via email to the system admin
Restart the Apache web server: After performing the necessary system maintenance tasks, the script includes the command service apache2 restart to restart the Apache web server. This step ensures that any changes or updates made during the maintenance process take effect, ensuring the proper functioning of the web server.

You can use the following steps to create an automated system maintenance script:

Get user confirmation from the user:
```
   #!/bin/bash

   # Get user confirmation before proceeding
   read -p "This script will perform system maintenance tasks. 
   Are you sure you want to proceed? (y/n) " confirm

   # Check if user confirmation is not "y" or "Y"  
   if [[ $confirm != [yY] ]]; then
      echo "Aborting script." 
      # Print a message indicating script abortion
      exit 1 # Exit the script
   fi
```
Here’s an explanation for the code block above:
- read -p: Prompts the user with the message in quotes and stores their input in the confirm variable.
- if [[ $confirm != [yY] ]]; then: Starts an if statement and checks if the value of the confirm variable is not equal to either y or Y.
- echo "Aborting script.": Prints the message Aborting script. to the console.
- exit 1: Exits the script with an exit code of 1, indicating an error or abnormal termination.

Define the log_file location in a variable:


   # Define variables
   log_file="/var/log/system_maintenance.log"

Create a log file if it doesn’t exist:


   # Create a log file if it doesn't exist
   touch $log_file

Check the system disk usage and append the result to the log file:
```
   df -h >> $log_file
```
Here’s a breakdown of the command:
- df: The df command stands for “disk free” and is used to display information about file system disk space usage.
- -h: The -h option is used to display the disk space in a human-readable format, making it easier to understand the sizes in a more familiar unit (e.g., GB, MB).
- >> $log_file: The >> operator is used for output redirection and appends the output of the df -h command to the specified log file $log_file variable. This allows you to capture the disk usage information in the log file for later reference or analysis.
Remove the old files:
```
   # Remove old log files
   find /var/log/ -type f -name "*.log" -mtime +30 -delete >> $log_file
```
The code block above searches for log files *.log in the /var/log/ directory that are older than 30 days and deletes them. Here is an explanation for the commands above:
- find /var/log/: This command searches for files under the /var/log/ directory.
- -type f: This option specifies that only regular files should be considered, excluding directories or other types of files.
- -name "*.log": This specifies that the search should only include files with names ending in .log.
- -mtime +30: This condition specifies that the files should have a modification timestamp older than 30 days.
- -delete: This action deletes the matching files.
Restart the Apache webserver service:
```
   service apache2 restart >> $log_file
```
This code block above restarts the Apache webserver service and logs the output of the command to the log file.
- service apache2 restart: This command restarts the Apache webserver service.
- >> $log_file: This appends the output of the command to the log file specified by the $log_file variable.

Send an email to the system admin:


   # Send email to the sysadmin with the log file attached
   mailx -a $log_file -s "System maintenance report" \
   sysadmin@example.com

mailx: This command is used to send emails from the command line.
-a $log_file: This option specifies the attachment (log file) to include in the email.
-s "System maintenance report": This option sets the subject of the email to “System maintenance report”.
sysadmin@example.com: This is a placeholder email address.

The complete script looks like the following:

   #!/bin/bash

   # Get user confirmation before proceeding
   read -p "This script will perform system maintenance tasks. 
   Are you sure you want to proceed? (y/n) " confirm

   if [[ $confirm != [yY] ]]; then
      echo "Aborting script."
      exit 1
   fi

   # Define variables
   log_file="/var/log/system_maintenance.log"

   # Create a log file if it doesn't exist
   touch $log_file

   # Check system disk space usage and append the result to the log file
   df -h >> $log_file

   # Remove old log files
   find /var/log/ -type f -name "*.log" -mtime +30 -delete >> $log_file

   service apache2 restart >> $log_file

   # Send an email to the sysadmin with the log file attached
   mailx -a $log_file -s "System maintenance report" \
   sysadmin@example.com

Automating system maintenance tasks using shell scripts can save time and reduce errors. Furthermore, introducing user input and error handling can create robust and reliable shell scripts that simplify our system’s maintenance.

Task 5: Automating Local Application Deployment with Docker

Deployment

Software deployment involves moving an application from development to production. It can be complex and time-consuming, but it is necessary for software development. Automation can streamline deployment and reduce errors.

Concepts Covered

In this task, you will assume your application is stored in a version control system like Git. You will use a containerization tool like Docker to build and deploy the application.

The following concepts are covered in the task:

Version control: Version control allows developers to keep track of different code versions and collaborate effectively on software development projects.
Environment variables: These are variables defined in the operating system’s environment and can be accessed by programs running on the system.
Containerization: This is creating and managing lightweight, portable software containers that run software applications.

Step-by-Step Guide

The steps below can be used to create an automated deployment script:

Clone the Git repository containing your application code:
```
   #!/bin/bash

   git clone <repository-url>
```
Set environment variables in your shell script to store sensitive information such as API keys and passwords:
```
   export DB_PASSWORD=<password>
   export API_KEY=<api-key>
```
Build a Docker image of your application using a Dockerfile:
```
   docker build -t <image-name> .
```

Run the Docker container with the following command:


   docker run -p 8080:8080 -e DB_PASSWORD=$DB_PASSWORD -e API_KEY=$API_KEY <image-name>

From your browser, you can access your application by navigating to http://localhost:8080.

When you’re done, your script should look like this:


   #!/bin/bash

   git clone <repository-url>

   export DB_PASSWORD=<password>
   export API_KEY=<api-key>

   docker build -t <image-name> .

   docker run -p 8080:8080 -e DB_PASSWORD=$DB_PASSWORD -e API_KEY=$API_KEY <image-name>

This task provides a guide for creating an automated deployment script using containerization, which is a modern and widely-used approach to app deployment. Although the example demonstrates running the container locally, the principles can be applied to a production environment to ensure consistency and reliability.

Conclusion

Shell scripts offer a powerful solution to automate various tasks, such as file backups, data processing, and system maintenance. This article has provided a hands-on understanding of scripting features like variables, loops, system calls, and more. Shell scripting can decrease errors, boost productivity, and enhance system stability, becoming a highly sought skill in the DevOps domain.

While it’s essential to select the right tool, shell scripting isn’t a cure-all. If you’ve enjoyed learning about automation with shell scripts, you might want to explore Earthly for container-based build scripting and automation. It’s super cool stuff and a great next step in your journey towards mastering automation tools.

Earthly makes CI/CD super simple
Fast, repeatable CI/CD with an instantly familiar syntax – like Dockerfile and Makefile had a baby.

Learn More