How to automatically back up arbitrary files from across a Linux system to GitHub

Apr 25, 2024

I was messing around with my Linux config files, and I realized it would be painful to have to do this all over again when I moved to another machine. My friend said I could just push my config to GitHub, which makes sense.

I was planning to do that, but I didn’t like how manual it sounded. I would probably forget to do it regularly, and by the time my laptop broke or I misplaced it, it would be too late.

My precious config files without a backup

True to the spirit of engineering: why spend 10 minutes doing something manually, when I could spend hours automating it?

I ended up creating an automated way to keep all of my config files wherever they want to live on my system, while backing all of them up to my private GitHub repo in a structured way every evening at 7pm, with every change to the files included. If you want the code for this, it’s on my GitHub! The rest of the blog explains the process.

It works for any arbitrary files you might want to back up, and you can back up different files to different GitHub repos, on different schedules. It’s also really quick and easy to set up.

Symlinks and BASH scripts

The basic idea was to create a new local Git repo (in my case, inside a directory called Config) and, inside it, a folder called symlinks. This folder holds structured symlinks to all of the files I wanted to back up. In my case, it looked like this:

`symlinks` folder containing all of the symlinks to my to-be-backed-up files and directories

To add symlinks, one can use the Linux command:

cd symlinks
ln -s [target] ./[name-of-desired-subdirectory-or-file]

For example, to link one’s .bashrc file (under the non-hidden name bashrc, so it is easy to spot in the backup) and one’s tmux configuration directory, one can run:

cd symlinks
ln -s ~/.bashrc ./bashrc
mkdir config
ln -s ~/.config/tmux ./config/tmux
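The two steps above (create the parent folder, then create the link) can be wrapped in a small helper. This is a minimal sketch rather than part of the original repo: the function name add_backup_link is hypothetical, and it assumes it is called from the repository root, where the symlinks folder lives.

```bash
#!/bin/bash
# Hypothetical helper (not from the original repo): link a file or directory
# into ./symlinks under a chosen relative name, creating parent folders.
add_backup_link() {
  local target=$1   # absolute path to back up, e.g. ~/.bashrc
  local name=$2     # path inside symlinks/, e.g. config/tmux
  local link="./symlinks/$name"

  if [ ! -e "$target" ]; then
    echo "add_backup_link: $target does not exist" >&2
    return 1
  fi

  # Create the parent folder inside symlinks/ (like `mkdir config` above)
  mkdir -p "$(dirname "$link")"
  # -s: symbolic link; -f: replace an existing link of the same name;
  # -n: if $link is already a symlink to a directory, replace the link itself
  #     instead of creating a new link inside that directory
  ln -sfn "$target" "$link"
}
```

Usage would look like add_backup_link ~/.bashrc bashrc and add_backup_link ~/.config/tmux config/tmux. Thanks to -n, running the same command twice replaces the link instead of nesting a second one inside the linked directory.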

The problem is that symlinks don’t contain the files themselves. They only point to the original files and directories, and Git stores a committed symlink as nothing more than the path it points to. I then wrote a BASH script, copy-symlinks.sh, that copies everything the symlinks point to into a new directory called files, mirroring the structure of symlinks. This creates up-to-date copies of all of the symlinked files, so I could keep editing the originals in place while having backups in my Git repo. My files directory ended up looking like this after I ran the BASH script:

`files` folder containing copies from all of my symlinked paths
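As a quick check of why the copy step is needed, the following throwaway demonstration (run in a temporary directory; it is an illustration, not part of the original repo) stages a symlink and inspects what Git actually records: an index entry with mode 120000 whose blob is only the link's target path, not the file's contents. A clone on another machine would therefore contain a dangling link.

```bash
#!/bin/bash
# Demonstration in a throwaway repo: what does Git store for a symlink?
demo=$(mktemp -d)
cd "$demo" || exit 1
git init -q .

echo "alias ll='ls -l'" > real_bashrc      # stand-in for ~/.bashrc
ln -s "$demo/real_bashrc" bashrc_link      # a link like those in symlinks/
git add bashrc_link

# Mode 120000 marks a symlink entry in the index
git ls-files -s bashrc_link
# The stored blob is just the target path, not the file's contents
git cat-file -p :bashrc_link
```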

The following is most of the BASH script that “pulls” the files and directories from the symlinks:

#!/bin/bash

# Run from the directory containing this script, so the relative paths below
# also work when cron starts the script (cron's working directory is usually
# your home directory)
cd "$(dirname "$(readlink -f "$0")")" || exit 1

# Set the source directory containing the symlinks
# NOTE: this can be named whatever one wishes, as long as one also renames
# the `symlinks` folder.
source_directory="./symlinks"

# Set the target directory
# NOTE: this can similarly be named whatever one wishes.
target_directory="./files"

# Create the target directory if it doesn't exist
mkdir -p "$target_directory"

# Iterate over the symlinks in the source directory and its subdirectories
# (IFS= and -r keep leading spaces and backslashes in paths intact)
find "$source_directory" -type l | while IFS= read -r link; do
  # Get the relative path of the symlink within the source directory
  relative_path="${link#"$source_directory"/}"

  # Get the absolute path of the original file
  file=$(readlink -f "$link")

  # Create the corresponding parent directories in the target directory
  mkdir -p "$target_directory/$(dirname "$relative_path")"

  # If it's a directory:
  if [ -d "$file" ]; then
    # Create the matching directory first, then copy its contents into it;
    # "/." (unlike "/*") also copies hidden files
    mkdir -p "$target_directory/$relative_path"
    cp -r "$file"/. "$target_directory/$relative_path"/

  # If it's a file:
  else
    cp "$file" "$target_directory/$relative_path"
  fi
done
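One limitation of copying into files is that a file deleted from its original location keeps its old copy in files forever. A minimal sketch of a "mirror" variant follows, assuming you are happy for the target directory to be rebuilt from scratch on every run; mirror_symlinks is a hypothetical name, and because it runs rm -rf on its second argument, it should only ever be pointed at the generated files directory.

```bash
#!/bin/bash
# Hypothetical variant (not from the original repo): rebuild the target
# directory from scratch so deletions in the originals are backed up too.
mirror_symlinks() {
  local src=${1:-./symlinks} dst=${2:-./files}
  local link rel file

  # Start from an empty target so stale copies disappear
  rm -rf -- "$dst"
  mkdir -p -- "$dst"

  # -print0 with read -d '' handles paths containing spaces or newlines
  while IFS= read -r -d '' link; do
    rel=${link#"$src"/}
    file=$(readlink -f -- "$link")
    if [ -d "$file" ]; then
      mkdir -p -- "$dst/$rel"
      # "/." copies the directory's contents, including dotfiles
      cp -R -- "$file/." "$dst/$rel/"
    elif [ -e "$file" ]; then
      mkdir -p -- "$(dirname -- "$dst/$rel")"
      cp -- "$file" "$dst/$rel"
    else
      echo "skipping broken link: $link" >&2
    fi
  done < <(find "$src" -type l -print0)
}
```

The trade-off is that every run rewrites files entirely, but Git only records what actually changed, so the commit history stays the same size as with the incremental copy.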

The script then committed all of the changes from the symlink “pull” to the Git history and pushed them to my GitHub repo:

# Add the copied files to the Git repository
git add .

# Commit the changes
git commit -m "Updated files via copy-symlinks.sh"

# Push
git push
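On days when nothing changed, git commit exits with an error ("nothing to commit"), which clutters the log. A minimal sketch of a refinement that only commits and pushes when something is actually staged; commit_if_changed and the PUSH switch are hypothetical names, not part of the original repo.

```bash
#!/bin/bash
# Hypothetical refinement (not from the original repo): commit and push only
# when the copy step actually changed something.
commit_if_changed() {
  local message=${1:-"Updated files via copy-symlinks.sh"}
  git add -A .
  # `git diff --cached --quiet` exits 0 when nothing is staged
  if git diff --cached --quiet; then
    echo "No changes to back up."
    return 0
  fi
  git commit -q -m "$message" || return 1
  # Hypothetical switch: set PUSH=0 to skip pushing (e.g. when testing locally)
  if [ "${PUSH:-1}" = 1 ]; then
    git push
  fi
}
```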

I then set up a cronjob to run this BASH script automatically every evening at 7pm. However, for the git push to work from cron, we first have to set up an SSH agent.

Setting up the SSH Agent

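For the cronjob, I first had to set up an SSH agent. Cron jobs run outside your interactive session, so a git push started by cron has no way to ask you for your SSH key’s passphrase. An SSH agent solves this: it holds your unlocked private key in memory and performs the authentication on behalf of any process that connects to it through the socket named in the SSH_AUTH_SOCK environment variable, e.g. git when we push to the GitHub repo. That way, neither your key nor its passphrase gets written down in any of the commands or scripts; you only type the passphrase once, when adding the key to the agent.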

To start an ssh-agent, run:

eval "$(ssh-agent -s)"

To add an SSH key to the agent, if you have an RSA key, run:

ssh-add ~/.ssh/id_rsa

For an Ed25519 key (id_ed25519), run:

ssh-add ~/.ssh/id_ed25519

You may have to adjust the path if your SSH keys aren’t stored in the default ~/.ssh directory.
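If you set this up on several machines with different key types, the choice between the two commands above can be scripted. A minimal sketch, assuming the default key file names; find_default_key and add_default_key are hypothetical helpers, and the preference for Ed25519 over RSA is an arbitrary choice for illustration.

```bash
#!/bin/bash
# Hypothetical helpers (not from the original repo): pick the first default
# key file that exists, then add it to the running agent.
find_default_key() {
  local dir=${1:-$HOME/.ssh} key
  for key in "$dir/id_ed25519" "$dir/id_rsa"; do
    if [ -f "$key" ]; then
      printf '%s\n' "$key"
      return 0
    fi
  done
  return 1
}

add_default_key() {
  local key
  # Pass a different directory if your keys are not in ~/.ssh
  key=$(find_default_key "$@") || { echo "no default key found" >&2; return 1; }
  ssh-add "$key"
}
```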

You can check that the SSH agent is running and holds your SSH key with:

ssh-add -l
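The exit status of ssh-add -l distinguishes the failure cases, which is handy for making the backup script stop early instead of failing at git push. A minimal sketch relying on the documented ssh-add exit codes (0 on success, 1 if the command fails, for example when the agent holds no keys, 2 if the agent cannot be contacted); agent_status is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical check (not from the original repo): report whether an agent is
# reachable through SSH_AUTH_SOCK and whether it holds any keys.
agent_status() {
  ssh-add -l >/dev/null 2>&1
  case $? in
    0) echo "ready" ;;         # agent reachable, at least one key loaded
    1) echo "no-keys" ;;       # agent reachable, but no identities
    *) echo "unreachable" ;;   # 2 = no agent, 127 = ssh-add not installed
  esac
}
```

In the backup script, a line such as [ "$(agent_status)" = ready ] || exit 1 before the push would leave a clear reason in the cron log.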

Setting up the Cronjob

Now that we’ve set up the SSH agent, we can set up the cronjob. This will automatically trigger the BASH script on whatever schedule we give it!

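To schedule a cronjob, one can use a crontab entry such as:

0 19 * * * USER=[YOUR-LINUX-USERNAME] SSH_AUTH_SOCK=$(find /tmp/ssh-* -type s -user [YOUR-LINUX-USERNAME] 2>/dev/null | head -n 1) /bin/bash [PATH]/copy-symlinks-auto.sh > [PATH]/log-file.log 2>&1

where [PATH] is the path to this directory, e.g. /home/username/Config, and [YOUR-LINUX-USERNAME] is your Linux username. Cron jobs don’t inherit your shell’s environment, so the SSH_AUTH_SOCK=$(find ...) part looks up the socket of the SSH agent started earlier and hands it to the script. If Git refuses to work on the repository with a “detected dubious ownership” error, mark the directory as safe once from a terminal:

git config --global --add safe.directory [PATH]

Run this once rather than inside the cronjob, since --add appends a new line to ~/.gitconfig every time it runs.

The schedule part of the entry: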

0 19 * * *

means that the script runs every day at 19:00 (7pm). However, one can schedule the backup at whatever frequency one wishes: the five fields are minute, hour, day of the month, month, and day of the week, and * is a wildcard matching every value.
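To make the five-field format concrete, here is a tiny helper that labels each field of a schedule. describe_cron is a hypothetical name used only for illustration, and it does not validate the values.

```bash
#!/bin/bash
# Hypothetical helper (not from the original repo): label the five schedule
# fields of a crontab entry; anything after them lands in $rest.
describe_cron() {
  local minute hour dom month dow rest
  # read splits on whitespace and does not expand the * wildcards
  read -r minute hour dom month dow rest <<< "$1"
  printf 'minute=%s hour=%s day-of-month=%s month=%s day-of-week=%s\n' \
    "$minute" "$hour" "$dom" "$month" "$dow"
}

describe_cron "0 19 * * *"
# prints: minute=0 hour=19 day-of-month=* month=* day-of-week=*
```

For instance, 30 2 * * 1 would run at 02:30 every Monday, and 0 */6 * * * every six hours on the hour (step values like */6 are supported by the common Linux cron implementations).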

To create a cronjob on Linux, run:

crontab -e

This opens your crontab file. Add the crontab entry above as a new line at the end of the file, with whatever schedule you wish. If this is your first time using crontab, you will be asked to select a text editor. Choose your favorite; on Debian and Ubuntu you can change it at any time afterwards by running select-editor and choosing again.
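If you would rather not edit the file interactively (for example, when setting up a new machine from a script), the crontab can be piped through a filter and reinstalled with crontab -l and crontab -. A minimal sketch: append_once is a hypothetical helper that adds the entry only if an identical line isn’t already present, so running the setup twice doesn’t schedule the backup twice.

```bash
#!/bin/bash
# Hypothetical helper (not from the original repo): copy stdin to stdout and
# append $1 as a new line unless an identical line already exists.
append_once() {
  local entry=$1 current
  current=$(cat)
  if [ -n "$current" ]; then
    printf '%s\n' "$current"
  fi
  # -x: whole-line match; -F: fixed string, so * in the schedule is literal
  if ! grep -qxF -- "$entry" <<< "$current"; then
    printf '%s\n' "$entry"
  fi
}

# Example use (commented out here because it changes your real crontab):
#   entry='0 19 * * * ...your full crontab entry...'
#   crontab -l 2>/dev/null | append_once "$entry" | crontab -
```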

This cronjob also logs to a file, log-file.log, which may be useful for debugging issues with the cronjob. This logging can be disabled by removing the redirection from the end of the cronjob command, i.e. removing > [PATH]/log-file.log 2>&1.
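The SSH_AUTH_SOCK=$(find ...) part of the crontab entry is the most fragile piece: if no agent is running, the script silently gets an empty socket and the push fails. A minimal sketch of the same lookup split out into a wrapper that fails loudly; locate_agent_socket and run_backup are hypothetical names, and the sketch assumes the agent was started with ssh-agent -s, which normally creates its socket inside a directory named ssh-XXXXXXXX under /tmp (or under $TMPDIR if that is set).

```bash
#!/bin/bash
# Hypothetical wrapper (not from the original repo): the same socket lookup
# as the crontab entry, with an explicit error when no agent is found.
locate_agent_socket() {
  local user=$1 base=${2:-/tmp}
  # -type s matches sockets only; errors (e.g. no ssh-* dirs) are discarded
  find "$base"/ssh-* -type s -user "$user" 2>/dev/null | head -n 1
}

run_backup() {
  local script=$1 user=${USER:-$(id -un)} sock
  sock=$(locate_agent_socket "$user")
  if [ -z "$sock" ]; then
    echo "run_backup: no ssh-agent socket found for $user" >&2
    return 1
  fi
  # Run the backup script with the agent socket in its environment
  SSH_AUTH_SOCK=$sock /bin/bash "$script"
}
```

With this, a missing agent shows up in log-file.log as a one-line message instead of an SSH authentication error from git push.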

End Result

On your cronjob’s schedule, this will automatically pull all of the symlinked files into a single place and push them to GitHub (or Bitbucket, GitLab, etc.). You can rest easy knowing that your backups are happening under the hood, and you can keep working without changing your original file structure in any way.

My GitHub repo ended up looking like this:

If I made changes to the original files, the changes would get picked up when the cronjob ran and pushed to GitHub, with the commit message “Updated files via copy-symlinks.sh”. It felt very cool.

You can even have multiple different backup systems and cronjobs running at once, pushing to different repos! Again, if you want the code, here is the GitHub repo!

But yeah, that’s it for the blog!

I hope this was useful and/or interesting! If you like the repo, please leave a star, it would mean a lot! Have a pleasant rest of your day/evening!


Written by Oscar Wiljam Savolainen, PhD
