Unix Shell Programming: Lessons Learned

I have been writing tiny shell scripts for many years, always with the help of DuckDuckGo searches and sites like Stack Overflow. I never grasped the history, features, or syntax of shells. This all changed when I got serious about learning Unix shell programming.

At work we use Jenkins for our continuous integration (CI). Almost all of the jobs use its Execute Shell build step extensively. Each job is a hodgepodge of shell scripting that the creator got working at some point. Some use /bin/sh and some use /bin/bash. Some set the -e option (exit on the first failing command); most don't. Others set the -v or -x options (echo input lines or trace commands as they run). There is no consistency.
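A consistent header for every Execute Shell step is one way to remove that inconsistency. The following is only a minimal sketch, not what our jobs actually contain: TRACE is a hypothetical variable name I made up for the example, and the -u option is my own addition on top of the options mentioned above.

```shell
#!/bin/sh
# -e: stop at the first command that fails
# -u: treat the use of an unset variable as an error (an addition for this sketch)
set -eu

# TRACE is a hypothetical variable; set TRACE=1 in a job to trace each command.
if [ "${TRACE:-0}" = "1" ]; then
    set -x
fi

# Jenkins exports WORKSPACE; fall back to the current directory for local runs.
workdir=${WORKSPACE:-$PWD}
echo "running in $workdir"
```

Keeping tracing behind a switch means the same header can be pasted into every job, and a noisy trace is only turned on when someone needs it.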

I listen to BSD Now regularly. They, and the rest of the BSD community, often stress the need to write portable scripts in POSIX shell. I like to joke that I have been brainwashed after listening to this advice constantly. But it is excellent advice and all of us should follow it. What this means is that you should write scripts for the POSIX shell (/bin/sh, the standardized descendant of the Bourne Shell) rather than the Bourne Again Shell (/bin/bash).

Writing portable scripts is pretty hard. The first step is to use POSIX shell (/bin/sh). The second step is to not use GNU extensions to standard utilities like sed. This is the hardest step because of the pervasiveness of Linux in our lives. But as a daily user of macOS and a FreeBSD enthusiast I've learned the importance of writing scripts that run unmodified on any Unix-like operating system. For this reason I consciously strive to find portable solutions to the problems that GNU extensions solve. You'll find posts on my Code Ghar blog that showcase such solutions.
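In-place editing with sed is a typical case: POSIX sed has no -i option, and the GNU and BSD versions of -i take their argument differently. A rough sketch of a portable alternative is below. The function name replace_in_file is hypothetical, and the sketch assumes mktemp is installed, which is true on Linux, macOS, and FreeBSD even though mktemp is not part of POSIX.

```shell
#!/bin/sh
# replace_in_file is a hypothetical helper name.
# Usage: replace_in_file SED_SCRIPT FILE
replace_in_file() {
    script=$1
    file=$2
    # An explicit template works with both the GNU and BSD mktemp.
    tmp=$(mktemp "${TMPDIR:-/tmp}/replace.XXXXXX") || return 1
    if sed "$script" "$file" > "$tmp"; then
        # Write back with cat so the file keeps its permissions and owner.
        cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

For example, replace_in_file 's/world/there/' greeting.txt rewrites greeting.txt the same way on all three systems.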

With these two lessons learned I've started fixing Jenkins jobs to follow them. It's a slog but as I encounter new (to me) shell programming constructs my understanding grows deeper. I have even rewritten a Jenkins job in shell that provisions a test environment with Chef and runs tests against it using pytest. If I can do it anyone can. All it takes is motivation and persistence.

Another significant lesson I've learned is to not write complex code in shell. Instead, use a better suited language (I prefer Python) to do the job. For example, in the aforementioned rewrite, it would be a mistake to try to write in shell the functionality provided by Chef or pytest. Learn to recognize when it makes more sense to switch to a more powerful and readable language.

Code readability is a major requirement for me. Before I learned the little bit of shell I know now, reading scripts written by someone else was a major chore. I had to constantly look up documentation to figure out what was going on. For this reason I don't recommend writing large chunks of code in shell if others in your team are not well versed in it. I prefer Python because it's far more readable than shell.

Python is also a very powerful language not just because of its clear syntax but because of the ecosystem that surrounds it. It has libraries, documentation, applications, and community. Anything you want to do has very likely already been done by someone else and possibly released under an open source license. Use these benefits to your advantage.

I've settled on a strategy of writing bootstrap steps in shell and the central functionality in Python. For example, I've written a very basic make-like utility at work -- to build some software components -- in Python using Invoke. Our test suite is packaged to be pip installable and it's built using this utility. I have a run.sh that installs the required Python version, installs my utility and its dependencies, and runs it. My utility actually builds the artifact that a user can install. In this way shell and Python work harmoniously to get the job done. This pattern is repeated for other tasks but it's always the same: use shell to bootstrap the environment where my Python code runs.
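To make the pattern concrete, here is a rough sketch of what such a bootstrap script could look like. It is not my actual run.sh: VENV_DIR, DRY_RUN, requirements.txt, and the "build" task name are all hypothetical, and the sketch assumes python3 is already installed instead of installing it.

```shell
#!/bin/sh
# Sketch of a bootstrap script: shell prepares the environment, Python does the work.
set -eu

# VENV_DIR is a hypothetical name for the virtual environment directory.
VENV_DIR=${VENV_DIR:-.venv}

# Print the command when DRY_RUN=1 (hypothetical switch), otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bootstrap() {
    # Fail early with a clear message if no Python 3 interpreter is on PATH.
    command -v python3 >/dev/null 2>&1 || {
        echo "python3 not found" >&2
        return 1
    }
    run python3 -m venv "$VENV_DIR"
    # requirements.txt is assumed to list Invoke and the other dependencies.
    run "$VENV_DIR/bin/pip" install -r requirements.txt
    # "build" is a hypothetical Invoke task name.
    run "$VENV_DIR/bin/invoke" build
}
```

A real script would end by calling bootstrap. Everything after the environment is ready happens in Python, so the shell part stays short enough to read at a glance.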

Bootstrap code is sometimes not trivial. In such cases it can be written in Python as well. I made one such attempt by rewriting one of our Jenkins jobs with Ansible, which is itself written in Python, using it as a higher-level shell script. The exploratory work was successful and Ansible did the job well. However, a few things bothered me enough that I came back to shell.

Ansible requires the -vvv flag to display all the information we needed to troubleshoot issues. Since I wasn't using this flag by default, by the time something failed it was too late and I had to re-run the job with the flag. Its output is JSON friendly but not as human readable as a shell script's output (with set -vex). Ansible captures output and displays it at the end; I like to see real-time output from Chef and pytest to keep tabs on progress. Jenkins and shell work together effectively to give this benefit.

On the other hand, Ansible was far easier to write and maintain as a higher-level shell script. Code (or YAML in this case) reuse is much better in Ansible. Members of a team can reuse each other's roles and other pieces across various Jenkins jobs. If our CI wasn't based on Jenkins -- which relies heavily on shell scripts -- maybe Ansible could have been a better fit.

In the end, output readability and real-time output made me choose shell over Ansible in this case, even though both got the job done.

So there you have it. These are some of the lessons I learned on my way to becoming a better shell programmer. I encourage everyone to make their own journey into shell programming because they'll find it is still an integral part of any workflow.