Absurd: An Introduction to Linux

20230310

In the 1990s, Linux was available on store shelves in big boxes. These were complete with thick manuals covering a wide variety of the software included in a given Linux distribution. After the dot-com bust, this practice largely stopped. Many Linux distribution vendors went out of business, and even the largest projects changed how they operated. Red Hat, for example, used to sell boxed copies of Red Hat Linux in retail stores, as many other vendors did.

redhat 6.2 boxed

redhat 6.2 manual

While Red Hat was easily the largest of the Linux distribution vendors at the time (certainly the most profitable), in 2003 the distribution took a different turn. Red Hat became far more focused on their enterprise computing clients and released Red Hat Enterprise Linux. The workstation and desktop offering became Fedora Core. The “core” would later be dropped, leaving just Fedora. Since the mid 2000s, Linux development has accelerated. The World Wide Web was firmly established, and Linux with Apache, MySQL, and PHP was the way to serve a website (or with Java and Ant). If you’d like to know the history of Linux, I’ve written about Linux in the past.

Linux distributions have grown in size, and their complexity has grown concomitantly. At this point, any given Linux operating system is many gigabytes in size with thousands of software packages included. This makes it nearly impossible to document the system, and harder still for any newcomer to understand it. Most users of Linux today have little to no understanding of what is involved in the system, and this somewhat dilutes the purpose of free and open source software. If no understanding can be had, no code review or contribution can take place. Even the ability to tinker gets destroyed if the user cannot understand his/her system.

Complaining does nothing for anyone. I’ve been a Linux professional for years, and I suppose that I am as suited as anyone to elucidate the workings of a Linux distribution. Today, in this article, I will build an overly simplified Linux distribution named Plain Old Linux. This is meant (currently anyway) to be an educational distribution. It will not be receiving any updates, security patches, or any support whatever and therefore should not be used in any production environment. However, it could be. If the user takes it upon himself/herself to regularly pull fresh versions of the included software and update the system, there is no technical reason for this system not to serve that selfsame user’s needs adequately. The lack of a package management system is intentional here. Package managers automate away some of the process, and therefore aren’t suited to this level of work.

BACKGROUND

So, what is Linux? Linux is an operating system kernel developed by Linus Torvalds in 1991. It is UNIX-like, and is often combined with the GNU operating system’s UNIX-like userland. The kernel is a piece of software that provides services to other software. This makes it easier to program for the computer as a developer need only target the kernel and not the myriad different pieces of hardware that exist in the world. The userland is what we users see. In Linux this is the command line environment, and in many cases a GUI environment as well (in this article, I am not getting into GUIs).

In a traditional IBM-compatible PC, BIOS will look for the first 512 bytes of the first disk. This is where the MBR is and where the boot loader resides. This boot loader will then load the operating system kernel into memory. Most modern systems no longer use BIOS or an MBR. Instead, they use UEFI. This means that the kernel can be compiled as a UEFI application, and it can then be loaded directly by UEFI. This bypasses the need for a boot loader entirely. When the kernel is done loading, it will call the init system (pid 1). The init system is responsible for starting userland software services (think logging, GUIs, and other stuff that is always running) called daemons. Once init is finished, it will call getty and getty will start ttys. A tty is a pseudo-teletype. It interacts with a pseudo-teletype multiplexor in the Linux kernel of which each tty is a client. The device file for the multiplexor is exposed in the userland at /dev/ptmx. There’s interesting history behind teletypes, but that’s for another day.

These ttys are our interface to the system. First, they call login. After login successfully authenticates a user, the user’s specified shell will be launched. In most Linux systems that shell is bash, zsh, dash, or ash, all of which are POSIX compatible (though they have their own extensions). Many more shells exist such as ksh, tcsh, csh, mksh, nushell, fish, powershell, qshell, es, rc, scsh, pdksh, ion, and that list could go on for a very long time. Every shell has its adherents, and every shell has its own strengths and weaknesses. On nearly all Linux systems, however, the system’s shell located at /bin/sh is POSIX compatible and less feature rich than other shells available to users of that system. Lately, there has been a trend to make /bin/sh a symlink to /bin/bash. This still follows the general rule I stated earlier, because when bash is invoked as sh it runs in a stricter POSIX compatibility mode.
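To see how this plays out on a given build host, a quick check along these lines can help. It is only a sketch: it assumes that readlink supports -f (true of GNU coreutils and BusyBox), and the result will differ from distribution to distribution.

```shell
# Resolve /bin/sh to the binary that actually provides it.
sh_target=$(readlink -f /bin/sh)
echo "/bin/sh is provided by: ${sh_target}"

# Show the login shell of the first few users (last field of /etc/passwd).
if [ -r /etc/passwd ]; then
  cut -d: -f1,7 /etc/passwd | head -n 5
fi
```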

Regardless of which shell is invoked, a logged in user is now presented with a command line, which in Linux-speak is often called a terminal or console. From here, users can interact with UNIX-like utilities such as ls, cp, find, awk, sed, vi, or grep, and they can launch applications like emacs, joe, lynx, or mutt.

BUILDING PLAIN OLD LINUX

In order to build the target Linux system, a Linux system must be in place. Several dependencies may need to be installed, and these will vary from distribution to distribution. I recommend installing the distribution’s basic build packages, and specifically making certain that GNU’s autotools are installed and that git is installed.

There is a decision to make. This system can be built and used directly on disk, or it can be built and used in a file that is then utilized by qemu or another virtualization system. For the on-disk option, a dedicated hard disk partition is needed for the new system. Gparted or the GNOME or the KDE disk partitioning tools can be used to create the new partition. For the file option (which I will assume people are using) the first step is to create a large but empty file with dd:

dd if=/dev/zero of=/mnt/pol_root.img bs=4M count=2048
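The size of the resulting file is simply the block size multiplied by the count. The sketch below works that out and then repeats the idea at a much smaller scale in a scratch directory; the demo_dir and demo.img names are illustrative and not part of the build.

```shell
# bs=4M count=2048 means 2048 blocks of 4 MiB each.
bs_bytes=$((4 * 1024 * 1024))
count=2048
total_bytes=$((bs_bytes * count))
echo "image size: ${total_bytes} bytes ($((total_bytes / 1024 / 1024 / 1024)) GiB)"

# The same technique, scaled down to 4 MiB in a throwaway directory.
demo_dir=$(mktemp -d)
dd if=/dev/zero of="${demo_dir}/demo.img" bs=1M count=4 2>/dev/null
demo_bytes=$(( $(wc -c < "${demo_dir}/demo.img") ))
echo "demo image: ${demo_bytes} bytes"
rm -rf "${demo_dir}"
```

This matches the 8G that lsblk reports once the file is attached to a loop device.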

Now, this file needs to be attached to a loop device:

losetup /dev/loop0 /mnt/pol_root.img

This should be visible in lsblk:

lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0     8G  0 loop

This image needs to be partitioned:

gdisk /dev/loop0

The first step within gdisk is to create a new GPT (GUID partition table). This informs UEFI of where the partitions are, what their sizes are, and so on. For this, press o and then enter, and confirm with y if asked. Then a new partition is needed: press n and enter. Press enter when prompted for the partition number, again when prompted for the start location, and then type +200M when prompted for the end location. For the partition type, enter ef00 (EFI system partition). One more partition is needed. The only differences between this one and the previous one are that the end location is the end of the file, so just press enter, and that the type should be 8300 (Linux filesystem). To exit gdisk, press w and then confirm the write with y.
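The same layout can also be produced without the interactive prompts. The sketch below assumes that sgdisk (the scriptable tool from the same GPT fdisk package as gdisk) is installed, and it works on a small scratch image rather than on /dev/loop0. The scratch file and its 512M size are illustrative only.

```shell
# Scratch image so that nothing real is touched (sparse, 512 MiB).
img=$(mktemp)
truncate -s 512M "$img"

if command -v sgdisk >/dev/null 2>&1; then
  have_sgdisk=yes
  # -o                      : create a fresh, empty GPT (the "o" step)
  # -n 1:0:+200M -t 1:ef00  : partition 1, 200 MiB, EFI system partition
  # -n 2:0:0     -t 2:8300  : partition 2, rest of the disk, Linux filesystem
  sgdisk -o -n 1:0:+200M -t 1:ef00 -n 2:0:0 -t 2:8300 "$img" >/dev/null
  # Count the partition rows in the printed table.
  part_count=$(sgdisk -p "$img" | grep -cE '^ +[0-9]+ +[0-9]+ ')
  echo "partitions created: ${part_count}"
else
  have_sgdisk=no
  echo "sgdisk not found; skipping"
fi
rm -f "$img"
```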

With new partitions, the file needs to be reread:

losetup -d /dev/loop0
losetup -Pf /mnt/pol_root.img
lsblk

NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0         7:0    0     8G  0 loop
├─loop0p1   259:5    0   200M  0 part
└─loop0p2   259:6    0   7.8G  0 part

Partitioning only creates the physical separations on the disk. It does not create filesystems. That must now be done:

mkfs.fat -F32 /dev/loop0p1
mkfs.ext4 /dev/loop0p2

The virtual disks now must be mounted:

mkdir /mnt/target
mount /dev/loop0p2 /mnt/target
mkdir -p /mnt/target/boot/efi
mount /dev/loop0p1 /mnt/target/boot/efi

When building a Linux distribution from nothing, a compiler toolchain is required. This is a set of software used to transform human-written source code into machine-executable binaries (the 1s and 0s). Traditionally, this means: gcc, glibc, gmp, mpc, mpfr and binutils. GCC is the actual compiler, while glibc is the C library (the standard functions that nearly every C program calls, along with the headers that declare them). GMP, MPFR, and MPC are libraries for arbitrary-precision arithmetic: integers and rationals, floating point numbers, and complex numbers respectively. GCC itself depends on them for evaluating numeric constants at compile time. Binutils is a series of tools for handling binary files, including the assembler and the linker. POL doesn’t use glibc, but instead uses musl (a different C library). Everything else remains the same. Ready-made musl toolchains are available from musl.cc, and that is what will be utilized here. Compiling these ourselves serves little purpose as the result will be roughly identical to what is provided for us.

A directory to hold the sources of the software we are using is needed.

mkdir -p /mnt/target/usr/src

As POL doesn’t use GNU’s libc (it uses musl), a cross compiler is needed. Cross compilers allow one to compile code for different “targets.” Those targets are usually the hardware architecture, the OS, and the libc. One could target riscv-linux-musl, or x86_64-minix-glibc, or something else. In our case, the target is x86_64-linux-musl.
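As a small illustration of how such a target name breaks down, the following sketch splits the three-part form used in this article with plain POSIX parameter expansion. The variable names are made up for the example, and real-world target names sometimes carry an extra vendor field that this sketch does not handle.

```shell
triplet="x86_64-linux-musl"

arch=${triplet%%-*}   # everything before the first dash
rest=${triplet#*-}    # everything after the first dash
os=${rest%%-*}        # the middle field
libc=${rest#*-}       # the last field

echo "architecture: ${arch}"
echo "os:           ${os}"
echo "libc:         ${libc}"
```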

cd /mnt/target/usr/src
wget https://musl.cc/x86_64-linux-musl-cross.tgz
tar xf x86_64-linux-musl-cross.tgz
rm -f x86_64-linux-musl-cross.tgz
mv x86_64-linux-musl-cross /mnt/target/cross

Create a file titled pol_environment, and in it place:

unset CFLAGS
unset CXXFLAGS
PATH=/mnt/target/cross/bin:$PATH
CPATH=/mnt/target/usr/include:/mnt/target/usr/lib:/mnt/target/lib
C_INCLUDE_PATH=/mnt/target/usr/include
CPLUS_INCLUDE_PATH=/mnt/target/usr/include
OBJC_INCLUDE_PATH=/mnt/target/usr/include
LIBRARY_PATH=/mnt/target/usr/lib:/mnt/target/lib:/mnt/target/cross/x86_64-linux-musl/lib
COMPILER_PATH=/mnt/target/cross/bin
CC="/mnt/target/cross/bin/x86_64-linux-musl-gcc --sysroot=/mnt/target"
CXX="/mnt/target/cross/bin/x86_64-linux-musl-g++ --sysroot=/mnt/target"
AR="/mnt/target/cross/bin/x86_64-linux-musl-ar"
AS="/mnt/target/cross/bin/x86_64-linux-musl-as"
LD="/mnt/target/cross/bin/x86_64-linux-musl-ld --sysroot=/mnt/target"
RANLIB="/mnt/target/cross/bin/x86_64-linux-musl-ranlib"
READELF="/mnt/target/cross/bin/x86_64-linux-musl-readelf"
STRIP="/mnt/target/cross/bin/x86_64-linux-musl-strip"
POL_TARGET="x86_64-linux-musl"

This makes sure that any program being compiled has access to environment variables that list our cross compiler. Now, whenever working on this distribution, load those environment variables with the source command:

source pol_environment
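One detail worth knowing: sourcing a file sets variables in the current shell, but a child process only inherits the ones that are exported. PATH is normally exported already, so the cross compiler is found either way, and the kernel and BusyBox commands in this article pass CROSS_COMPILE explicitly on the make command line. The sketch below demonstrates the difference using a throwaway file; demo_environment is a hypothetical stand-in for pol_environment.

```shell
unset POL_TARGET
demo_dir=$(mktemp -d)
cat > "${demo_dir}/demo_environment" <<'EOF'
POL_TARGET="x86_64-linux-musl"
EOF

# "." is the POSIX spelling of bash's "source".
. "${demo_dir}/demo_environment"

in_shell=$POL_TARGET
in_child=$(sh -c 'printf "%s" "${POL_TARGET:-unset}"')

export POL_TARGET
in_child_exported=$(sh -c 'printf "%s" "${POL_TARGET:-unset}"')

echo "current shell:         ${in_shell}"
echo "child, not exported:   ${in_child}"
echo "child, after export:   ${in_child_exported}"
rm -rf "${demo_dir}"
```

If a build step ever needs CC, LD, or the others from inside a child process, exporting them (or running set -a before sourcing the file) is one way to get there.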

With the cross compiler in place, we now need to make a directory hierarchy.

cd /mnt/target
mkdir -pv {bin,boot,dev,etc,home,lib,mnt,opt,proc,root,sbin,srv,sys,tmp}
mkdir -v lib/{firmware,modules}
mkdir -v etc/{rc.d,skel}
mkdir -pv var/{cache,lib,local,lock,log,opt,run,spool,tmp}
mkdir -pv usr/{,local/}{bin,include,lib,sbin,share,src}
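The curly braces in those commands are brace expansion, which is a bash feature rather than part of POSIX sh, so the commands need to be run from bash (or another shell that supports it). A shortened sketch of what the last line expands to, run through bash explicitly:

```shell
if command -v bash >/dev/null 2>&1; then
  # The empty first alternative yields usr/..., the second yields usr/local/...
  expanded=$(bash -c 'echo usr/{,local/}{bin,lib}')
  echo "$expanded"
fi
```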

A native toolchain is also required. The native toolchain is what will be used to build software when booted into POL. The toolchain archive will be removed after putting things in place. If you want the sources, the packages are: GMP, MPC, MPFR, GCC, musl, and binutils.

cd /mnt/target/usr/src
wget https://musl.cc/x86_64-linux-musl-native.tgz
tar xf x86_64-linux-musl-native.tgz
rm -f x86_64-linux-musl-native.tgz
cd x86_64-linux-musl-native/
cp -Rv {bin,include,lib,libexec,share} /mnt/target/usr/
cd ..
rm -rf x86_64-linux-musl-native

The kernel headers need to be installed. Header files originate from the C programming language used in writing Linux. In C, a forward declaration of a function is needed before it can be used. That is: a description of the function, its parameters, and what kind of data it returns. Common convention is to place all such forward declarations into a header. In this manner, other programs can then include the header, thereby gaining access to all of the functions when compiled. Linux header files are suffixed as .h files, and these contain the declarations of what the Linux kernel provides to other programs.

Once the headers are in place, a kernel must be compiled. While the kernel configuration is rather fat, there are still many devices that are modules, and the entire firmware set for Linux is going to be put in place as well.

The firmware set distributed with the kernel contains what are known as “binary blobs.” Binary blobs are non-source-available bits of binary code that enable certain devices. These are necessary for many video cards, many WiFi chipsets, and even for post-Broxton Intel CPUs. In many cases, a device can be used without a binary blob, but it may have reduced functionality. In other cases, a device may not function at all without the binary blob. The firmware set is by far the largest (by size on disk) component in POL. If a device with little storage space is targeted, omit the full firmware installation, research which firmware blobs are required for the target hardware, and install only those.

cd /mnt/target/usr/src
wget https://cdn.kernel.org/pub/linux/kernel/v6.x/linux-6.1.13.tar.xz
tar xf linux-6.1.13.tar.xz
rm -f linux-6.1.13.tar.xz
cd linux-6.1.13
make mrproper
make ARCH=x86_64 INSTALL_HDR_PATH=/mnt/target/usr headers_install
wget https://git.abortretry.fail/PlainOldLinux/KernelConf61/raw/branch/master/config-amd64
mv config-amd64 .config
make ARCH=x86_64 CROSS_COMPILE=${POL_TARGET}- oldconfig
make -j$(nproc) ARCH=x86_64 CROSS_COMPILE=${POL_TARGET}-
mkdir -p /mnt/target/boot/efi/EFI/BOOT
cp arch/x86/boot/bzImage /mnt/target/boot/efi/EFI/BOOT/BOOTX64.EFI
cp .config /mnt/target/boot/config
cp System.map /mnt/target/boot/
make -j$(nproc) ARCH=x86_64 CROSS_COMPILE=${POL_TARGET}- INSTALL_MOD_PATH=/mnt/target/ modules_install
make clean
cd /mnt/target/usr/src
git clone https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware
cd linux-firmware/
sed -i 's#/lib/firmware#/mnt/target/lib/firmware#' Makefile
make install
cd ..
rm -rf linux-firmware/

A userland is now required. In most distributions, the userland is made up of many different pieces of software. In this case, BusyBox will be used. BusyBox combines common UNIX-like components into a single package. This serves to simplify the build process, to simplify maintenance, and to decrease the complexity of the system overall.

cd /mnt/target/usr/src
wget https://busybox.net/downloads/busybox-1.36.0.tar.bz2
tar xf busybox-1.36.0.tar.bz2
rm -f busybox-1.36.0.tar.bz2
cd busybox-1.36.0/
make clean
wget https://git.abortretry.fail/PlainOldLinux/BusyBoxConf136/raw/branch/master/config
mv config .config
make -j$(nproc) ARCH="x86_64" CROSS_COMPILE="/mnt/target/cross/bin/${POL_TARGET}-"
make ARCH="x86_64" CROSS_COMPILE="/mnt/target/cross/bin/${POL_TARGET}-" CONFIG_PREFIX="/mnt/target" install
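BusyBox manages to be one package because it is a single multi-call binary: the install step creates links named ls, cp, vi, and so on that all point at the same file, and the program decides what to do based on the name it was invoked under. The sketch below imitates that dispatch with a tiny shell script in a scratch directory. The multicall script and its hello and goodbye applets are invented for illustration and have nothing to do with BusyBox’s actual source.

```shell
demo_dir=$(mktemp -d)

# A stand-in for the multi-call binary: behaviour depends on the invoked name.
cat > "${demo_dir}/multicall" <<'EOF'
#!/bin/sh
case "$(basename "$0")" in
  hello)   echo "hello applet" ;;
  goodbye) echo "goodbye applet" ;;
  *)       echo "unknown applet: $(basename "$0")" ;;
esac
EOF
chmod +x "${demo_dir}/multicall"

# One link per applet, all pointing at the same file.
ln -s multicall "${demo_dir}/hello"
ln -s multicall "${demo_dir}/goodbye"

hello_out=$("${demo_dir}/hello")
goodbye_out=$("${demo_dir}/goodbye")
echo "$hello_out"
echo "$goodbye_out"
rm -rf "${demo_dir}"
```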

Init is now required.

The first init script must be created at /mnt/target/etc/rc. This script will turn off all but the most egregious kernel messages, mount the necessary file systems, run our other init scripts, and spawn a tty.

#!/bin/sh
# /etc/rc — this file will be called directly by /sbin/init
# it is safest to use absolute paths here because
# /etc/profile is not yet loaded
/bin/dmesg -n 1
/bin/echo "Mounting required filesystems"
/bin/mount -t sysfs sysfs /sys
/bin/mount -t proc proc /proc
/bin/echo "Mounting fstab filesystems"
/bin/mount -a
/bin/echo "Remounting root as read/write"
/bin/mount / -o remount,rw
/bin/echo "Mounting /dev/shm"
[ -d /dev/shm ] || /bin/mkdir /dev/shm
/bin/mount -t tmpfs tmpfs /dev/shm
/bin/echo "Mounting /dev/pts"
[ -d /dev/pts ] || /bin/mkdir /dev/pts
/bin/mount -t devpts devpts /dev/pts
/bin/echo "Mounting /tmp"
/bin/mount -t tmpfs tmpfs /tmp
/bin/echo "Starting mdev"
/bin/echo /sbin/mdev > /proc/sys/kernel/hotplug
/sbin/mdev -s
/bin/echo "Executing init scripts"
for FILE in /etc/rc.d/*; do
  [ -x "$FILE" ] && "$FILE" start
done
/bin/echo "System booted with $(/bin/cat /proc/cmdline)"
/bin/echo "Spawning TTY"
/sbin/respawn /bin/busybox getty 38400 /dev/tty0

The script must be made executable.

chmod 755 /mnt/target/etc/rc

The mount -a listed there assumes that an fstab file exists. Create it at /mnt/target/etc/fstab:

/dev/sda2   /            ext4      ro,noatime  1  1
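For readers who have not met fstab before, each line has six whitespace-separated fields. The sketch below splits the line above into those fields; the variable names are just descriptive labels chosen for the example.

```shell
fstab_line='/dev/sda2   /            ext4      ro,noatime  1  1'

# read splits on whitespace by default, one variable per field.
read -r device mountpoint fstype options dump pass <<EOF
$fstab_line
EOF

echo "device:      ${device}"
echo "mount point: ${mountpoint}"
echo "filesystem:  ${fstype}"
echo "options:     ${options}"
echo "dump flag:   ${dump}"
echo "fsck order:  ${pass}"
```

Note that the root filesystem is listed as ro here; the rc script is what remounts it read/write during boot.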

In UNIX-like systems, being self-hosting requires make. Make is a commonly used build tool that renders the building of source packages far easier than it would otherwise be. This was used here already with the kernel and with busybox.

cd /mnt/target/usr/src
wget https://ftp.gnu.org/gnu/make/make-4.3.tar.gz
tar xf make-4.3.tar.gz
rm -f make-4.3.tar.gz
cd make-4.3/
./configure --prefix=/usr --without-guile --host=$POL_TARGET
make && make DESTDIR=/mnt/target install

Now, the actual init system needs to be built and installed. It is beneficial to have init statically linked. This means that running init will require access to nothing other than init. This is useful if parts of a disk become corrupted, unavailable, have permission issues, or anything else. So, the sed command here just adds the -static flag to the CFLAGS.

cd /mnt/target/usr/src
git clone https://github.com/richfelker/minimal-init.git
cd minimal-init
sed -i 's/-O2/-O2 -static/' Makefile
make
cp {init,respawn} /mnt/target/sbin/

Rich Felker’s minimal-init doesn’t contain a reboot command. This must be created at /mnt/target/sbin/reboot. In this case, the reboot command calls stop on each init script, runs the sync command which writes buffered data to disk, remounts the filesystem as read-only, and then sends the reboot character to the kernel’s sysrq listener.

#!/bin/sh
# /sbin/reboot
for FILE in /etc/rc.d/*; do
  [ -x "$FILE" ] && "$FILE" stop
done
sync
mount / -o remount,ro
echo b >/proc/sysrq-trigger

This file must be made executable:

chmod +x /mnt/target/sbin/reboot

Minimal-init also lacks a shutdown command, and one must be created at /mnt/target/sbin/shutdown. This is just like reboot except that the poweroff character is sent to the kernel.

#!/bin/sh
# /sbin/shutdown
for FILE in /etc/rc.d/*; do
  [ -x "$FILE" ] && "$FILE" stop
done
sync
mount / -o remount,ro
echo o >/proc/sysrq-trigger

This file too must be made executable:

chmod +x /mnt/target/sbin/shutdown
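The rc, reboot, and shutdown scripts call every executable file in /etc/rc.d with either start or stop as its only argument, but POL ships with that directory empty. A service script therefore only has to understand those two words. The following is a minimal sketch run against a scratch directory; the 10-hello name and its messages are hypothetical.

```shell
demo_root=$(mktemp -d)
mkdir -p "${demo_root}/etc/rc.d"

# A hypothetical service script that understands start and stop.
cat > "${demo_root}/etc/rc.d/10-hello" <<'EOF'
#!/bin/sh
case "$1" in
  start) echo "hello: starting" ;;
  stop)  echo "hello: stopping" ;;
  *)     echo "usage: $0 {start|stop}" >&2; exit 1 ;;
esac
EOF
chmod 755 "${demo_root}/etc/rc.d/10-hello"

# What the boot script does with it ...
start_out=$(for FILE in "${demo_root}"/etc/rc.d/*; do
  [ -x "$FILE" ] && "$FILE" start
done)

# ... and what reboot and shutdown do with it.
stop_out=$(for FILE in "${demo_root}"/etc/rc.d/*; do
  [ -x "$FILE" ] && "$FILE" stop
done)

echo "$start_out"
echo "$stop_out"
rm -rf "${demo_root}"
```

Because the shell expands the glob in sorted order, a numeric prefix such as 10- is one simple way to control the order in which services come up.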

When a login shell is started, /etc/profile will be read to create an environment:

#!/bin/sh
export PATH=/usr/local/bin:/usr/bin:/bin
if [ `id -u` -eq 0 ] ; then
        PATH=/usr/local/sbin:/usr/sbin:/sbin:$PATH
        unset HISTFILE
fi
export USER=`id -un`
export LOGNAME=$USER
export PAGER='/bin/more'
export EDITOR='/bin/vi'

Users need to be defined in /etc/passwd:

root::0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/bin/false
daemon:x:2:6:daemon:/sbin:/bin/false
adm:x:3:16:adm:/var/adm:/bin/false
lp:x:10:9:lp:/var/spool/lp:/bin/false
mail:x:30:30:mail:/var/mail:/bin/false
news:x:31:31:news:/var/spool/news:/bin/false
uucp:x:32:32:uucp:/var/spool/uucp:/bin/false
operator:x:50:0:operator:/root:/bin/ash
postmaster:x:51:30:postmaster:/var/spool/mail:/bin/false
nobody:x:65534:65534:nobody:/:/bin/false
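Each passwd line consists of seven colon-separated fields. The sketch below splits the root entry to make them visible; the variable names are labels chosen for the example. Note that root’s password field is empty, which means root can log in without a password until one is set.

```shell
passwd_line='root::0:0:root:/root:/bin/ash'

# Split on colons; empty fields (like the password here) are preserved.
IFS=: read -r name password uid gid gecos home shell <<EOF
$passwd_line
EOF

echo "name:     ${name}"
echo "password: ${password:-<empty>}"
echo "uid/gid:  ${uid}/${gid}"
echo "comment:  ${gecos}"
echo "home:     ${home}"
echo "shell:    ${shell}"
```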

Basic groups need to be defined in /etc/group:

root:x:0:
bin:x:1:
sys:x:2:
kmem:x:3:
tty:x:4:
tape:x:5:
daemon:x:6:
floppy:x:7:
disk:x:8:
lp:x:9:
dialout:x:10:
audio:x:11:
video:x:12:
utmp:x:13:
usb:x:14:
cdrom:x:15:
adm:x:16:root,adm,daemon
console:x:17:
mail:x:30:mail
news:x:31:news
users:x:9001:
nogroup:x:65533:
nobody:x:65534:

Available system shells need to be listed in /etc/shells:

/bin/sh
/bin/ash

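Finally, some ownership and permission changes must be made. The EFI partition is unmounted first, as FAT has no notion of UNIX ownership and chown would fail on it: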

umount /dev/loop0p1
cd /mnt/target
chmod 1777 var/tmp
chmod 1777 tmp
chmod 700 root
chown -Rv root:root ./
touch var/log/lastlog
chgrp -v 13 var/log/lastlog
chmod 664 var/log/lastlog
chmod 4755 bin/busybox
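The leading digit in 1777 and 4755 is what sets these apart from ordinary modes: 1 is the sticky bit (in a world-writable directory, users may only delete their own files), and 4 is setuid (the program runs with the privileges of its owner, here root, which applets such as su and passwd rely on). The sketch below applies the same modes to throwaway files and reads them back. It assumes a stat that supports -c, as GNU coreutils and BusyBox do.

```shell
demo_dir=$(mktemp -d)
mkdir "${demo_dir}/tmp"
touch "${demo_dir}/busybox"

chmod 1777 "${demo_dir}/tmp"       # sticky, world-writable directory
chmod 4755 "${demo_dir}/busybox"   # setuid executable

tmp_mode=$(stat -c '%a' "${demo_dir}/tmp")
bin_mode=$(stat -c '%a' "${demo_dir}/busybox")
echo "tmp:     ${tmp_mode}"
echo "busybox: ${bin_mode}"

# ls shows the same bits as a trailing "t" and an "s" in the owner column.
ls -ld "${demo_dir}/tmp" "${demo_dir}/busybox"
rm -rf "${demo_dir}"
```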

Now, it’s time to unmount the disk/file:

cd /mnt
umount /dev/loop0p2

To boot the image file, something like the following should suffice, but pressing escape frantically during bootup may be needed. OVMF will otherwise just try to boot the kernel without a specified root. Once in the OVMF EFI interface, go to boot manager, and then EFI shell:

qemu-system-x86_64 -bios /usr/share/qemu/OVMF.fd -enable-kvm -vga qxl -nodefaults -display gtk -hda pol_root.img -m 4096M -smp $(nproc)

Booting bare metal is interesting. To boot on a UEFI machine, an EFI system partition is required, but a boot loader is not. The kernel config I supplied has EFISTUB enabled, and therefore the kernel can be loaded directly by UEFI. So, from here there are two different methods to boot POL. Both assume that the Linux kernel compiled earlier was copied to the EFI partition as BOOTX64.EFI.

  1. Create an entry with bcfg directly from EFI Shell
  2. Pass the boot parameters directly from EFI Shell

For option 1, something like the following would suffice:

bcfg boot dump
bcfg boot add 2 fs0:\EFI\BOOT\BOOTX64.EFI "Plain Old Linux"
bcfg boot -opt 2 "root=/dev/sda2"

For option 2, something like this would work:

FS0:
\EFI\BOOT\BOOTX64.EFI root=/dev/sda2

image of UEFI boot

login prompt of POL

You can safely ignore the errors for /dev/fd0 and /dev/cdrom. I had added those to my image and forgot to remove the fstab entries before this boot.

image of pol shell

Yay! Some things are working!

double headed dodo bird

The Linux mascot is a penguin, an awk. The dodo is another flightless bird, but an extinct one. A fitting mascot for a Linux distribution of a type that is long dead.


© MMXXIV, Abort Retry Fail LLC
Licentiam Absurdum