Setting Up Arch Linux
2021-02-12
This note includes all commands I typed when I set up Arch Linux on my new server.
PSA: I published a toolchain for building AUR packages in a clean-room Docker container: https://github.com/uetchy/archpkgs
Setup
Wipe a whole disk
wipefs -a /dev/sda
Create partition
parted
select /dev/sda
mktable gpt
mkpart EFI fat32 0 512MB # EFI
mkpart Arch ext4 512MB 100% # Arch
set 1 esp on # flag partition 1 as ESP
quit
Create file systems
mkfs.vfat -F 32 /dev/sda1 # EFI
mkfs.ext4 /dev/sda2 # Arch
e2fsck -cc -C 0 /dev/sda2 # check for bad blocks (non-destructive read-write test)
Mount disks
mkdir -p /mnt/boot
mount /dev/sda2 /mnt
mount /dev/sda1 /mnt/boot
Install Linux kernel
# choose between 'linux' or 'linux-lts'
pacstrap /mnt base linux-lts linux-firmware
genfstab -U /mnt >> /mnt/etc/fstab
arch-chroot /mnt
pacman -S reflector
reflector --protocol https --latest 30 --sort rate --save /etc/pacman.d/mirrorlist --verbose # optimize mirror list
Install essentials
pacman -S vim man-db man-pages git base-devel
Add fstab entries
# backup
UUID=<UUID> /mnt/backup ext4 defaults 0 2
# archive (do not prevent boot even if fsck fails)
UUID=<UUID> /mnt/archive ext4 defaults,nofail,x-systemd.device-timeout=4 0 2
You can find <UUID> with lsblk -f.
findmnt --verify --verbose # verify fstab
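For reference, each fstab entry is six whitespace-separated fields; a quick parse of the archive entry (the UUID here is made up):

```python
# an fstab entry has six fields: device, mount point, type, options, dump, fsck pass
line = "UUID=0a1b2c3d /mnt/archive ext4 defaults,nofail,x-systemd.device-timeout=4 0 2"
device, mountpoint, fstype, options, dump, passno = line.split()
print(options.split(","))  # ['defaults', 'nofail', 'x-systemd.device-timeout=4']
print(passno)              # 2 (fsck order: checked after the root filesystem)
```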
Locales
ln -sf /usr/share/zoneinfo/Asia/Tokyo /etc/localtime
hwclock --systohc
vim /etc/locale.gen && locale-gen # uncomment needed locales, then generate them
echo "LANG=en_US.UTF-8" > /etc/locale.conf
Install bootloader
pacman -S \
grub \
efibootmgr \
amd-ucode # AMD microcode
grub-install --target=x86_64-efi --efi-directory=/boot --bootloader-id=GRUB
vim /etc/default/grub
# GRUB_TIMEOUT=3
# GRUB_DISABLE_SUBMENU=y
grub-mkconfig -o /boot/grub/grub.cfg
Setup network
hostnamectl set-hostname takos
hostnamectl set-chassis server
Add the hostname to /etc/hosts:
127.0.0.1 localhost
::1 localhost
127.0.0.1 takos
See also: systemd.network, ArchWiki, and Ivan Smirnov's blog.
Create a network profile (e.g. /etc/systemd/network/20-wired.network). Note that systemd config files do not support inline comments, so keep comments on their own lines:
[Match]
Name=enp5s0

[Network]
#DHCP=yes
Address=10.0.1.2/24
Gateway=10.0.1.1
# self-hosted DNS resolver
DNS=10.0.1.100
# Cloudflare as the fallback DNS server
DNS=1.1.1.1
# macvlan shim to handle local DNS lookups to 10.0.1.100, which is managed by the Docker macvlan driver
MACVLAN=dns-shim
Create the shim interface (e.g. /etc/systemd/network/dns-shim.netdev):
# to handle local dns lookup to 10.0.1.100
[NetDev]
Name=dns-shim
Kind=macvlan
[MACVLAN]
Mode=bridge
Configure the shim interface (e.g. /etc/systemd/network/dns-shim.network):
# to handle local dns lookup to 10.0.1.100
[Match]
Name=dns-shim
[Network]
IPForward=yes
[Address]
Address=10.0.1.103/32
Scope=link
[Route]
Destination=10.0.1.100/30
The ip equivalent of the above config:
ip link add dns-shim link enp5s0 type macvlan mode bridge # add macvlan shim interface
ip a add 10.0.1.103/32 dev dns-shim # assign the interface an ip address
ip link set dns-shim up # enable the interface
ip route add 10.0.1.100/30 dev dns-shim # route macvlan subnet (.100 - .103) to the interface
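The /30 in the route covers exactly the four addresses mentioned in the comment; a quick sanity check with Python's ipaddress module:

```python
import ipaddress

# a /30 block contains four addresses: .100 through .103
net = ipaddress.ip_network("10.0.1.100/30")
print([str(ip) for ip in net])
# ['10.0.1.100', '10.0.1.101', '10.0.1.102', '10.0.1.103']
```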
systemctl enable --now systemd-networkd
networkctl status
ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf
# for self-hosted dns resolver
sed -r -i -e 's/#?DNSStubListener=yes/DNSStubListener=no/g' -e 's/#DNS=/DNS=10.0.1.100/g' /etc/systemd/resolved.conf
systemctl enable --now systemd-resolved
resolvectl status
resolvectl query ddg.gg
drill ddg.gg
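The sed one-liner above rewrites two directives in /etc/systemd/resolved.conf; the same substitutions sketched with Python's re module on sample file content:

```python
import re

# sample resolved.conf content with both directives still commented out
conf = "#DNS=\n#DNSStubListener=yes\n"
conf = re.sub(r"#?DNSStubListener=yes", "DNSStubListener=no", conf)
conf = re.sub(r"#DNS=", "DNS=10.0.1.100", conf)
print(conf)
# DNS=10.0.1.100
# DNSStubListener=no
```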
If networkctl keeps showing enp5s0 as degraded, run ip addr add 10.0.1.2/24 dev enp5s0 to manually assign the static IP address as a workaround.
Exit chroot
exit # leave chroot
umount -R /mnt
reboot
NTP
timedatectl set-ntp true
timedatectl status
AUR
git clone https://aur.archlinux.org/yay.git
cd yay
makepkg -si
Shell
pacman -S zsh
chsh -s /bin/zsh
# Install useful utils (totally optional)
yay -S pyenv exa antibody direnv fd ripgrep fzy peco ghq-bin hub neofetch tmux git-delta lazygit jq lostfiles ncdu htop rsync youtube-dl prettier tree age
Set up an operator user (i.e., a user without superuser privileges)
passwd # change root password
useradd -m -s /bin/zsh <user> # add operator user
passwd <user> # change operator user password
userdbctl # verify users
userdbctl group # verify groups
pacman -S sudo
echo "%sudo ALL=(ALL) NOPASSWD:/usr/bin/pacman" > /etc/sudoers.d/pacman # allow users in sudo group to run pacman without password (optional)
groupadd sudo
usermod -aG sudo <user> # add operator user to sudo group
visudo -c
SSH
pacman -S openssh
vim /etc/ssh/sshd_config
systemctl enable --now sshd
In your shell rc file (e.g. ~/.zshrc), keep a stable symlink to the forwarded agent socket:
if [ ! -S ~/.ssh/ssh_auth_sock ] && [ -S "$SSH_AUTH_SOCK" ]; then
  ln -sf "$SSH_AUTH_SOCK" ~/.ssh/ssh_auth_sock
fi
In ~/.tmux.conf:
set -g update-environment -r
setenv -g SSH_AUTH_SOCK $HOME/.ssh/ssh_auth_sock
In sudoers (via visudo), keep the variable across sudo invocations:
Defaults env_keep += SSH_AUTH_SOCK
On the client machine:
ssh-copy-id <user>@<ip>
See also: Happy ssh agent forwarding for tmux/screen · Reboot and Shine
S.M.A.R.T.
pacman -S smartmontools
systemctl enable --now smartd
smartctl -t short /dev/sda
smartctl -l selftest /dev/sda
NVIDIA drivers
pacman -S nvidia-lts # 'nvidia' for 'linux'
reboot
nvidia-smi # test runtime
Docker
pacman -S docker docker-compose
yay -S nvidia-container-runtime
systemctl enable --now docker
Edit /etc/docker/daemon.json. It must be strict JSON, so no comments; the log defaults are unlimited size and a single file, and the nvidia runtime entry is needed for docker-compose:
{
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
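Docker reads its daemon config as strict JSON, so // comments make it unparseable; a quick demonstration with Python's json module:

```python
import json

# strict JSON parsers reject trailing // comments
try:
    json.loads('{"log-driver": "json-file"} // comments are not allowed')
    error = None
except json.JSONDecodeError as e:
    error = e.msg
print(error)  # Extra data
```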
systemctl restart docker
usermod -aG docker <user>
# to create mandatory device files on /dev
docker run --gpus all nvidia/cuda:10.2-cudnn7-runtime nvidia-smi
GPU_OPTS=(--gpus all --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools)
docker run --rm -it ${GPU_OPTS} nvidia/cuda:10.2-cudnn7-runtime nvidia-smi
docker run --rm -it ${GPU_OPTS} tensorflow/tensorflow:1.14.0-gpu-py3 bash
Use the journald log driver in Docker Compose:
services:
web:
logging:
driver: "journald"
options:
tag: "{{.ImageName}}/{{.Name}}/{{.ID}}" # default: "{{.ID}}"
- Configure logging drivers | Docker Documentation
- Architecture Overview — NVIDIA Cloud Native Technologies documentation
Additional setup
nginx-proxy
git clone --recurse-submodules https://github.com/evertramos/nginx-proxy-automation.git /srv/proxy
cd /srv/proxy
./fresh-start.sh --yes -e your_email@domain --skip-docker-image-check
Nextcloud
git clone https://github.com/uetchy/docker-nextcloud.git /srv/cloud
cd /srv/cloud
cp .env.sample .env
vim .env # fill the blank variables
make # pull, build, start
make applypatches # run only once
Fail2ban
pacman -S fail2ban
Create a filter for Mailu login failures (e.g. /etc/fail2ban/filter.d/bad-auth.conf):
[INCLUDES]
before = common.conf
[Definition]
failregex = .* client login failed: .+ client:\ <HOST>
ignoreregex =
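Fail2ban expands <HOST> to a group matching the source address; a rough Python equivalent of the failregex above, run against a made-up log line:

```python
import re

# <HOST> is approximated here by a named group; the log line is invented sample data
failregex = r".* client login failed: .+ client:\ (?P<host>\S+)"
line = "front_1 client login failed: user@example.com client: 203.0.113.9"
m = re.search(failregex, line)
print(m.group("host"))  # 203.0.113.9
```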
In /etc/fail2ban/jail.local:
[DEFAULT]
ignoreip = 127.0.0.1/8 10.0.1.0/24
[sshd]
enabled = true
port = 22,10122
bantime = 1h
mode = aggressive
# https://github.com/Mailu/Mailu/blob/master/docs/faq.rst#do-you-support-fail2ban
[mailu]
enabled = true
backend = systemd
journalmatch = CONTAINER_NAME=mail_front_1
filter = bad-auth
findtime = 1h
maxretry = 3
bantime = 1w
chain = DOCKER-USER
banaction = iptables-allports
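Addresses inside the ignoreip networks are never banned; a sketch of that membership check with Python's ipaddress module (the probe addresses are examples):

```python
import ipaddress

# ignoreip from the jail config; strict=False tolerates host bits (127.0.0.1/8)
ignore = [ipaddress.ip_network(n, strict=False) for n in ("127.0.0.1/8", "10.0.1.0/24")]

def is_ignored(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in ignore)

print(is_ignored("10.0.1.50"))    # True: inside the LAN range
print(is_ignored("203.0.113.9"))  # False: bannable
```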
Override fail2ban.service (systemctl edit fail2ban) so it starts after Docker:
- After=network.target iptables.service firewalld.service ip6tables.service ipset.service nftables.service
+ After=network.target iptables.service firewalld.service ip6tables.service ipset.service nftables.service docker.service
systemctl enable --now fail2ban
fail2ban-client status sshd
Telegraf
yay -S telegraf
/etc/telegraf/telegraf.conf:
# Global tags can be specified here in key="value" format.
[global_tags]
# Configuration for telegraf agent
[agent]
interval = "15s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
hostname = ""
omit_hostname = false
# Configuration for sending metrics to InfluxDB
[[outputs.influxdb]]
urls = ["http://127.0.0.1:8086"]
database = "<db>"
username = "<user>"
password = "<password>"
# Read metrics about cpu usage
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
# Read metrics about disk usage by mount point
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
# Read metrics about disk IO by device
[[inputs.diskio]]
# Get kernel statistics from /proc/stat
[[inputs.kernel]]
# Read metrics about memory usage
[[inputs.mem]]
# Get the number of processes and group them by status
[[inputs.processes]]
# Read metrics about system load & uptime
[[inputs.system]]
# Read metrics about network interface usage
[[inputs.net]]
interfaces = ["enp5s0"]
# Read metrics about docker containers
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
perdevice = false
total = true
[[inputs.fail2ban]]
interval = "15m"
use_sudo = true
# Pulls statistics from nvidia GPUs attached to the host
[[inputs.nvidia_smi]]
timeout = "30s"
[[inputs.http_response]]
interval = "5m"
urls = [
"https://example.com"
]
# Monitor sensors, requires lm-sensors package
[[inputs.sensors]]
interval = "60s"
remove_numbers = false
# Run executable as long-running input plugin
[[inputs.execd]]
interval = "15s"
command = ["/metrics.sh"]
name_override = "metrics"
signal = "STDIN"
restart_delay = "20s"
data_format = "logfmt"
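The execd input reads logfmt lines from the script's stdout; a minimal sketch of that format (no quoting or escaping support, invented sample fields):

```python
def parse_logfmt(line: str) -> dict:
    # naive logfmt: space-separated key=value pairs
    return dict(pair.split("=", 1) for pair in line.split())

print(parse_logfmt("load=0.42 uptime=123i"))  # {'load': '0.42', 'uptime': '123i'}
```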
Allow the telegraf user to run fail2ban-client via sudo (/etc/sudoers.d/telegraf):
Cmnd_Alias FAIL2BAN = /usr/bin/fail2ban-client status, /usr/bin/fail2ban-client status *
telegraf ALL=(root) NOEXEC: NOPASSWD: FAIL2BAN
Defaults!FAIL2BAN !logfile, !syslog, !pam_session
chmod 440 /etc/sudoers.d/telegraf
usermod -aG docker telegraf
telegraf -config /etc/telegraf/telegraf.conf -test
systemctl enable --now telegraf
cfddns
Dynamic DNS for Cloudflare.
Star the GitHub repository if you like it :)
yay -S cfddns sendmail
Configure cfddns with the API token, notification settings, and the domains to watch:
token: <token>
notification:
enabled: true
from: cfddns@localhost
to: me@example.com
example.com
dev.example.com
example.org
systemctl enable --now cfddns
Backup
pacman -S restic
/etc/backup/restic.service:
[Unit]
Description=Daily Backup Service
[Service]
Type=simple
Nice=19
IOSchedulingClass=2
IOSchedulingPriority=7
ExecStart=/etc/backup/run.sh
/etc/backup/restic.timer:
[Unit]
Description=Daily Backup Timer
[Timer]
WakeSystem=false
OnCalendar=*-*-* 14:00
RandomizedDelaySec=5min
[Install]
WantedBy=timers.target
/etc/backup/run.sh:
#!/bin/bash -ue
# usage: run.sh
# https://restic.readthedocs.io/en/latest/040_backup.html#
export RESTIC_REPOSITORY=/path/to/backup
export RESTIC_PASSWORD=<passphrase>
export RESTIC_PROGRESS_FPS=1
date
# system
restic backup --tag system -v \
--one-file-system \
--exclude .cache \
--exclude .vscode-server \
--exclude .vscode-server-insiders \
--exclude TabNine \
--exclude /var/lib/docker/overlay2 \
/ /boot
# data (the appdata_* excludes are Nextcloud caches; comments after a
# backslash continuation would break the command, so they live up here)
restic backup --tag data -v \
  --exclude 'appdata_*/preview' \
  --exclude 'appdata_*/dav-photocache' \
  /mnt/data
# prune
restic forget --prune --group-by tags \
--keep-within-daily 7d \
--keep-within-weekly 1m \
--keep-within-monthly 3m
# verify
restic check
/etc/backup/show.sh:
#!/bin/bash
# usage: show.sh <file|directory>
# https://restic.readthedocs.io/en/latest/050_restore.html
export RESTIC_REPOSITORY=/path/to/backup
export RESTIC_PASSWORD=<passphrase>
export RESTIC_PROGRESS_FPS=1
TARGET=${1:-$(pwd)}
MODE="ls -l"
if [[ -f $TARGET ]]; then
TARGET=$(realpath ${TARGET})
MODE=dump
fi
TAG=$(restic snapshots --json | jq -r '[.[].tags[0]] | unique| .[]' | fzy)
ID=$(restic snapshots --tag $TAG --json | jq -r ".[] | [.time, .short_id] | @tsv" | fzy | awk '{print $2}')
>&2 echo "Command: restic ${MODE} ${ID} ${TARGET}"
restic $MODE $ID ${TARGET}
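The jq pipeline [.[].tags[0]] | unique | .[] used in these scripts collects the distinct first tags across snapshots; the same step in Python on made-up snapshot JSON:

```python
import json

# invented sample of `restic snapshots --json` output, trimmed to relevant fields
snapshots = json.loads("""[
  {"short_id": "a1b2", "tags": ["system"]},
  {"short_id": "c3d4", "tags": ["data"]},
  {"short_id": "e5f6", "tags": ["system"]}
]""")
tags = sorted({s["tags"][0] for s in snapshots})  # unique first tags, sorted like jq's unique
print(tags)  # ['data', 'system']
```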
A restore script:
#!/bin/bash
# https://restic.readthedocs.io/en/latest/050_restore.html
export RESTIC_REPOSITORY=/path/to/backup
export RESTIC_PASSWORD=<passphrase>
export RESTIC_PROGRESS_FPS=1
TARGET=${1:?Specify TARGET}
TARGET=$(realpath ${TARGET})
TAG=$(restic snapshots --json | jq -r '[.[].tags[0]] | unique | .[]' | fzy)
ID=$(restic snapshots --tag $TAG --json | jq -r ".[] | [.time, .short_id] | @tsv" | fzy | awk '{print $2}')
>&2 echo "Command: restic restore ${ID} -i ${TARGET} -t /"
read -p "Press enter to continue"
restic restore $ID -i ${TARGET} -t /
chmod 700 /etc/backup/{run,show}.sh
ln -sf /etc/backup/restic.{service,timer} /etc/systemd/system/
systemctl enable --now restic.timer # enable the timer; the service is triggered by it
Kubernetes
pacman -S minikube kubectl
minikube start --cpus=max
kubectl taint nodes --all node-role.kubernetes.io/master- # allow pods to be scheduled on the control-plane (master) node
minikube ip
kubectl cluster-info
kubectl get cm -n kube-system kubeadm-config -o yaml
- Kubernetes - ArchWiki
- Kubernetes Ingress Controller with NGINX Reverse Proxy and Wildcard SSL from Let's Encrypt - Shogan.tech
Audio
pacman -S alsa-utils # may require a reboot
usermod -aG audio <user>
# list devices as root
aplay -l
arecord -L
cat /proc/asound/cards
# test speaker
speaker-test -c2
# test mic
arecord -vv -Dhw:2,0 -fS32_LE mic.wav
aplay mic.wav
# gui mixer
alsamixer
# for Mycroft.ai
pacman -S pulseaudio pulsemixer
pulseaudio --start
pacmd list-cards
In the PulseAudio config (e.g. /etc/pulse/default.pa):
# INPUT/RECORD
load-module module-alsa-source device="default" tsched=1
# OUTPUT/PLAYBACK
load-module module-alsa-sink device="default" tsched=1
# Accept clients -- very important
load-module module-native-protocol-unix
load-module module-native-protocol-tcp
In ~/.asoundrc (or /etc/asound.conf):
pcm.mic {
type hw
card M96k
rate 44100
format S32_LE
}
pcm.speaker {
type plug
slave {
pcm "hw:1,0"
}
}
pcm.!default {
type asym
capture.pcm "mic"
playback.pcm "speaker"
}
#defaults.pcm.card 1
#defaults.ctl.card 1
- PulseAudio as a minimal unintrusive dumb pipe to ALSA
- SoundcardTesting - AlsaProject
- Advanced Linux Sound Architecture/Troubleshooting - ArchWiki
- ALSA project - the C library reference: PCM (digital audio) plugins
- Asoundrc - AlsaProject
Firewall
pacman -S firewalld
systemctl enable --now firewalld
See Introduction to Netfilter – To Linux and beyond !.
Maintenance
Quick checkups
htop # show task overview
systemctl --failed # show failed units
free -h # show memory usage
lsblk -f # show disk usage
networkctl status # show network status
userdbctl # show users
nvidia-smi # verify nvidia cards
ps aux | grep "defunct" # find zombie processes
Delve into system logs
journalctl -p err -b-1 -r # show error logs from previous boot in reverse order
journalctl -u sshd -f # tail logs from sshd unit
journalctl --no-pager -n 25 -k # show latest 25 logs from the kernel without pager
journalctl --since=yesterday --until "2020-07-10 15:10:00" # show logs within specific time range
journalctl CONTAINER_NAME=service_web_1 # show error from docker container named 'service_web_1'
journalctl _PID=2434 -e # filter logs based on PID and jump to the end of the logs
journalctl -g 'timed out' # filter logs based on regular expression. if the pattern is all lowercase, matching is case insensitive
Useful keys inside the journalctl pager:
- g - go to the first line
- G - go to the last line
- / - search for a string
Force overriding installation
pacman -S <pkg> --overwrite '*'
Check memory modules
pacman -S lshw dmidecode
lshw -short -C memory # list installed memory modules
dmidecode # show configured clock speeds
File-system related issues checklist
smartctl -H /dev/sdd
# unmount the drive before running these
e2fsck -C 0 -p /dev/sdd1 # preen
e2fsck -C 0 -cc /dev/sdd1 # badblocks
Common issues
Slow SSH login (D-Bus glitch)
systemctl restart systemd-logind
systemctl restart polkit
Annoying "systemd-homed is not available" log messages
Move pam_unix before pam_systemd_home in /etc/pam.d/system-auth.
#%PAM-1.0
auth required pam_faillock.so preauth
# Optionally use requisite above if you do not want to prompt for the password
# on locked accounts.
auth [success=2 default=ignore] pam_unix.so try_first_pass nullok
-auth [success=1 default=ignore] pam_systemd_home.so
auth [default=die] pam_faillock.so authfail
auth optional pam_permit.so
auth required pam_env.so
auth required pam_faillock.so authsucc
# If you drop the above call to pam_faillock.so the lock will be done also
# on non-consecutive authentication failures.
account [success=1 default=ignore] pam_unix.so
-account required pam_systemd_home.so
account optional pam_permit.so
account required pam_time.so
password [success=1 default=ignore] pam_unix.so try_first_pass nullok shadow
-password required pam_systemd_home.so
password optional pam_permit.so
session required pam_limits.so
session required pam_unix.so
session optional pam_permit.so
Annoying systemd-journald-audit log
Set Audit=no in /etc/systemd/journald.conf:
Audit=no
Missing /dev/nvidia-{uvm*,modeset}
This occurs after updating the Linux kernel. Run the following once to recreate the device files:
docker run --rm --gpus all --device /dev/nvidia0 --device /dev/nvidiactl --device /dev/nvidia-modeset --device /dev/nvidia-uvm --device /dev/nvidia-uvm-tools -it nvidia/cuda:10.2-cudnn7-runtime nvidia-smi
[sudo] Incorrect password while the password is correct
faillock --reset