Skip to content

Operations Runbook

This runbook is for maintainers validating builds, triaging failures, and shipping releases.

1) Pre-release checklist

  1. Ensure version alignment:
  2. core/__init__.py
  3. pyproject.toml
  4. (Note: setup.py has been removed; pyproject.toml is the sole build configuration)
  5. Confirm docs reflect behavior changes.
  6. Confirm Vagrantfile matrix matches scripts/run_automated_vm_tests.sh.
  7. Confirm packaging data includes templates and frontend.
  8. Docker build uses a single Dockerfile with uv multi-stage build (Dockerfile.rocky has been removed).

2) Release process (tag-driven)

  1. Commit release changes.
  2. Create/update tag (vX.Y.Z).
  3. Push branch + tag.
  4. Monitor Build and Release workflow.
  5. Download release artifacts and run VM validation.

3) VM validation process

Command:

./scripts/run_automated_vm_tests.sh vX.Y.Z

Outputs:

  • per-VM logs: vms/artifacts/logs/<vm>_run.log
  • reports: vms/artifacts/reports/<hostname>/

Current matrix:

  • ubuntu18
  • ubuntu22
  • ubuntu24
  • debian12
  • almalinux
  • opensuse
  • windows

4) Fast triage commands

Last 100 lines per VM log

for f in vms/artifacts/logs/*_run.log; do echo "===== $f ====="; tail -n 100 "$f"; done

Find hard failures quickly

grep -Ei "failed|error|GLIBCXX|not found|unable|interrupted" vms/artifacts/logs/*_run.log

Check reports produced

find vms/artifacts/reports -maxdepth 3 -type f | sort

5) Known failure categories

  1. Artifact/runtime mismatch
  2. Symptoms: GLIBCXX_* not found in apt/oscap subprocesses
  3. Action: verify build includes environment sanitization (get_clean_env() usage).

  4. Box provisioning issues

  5. Symptoms: interrupted/corrupt box, truncated archive
  6. Action: remove bad box cache, re-download box, retry single VM first.

  7. Expected unsupported tool paths

  8. Symptoms: USG unavailable on non-Ubuntu, distro-scoped skips
  9. Action: confirm logs show intentional skip, not hard failure.

  10. Server startup failures during VM checks

  11. Symptoms: Server failed to start
  12. Action: inspect emitted <runtime_home>/logs/web_server_<YYYYmmdd_HHMMSS>.log in VM runner output.

6) Manual per-VM debugging

Linux VM

vagrant up ubuntu22
vagrant ssh ubuntu22
sudo bash /mnt/artifacts/vm_test_runner.sh

Windows VM

vagrant up windows
vagrant powershell windows -c "Set-ExecutionPolicy Bypass -Scope Process; & C:\mnt\artifacts\vm_test_runner.ps1"

7) Documentation maintenance policy

When changing behavior, update at least:

8) Docker build

The project uses a single Dockerfile with a uv-based multi-stage build:

docker build -t cis-hardening-tool .
docker run --rm -it cis-hardening-tool

The previous Dockerfile.rocky has been removed. All container builds go through the single Dockerfile.

9) Incident log expectations

Each significant failure should capture:

  1. Symptom and exact error signature.
  2. Reproduction path.
  3. Root cause.
  4. Code/script/workflow fix.
  5. Validation evidence.