Operations Runbook
This runbook is for maintainers validating builds, triaging failures, and shipping releases.
1) Pre-release checklist
- Ensure version alignment:
  - core/__init__.py
  - pyproject.toml
  - (Note: setup.py has been removed; pyproject.toml is the sole build configuration)
- Confirm docs reflect behavior changes.
- Confirm Vagrantfile matrix matches scripts/run_automated_vm_tests.sh.
- Confirm packaging data includes templates and frontend.
- Docker build uses a single Dockerfile with uv multi-stage build (Dockerfile.rocky has been removed).
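The version alignment item in the checklist above can be scripted. The following is a minimal sketch, not the project's own tooling. It assumes core/__init__.py contains a line shaped like __version__ = "1.2.3" and pyproject.toml contains a line shaped like version = "1.2.3"; neither format is confirmed by this runbook, and the function names extract_version and check_version_alignment are hypothetical.

```shell
# Sketch only: file formats and function names are assumptions (see lead-in).

extract_version() {
    # $1 = file, $2 = key expected at the start of a line (e.g. __version__)
    # Prints the first quoted value assigned to that key, or nothing.
    sed -n "s/^$2[[:space:]]*=[[:space:]]*[\"']\([^\"']*\)[\"'].*/\1/p" "$1" | head -n 1
}

check_version_alignment() {
    # $1 = repository root (defaults to the current directory)
    local root="${1:-.}"
    local init_version pyproject_version
    init_version="$(extract_version "$root/core/__init__.py" "__version__")"
    pyproject_version="$(extract_version "$root/pyproject.toml" "version")"

    # Return 2 when either file yields no version at all.
    if [ -z "$init_version" ] || [ -z "$pyproject_version" ]; then
        echo "could not read a version from one of the files" >&2
        return 2
    fi
    # Return 1 when both versions were read but differ.
    if [ "$init_version" != "$pyproject_version" ]; then
        echo "version mismatch: core/__init__.py=$init_version pyproject.toml=$pyproject_version" >&2
        return 1
    fi
    echo "versions aligned: $init_version"
}
```

Run check_version_alignment from the repository root before tagging; a non-zero exit status means the two files disagree or could not be parsed.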
2) Release process (tag-driven)
- Commit release changes.
- Create/update tag (vX.Y.Z).
- Push branch + tag.
- Monitor Build and Release workflow.
- Download release artifacts and run VM validation.
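Because the release is driven by the tag, a malformed tag name is worth catching before pushing. The sketch below checks the vX.Y.Z shape named in the steps above. It assumes X, Y, and Z are plain integers with no pre-release suffix, which this runbook does not state explicitly; is_release_tag is a hypothetical helper name.

```shell
# Sketch only: assumes tags are strictly v<int>.<int>.<int>.

is_release_tag() {
    # Succeeds (exit 0) only for tags shaped like v1.2.3.
    printf '%s\n' "$1" | grep -Eq '^v[0-9]+\.[0-9]+\.[0-9]+$'
}

# Example use with standard git commands (<branch> is a placeholder):
#   tag="v1.2.3"
#   is_release_tag "$tag" && git tag -a "$tag" -m "Release $tag" \
#       && git push origin <branch> "$tag"
```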
3) VM validation process
Command:
Outputs:
- per-VM logs: vms/artifacts/logs/<vm>_run.log
- reports: vms/artifacts/reports/<hostname>/
Current matrix:
- ubuntu18
- ubuntu22
- ubuntu24
- debian12
- almalinux
- opensuse
- windows
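After a validation run, it can help to confirm that every VM in the matrix actually produced a log. The sketch below lists matrix entries whose log is missing or empty. It assumes the <vm> part of the log file name is exactly the matrix name shown above; VM_MATRIX and missing_vm_logs are hypothetical names introduced for illustration.

```shell
# Sketch only: assumes log names are <matrix name>_run.log.

VM_MATRIX="ubuntu18 ubuntu22 ubuntu24 debian12 almalinux opensuse windows"

missing_vm_logs() {
    # $1 = log directory (defaults to the path documented in this runbook)
    local log_dir="${1:-vms/artifacts/logs}"
    local vm
    for vm in $VM_MATRIX; do
        # -s is true only when the file exists and is non-empty.
        [ -s "$log_dir/${vm}_run.log" ] || echo "$vm"
    done
}
```

An empty result means every VM in the matrix wrote a non-empty log; any printed name is a candidate for a single-VM retry.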
4) Fast triage commands
Last 100 lines per VM log
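One possible way to do this, sketched under the assumption that the logs follow the vms/artifacts/logs/<vm>_run.log layout described in the VM validation section. The helper name tail_vm_logs is hypothetical.

```shell
# Sketch only: prints a header and the last N lines of each per-VM log.

tail_vm_logs() {
    # $1 = log directory, $2 = number of lines (defaults: documented path, 100)
    local log_dir="${1:-vms/artifacts/logs}"
    local lines="${2:-100}"
    local log
    for log in "$log_dir"/*_run.log; do
        # Skip the unexpanded pattern when no log files exist.
        [ -e "$log" ] || continue
        echo "===== $log ====="
        tail -n "$lines" "$log"
    done
}
```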
Find hard failures quickly
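A minimal sketch that searches the per-VM logs for the two error signatures this runbook names explicitly (GLIBCXX_* not found and Server failed to start). The pattern list is an assumption about what counts as a hard failure and will likely need extending; find_hard_failures is a hypothetical helper name.

```shell
# Sketch only: the signature list covers just the errors named in this runbook.

find_hard_failures() {
    # $1 = log directory (defaults to the path documented in this runbook)
    local log_dir="${1:-vms/artifacts/logs}"
    # -H prints the file name, -n the line number, -E enables alternation.
    grep -HnE 'GLIBCXX_[^ ]* not found|Server failed to start' "$log_dir"/*_run.log
}
```

Each match is printed as file:line:text, and the exit status is non-zero when nothing matched.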
Check reports produced
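A minimal sketch that counts the files under each vms/artifacts/reports/<hostname>/ directory, so an empty or absent report directory stands out. It makes no assumption about report file names; list_reports is a hypothetical helper name.

```shell
# Sketch only: one line per hostname directory with its file count.

list_reports() {
    # $1 = report directory (defaults to the path documented in this runbook)
    local report_dir="${1:-vms/artifacts/reports}"
    local host_dir count
    for host_dir in "$report_dir"/*/; do
        # Skip the unexpanded pattern when no hostname directories exist.
        [ -d "$host_dir" ] || continue
        count="$(find "$host_dir" -type f | wc -l | tr -d ' ')"
        echo "$(basename "$host_dir"): $count file(s)"
    done
}
```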
5) Known failure categories
- Artifact/runtime mismatch
  - Symptoms: GLIBCXX_* not found in apt/oscap subprocesses
  - Action: verify build includes environment sanitization (get_clean_env() usage).
- Box provisioning issues
  - Symptoms: interrupted/corrupt box, truncated archive
  - Action: remove bad box cache, re-download box, retry single VM first.
- Expected unsupported tool paths
  - Symptoms: USG unavailable on non-Ubuntu, distro-scoped skips
  - Action: confirm logs show intentional skip, not hard failure.
- Server startup failures during VM checks
  - Symptoms: Server failed to start
  - Action: inspect emitted <runtime_home>/logs/web_server_<YYYYmmdd_HHMMSS>.log in VM runner output.
6) Manual per-VM debugging
Linux VM
Windows VM
vagrant up windows
vagrant powershell windows -c "Set-ExecutionPolicy Bypass -Scope Process; & C:\mnt\artifacts\vm_test_runner.ps1"
7) Documentation maintenance policy
When changing behavior, update at least:
- TOOL_MATRIX.md
- VM_TESTING_AND_RELEASE_VALIDATION.md
- TROUBLESHOOTING_AND_POSTMORTEMS.md
- Relevant folder README in api/, core/, scripts/, tests/, frontend/, or templates/
8) Docker build
The project uses a single Dockerfile with a uv-based multi-stage build.
The previous Dockerfile.rocky has been removed, so all container builds go through the single Dockerfile.
9) Incident log expectations
Each significant failure should capture:
- Symptom and exact error signature.
- Reproduction path.
- Root cause.
- Code/script/workflow fix.
- Validation evidence.