Troubleshooting and Postmortems

This document captures recurring failures observed during cross-OS packaging and VM validation, along with the fix implemented for each.

Failure triage decision tree

Diagram source: triage-decision-tree.mmd

1) GLIBCXX_* not found when calling system binaries from packaged app

Symptoms

  • apt-get / oscap fail with errors like:
      libstdc++.so.6: version GLIBCXX_3.4.29 not found

Root cause

The packaged runtime (AppImage/PyInstaller) leaked LD_LIBRARY_PATH into subprocesses, so host binaries loaded the bundled, incompatible libraries (such as libstdc++) instead of the system ones.

Fix

  • Added get_clean_env() in core/utils.py
  • Routed subprocess execution through sanitized env in ToolManager and wrappers
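
The actual get_clean_env() in core/utils.py is not reproduced here; the following is a minimal sketch of the idea, not the project's code. It relies on PyInstaller's documented behavior of saving the pre-launch value in LD_LIBRARY_PATH_ORIG on Linux. The base parameter is an assumption added so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def get_clean_env(base: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of the environment without the bundle's library override.

    Sketch only: `base` is a hypothetical parameter for testing.
    """
    env = dict(os.environ if base is None else base)
    # PyInstaller's Linux bootloader stores the original value here, if any.
    original = env.pop("LD_LIBRARY_PATH_ORIG", None)
    if original:
        # Restore what the user had before the bundle started.
        env["LD_LIBRARY_PATH"] = original
    else:
        # No saved copy: drop the override so host binaries use system libs.
        env.pop("LD_LIBRARY_PATH", None)
    return env
```

A caller would then pass the result to the subprocess, for example subprocess.run(cmd, env=get_clean_env()). Dropping the variable when no saved copy exists may be too aggressive on hosts that depend on their own LD_LIBRARY_PATH, so treat that branch as a judgment call.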

Validation

System package commands and scanner binaries run with host libraries instead of packaged library overrides.


2) Server start failures in packaged builds due to missing frontend assets

Symptoms

  • server start --detach fails in VM runner
  • web routes return frontend missing errors

Root cause

The PyInstaller bundle omitted the frontend assets.

Fix

  • Build workflow includes --add-data "frontend:frontend" (Linux) and equivalent Windows data embedding.
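
Data added with --add-data is unpacked under the bundle root at runtime, which PyInstaller exposes as sys._MEIPASS. The sketch below shows one way server code might locate the frontend directory in both packaged and source runs. The function name and the source_root parameter are hypothetical; the project's real lookup is not shown in this document.

```python
import sys
from pathlib import Path


def resolve_frontend_dir(source_root: Path) -> Path:
    """Locate the frontend assets for packaged and unpackaged runs.

    Hypothetical helper: `source_root` is the repository root used when
    the app is not running from a PyInstaller bundle.
    """
    # sys._MEIPASS only exists inside a PyInstaller bundle.
    bundle_root = getattr(sys, "_MEIPASS", None)
    base = Path(bundle_root) if bundle_root else source_root
    # Matches the destination side of --add-data "frontend:frontend".
    return base / "frontend"
```

Checking resolve_frontend_dir(...).is_dir() at server start would turn a missing-assets bundle into an immediate, explicit error rather than a failing web route.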

3) OpenSCAP package name drift across distributions/versions

Symptoms

  • Install failures on Debian/Ubuntu variants because package names differ.

Root cause

A single static package list is insufficient for mixed distro/version targets.

Fix

  • Added PACKAGE_MAPPINGS with distro+version keys
  • Added fallback by distro and defaults
  • Added unsupported guards where packages are unavailable (Debian 11 OpenSCAP)
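
The lookup order described above (distro+version, then distro, then defaults, with an explicit unsupported marker) can be sketched as follows. The key shape, the resolve_packages name, and the mapping values are assumptions for illustration; the real PACKAGE_MAPPINGS contents are not shown in this document.

```python
from typing import List, Optional

# Illustrative values only. None marks a known-unsupported target.
PACKAGE_MAPPINGS = {
    ("debian", "12"): ["openscap-scanner"],
    ("debian", "11"): None,                  # unsupported guard
    ("debian", None): ["openscap-utils"],    # distro-level fallback
    (None, None): ["openscap-scanner"],      # defaults
}


def resolve_packages(distro: str, version: str) -> Optional[List[str]]:
    """Return the package list for a target, or None if it must be skipped."""
    for key in ((distro, version), (distro, None), (None, None)):
        if key in PACKAGE_MAPPINGS:
            # The most specific match wins, including an explicit None.
            return PACKAGE_MAPPINGS[key]
    return None
```

Because an exact key is checked first, an explicit unsupported entry for one version takes precedence over the distro fallback instead of silently inheriting it.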

4) Debian 11 OpenSCAP unavailability

Findings

  • Debian package search and in-VM apt-cache checks both confirmed that the default Debian 11 repos contain no usable OpenSCAP packages.

External evidence used during triage:

  • Debian package search (bullseye): no openscap packages returned
  • Debian package search (bookworm): openscap-utils / openscap-scanner available
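
The document does not state which apt-cache subcommand was used in the VM. Assuming it was apt-cache policy <package>, whose output includes a "Candidate:" line, a check like the one below could automate the finding. The function name is hypothetical, and this is a sketch rather than the triage procedure actually followed.

```python
def has_install_candidate(policy_output: str) -> bool:
    """Report whether `apt-cache policy` output names an installable version.

    Assumption: an unavailable package shows "Candidate: (none)" or
    produces no "Candidate:" line at all.
    """
    for line in policy_output.splitlines():
        # Split at the first colon so versions with an epoch stay intact.
        key, _, value = line.strip().partition(":")
        if key == "Candidate":
            return value.strip() not in ("", "(none)")
    return False
```

Feeding it the captured stdout of the command for each candidate package name gives a yes/no answer that can be recorded as validation evidence.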

Resolution

  • Removed Debian 11 from active VM matrix
  • Added unsupported-version skip behavior for OpenSCAP on Debian 11

5) OpenSUSE and package manager coverage

Symptoms

  • Tool installs were skipped on SUSE systems until zypper support was added.

Fix

  • Added zypper package manager path in ToolManager
  • Added install_pkgs_zypper in tool config
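
As a rough sketch of how a zypper path can sit alongside other package managers: detect the available manager, then build the install command from the matching config key. Only install_pkgs_zypper comes from this document; the other config keys, the function names, and the detection order are assumptions, and ToolManager's real logic may differ.

```python
import shutil
from typing import Callable, List, Optional

# Base install commands per package manager.
INSTALL_COMMANDS = {
    "apt-get": ["apt-get", "install", "-y"],
    "dnf": ["dnf", "install", "-y"],
    "zypper": ["zypper", "--non-interactive", "install"],
}

# Only install_pkgs_zypper is named in the document; the rest are hypothetical.
CONFIG_KEYS = {
    "apt-get": "install_pkgs_apt",
    "dnf": "install_pkgs_dnf",
    "zypper": "install_pkgs_zypper",
}


def detect_package_manager(which: Callable = shutil.which) -> Optional[str]:
    """Return the first known package manager found on PATH."""
    for name in INSTALL_COMMANDS:
        if which(name):
            return name
    return None


def build_install_command(tool_config: dict, manager: str) -> Optional[List[str]]:
    """Build the install command, or None when the tool lists no packages."""
    packages = tool_config.get(CONFIG_KEYS[manager], [])
    if not packages:
        return None
    return INSTALL_COMMANDS[manager] + list(packages)
```

Returning None for an empty package list lets the caller report a clean skip instead of running an install command with no arguments.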

6) AlmaLinux installation instability under low memory

Symptoms

  • Package operations were terminated or behaved inconsistently under constrained memory.

Fix

  • Increased AlmaLinux VM memory in Vagrantfile to 2048 MB.

7) Windows CIS-CAT not running due to Java prerequisite

Symptoms

  • The CIS-CAT install/scan path did not complete on Windows.

Root cause

A Java runtime is not always available on Windows hosts by default.

Fix

  • Shifted Java prerequisite handling into ToolManager._install_windows() for ciscat.
  • Added runtime detection and installer fallbacks (winget/choco).
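
The detect-then-fall-back order can be sketched as below. This is not the body of ToolManager._install_windows(). The function name is hypothetical, and the package identifiers are left as parameters because the document does not say which Java distribution is installed.

```python
import shutil
from typing import Callable, List, Optional


def plan_java_install(
    winget_id: str,
    choco_package: str,
    which: Callable = shutil.which,
) -> Optional[List[str]]:
    """Return the command to install Java, or None if java is already on PATH.

    Hypothetical helper: `winget_id` and `choco_package` are supplied by
    the caller; no specific package is assumed here.
    """
    if which("java"):
        return None  # Runtime detected, nothing to install.
    if which("winget"):
        return [
            "winget", "install", "--id", winget_id, "--exact", "--silent",
            "--accept-package-agreements", "--accept-source-agreements",
        ]
    if which("choco"):
        return ["choco", "install", choco_package, "-y"]
    raise RuntimeError("Java is missing and neither winget nor choco is available")
```

Separating the planning step from execution keeps the fallback order easy to test on non-Windows machines by passing a fake which function.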

8) USG false-failure noise on non-Ubuntu systems

Symptoms

  • VM tests reported expected non-support as failures.

Fix

  • Added distro support constraints (supported_distros) for usg
  • Skip unsupported distro installs cleanly
  • Keep explicit note that Ubuntu Pro entitlement may still be required
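
A minimal sketch of the constraint check, assuming supported_distros is a list of distro IDs in the tool's config entry. The config shape, the function name, and the returned labels are hypothetical.

```python
# Hypothetical config shape; only the supported_distros key and the
# usg tool name come from the document.
TOOLS_CONFIG = {
    "usg": {"supported_distros": ["ubuntu"]},
    "openscap": {},  # no constraint: attempted everywhere
}


def install_decision(tool: str, distro: str) -> str:
    """Return "install" or "skipped" for a tool on a given distro."""
    supported = TOOLS_CONFIG.get(tool, {}).get("supported_distros")
    if supported is not None and distro not in supported:
        # Reported as a skip so the VM suite does not count it as a failure.
        return "skipped"
    return "install"
```

A decision of "install" on Ubuntu does not guarantee success, since the Ubuntu Pro entitlement noted above may still be required.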

9) VM artifact folder accidentally tracked in git

Symptoms

  • Large generated VM artifacts polluted git history.

Fix

  • Added vms/ to .gitignore
  • Removed tracked artifacts and rewrote affected commit/tag state

10) Practical triage playbook

  1. Check per-VM log tails first.
  2. Separate environment failures (box download, OOM, ports) from code failures.
  3. Reproduce in single VM with manual install command.
  4. Patch installer config (tools_config) and installer logic (tool_manager) together.
  5. Re-run targeted VM tests before full concurrent suite.
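
Steps 1 and 2 lend themselves to a small helper: read the tail of each VM log, then look for signs of an environment problem before suspecting the code. The marker strings below are assumptions about what such failures look like in the logs, not patterns taken from this project, and the absence of a marker only suggests a code failure.

```python
from collections import deque
from pathlib import Path
from typing import Iterable, List

# Assumed markers for environment failures (OOM, port clash, network).
ENVIRONMENT_MARKERS = (
    "Out of memory",
    "Address already in use",
    "Could not resolve host",
)


def tail(path: str, lines: int = 40) -> List[str]:
    """Return the last `lines` lines of a log file."""
    with Path(path).open(errors="replace") as handle:
        # A bounded deque keeps only the final lines while streaming.
        return list(deque(handle, maxlen=lines))


def classify(log_lines: Iterable[str]) -> str:
    """Label a log tail as "environment" or "code"."""
    text = "".join(log_lines)
    if any(marker in text for marker in ENVIRONMENT_MARKERS):
        return "environment"
    return "code"
```

Running classify(tail(log_path)) across all per-VM logs gives a quick first split before reproducing anything in a single VM.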

11) File path changes to be aware of during troubleshooting

When investigating issues, note these structural changes:

  • CLI: cli.py has been replaced by the cli/ package (8 modules). Stack traces will reference cli/<module>.py instead of cli.py.
  • Reporter: report_gen.py has been removed and its remaining logic merged into reporter.py, which is now a slim orchestrator. The parsing logic lives in core/parsers/ (a package with per-format parser modules).
  • CIS-CAT Linux: CIS-CAT execution logic was extracted from linux.py to wrappers/ciscat_linux.py.
  • New core modules: core/deps.py (dependency checks), core/constants.py (shared constants), core/os_detect.py (OS detection), core/ws_patch.py (WebSocket patching).
  • Database: database.py now uses lazy initialization. The scans table includes a completed_at column. New helpers: delete_scan, pagination support.
  • Logging: scan_logger.py now writes per-tool log files. logger.py supports LOG_FORMAT=json.
  • Docker: Single Dockerfile (uv multi-stage); Dockerfile.rocky removed.
  • setup.py: Deleted. Use pyproject.toml only.
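
When reading logs during troubleshooting, it helps to know what LOG_FORMAT=json might produce. The sketch below shows one common way to switch formatters on that variable. The field names, class name, and plain-text format are assumptions; logger.py's actual output is not described in this document.

```python
import json
import logging
import os
from typing import Mapping


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object (assumed fields)."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_formatter(environ: Mapping[str, str] = os.environ) -> logging.Formatter:
    """Pick the formatter based on the LOG_FORMAT environment variable."""
    if environ.get("LOG_FORMAT", "").lower() == "json":
        return JsonFormatter()
    return logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
```

One JSON object per line makes the per-tool log files easier to filter with standard tooling when comparing runs across VMs.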

Postmortem policy going forward

For each new issue, capture:

  1. Signature (exact error snippets)
  2. Scope (which OS/version/artifact)
  3. Root cause class (packaging/runtime/config/repo/network)
  4. Patch summary (files + behavior)
  5. Validation proof (command + log evidence)
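
One way to keep entries consistent is to give the five items a fixed shape and flag anything left blank. The class and helper below are a hypothetical sketch, filled in with issue 1 from this document; the validation proof is left empty because no command or log excerpt is recorded for it here.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Postmortem:
    """One postmortem entry; field names mirror the five policy items."""
    signature: str
    scope: str
    root_cause_class: str  # packaging/runtime/config/repo/network
    patch_summary: str
    validation_proof: str


def missing_fields(entry: Postmortem) -> List[str]:
    """List the fields that are still blank."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name).strip()]


entry = Postmortem(
    signature="libstdc++.so.6: version GLIBCXX_3.4.29 not found",
    scope="AppImage/PyInstaller builds calling apt-get / oscap",
    root_cause_class="packaging",
    patch_summary="core/utils.py: get_clean_env(); ToolManager and wrappers use it",
    validation_proof="",  # not recorded in this document
)
print(missing_fields(entry))
```

An entry would count as complete only when missing_fields returns an empty list.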