The vq calculation queue¶

vq is vibe-qc’s calculation queue, a small SSH-backed job-submission tool that lets you run vibe-qc (and CRYSTAL / ORCA) calculations on a remote compute box without writing shell glue. Configure it once, then run vq submit my_calc.py from your laptop; the job is queued, dispatched, resource-capped, and watched on the remote host. Outputs come back the same way.

vq is co-shipped with vibe-qc in the vibe-queue/ subpackage but is independently versioned (at the time of writing it’s vq, version 0.11.0). It’s engine-agnostic: vibe-qc is the primary workload, but anything you can call from a shell (CRYSTAL14, ORCA 6.1, PySCF scripts) submits the same way through contrib/ wrappers.

When to use vq¶

Laptop runs out of cores or RAM. Your MacBook has 16 GB and 10 cores; the remote has 128 GB and 32 cores. Queue the big runs; keep the laptop for development.
You want a record of what you ran. Every submission is a JobSpec stored on the daemon, with a unique short-hash id, full command, environment, resource caps, terminal state, and outputs.
You’re running many jobs. vq dispatches one at a time (by default; see § Concurrency below) and records every one, so you don’t lose track when a sweep takes hours.
You want resource enforcement. cgroup-v2 caps mean a runaway job doesn’t bring down the box.

When NOT to use vq¶

Tiny molecules on the laptop. .venv/bin/python h2o.py runs in 3 s; the queue + ssh round-trip adds latency for zero gain.
Truly interactive sessions. vq is batch-shaped; use ssh + a remote venv directly, or set up the Jupyter Lab integration for notebooks.
HPC cluster job arrays. vq targets a single-node host. SLURM is the right tool for cluster scheduling; a SLURM backend for vq is on the v1.0 roadmap but doesn’t ship yet.

Architecture¶

┌────────────────────┐         SSH        ┌──────────────────────────┐
│  Your laptop       │ ─────────────────→ │  Remote compute host     │
│                    │                    │                          │
│  vq CLI            │                    │  vq-daemon.service       │
│  ~/.config/vq/     │  vq submit         │  (systemd --user)        │
│   config.toml      │                    │                          │
│                    │ ←───── stdout ──── │  Queue (SQLite-backed)   │
│  ssh-key auth      │                    │   ↓                      │
│                    │                    │  systemd-run scope       │
│                    │                    │   (cgroup-v2 caps)       │
│                    │                    │   ↓                      │
│                    │                    │  your Python / ORCA /    │
│                    │                    │  CRYSTAL14 process       │
│                    │                    │                          │
│                    │  vq-web.service    │  Web UI (FastAPI+htmx)   │
│  browser ──────────┼───── port 8765 ───→│  :8765/queue,            │
│                    │  bearer token      │  /jobs/<id>              │
└────────────────────┘                    └──────────────────────────┘

Pieces that need to be running:

vq-daemon.service on the remote: accepts submissions (over SSH), maintains the queue, dispatches jobs into cgroup scopes, and survives reboots via loginctl enable-linger.
vq-web.service on the remote: REST + HTML UI with read views plus write actions, port 8765 by default, bearer-token-protected.
vq CLI on the laptop: wraps ssh remote vq … so the laptop never deals with the queue state directly.
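
The "CLI wraps ssh" arrangement can be sketched in a few lines. This is a minimal illustration, assuming the CLI simply forwards its subcommand to the remote vq binary; build_remote_argv is a hypothetical helper, not vq's actual code.

```python
import shlex


def build_remote_argv(ssh_target: str, remote_vq: str, args: list[str]) -> list[str]:
    """Build the local argv for running `vq <args>` on the remote over SSH.

    ssh hands its trailing arguments to the remote shell as one string,
    so the remote command is quoted here and passed as a single argument.
    """
    remote_command = shlex.join([remote_vq, *args])
    return ["ssh", ssh_target, remote_command]


# Hypothetical example: forward `vq status <jobid> -n 200` to the host "compute".
argv = build_remote_argv(
    "compute",
    "/home/USER/vibeqc-queue/vibe-queue/.venv/bin/vq",
    ["status", "c0ff50a06462", "-n", "200"],
)
print(shlex.join(argv))
```

Quoting the remote command with shlex.join keeps arguments that contain spaces or shell metacharacters intact on the far side.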

Installation¶

Two sides, local (laptop) and remote (compute box). Both run the same pip install.

Local (laptop)¶

# Inside your vibe-qc checkout
cd vibe-queue
python3 -m venv .venv
.venv/bin/pip install -e .

# Put vq on PATH:
ln -s ~/path/to/vibe-queue/.venv/bin/vq ~/.local/bin/vq
# or in zshrc:
#   alias vq="$HOME/path/to/vibe-queue/.venv/bin/vq"

The local install needs only the CLI dependencies (no FastAPI / systemd). Test:

vq --version            # vq, version 0.11.0

Remote (compute box)¶

# 1. Install vq from a vibeqc-queue clone:
git clone https://gitlab.peintinger.com/mpei/vibeqc.git ~/vibeqc-queue
cd ~/vibeqc-queue/vibe-queue
python3 -m venv .venv
.venv/bin/pip install -e '.[web]'      # [web] pulls FastAPI + uvicorn

# 2. Install the systemd-user units:
mkdir -p ~/.config/systemd/user
cp contrib/vq-daemon.service ~/.config/systemd/user/
cp contrib/vq-web.service    ~/.config/systemd/user/

# Edit ExecStart in both unit files to use the venv's
# absolute vq path (e.g. /home/user/vibeqc-queue/vibe-queue/.venv/bin/vq).

# 3. Enable the daemon to start at boot (linger keeps the
# user instance alive without an active session):
sudo loginctl enable-linger $USER

systemctl --user daemon-reload
systemctl --user enable --now vq-daemon.service
systemctl --user enable --now vq-web.service

# 4. Verify:
systemctl --user status vq-daemon vq-web
journalctl --user -u vq-daemon -f       # live log tail

The bearer token for the web UI is generated on the first daemon start and written to ~/.config/vq/web-token (mode 0600) on the remote. Print it once and store it locally; you’ll need it to access the web UI from a browser. To regenerate it, delete the file and restart the daemon.

Configuration¶

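vq reads ~/.config/vq/config.toml on the laptop. The remote daemon doesn’t need a config file to run jobs (the remote side only gets a config.toml if you use the [programs.X] registry described under Daemon admin). Copy the template from the repository and edit: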

mkdir -p ~/.config/vq      # the config directory may not exist yet
cp vibe-queue/docs/config.toml.example ~/.config/vq/config.toml

A working minimal config:

# ~/.config/vq/config.toml on your laptop

# Default host when you omit it from `vq <subcommand> ...`.
# Match a [hosts.<name>] block below.
default_host = "compute"

[hosts.compute]
ssh = "compute"
# 'compute' must be an SSH alias defined in ~/.ssh/config,
# or a literal user@host.example.com. Test with:
#   ssh compute hostname

# Absolute path to vq on the remote. The remote shell's
# default PATH usually doesn't include the venv vq lives in.
remote_vq = "/home/USER/vibeqc-queue/vibe-queue/.venv/bin/vq"

# Default Python interpreter for single-file submits. Point
# at a venv where vibe-qc is installed.
remote_python = "/home/USER/vibeqc-dev/.venv/bin/python"

# Optional: multi-venv routing for --branch (v0.5.6+).
# Lets `vq submit foo.py --branch release` pick the right
# vibe-qc clone without hard-coding the path.
[hosts.compute.branches]
main    = "/home/USER/vibeqc-dev/.venv/bin/python"
release = "/home/USER/vibeqc-release/.venv/bin/python"

[hosts.compute.branch_aliases]
dev         = "main"
development = "main"
latest      = "release"

The full annotated example is at vibe-queue/docs/config.toml.example.

Multi-host¶

Add another [hosts.<name>] block:

[hosts.compute2]
ssh = "compute2"
remote_vq = "/home/USER/vibeqc-queue/vibe-queue/.venv/bin/vq"
remote_python = "/home/USER/vibeqc-dev/.venv/bin/python"

Then vq submit foo.py --host compute2 routes to that machine. Omit --host to use default_host.

Your first job¶

# A trivial vibe-qc water RHF script.
cat > water.py <<'EOF'
from vibeqc import Atom, Molecule, run_job
mol = Molecule([
    Atom(8, [0.0,  0.00,  0.00]),
    Atom(1, [0.0,  1.43, -0.98]),
    Atom(1, [0.0, -1.43, -0.98]),
])
run_job(mol, basis="sto-3g", method="rhf", output="water")
EOF

# Submit it.
vq submit water.py
# → printed to stdout: jobid (e.g. "c0ff50a06462") + a watch hint.

# Poll the queue:
vq list

# Wait for it (Ctrl-C exits the watcher; the job keeps running):
vq watch c0ff50a06462

# Once it finishes, pull the outputs back:
vq fetch c0ff50a06462 ./outputs/
# → ./outputs/water.out / .molden / .traj / stdout.log / stderr.log

That’s the entire core workflow.

Submission forms¶

vq accepts three submission shapes:

Single file (most common)¶

vq submit my_script.py
# Equivalent to:
#   ssh <host> 'cd <remote-workspace> && <remote_python> my_script.py'

The laptop copies my_script.py into a fresh per-job workspace on the remote, runs it with the configured remote_python (or the --branch-resolved one), captures stdout / stderr, and tracks the result.

Directory submit (sweeps + multi-file inputs)¶

vq submit -d ./my_sweep_dir -- python run.py --basis def2-svp
# -d <path>                = the directory to copy across to the workspace
# --                       = end of vq flags
# python run.py …          = the literal command to run inside the workspace

Use this when:

Your script imports local modules (from helpers import ...).
You need multiple input files in the workspace (run.py reads geometry.xyz, basis_def.g94, etc.).
You want to encode the interpreter / engine in the command (e.g. running ORCA: -- orca input.inp).

Pre-packed tarball¶

vq submit -t my_inputs.tar.gz -- bash run.sh
# vq unpacks the tarball into the workspace before dispatching.

The tarball, the command, and the JobSpec together form a complete, reproducible run unit.

Resource caps¶

Every job dispatched after v0.4.0 runs inside its own systemd-run user scope so cgroup-v2 memory + CPU caps apply. Wall-time enforcement is Python-watchdog-based (vq.watchdog); cgroup RuntimeMaxSec was tried in v0.4 → v0.5.7 and dropped in v0.5.8 as not runtime-mutable via systemctl --user set-property (see vibe-queue/docs/wall_time_design.md for the postmortem). The watchdog subtracts paused_seconds_total from elapsed, so wall-time is naturally pause-aware.

vq submit my_calc.py \
    --cpus 8 \
    --mem-mb 16000 \
    --wall-time-seconds 7200    # 2-hour cap (watchdog-enforced)
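
The pause-aware accounting described above comes down to one subtraction. The following is a minimal sketch of that arithmetic, not the code in vq.watchdog; the function name and signature are assumptions, and only paused_seconds_total is a name taken from the text.

```python
def wall_time_exceeded(started_at: float, now: float,
                       paused_seconds_total: float, cap_seconds: float) -> bool:
    """True once the job has *run* (not merely existed) longer than the cap."""
    effective_elapsed = (now - started_at) - paused_seconds_total
    return effective_elapsed > cap_seconds


# A job with a 7200 s cap, started at t=0 and paused for 30 minutes in total:
assert not wall_time_exceeded(0.0, 8000.0, 1800.0, 7200.0)  # 6200 s of run time
assert wall_time_exceeded(0.0, 9100.0, 1800.0, 7200.0)      # 7300 s of run time
```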

If the job exceeds any cap, the cgroup or the watchdog kills it cleanly and the queue records a labelled terminal state:

Terminal state	Trigger	Owner	Recovery
`COMPLETED`	exit code 0	-	nothing, outputs ready to fetch
`FAILED`	non-zero exit code	-	inspect stderr.log; re-submit
`OOM_KILLED`	exceeded `--mem-mb` cap	cgroup	bump `--mem-mb` or split the job
`TIME_EXCEEDED`	exceeded `--wall-time-seconds` cap	watchdog	bump the cap, or checkpoint if vibe-qc supports it for the workload
`STARVED`	CPU-underutilisation watchdog (5 min < 10% CPU summed over the pgid descendants)	watchdog	check stderr.log, typically a hanging worker; v0.5.12+ samples the whole pgroup so the bash wrapper no longer false-positives
`ABORTED_BY_QUEUE`	terminated by `vq kill`, queue-wide pause→kill, or daemon restart that lost the exit code	daemon	intentional; resubmit if needed

Always pass --wall-time-seconds N for non-trivial jobs; it is the only guard against a wedged SCF eating cores indefinitely.

Orphan exit-code recovery (v0.5.9+)¶

The dispatched bash wrapper writes the inner process’s exit code to <workspace>/_vq/exit-code on graceful exit. If the daemon restarts mid-job (deliberately via systemctl --user restart vq-daemon or via Restart=on-failure), the new daemon reads that marker when it reconciles orphans and classifies the job as COMPLETED (rc=0) or FAILED (rc≠0). Pre-v0.5.9 behaviour was to mark every restart-orphan as ABORTED_BY_QUEUE even on clean completion; v0.5.9 fixes that. The fix is also what makes vq admin update (v0.5.20+) safe to use, because that verb deliberately pause-restart-resumes the daemon.
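
The reconcile step can be sketched as below. This is a hedged illustration, not the daemon's code: reconcile_orphan is a hypothetical name, the marker is assumed to hold a plain decimal integer, and the sketch assumes the orphaned process has already exited by the time it runs.

```python
import tempfile
from pathlib import Path


def reconcile_orphan(workspace: Path) -> str:
    """Classify a finished restart-orphan from its _vq/exit-code marker."""
    marker = workspace / "_vq" / "exit-code"
    try:
        rc = int(marker.read_text().strip())
    except (FileNotFoundError, ValueError):
        # No usable marker: the exit code was lost, which the terminal-state
        # table maps to ABORTED_BY_QUEUE.
        return "ABORTED_BY_QUEUE"
    return "COMPLETED" if rc == 0 else "FAILED"


with tempfile.TemporaryDirectory() as tmp:
    ws = Path(tmp)
    (ws / "_vq").mkdir()
    (ws / "_vq" / "exit-code").write_text("0\n")
    print(reconcile_orphan(ws))  # COMPLETED
```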

Multi-venv `--branch` routing (v0.5.6+)¶

The remote may host multiple vibe-qc clones, typically vibeqc-dev (tracking main) and vibeqc-release (tracking the latest tag). Pick one per submit:

vq submit my_calc.py                        # default_host's default
vq submit my_calc.py --branch main          # dev venv
vq submit my_calc.py --branch release       # release venv
vq submit my_calc.py --branch latest        # = release (alias)

--branch is mutually exclusive with --python, and only applies to single-file submits. For -d / -t submits, encode the interpreter in the explicit command.

The mapping is per-host config, [hosts.<name>.branches] + [hosts.<name>.branch_aliases]. Add new entries by editing ~/.config/vq/config.toml on the laptop; no remote restart needed.

External-program workflows (CRYSTAL / ORCA / PySCF)¶

vibe-qc treats other QC programs as external; see CLAUDE.md § 10 for the policy. vq dispatches them through contrib/ wrappers that handle each program’s I/O conventions:

CRYSTAL14 (Pcrystal + PROPERTIES14)¶

# Parallel CRYSTAL14 (default --np 14):
vq submit -d ./calc --cpus 14 \
    -- bash /home/USER/vibeqc-queue/vibe-queue/contrib/run-crystal.sh INPUT.d12

# Serial:
vq submit -d ./calc --cpus 1 \
    -- bash /home/USER/vibeqc-queue/vibe-queue/contrib/run-crystal.sh --serial INPUT.d12

# Custom MPI rank count:
vq submit -d ./calc --cpus 8 \
    -- bash /home/USER/vibeqc-queue/vibe-queue/contrib/run-crystal.sh --np 8 INPUT.d12

# PROPERTIES14 (parallel):
vq submit -d ./prop --cpus 14 \
    -- bash /home/USER/vibeqc-queue/vibe-queue/contrib/run-crystal.sh --properties prop.d3

The wrapper stages the input file as ./INPUT, runs mpirun -np N Pcrystal > out.out, and restores any pre-existing INPUT on exit.

ORCA 6.1¶

ORCA spawns its own MPI internally, so don’t wrap it with mpirun:

vq submit -d ./orca_run --cpus 8 -- orca input.inp

ORCA takes its process count from the input file (the ! PAL N keyword, e.g. ! PAL8, or a %pal nprocs N end block), not from vq; pass a matching --cpus N so that cgroup accounting agrees with what ORCA actually spawns.
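
Since a mismatch between the input file and --cpus is easy to miss, a pre-submit check can help. The following is a rough sketch under stated assumptions: orca_nprocs is a hypothetical helper, it recognises only the PALn keyword and the %pal nprocs block, and it does not strip ORCA comments.

```python
import re


def orca_nprocs(input_text: str) -> int:
    """Return the process count an ORCA input requests (1 if none is given)."""
    # Block form: %pal nprocs 8 end (possibly spread over several lines).
    block = re.search(r"%pal\b.*?\bnprocs\s+(\d+)", input_text, re.I | re.S)
    if block:
        return int(block.group(1))
    # Keyword form on a simple-input line: ! B3LYP def2-SVP PAL8
    keyword = re.search(r"^\s*!.*?\bPAL(\d+)\b", input_text, re.I | re.M)
    return int(keyword.group(1)) if keyword else 1


EXAMPLE_INPUT = """! B3LYP def2-SVP PAL8
* xyz 0 1
H 0 0 0
H 0 0 0.74
*
"""
assert orca_nprocs(EXAMPLE_INPUT) == 8
```

Compare the result with the value you pass to --cpus before calling vq submit.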

PySCF (as a comparison / parity reference)¶

vq submit my_pyscf_script.py                # PySCF is in both vibe-qc venvs

Both the dev and release vibe-qc venvs have PySCF installed (it’s in [test]), so PySCF scripts submit the same way as vibe-qc scripts.

Monitoring + management¶

# Snapshot the queue:
vq queue                                  # all states
vq queue --active                         # running + pending + suspended
vq queue -s running                       # only running (v0.5.27)
vq queue -s running -s pending            # explicit two-state filter
vq queue -s failed -s killed              # terminal-failure forensics

# Per-job snapshot (metadata + tail of stdout/stderr):
vq status <jobid>                         # last 50 lines
vq status <jobid> -n 200                  # last 200 lines
vq status <jobid> -n 0                    # full output

# Live tail of a workspace file (v0.5.26):
vq tail <jobid>                           # print the tail of stdout.log
vq tail <jobid> -f                        # live-stream (Ctrl-C to stop)
vq tail <jobid> --name vibeqc.log -f      # custom logger file
vq tail <jobid> --name mgo.out -f         # CRYSTAL output
vq tail <jobid> --name h2.out -f          # ORCA / Psi4 output

# Fetch outputs back to the laptop (live job: workspace dir;
# completed: workspace dir; archived: un-tars from the .tar.bz2):
vq fetch <jobid> -o ./results

# Cancel:
vq kill <jobid>                           # SIGTERM the process group,
                                          # then SIGKILL after grace
                                          # → terminal state KILLED

# Pause / resume (v0.5.1+):
vq pause <jobid>                          # SIGSTOP the job
vq resume <jobid>                         # SIGCONT
vq pause --all                            # pause every running job
vq resume --all                           # resume every paused job

vq tail is the canonical “watch the SCF converge live” verb: it execs tail -f directly (locally) or via ssh (remotely), so SIGINT goes straight through and there’s no Python buffering layer between the job’s logger and your terminal. Use --name to target whatever file vibe-qc’s logger is writing to (e.g. logging.basicConfig(filename='vibeqc.log') → vq tail JOBID --name vibeqc.log -f).

The pause / resume flow is the right tool when you need to free the box temporarily (kids gaming, an interactive workload) without losing in-flight jobs. For automated venv refresh use vq admin update instead (it pauses-pulls-builds-resumes in one verb; see Refreshing the remote vibe-qc venv below).

Web dashboard¶

If vq-web.service is running, open http://<remote>:8765/queue in a browser. First-time access prompts for the bearer token (stored at ~/.config/vq/web-token on the remote).

Endpoints:

Endpoint	Purpose
`GET /queue`	Live queue table (htmx auto-refresh)
`GET /jobs/<id>`	Per-job detail: spec, resource history, log tail, exit status
`GET /health/{live,ready}`	Kubernetes-style probes for external monitoring
`POST /api/v1/jobs/<id>/{kill,pause,resume}`	Per-job write actions (v0.5.1+)
`POST /api/v1/queue/{pause,resume}`	Queue-wide actions (v0.5.2+)

All write endpoints require the bearer token in an Authorization: Bearer <token> header. For browser use, htmx + a small form prompts once and stores it in sessionStorage.
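
For scripted use, a write request with the bearer header can be prepared with the standard library. This is a sketch only: build_action_request is a hypothetical helper, the host name and token are placeholders, and the request body (if the API expects one) is not covered here.

```python
import urllib.request


def build_action_request(base_url: str, job_id: str, action: str,
                         token: str) -> urllib.request.Request:
    """Prepare (but do not send) a per-job write request."""
    if action not in {"kill", "pause", "resume"}:
        raise ValueError(f"unsupported action: {action!r}")
    return urllib.request.Request(
        f"{base_url}/api/v1/jobs/{job_id}/{action}",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_action_request("http://compute:8765", "c0ff50a06462", "pause", "TOKEN")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```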

Architecture detail (auth, request shapes, error handling) is in vibe-queue/docs/web.md.

Fetching outputs¶

When a job completes, the workspace on the remote contains the outputs your script wrote (water.out, water.molden, …) plus the queue-side capture files (stdout.log, stderr.log, _vq/events.jsonl, _vq/exit-code).

vq fetch <jobid> ./local_outputs/       # rsync the whole workspace
vq fetch <jobid> ./outputs/ --files stdout.log water.out
                                         # specific files only

vq fetch is archive-aware (v0.5.11+): if the workspace was archived via vq cleanup --archive (see next section), fetch streams the .tar.bz2 over SSH and reconstructs the original directory layout on the laptop. No special flag needed; the same vq fetch <jobid> <local-dir> command works for both live and archived workspaces.

Operator controls (pause / resume / throttle / drain)¶

When the box gets busy for non-queue reasons (kids gaming, an interactive session, an urgent job from another chat), three knobs let vq step aside without losing in-flight work:

# Hard freeze (SIGSTOP); RAM stays allocated, no CPU used.
vq pause <jobid>             # one job
vq pause --all               # every running job
vq resume <jobid>             # SIGCONT
vq resume --all

# Soft throttle (cgroup CPUWeight, renice fallback v0.5.21+).
# weight=100 is default; weight=20 = "step aside" under contention.
vq throttle <jobid> --weight 20
vq throttle --all --weight 20 --persist     # persist across new dispatches
vq throttle --all --weight 20 --persist --duration 2h   # auto-release after 2h
vq throttle --release-persist                # clear persistent state
vq throttle --status                         # what's the current state?

# Drain (don't dispatch NEW jobs; running ones continue).
vq drain                     # full drain (no new dispatches)
vq drain --max-jobs 0        # explicit full drain
vq drain --max-jobs 2        # partial drain (cap at 2 concurrent)
vq drain --release           # back to daemon's configured max
vq drain --duration 1h       # auto-release after 1h
vq drain --status

Composable: vq drain + vq pause --all + vq throttle --all cover the operator-control story. All four pieces of persistent state (drain.json, throttle.json, auto-cleanup.json, plus the per-job suspended-state on the spec) live under <state_root> and survive daemon restarts.

Workspace cleanup (v0.5.10+)¶

Long-running queues accumulate workspaces. vq cleanup is the manual housekeeping verb; it operates only on terminal-state jobs (active / pending / suspended jobs are never touched).

# List terminal-state jobs and their workspace ages:
vq cleanup
# → table: jobid, terminal_state, finished_at, workspace_size_mb

# Dry-run preview — show what would be archived:
vq cleanup --archive --older-than 30d

# Actually archive (add -x to "execute"):
vq cleanup --archive --older-than 30d -x
# → workspaces become tar.bz2 files under <state_root>/archive/

# Hard delete archived workspaces older than 90 days:
vq cleanup --delete --older-than 90d -x

# Restore an archived workspace (un-tar in place):
vq cleanup --restore <jobid> -x

The archive→restore round-trip is lossless: the directory tree after --restore is byte-identical to what was archived.

Auto-policy (v0.5.17+): instead of running the verb manually, register a daemon-side policy:

# Daemon runs the sweep once per --interval (default 24h):
vq cleanup --auto-enable --archive-after 30d --delete-after 90d

# Per-state retention (v0.5.23+) — keep failed-job forensics longer:
vq cleanup --auto-enable --archive-after 30d \
           --archive-after-state failed:90d --delete-after 180d

# Read-only status:
vq cleanup --auto-status

# Disable:
vq cleanup --auto-disable
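
The per-state retention rule can be read as "use the state's own threshold if one is set, otherwise the global one". A minimal sketch of that rule follows; parse_duration and archive_due are hypothetical helpers, and the s and m units are assumptions (only h and d appear on this page).

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(text: str) -> int:
    """Convert strings such as '30d' or '2h' to seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"bad duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def archive_due(state: str, age_seconds: int, archive_after: str,
                per_state: dict[str, str]) -> bool:
    """True if a terminal job of this state and age should be archived."""
    threshold = parse_duration(per_state.get(state, archive_after))
    return age_seconds >= threshold


# --archive-after 30d --archive-after-state failed:90d
overrides = {"failed": "90d"}
forty_days = 40 * 86400
print(archive_due("completed", forty_days, "30d", overrides))  # True
print(archive_due("failed", forty_days, "30d", overrides))     # False
```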

Configurable archive location (v0.5.22+): default <state_root>/archive/ may live on a small partition. Override via:

$VQ_ARCHIVE_DIR env var on the daemon host (applies to all archive paths globally)
--archive-dir DIR flag on the verb (per-policy with --auto-enable, per-invocation with one-shot --archive)

Why this matters: when the queue gets busy, workspaces add up fast (~10s of MB per typical SCF, ~hundreds of MB for big periodic + Molden + cube + .traj). Without cleanup, the <state_root> filesystem fills. With cleanup, you get a straightforward archive → delete pipeline that preserves the artefact history (every spec + final outputs) at small storage cost (~5× compression for typical output mixes).

Daemon admin¶

What happens at host reboot¶

With loginctl enable-linger set, the daemon comes back on its own. What happens to running jobs depends on what went down:

Daemon restart only: running jobs become orphans with their pgids preserved, and the new daemon re-attaches at startup. The job completes normally; its exit code is read from the dispatched job’s _vq/exit-code file, so re-attach works even after a restart that wiped the Popen handle. This relies on v0.5.9’s _vq/exit-code marker; pre-v0.5.9 restart-orphans got marked ABORTED_BY_QUEUE even on clean completion.
Full host reboot: the kernel kills everything, and all RUNNING jobs are marked ABORTED_BY_QUEUE on the next daemon start. Resubmit using the JobSpecs in the queue history.

Note

Wall-time enforcement gap when the daemon is down. Because v0.5.8 dropped cgroup RuntimeMaxSec (it wasn’t runtime-mutable on pause), wall-time enforcement is now the Python watchdog only. If the daemon crashes and stays down beyond the watchdog’s poll interval, a job that should have hit its --wall-time-seconds cap during the outage isn’t killed by the kernel; it keeps running until the daemon comes back and the watchdog catches up. In practice, Restart=on-failure on the systemd unit keeps the gap to a few seconds. The trade-off is documented in vibe-queue/docs/wall_time_design.md.

Refreshing the remote vibe-qc venv after a release (v0.5.20+)¶

As of vq v0.5.20, this is one verb:

vq admin update vibeqc-release

This runs pause-all → git -C <git_dir> pull → bash <update_script> → resume-all. The resume step sits in a finally block, so the queue always comes back up, even on Ctrl-C or a failed pull. git_dir, branch, and update_script are read from the host’s [programs.X] registry (see vq programs below).

Verifying a tagged release (v0.5.24+):

git push --tags
vq admin update vibeqc-release --tag v0.8.0
vq submit smoke_test.py --branch release

--tag v0.X.Y runs git describe --exact-match --tags HEAD after the pull and fails the update (skips the build, exits non-zero) if HEAD isn’t exactly at the expected tag. This catches the libint-vanishing class of “pull succeeded but landed on the wrong commit” failures.
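
The tag check can be sketched with subprocess around the same git command. This is an illustration rather than vq's implementation: head_is_at_tag is a hypothetical name, and the run parameter exists only so the function can be exercised without a real repository.

```python
import subprocess


def head_is_at_tag(git_dir: str, expected_tag: str, run=subprocess.run) -> bool:
    """True only if HEAD in git_dir sits exactly on expected_tag."""
    result = run(
        ["git", "-C", git_dir, "describe", "--exact-match", "--tags", "HEAD"],
        capture_output=True,
        text=True,
    )
    # git exits non-zero when no tag points exactly at HEAD.
    return result.returncode == 0 and result.stdout.strip() == expected_tag
```

A caller would skip the build and exit non-zero when this returns False, matching the behaviour described above.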

Checking remote state (v0.5.25+):

vq admin status
# NAME            BRANCH   SHA           DESCRIBE             DIRTY  LAST_UPDATED_AT             LAST OK
# vibeqc-dev      main     abc12345def0  v0.7.3-12-gabc1234   no     2026-05-13T14:30:00+00:00   True
# vibeqc-release  release  fedcba987654  v0.8.0               no     2026-05-13T14:35:12+00:00   True

Compare SHA to your laptop’s git rev-parse --short=12 HEAD to answer “is the remote at the commit I just pushed?” without ssh.

Chat workflow for testing a just-pushed feature:

git push                                    # laptop
vq admin update vibeqc-dev                  # refresh the remote venv
vq submit my_feature_test.py --branch main  # exercise the new code

This is the canonical pattern: always run vq admin update between push and submit if you need the remote at your latest commit.

vq programs (v0.5.18+) lists registered programs:

vq programs              # human-readable table
vq programs --json       # machine-readable; for scripts

The registry lives at ~/.config/vq/config.toml on the remote host under [programs.X]. Three kinds:

binary: CRYSTAL, ORCA, Psi4 (an executable on disk)
venv: vibeqc-dev, vibeqc-release (a Python venv + git checkout that vq admin update knows how to refresh)
import: pyscf (a module that should be importable from a specific Python)

venv records can also declare import_check, import_symbols, and healthcheck_command. vq programs runs the healthcheck from the program’s git_dir and reports the program as missing if it exits non-zero. This is useful for tools whose real readiness is more than an import, such as headless rendering:

[programs.vibeview-dev]
kind = "venv"
python = "/home/USER/vibeqc-dev/.venv-vibeview/bin/python"
git_dir = "/home/USER/vibeqc-dev"
branch = "main"
update_script = "scripts/update_vibeview_capture_env.sh"
import_check = "vibeview"
healthcheck_command = "xvfb-run -a .venv-vibeview/bin/vibe-view capture-selftest"
description = "vibe-view headless capture environment (main branch)"

After adding that program to the host config and updating the queue code, provision or refresh it through the normal managed path:

vq admin update vibeview-dev HOST --show-output
vq programs HOST

Watching the daemon¶

journalctl --user -u vq-daemon -f       # live tail
systemctl --user status vq-daemon       # service health

Concurrency¶

The default daemon configuration is single-job dispatch (--max-jobs 1 in the systemd unit). This is the test-phase default; to dispatch in parallel, change it to --max-jobs N in the unit file’s ExecStart and restart the daemon.

Set --max-jobs honestly against the CPU budget: if jobs declare --cpus 8 and the box has 32 cores, --max-jobs 4 is the safe ceiling. The daemon does not currently enforce this; it accepts whatever you set.
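
The ceiling is plain integer division. A small sketch of the arithmetic (safe_max_jobs is a hypothetical helper, not a vq command):

```python
def safe_max_jobs(total_cores: int, cpus_per_job: int) -> int:
    """Largest --max-jobs that does not oversubscribe the box."""
    if cpus_per_job < 1:
        raise ValueError("cpus_per_job must be at least 1")
    # Never return 0: a job wider than the box can still run alone.
    return max(1, total_cores // cpus_per_job)


print(safe_max_jobs(32, 8))   # 4
print(safe_max_jobs(32, 14))  # 2
```

If jobs declare different --cpus values, size the ceiling against the largest one.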

Troubleshooting¶

Symptom	Likely cause	Fix
`vq: command not found` on laptop	venv not on PATH	symlink to `~/.local/bin/vq` or add the venv bin to PATH
`ssh: command not found` on submit	local SSH client not installed	install OpenSSH client
`Permission denied (publickey)` on submit	SSH key not authorised on remote	add laptop’s `~/.ssh/id_*.pub` to remote’s `~/.ssh/authorized_keys`
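Job hangs in `PENDING` indefinitely	daemon not running, or dispatch capped at `--max-jobs 0` (full drain)	`systemctl --user status vq-daemon`; restart if dead; check `vq drain --status` and run `vq drain --release` if a drain is active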
Job terminates `STARVED` at the 5-min mark	pre-v0.5.12, the watchdog read CPU from the wrapper PID only (bash sleeping in `wait()` shows 0% CPU even when the child process is using 16 cores)	upgrade to vq v0.5.12+; the watchdog now sums CPU across the whole pgid descendant set. As a workaround on older versions: `exec command` inside the wrapper so the worker becomes the dispatched PID
`import vibeqc` fails on remote	wrong `remote_python` (system Python instead of vibe-qc venv)	check `~/.config/vq/config.toml` `[hosts.X].remote_python`
Web UI says “401 unauthorised”	bearer token expired or wrong	re-read `~/.config/vq/web-token` on remote; for browsers, clear sessionStorage

A more comprehensive troubleshooting table is in vibe-queue/docs/handover.md § Troubleshooting.

Version history (recent)¶

vq version	Headline
v0.11.0	`vq submit auto`, per-job `--refresh`, serial-by-default fleet updates, and no-host per-job lookup around a down default host
v0.10.0	`vq host down/up/list`, client-side host availability hints for fan-out and submit routing
v0.9.2	`vq status` shows a PENDING job’s queue position
v0.9.1	`vq top --watch` + `--json`
v0.9.0	`vq top`, live per-job resource view
v0.5.27	`vq queue --state STATE` + `--active` shortcut, filter the listing
v0.5.26	`vq tail [HOST] JOBID --name FILENAME -f`, live-stream workspace files (default stdout.log; `--name` for vibe-qc’s logger output / engine-native files)
v0.5.25	`vq admin status`, live SHA / DESCRIBE / DIRTY + last-update record per venv env
v0.5.24	`vq admin update --tag v0.X.Y`, verify HEAD is at the expected tag post-pull
v0.5.23	per-state retention overrides: `--archive-after-state failed:7d`
v0.5.22	configurable archive_dir (`$VQ_ARCHIVE_DIR` env + `--archive-dir` flag + `AutoCleanupPolicy.archive_dir`)
v0.5.21	`renice` fallback for `vq throttle` on non-cgroup hosts
v0.5.20	`vq admin update <env>` minimal, one verb for pause / git pull / build / resume
v0.5.19	smoke test consumes absolute paths from `vq programs --json`
v0.5.18	`vq programs` verb + `[programs.X]` registry (binary / venv / import kinds)
v0.5.17	auto-cleanup policy (daemon main-loop hook reads `auto-cleanup.json`)
v0.5.13-.16	`vq throttle` / `vq drain` + `--persist` + `--duration` auto-release
v0.5.12	watchdog samples pgid descendants (fixes STARVED false-positive when bash-wrapped jobs sleep in `wait()`)
v0.5.11	archive-aware remote `vq fetch` from the laptop (v0.5.10’s cleanup verb produced archives the SSH-side fetch couldn’t see)
v0.5.10	`vq cleanup` verb (archive / delete / restore terminal-state workspaces)
v0.5.9	orphan exit-code recovery via `_vq/exit-code` marker, restart-orphans that complete normally are now `COMPLETED`/`FAILED`, not `ABORTED_BY_QUEUE`
v0.5.8	drop broken cgroup `RuntimeMaxSec`; Python watchdog is the single owner of wall-time
v0.5.7	`run-crystal.sh` cleans per-rank scratch on success; `--keep-scratch` opt-out
v0.5.6	`--branch` multi-venv routing (vibeqc-dev vs vibeqc-release)
v0.5.0-.5	web dashboard, pause/resume, bearer-token auth, CRYSTAL14 parallel dispatch
v0.4	cgroup-v2 enforcement, pgid recovery, event log
v0.3	resource watchdog (mem cap, wall-time, terminal-state machine)

Full per-version detail at vibe-queue/docs/handover.md § “What’s NEW in …” (the handover is the deeper reference; this page is the user-facing entry).

Roadmap (vq’s own)¶

vq has its own roadmap, independent of vibe-qc; see vibe-queue/docs/roadmap.md. Near-term:

v0.11.x: harden memory-aware auto-placement and refresh-before-run against fleet edge cases.
v0.12+: continue the PBS / HPC-dispatcher work without changing the single-node vq submit shape.
v1.0: SLURM / PBS backend so the same vq submit shape works against HPC clusters.