Chapter 11

Troubleshooting

Diagnosing and resolving common startup, licensing, collection, and UI issues.

11.1 Startup Issues

API refuses to start — "API_KEY not set"

MeshOptixIQ v0.9.0 ships with a community-plan fallback: if neither API_KEY nor a license key is present, the API starts in community mode without requiring an API key. If you see a validation error at startup, ensure you are running the latest container image.

Neo4j connection refused at startup

The API performs a connectivity check in the lifespan hook. If Neo4j is not reachable, the process exits immediately with a log line similar to:

ERROR:     Neo4j connection failed: ServiceUnavailable: ...

Resolution steps:

  1. Confirm Neo4j is running: docker ps | grep neo4j or kubectl get pods -l app=neo4j.
  2. Verify NEO4J_URI, NEO4J_USER, and NEO4J_PASSWORD match the database credentials.
  3. When the API and Neo4j run in separate Docker networks, add the API container to the same network or use the correct service hostname.
  4. Set GRAPH_BACKEND=inmemory to start without any database (data will not persist).
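Steps 1 to 3 can be partly checked with a plain TCP probe from the host or container that runs the API. The following is a minimal sketch and not part of MeshOptixIQ: the helper names parse_bolt_target and can_connect are hypothetical, and it assumes NEO4J_URI is a bolt:// or neo4j:// style URI that uses Neo4j's default Bolt port 7687 when no port is given. A successful connection only shows that the port is reachable; it does not validate NEO4J_USER or NEO4J_PASSWORD.

```python
import os
import socket
from urllib.parse import urlparse

DEFAULT_BOLT_PORT = 7687  # Neo4j's default Bolt port


def parse_bolt_target(uri):
    """Return (host, port) from a URI such as bolt://neo4j:7687."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"no host in URI: {uri!r}")
    return parsed.hostname, parsed.port or DEFAULT_BOLT_PORT


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The bolt://localhost:7687 fallback is an assumption for local testing.
    uri = os.environ.get("NEO4J_URI", "bolt://localhost:7687")
    host, port = parse_bolt_target(uri)
    state = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {state}")
```

Run it inside the API container so that it resolves the same service hostname the API would use.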

Cython licensing module missing

If the compiled .so is absent, the API falls back to the Python source module. A warning is logged:

WARNING:  Licensing .so not found; using Python fallback

This is harmless in development. For production, run:

python3 setup_licensing.py build_ext --inplace

Port 8000 already in use

ERROR:    [Errno 98] Address already in use

Find and stop the conflicting process: lsof -ti :8000 | xargs kill -9, or change the port via uvicorn … --port 8001 and set VITE_API_URL=http://localhost:8001 for the UI.
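Before picking an alternative port, it can help to confirm which ports are actually free. The sketch below is illustrative only; port_in_use and first_free_port are hypothetical helper names, and the probe treats any bind failure as "in use", which also covers permission errors on privileged ports.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding host:port fails, i.e. something already holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return True
    return False


def first_free_port(start=8000, attempts=20, host="127.0.0.1"):
    """Return the first port in [start, start + attempts) that can be bound."""
    for port in range(start, start + attempts):
        if not port_in_use(port, host):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

For example, first_free_port(8000) returns 8000 when nothing is listening there, or the next bindable port otherwise. Pass that value to --port and use the same port in VITE_API_URL.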

11.2 License Issues

Feature gated — "requires pro or higher"

The API returns HTTP 402 when a feature is not available on the current plan. Check your plan:

curl -s http://localhost:8000/health/license | jq .
# → {"plan":"community","expires":null,"days_remaining":null,"demo_mode":false}

Upgrade your license key or contact support.

License expired

When fewer than 30 days remain, the API logs a WARNING on every startup. At expiry the plan reverts to community. The /health/license endpoint returns days_remaining: 0 (or a negative value). Renew your key at the customer portal and update MESHOPTIXIQ_LICENSE_KEY.
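For monitoring, the days_remaining field can be mapped onto the states described above. This is a minimal sketch under stated assumptions: license_state is a hypothetical helper, it takes the already-parsed JSON body of /health/license, and it treats a null days_remaining (as in the community example response) as "no expiry information".

```python
WARN_DAYS = 30  # threshold below which the API logs a startup WARNING


def license_state(payload):
    """Classify a parsed /health/license response body."""
    days = payload.get("days_remaining")
    if days is None:
        # No expiry information, as in the community example response.
        return "no-expiry"
    if days <= 0:
        # Zero or negative: the plan has reverted to community.
        return "expired"
    if days < WARN_DAYS:
        return "expiring"
    return "ok"
```

A scheduled job could alert on "expiring" so the key is renewed before the plan reverts.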

License invalid / 403 from license server

Possible causes:

  • Key does not exist in the registry (typo or key was never issued).
  • Key was revoked (contact support).
  • Clock skew — ensure the host time is accurate (within a few minutes).
  • Firewall blocking outbound HTTPS to api.meshoptixiq.com.

Test connectivity: curl -I https://api.meshoptixiq.com/v1/license/validate
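The curl -I output can also be used to estimate clock skew, because HTTP responses normally carry a Date header. The sketch below assumes the license server sends such a header (not confirmed here); clock_skew_seconds is a hypothetical helper that compares the header value with the local clock.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds.

    `date_header` is the value of an HTTP Date header,
    e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    """
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()
```

A result of more than a few minutes in either direction suggests that the host clock should be synchronized (for example via NTP) before retrying validation.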

Plan does not update after key change

The API caches the resolved plan in memory. Force a reload:

curl -X POST http://localhost:8000/admin/license/reload \
  -H "X-API-Key: $MESHOPTIXIQ_API_KEY"

Or restart the API container.

11.3 Collection Failures

SSH authentication error

ERROR  SSH auth failed for 192.168.1.1: Authentication failed.

Verify credentials in your inventory file. Test manually: ssh -i /path/to/key user@192.168.1.1. Confirm the device allows key-based or password authentication for the configured user.

SSH timeout / no route to host

Ensure the collection host has layer-3 reachability to the device management interface. Check SSH_TIMEOUT (default: 30 s) and increase it for slow WAN links.

Parser returns empty results

The vendor parser matched no data. Common causes:

  • Wrong vendor field in the inventory (e.g., cisco_ios vs cisco_nxos).
  • The collected command output is from a newer OS version whose format differs from the parser patterns.
  • The raw output file is truncated — increase SSH buffer size.

Inspect the raw capture: cat /tmp/meshq_raw_<device>.txt (written when LOG_LEVEL=DEBUG).

Ingest skips devices silently

Set LOG_LEVEL=DEBUG and re-run. Every device parse step is logged at DEBUG level. Watch for SKIP-prefixed log entries, which include the reason.

Distributed worker stuck

If a worker process crashed mid-collection, the device remains in the processing hash past its TTL. Run the recovery endpoint:

curl -X POST http://localhost:8000/collect/recover \
  -H "X-API-Key: $MESHOPTIXIQ_API_KEY"

This re-queues any device whose processing entry is older than 10 minutes.
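The staleness rule can be pictured as a simple age check. The sketch below illustrates the idea only and is not the actual recovery code: stale_devices is a hypothetical function, and the mapping of device ID to claim timestamp is an assumed layout rather than the real storage schema.

```python
import time

STALE_AFTER_SECONDS = 10 * 60  # the 10-minute threshold described above


def stale_devices(processing, now=None, max_age=STALE_AFTER_SECONDS):
    """Return device IDs whose processing entry is older than max_age.

    `processing` maps a device ID to the Unix timestamp at which a worker
    claimed it (a hypothetical layout, not the actual storage schema).
    """
    if now is None:
        now = time.time()
    return sorted(dev for dev, started in processing.items() if now - started > max_age)
```

Entries claimed within the last 10 minutes are left alone, so a device that is still being collected by a healthy worker is not queued twice.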

11.4 API Errors

HTTP 401 — missing or invalid API key

All authenticated endpoints require the X-API-Key header or the ?api_key= query parameter. Confirm the value matches API_KEY in the API environment.

HTTP 429 — rate limit exceeded

Default limits: 60 requests/minute for queries, 10 requests/minute for the what-if endpoint. Wait for the window to reset or reduce request frequency. In clustered mode, limits are shared across all API pods via Redis.
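Scripts that call the API in a loop can avoid 429 responses by pacing themselves on the client side. The following is a minimal sketch, assuming the default limits quoted above; Pacer is a hypothetical class, and spacing calls evenly is a conservative choice that stays under the limit regardless of how the server counts its window.

```python
import time


class Pacer:
    """Space calls evenly so no more than `per_minute` happen per minute."""

    def __init__(self, per_minute):
        if per_minute <= 0:
            raise ValueError("per_minute must be positive")
        self.interval = 60.0 / per_minute
        self._next_allowed = 0.0

    def delay(self, now):
        """Return seconds to wait before the next call, and reserve its slot."""
        wait = max(0.0, self._next_allowed - now)
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return wait

    def wait(self):
        """Sleep until the next slot is free."""
        time.sleep(self.delay(time.monotonic()))
```

Create Pacer(60) for query endpoints or Pacer(10) for the what-if endpoint, and call wait() before each request. Because limits are shared across pods in clustered mode, several clients using the same key should share one budget between them.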

HTTP 422 — unprocessable entity

The request body failed FastAPI/Pydantic validation. The response body contains a detail array listing each field error with its location (loc), message (msg), and error type (type).
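The detail array is easier to read once flattened to one line per error. The sketch below relies on the standard FastAPI validation error shape (loc, msg, type); format_validation_errors is a hypothetical helper.

```python
def format_validation_errors(body):
    """Turn a parsed 422 response body into one readable line per error."""
    detail = body.get("detail", [])
    if not isinstance(detail, list):
        # Some error responses carry a plain string instead of a list.
        return [str(detail)]
    lines = []
    for err in detail:
        location = ".".join(str(part) for part in err.get("loc", []))
        lines.append(f"{location}: {err.get('msg', '')} [{err.get('type', '')}]")
    return lines
```

Each line starts with the path to the offending field, for example body.<field>, which usually points directly at the part of the request payload that needs fixing.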

Query timeout / 504 from ingress

Large graph traversals can exceed the nginx proxy timeout (default: 60 s). Options:

  • Add an index on the relevant Neo4j property: CREATE INDEX FOR (d:Device) ON (d.hostname).
  • Increase nginx proxy_read_timeout; the cluster Helm chart sets it to 3600 s for SSE routes.
  • Narrow the query with additional filter parameters.

SSE stream drops immediately

EventSource connections are dropped by some proxies that buffer responses. Ensure the ingress passes X-Accel-Buffering: no (nginx) or equivalent. The UI will fall back to 30-second polling automatically, but the live indicator in the TopBar will remain grey.

CORS errors in browser

By default the API allows all origins (*). If you have restricted CORS_ORIGINS, add your UI origin. When using ?api_key= in SSE requests, ensure the query param is not stripped by the proxy.

11.5 UI Issues

Topology does not load

Open the browser developer console. A network error on /queries/execute usually means the UI's VITE_API_URL is incorrect or the API is unreachable. Re-check the value in ui/.env.local and ensure CORS allows the browser origin.

Dark / light mode does not persist

The theme preference is stored in localStorage under the key meshq-theme. Private browsing sessions do not persist localStorage; the theme will reset on each tab.

Command palette (Cmd+K) does nothing

The keyboard shortcut requires focus inside the main <body>. If a dialog or an <input> has focus, the shortcut is captured by the element. Click on an empty area of the page first.