Troubleshooting Guide

Common issues and solutions for MeshOptixIQ deployment and operation.

License Issues

Error: "No license key found"

Cause: License key not configured in environment or file system.

Solutions:

  1. Set environment variable (recommended for Docker):
    export MESHOPTIXIQ_LICENSE_KEY="mq-prod-xxxxxxxxxx"
    meshq version
  2. Create license file (recommended for persistent installations):
    mkdir -p ~/.meshoptixiq
    echo "mq-prod-xxxxxxxxxx" > ~/.meshoptixiq/license.key
    chmod 600 ~/.meshoptixiq/license.key
    meshq version
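
To see which of the two locations currently holds a key, a small check like the following can help. It is a sketch only: license_key_source is a hypothetical helper name, not part of the meshq CLI, and it reports what is configured rather than which source MeshOptixIQ prefers when both are set.

```shell
# Report where a license key is configured: "env", "file", or "none".
# license_key_source is a hypothetical helper, not part of the meshq CLI.
license_key_source() {
  key_file="${1:-$HOME/.meshoptixiq/license.key}"
  if [ -n "${MESHOPTIXIQ_LICENSE_KEY:-}" ]; then
    echo "env"
  elif [ -s "$key_file" ]; then
    # -s: the file exists and is not empty
    echo "file"
  else
    echo "none"
    return 1
  fi
}
```

Run license_key_source with no arguments to check the default file path, or pass a different path as the first argument.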

Error: "License has expired"

Cause: Your license expiration date has passed.

Solutions:

  1. Renew your license at: https://meshoptixiq.com/renew
  2. Update your license key using one of the methods above
  3. Restart the application or container

Error: "Device limit exceeded"

Cause: Your license tier allows a specific number of device installations, and you've reached that limit.

Solutions:

  1. Check your license tier and device limit in the customer dashboard
  2. Deactivate unused devices (contact support for assistance)
  3. Upgrade your license at: https://meshoptixiq.com/upgrade

Warning: "Running in grace period"

Cause: The license validation server cannot be reached (network connectivity issue).

Impact: Application continues working for 72 hours without server contact.

Solutions:

  1. Check internet connectivity:
    ping api.meshoptixiq.com
    curl https://api.meshoptixiq.com/health
  2. Check DNS resolution:
    nslookup api.meshoptixiq.com
    dig api.meshoptixiq.com
  3. Check firewall: Ensure HTTPS outbound (TCP/443) is allowed to api.meshoptixiq.com
  4. Once connectivity is restored, the grace period resets automatically on the next validation
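
The DNS and HTTPS checks above can be chained so that the first failing step is reported. This is a minimal sketch: check_license_server is a hypothetical helper name, the 10-second timeout is an arbitrary choice, and ping is left out because ICMP is often filtered even where HTTPS works.

```shell
# Run the DNS and HTTPS checks in order and report the first failing step.
# check_license_server is a hypothetical helper, not part of the meshq CLI.
check_license_server() {
  host="${1:-api.meshoptixiq.com}"
  if ! nslookup "$host" >/dev/null 2>&1; then
    echo "dns_failed"
    return 1
  fi
  # -f: treat HTTP error statuses as failures
  # --max-time: give up instead of hanging when TCP/443 is silently dropped
  if ! curl -sf --max-time 10 "https://$host/health" >/dev/null 2>&1; then
    echo "https_failed"
    return 2
  fi
  echo "ok"
}
```

A result of dns_failed points at name resolution; https_failed with working DNS usually points at a firewall or proxy between the host and the license server.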

CLI Issues

Error: "meshq: command not found"

Cause: Package not installed or not in system PATH.

Solutions:

  1. Verify installation:
    pip list | grep meshoptixiq
  2. Reinstall package:
    cd network_discovery
    pip install -e .
  3. Check PATH:
    which meshq
    echo $PATH
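
The PATH check can be made more specific with a sketch like the one below. The helper names are hypothetical, and the use of ~/.local/bin assumes a Linux host where the package was installed with pip's per-user scheme; other platforms and virtual environments place scripts elsewhere.

```shell
# Succeed if directory $1 is an exact entry in PATH.
path_contains() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Report whether meshq resolves and, if not, whether the per-user script
# directory (~/.local/bin, an assumption for Linux) is missing from PATH.
diagnose_meshq_path() {
  if command -v meshq >/dev/null 2>&1; then
    echo "found"
  elif ! path_contains "$HOME/.local/bin"; then
    echo "user_bin_not_on_path"
  else
    echo "not_installed"
  fi
}
```

If the result is user_bin_not_on_path, adding that directory to PATH in your shell profile may be enough; not_installed suggests reinstalling the package.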

Error: "No data collected from any device"

Causes: SSH connectivity issues, incorrect credentials, or inventory syntax errors.

Solutions:

  1. Verify inventory YAML syntax:
    python -c "import yaml; yaml.safe_load(open('inventory.yaml'))"
  2. Test SSH manually:
    ssh username@device-ip
    # Try the exact credentials from your inventory file
  3. Check credentials: Verify username/password in inventory file match device configuration
  4. Verify firewall rules: Ensure SSH (TCP/22) is allowed from the MeshOptixIQ host to devices
  5. Check device logs: Look for authentication failures in device logs

Error: "Parser failed for device X"

Cause: Unexpected CLI output format from the device.

Solutions:

  1. Check vendor type: Ensure vendor field in inventory matches device type (e.g., cisco_ios, arista_eos)
  2. Check device version: Very old or very new device firmware may have different output formats
  3. Report issue: Contact support with the device type and firmware version

API Issues

Error: 401 Unauthorized

Cause: Missing or invalid API key in the request.

Solutions:

  1. Include API key header:
    curl -H "X-API-Key: your-api-key" \
      https://api.meshoptixiq.com/queries/
  2. Verify API key: Check your API key in the customer dashboard
  3. Check key expiration: API keys may expire based on your license

API server fails to start

Note: As of v0.9.0, API_KEY is optional. When it is not set, the server starts in open-access mode and admits all requests without authentication. This is the recommended default for community self-hosted deployments.

To enable authentication, set a strong random key:

# Generate a key
openssl rand -hex 32

# Pass it to the container
docker run -e API_KEY=<generated-value> ... meshoptixiq/meshoptixiq:latest

For Docker Compose, add it to your .env file and reference it as API_KEY: ${API_KEY}.
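
Generating the key and writing it to the .env file can be scripted. The following is a sketch under assumptions: write_api_key is a hypothetical helper name, and the .env file is assumed to sit in the current directory, next to your Compose file.

```shell
# Write API_KEY=<64 hex characters> to an env file (default: ./.env),
# replacing any earlier API_KEY line and keeping all other entries.
# write_api_key is a hypothetical helper, not part of MeshOptixIQ.
write_api_key() {
  env_file="${1:-.env}"
  key="$(openssl rand -hex 32)" || return 1
  touch "$env_file"
  # grep -v exits non-zero when nothing is left to print; that is not an error here.
  grep -v '^API_KEY=' "$env_file" > "$env_file.tmp" || true
  printf 'API_KEY=%s\n' "$key" >> "$env_file.tmp"
  mv "$env_file.tmp" "$env_file"
  # The file now holds a secret, so restrict it to the owner.
  chmod 600 "$env_file"
}
```

After running write_api_key, recreate the container so that it picks up the new value.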

Error: 504 Gateway Timeout

Cause: Query too complex or database performance issues.

Solutions:

  1. Simplify query: Reduce scope or use pagination for large result sets
  2. Use pagination:
    GET /queries/list_devices?limit=100&offset=0
  3. Check database performance: Verify Neo4j or PostgreSQL has adequate resources
  4. Contact support if timeouts persist
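
When scripting against the API, it can help to map the status codes this guide mentions (401, 504, and the 402 returned once the license grace period has run out) to a first troubleshooting step. A minimal sketch, with explain_status as a hypothetical helper name:

```shell
# Map an HTTP status code to the first step suggested in this guide.
# explain_status is a hypothetical helper, not part of the meshq CLI.
explain_status() {
  case "$1" in
    200) echo "ok" ;;
    401) echo "missing or invalid API key: send the X-API-Key header" ;;
    402) echo "license_required: grace period expired on a plan-gated endpoint" ;;
    504) echo "timeout: narrow the query or paginate with limit and offset" ;;
    *)   echo "status $1 is not covered by this guide" ;;
  esac
}
```

One way to feed it a live status code is to let curl print only the code:

    explain_status "$(curl -s -o /dev/null -w '%{http_code}' -H "X-API-Key: your-api-key" https://api.meshoptixiq.com/queries/)"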

Docker Issues

Container Status: "Unhealthy"

Cause: Application startup failure or health check failure.

Solutions:

  1. Check container logs:
    docker logs meshoptixiq
    docker logs --tail 100 meshoptixiq
  2. Verify environment variables: Ensure license key and database URI are set correctly
  3. Test health endpoint:
    curl http://localhost:8000/health
  4. Check database connectivity:
    curl http://localhost:8000/health/ready

Error: "Cannot connect to Neo4j"

Cause: Database not accessible from container.

Solutions:

  1. Use correct hostname: From inside Docker, use host.docker.internal instead of localhost. On Linux hosts, also pass --add-host=host.docker.internal:host-gateway so the name resolves:
    docker run -e NEO4J_URI="bolt://host.docker.internal:7687" ...
  2. Check Neo4j is running:
    docker ps | grep neo4j
    # OR
    systemctl status neo4j
  3. Verify Neo4j password: Match NEO4J_PASSWORD env var with actual Neo4j password
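
If the same NEO4J_URI is used both on the host and inside a container, the loopback host can be rewritten before it is passed to docker run. A small sketch, with container_uri as a hypothetical helper name; it does not handle URIs that embed credentials.

```shell
# Rewrite a loopback host (localhost or 127.0.0.1) in a connection URI to
# host.docker.internal. Any other host is left unchanged.
# container_uri is a hypothetical helper, not part of MeshOptixIQ.
container_uri() {
  printf '%s\n' "$1" |
    sed -E 's#^([a-z0-9+]+://)(localhost|127\.0\.0\.1)([:/]|$)#\1host.docker.internal\3#'
}
```

For example, the value can be passed through as docker run -e NEO4J_URI="$(container_uri "$NEO4J_URI")" ...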

Database Issues

Neo4j: High Memory Usage

Cause: Large graph data or inefficient queries.

Solutions:

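  1. Increase heap size: Edit neo4j.conf (the setting names below apply to Neo4j 4.x; Neo4j 5 renames them to server.memory.heap.initial_size and server.memory.heap.max_size):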
    dbms.memory.heap.initial_size=4G
    dbms.memory.heap.max_size=8G
  2. Create indexes: Ensure common query patterns are indexed:
    CREATE INDEX FOR (d:Device) ON (d.hostname);
    CREATE INDEX FOR (i:Interface) ON (i.name);
  3. Archive old data: Consider archiving snapshots older than 90 days

PostgreSQL: Slow Query Performance

Cause: Missing indexes or insufficient resources.

Solutions:

  1. Check query execution plan:
    EXPLAIN ANALYZE SELECT * FROM devices WHERE hostname = 'switch-01';
  2. Ensure indexes exist: Check application logs for index creation confirmations
  3. Increase connection pool: Adjust max_connections in postgresql.conf if needed

Firewall Collection Issues

Error: "Parser failed" for firewall device

Cause: Unexpected CLI output format, unsupported firmware version, or wrong vendor type in inventory.

Solutions:

  1. Check vendor type: Ensure the vendor field in your inventory file matches one of the supported firewall vendors: paloalto_panos, juniper_srx, fortinet_fortios, cisco_asa
  2. Verify SSH user permissions: The collection account must have at minimum read access to security policies. For PAN-OS: vsys reader role. For Fortinet: readonly profile with system + firewall access.
  3. Check firmware version: Very old firmware may produce different CLI output. Check the supported version matrix in the vendor matrix appendix.
  4. Enable verbose logging: Set MESHOPTIXIQ_LOG_LEVEL=DEBUG and re-run collection to see the raw parser output.
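
A wrong vendor string is easy to catch before running collection. The sketch below checks a value against the four firewall vendor identifiers listed above; check_firewall_vendor is a hypothetical helper name, and the comparison is assumed to be exact and case-sensitive.

```shell
# Validate a vendor value against the firewall vendors listed in this guide.
# check_firewall_vendor is a hypothetical helper, not part of the meshq CLI.
check_firewall_vendor() {
  case "$1" in
    paloalto_panos|juniper_srx|fortinet_fortios|cisco_asa)
      echo "ok"
      ;;
    *)
      echo "unsupported vendor '$1'; expected one of: paloalto_panos juniper_srx fortinet_fortios cisco_asa"
      return 1
      ;;
  esac
}
```

Typical mismatches are capitalisation and hyphens, for example PaloAlto or cisco-asa.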

Flow Collection Not Receiving Data

No flows appearing in Flow Analytics

Cause: Flow listener not enabled, wrong port, or router not exporting to the MeshOptixIQ host.

Solutions:

  1. Enable the sFlow listener: Set SFLOW_ENABLED=true in your environment (Enterprise license required). For NetFlow/IPFIX, configure your routers to export to UDP port 2055 (NetFlow v9) or 9995 (IPFIX). sFlow uses UDP port 6343.
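  2. Confirm that flow packets reach the host, and open the UDP port in the host firewall if they arrive but no flows are recorded:
    # Check whether flow packets are arriving on the interface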
    sudo tcpdump -i eth0 udp port 6343 -c 10  # sFlow
    sudo tcpdump -i eth0 udp port 2055 -c 10  # NetFlow
  3. Verify router export config: Confirm the router exports to the MeshOptixIQ host IP. sFlow agent IP must be set correctly on the device.
  4. Check flow store status:
    curl -H "X-API-Key: your-key" http://localhost:8000/flows/status
    # Returns: {"enabled": true, "total_flows": 4521, "ring_buffer_capacity": 100000}
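
The protocol-to-port mapping above is easy to mix up when several export types are in use. A small lookup keeps the capture command consistent. This is a sketch: flow_port and capture_command are hypothetical helper names, and eth0 is only a default interface name.

```shell
# UDP port for each flow protocol, as listed in this guide.
flow_port() {
  case "$1" in
    sflow)   echo 6343 ;;
    netflow) echo 2055 ;;
    ipfix)   echo 9995 ;;
    *) echo "unknown protocol: $1" >&2; return 1 ;;
  esac
}

# Print the tcpdump command that checks for incoming packets of a protocol.
# $1 = protocol, $2 = interface (default eth0)
capture_command() {
  port="$(flow_port "$1")" || return 1
  echo "sudo tcpdump -i ${2:-eth0} udp port $port -c 10"
}
```

capture_command only prints the command, so it can be reviewed before it is run with elevated privileges.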

License Server Unreachable

What happens when the license server is unreachable

The API server includes a 72-hour grace period for offline operation.

  • 0–72 hours offline: All features work normally. A warning appears in logs every validation cycle. GET /health/license shows "grace_period_active": true.
  • 72+ hours offline: The API returns 402 Payment Required on plan-gated endpoints with {"detail": "license_required"}. The UI, /health, /health/license, and Community-tier features remain open.
  • After reconnection: Validation resumes automatically on the next 24-hour check cycle or on restart. Full functionality is restored immediately after a successful check.
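
The timeline above can be expressed as a simple rule on the number of hours since the last successful validation. The sketch below is illustrative only: license_state is a hypothetical helper name, and treating exactly 72 hours as already expired is an assumption, since the guide does not say which side of the boundary it falls on.

```shell
# Grace period length in hours, as stated in this guide.
GRACE_HOURS=72

# Describe the expected behaviour for a given number of whole hours offline.
# Assumption: exactly 72 hours counts as expired.
license_state() {
  hours_offline="$1"
  if [ "$hours_offline" -lt "$GRACE_HOURS" ]; then
    echo "grace_period: $((GRACE_HOURS - hours_offline))h remaining, all features available"
  else
    echo "restricted: plan-gated endpoints return 402"
  fi
}
```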

To diagnose:

# Check grace period status
curl http://localhost:8000/health/license
# {"plan":"pro","expires":"2027-01-01","days_remaining":300,"demo_mode":false,"grace_period_active":true,"grace_hours_remaining":48}

# Test connectivity to license server
curl -I https://api.meshoptixiq.com/health

# Force an immediate re-check
meshq license info --refresh
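
For monitoring scripts, individual fields can be pulled out of the /health/license response. The sketch below uses sed so that it has no dependencies beyond a POSIX shell; json_field is a hypothetical helper name, and it assumes the compact, single-line JSON with top-level scalar fields shown above. Where jq is installed, jq -r .grace_hours_remaining is the more robust choice.

```shell
# Print the value of a top-level scalar field from single-line JSON on stdin.
# Works for strings, numbers and booleans; not a general JSON parser.
# json_field is a hypothetical helper, not part of the meshq CLI.
json_field() {
  sed -n -E "s/.*\"$1\":\"?([^,\"}]*)\"?[,}].*/\1/p"
}
```

For example, curl -s http://localhost:8000/health/license | json_field grace_hours_remaining prints the number of hours left in the grace period.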

BGP Data Missing

BGP routing queries return empty results

Cause: BGP data requires SSH collection with parser support for the device vendor, and a Pro+ license with the bgp_intelligence feature flag.

Solutions:

  1. Verify license:
    meshq license info
    # Should show: bgp_intelligence: true
  2. Check vendor support: BGP collection is supported for Cisco IOS/IOS-XE, Arista EOS, and JunOS. Other vendors may not parse BGP state tables.
  3. Re-run collection: BGP peer state is collected via show bgp summary (Cisco/Arista) or show bgp neighbor (JunOS). Run meshq collect and check for parse errors in the logs.
  4. Verify graph data:
    curl -H "X-API-Key: your-key" \
      -X POST http://localhost:8000/queries/bgp_topology/execute \
      -H "Content-Type: application/json" \
      -d '{"parameters": {}}'
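
The license check in step 1 can be turned into a scriptable test. The sketch below assumes that meshq license info prints each feature flag on its own line in the form "name: true", as suggested by the expected output above; the real output format may differ, and has_feature is a hypothetical helper name.

```shell
# Read `meshq license info` output on stdin and succeed if the named
# feature flag appears on its own line as "<name>: true".
# has_feature is a hypothetical helper; the line format is an assumption.
has_feature() {
  grep -Eq "^[[:space:]]*$1:[[:space:]]*true[[:space:]]*$"
}
```

Usage: meshq license info | has_feature bgp_intelligence || echo "bgp_intelligence is not enabled for this license"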

Diagnostic Scripts

Comprehensive Health Check

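#!/bin/bash
# Save as health-check.sh

# Fail a pipeline when curl fails. jq alone exits 0 on empty input, which
# would otherwise hide connection errors from the "|| echo ERROR" checks.
set -o pipefail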

echo "=== MeshOptixIQ Health Check ==="
echo ""

# 1. Check license
echo "1. Checking license..."
meshq version || echo "ERROR: CLI not working"
echo ""

# 2. Check database connectivity
echo "2. Checking database..."
curl -s http://localhost:8000/health/ready | jq . || echo "ERROR: API not responding"
echo ""

# 3. Check license server connectivity
echo "3. Checking license server..."
curl -s https://api.meshoptixiq.com/health | jq . || echo "ERROR: Cannot reach license server"
echo ""

# 4. Check Docker containers
echo "4. Checking Docker containers..."
docker ps | grep -E "meshoptixiq|neo4j|postgres"
echo ""

# 5. Check disk space
echo "5. Checking disk space..."
df -h | grep -E "Filesystem|/$"
echo ""

echo "=== Health Check Complete ==="

FAQ

Q: How often should I run discovery?

A: For most networks, daily discovery (via cron or scheduled task) is sufficient. For highly dynamic environments, consider every 6-12 hours.
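
As an illustration, the schedule can be written as a crontab entry. The sketch below prints one for a given interval; discovery_cron_line is a hypothetical helper name, 02:00 is an arbitrary time for the daily run, and meshq collect is shown without arguments because the options your installation needs are not covered here.

```shell
# Print a crontab line that runs discovery every N hours (default: daily).
# discovery_cron_line is a hypothetical helper; 02:00 is an arbitrary choice.
discovery_cron_line() {
  every_hours="${1:-24}"
  if [ "$every_hours" -ge 24 ]; then
    # Daily run at 02:00
    echo "0 2 * * * meshq collect"
  else
    # On the hour, every N hours
    echo "0 */$every_hours * * * meshq collect"
  fi
}
```

cron runs jobs with a minimal PATH, so the entry may need the absolute path to meshq (see the output of which meshq).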

Q: Can I run multiple discovery agents?

A: Yes, but each installation counts toward your device limit. The Pro plan allows 5 installations; the Enterprise plan allows an unlimited number.

Q: How do I back up my graph data?

A: For Neo4j, use neo4j-admin dump (neo4j-admin database dump in Neo4j 5). For PostgreSQL, use pg_dump. See the Monitoring & Operations guide for details.

Q: What network permissions does MeshOptixIQ need?

A: MeshOptixIQ requires SSH (TCP/22) access to network devices, outbound HTTPS (TCP/443) to api.meshoptixiq.com for licensing, and connectivity to your graph database.

Getting Help

If you're still experiencing issues after trying these solutions, contact support.

Pro Tip: When contacting support, include:

  • Your license tier (Starter/Pro/Enterprise)
  • Operating system and version
  • Complete error messages from logs
  • Output from the diagnostic health check script