My connector is offline

A connector normally flips to Online within 30 seconds of install. If it does not, or if a previously healthy connector drops to Offline, Degraded, or Needs recovery, use this runbook.

What the status means

  • Offline: the connector is not currently connected, but it may recover on its own after a transient network interruption or process restart.
  • Degraded: at least one runtime path is unhealthy, but not every connector path is down.
  • Needs recovery: the connector is no longer in a normal reconnect path. Treat this as durable identity loss, invalid bootstrap fallback, or another bounded recovery case.

Quick checks

  1. Is the host up? SSH or RDP into the connector machine; confirm Docker / the native service is running.
    • Docker: docker ps | grep vaultpam-connector — the container should be Up.
    • systemd: systemctl status vaultpam-connector.
  2. Can the host reach the control plane? From the connector machine:
    curl -v https://dev.euwarden.com/healthz
    If this fails, you have a network, DNS, or firewall issue. Outbound port 443 (control plane HTTPS) must be open, and outbound 51820/udp as well if you use the VPN reverse tunnel.
  3. Is the enrollment token still valid? Tokens expire after 4 hours by default. If the connector has never come online and the token has expired, generate a new one from Connectors → pick connector → Regenerate token.
  4. Did the connector lose its durable identity storage? A previously paired connector should restart from the persisted identity under /data or PAM_AGENT_DATA_DIR.
    • Docker: confirm the container still mounts the same named volume or bind mount at /data.
    • Native / VM: confirm the service still points at the same durable host path and that the connector user can read it.
    • Kubernetes: confirm the pod still mounts the expected PVC and was not redeployed onto emptyDir or another ephemeral layer.
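
The quick checks above can be scripted so they run the same way every time. The following is a minimal sketch, not an official tool. It assumes a Docker or systemd host where both the container and the unit are named vaultpam-connector, as in the commands above. The function names and the CONTROL_PLANE_URL variable are hypothetical, and treating PAM_AGENT_DATA_DIR as an environment variable is also an assumption. For a Docker deployment, point check_data_dir at the host side of the bind mount, since /data exists only inside the container.

```shell
# Sketch of quick checks 1, 2, and 4. All function names are illustrative.
CONTROL_PLANE_URL="${CONTROL_PLANE_URL:-https://dev.euwarden.com/healthz}"
DATA_DIR="${PAM_AGENT_DATA_DIR:-/data}"

# Check 1: is the connector running under Docker or systemd?
check_process() {
  if command -v docker >/dev/null 2>&1 &&
     docker ps --format '{{.Names}}' | grep -qx 'vaultpam-connector'; then
    echo "process: docker container is Up"
  elif command -v systemctl >/dev/null 2>&1 &&
       systemctl is-active --quiet vaultpam-connector; then
    echo "process: systemd unit is active"
  else
    echo "process: NOT running"
    return 1
  fi
}

# Check 2: can this host reach the control plane over HTTPS?
check_control_plane() {
  if curl --silent --show-error --fail --max-time 10 \
          --output /dev/null "$CONTROL_PLANE_URL"; then
    echo "network: control plane reachable"
  else
    echo "network: control plane NOT reachable (check DNS, firewall, outbound 443)"
    return 1
  fi
}

# Check 4: does the durable identity directory exist, is it readable,
# and does it contain anything?
check_data_dir() {
  local dir="$1"
  if [ ! -d "$dir" ] || [ ! -r "$dir" ]; then
    echo "storage: $dir missing or unreadable"
    return 1
  fi
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "storage: $dir is empty (possible identity loss)"
    return 1
  fi
  echo "storage: $dir present and non-empty"
}

# Usage (run on the connector host):
#   check_process; check_control_plane; check_data_dir "$DATA_DIR"
```

A non-empty directory only shows that something is persisted there. It does not prove the identity bundle itself is intact.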

Identify the failure class

1. First-boot bootstrap failure

Use this branch if the connector never paired successfully.

Common signals:

  • the connector never became Online
  • logs show token expiry, token revocation, CSR validation failure, or CA trust failure
  • the UI still shows an onboarding or activation state rather than ready

Recovery:

  1. Fix the trust, network, or token problem.
  2. Generate a fresh enrollment token if the original one expired or was revoked.
  3. Retry pairing.
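
Before retrying, it can help to rule out simple token ageing. The sketch below checks whether a token issued at a known time has passed the default 4-hour lifetime mentioned in the quick checks. The function name is hypothetical. It also assumes you know the issue time as a Unix timestamp and that a token counts as expired exactly at the 4-hour mark.

```shell
# Returns 0 (expired) if at least ttl_hours have passed since issued_epoch.
# Arguments: issued_epoch [now_epoch] [ttl_hours]; ttl_hours defaults to 4.
token_expired() {
  local issued_epoch="$1"
  local now_epoch="${2:-$(date +%s)}"
  local ttl_hours="${3:-4}"
  [ $(( now_epoch - issued_epoch )) -ge $(( ttl_hours * 3600 )) ]
}

# Usage: the timestamp here is a placeholder for the real issue time.
#   if token_expired 1700000000; then echo "generate a fresh token"; fi
```

If your deployment overrides the default lifetime, pass the configured value as the third argument.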

2. Normal reconnect failure

Use this branch if the connector paired before and still has its durable identity.

Common signals:

  • temporary Offline after reboot, deploy, or network flap
  • cp_tunnel_heartbeat_timeouts_total alert activity
  • the local data directory is still present and readable

Recovery:

  1. Restore outbound HTTPS reachability to the control plane.
  2. Confirm the connector still has its local identity bundle on durable storage.
  3. Restart the connector process or pod once.
  4. Validate that the connector returns to Online without using a new enrollment token.
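
For a Docker deployment, steps 3 and 4 can be sketched as a single restart followed by a log check. This is an illustration under assumptions, not a supported tool. The function names are hypothetical, the container is assumed to be named vaultpam-connector, and the 30-second wait reuses the normal time-to-Online quoted at the top of this runbook. The log marker ENROLLMENT_TOKEN_INVALID is the fallback signal described in this runbook.

```shell
# Returns 0 if the log text on stdin shows a fallback to bootstrap.
fell_back_to_bootstrap() {
  grep -q 'ENROLLMENT_TOKEN_INVALID'
}

# Restart the container exactly once, wait, then inspect only the logs
# written since just before the restart.
restart_once_and_verify() {
  local name="${1:-vaultpam-connector}"
  local wait_seconds="${2:-30}"
  local since
  since="$(date -u +%Y-%m-%dT%H:%M:%SZ)"   # RFC 3339 timestamp for --since
  docker restart "$name" >/dev/null || return 1
  sleep "$wait_seconds"
  if docker logs --since "$since" "$name" 2>&1 | fell_back_to_bootstrap; then
    echo "bootstrap fallback detected: stop and follow the identity-loss branch"
    return 2
  fi
  echo "no bootstrap fallback since restart; confirm Online in the UI"
}

# Usage:
#   restart_once_and_verify vaultpam-connector 30
```

The function deliberately does not loop. A second failed restart is a reason to change branch, not to retry.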

3. Durable identity loss or invalid bootstrap fallback

Use this branch if the connector paired before, but now behaves like a brand-new connector.

Common signals:

  • UI or API shows Needs recovery
  • logs show 401 ENROLLMENT_TOKEN_INVALID after a previously successful pairing
  • enrollment endpoint 401 alerts spike after restart
  • a sandbox connector pod is crash-looping while the sandbox lifecycle still reads ready
  • the durable storage mount was replaced, removed, or recreated empty

Recovery:

  1. Stop the connector or scale the pod to zero before changing credentials.
  2. Try to reattach the original durable storage first.
  3. If the original identity is gone, treat this as a controlled reprovision:
    • revoke the stale bootstrap token if it still exists
    • mint a fresh single-use enrollment token
    • ensure durable storage is mounted correctly before starting again
    • pair exactly once with the new token
  4. Validate that subsequent restarts reuse the persisted identity instead of asking for another token.

Do not treat repeated bootstrap retries as a normal restart path.
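
Before pairing with a new token, it is worth confirming that the container really has durable storage at /data. The sketch below lists a Docker container's mounts and checks for a named volume or bind mount at that path. The function names are hypothetical and the container name is assumed to be vaultpam-connector. For Kubernetes you would inspect the pod's volumes instead, which this sketch does not cover.

```shell
# Print one "type source -> destination" line per mount of the container.
list_mounts() {
  docker inspect \
    --format '{{range .Mounts}}{{.Type}} {{.Source}} -> {{.Destination}}{{println}}{{end}}' \
    "${1:-vaultpam-connector}"
}

# Returns 0 if stdin (the output of list_mounts) contains a named volume
# or bind mount at /data. A tmpfs mount or no mount at all fails the check.
has_durable_data_mount() {
  grep -Eq '^(volume|bind) .* -> /data$'
}

# Usage:
#   if list_mounts vaultpam-connector | has_durable_data_mount; then
#     echo "durable storage mounted at /data"
#   else
#     echo "no durable mount at /data: fix this before pairing"
#   fi
```

The check only confirms that a durable mount exists. Whether it is the original volume with the original identity still has to be verified by hand.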

Logs to inspect

  • Docker: docker logs -n 200 vaultpam-connector.
  • Native: /var/log/vaultpam/connector.log (Linux) or %PROGRAMDATA%\VaultPAM\connector.log (Windows).

Common errors:

| Log fragment | Meaning | Fix |
| --- | --- | --- |
| x509: certificate signed by unknown authority | Host does not trust the CA | Import the CA cert bundle (/etc/vaultpam/ca.crt); see the runbook printed during install |
| connection refused | Network blocked | Check corporate firewall outbound allowlist |
| enrolment token revoked | Someone clicked Revoke in the UI | Generate a new token |
| 401 ENROLLMENT_TOKEN_INVALID after the connector was already paired once | The connector lost its persisted identity and fell back to bootstrap | Reattach the original durable data directory if available. If the identity is gone, generate a fresh token and pair again on durable storage |
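
When the log tail is long, a small classifier can surface which of the common errors listed above is present. The sketch below matches the same log fragments. The function names and short labels are hypothetical. Note that ENROLLMENT_TOKEN_INVALID only indicates identity loss if the connector had already paired once, so that hint still needs human judgment.

```shell
# Map one log line to a short hint based on the common-errors list.
classify_log_line() {
  case "$1" in
    *"x509: certificate signed by unknown authority"*)
      echo "ca-trust: import the CA cert bundle" ;;
    *"connection refused"*)
      echo "network: check the outbound firewall allowlist" ;;
    *"enrolment token revoked"*)
      echo "token-revoked: generate a new token" ;;
    *"ENROLLMENT_TOKEN_INVALID"*)
      # Identity loss only if the connector was paired before.
      echo "identity-loss: reattach the data directory or pair again" ;;
    *)
      echo "unknown" ;;
  esac
}

# Read log lines on stdin and count how often each known hint occurs.
summarize_logs() {
  while IFS= read -r line; do
    classify_log_line "$line"
  done | grep -v '^unknown$' | sort | uniq -c
}

# Usage:
#   docker logs -n 200 vaultpam-connector 2>&1 | summarize_logs
```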

Alert to runbook map

  • ControlPlaneConnectorBootstrapFallbackSuspected (High): Treat this as a likely durable-identity loss case until proven otherwise.
  • ControlPlaneTunnelHeartbeatTimeouts (High): Follow the reconnect-failure branch first.
  • SandboxConnectorCrashLoop (High): Confirm whether the sandbox still reports ready; if it does, this is a lifecycle versus runtime mismatch and usually points to identity loss or invalid runtime configuration.
  • SandboxConnectorCrashLoopBackOff (Critical): Investigate the pod logs immediately and validate the durable storage mount before retrying.

Post-recovery validation

After any fix:

  1. Confirm the connector reaches Online.
  2. Restart the connector one more time.
  3. Confirm it reconnects without a fresh enrollment token.
  4. Confirm the durable storage path still contains the connector state after the restart.
  5. Capture the evidence for audit:
    • connector status before and after
    • the log line that proves reconnect or renewal succeeded
    • any endpoint.cert_renewed or gateway.enrollment_failed audit entries tied to the incident
    • the alert name and time window if monitoring fired
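
Part of the evidence in step 5 can be collected with a short script. The sketch below saves a timestamp, the container status, and the last 200 log lines into a directory. It assumes a Docker deployment with a container named vaultpam-connector, and the function name and file names are hypothetical. Audit entries and alert details come from systems this runbook does not describe, so they still need to be gathered separately.

```shell
# Write a small evidence bundle for the audit trail.
# Arguments: out_dir [container_name]
capture_evidence() {
  local out_dir="$1"
  local name="${2:-vaultpam-connector}"
  mkdir -p "$out_dir" || return 1
  # UTC capture time, so the bundle lines up with the alert time window.
  date -u +%Y-%m-%dT%H:%M:%SZ > "$out_dir/captured_at.txt"
  # Container name and status; an empty file means no matching container.
  docker ps -a --filter "name=$name" --format '{{.Names}} {{.Status}}' \
    > "$out_dir/status.txt" 2>&1
  # Same tail size as the command in "Logs to inspect".
  docker logs -n 200 "$name" > "$out_dir/log_tail.txt" 2>&1
  echo "evidence written to $out_dir"
}

# Usage: run once before the fix and once after, into separate directories.
#   capture_evidence ./evidence/before
#   capture_evidence ./evidence/after
```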

Still stuck

Contact support and include the connector version, a log tail, and the output of curl -v https://dev.euwarden.com/healthz.