My connector is offline
A connector normally flips to Online within 30 seconds of install. If it does not, or if a previously healthy connector drops to Offline, Degraded, or Needs recovery, use this runbook.
What the status means
- Offline: the connector is not currently connected, but it may recover on its own after a transient network interruption or process restart.
- Degraded: at least one runtime path is unhealthy, but not every connector path is down.
- Needs recovery: the connector is no longer in a normal reconnect path. Treat this as durable identity loss, invalid bootstrap fallback, or another bounded recovery case.
Quick checks
- Is the host up? SSH or RDP into the connector machine; confirm Docker / the native service is running.
  - Docker: docker ps | grep vaultpam-connector (the container should be Up).
  - systemd: systemctl status vaultpam-connector.
- Can the host reach the control plane? From the connector machine:
  curl -v https://dev.euwarden.com/healthz
  If this fails, you have a network/DNS/firewall issue. Port 443 (control plane HTTPS) and, optionally, 51820/udp (VPN reverse tunnel) must be open outbound.
- Is the enrollment token still valid? Tokens expire after 4 hours by default. If the connector has never come online and the token has expired, generate a new one from Connectors → pick connector → Regenerate token.
- Did the connector lose its durable identity storage? A previously paired connector should restart from the persisted identity under /data or PAM_AGENT_DATA_DIR.
  - Docker: confirm the container still mounts the same named volume or bind mount at /data.
  - Native / VM: confirm the service still points at the same durable host path and that the connector user can read it.
  - Kubernetes: confirm the pod still mounts the expected PVC and was not redeployed onto emptyDir or another ephemeral layer.
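The reachability check can be made less ambiguous by looking at curl's exit status rather than reading the verbose output. The sketch below is illustrative only: classify_curl_exit and check_control_plane are hypothetical helper names, the exit codes are the standard ones from the curl manual, and the advice strings are this sketch's own wording, not connector output.

```shell
#!/bin/sh
# Hypothetical helper: map a curl exit status to the most likely cause.
# Exit codes are the documented curl ones; the labels are assumptions.
classify_curl_exit() {
  case "$1" in
    0)  echo "reachable" ;;
    6)  echo "dns: could not resolve host" ;;
    7)  echo "network: connection failed (firewall or refused)" ;;
    22) echo "http: server returned an error status" ;;
    28) echo "network: timed out" ;;
    35) echo "tls: handshake failed" ;;
    60) echo "tls: CA not trusted by this host" ;;
    *)  echo "other: curl exit $1" ;;
  esac
}

# Probe the health endpoint quietly and print the classification.
# -f turns HTTP errors (>= 400) into exit code 22.
check_control_plane() {
  url="${1:-https://dev.euwarden.com/healthz}"
  rc=0
  curl -fsS -o /dev/null --max-time 10 "$url" || rc=$?
  classify_curl_exit "$rc"
}
```

Run check_control_plane on the connector machine. A dns or network result points at the firewall and port checks above; a tls result points at CA trust on the host.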
Identify the failure class
1. First-boot bootstrap failure
Use this branch if the connector never paired successfully.
Common signals:
- the connector never became Online
- logs show token expiry, token revocation, CSR validation failure, or CA trust failure
- the UI still shows an onboarding or activation state rather than ready
Recovery:
- Fix the trust, network, or token problem.
- Generate a fresh enrollment token if the original one expired or was revoked.
- Retry pairing.
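Before retrying, it can help to confirm whether the original token is simply past its lifetime. A minimal sketch, assuming you know when the token was issued as a Unix timestamp: token_expired is a hypothetical helper, the 4-hour default comes from this runbook, and treating the exact 4-hour mark as expired is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: decide whether an enrollment token has outlived its TTL.
# $1 = issue time (epoch seconds), $2 = current time (epoch seconds),
# $3 = optional TTL in seconds (default 14400 = 4 hours, per this runbook).
token_expired() {
  issued="$1"
  now="$2"
  ttl="${3:-14400}"
  age=$((now - issued))
  if [ "$age" -ge "$ttl" ]; then
    echo "expired"
  else
    echo "valid for $((ttl - age))s"
  fi
}
```

For example, token_expired "$issued_epoch" "$(date +%s)" prints either expired or the remaining lifetime in seconds. Pass a third argument if your deployment uses a non-default token lifetime.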
2. Normal reconnect failure
Use this branch if the connector paired before and still has its durable identity.
Common signals:
- temporary Offline after reboot, deploy, or network flap
- cp_tunnel_heartbeat_timeouts_total alert activity
- the local data directory is still present and readable
Recovery:
- Restore outbound HTTPS reachability to the control plane.
- Confirm the connector still has its local identity bundle on durable storage.
- Restart the connector process or pod once.
- Validate that the connector returns to Online without using a new enrollment token.
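The "identity bundle on durable storage" step can be checked mechanically before restarting. The sketch below only verifies that the data directory exists, is readable, and is not empty. It does not know the bundle's file names, and treating PAM_AGENT_DATA_DIR as an override for /data is an assumption. identity_dir_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: sanity-check the connector data directory.
# Defaults to $PAM_AGENT_DATA_DIR, then /data (assumed precedence).
identity_dir_ok() {
  dir="${1:-${PAM_AGENT_DATA_DIR:-/data}}"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
    echo "unreadable: $dir"
    return 1
  fi
  # ls -A lists everything except . and .. ; empty output means an empty dir.
  if [ -z "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "empty: $dir"
    return 1
  fi
  echo "ok: $dir"
}
```

Run it as the connector user, inside the container or pod where applicable. A missing or empty result suggests the durable identity loss branch rather than a normal reconnect.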
3. Durable identity loss or invalid bootstrap fallback
Use this branch if the connector paired before, but now behaves like a brand-new connector.
Common signals:
- UI or API shows Needs recovery
- logs show 401 ENROLLMENT_TOKEN_INVALID after a previously successful pairing
- enrollment endpoint 401 alerts spike after restart
- a sandbox connector pod is crash-looping while the sandbox lifecycle still reads ready
- the durable storage mount was replaced, removed, or recreated empty
Recovery:
- Stop the connector or scale the pod to zero before changing credentials.
- Try to reattach the original durable storage first.
- If the original identity is gone, treat this as a controlled reprovision:
  - revoke the stale bootstrap token if it still exists
  - mint a fresh single-use enrollment token
  - ensure durable storage is mounted correctly before starting again
  - pair exactly once with the new token
- Validate that subsequent restarts reuse the persisted identity instead of asking for another token.
Do not treat repeated bootstrap retries as a normal restart path.
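The three branches above reduce to two questions: did the connector ever pair, and is its durable identity still present? A minimal sketch of that decision, with failure_class as a hypothetical helper and the yes/no inputs supplied by whoever is triaging:

```shell
#!/bin/sh
# Hypothetical helper: pick the runbook branch from two yes/no answers.
# $1 = did the connector ever pair successfully?   (yes | no)
# $2 = is the durable identity still on storage?   (yes | no)
failure_class() {
  case "$1:$2" in
    no:*)    echo "1: first-boot bootstrap failure" ;;
    yes:yes) echo "2: normal reconnect failure" ;;
    yes:no)  echo "3: durable identity loss or invalid bootstrap fallback" ;;
    *)       echo "unknown: expected yes/no arguments"
             return 2 ;;
  esac
}
```

For example, failure_class yes no selects branch 3, which is the only branch where minting a new token for a previously paired connector is appropriate.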
Logs to inspect
- Docker: docker logs -n 200 vaultpam-connector.
- Native: /var/log/vaultpam/connector.log (Linux) or %PROGRAMDATA%\VaultPAM\connector.log (Windows).
Common errors:
| Log fragment | Meaning | Fix |
|---|---|---|
| x509: certificate signed by unknown authority | Host does not trust the CA | Import the CA cert bundle (/etc/vaultpam/ca.crt); see the runbook printed during install |
| connection refused | Network blocked | Check corporate firewall outbound allowlist |
| enrolment token revoked | Someone clicked Revoke in the UI | Generate a new token |
| 401 ENROLLMENT_TOKEN_INVALID after the connector was already paired once | The connector lost its persisted identity and fell back to bootstrap | Reattach the original durable data directory if available. If the identity is gone, generate a fresh token and pair again on durable storage |
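The table can be applied to a log tail with a small filter. This is a sketch only: diagnose_log_line and diagnose_log are hypothetical helpers, the match strings are the log fragments from the table, and the short labels are this sketch's own summaries of the Fix column. The 401 match cannot tell whether the connector was paired before, so confirm that separately.

```shell
#!/bin/sh
# Hypothetical helper: map one log line to a short diagnosis label.
diagnose_log_line() {
  case "$1" in
    *"x509: certificate signed by unknown authority"*)
      echo "ca-trust: import the CA cert bundle" ;;
    *"connection refused"*)
      echo "network: check outbound firewall allowlist" ;;
    *"enrolment token revoked"*)
      echo "token: generate a new token" ;;
    *"401 ENROLLMENT_TOKEN_INVALID"*)
      # Only an identity-loss signal if the connector paired before.
      echo "identity: reattach durable data dir, or re-pair with a fresh token" ;;
    *)
      echo "unmatched" ;;
  esac
}

# Read log text on stdin and print each distinct diagnosis once.
diagnose_log() {
  while IFS= read -r line; do
    diagnose_log_line "$line"
  done | grep -v '^unmatched$' | sort -u
}
```

On Docker, pipe the log tail through it: docker logs -n 200 vaultpam-connector 2>&1 | diagnose_log. No output means none of the known fragments appeared in the tail.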
Alert to runbook map
- ControlPlaneConnectorBootstrapFallbackSuspectedHigh: Treat this as a likely durable-identity loss case until proven otherwise.
- ControlPlaneTunnelHeartbeatTimeoutsHigh: Follow the reconnect-failure branch first.
- SandboxConnectorCrashLoopHigh: Confirm whether the sandbox still reports ready; if it does, this is a lifecycle versus runtime mismatch and usually points to identity loss or invalid runtime configuration.
- SandboxConnectorCrashLoopBackOffCritical: Investigate the pod logs immediately and validate the durable storage mount before retrying.
Post-recovery validation
After any fix:
- Confirm the connector reaches Online.
- Restart the connector one more time.
- Confirm it reconnects without a fresh enrollment token.
- Confirm the durable storage path still contains the connector state after the restart.
- Capture the evidence for audit:
  - connector status before and after
  - the log line that proves reconnect or renewal succeeded
  - any endpoint.cert_renewed or gateway.enrollment_failed audit entries tied to the incident
  - the alert name and time window if monitoring fired
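To back up the "state still present after restart" check with evidence, one option is to list the data directory before and after the restart and report anything that disappeared. A minimal sketch under stated assumptions: list_state and missing_after_restart are hypothetical helpers, and the comparison looks at file paths only, since file contents may legitimately change (for example after a certificate renewal).

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR as sorted relative paths.
list_state() {
  ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# Print paths present in the BEFORE listing but absent from the AFTER listing.
# comm -23 keeps lines unique to the first (sorted) file.
missing_after_restart() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Example flow (paths are illustrative):
#   list_state /data > /tmp/state.before
#   ...restart the connector...
#   list_state /data > /tmp/state.after
#   missing_after_restart /tmp/state.before /tmp/state.after
```

Empty output means every file that existed before the restart is still there. Attach both listings to the audit evidence.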
Still stuck
Contact support and include the connector version, a log tail, and the output of curl -v https://dev.euwarden.com/healthz.