
Validator Best Practices

This guide covers operational best practices for running a reliable validator.

Infrastructure

Hardware

| Practice | Recommendation |
| --- | --- |
| Use quality hardware | Don't cut corners on critical infrastructure |
| Provision headroom | 2x minimum specs for safety margin |
| Use NVMe storage | SSD at minimum, NVMe preferred |
| Monitor hardware health | Track disk, CPU, memory, network |
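
A quick health snapshot with standard Linux tools (iostat comes from the sysstat package; device and mount names vary by setup):

```bash
# Basic hardware health snapshot
df -h /            # disk usage on the data volume
free -h            # memory usage
uptime             # load average
iostat -x 1 3      # disk I/O utilization and latency (sysstat package)
```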

Network

| Practice | Recommendation |
| --- | --- |
| Use dedicated IP | Static IP required for P2P |
| Configure firewall | Only open necessary ports |
| Use redundant connectivity | Multiple network paths if possible |
| Monitor latency | High latency affects consensus participation |
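
A rough latency check toward your peers; the hostnames below are placeholders for your actual sentries or peers:

```bash
# Average round-trip time to each peer; sustained high RTT hurts consensus participation
for host in sentry1.example.com sentry2.example.com; do
  echo "== $host"
  ping -c 5 -q "$host" | tail -2
done
```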

Hosting

| Practice | Recommendation |
| --- | --- |
| Choose reliable providers | Track record matters |
| Geographic diversity | Consider disaster scenarios |
| Avoid shared hosting | Dedicated servers preferred |
| Plan for scaling | Storage will grow |

Security

Key Management

| Practice | Recommendation |
| --- | --- |
| Secure priv_validator_key.json | Most critical file |
| Use hardware security | HSM for production validators |
| Backup keys securely | Offline, encrypted, multiple locations |
| Never copy keys between active nodes | Causes double-signing |
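
For the offline, encrypted backup, a minimal sketch using GnuPG symmetric encryption; adapt to your own tooling, and prefer an HSM or remote signer for the live key in production:

```bash
# Encrypt the consensus key for offline storage; you will be prompted for a passphrase
gpg --symmetric --cipher-algo AES256 ~/.monod/config/priv_validator_key.json
# Copy priv_validator_key.json.gpg to offline media in multiple locations,
# and never leave plaintext copies anywhere except the running node itself
```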

Access Control

| Practice | Recommendation |
| --- | --- |
| Use SSH keys only | Disable password auth |
| Restrict SSH access | Whitelist IPs if possible |
| Use non-root user | Run node as unprivileged user |
| Enable fail2ban | Block brute force attempts |
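
A hardening sketch for a Debian/Ubuntu host with OpenSSH; adjust the service name and package manager for other distributions:

```bash
# Create an unprivileged user to run the node
sudo adduser --disabled-password --gecos "" validator

# Disable password logins and root SSH access
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo systemctl restart ssh

# Block repeated brute-force attempts
sudo apt install -y fail2ban
sudo systemctl enable --now fail2ban
```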

Network Security

| Practice | Recommendation |
| --- | --- |
| Minimal open ports | 26656 required, others as needed |
| Use firewall | iptables, ufw, or cloud firewall |
| Consider sentry architecture | Protects validator from DDoS |
| Monitor for intrusion | Log analysis, IDS |
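
A minimal firewall sketch with ufw, assuming an Ubuntu host; the monitoring subnet is a placeholder, and iptables or a cloud firewall works just as well:

```bash
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp           # SSH; restrict to known IPs where possible
sudo ufw allow 26656/tcp        # Tendermint P2P
# RPC (26657) and Prometheus (26660) should only be reachable from trusted hosts, if exposed at all
sudo ufw allow from 10.0.0.0/24 to any port 26660 proto tcp   # example monitoring subnet
sudo ufw enable
```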

Sentry Architecture

For production validators:

```
                 Internet
                     |
         +-----------+-----------+
         |           |           |
      Sentry1     Sentry2     Sentry3
         |           |           |
         +-----------+-----------+
                     |
              Private Network
                     |
                 Validator
```

Sentry Configuration

```toml
# On sentries
pex = true
persistent_peers = "validator_id@private-ip:26656"
private_peer_ids = "validator_id"
```

Validator Configuration

```toml
# On validator
pex = false
persistent_peers = "sentry1_id@ip:26656,sentry2_id@ip:26656"
```
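
To fill in the node IDs above, each node can print its own ID; this sketch assumes monod exposes the standard Cosmos SDK CLI:

```bash
# Print this node's ID (run on each sentry and on the validator);
# combine as <node-id>@<ip>:26656 for persistent_peers / private_peer_ids
monod tendermint show-node-id
```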

Monitoring

Essential Metrics

| Metric | Alert Threshold |
| --- | --- |
| Node health | Any failure |
| Block height | Not increasing |
| Peers | Below 5 |
| Disk space | Below 20% free |
| Memory | Above 90% |
| CPU | Sustained above 80% |
| Missed blocks | Any |
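
Most of these values can be read from the node's local RPC. A quick check, assuming the default RPC port 26657 and jq installed:

```bash
# Current height, sync status, and peer count from the local Tendermint RPC
STATUS=$(curl -s http://localhost:26657/status)
echo "height:      $(echo "$STATUS" | jq -r '.result.sync_info.latest_block_height')"
echo "catching_up: $(echo "$STATUS" | jq -r '.result.sync_info.catching_up')"
echo "peers:       $(curl -s http://localhost:26657/net_info | jq -r '.result.n_peers')"
```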

Tools

| Tool | Use Case |
| --- | --- |
| Prometheus | Metrics collection |
| Grafana | Visualization |
| PagerDuty/OpsGenie | Alerting |
| Tendermint metrics | Native node metrics |
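
Tendermint's native metrics are off by default and are toggled in config.toml. A sketch assuming the default home directory ~/.monod:

```bash
# Enable the built-in Prometheus endpoint under [instrumentation]
sed -i 's/^prometheus = false/prometheus = true/' ~/.monod/config/config.toml
# Metrics are served on prometheus_listen_addr (":26660" by default);
# add <node-ip>:26660 as a scrape target in Prometheus and build Grafana dashboards on top
```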

Alerting

Configure alerts for:

  • Node offline
  • Missing blocks
  • Low peers
  • High resource usage
  • Disk space warning
  • Upgrade announcements
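
As one example of a "missing blocks" style check, a cron-able sketch that samples the height twice and posts to a webhook; ALERT_WEBHOOK_URL is a placeholder for your PagerDuty/OpsGenie/Slack endpoint:

```bash
#!/usr/bin/env bash
# Alert if block height stops increasing (assumes local RPC on 26657 and jq)
H1=$(curl -s http://localhost:26657/status | jq -r '.result.sync_info.latest_block_height')
sleep 60
H2=$(curl -s http://localhost:26657/status | jq -r '.result.sync_info.latest_block_height')
if [ "$H2" -le "$H1" ]; then
  # ALERT_WEBHOOK_URL is a placeholder for your alerting service's webhook
  curl -s -X POST -H 'Content-Type: application/json' \
    -d '{"text":"Validator block height is not increasing"}' "$ALERT_WEBHOOK_URL"
fi
```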

Upgrades

Binary Upgrades

  1. Monitor for upgrade announcements
  2. Test the new binary in advance
  3. Use Cosmovisor for automatic upgrades
  4. Have a recovery plan (recovery is forward-only)

Using Cosmovisor

```
# Directory structure
~/.monod/cosmovisor/
├── genesis/bin/monod
├── upgrades/
│   └── upgrade-name/bin/monod
└── current -> genesis (symlink)
```

Cosmovisor automatically switches binaries at upgrade height.
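
A setup sketch assuming the standard Cosmovisor environment variables and the ~/.monod layout above; the path to the new binary is a placeholder:

```bash
export DAEMON_NAME=monod
export DAEMON_HOME=$HOME/.monod
export DAEMON_RESTART_AFTER_UPGRADE=true

# Current binary goes under genesis/
mkdir -p "$DAEMON_HOME/cosmovisor/genesis/bin"
cp "$(which monod)" "$DAEMON_HOME/cosmovisor/genesis/bin/"

# Stage the binary for a scheduled upgrade named "upgrade-name"
mkdir -p "$DAEMON_HOME/cosmovisor/upgrades/upgrade-name/bin"
cp /path/to/new/monod "$DAEMON_HOME/cosmovisor/upgrades/upgrade-name/bin/"

# Run the node through Cosmovisor so it can swap binaries at the upgrade height
cosmovisor run start
```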

Upgrade Checklist

  • Test binary on testnet first
  • Verify binary checksums (see the sketch below)
  • Prepare Cosmovisor if using
  • Schedule maintenance window
  • Have team on standby
  • Monitor during upgrade
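
For the checksum item, a minimal sketch assuming the release publishes a SHA-256 checksums file (checksums.txt is a placeholder name):

```bash
# Compare the downloaded binary against the published checksums
sha256sum monod
sha256sum -c checksums.txt --ignore-missing
```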

Backup and Recovery

What to Backup

| Item | Frequency | Storage |
| --- | --- | --- |
| priv_validator_key.json | Once (secure) | Encrypted, offline |
| node_key.json | Once | Secure storage |
| Operator keys | Once | Hardware wallet |
| Config files | After changes | Version control |

What NOT to Backup

  • Chain data (re-sync instead)
  • Temporary files

Recovery Procedure

  1. Provision new server
  2. Install binary
  3. Restore configuration
  4. Restore validator key (carefully!)
  5. Sync from genesis or a snapshot
  6. Verify the node is fully synced before it starts signing

!!! danger "Avoid Double-Signing"
    Never run two validators with the same key simultaneously. Always stop the old node first.
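
A restore sketch for steps 3-5 above, assuming the key was encrypted with gpg as in the key-management section, the default ~/.monod home, and that the old node is already stopped:

```bash
# Restore the validator key from the encrypted backup (old node must already be stopped)
gpg --decrypt priv_validator_key.json.gpg > ~/.monod/config/priv_validator_key.json
chmod 600 ~/.monod/config/priv_validator_key.json

# After the node is started and synced, confirm it has caught up before relying on it to sign
curl -s http://localhost:26657/status | jq -r '.result.sync_info.catching_up'
```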

Communication

With Delegators

| Practice | Recommendation |
| --- | --- |
| Provide contact info | Let delegators reach you |
| Announce maintenance | Advance notice |
| Post incident reports | Transparency builds trust |
| Share performance data | Uptime, rewards |

With Community

| Practice | Recommendation |
| --- | --- |
| Join validator chats | Coordinate with peers |
| Participate in governance | Vote and discuss |
| Respond to upgrade calls | Be prepared |
| Share knowledge | Help other validators |

Incident Response

Preparation

  1. Document runbooks
  2. Define escalation paths
  3. Practice recovery procedures
  4. Have backup contact methods

During Incident

  1. Acknowledge the incident
  2. Assess severity
  3. Follow runbooks
  4. Communicate status
  5. Escalate if needed

Post-Incident

  1. Document timeline
  2. Identify root cause
  3. Implement fixes
  4. Update procedures
  5. Share learnings

Checklist

Regular validator health check:

  • Node is synced and producing blocks
  • All monitoring is functioning
  • Alerting is working (test it)
  • Disk space is adequate
  • Keys are securely backed up
  • Software is up to date
  • Security patches applied
  • Team knows procedures

FAQ

How often should I check my validator?

Automated monitoring should run 24/7; a manual review once a week is a good baseline.

What uptime should I target?

99.9% or higher. Every missed block costs revenue and reputation.

Should I run my own sentries or use a service?

Running your own sentries gives you full control over the topology. Managed sentry services reduce your operational burden but introduce a third-party dependency.

How do I handle planned maintenance?

Brief downtime that stays under the chain's jailing threshold is fine. Announce it in advance and schedule it for low-traffic periods.