Validator Best Practices
This guide covers operational best practices for running a reliable validator.
Infrastructure
Hardware
| Practice | Recommendation |
|---|---|
| Use quality hardware | Don't cut corners on critical infrastructure |
| Provision headroom | 2x minimum specs for safety margin |
| Use NVMe storage | SSD at minimum, NVMe preferred |
| Monitor hardware health | Track disk, CPU, memory, network |
Network
| Practice | Recommendation |
|---|---|
| Use dedicated IP | Static IP required for P2P |
| Configure firewall | Only open necessary ports |
| Use redundant connectivity | Multiple network paths if possible |
| Monitor latency | High latency affects consensus participation |
Hosting
| Practice | Recommendation |
|---|---|
| Choose reliable providers | Track record matters |
| Geographic diversity | Consider disaster scenarios |
| Avoid shared hosting | Dedicated servers preferred |
| Plan for scaling | Storage will grow |
Security
Key Management
| Practice | Recommendation |
|---|---|
| Secure `priv_validator_key.json` | Most critical file |
| Use hardware security | HSM for production validators |
| Backup keys securely | Offline, encrypted, multiple locations |
| Never copy keys between active nodes | Causes double-signing |
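A minimal sketch of an encrypted offline backup, assuming the node home is `~/.monod` and `gpg`/`shred` are available (adjust paths and tooling to your environment):

```bash
# Archive the key files (paths assume the default ~/.monod home directory)
tar czf validator-keys.tar.gz \
  ~/.monod/config/priv_validator_key.json \
  ~/.monod/config/node_key.json

# Encrypt the archive with a strong passphrase before it leaves the host
gpg --symmetric --cipher-algo AES256 validator-keys.tar.gz

# Remove the plaintext archive, then copy validator-keys.tar.gz.gpg to offline media
shred -u validator-keys.tar.gz
```

Store copies in at least two physically separate locations and test the restore path before you ever need it.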
Access Control
| Practice | Recommendation |
|---|---|
| Use SSH keys only | Disable password auth |
| Restrict SSH access | Whitelist IPs if possible |
| Use non-root user | Run node as unprivileged user |
| Enable fail2ban | Block brute force attempts |
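A rough sketch on a Debian/Ubuntu host (the `monod` user name and package commands are illustrative; adapt to your distribution):

```bash
# Create an unprivileged user to run the node
sudo adduser --disabled-password --gecos "" monod

# Ensure these are set in /etc/ssh/sshd_config (or an sshd_config.d drop-in),
# then restart the SSH service:
#   PasswordAuthentication no
#   PermitRootLogin no
sudo systemctl restart ssh   # the service may be named sshd on other distributions

# Basic brute-force protection
sudo apt install -y fail2ban
sudo systemctl enable --now fail2ban
```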
Network Security
| Practice | Recommendation |
|---|---|
| Minimal open ports | 26656 required, others as needed |
| Use firewall | iptables, ufw, or cloud firewall |
| Consider sentry architecture | Protects validator from DDoS |
| Monitor for intrusion | Log analysis, IDS |
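A hedged `ufw` example (substitute your own admin IP; a cloud firewall achieves the same result):

```bash
# Default-deny inbound, allow outbound
sudo ufw default deny incoming
sudo ufw default allow outgoing

# SSH only from a trusted admin IP (203.0.113.10 is a placeholder)
sudo ufw allow from 203.0.113.10 to any port 22 proto tcp

# Tendermint P2P
sudo ufw allow 26656/tcp

# Open RPC (26657) or Prometheus metrics (26660) only if they genuinely need
# to be reachable from outside, then activate the ruleset
sudo ufw enable
```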
Sentry Architecture
For production validators:
```
              Internet
                  |
      +-----------+-----------+
      |           |           |
   Sentry1     Sentry2     Sentry3
      |           |           |
      +-----------+-----------+
                  |
          Private Network
                  |
              Validator
```
Sentry Configuration
```toml
# On sentries
pex = true
persistent_peers = "validator_id@private-ip:26656"
private_peer_ids = "validator_id"
```
Validator Configuration
```toml
# On validator
pex = false
persistent_peers = "sentry1_id@ip:26656,sentry2_id@ip:26656"
```
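Disabling `pex` on the validator keeps it from participating in peer-exchange gossip, and listing the validator in `private_peer_ids` on the sentries keeps its address from being advertised, so only the sentries are publicly reachable while the validator talks to them over the private network.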
Monitoring
Essential Metrics
| Metric | Alert Threshold |
|---|---|
| Node health | Any failure |
| Block height | Not increasing |
| Peers | Below 5 |
| Disk space | Below 20% free |
| Memory | Above 90% |
| CPU | Sustained above 80% |
| Missed blocks | Any |
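For a quick manual check of height and sync state, assuming the node's RPC is reachable locally on the default port and `jq` is installed:

```bash
# Latest height and whether the node is still catching up (should be false)
curl -s localhost:26657/status | jq '.result.sync_info | {latest_block_height, catching_up}'
```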
Tools
| Tool | Use Case |
|---|---|
| Prometheus | Metrics collection |
| Grafana | Visualization |
| PagerDuty/OpsGenie | Alerting |
| Tendermint metrics | Native node metrics |
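Tendermint's Prometheus endpoint is disabled by default; a one-line sketch to enable it, assuming the default config layout under `~/.monod`:

```bash
# Expose node metrics on :26660 (the default), then restart the node and
# point a Prometheus scrape job at the endpoint
sed -i 's/^prometheus = false/prometheus = true/' ~/.monod/config/config.toml
```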
Alerting
Configure alerts for:
- Node offline
- Missing blocks
- Low peers
- High resource usage
- Disk space warning
- Upgrade announcements
Upgrades
Binary Upgrades
- Monitor for upgrade announcements
- Test new binary in advance
- Use Cosmovisor for automatic upgrades
- Have a recovery plan (recovery is forward-only: once the upgrade height passes you cannot simply roll back to the old binary)
Using Cosmovisor
```
# Directory structure
~/.monod/cosmovisor/
├── genesis/bin/monod
├── upgrades/
│   └── upgrade-name/bin/monod
└── current -> genesis (symlink)
```
Cosmovisor automatically switches binaries at upgrade height.
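A hedged setup sketch, assuming the binary is `monod`, the home is `~/.monod`, and the upgrade directory name matches the name announced in the governance proposal:

```bash
# Cosmovisor reads its configuration from the environment
# (set these in the systemd unit or shell profile)
export DAEMON_NAME=monod
export DAEMON_HOME=$HOME/.monod
export DAEMON_RESTART_AFTER_UPGRADE=true

# Stage the new binary under the announced upgrade name
mkdir -p "$DAEMON_HOME/cosmovisor/upgrades/<upgrade-name>/bin"
cp ./monod-new "$DAEMON_HOME/cosmovisor/upgrades/<upgrade-name>/bin/monod"

# Start the node through Cosmovisor instead of invoking monod directly
cosmovisor run start
```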
Upgrade Checklist
- Test binary on testnet first
- Verify binary checksums
- Prepare Cosmovisor if using
- Schedule maintenance window
- Have team on standby
- Monitor during upgrade
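Checksum verification is a one-liner; the file names here are illustrative:

```bash
# Compare against the checksum published with the release
sha256sum monod
# Or, if the release ships a checksum file:
sha256sum -c checksums.txt --ignore-missing
```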
Backup and Recovery
What to Backup
| Item | Frequency | Storage |
|---|---|---|
| `priv_validator_key.json` | Once (secure) | Encrypted, offline |
| `node_key.json` | Once | Secure storage |
| Operator keys | Once | Hardware wallet |
| Config files | After changes | Version control |
What NOT to Backup
- Chain data (re-sync instead)
- Temporary files
Recovery Procedure
- Provision new server
- Install binary
- Restore configuration
- Restore validator key (carefully!)
- Sync from genesis or snapshot
- Verify before starting validation
!!! danger "Avoid Double-Signing"
    Never run two validators with the same key simultaneously. Always stop the old node first.
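A short sketch of the final safety checks, assuming a systemd service named `monod` and a locally reachable RPC endpoint:

```bash
# On the old server (if it is still reachable): make sure it can never sign again
sudo systemctl stop monod && sudo systemctl disable monod

# On the new server: confirm the node has finished syncing before it validates
curl -s localhost:26657/status | jq '.result.sync_info.catching_up'   # must be false
```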
Communication
With Delegators
| Practice | Recommendation |
|---|---|
| Provide contact info | Let delegators reach you |
| Announce maintenance | Advance notice |
| Post incident reports | Transparency builds trust |
| Share performance data | Uptime, rewards |
With Community
| Practice | Recommendation |
|---|---|
| Join validator chats | Coordinate with peers |
| Participate in governance | Vote and discuss |
| Respond to upgrade calls | Be prepared |
| Share knowledge | Help other validators |
Incident Response
Preparation
- Document runbooks
- Define escalation paths
- Practice recovery procedures
- Have backup contact methods
During Incident
- Acknowledge the incident
- Assess severity
- Follow runbooks
- Communicate status
- Escalate if needed
Post-Incident
- Document timeline
- Identify root cause
- Implement fixes
- Update procedures
- Share learnings
Checklist
Regular validator health check:
- Node is synced and producing blocks
- All monitoring is functioning
- Alerting is working (test it)
- Disk space is adequate
- Keys are securely backed up
- Software is up to date
- Security patches applied
- Team knows procedures
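The routine checks can be scripted; a rough sketch assuming a systemd service named `monod`, local RPC on 26657, and `jq` installed:

```bash
#!/usr/bin/env bash
# Weekly validator health check (exits non-zero on the first failed check)
set -euo pipefail

systemctl is-active monod                                                          # process running
curl -s localhost:26657/status   | jq -e '.result.sync_info.catching_up == false'  # synced
curl -s localhost:26657/net_info | jq -e '.result.n_peers | tonumber >= 5'         # enough peers
df -h ~/.monod                                                                     # disk headroom
```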
FAQ
How often should I check my validator?
Automated monitoring 24/7. Manual check weekly.
What uptime should I target?
99.9% or higher. Every missed block is lost revenue and reputation.
Should I run my own sentries or use a service?
Running your own sentries gives you full control over the topology; a managed sentry service reduces operational burden but adds a third-party dependency.
How do I handle planned maintenance?
Brief downtime shorter than the chain's jailing threshold is fine. Announce it in advance and schedule it during low-activity periods.
Related
- Requirements - Infrastructure needs
- Slashing - Avoiding penalties
- Monitoring - Monitoring setup
- Upgrades - Upgrade procedures