Monitoring
This guide covers monitoring your Monolythium node for health and performance.
Key Metrics
Node Health
| Metric | What to Monitor |
|---|---|
| Process running | Is the `monod` process alive? |
| Block height | Is it increasing? |
| Sync status | Is the node caught up? |
| Peer count | Are peers connected? |
System Resources
| Metric | Alert Threshold |
|---|---|
| CPU usage | > 80% sustained |
| Memory usage | > 90% |
| Disk usage | > 80% |
| Disk I/O | Sustained high I/O wait |
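The disk and memory thresholds above can be checked from a shell without extra tooling. This is a minimal sketch that assumes a Linux host (`/proc/meminfo`, where `MemAvailable` requires kernel 3.14+) and a POSIX `df`. The helper names and the default data path `~/.monod` are illustrative, not part of `monod`:

```bash
# Percent of the filesystem holding PATH that is in use (integer, no % sign).
disk_usage_pct() {
  # POSIX output (-P): column 5 of the second line is the capacity, e.g. "42%"
  df -P "${1:-/}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Percent of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal.
# Reads /proc/meminfo by default; pass another file to try it on sample data.
mem_usage_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' \
    "${1:-/proc/meminfo}"
}

# Print a warning for each resource above the thresholds in the table.
check_resources() {
  local data_dir="${1:-$HOME/.monod}" disk mem
  disk=$(disk_usage_pct "$data_dir")
  mem=$(mem_usage_pct)
  if [ "${disk:-0}" -gt 80 ]; then echo "WARNING: disk usage ${disk}% on $data_dir"; fi
  if [ "${mem:-0}" -gt 90 ]; then echo "WARNING: memory usage ${mem}%"; fi
}
```

Pointing `check_resources` at the node's data directory matters when chain data lives on a separate volume from `/`.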
Validator-Specific
| Metric | What to Monitor |
|---|---|
| Missed blocks | Any missed blocks |
| Signing status | Precommits are being signed |
| Jailed status | Validator is not jailed |
Prometheus Metrics
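`monod` exposes Prometheus metrics when instrumentation is enabled.
Enable Metrics
Edit `config.toml` in your node's config directory (for example, `~/.monod/config/config.toml`), then restart `monod`:
```toml
[instrumentation]
prometheus = true
# ":26660" listens on all interfaces; use "127.0.0.1:26660" if Prometheus runs on the same host
prometheus_listen_addr = ":26660"
```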
Key Metrics
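| Metric | Description |
|---|---|
| `tendermint_consensus_height` | Current block height |
| `tendermint_consensus_validators` | Number of validators |
| `tendermint_consensus_missing_validators` | Number of validators that did not sign the last block |
| `tendermint_p2p_peers` | Connected peer count |
| `tendermint_mempool_size` | Pending transactions in the mempool |

If your `monod` build is based on CometBFT 0.37 or later, these metrics may use the `cometbft_` prefix instead. The prefix is set by `namespace` under `[instrumentation]` in `config.toml`.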
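To read one of these values from a script without going through Prometheus, you can fetch the metrics page directly and pick the metric out of the text exposition format. This is a minimal sketch that assumes metrics are enabled on `localhost:26660` and that the metric has a single series. `parse_metric` and `metric_value` are hypothetical helper names:

```bash
# Print the value of metric NAME from Prometheus text-format input on stdin.
# Lines look like: tendermint_consensus_height{chain_id="x"} 12345
parse_metric() {
  awk -v m="$1" '
    /^#/ { next }                        # skip HELP and TYPE comment lines
    {
      line = $0
      sub(/[{].*[}]/, "", line)          # drop the label set, if any
      split(line, f, " ")
      if (!found && f[1] == m) { print f[2]; found = 1 }
    }'
}

# Fetch the metrics page and print NAME's current value.
metric_value() {
  curl -sf "${2:-http://localhost:26660/metrics}" | parse_metric "$1"
}
```

For example, `metric_value tendermint_consensus_height` prints the current height. Running it twice a few seconds apart shows whether the height is increasing.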
Prometheus Setup
Install Prometheus
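```bash
# Download and unpack
VERSION=2.45.0
wget https://github.com/prometheus/prometheus/releases/download/v${VERSION}/prometheus-${VERSION}.linux-amd64.tar.gz
tar xvf prometheus-${VERSION}.linux-amd64.tar.gz
# Use the exact directory name: "cd prometheus-*" would also match the tarball and fail
cd prometheus-${VERSION}.linux-amd64
# Configure: scrape monod's metrics endpoint every 15 seconds
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'monod'
    static_configs:
      - targets: ['localhost:26660']
EOF
# Run
./prometheus --config.file=prometheus.yml
```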
Access
Open http://localhost:9090 in your browser.
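Besides the web UI, Prometheus answers PromQL queries over its HTTP API (`/api/v1/query`), which is a quick way to confirm that the `monod` target is being scraped. This is a minimal sketch that assumes Prometheus on `localhost:9090`, the `monod` job name from the configuration above, and `jq`. The helper names are hypothetical:

```bash
# Print the value of up{job="monod"} from an /api/v1/query response on stdin:
# "1" = last scrape succeeded, "0" = it failed, nothing = no such series.
parse_up() {
  jq -r '.data.result[0].value[1] // empty'
}

# Query Prometheus for the monod target's scrape status.
monod_target_up() {
  curl -sfG "${1:-http://localhost:9090}/api/v1/query" \
    --data-urlencode 'query=up{job="monod"}' | parse_up
}
```

An empty result usually means the job name in the query does not match `job_name` in `prometheus.yml`.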
Grafana Dashboards
Install Grafana
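```bash
# Ubuntu/Debian: add Grafana's APT repository (apt-key is deprecated, so use a keyring file)
sudo apt install -y apt-transport-https software-properties-common wget
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana
# Start now and on every boot
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
```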
Access
Open http://localhost:3000 in your browser. The default login is admin/admin; Grafana prompts you to change the password on first login.
Dashboard Setup
- Add Prometheus as a data source (URL: http://localhost:9090)
- Import a Cosmos/Tendermint dashboard from the Grafana dashboard library
- Customize panels and thresholds for your needs
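Instead of adding the data source by hand, Grafana can load it from a provisioning file at startup. This is a minimal sketch that assumes the Debian/Ubuntu package layout (`/etc/grafana/provisioning/datasources`) and Prometheus on `localhost:9090`. The function name and the file name `monod-prometheus.yaml` are illustrative:

```bash
# Write a Grafana data source provisioning file into DIR.
write_prometheus_datasource() {
  local dir="${1:-/etc/grafana/provisioning/datasources}"
  local url="${2:-http://localhost:9090}"
  mkdir -p "$dir"
  cat > "$dir/monod-prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}
```

The default directory is owned by root, so run this as root and then restart Grafana with `sudo systemctl restart grafana-server`. The data source appears without any clicks in the UI.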
Alerting
Prometheus Alertmanager
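```yaml
# alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'
  # Most SMTP relays also require credentials:
  # smtp_auth_username: 'alerts@example.com'
  # smtp_auth_password: 'change-me'
route:
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'operator@example.com'
```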
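Before relying on email delivery, push a synthetic alert through Alertmanager and confirm it arrives (the "Test alerts" item under Best Practices). Alertmanager's v2 API accepts alerts as a JSON array at `POST /api/v2/alerts`. This is a minimal sketch that assumes Alertmanager on its default port 9093 and `jq`. The alert name `MonodTestAlert` and the helper names are made up for the test:

```bash
# Build a one-element alert list in the Alertmanager v2 API format.
test_alert_payload() {
  jq -n --arg name "${1:-MonodTestAlert}" '[{
    labels: { alertname: $name, severity: "warning", job: "monod" },
    annotations: { summary: "Test alert, please ignore" }
  }]'
}

# POST the alert; Alertmanager routes it like any other firing alert.
send_test_alert() {
  test_alert_payload "${1:-MonodTestAlert}" | curl -sf -X POST \
    -H 'Content-Type: application/json' \
    --data-binary @- "${2:-http://localhost:9093}/api/v2/alerts"
}
```

After `send_test_alert`, the email should arrive once the route's `group_wait` (30 seconds by default) has passed. Because no `endsAt` is set, the alert resolves on its own after `resolve_timeout` (5 minutes by default).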
Alert Rules
```yaml
# alerts.yml
groups:
  - name: monod
    rules:
      - alert: NodeDown
        expr: up{job="monod"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Monod node is down"
      - alert: MissedBlocks
        # Counts every validator in the set that missed the last block, not only yours
        expr: tendermint_consensus_missing_validators > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "One or more validators are missing blocks"
      - alert: LowPeers
        expr: tendermint_p2p_peers < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
```
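Prometheus only evaluates these rules once `prometheus.yml` lists the rule file and knows where Alertmanager is. This is a minimal sketch that appends both sections, assuming `alerts.yml` sits next to `prometheus.yml` and Alertmanager listens on its default port 9093. `enable_alerting` is a hypothetical helper:

```bash
# Append rule_files and alerting sections to a Prometheus config, only once.
enable_alerting() {
  local cfg="${1:-prometheus.yml}"
  if grep -q '^rule_files:' "$cfg"; then
    echo "rule_files already configured in $cfg" >&2
    return 0
  fi
  cat >> "$cfg" <<'EOF'

rule_files:
  - alerts.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']
EOF
}
```

Afterwards, `./promtool check config prometheus.yml` validates the config together with the rule files it references. Send Prometheus a `SIGHUP` (or restart it) to pick up the change.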
Command Line Checks
Node Status
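```bash
# Check if synced ("catching_up": false means the node is caught up)
curl -s localhost:26657/status | jq '.result.sync_info'
# Check peers
curl -s localhost:26657/net_info | jq '.result.n_peers'
# Check health (an empty result means the RPC server is up)
curl -s localhost:26657/health
```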
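`catching_up` can stay `false` while a node slowly falls behind, so it also helps to compare your height against another RPC node you trust. This is a minimal sketch that assumes `jq` and a second RPC endpoint. The reference URL you pass in is your own choice (not an official Monolythium endpoint), and the helper names are hypothetical:

```bash
# Print latest_block_height from a /status response on stdin (nothing if absent).
status_height() {
  jq -r '.result.sync_info.latest_block_height // empty'
}

# Print how many blocks the local node trails the reference node.
height_lag() {
  local local_rpc="${1:-http://localhost:26657}" ref_rpc="${2:-}" mine ref
  if [ -z "$ref_rpc" ]; then
    echo "usage: height_lag LOCAL_RPC REFERENCE_RPC" >&2
    return 1
  fi
  mine=$(curl -sf --max-time 5 "$local_rpc/status" | status_height)
  ref=$(curl -sf --max-time 5 "$ref_rpc/status" | status_height)
  if [ -z "$mine" ] || [ -z "$ref" ]; then
    echo "could not read block height" >&2
    return 1
  fi
  echo $(( ref - mine ))
}
```

A lag that keeps growing between runs points to a node that is falling behind, even if it still reports itself as synced.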
Validator Status
```bash
# Check signing info
monod query slashing signing-info "$(monod tendermint show-validator)"
# Check if jailed (replace "validator" with the name of your validator key)
monod query staking validator "$(monod keys show validator --bech val -a)" | grep jailed
```
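For alerting, the `missed_blocks_counter` field of the signing info (misses within the current signing window) is easier to act on than the full output. This is a minimal sketch that assumes `jq` and that `monod` accepts `--output json` like other Cosmos SDK CLIs. Some SDK versions wrap the result (for example in `val_signing_info`) or use camelCase field names, so the filter searches the whole document. The helper names and the default threshold of 0 are assumptions:

```bash
# Print missed_blocks_counter from signing-info JSON on stdin, wherever it is nested.
missed_blocks() {
  jq -r '[.. | objects | (.missed_blocks_counter // .missedBlocksCounter)
          | select(. != null)][0] // empty'
}

# Nagios-style result: 0 OK, 1 WARNING (above THRESHOLD), 3 UNKNOWN.
check_missed_blocks() {
  local threshold="${1:-0}" missed
  missed=$(monod query slashing signing-info "$(monod tendermint show-validator)" \
    --output json | missed_blocks)
  if [ -z "$missed" ]; then
    echo "UNKNOWN: could not read missed_blocks_counter"
    return 3
  fi
  if [ "$missed" -gt "$threshold" ]; then
    echo "WARNING: $missed missed blocks in the signing window"
    return 1
  fi
  echo "OK: $missed missed blocks"
}
```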
Log Monitoring
View Logs
```bash
# Recent logs
sudo journalctl -u monod -n 100
# Follow logs
sudo journalctl -u monod -f
# Filter errors (newer CometBFT builds tag error lines with the level "ERR")
sudo journalctl -u monod --no-pager | grep -E ' ERR |[Ee]rror'
```
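To turn log filtering into an alert, count error lines over a recent window and compare the count against a threshold. This is a minimal sketch. The default threshold of 10 and the match pattern are assumptions to tune, because the log format (`ERR` level tags versus the word "error" in messages) varies between Tendermint and CometBFT versions. The helper names are hypothetical:

```bash
# Count lines on stdin that look like errors: an ERR level tag or the word "error".
count_error_lines() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the status at 0
  grep -cE '(^|[[:space:]])ERR[[:space:]]|[Ee]rror|ERROR' || true
}

# Warn if monod logged more than THRESHOLD error lines in the last hour.
check_recent_errors() {
  local threshold="${1:-10}" n
  n=$(journalctl -u monod --since "1 hour ago" --no-pager | count_error_lines)
  if [ "$n" -gt "$threshold" ]; then
    echo "WARNING: $n error lines in the last hour"
    return 1
  fi
  echo "OK: $n error lines in the last hour"
}
```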
Log Aggregation
Consider tools like:
- Loki + Grafana
- ELK Stack (Elasticsearch, Logstash, Kibana)
- Datadog
- Splunk
Health Check Script
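```bash
#!/bin/bash
# health-check.sh
# Exit codes follow the Nagios convention: 0 = OK, 1 = WARNING, 2 = CRITICAL
RPC="http://localhost:26657"

# Check process
if ! pgrep -x "monod" > /dev/null; then
  echo "CRITICAL: monod not running"
  exit 2
fi

# Check that RPC answers (an empty reply would break the checks below)
STATUS=$(curl -sf --max-time 5 "$RPC/status") || {
  echo "CRITICAL: RPC not responding at $RPC"
  exit 2
}

# Check sync status
CATCHING_UP=$(echo "$STATUS" | jq -r '.result.sync_info.catching_up')
if [ "$CATCHING_UP" = "true" ]; then
  echo "WARNING: Node is catching up"
  exit 1
fi

# Check peers (a missing or non-numeric value is also treated as a warning)
PEERS=$(curl -sf --max-time 5 "$RPC/net_info" | jq -r '.result.n_peers')
if ! [ "$PEERS" -ge 3 ] 2>/dev/null; then
  echo "WARNING: Low peer count: ${PEERS:-unknown}"
  exit 1
fi

echo "OK: Node healthy"
exit 0
```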
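The script's exit code (0 OK, 1 warning, 2 critical) can also feed Prometheus through node_exporter's textfile collector, so the same Alertmanager routing covers it. This is a minimal sketch that assumes node_exporter runs with `--collector.textfile.directory` pointing at the directory you pass in. The function name and the metric name `monod_health_check_status` are hypothetical:

```bash
# Run a check command and publish its exit code as a gauge for node_exporter.
# Writes a temp file in the same directory and renames it, so the collector
# never reads a half-written file.
export_health_metric() {
  local dir="$1"; shift
  local status=0 tmp
  "$@" > /dev/null 2>&1 || status=$?
  tmp=$(mktemp "$dir/.monod_health.XXXXXX")
  cat > "$tmp" <<EOF
# HELP monod_health_check_status Exit code of the last health check (0 OK, 1 warning, 2 critical).
# TYPE monod_health_check_status gauge
monod_health_check_status $status
EOF
  chmod 644 "$tmp"   # mktemp creates 0600; node_exporter must be able to read it
  mv "$tmp" "$dir/monod_health.prom"
}
```

For example, call `export_health_metric /var/lib/node_exporter/textfile ./health-check.sh` from a small wrapper run every minute (the directory is only an example). An alert rule on `monod_health_check_status > 0` then reuses your existing alerting.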
Uptime Monitoring
External services that can monitor your node:
- UptimeRobot
- Pingdom
- StatusCake
- Custom scripts
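Monitor the RPC endpoint externally:
```bash
# Port 26657 serves plain HTTP; use https:// only if a TLS reverse proxy sits in front of it
curl -s http://your-node:26657/health
```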
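`/health` only shows that the RPC server answers, so a node that is running but stalled still passes. Checking how old the latest block is catches that case too. This is a minimal sketch that assumes `jq` 1.6 or later and the `/status` response format used in Command Line Checks. The 60-second default limit is an assumption to adjust for your network's block time, and the helper names are hypothetical:

```bash
# Seconds between latest_block_time in a /status response (stdin) and NOW (epoch seconds).
# Fractional seconds are stripped first because fromdateiso8601 only accepts
# the form YYYY-MM-DDTHH:MM:SSZ.
block_age() {
  local now="${1:-$(date +%s)}"
  jq -r --argjson now "$now" '
    .result.sync_info.latest_block_time
    | sub("\\.[0-9]+"; "")
    | fromdateiso8601
    | $now - . | floor'
}

# Return 2 (critical) if the latest block is older than MAX seconds or unreadable.
check_block_age() {
  local rpc="${1:-http://localhost:26657}" max="${2:-60}" age
  age=$(curl -sf --max-time 5 "$rpc/status" | block_age)
  if [ -z "$age" ] || [ "$age" -gt "$max" ]; then
    echo "CRITICAL: latest block is ${age:-unknown}s old"
    return 2
  fi
  echo "OK: latest block is ${age}s old"
}
```

Run `check_block_age http://your-node:26657` from a machine outside your node's network so that the check also covers connectivity.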
Opt-In Node Monitoring
Monolythium provides an opt-in monitoring service that tracks your node's health and sends alerts via Telegram. This service is privacy-preserving: no IPs, hostnames, or RPC URLs are ever exposed publicly.
What Gets Monitored
| Data Sent | Purpose |
|---|---|
| Node ID | Unique identifier (derived from node_key.json) |
| Block height | Sync status tracking |
| Version info | monod and monoctl versions |
| Chain ID | Network verification |
| Capabilities | Archive, state-sync, snapshots |
Never sent: IP addresses, hostnames, RPC URLs, private keys.
Register Your Node
```bash
# Register with the monitoring service
monoctl monitor register --network Sprintnet --moniker "my-node" --role fullnode --home ~/.monod
# Roles: validator, seed, bootstrap, fullnode, gateway, snapshot
```
This generates an ed25519 keypair for signing heartbeats and returns a link token. Send this token to @MonolythiumBot on Telegram to complete registration and receive alerts.
Install Automatic Heartbeats
```bash
# Install systemd timer (sends heartbeat every 30 seconds)
sudo monoctl monitor install --network Sprintnet --user $(whoami) --home ~/.monod
# Verify timer is active
sudo systemctl status monoctl-monitor@Sprintnet.timer
```
Manual Heartbeat
```bash
# Send a single heartbeat (useful for testing)
monoctl monitor heartbeat --network Sprintnet --home ~/.monod
```
Control Visibility
By default, registered nodes are private and don't appear on the public explorer. To make your node visible:
```bash
# Make node public (visible on explorer)
monoctl monitor visibility --public --network Sprintnet --home ~/.monod
# Make node private again
monoctl monitor visibility --private --network Sprintnet --home ~/.monod
```
Public nodes appear at monoscan.xyz/sprintnet/nodes.
Uninstall
```bash
# Remove the systemd timer
sudo monoctl monitor uninstall --network Sprintnet
```
Telegram Bot Commands
Once registered, you can interact with @MonolythiumBot:
| Command | Description |
|---|---|
| `/start` | Welcome message and setup |
| `/help` | List available commands |
| `/status` | View your registered nodes |
| `/pause <moniker> <minutes>` | Pause alerts (e.g., `/pause my-node 60`) |
| `/resume <moniker>` | Resume alerts for a node |
| `/deregister <moniker>` | Remove node from monitoring |
Pause alerts when doing maintenance:
```text
/pause my-node 60
```
This pauses alerts for 60 minutes (maximum 24 hours). Use `/resume my-node` to resume earlier.
Alert Types
The bot sends alerts when:
- Node goes offline (no heartbeat for 2+ minutes)
- Node starts lagging behind the canonical chain height
- Node becomes stalled (height not increasing)
Best Practices
- Monitor 24/7 - Use automated monitoring
- Set up alerts - Know about issues immediately
- Test alerts - Verify notifications work
- Monitor trends - Catch issues before they're critical
- Keep dashboards - Visualize health at a glance
- Log retention - Keep logs for troubleshooting
FAQ
What's the minimum monitoring setup?
At least: process monitoring, disk space alerts, and external ping.
Should I expose metrics publicly?
No. Keep Prometheus metrics internal or behind authentication.
How often should I check?
Automated: every 15-30 seconds. Manual: at least daily review of dashboards.
Related
- Requirements - Node requirements
- Best Practices - Operational excellence