Monitoring

This guide covers monitoring your Monolythium node for health and performance.

Key Metrics

Node Health

Metric	What to Monitor
Process running	Is the monod process alive?
Block height	Is it increasing?
Sync status	Is the node caught up (catching_up is false)?
Peer count	Are peers connected?

System Resources

Metric	Alert Threshold
CPU usage	> 80% sustained
Memory usage	> 90%
Disk usage	> 80%
Disk I/O	High sustained
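
The disk and memory thresholds in the table above can be checked from a shell. The sketch below is one minimal way to do it, not an official tool: the function names are hypothetical, the 80% and 90% limits come from the table, and the ~/.monod data directory is assumed from the --home flag used later in this guide.

```shell
#!/bin/bash
# disk_used_pct PATH: print the used percentage (integer, no % sign) of the
# filesystem that holds PATH, using POSIX-format df output.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# mem_used_pct: read /proc/meminfo-style text on stdin and print the share of
# memory in use, computed as 100 * (MemTotal - MemAvailable) / MemTotal.
mem_used_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", 100 * (total - avail) / total }'
}

# over_limit VALUE LIMIT: succeed (exit 0) when VALUE is strictly above LIMIT.
over_limit() {
    [ "$1" -gt "$2" ]
}

# Usage with the thresholds from the table:
#   over_limit "$(disk_used_pct ~/.monod)" 80 && echo "WARNING: disk above 80%"
#   over_limit "$(mem_used_pct < /proc/meminfo)" 90 && echo "WARNING: memory above 90%"
```

A single reading above the limit is not the same as "sustained" usage; for that, the trend is better taken from Prometheus or another time-series source.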

Validator-Specific

Metric	What to Monitor
Missed blocks	Any misses
Signing status	Pre-commits signed
Jailed status	Not jailed

Prometheus Metrics

monod exposes Prometheus metrics once instrumentation is enabled in config.toml. Restart the node after changing the setting.

Enable Metrics

Edit config.toml:

[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"

Key Metrics

Metric	Description
`tendermint_consensus_height`	Current block height
`tendermint_consensus_validators`	Number of validators
`tendermint_consensus_missing_validators`	Validators not signing
`tendermint_p2p_peers`	Connected peer count
`tendermint_mempool_size`	Pending transactions
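
To spot-check one of these metrics without a full Prometheus setup, you can read the metrics endpoint directly. The helper below is a small sketch: the function name is hypothetical, it assumes the listen address shown in config.toml above, and it assumes samples carry no trailing timestamp. If your build uses a different metric namespace, the tendermint_ prefix will differ.

```shell
#!/bin/bash
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose metric name is exactly NAME.
metric_value() {
    awk -v name="$1" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # drop the {label="..."} part, if any
            if (metric == name) { print $NF; exit }
        }'
}

# Usage:
#   curl -s localhost:26660/metrics | metric_value tendermint_consensus_height
#   curl -s localhost:26660/metrics | metric_value tendermint_p2p_peers
```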

Prometheus Setup

Install Prometheus

# Download
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
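tar xvf prometheus-2.45.0.linux-amd64.tar.gz
# Use the full directory name: the glob prometheus-* would also match the tarball
cd prometheus-2.45.0.linux-amd64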

# Configure
cat > prometheus.yml <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'monod'
    static_configs:
      - targets: ['localhost:26660']
EOF

# Run
./prometheus

Access

Open http://localhost:9090 in a browser. The Status > Targets page shows whether the monod target is being scraped.

Grafana Dashboards

Install Grafana

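# Ubuntu
sudo apt install -y apt-transport-https software-properties-common wget gpg
# apt-key is deprecated; store the signing key in a dedicated keyring instead
sudo mkdir -p /etc/apt/keyrings
wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee /etc/apt/sources.list.d/grafana.list
sudo apt update
sudo apt install -y grafana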

# Start
sudo systemctl enable grafana-server
sudo systemctl start grafana-server

Access

Open http://localhost:3000 and log in (default credentials: admin/admin). Grafana asks you to set a new password on first login.

Dashboard Setup

Add Prometheus as a data source (http://localhost:9090 if it runs on the same host)
Import a Cosmos/Tendermint dashboard
Customize the panels and thresholds for your needs
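
As an alternative to adding the data source by hand in the UI, Grafana can load it from a provisioning file at startup. The sketch below assumes the Prometheus address from the previous section and the provisioning directory used by the Debian package (/etc/grafana/provisioning/datasources). The function name and the file name are hypothetical.

```shell
#!/bin/bash
# write_grafana_datasource DIR: write a provisioning file into DIR that
# registers the local Prometheus server as Grafana's default data source.
write_grafana_datasource() {
    local dir="$1"
    cat > "$dir/prometheus.yaml" <<'EOF'
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
EOF
}

# Usage (copying under /etc/grafana needs root; restart Grafana to load it):
#   write_grafana_datasource /tmp
#   sudo cp /tmp/prometheus.yaml /etc/grafana/provisioning/datasources/
#   sudo systemctl restart grafana-server
```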

Alerting

Prometheus Alertmanager

# alertmanager.yml
global:
  smtp_smarthost: 'smtp.example.com:587'
  smtp_from: 'alerts@example.com'

route:
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'operator@example.com'
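
To confirm that the route and receiver actually deliver mail, you can post a synthetic alert to Alertmanager. This is a sketch under assumptions: Alertmanager is listening on its default port 9093 on the same host, and the helper name and the label values are made up for illustration.

```shell
#!/bin/bash
# test_alert_payload NAME: print a JSON array holding one alert with the given
# alertname, in the shape accepted by Alertmanager's POST /api/v2/alerts.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"warning","job":"monod"},' "$1"
    printf '"annotations":{"summary":"Test alert, safe to ignore"}}]\n'
}

# Usage:
#   test_alert_payload TestAlert |
#       curl -s -X POST -H 'Content-Type: application/json' \
#            --data @- http://localhost:9093/api/v2/alerts
```

If no mail arrives, the Alertmanager log is the first place to look, since SMTP servers commonly require authentication settings that the minimal configuration above does not include.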

Alert Rules

# alerts.yml
groups:
  - name: monod
    rules:
      - alert: NodeDown
        expr: up{job="monod"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Monod node is down"

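      # Note: this metric counts validators across the whole set that did not
      # sign the last block, not only your own validator.
      - alert: MissedBlocks
        expr: tendermint_consensus_missing_validators > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "One or more validators in the set are not signing"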

      - alert: LowPeers
        expr: tendermint_p2p_peers < 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
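
Prometheus only evaluates these rules if prometheus.yml references the rule file, and it only forwards firing alerts if an Alertmanager target is configured. The sketch below writes a configuration that does both. It assumes alerts.yml sits next to prometheus.yml and that Alertmanager runs on the same host on its default port 9093; the function name is hypothetical.

```shell
#!/bin/bash
# write_prometheus_config DIR: write DIR/prometheus.yml that scrapes monod,
# loads alerts.yml, and forwards firing alerts to Alertmanager.
write_prometheus_config() {
    local dir="$1"
    cat > "$dir/prometheus.yml" <<'EOF'
global:
  scrape_interval: 15s

rule_files:
  - alerts.yml              # resolved relative to prometheus.yml

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # assumption: Alertmanager on the same host

scrape_configs:
  - job_name: 'monod'
    static_configs:
      - targets: ['localhost:26660']
EOF
}

# Usage, from the Prometheus directory:
#   write_prometheus_config .
#   ./promtool check config prometheus.yml   # also checks the referenced rule files
#   ./promtool check rules alerts.yml
```

promtool ships in the same release archive as the prometheus binary, so validating the files before a restart costs nothing.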

Command Line Checks

Node Status

# Check if synced
curl -s localhost:26657/status | jq '.result.sync_info'

# Check peers
curl -s localhost:26657/net_info | jq '.result.n_peers'

# Check health
curl -s localhost:26657/health
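
A single status call shows the current height but not whether it is still moving. One way to detect a stalled node is to compare the height with the value seen on the previous run, as sketched below. The function name and the state file path are hypothetical.

```shell
#!/bin/bash
# height_advanced STATE_FILE HEIGHT: compare HEIGHT with the value saved in
# STATE_FILE by the previous run, then save HEIGHT for the next run.
# Exit 0 if the height increased (or there is no previous value), 1 if not.
height_advanced() {
    local state_file="$1" height="$2" previous=""
    [ -f "$state_file" ] && previous=$(cat "$state_file")
    echo "$height" > "$state_file"
    [ -z "$previous" ] || [ "$height" -gt "$previous" ]
}

# Usage, e.g. once a minute from cron:
#   HEIGHT=$(curl -s localhost:26657/status | jq -r '.result.sync_info.latest_block_height')
#   height_advanced /tmp/monod-height "$HEIGHT" || echo "WARNING: block height stalled at $HEIGHT"
```

The interval between runs should be comfortably longer than the chain's block time, otherwise two runs inside the same block would look like a stall.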

Validator Status

# Check signing info
monod query slashing signing-info $(monod tendermint show-validator)

# Check if jailed
monod query staking validator $(monod keys show validator --bech val -a) | grep jailed

Log Monitoring

View Logs

# Recent logs
sudo journalctl -u monod -n 100

# Follow logs
sudo journalctl -u monod -f

# Filter errors
sudo journalctl -u monod | grep -i error
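
Grepping the full journal gets slow as it grows, and a raw list of matches is hard to alert on. A rough sketch of a bounded check is to count error lines in a recent time window. The function name is hypothetical, and the match on "ERR" assumes the node logs its level as a short uppercase tag, which may not hold for every log format.

```shell
#!/bin/bash
# count_error_lines: read log lines on stdin and print how many of them
# contain "error", "errors", "err" or "panic" as a whole word (any case).
count_error_lines() {
    # grep -c exits 1 when the count is 0, so do not treat that as a failure
    grep -ciwE 'errors?|err|panic' || true
}

# Usage:
#   sudo journalctl -u monod --since "10 minutes ago" --no-pager | count_error_lines
```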

Log Aggregation

Consider tools like:

Loki + Grafana
ELK Stack (Elasticsearch, Logstash, Kibana)
Datadog
Splunk

Health Check Script

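#!/bin/bash
# health-check.sh
# Exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL

RPC="localhost:26657"

# Check process
if ! pgrep -x "monod" > /dev/null; then
    echo "CRITICAL: monod not running"
    exit 2
fi

# Check sync status (an empty or null result means the RPC did not answer)
CATCHING_UP=$(curl -s --max-time 5 "$RPC/status" | jq -r '.result.sync_info.catching_up')
if [ -z "$CATCHING_UP" ] || [ "$CATCHING_UP" = "null" ]; then
    echo "CRITICAL: RPC not responding on $RPC"
    exit 2
fi
if [ "$CATCHING_UP" = "true" ]; then
    echo "WARNING: Node is catching up"
    exit 1
fi

# Check peers (treat a missing or non-numeric value as 0 peers)
PEERS=$(curl -s --max-time 5 "$RPC/net_info" | jq -r '.result.n_peers')
case "$PEERS" in
    ''|*[!0-9]*) PEERS=0 ;;
esac
if [ "$PEERS" -lt 3 ]; then
    echo "WARNING: Low peer count: $PEERS"
    exit 1
fi

echo "OK: Node healthy"
exit 0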
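
The script reports its result through the exit code (0 OK, 1 warning, 2 critical), so it can be driven by cron or any scheduler. The wrapper below is a sketch of one way to keep a history of results. The function name and the file paths in the usage comment are hypothetical.

```shell
#!/bin/bash
# run_check LOGFILE COMMAND [ARGS...]: run a check command that follows the
# 0/1/2 exit-code convention, append a timestamped result line to LOGFILE,
# and return the command's exit code so the caller can react to it.
run_check() {
    local logfile="$1" output code
    shift
    output=$("$@" 2>&1) && code=0 || code=$?
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) exit=$code $output" >> "$logfile"
    return "$code"
}

# Usage, after saving this function to a file such as run-check.sh:
#   source ./run-check.sh
#   run_check /var/log/monod-health.log ./health-check.sh || echo "check failed"
```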

Uptime Monitoring

External services to monitor your node:

UptimeRobot
Pingdom
StatusCake
Custom scripts

Monitor the RPC endpoint from outside your network:

curl -s https://your-node:26657/health
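
A plain curl -s prints nothing on failure and can hang on an unresponsive host, which makes it awkward inside a custom script. The sketch below wraps the same request with a timeout and a meaningful exit code. The function name is hypothetical.

```shell
#!/bin/bash
# rpc_healthy URL: succeed only if URL answers within 5 seconds without an
# HTTP error status (4xx/5xx). -f turns HTTP errors into a non-zero exit and
# --max-time bounds the whole request so a hung connection cannot block.
rpc_healthy() {
    curl -fsS --max-time 5 -o /dev/null "$1" 2>/dev/null
}

# Usage (your-node is the placeholder host from the command above):
#   rpc_healthy https://your-node:26657/health || echo "CRITICAL: RPC unreachable"
```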

Opt-In Node Monitoring

Monolythium provides an opt-in monitoring service that tracks your node's health and sends alerts via Telegram. This service is privacy-preserving: no IPs, hostnames, or RPC URLs are ever exposed publicly.

What Gets Monitored

Data Sent	Purpose
Node ID	Unique identifier (derived from node_key.json)
Block height	Sync status tracking
Version info	monod and monoctl versions
Chain ID	Network verification
Capabilities	Archive, state-sync, snapshots

Never sent: IP addresses, hostnames, RPC URLs, private keys.

Register Your Node

# Register with the monitoring service
monoctl monitor register --network Sprintnet --moniker "my-node" --role fullnode --home ~/.monod

# Roles: validator, seed, bootstrap, fullnode, gateway, snapshot

This generates an ed25519 keypair for signing heartbeats and returns a link token. Send the token to @MonolythiumBot on Telegram to complete registration and start receiving alerts.

Install Automatic Heartbeats

# Install systemd timer (sends heartbeat every 30 seconds)
sudo monoctl monitor install --network Sprintnet --user $(whoami) --home ~/.monod

# Verify timer is active
sudo systemctl status monoctl-monitor@Sprintnet.timer

Manual Heartbeat

# Send a single heartbeat (useful for testing)
monoctl monitor heartbeat --network Sprintnet --home ~/.monod

Control Visibility

By default, registered nodes are private and don't appear on the public explorer. To make your node visible:

# Make node public (visible on explorer)
monoctl monitor visibility --public --network Sprintnet --home ~/.monod

# Make node private again
monoctl monitor visibility --private --network Sprintnet --home ~/.monod

Public nodes appear at sprintnet.monoscan.xyz/nodes.

Uninstall

# Remove the systemd timer
sudo monoctl monitor uninstall --network Sprintnet

Telegram Bot Commands

Once registered, you can interact with @MonolythiumBot:

Command	Description
`/start`	Welcome message and setup
`/help`	List available commands
`/status`	View your registered nodes
`/pause <moniker> <minutes>`	Pause alerts (e.g., `/pause my-node 60`)
`/resume <moniker>`	Resume alerts for a node
`/deregister <moniker>`	Remove node from monitoring

Pause alerts when doing maintenance:

/pause my-node 60

This pauses alerts for 60 minutes (max 24 hours). Use /resume my-node to resume earlier.

Alert Types

The bot sends alerts when:

Node goes offline (no heartbeat for 2+ minutes)
Node starts lagging behind canonical height
Node becomes stalled (height not increasing)

Best Practices

Monitor 24/7 - Use automated monitoring
Set up alerts - Know about issues immediately
Test alerts - Verify notifications work
Monitor trends - Catch issues before they're critical
Keep dashboards - Visualize health at a glance
Log retention - Keep logs for troubleshooting

FAQ

What's the minimum monitoring setup?

At a minimum: process monitoring, disk space alerts, and an external ping of the node.

Should I expose metrics publicly?

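No. Keep Prometheus metrics internal or behind authentication. Note that prometheus_listen_addr = ":26660" listens on all interfaces, so either bind it to a loopback or private address or block port 26660 at the firewall.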

How often should I check?

Automated: every 15-30 seconds. Manual: at least daily review of dashboards.

Related

Requirements - Node requirements
Best Practices - Operational excellence
