Production Deployment
This guide covers what you need beyond the basic installation to run a reliable, secure production node. It addresses systemd hardening, backups, disaster recovery, log management, resource monitoring, and kernel tuning -- everything that separates a node that works from a node that stays up.
Complete the Installation and Security Guide before applying these production hardening steps. This guide assumes you are running monod as a dedicated non-root user with systemd.
systemd Hardening
The Installation guide provides a basic systemd unit file. For production, extend it with additional security and resource directives.
Extended Unit File
sudo tee /etc/systemd/system/monod.service > /dev/null <<EOF
[Unit]
Description=Monolythium Node
After=network-online.target
Wants=network-online.target
[Service]
Type=simple
User=monod
Group=monod
WorkingDirectory=/home/monod
ExecStart=/usr/local/bin/monod start --home /home/monod/.mono
# Restart configuration
Restart=on-failure
RestartSec=3
StartLimitInterval=0
# Resource limits
LimitNOFILE=65535
LimitNPROC=4096
# Environment
Environment="HOME=/home/monod"
# --- Production hardening ---
MemoryMax=12G
CPUWeight=90
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/home/monod/.mono
ProtectKernelModules=true
ProtectKernelTunables=true
RestrictNamespaces=true
RestrictSUIDSGID=true
[Install]
WantedBy=multi-user.target
EOF
After editing, reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart monod
Directive Reference
| Directive | What It Does | Why It Matters |
|---|---|---|
| `MemoryMax=12G` | Kills the process if it exceeds 12 GB of RAM | Prevents a memory leak from consuming all host memory and crashing other services |
| `CPUWeight=90` | Sets the unit's CPU scheduling weight (default is 100; higher weights receive proportionally more CPU under contention) | Makes monod's CPU share explicit; raise the weight if consensus work is being starved by other units |
| `NoNewPrivileges=true` | Prevents the process (and its children) from gaining additional privileges via setuid/setgid binaries | Blocks privilege escalation if the process is compromised |
| `PrivateTmp=true` | Gives monod its own isolated /tmp directory | Prevents other processes from reading or tampering with temporary files |
| `ProtectSystem=strict` | Mounts the entire filesystem read-only except explicitly allowed paths | Stops a compromised process from modifying system binaries or configuration |
| `ProtectHome=true` | Makes all home directories inaccessible except the allowed paths | Protects other users' data from being read or modified |
| `ReadWritePaths=/home/monod/.mono` | Allows writes only to the chain data directory | The sole exception to `ProtectSystem=strict` -- monod can write chain state here |
| `ProtectKernelModules=true` | Blocks loading or unloading kernel modules | Prevents a compromised process from inserting rootkits or rogue drivers |
| `ProtectKernelTunables=true` | Makes /proc/sys, /sys, and similar paths read-only | Prevents runtime modification of kernel parameters |
| `RestrictNamespaces=true` | Denies creation of new Linux namespaces | Prevents container escape techniques and namespace-based privilege escalation |
| `RestrictSUIDSGID=true` | Prevents setting SUID/SGID bits on files | Blocks a common vector for privilege escalation |
If monod fails to start after adding these directives, check `journalctl -u monod -n 50` for "Permission denied" errors. The most common cause is a path missing from `ReadWritePaths=`.
Backup Strategy
What to Back Up
| Item | Path | Priority | Notes |
|---|---|---|---|
| Validator key | ~/.mono/config/priv_validator_key.json | Critical | Loss means losing your validator. Compromise means potential double-signing. |
| Node key | ~/.mono/config/node_key.json | High | Defines your P2P identity. Replaceable but causes peer disruption. |
| Account keys (keyring) | ~/.mono/keyring-* | Critical | Contains account private keys for signing transactions. |
| Configuration | ~/.mono/config/config.toml, app.toml, client.toml | Medium | Recreatable, but saves time during recovery. |
| Chain data | ~/.mono/data/ | Do not back up | Re-sync from a snapshot or genesis instead. Chain data is large and changes every block. |
Automated Backup Script
Create a cron job that runs monarch backup keys daily:
# Create backup script
sudo -u monod tee /home/monod/backup-keys.sh > /dev/null <<'SCRIPT'
#!/bin/bash
set -euo pipefail
BACKUP_DIR="/home/monod/backups"
DATE=$(date +%Y-%m-%d)
BACKUP_FILE="${BACKUP_DIR}/keys-${DATE}.tar.gz.enc"
mkdir -p "$BACKUP_DIR"
# Use monarch to create an encrypted backup
monarch backup keys --output "$BACKUP_FILE" --home /home/monod/.mono
# Keep only the last 30 days of backups locally
find "$BACKUP_DIR" -name "keys-*.tar.gz.enc" -mtime +30 -delete
echo "[$(date)] Backup complete: $BACKUP_FILE"
SCRIPT
sudo chmod +x /home/monod/backup-keys.sh
Schedule it with cron:
# Open the monod user's crontab
sudo -u monod crontab -e
# Add this line to run daily at 03:00 server time:
0 3 * * * /home/monod/backup-keys.sh >> /home/monod/backups/backup.log 2>&1
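Backups can also rot silently on disk. One option is to pair each archive with a SHA-256 sidecar so corruption is caught at verification time rather than mid-recovery. A self-contained sketch (the scratch directory stands in for /home/monod/backups, and the file name is a placeholder):

```shell
#!/bin/sh
# Write a sha256 sidecar next to each backup, then verify the archive against it.
# The scratch directory stands in for /home/monod/backups.
set -eu
dir=$(mktemp -d)
backup="$dir/keys-example.tar.gz.enc"
printf 'encrypted-bytes' > "$backup"   # placeholder for a real encrypted archive

# At backup time: record the digest next to the archive
( cd "$dir" && sha256sum "${backup##*/}" > "$backup.sha256" )

# At verification time: re-check the archive against its sidecar
check=$(cd "$dir" && sha256sum -c "$backup.sha256")
echo "$check"
rm -rf "$dir"
```

Adding the two `sha256sum` lines to the backup script above costs nothing and turns "the file exists" into "the file is intact".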
Using monod directly instead of monarch:
# Manual encrypted backup of key files
tar czf - -C /home/monod/.mono/config priv_validator_key.json node_key.json | \
openssl enc -aes-256-cbc -pbkdf2 -out /home/monod/backups/keys-$(date +%Y-%m-%d).tar.gz.enc
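Before trusting that pipeline, it is worth proving it round-trips losslessly. A self-contained sketch on a scratch key file; `-pass pass:example` is a stand-in so the test runs non-interactively (in production, supply the pass-phrase from a root-only file instead of the command line):

```shell
#!/bin/sh
# Round-trip test of the tar | openssl backup pipeline on a scratch key file.
# "-pass pass:example" is illustrative only; real backups should read the
# pass-phrase from a protected file rather than passing it on the command line.
set -eu
workdir=$(mktemp -d)
mkdir -p "$workdir/restore"
printf '{"address":"placeholder"}\n' > "$workdir/priv_validator_key.json"

# Encrypt, exactly as the backup command does
tar czf - -C "$workdir" priv_validator_key.json |
  openssl enc -aes-256-cbc -pbkdf2 -pass pass:example -out "$workdir/keys.tar.gz.enc"

# Decrypt into a scratch directory and compare byte-for-byte
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:example -in "$workdir/keys.tar.gz.enc" |
  tar xzf - -C "$workdir/restore"

cmp "$workdir/priv_validator_key.json" "$workdir/restore/priv_validator_key.json"
roundtrip=ok
echo "round-trip OK"
rm -rf "$workdir"
```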
Backup Verification
Test your backup restoration monthly. An untested backup is not a backup.
# Decrypt and list contents (does not overwrite anything)
openssl enc -d -aes-256-cbc -pbkdf2 -in /home/monod/backups/keys-2026-03-30.tar.gz.enc | tar tzf -
Off-Site Storage
- Store encrypted backups in at least two separate locations (USB drive, safety deposit box, a different cloud provider)
- Never store unencrypted keys on cloud storage or network-attached drives
- Use unique, strong encryption passwords -- store those passwords separately from the backups
- Rotate backup encryption passwords quarterly
Your priv_validator_key.json is the single most sensitive file. If it is compromised, an attacker can double-sign and permanently slash your validator. Encrypt it before any transfer and never leave unencrypted copies on network-accessible storage.
Disaster Recovery
When a server fails, your recovery speed determines how many blocks you miss and whether you get jailed.
Recovery Procedure
- Provision a new server meeting the hardware requirements
- Install monod and monarch following the Installation guide
- Restore keys from backup:
# Using monarch
monarch backup restore /path/to/backup/
# Or manually
openssl enc -d -aes-256-cbc -pbkdf2 -in keys-backup.tar.gz.enc | \
  tar xzf - -C /home/monod/.mono/config/
- Download the canonical configuration for your network:
monarch join --network Testnet --home /home/monod/.mono
- Sync the chain -- choose one:
  - From snapshot (30-60 minutes): `monarch snapshot apply --network Testnet`
  - Via state-sync (15-30 minutes): `monarch state-sync --network Testnet`
  - From genesis (several hours): let the node sync naturally
- Verify the node is signing:
monarch status
# Confirm: catching_up = false, signing blocks
Recovery Time Estimates
| Method | Estimated Time | Disk Required |
|---|---|---|
| State-sync | 15-30 minutes | Minimal (recent state only) |
| Snapshot restore | 30-60 minutes | Depends on snapshot size |
| Genesis sync | Several hours to days | Full chain history |
Disaster Recovery Runbook
Use this checklist when recovering from a server failure:
- Old server confirmed stopped (or unreachable)
- New server provisioned and SSH access verified
- `monod` and `monarch` installed and versions confirmed
- Keys restored from encrypted backup
- Key file permissions set (`chmod 600` on key files)
- Node joined to correct network
- Chain sync started (snapshot, state-sync, or genesis)
- Sync completed -- `catching_up` is `false`
- Validator signing blocks -- check `monarch status`
- Monitoring re-enabled -- `monarch monitor register` or Prometheus scrape target updated
- Firewall rules applied -- `monarch firewall`
- systemd service enabled for auto-start on boot
Before starting your restored validator, confirm with certainty that the old server is stopped or destroyed. Running two instances of the same validator simultaneously causes double-signing and permanent slashing. See the Security Guide for the safe migration procedure.
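One way to make that confirmation mechanical is a pre-start guard that refuses to launch until an operator has explicitly acknowledged the old server is down. The sentinel-file convention below is invented for illustration (it is not a monod feature); adapt the paths to your setup:

```shell
#!/bin/sh
# Sketch of a pre-start guard: refuse to launch monod unless an operator has
# created a sentinel file confirming the old server is stopped.
# The sentinel convention is invented for illustration, not a monod feature.
set -eu

guard_start() {
  sentinel=$1
  if [ ! -f "$sentinel" ]; then
    echo "refusing to start: create $sentinel after verifying the old server is stopped" >&2
    return 1
  fi
  echo "guard passed"
  # exec /usr/local/bin/monod start --home /home/monod/.mono
}

# Demonstration with a scratch sentinel
sentinel=$(mktemp)
guard_start "$sentinel"
rm -f "$sentinel"
```

Wrapping `ExecStart=` in such a guard means a hasty restore cannot double-sign by accident; the operator must delete the old instance and then create the sentinel deliberately.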
Log Management
Unmanaged logs will eventually fill your disk. Configure journald to cap log storage.
journald Configuration
Create a drop-in configuration for monod log limits:
sudo mkdir -p /etc/systemd/journald.conf.d
sudo tee /etc/systemd/journald.conf.d/monod.conf > /dev/null <<EOF
[Journal]
SystemMaxUse=2G
SystemKeepFree=1G
SystemMaxFileSize=200M
MaxRetentionSec=30day
EOF
sudo systemctl restart systemd-journald
| Directive | Effect |
|---|---|
| `SystemMaxUse=2G` | Total journal storage capped at 2 GB |
| `SystemKeepFree=1G` | Always keep at least 1 GB of disk free |
| `SystemMaxFileSize=200M` | Individual journal files rotate at 200 MB |
| `MaxRetentionSec=30day` | Logs older than 30 days are automatically deleted |
Custom Log Files
If you redirect monod output to custom log files (not recommended -- prefer journald), configure logrotate:
sudo tee /etc/logrotate.d/monod > /dev/null <<EOF
/var/log/monod/*.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
create 0640 monod monod
}
EOF
Viewing Logs
# Using monarch
monarch logs --lines 100
monarch logs --follow
# Using journalctl directly
sudo journalctl -u monod -n 100 --no-pager
sudo journalctl -u monod -f
sudo journalctl -u monod --since "1 hour ago"
sudo journalctl -u monod --priority=err
Resource Monitoring
Quick Health Audit
monarch doctor
monarch doctor checks 13 categories including disk space, peer count, sync status, key file permissions, and version compatibility. Run it after any configuration change.
Continuous Monitoring
# Enable built-in Prometheus metrics
monarch metrics enable
# Verify metrics endpoint is live
curl -s localhost:26660/metrics | head -5
See the Monitoring guide for full Prometheus and Grafana setup.
Warning Thresholds
| Metric | Warning | Critical | How to Check |
|---|---|---|---|
| Disk usage | > 70% | > 85% | `df -h /home/monod/.mono` |
| Memory usage | > 80% | > 95% | `free -h` |
| CPU usage (sustained) | > 70% | > 90% | `top -bn1 \| grep monod` |
| Peer count | < 5 | 0 | `monarch status` or `curl -s localhost:26657/net_info \| jq '.result.n_peers'` |
| Block height lag | > 10 blocks | > 100 blocks | Compare `monarch status` height to explorer |
| Disk I/O wait | > 20% | > 40% | `iostat -x 1 5` |
| Open file descriptors | > 50,000 | > 60,000 | `ls /proc/$(pgrep monod)/fd \| wc -l` |
| Missed blocks (validators) | > 0 in 1 hour | > 5 in 1 hour | `monarch status` or Prometheus `tendermint_consensus_missing_validators` |
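These thresholds are easy to wire into a cron-driven check. A minimal sketch for the disk row (`classify_disk` is an invented helper name; the 70/85 cut-offs mirror the table, and the commented `df` invocation is one GNU-coreutils way to obtain the live number):

```shell
#!/bin/sh
# Classify disk usage against the table's thresholds (70% warning, 85% critical).
set -eu

classify_disk() {
  pct=$1
  if [ "$pct" -gt 85 ]; then echo CRITICAL
  elif [ "$pct" -gt 70 ]; then echo WARNING
  else echo OK
  fi
}

# In practice, feed it the live percentage (GNU df):
#   classify_disk "$(df --output=pcent /home/monod/.mono | tail -1 | tr -dc '0-9')"
classify_disk 72   # prints WARNING
```

The same shape extends to the other rows; emit the result to your alerting channel of choice.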
Disk Growth Monitoring
Chain data grows continuously. Monitor the growth rate to predict when you will need more storage:
# Check current data size
du -sh /home/monod/.mono/data
# Check disk usage over time (add to cron for tracking)
echo "$(date +%Y-%m-%d) $(du -sb /home/monod/.mono/data | cut -f1)" >> /home/monod/disk-growth.log
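That log can be turned into a rough capacity forecast with awk. A sketch under stated assumptions: the sample log here uses weekly data points rather than daily ones, and the 500 GB capacity is an example value, not a requirement:

```shell
#!/bin/sh
# Estimate average daily growth and days until a hypothetical volume is full,
# from a "YYYY-MM-DD bytes" log like the one produced by the cron line above.
set -eu
log=$(mktemp)
cat > "$log" <<'EOF'
2026-03-01 100000000000
2026-03-08 114000000000
2026-03-15 128000000000
EOF

capacity_bytes=500000000000   # example: a 500 GB volume

forecast=$(awk -v cap="$capacity_bytes" '
  NR == 1 { first = $2 }
  { last = $2; n = NR }
  END {
    days = (n - 1) * 7          # the sample entries above are weekly
    rate = (last - first) / days
    printf "growth: %.1f GB/day\n", rate / 1e9
    printf "days until full: %.0f\n", (cap - last) / rate
  }' "$log")
echo "$forecast"
rm -f "$log"
```

With daily cron entries, change the `* 7` factor to `* 1`; the point is that a linear fit over recent samples is usually enough to schedule a disk upgrade weeks in advance.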
Kernel Tuning
Production nodes benefit from kernel parameter adjustments. Apply these via sysctl:
sudo tee /etc/sysctl.d/99-monolythium.conf > /dev/null <<EOF
# Maximum number of file handles for the entire system
fs.file-max = 100000
# Maximum queue length for incoming connections
net.core.somaxconn = 4096
# Reduce swap usage (prefer RAM over swap)
vm.swappiness = 10
# Allow reuse of TIME_WAIT sockets for new connections
net.ipv4.tcp_tw_reuse = 1
EOF
# Apply immediately
sudo sysctl --system
Parameter Reference
| Parameter | Value | Default | Purpose |
|---|---|---|---|
| `fs.file-max` | 100000 | Varies with RAM (often much higher on modern kernels) | Sets the system-wide maximum number of open file descriptors. A node maintaining hundreds of P2P connections and database files can exhaust a low limit. |
| `net.core.somaxconn` | 4096 | 4096 on modern kernels; 128 on older ones | Maximum length of the listen queue for incoming connections. Prevents connection drops during P2P peer surges. |
| `vm.swappiness` | 10 | 60 | Controls how aggressively the kernel swaps memory pages to disk. A value of 10 keeps more data in RAM, reducing latency for consensus operations. |
| `net.ipv4.tcp_tw_reuse` | 1 | 0 (2, loopback-only, since kernel 4.12) | Allows reusing sockets in TIME_WAIT state for new outbound connections. Useful when the node cycles through many short-lived P2P connections. |
Verify the settings are applied:
sysctl fs.file-max net.core.somaxconn vm.swappiness net.ipv4.tcp_tw_reuse
Settings in /etc/sysctl.d/ persist across reboots. Running sysctl --system applies them immediately without a reboot.
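Values can drift after a kernel upgrade or when a later file in the sysctl.d order overrides yours, so it is worth diffing the desired values against /proc/sys periodically. A sketch that re-creates the conf in a scratch file for the sake of being self-contained; point it at /etc/sysctl.d/99-monolythium.conf in production. Keys the kernel does not expose (common inside containers) are skipped:

```shell
#!/bin/sh
# Diff desired sysctl values (sysctl.d "key = value" format) against /proc/sys.
# The scratch file stands in for /etc/sysctl.d/99-monolythium.conf.
set -eu
CONF=$(mktemp)
cat > "$CONF" <<'EOF'
fs.file-max = 100000
net.core.somaxconn = 4096
vm.swappiness = 10
net.ipv4.tcp_tw_reuse = 1
EOF

mismatches=0
while IFS='=' read -r key want; do
  key=$(echo "$key" | tr -d ' '); want=$(echo "$want" | tr -d ' ')
  case "$key" in ''|'#'*) continue ;; esac        # skip blanks and comments
  path="/proc/sys/$(echo "$key" | tr . /)"        # fs.file-max -> fs/file-max
  [ -r "$path" ] || continue                      # key not exposed by this kernel
  have=$(cat "$path")
  [ "$have" = "$want" ] || { echo "MISMATCH $key: want=$want have=$have"; mismatches=$((mismatches+1)); }
done < "$CONF"
echo "mismatches: $mismatches"
rm -f "$CONF"
```

A non-zero mismatch count after a reboot usually means another sysctl.d file is overriding yours or the kernel rejected a value; re-run `sudo sysctl --system` and inspect its output.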
Production Checklist
Use this checklist before considering your node production-ready:
systemd
- Extended unit file with all hardening directives applied
- `MemoryMax` set to an appropriate value for your server
- Service enabled for auto-start on boot (`systemctl enable monod`)
- Verified the service restarts correctly after `kill -9`
Backups
- Automated daily key backup configured (cron + `monarch backup keys`)
- Backup encryption password stored separately from backups
- Off-site backup in at least two locations
- Backup restoration tested within the last 30 days
Disaster Recovery
- Written runbook with step-by-step recovery procedure
- Recovery tested on a fresh server at least once
- Snapshot or state-sync method verified and working
Logs
- journald configured with size limits (`SystemMaxUse=2G`)
- Log retention policy set (30 days recommended)
- Log aggregation configured if running multiple nodes
Monitoring
- `monarch doctor` passes all checks
- `monarch metrics enable` activated
- Prometheus scraping the metrics endpoint
- Grafana dashboards configured
- Alerting rules for node down, missed blocks, low peers, disk usage
- External uptime monitoring configured
Kernel
- `fs.file-max` increased
- `vm.swappiness` reduced
- `net.core.somaxconn` verified
- `net.ipv4.tcp_tw_reuse` enabled
- Settings persisted in `/etc/sysctl.d/`
Security
- SSH key-only authentication (password auth disabled)
- Firewall configured (`monarch firewall`)
- Node running as dedicated non-root user
- Key file permissions verified (`chmod 600`)
- Sentry architecture configured for validators (see Sentry Architecture)
Related
- Installation -- Basic setup and systemd configuration
- Security Guide -- Key management, server hardening, double-signing prevention
- Monitoring -- Prometheus, Grafana, and alerting setup
- Troubleshooting -- Common issues and solutions
- Sentry Architecture -- Protecting validators with sentry nodes