Production Deployment

This guide covers what you need beyond the basic installation to run a reliable, secure production node. It addresses systemd hardening, backups, disaster recovery, log management, resource monitoring, and kernel tuning -- everything that separates a node that works from a node that stays up.

Prerequisites

Complete the Installation and Security Guide before applying these production hardening steps. This guide assumes you are running monod as a dedicated non-root user with systemd.


systemd Hardening

The Installation guide provides a basic systemd unit file. For production, extend it with additional security and resource directives.

Extended Unit File

sudo tee /etc/systemd/system/monod.service > /dev/null <<EOF
[Unit]
Description=Monolythium Node
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
User=monod
Group=monod
WorkingDirectory=/home/monod
ExecStart=/usr/local/bin/monod start --home /home/monod/.mono

# Restart configuration
Restart=on-failure
RestartSec=3
StartLimitInterval=0

# Resource limits
LimitNOFILE=65535
LimitNPROC=4096

# Environment
Environment="HOME=/home/monod"

# --- Production hardening ---
MemoryMax=12G
CPUWeight=90
NoNewPrivileges=true
PrivateTmp=true
ProtectSystem=strict
ProtectHome=true
ReadWritePaths=/home/monod/.mono
ProtectKernelModules=true
ProtectKernelTunables=true
RestrictNamespaces=true
RestrictSUIDSGID=true

[Install]
WantedBy=multi-user.target
EOF

After editing, reload and restart:

sudo systemctl daemon-reload
sudo systemctl restart monod
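Once the service is back up, you can gauge how much the directives tightened the sandbox. systemd ships an audit command (available on most modern systemd versions) that scores a unit's exposure; lower is better:

```shell
# Print systemd's exposure score for the unit (lower is better).
# Compare the score before and after adding the hardening directives.
systemd-analyze security monod
```

The per-line report also points out additional directives you could enable; weigh each against what monod actually needs before adopting it.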

Directive Reference

| Directive | What It Does | Why It Matters |
| --- | --- | --- |
| MemoryMax=12G | Kills the process if it exceeds 12 GB of RAM | Prevents a memory leak from consuming all host memory and crashing other services |
| CPUWeight=90 | Sets monod's CPU scheduling weight (the default is 100; higher values receive more CPU under contention) | Makes monod's CPU share explicit and predictable when the host is under load |
| NoNewPrivileges=true | Prevents the process (and its children) from gaining additional privileges via setuid/setgid binaries | Blocks privilege escalation if the process is compromised |
| PrivateTmp=true | Gives monod its own isolated /tmp directory | Prevents other processes from reading or tampering with temporary files |
| ProtectSystem=strict | Mounts the entire filesystem read-only except explicitly allowed paths | Stops a compromised process from modifying system binaries or configuration |
| ProtectHome=true | Makes all home directories inaccessible except the allowed paths | Protects other users' data from being read or modified |
| ReadWritePaths=/home/monod/.mono | Allows writes only to the chain data directory | The sole exception to ProtectSystem=strict -- monod can write chain state here |
| ProtectKernelModules=true | Blocks loading or unloading kernel modules | Prevents a compromised process from inserting rootkits or rogue drivers |
| ProtectKernelTunables=true | Makes /proc/sys, /sys, and similar paths read-only | Prevents runtime modification of kernel parameters |
| RestrictNamespaces=true | Denies creation of new Linux namespaces | Prevents container-escape techniques and namespace-based privilege escalation |
| RestrictSUIDSGID=true | Prevents setting SUID/SGID bits on files | Blocks a common vector for privilege escalation |
info

If monod fails to start after adding these directives, check journalctl -u monod -n 50 for Permission denied errors. The most common cause is a missing path in ReadWritePaths.


Backup Strategy

What to Back Up

| Item | Path | Priority | Notes |
| --- | --- | --- | --- |
| Validator key | ~/.mono/config/priv_validator_key.json | Critical | Loss means losing your validator. Compromise means potential double-signing. |
| Node key | ~/.mono/config/node_key.json | High | Defines your P2P identity. Replaceable, but replacing it causes peer disruption. |
| Account keys (keyring) | ~/.mono/keyring-* | Critical | Contains account private keys for signing transactions. |
| Configuration | ~/.mono/config/config.toml, app.toml, client.toml | Medium | Recreatable, but saves time during recovery. |
| Chain data | ~/.mono/data/ | Do not back up | Re-sync from a snapshot or genesis instead. Chain data is large and changes every block. |

Automated Backup Script

Create a cron job that runs monarch backup keys daily:

# Create backup script
sudo -u monod tee /home/monod/backup-keys.sh > /dev/null <<'SCRIPT'
#!/bin/bash
set -euo pipefail

BACKUP_DIR="/home/monod/backups"
DATE=$(date +%Y-%m-%d)
BACKUP_FILE="${BACKUP_DIR}/keys-${DATE}.tar.gz.enc"

mkdir -p "$BACKUP_DIR"

# Use monarch to create an encrypted backup
monarch backup keys --output "$BACKUP_FILE" --home /home/monod/.mono

# Keep only the last 30 daily backups locally
find "$BACKUP_DIR" -name "keys-*.tar.gz.enc" -mtime +30 -delete

echo "[$(date)] Backup complete: $BACKUP_FILE"
SCRIPT

sudo chmod +x /home/monod/backup-keys.sh

Schedule it with cron:

# Run daily at 03:00 UTC
sudo -u monod crontab -e
# Add this line:
# 0 3 * * * /home/monod/backup-keys.sh >> /home/monod/backups/backup.log 2>&1

Using monod directly instead of monarch:

# Manual encrypted backup of key files
tar czf - -C /home/monod/.mono/config priv_validator_key.json node_key.json | \
openssl enc -aes-256-cbc -pbkdf2 -out /home/monod/backups/keys-$(date +%Y-%m-%d).tar.gz.enc

Backup Verification

Test your backup restoration monthly. An untested backup is not a backup.

# Decrypt and list contents (does not overwrite anything)
openssl enc -d -aes-256-cbc -pbkdf2 -in /home/monod/backups/keys-2026-03-30.tar.gz.enc | tar tzf -
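For a full drill that exercises encryption and restore end to end, round-trip a throwaway file with the same openssl parameters. Everything below is disposable -- never run drills against real keys -- and BACKUP_PASS is a stand-in for your real backup passphrase:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Restore drill with a throwaway file; nothing here touches real keys.
workdir=$(mktemp -d)
echo '{"dummy": "key material"}' > "$workdir/test.json"
export BACKUP_PASS='drill-only-passphrase'

# Encrypt the same way the manual backup pipeline does
tar czf - -C "$workdir" test.json |
  openssl enc -aes-256-cbc -pbkdf2 -pass env:BACKUP_PASS -out "$workdir/test.tar.gz.enc"

# Decrypt into a separate directory and compare byte-for-byte
mkdir "$workdir/restore"
openssl enc -d -aes-256-cbc -pbkdf2 -pass env:BACKUP_PASS -in "$workdir/test.tar.gz.enc" |
  tar xzf - -C "$workdir/restore"
diff "$workdir/test.json" "$workdir/restore/test.json" && echo "round-trip OK"
rm -rf "$workdir"
```

If the diff is silent and "round-trip OK" prints, the passphrase, cipher options, and archive layout all match.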

Off-Site Storage

  • Store encrypted backups in at least two separate locations (USB drive, safety deposit box, a different cloud provider)
  • Never store unencrypted keys on cloud storage or network-attached drives
  • Use unique, strong encryption passwords -- store those passwords separately from the backups
  • Rotate backup encryption passwords quarterly
Validator Key Security

Your priv_validator_key.json is the single most sensitive file. If it is compromised, an attacker can double-sign and permanently slash your validator. Encrypt it before any transfer and never leave unencrypted copies on network-accessible storage.


Disaster Recovery

When a server fails, your recovery speed determines how many blocks you miss and whether you get jailed.

Recovery Procedure

  1. Provision a new server meeting the hardware requirements
  2. Install monod and monarch following the Installation guide
  3. Restore keys from backup:
    # Using monarch
    monarch backup restore /path/to/backup/

    # Or manually
    openssl enc -d -aes-256-cbc -pbkdf2 -in keys-backup.tar.gz.enc | \
    tar xzf - -C /home/monod/.mono/config/
  4. Download canonical configuration for your network:
    monarch join --network Testnet --home /home/monod/.mono
  5. Sync the chain -- choose one:
    • From snapshot (30-60 minutes): monarch snapshot apply --network Testnet
    • Via state-sync (15-30 minutes): monarch state-sync --network Testnet
    • From genesis (several hours): let the node sync naturally
  6. Verify the node is signing:
    monarch status
    # Confirm: catching_up = false, signing blocks
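Step 6 can also be checked directly against the node's RPC, assuming the CometBFT-style status endpoint on port 26657 that the monitoring commands in this guide already query:

```shell
# Extract the sync flag from the local RPC status endpoint.
# "false" means the node is fully synced and eligible to sign.
curl -s localhost:26657/status | jq -r '.result.sync_info.catching_up'
```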

Recovery Time Estimates

| Method | Estimated Time | Disk Required |
| --- | --- | --- |
| State-sync | 15-30 minutes | Minimal (recent state only) |
| Snapshot restore | 30-60 minutes | Depends on snapshot size |
| Genesis sync | Several hours to days | Full chain history |

Disaster Recovery Runbook

Use this checklist when recovering from a server failure:

  • Old server confirmed stopped (or unreachable)
  • New server provisioned and SSH access verified
  • monod and monarch installed and version confirmed
  • Keys restored from encrypted backup
  • Key file permissions set (chmod 600 on key files)
  • Node joined to correct network
  • Chain sync started (snapshot, state-sync, or genesis)
  • Sync completed -- catching_up is false
  • Validator signing blocks -- check monarch status
  • Monitoring re-enabled -- monarch monitor register or Prometheus scrape target updated
  • Firewall rules applied -- monarch firewall
  • systemd service enabled for auto-start on boot
Double-Signing Risk

Before starting your restored validator, confirm with certainty that the old server is stopped or destroyed. Running two instances of the same validator simultaneously causes double-signing and permanent slashing. See the Security Guide for the safe migration procedure.


Log Management

Unmanaged logs will eventually fill your disk. Configure journald to cap log storage.

journald Configuration

Create a drop-in configuration for monod log limits:

sudo mkdir -p /etc/systemd/journald.conf.d

sudo tee /etc/systemd/journald.conf.d/monod.conf > /dev/null <<EOF
[Journal]
SystemMaxUse=2G
SystemKeepFree=1G
SystemMaxFileSize=200M
MaxRetentionSec=30day
EOF

sudo systemctl restart systemd-journald

| Directive | Effect |
| --- | --- |
| SystemMaxUse=2G | Total journal storage capped at 2 GB |
| SystemKeepFree=1G | Always keep at least 1 GB of disk free |
| SystemMaxFileSize=200M | Individual journal files rotate at 200 MB |
| MaxRetentionSec=30day | Logs older than 30 days are automatically deleted |
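To check how much space the journal currently occupies -- and to trim it once if it already exceeds the new cap -- journalctl has built-in commands:

```shell
# Current journal footprint across all units.
# One-off trim if the journal already exceeds the cap:
#   sudo journalctl --vacuum-size=2G
journalctl --disk-usage
```

The vacuum is a one-time cleanup; the drop-in above keeps the journal under the cap from then on.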

Custom Log Files

If you redirect monod output to custom log files (not recommended -- prefer journald), configure logrotate:

sudo tee /etc/logrotate.d/monod > /dev/null <<EOF
/var/log/monod/*.log {
daily
rotate 14
compress
delaycompress
missingok
notifempty
create 0640 monod monod
}
EOF

Viewing Logs

# Using monarch
monarch logs --lines 100
monarch logs --follow

# Using journalctl directly
sudo journalctl -u monod -n 100 --no-pager
sudo journalctl -u monod -f
sudo journalctl -u monod --since "1 hour ago"
sudo journalctl -u monod --priority=err

Resource Monitoring

Quick Health Audit

monarch doctor

monarch doctor checks 13 categories including disk space, peer count, sync status, key file permissions, and version compatibility. Run it after any configuration change.

Continuous Monitoring

# Enable built-in Prometheus metrics
monarch metrics enable

# Verify metrics endpoint is live
curl -s localhost:26660/metrics | head -5

See the Monitoring guide for full Prometheus and Grafana setup.

Warning Thresholds

| Metric | Warning | Critical | How to Check |
| --- | --- | --- | --- |
| Disk usage | > 70% | > 85% | df -h /home/monod/.mono |
| Memory usage | > 80% | > 95% | free -h |
| CPU usage (sustained) | > 70% | > 90% | top -bn1 \| grep monod |
| Peer count | < 5 | 0 | monarch status or curl -s localhost:26657/net_info \| jq '.result.n_peers' |
| Block height lag | > 10 blocks | > 100 blocks | Compare monarch status height to a block explorer |
| Disk I/O wait | > 20% | > 40% | iostat -x 1 5 |
| Open file descriptors | > 50,000 | > 60,000 | ls /proc/$(pgrep monod)/fd \| wc -l |
| Missed blocks (validators) | > 0 in 1 hour | > 5 in 1 hour | monarch status or Prometheus tendermint_consensus_missing_validators |

Disk Growth Monitoring

Chain data grows continuously. Monitor the growth rate to predict when you will need more storage:

# Check current data size
du -sh /home/monod/.mono/data

# Check disk usage over time (add to cron for tracking)
echo "$(date +%Y-%m-%d) $(du -sb /home/monod/.mono/data | cut -f1)" >> /home/monod/disk-growth.log

Kernel Tuning

Production nodes benefit from kernel parameter adjustments. Apply these via sysctl:

sudo tee /etc/sysctl.d/99-monolythium.conf > /dev/null <<EOF
# Maximum number of file handles for the entire system
fs.file-max = 100000

# Maximum queue length for incoming connections
net.core.somaxconn = 4096

# Reduce swap usage (prefer RAM over swap)
vm.swappiness = 10

# Allow reuse of TIME_WAIT sockets for new connections
net.ipv4.tcp_tw_reuse = 1
EOF

# Apply immediately
sudo sysctl --system

Parameter Reference

| Parameter | Value | Default | Purpose |
| --- | --- | --- | --- |
| fs.file-max | 100000 | ~65,000 | Sets the system-wide maximum number of open file descriptors. A node maintaining hundreds of P2P connections and database files can exhaust the default limit. |
| net.core.somaxconn | 4096 | 4096 (modern kernels) | Maximum length of the listen queue for incoming connections; prevents connection drops during P2P peer surges. Older kernels default to 128. |
| vm.swappiness | 10 | 60 | Controls how aggressively the kernel swaps memory pages to disk. A value of 10 keeps more data in RAM, reducing latency for consensus operations. |
| net.ipv4.tcp_tw_reuse | 1 | 0 | Allows reusing sockets in TIME_WAIT state for new outbound connections. Useful when the node cycles through many short-lived P2P connections. |

Verify the settings are applied:

sysctl fs.file-max net.core.somaxconn vm.swappiness net.ipv4.tcp_tw_reuse
Persistence

Settings in /etc/sysctl.d/ persist across reboots. Running sysctl --system applies them immediately without a reboot.


Production Checklist

Use this checklist before considering your node production-ready:

systemd

  • Extended unit file with all hardening directives applied
  • MemoryMax set to an appropriate value for your server
  • Service enabled for auto-start on boot (systemctl enable monod)
  • Verified service restarts correctly after kill -9
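The last item can be verified with a deliberate crash test -- guarded here so it is a no-op when monod is not running:

```shell
# Kill the process hard and confirm systemd brings it back (Restart=on-failure).
if pgrep -x monod >/dev/null; then
  sudo kill -9 "$(pgrep -x monod)"
  sleep 10
  systemctl is-active monod   # expect: active
else
  echo "monod is not running; nothing to test"
fi
```

The 10-second pause comfortably covers RestartSec=3 plus startup time; if the service is not active afterward, check journalctl -u monod for the failure reason.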

Backups

  • Automated daily key backup configured (cron + monarch backup keys)
  • Backup encryption password stored separately from backups
  • Off-site backup in at least two locations
  • Backup restoration tested within the last 30 days

Disaster Recovery

  • Written runbook with step-by-step recovery procedure
  • Recovery tested on a fresh server at least once
  • Snapshot or state-sync method verified and working

Logs

  • journald configured with size limits (SystemMaxUse=2G)
  • Log retention policy set (30 days recommended)
  • Log aggregation configured if running multiple nodes

Monitoring

  • monarch doctor passes all checks
  • monarch metrics enable activated
  • Prometheus scraping metrics endpoint
  • Grafana dashboards configured
  • Alerting rules for node down, missed blocks, low peers, disk usage
  • External uptime monitoring configured

Kernel

  • fs.file-max increased
  • vm.swappiness reduced
  • net.core.somaxconn verified
  • net.ipv4.tcp_tw_reuse enabled
  • Settings persisted in /etc/sysctl.d/

Security

  • SSH key-only authentication (password auth disabled)
  • Firewall configured (monarch firewall)
  • Node running as dedicated non-root user
  • Key file permissions verified (chmod 600)
  • Sentry architecture configured for validators (see Sentry Architecture)