Monitoring
Monitoring WuKongIM with Prometheus lets you track performance and detect issues early.
Prerequisites
Install Prometheus for metrics collection and monitoring.
Install Prometheus
# Download Prometheus
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
# Extract
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
cd prometheus-2.45.0.linux-amd64
# Create user and directories
sudo useradd --no-create-home --shell /bin/false prometheus
sudo mkdir /etc/prometheus
sudo mkdir /var/lib/prometheus
sudo chown prometheus:prometheus /etc/prometheus
sudo chown prometheus:prometheus /var/lib/prometheus
# Copy binaries
sudo cp prometheus /usr/local/bin/
sudo cp promtool /usr/local/bin/
sudo chown prometheus:prometheus /usr/local/bin/prometheus
sudo chown prometheus:prometheus /usr/local/bin/promtool
Add WuKongIM monitoring targets under the scrape_configs section in your Prometheus configuration.
Single Node Configuration
For single node deployment, create /etc/prometheus/prometheus.yml:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'wukongim-trace-metrics'
    static_configs:
      - targets: ['xx.xx.xx.xx:5300']
        labels:
          id: "1001"
          instance: "wukongim-node1"
Multi-Node Configuration
For multi-node cluster deployment:
global:
  scrape_interval: 15s
  evaluation_interval: 15s
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'wukongim1-trace-metrics'
    static_configs:
      - targets: ['10.206.0.13:5300']
        labels:
          id: "1001"
          instance: "wukongim-node1"
  - job_name: 'wukongim2-trace-metrics'
    static_configs:
      - targets: ['10.206.0.14:5300']
        labels:
          id: "1002"
          instance: "wukongim-node2"
  - job_name: 'wukongim3-trace-metrics'
    static_configs:
      - targets: ['10.206.0.8:5300']
        labels:
          id: "1003"
          instance: "wukongim-node3"
Configuration Parameters:
job_name: Unique job name for each WuKongIM node
targets: WuKongIM internal IP + port 5300
labels.id: WuKongIM node ID
labels.instance: Human-readable instance name
Replace xx.xx.xx.xx with the actual internal IP address of your WuKongIM nodes.
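For larger clusters, the per-node jobs can be generated instead of typed by hand. The following is a minimal Python sketch, assuming the job and instance naming pattern of the multi-node example above; the function name `render_scrape_configs` and its parameters are hypothetical and not part of WuKongIM or Prometheus.

```python
from typing import Iterable, Tuple


def render_scrape_configs(nodes: Iterable[Tuple[int, str]], port: int = 5300) -> str:
    """Render one Prometheus scrape job per WuKongIM node as YAML text.

    `nodes` is a sequence of (node_id, internal_ip) pairs. Job and instance
    names follow the pattern of the multi-node example (an assumption).
    """
    lines = ["scrape_configs:"]
    for index, (node_id, ip) in enumerate(nodes, start=1):
        lines += [
            f"  - job_name: 'wukongim{index}-trace-metrics'",
            "    static_configs:",
            f"      - targets: ['{ip}:{port}']",
            "        labels:",
            f'          id: "{node_id}"',
            f'          instance: "wukongim-node{index}"',
        ]
    return "\n".join(lines) + "\n"
```

Calling `render_scrape_configs([(1001, "10.206.0.13"), (1002, "10.206.0.14"), (1003, "10.206.0.8")])` returns the three WuKongIM jobs as text; the `global` section and the `prometheus` self-scrape job still need to be added around it.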
Add Prometheus configuration to each node’s wk.yaml file:
mode: "release"
# ... other configurations ...
trace:
  prometheusApiUrl: "http://xx.xx.xx.xx:9090"
Replace xx.xx.xx.xx with the internal IP address of your Prometheus server.
Complete WuKongIM Configuration Example
mode: "release"
rootDir: "./wukongim_data"
# Cluster configuration (for multi-node)
cluster:
  nodeId: 1001
  serverAddr: "10.206.0.13:11110"
  apiUrl: "http://10.206.0.13:5001"
  initNodes:
    - "1001@10.206.0.13:11110"
    - "1002@10.206.0.14:11110"
    - "1003@10.206.0.8:11110"
# External configuration
external:
  ip: "119.45.229.172"
  tcpAddr: "119.45.229.172:15100"
  wsAddr: "ws://119.45.229.172:15200"
# Monitoring configuration
trace:
  prometheusApiUrl: "http://10.206.0.13:9090"
# Logging configuration
logger:
  level: "info"
  dir: "./logs"
Start Services
Start Prometheus
Create a systemd service file for Prometheus:
sudo nano /etc/systemd/system/prometheus.service
Add the following content:
[Unit]
Description=Prometheus
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/ \
    --web.console.templates=/etc/prometheus/consoles \
    --web.console.libraries=/etc/prometheus/console_libraries \
    --web.listen-address=0.0.0.0:9090 \
    --web.external-url=

[Install]
WantedBy=multi-user.target
Enable and start Prometheus:
sudo systemctl daemon-reload
sudo systemctl enable prometheus
sudo systemctl start prometheus
sudo systemctl status prometheus
Restart WuKongIM
After updating the configuration, restart WuKongIM on all nodes:
./wukongim stop
./wukongim --config wk.yaml -d
Verification
Check Prometheus Targets
Access Prometheus web interface: http://prometheus-server-ip:9090
Go to Status → Targets
Verify all WuKongIM targets are UP
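The same check can be scripted against the Prometheus HTTP API, which exposes target health at `/api/v1/targets`. The sketch below is illustrative only: the helper names are hypothetical, and it assumes the WuKongIM job names start with `wukongim`, as in the configuration examples.

```python
import json
import urllib.request
from typing import Any, Dict, List


def unhealthy_targets(payload: Dict[str, Any], job_prefix: str = "wukongim") -> List[str]:
    """Return one 'job instance: lastError' string per active target that is not 'up'.

    `payload` is the decoded JSON body of GET /api/v1/targets.
    """
    problems = []
    for target in payload.get("data", {}).get("activeTargets", []):
        labels = target.get("labels", {})
        job = labels.get("job", "")
        if job.startswith(job_prefix) and target.get("health") != "up":
            instance = labels.get("instance", "?")
            problems.append(f"{job} {instance}: {target.get('lastError', '')}")
    return problems


def fetch_targets(base_url: str) -> Dict[str, Any]:
    """Fetch the target list from a Prometheus server, e.g. http://prometheus-server-ip:9090."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)
```

For example, `unhealthy_targets(fetch_targets("http://prometheus-server-ip:9090"))` returns an empty list when every WuKongIM target is UP.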
Check Metrics
Query WuKongIM metrics in Prometheus:
# Check if WuKongIM metrics are being collected
wukongim_connections_total
# Check message throughput
rate(wukongim_messages_total[5m])
# Check memory usage
wukongim_memory_usage_bytes
# Check CPU usage
wukongim_cpu_usage_percent
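These queries can also be run outside the web interface through the Prometheus instant-query endpoint, `/api/v1/query`. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and it assumes the query returns an instant vector whose series carry an `instance` label, as set in the scrape configuration above.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL with the PromQL expression URL-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def values_by_instance(payload: Dict[str, Any]) -> Dict[str, float]:
    """Map each series' `instance` label to its sample value."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    # Each vector sample looks like {"metric": {...}, "value": [timestamp, "string value"]}.
    return {
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def run_query(base_url: str, promql: str) -> Dict[str, float]:
    """Run an instant query against Prometheus and return values keyed by instance."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=5) as resp:
        return values_by_instance(json.load(resp))
```

For example, `run_query("http://prometheus-server-ip:9090", "wukongim_connections_total")` returns one value per node once scraping works; an empty dictionary suggests the metric is not being collected.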
Key Metrics to Monitor
System Metrics
| Metric | Description |
| --- | --- |
| wukongim_connections_total | Total number of active connections |
| wukongim_messages_total | Total number of messages processed |
| wukongim_memory_usage_bytes | Memory usage in bytes |
| wukongim_cpu_usage_percent | CPU usage percentage |
| wukongim_disk_usage_bytes | Disk usage in bytes |
Cluster Metrics (Multi-node)
| Metric | Description |
| --- | --- |
| wukongim_cluster_nodes_total | Total number of cluster nodes |
| wukongim_cluster_leader_changes_total | Number of leader changes |
| wukongim_cluster_proposals_failed_total | Failed proposals count |
| wukongim_cluster_proposals_committed_total | Committed proposals count |

| Metric | Description |
| --- | --- |
| wukongim_message_latency_seconds | Message processing latency |
| wukongim_api_request_duration_seconds | API request duration |
| wukongim_websocket_connections | WebSocket connections count |
| wukongim_tcp_connections | TCP connections count |
Alerting Rules
Create alerting rules in /etc/prometheus/alert_rules.yml:
groups:
  - name: wukongim
    rules:
      - alert: WuKongIMDown
        expr: up{job=~"wukongim.*"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "WuKongIM instance is down"
          description: "WuKongIM instance {{ $labels.instance }} has been down for more than 1 minute."
      - alert: HighMemoryUsage
        expr: wukongim_memory_usage_bytes / (1024*1024*1024) > 2
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on WuKongIM"
          description: "WuKongIM instance {{ $labels.instance }} is using more than 2GB of memory."
      - alert: HighCPUUsage
        expr: wukongim_cpu_usage_percent > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on WuKongIM"
          description: "WuKongIM instance {{ $labels.instance }} CPU usage is above 80%."
      - alert: TooManyConnections
        expr: wukongim_connections_total > 10000
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Too many connections on WuKongIM"
          description: "WuKongIM instance {{ $labels.instance }} has more than 10,000 active connections."
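To reason about these thresholds before deploying them, the three resource conditions can be mirrored in a few lines of Python. This is only an illustration of the arithmetic in the `expr` fields: it is not how Prometheus evaluates rules, it ignores the `for` durations, and the function name `firing_alerts` is hypothetical.

```python
from typing import List

GIB = 1024 * 1024 * 1024  # same divisor as in the HighMemoryUsage expression


def firing_alerts(memory_bytes: float, cpu_percent: float, connections: float) -> List[str]:
    """Return the names of the threshold alerts whose condition holds for one sample."""
    alerts = []
    if memory_bytes / GIB > 2:        # HighMemoryUsage: strictly more than 2 GiB
        alerts.append("HighMemoryUsage")
    if cpu_percent > 80:              # HighCPUUsage: strictly above 80%
        alerts.append("HighCPUUsage")
    if connections > 10000:           # TooManyConnections: strictly above 10,000
        alerts.append("TooManyConnections")
    return alerts
```

All comparisons are strict, so a node sitting exactly at 2 GiB, 80% CPU, or 10,000 connections does not meet any condition.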
Update Prometheus configuration to include alert rules:
# Add to prometheus.yml
rule_files:
  - "alert_rules.yml"
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
Grafana Dashboard
Install Grafana
# Import the Grafana signing key, then add the repository
sudo apt-get install -y software-properties-common wget
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
# Install Grafana
sudo apt-get update
sudo apt-get install -y grafana
# Start Grafana
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Access Grafana: http://grafana-server-ip:3000 (default login: admin/admin)
Add Prometheus data source: http://prometheus-server-ip:9090
Import WuKongIM dashboard or create custom dashboards
Sample Dashboard Queries
Connection Count:
sum(wukongim_connections_total)
Message Rate:
sum(rate(wukongim_messages_total[5m]))
Memory Usage:
wukongim_memory_usage_bytes / (1024*1024*1024)
CPU Usage:
wukongim_cpu_usage_percent
Troubleshooting
Prometheus Not Collecting Metrics
# Check if WuKongIM metrics endpoint is accessible
curl http://wukongim-node-ip:5300/metrics
# Check Prometheus logs
sudo journalctl -u prometheus -f
# Verify Prometheus configuration
promtool check config /etc/prometheus/prometheus.yml
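When the endpoint responds but the expected series are missing, it helps to list which WuKongIM metrics the node actually exposes. The sketch below parses the Prometheus text exposition format returned by `/metrics`; the helper names and the `wukongim_` prefix filter are assumptions made for illustration.

```python
import urllib.request
from typing import Set


def metric_names(exposition_text: str, prefix: str = "wukongim_") -> Set[str]:
    """Collect metric names starting with `prefix` from Prometheus text-format output."""
    names = set()
    for raw in exposition_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blanks and HELP/TYPE comments
            continue
        # The metric name ends at the first '{' (labels) or whitespace (value).
        head = line.split("{", 1)[0].split()
        if head and head[0].startswith(prefix):
            names.add(head[0])
    return names


def fetch_metric_names(metrics_url: str) -> Set[str]:
    """Fetch a metrics endpoint, e.g. http://wukongim-node-ip:5300/metrics, and list its names."""
    with urllib.request.urlopen(metrics_url, timeout=5) as resp:
        return metric_names(resp.read().decode("utf-8"))
```

Comparing the returned set with the metric names used in your queries and alert rules shows quickly whether a rule refers to a series the node does not export.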
WuKongIM Not Sending Metrics
# Check WuKongIM logs
tail -f ./wukongim_data/logs/wukongim.log
# Verify trace configuration in wk.yaml
grep -A 5 "trace:" wk.yaml
# Test connectivity to Prometheus
curl http://prometheus-server-ip:9090/api/v1/targets
Next Steps
Performance Tuning: Optimize WuKongIM performance based on monitoring data
Backup Strategy: Set up automated backup and recovery
Load Testing: Test system performance under load
Cluster Management: Learn about cluster scaling and management