Why the ELK Stack Still Wins
Despite the rise of managed observability SaaS offerings, the Elasticsearch–Logstash–Kibana (ELK) stack remains the go-to choice for organisations that need full control over their data pipeline. At RNX we've deployed ELK in banking, energy, and large e-commerce environments — and the patterns we've refined over those engagements are what this article is about.
Cluster Topology for Reliability
The most common mistake we see in production ELK deployments is under-sizing the cluster and conflating node roles. A resilient production cluster needs dedicated nodes for each role:
- Master nodes (3) — cluster state, shard allocation; never place data on these
- Data nodes (≥3) — hot/warm/cold tiering based on query latency SLAs
- Ingest nodes — pre-processing pipelines; offload from data nodes
- Coordinating nodes — receive client requests, fan out searches
For most enterprise workloads, start with 3 dedicated master, 3 hot data, and 2 coordinating nodes. Add warm-tier data nodes once index lifecycle policies begin moving older indices out of the hot tier.
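As a rough illustration of the role separation above, the following sketch maps each role to the node.roles value it would carry in elasticsearch.yml (Elasticsearch 7.9 or later) and checks a planned node count against the minimums given in this section. The helper names are hypothetical, and pairing data_content with data_hot on hot nodes is an assumption, not something this article prescribes.

```python
# Hypothetical planning helper, not part of any Elastic tooling.
# Maps each role from the list above to a node.roles value.
NODE_ROLES = {
    "master": ["master"],
    "hot": ["data_hot", "data_content"],  # assumption: hot nodes also hold content
    "warm": ["data_warm"],
    "ingest": ["ingest"],
    "coordinating": [],  # an empty list makes a coordinating-only node
}


def validate_topology(counts: dict[str, int]) -> list[str]:
    """Return the problems found in a planned cluster; an empty list means it passes."""
    problems = []
    unknown = set(counts) - set(NODE_ROLES)
    if unknown:
        problems.append(f"unknown roles: {sorted(unknown)}")
    if counts.get("master", 0) < 3:
        problems.append("need at least 3 dedicated master nodes")
    if counts.get("hot", 0) < 3:
        problems.append("need at least 3 hot data nodes")
    return problems


def render_roles(role: str) -> str:
    """Render the elasticsearch.yml line for one role."""
    roles = NODE_ROLES[role]
    return f"node.roles: [ {', '.join(roles)} ]" if roles else "node.roles: [ ]"
```

Calling validate_topology({"master": 3, "hot": 3, "coordinating": 2}) returns an empty list for the starting layout suggested above, and render_roles("coordinating") yields the empty role list that turns a node into a pure request router.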
Index Lifecycle Management
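ILM is the single highest-leverage feature in Elasticsearch for controlling storage costs. A well-designed ILM policy moves data through the hot → warm → cold → frozen tiers automatically and deletes it once retention expires. The example policy below rolls the write index over daily or at 50 GB, shrinks it to a single shard 7 days later, moves it to the cold phase at 30 days, and deletes it at 90 days; it defines no frozen phase. Each min_age is counted from rollover, not from index creation. Note that the freeze action is deprecated and does nothing on Elasticsearch 8.x, so on current versions the cold phase here only relocates the index to cold-tier nodes.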
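To make the cost argument concrete, here is a minimal sketch of how phase boundaries translate into the volume each tier holds once retention is full. The boundaries mirror the example policy in this section. The function name and the daily ingest figure are illustrative assumptions, and the estimate deliberately ignores replicas and the time an index spends in hot before it rolls over.

```python
# Phase boundaries in days, taken from the example policy in this section.
PHASE_START_DAYS = {"hot": 0, "warm": 7, "cold": 30}
DELETE_AFTER_DAYS = 90


def steady_state_gb(daily_ingest_gb: float) -> dict[str, float]:
    """Approximate primary-shard volume per tier once retention is full.

    Ignores replicas, any size change from shrink, and the extra time an
    index stays in hot before rollover, so treat the result as a lower bound.
    """
    names = list(PHASE_START_DAYS)
    # Each phase ends where the next one starts; the last ends at deletion.
    ends = [PHASE_START_DAYS[name] for name in names[1:]] + [DELETE_AFTER_DAYS]
    return {
        name: (end - PHASE_START_DAYS[name]) * daily_ingest_gb
        for name, end in zip(names, ends)
    }
```

With an assumed 200 GB/day of ingest, steady_state_gb(200) gives roughly 1.4 TB on hot, 4.6 TB on warm, and 12 TB on cold, so about two thirds of retained data sits on the cheapest tier.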
{
  "policy": {
    "phases": {
      "hot": { "actions": { "rollover": { "max_age": "1d", "max_size": "50gb" } } },
      "warm": { "min_age": "7d", "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold": { "min_age": "30d", "actions": { "freeze": {} } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
Logstash vs. Beats vs. OpenTelemetry
Logstash is powerful but heavyweight. For high-throughput environments we often layer Filebeat/Metricbeat at the edge (low CPU footprint) with Logstash as a central transformation layer, and increasingly adopt OpenTelemetry collectors to future-proof the pipeline.
OpenTelemetry is the direction the industry is moving. If you're starting a greenfield deployment today, build your collection layer on OpenTelemetry (OTel) Collectors and export over OTLP to an OTLP-capable Elastic endpoint, such as APM Server, which then writes to Elasticsearch.
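A minimal sketch of such a collection layer follows, expressed as the data structure behind a Collector configuration: an OTLP receiver, a batch processor, and an OTLP exporter wired into a logs pipeline. Collector configs are normally written in YAML; the Python dict is only for illustration. The endpoint value is hypothetical, TLS and authentication settings are omitted, and component names should be checked against the Collector version you run.

```python
def collector_config(otlp_endpoint: str) -> dict:
    """Build a minimal Collector pipeline: OTLP in, batch, OTLP out."""
    return {
        "receivers": {"otlp": {"protocols": {"grpc": {}, "http": {}}}},
        "processors": {"batch": {}},
        "exporters": {"otlp": {"endpoint": otlp_endpoint}},
        "service": {
            "pipelines": {
                "logs": {
                    "receivers": ["otlp"],
                    "processors": ["batch"],
                    "exporters": ["otlp"],
                }
            }
        },
    }


def undefined_components(config: dict) -> list[str]:
    """List components a pipeline references but the config never defines."""
    missing = []
    for pipeline_name, pipeline in config["service"]["pipelines"].items():
        for section in ("receivers", "processors", "exporters"):
            for component in pipeline.get(section, []):
                if component not in config.get(section, {}):
                    missing.append(f"{pipeline_name}.{section}.{component}")
    return missing
```

The second function mirrors a check the Collector itself performs at startup: a pipeline that names a component with no matching definition is rejected, which is a common failure when an exporter is added to a pipeline but never configured.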
Monitoring the Monitor
A production ELK cluster generates its own monitoring data. We always deploy a separate monitoring cluster (even a small one) so that cluster health metrics stay isolated from the primary ingestion pipeline and remain available when the production cluster itself is degraded. Ship metrics to it with Metricbeat, review them in Kibana's Stack Monitoring, and configure Watcher alerts for JVM heap, disk watermarks, and shard count.
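The three alert conditions can be sketched as plain threshold checks. This is not a Watcher definition (those are written in JSON against the monitoring indices); it only illustrates the logic. The disk values match Elasticsearch's default low, high, and flood-stage watermarks, and the shard limit mirrors the default of cluster.max_shards_per_node. The heap threshold and the function signature are assumptions to tune per cluster.

```python
HEAP_WARN_PCT = 85  # assumption: pick a value that suits your heap sizing
# Elasticsearch's default disk watermarks, most severe first.
DISK_WATERMARKS = [(95, "flood_stage"), (90, "high"), (85, "low")]
MAX_SHARDS_PER_NODE = 1000  # default of cluster.max_shards_per_node


def evaluate(name: str, heap_used_pct: float, disk_used_pct: float,
             shard_count: int) -> list[str]:
    """Return alert messages for one node sample; an empty list means healthy."""
    alerts = []
    if heap_used_pct >= HEAP_WARN_PCT:
        alerts.append(f"{name}: JVM heap at {heap_used_pct:.0f}%")
    for limit, label in DISK_WATERMARKS:
        if disk_used_pct >= limit:
            alerts.append(f"{name}: disk past {label} watermark ({limit}%)")
            break  # report only the most severe watermark crossed
    if shard_count > MAX_SHARDS_PER_NODE:
        alerts.append(f"{name}: {shard_count} shards exceeds {MAX_SHARDS_PER_NODE}")
    return alerts
```

For example, evaluate("es-hot-1", 91.2, 92, 400) reports the heap and the high disk watermark, while a node at 60% heap and 70% disk returns no alerts.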