Architecture · 12 min read · 10 May 2026

Building a Production ELK Stack for Enterprise Log Management

A practical guide to designing, deploying, and operating a resilient Elasticsearch–Logstash–Kibana pipeline at enterprise scale — with real-world patterns we use at RNX.

Radoslav Nagy

Founder, RNX

Tags: Elasticsearch, ELK, Log Management, Architecture

Why the ELK Stack Still Wins

Despite the rise of managed observability SaaS offerings, the Elasticsearch–Logstash–Kibana (ELK) stack remains the go-to choice for organisations that need full control over their data pipeline. At RNX we've deployed ELK in banking, energy, and large e-commerce environments — and the patterns we've refined over those engagements are what this article is about.

Cluster Topology for Reliability

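The most common mistakes we see in production ELK deployments are under-sizing the cluster and combining several node roles on the same machines. A resilient production cluster needs dedicated nodes for each role: master-eligible nodes, data nodes (hot and warm tiers), and coordinating nodes.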

Tip

For most enterprise workloads, start with 3 master-eligible + 3 hot-data + 2 coordinating nodes. Add warm-tier data nodes once your index lifecycle policies start moving older indices off the hot tier.
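As a rough illustration of the starting topology in the tip, the sketch below maps each tier to a `node.roles` line for `elasticsearch.yml` and works out how many master-eligible nodes can fail before the cluster loses its voting majority. The tier names, the `NODE_ROLES` mapping, and the helper functions are hypothetical, and pairing `data_content` with `data_hot` is an assumption about a typical setup rather than something taken from this article.

```python
# Hypothetical mapping from the tiers named in the tip to Elasticsearch
# `node.roles` values. Pairing data_content with data_hot is an assumption.
NODE_ROLES = {
    "master": ["master"],
    "hot-data": ["data_hot", "data_content"],
    "warm-data": ["data_warm"],
    "coordinating": [],  # an empty node.roles list gives a coordinating-only node
}

# Starting topology from the tip: tier name -> node count.
TOPOLOGY = {"master": 3, "hot-data": 3, "coordinating": 2}


def render_node_roles(tier: str) -> str:
    """Return the node.roles line for one tier, as it would appear in elasticsearch.yml."""
    return "node.roles: [" + ", ".join(NODE_ROLES[tier]) + "]"


def master_quorum(master_eligible: int) -> int:
    """Smallest majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible: int) -> int:
    """How many master-eligible nodes can be lost while a majority remains."""
    return master_eligible - master_quorum(master_eligible)


if __name__ == "__main__":
    for tier, count in TOPOLOGY.items():
        print(f"{count} x {tier:<13} {render_node_roles(tier)}")
    print("master failures tolerated:", tolerated_master_failures(TOPOLOGY["master"]))
```

With three master-eligible nodes the majority is two, so the cluster can lose one of them and still elect a master. That is the main reason to start with three rather than two.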

Index Lifecycle Management

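ILM is the single highest-leverage feature in Elasticsearch for controlling storage costs. A well-designed ILM policy automatically moves data through the hot → warm → cold (and, optionally, frozen) tiers and finally deletes it. The example policy below rolls an index over after one day or 50 GB, shrinks it to a single shard after 7 days, moves it to the cold phase after 30 days, and deletes it after 90 days.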

json
{
  "policy": {
    "phases": {
      "hot":  { "actions": { "rollover": { "max_age": "1d", "max_size": "50gb" } } },
      "warm": { "min_age": "7d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "cold": { "min_age": "30d", "actions": { "set_priority": { "priority": 0 } } },
      "delete": { "min_age": "90d", "actions": { "delete": {} } }
    }
  }
}
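To make the timing of the policy above concrete, here is a minimal Python sketch that reports which phase an index should be in for a given age. It is not how Elasticsearch evaluates policies internally. `PHASE_MIN_AGE`, `parse_age`, and `phase_for_age` are hypothetical names, and the parser only handles the `d`, `h`, `m`, and `s` units.

```python
import re

# Trimmed copy of the policy above: only each phase's min_age is kept.
# The hot phase has no min_age in the policy, so it starts at age zero.
PHASE_MIN_AGE = {"hot": "0d", "warm": "7d", "cold": "30d", "delete": "90d"}

_SECONDS_PER_UNIT = {"d": 86_400, "h": 3_600, "m": 60, "s": 1}


def parse_age(value: str) -> int:
    """Convert an age such as '7d' or '12h' to seconds (d/h/m/s units only)."""
    match = re.fullmatch(r"(\d+)([dhms])", value)
    if match is None:
        raise ValueError(f"unsupported age value: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _SECONDS_PER_UNIT[unit]


def phase_for_age(age_seconds: int) -> str:
    """Return the latest phase whose min_age the index has already reached."""
    current = "hot"
    ordered = sorted(PHASE_MIN_AGE.items(), key=lambda item: parse_age(item[1]))
    for phase, min_age in ordered:
        if age_seconds >= parse_age(min_age):
            current = phase
    return current


if __name__ == "__main__":
    for age in ("0d", "8d", "45d", "90d"):
        print(age, "->", phase_for_age(parse_age(age)))
```

Two caveats apply when reading the result. For indices managed by rollover, `min_age` is counted from the rollover time, not from index creation. ILM also evaluates policies on a polling interval, so real transitions lag slightly behind these thresholds.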

Logstash vs. Beats vs. OpenTelemetry

Logstash is powerful but heavyweight. For high-throughput environments we often layer Filebeat/Metricbeat at the edge (low CPU footprint) with Logstash as a central transformation layer, and increasingly adopt OpenTelemetry collectors to future-proof the pipeline.

Note

OpenTelemetry is the direction the industry is moving. If you're starting a greenfield deployment today, build your collection layer on OpenTelemetry Collectors and send the data to Elasticsearch via the OTLP exporter.

Monitoring the Monitor

A production ELK cluster generates its own monitoring data. We always deploy a separate monitoring cluster (even a small one) so that cluster health metrics stay isolated from the primary ingestion pipeline and remain available when the production cluster is in trouble. Use Stack Monitoring, with Metricbeat shipping the metrics, and configure Watcher alerts for JVM heap usage, disk watermarks, and shard count.
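The three alert conditions can be sketched in plain Python to show the logic a watch would encode. This is not Watcher syntax. The `check_node` function and the heap and shard warning thresholds are assumptions chosen for illustration, while the 85% and 90% disk values and the 1000 shards-per-node figure mirror Elasticsearch's documented defaults.

```python
HEAP_WARN_PERCENT = 75        # assumed warning threshold for JVM heap usage
DISK_LOW_WATERMARK = 85       # Elasticsearch default low disk watermark (%)
DISK_HIGH_WATERMARK = 90      # Elasticsearch default high disk watermark (%)
MAX_SHARDS_PER_NODE = 1000    # default value of cluster.max_shards_per_node
SHARD_WARN_FRACTION = 0.8     # assumed: warn at 80% of the shard limit


def check_node(name, heap_used_percent, disk_total_bytes,
               disk_available_bytes, shard_count):
    """Return a list of alert messages for one node (empty if healthy)."""
    alerts = []

    if heap_used_percent >= HEAP_WARN_PERCENT:
        alerts.append(f"{name}: JVM heap at {heap_used_percent}%")

    disk_used_percent = 100.0 * (disk_total_bytes - disk_available_bytes) / disk_total_bytes
    if disk_used_percent >= DISK_HIGH_WATERMARK:
        alerts.append(f"{name}: disk above high watermark ({disk_used_percent:.1f}%)")
    elif disk_used_percent >= DISK_LOW_WATERMARK:
        alerts.append(f"{name}: disk above low watermark ({disk_used_percent:.1f}%)")

    if shard_count >= SHARD_WARN_FRACTION * MAX_SHARDS_PER_NODE:
        alerts.append(f"{name}: {shard_count} shards, nearing the per-node limit")

    return alerts


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    for alert in check_node("es-hot-1", 82, 1_000, 80, 900):
        print(alert)
```

The inputs correspond to values the node stats API exposes (heap used percent, total and available filesystem bytes), so the same thresholds can be carried over to a watch condition once the monitoring cluster is receiving those metrics.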
