Sharing our work on high-performance log compression - curious about this community's scaling experiences
The Challenge That Started It All
Most logging systems are built for the cloud era — they assume abundant CPU, plentiful RAM, and cheap bandwidth and storage.
But what if you're running edge computing, IoT devices, or just want to keep your cloud bills reasonable?
We set out to build something different.
It began with a simple question: how many logs per second can you realistically process on consumer hardware before hitting a wall?
The goal was audacious: create a logging library efficient enough to process 250 million logs at full tilt, with on-the-fly compression and live statistics export, on a standard developer laptop without breaking a sweat.
Why This Matters More Than You Think
We discovered that typical log data compresses at 4–6× when you apply smart preprocessing.
That means for every terabyte of logs you're storing today, you could be storing just 167–250 GB.
Here's the dirty secret of modern logging: you're probably storing the same data dozens of times.
Error messages, API endpoints, user sessions — they follow patterns.
Yet most systems store each instance as if it were unique.
This translates into concrete egress cost cuts and multiplies long-term storage possibilities.
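That redundancy is easy to demonstrate with nothing but a general-purpose compressor. A minimal Python sketch (zlib standing in for LZ4, and the endpoints and error values are invented for illustration):

```python
import random
import zlib

random.seed(1)

# A small pool of endpoints and statuses repeated across many entries,
# the way real services actually log.
endpoints = [f"/api/v1/resource/{i}" for i in range(50)]
errors = ["timeout", "ok", "not_found", "rate_limited"]

lines = [
    f"[{ts}] [{random.choice(endpoints)}] [{random.choice(errors)}]"
    for ts in range(100_000)
]
raw = "\n".join(lines).encode()

compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)
print(f"{len(raw):,} B -> {len(compressed):,} B ({ratio:.1f}x)")
```

Even a stock compressor finds the repeated structure; the numbers in this post come from exploiting it deliberately.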
The Breakthrough: On-the-Fly Preprocessing + Compression
Beyond Standard Compression Techniques
Most solutions throw raw logs at a compressor.
We've developed a preprocessing layer that transforms logs into a more compressible format before even applying LZ4.
This step significantly reduces data entropy, allowing on-the-fly compression to reach 4–6× ratios where standard approaches plateau at 2–3.5×, or pay significant overhead for HC (high-compression) variants and cumbersome post-processing pipelines.
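Here's a toy illustration of the idea (not our actual preprocessing, which is more involved): group log fields by column so that similar values sit next to each other before compression, again with zlib as the stand-in compressor and made-up log lines:

```python
import random
import zlib

random.seed(0)

# Synthetic bracketed log lines; the field values are invented.
urls = [f"/forum/thread/{n}.html" for n in range(1000)]
lines = [
    f"[{i}] [2025-10-17T15:50:{i % 60:02d}Z] [{random.choice(urls)}] "
    f"[172.16.{random.randrange(256)}.{random.randrange(256)}] "
    f"[{random.randrange(600)}ms]"
    for i in range(50_000)
]
raw = "\n".join(lines).encode()

# Preprocessing sketch: split each line into fields and regroup by column,
# so all timestamps, all URLs, etc. become contiguous runs.
columns = zip(*(line.split("] [") for line in lines))
preprocessed = "\n".join("|".join(col) for col in columns).encode()

r_raw = len(raw) / len(zlib.compress(raw))
r_pre = len(preprocessed) / len(zlib.compress(preprocessed))
print(f"raw: {r_raw:.2f}x   column-grouped: {r_pre:.2f}x")
```

The column-grouped form compresses better because the compressor's window now sees long runs of near-identical values instead of interleaved noise.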
Lock-Free Everything
We treated contention as the enemy.
The hot path uses MPMC queues and ring buffers so multiple threads can enqueue logs without blocking each other.
It's like a multi-lane highway where cars merge without stopping.
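For readers who want the shape of it, here is a heavily simplified Python sketch of a bounded ring queue. CPython can't express real lock-free CAS loops, so a lock stands in for the atomics; the slot and index arithmetic is the part that carries over to an atomic C version:

```python
import threading

class RingQueue:
    """Bounded queue sketch: power-of-two capacity, wrap-around indexing.
    A lock replaces the atomic compare-and-swap a C implementation uses."""

    def __init__(self, capacity):          # capacity must be a power of two
        self.mask = capacity - 1
        self.slots = [None] * capacity
        self.head = 0                      # next slot to dequeue
        self.tail = 0                      # next slot to enqueue
        self.lock = threading.Lock()

    def try_enqueue(self, item):
        with self.lock:
            if self.tail - self.head == len(self.slots):
                return False               # full: caller retries, never blocks
            self.slots[self.tail & self.mask] = item
            self.tail += 1
            return True

    def try_dequeue(self):
        with self.lock:
            if self.head == self.tail:
                return None                # empty
            item = self.slots[self.head & self.mask]
            self.head += 1
            return item

q = RingQueue(8)
for i in range(10):
    q.try_enqueue(i)                       # last two fail: queue is full
drained = []
while (x := q.try_dequeue()) is not None:
    drained.append(x)
print(drained)                             # first 8 items, in order
```

The try-and-retry style is the point: producers never park on a mutex, so a slow thread can't stall the others.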
Batch Compression Magic
Instead of compressing each log individually, we batch them and compress entire chunks.
Combined with preprocessing, this gives LZ4 more patterns to work with, dramatically improving ratios.
Our batching strategy pairs with a unique cache approach that handles millions of unique values within tight hardware constraints.
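The per-log versus batched difference is dramatic even with a generic compressor. A small Python comparison (zlib as the stand-in, synthetic entries):

```python
import zlib

logs = [f"[{i}] [/api/users/{i % 100}] [GET] [{i % 997}ms]"
        for i in range(20_000)]

# Per-log: each entry compressed on its own, no shared history,
# plus fixed header/trailer overhead on every tiny message.
per_log = sum(len(zlib.compress(line.encode())) for line in logs)

# Batched: one stream, so repeated endpoints and field layouts
# become cheap back-references.
batched = len(zlib.compress("\n".join(logs).encode()))

print(f"per-log: {per_log:,} B   batched: {batched:,} B")
```

Short messages compressed individually can even come out larger than the input; batching is what lets the compressor see the patterns at all.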
Positioning in the Observability Ecosystem
Loggr is not designed to replace full-featured platforms like Datadog or Splunk, but to serve as an upstream gateway — compressing logs at the source before transmission to storage or downstream analysis pipelines. This creates a cost-efficient two-stage architecture where Loggr handles the “heavy lifting” of data reduction, dramatically cutting egress and storage costs while maintaining compatibility with existing tools.
The Moment of Truth: 250 Million Log Test
Laptop: Lenovo P14s Gen 5 (Ryzen 5 Pro, 96 GB RAM, NVMe SSD)
Environment: Windows x64, AVX2 enabled
Data: 1,000 unique URLs × 5,000 endpoints, random IP × URL distribution, combined with other randomly generated fields
Sample Format:
[249328097] [2025-10-17T15:50:20.721988Z] [/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] [59466ms] [16802b] [174]
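Since every field sits in its own bracketed group, the sample line can be pulled apart with a one-line regex (a consumer-side sketch, not part of the library):

```python
import re

sample = ("[249328097] [2025-10-17T15:50:20.721988Z] "
          "[/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] "
          "[59466ms] [16802b] [174]")

# Capture everything between each pair of brackets.
fields = re.findall(r"\[([^\]]*)\]", sample)
print(fields[0], fields[2])  # -> 249328097 /forum/thread/12345.html
```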
Results
With 6 caller threads, 2 MB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 11.52 seconds total
- 🚀 21.71 million logs/second -> to disk
- 💾 5:1 compression ratio
- 🖥️ 100% CPU (6 physical cores used)
- 📊 105 MB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
With 1 caller thread, 512 MB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 27.17 seconds total
- 🚀 9.2 million logs/second -> to disk
- 💾 5:1 compression ratio
- 🖥️ 20% CPU (1 physical core)
- 📊 1.8 GB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
With 1 caller thread, 500 KB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 40.32 seconds total
- 🚀 6.2 million logs/second -> to disk
- 💾 4.6:1 compression ratio
- 🖥️ 20% CPU (1 physical core)
- 📊 16 MB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
Detailed Benchmarks: Here
Real-World Implications
- Edge computing: Run comprehensive logging on resource-constrained devices without worrying about RAM and CPU limits.
- Cost-conscious teams: Reduce log volume manyfold = lower storage costs, lower egress fees, and potentially lower licensing costs.
- High-throughput systems: Maintain detailed logging without becoming I/O bound or drowning in storage costs.
- Security traceability: Log all events with minimal resources, maintaining absolute temporal order with unique cross-thread atomic IDs.
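The cross-thread ID guarantee boils down to a single shared monotonic counter. A minimal Python sketch (a lock replaces the atomic fetch-add the DLL would use; names here are illustrative):

```python
import itertools
import threading

# One process-wide counter shared by every logging thread.
_next_id = itertools.count()
_id_lock = threading.Lock()

def next_log_id():
    with _id_lock:
        return next(_next_id)

ids = []

def worker():
    for _ in range(1000):
        ids.append(next_log_id())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(ids)))  # 4000: every ID unique across all threads
```

Because the counter only ever moves forward, the IDs double as a total order over events, regardless of which thread emitted them.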
The Architecture
Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Disk Writer
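The pipeline above can be sketched end to end in a few dozen lines. This uses a locking stdlib queue and zlib rather than the lock-free structures and LZ4 the library uses, and a list stands in for the disk writer:

```python
import queue
import threading
import zlib

q = queue.Queue()
disk = []              # stand-in for the disk writer
BATCH = 1000
STOP = object()        # sentinel each producer sends when done

def producer(n):
    for i in range(n):
        q.put(f"[{i}] [/api/item/{i % 50}] [200]")
    q.put(STOP)

def writer(producers):
    batch, stops = [], 0
    while stops < producers:
        item = q.get()
        if item is STOP:
            stops += 1
            continue
        batch.append(item)
        if len(batch) >= BATCH:            # batch builder -> compressor
            disk.append(zlib.compress("\n".join(batch).encode()))
            batch = []
    if batch:                              # flush the final partial batch
        disk.append(zlib.compress("\n".join(batch).encode()))

threads = [threading.Thread(target=producer, args=(5000,)) for _ in range(3)]
w = threading.Thread(target=writer, args=(3,))
for t in threads:
    t.start()
w.start()
for t in threads:
    t.join()
w.join()

total = sum(len(zlib.decompress(c).split(b"\n")) for c in disk)
print(total)  # 15000 logs recovered, none lost
```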
Key Design Decisions
Single C DLL (170 KB) with no dependencies (plug and play, interoperable with most environments), featuring:
- AVX2-optimized code paths
- Highly configurable without complex setup
- On-the-fly IP anonymization
- Custom compression level (or none)
- URL param truncation
- Unique per-log cross-thread ID
- Instant backup write path
- Custom memory footprint
- Live usage statistics
Optional cryptographic signing for audit trails (on the roadmap).
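As an example of what on-the-fly IP anonymization can look like, here is one common scheme, zeroing the host octet so addresses stay aggregatable but not individually identifying. This is a sketch of the technique, not necessarily the exact masking the DLL applies:

```python
import ipaddress

def anonymize(ip: str) -> str:
    """Zero the last octet of an IPv4 address (keep the /24 network)."""
    addr = ipaddress.IPv4Address(ip)
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFF00))

print(anonymize("172.16.18.116"))  # -> 172.16.18.0
```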
When You Might Not Need This
Let’s be honest — not every application needs this level of optimization. If you're processing 10,000 logs per second, traditional solutions work fine.
But if you're dealing with:
- Hundreds of thousands or millions of events per second
- Bandwidth-constrained environments
- Budgets where cloud costs matter
- Regulatory requirements for long-term retention
...then host-side compression becomes incredibly valuable.
The Future of Logging
We believe the next frontier in observability isn't collecting more data — it's being smarter about what we keep and how we store it.
By moving compression to the source, we can maintain detailed audit trails without the traditional cost burden.
The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware.
And in an era where data growth is outpacing budget growth, that might be the most important optimization of all.
Want to run your own tests? For organizations conducting formal technical evaluations, a limited demo DLL is available. We're particularly interested in hearing about edge cases and workloads where our approach does (and doesn't) work well.
Technical specs: Windows x64, AVX2 required, C API (easy bindings for most languages).
