Sharing our work on high-performance log compression - curious about this community's scaling experiences
The Challenge That Started It All
Most logging systems are built for the cloud era — they assume abundant CPU, plentiful RAM, and cheap bandwidth and storage.
But what if you're running edge computing, IoT devices, or just want to keep your cloud bills reasonable?
We set out to build something different.
It began with a simple question: how many logs per second can you realistically process on consumer hardware before hitting a wall?
The goal was audacious: create a logging library efficient enough to process 250 million logs at full tilt, with on-the-fly compression and live statistics export, on a standard developer laptop without breaking a sweat.
Why This Matters More Than You Think
We discovered that typical log data compresses at 4–6× when you apply smart preprocessing.
That means for every terabyte of logs you're storing today, you could be storing just 167–250 GB.
Here's the dirty secret of modern logging: you're probably storing the same data dozens of times.
Error messages, API endpoints, user sessions — they follow patterns.
Yet most systems store each instance as if it were unique.
This translates into concrete egress cost cuts and multiplies long-term storage possibilities.
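That redundancy is easy to demonstrate with nothing but a general-purpose compressor. A minimal Python sketch (zlib standing in for LZ4, and the endpoints and error values are invented for illustration):

```python
import random
import zlib

random.seed(1)

# A small pool of endpoints and statuses repeated across many entries,
# the way real services actually log.
endpoints = [f"/api/v1/resource/{i}" for i in range(50)]
errors = ["timeout", "ok", "not_found", "rate_limited"]

lines = [
    f"[{ts}] [{random.choice(endpoints)}] [{random.choice(errors)}]"
    for ts in range(100_000)
]
raw = "\n".join(lines).encode()

compressed = zlib.compress(raw)
ratio = len(raw) / len(compressed)
print(f"{len(raw):,} B -> {len(compressed):,} B ({ratio:.1f}x)")
```

Even a stock compressor finds the repeated structure; the numbers in this post come from exploiting it deliberately.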
The Breakthrough: On-the-Fly Preprocessing + Compression
Beyond Standard Compression Techniques
Most solutions throw raw logs at a compressor.
We've developed a preprocessing layer that transforms logs into a more compressible format before even applying LZ4.
This step significantly reduces data entropy, allowing on-the-fly compression to reach 4–6× ratios where standard approaches plateau at 2–3.5×, or pay significant overhead for HC (high-compression) variants and cumbersome post-processing pipelines.
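Here's a toy illustration of the idea (not our actual preprocessing, which is more involved): group log fields by column so that similar values sit next to each other before compression, again with zlib as the stand-in compressor and made-up log lines:

```python
import random
import zlib

random.seed(0)

# Synthetic bracketed log lines; the field values are invented.
urls = [f"/forum/thread/{n}.html" for n in range(1000)]
lines = [
    f"[{i}] [2025-10-17T15:50:{i % 60:02d}Z] [{random.choice(urls)}] "
    f"[172.16.{random.randrange(256)}.{random.randrange(256)}] "
    f"[{random.randrange(600)}ms]"
    for i in range(50_000)
]
raw = "\n".join(lines).encode()

# Preprocessing sketch: split each line into fields and regroup by column,
# so all timestamps, all URLs, etc. become contiguous runs.
columns = zip(*(line.split("] [") for line in lines))
preprocessed = "\n".join("|".join(col) for col in columns).encode()

r_raw = len(raw) / len(zlib.compress(raw))
r_pre = len(preprocessed) / len(zlib.compress(preprocessed))
print(f"raw: {r_raw:.2f}x   column-grouped: {r_pre:.2f}x")
```

The column-grouped form compresses better because the compressor's window now sees long runs of near-identical values instead of interleaved noise.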
Lock-Free Everything
We treated contention as the enemy.
The hot path uses MPMC queues and ring buffers so multiple threads can enqueue logs without blocking each other.
It's like a multi-lane highway where cars merge without stopping.
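For readers who want the shape of it, here is a heavily simplified Python sketch of a bounded ring queue. CPython can't express real lock-free CAS loops, so a lock stands in for the atomics; the slot and index arithmetic is the part that carries over to an atomic C version:

```python
import threading

class RingQueue:
    """Bounded queue sketch: power-of-two capacity, wrap-around indexing.
    A lock replaces the atomic compare-and-swap a C implementation uses."""

    def __init__(self, capacity):          # capacity must be a power of two
        self.mask = capacity - 1
        self.slots = [None] * capacity
        self.head = 0                      # next slot to dequeue
        self.tail = 0                      # next slot to enqueue
        self.lock = threading.Lock()

    def try_enqueue(self, item):
        with self.lock:
            if self.tail - self.head == len(self.slots):
                return False               # full: caller retries, never blocks
            self.slots[self.tail & self.mask] = item
            self.tail += 1
            return True

    def try_dequeue(self):
        with self.lock:
            if self.head == self.tail:
                return None                # empty
            item = self.slots[self.head & self.mask]
            self.head += 1
            return item

q = RingQueue(8)
for i in range(10):
    q.try_enqueue(i)                       # last two fail: queue is full
drained = []
while (x := q.try_dequeue()) is not None:
    drained.append(x)
print(drained)                             # first 8 items, in order
```

The try-and-retry style is the point: producers never park on a mutex, so a slow thread can't stall the others.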
Batch Compression Magic
Instead of compressing each log individually, we batch them and compress entire chunks.
Combined with preprocessing, this gives LZ4 more patterns to work with, dramatically improving ratios.
Our batching strategy pairs with a unique cache approach that handles millions of unique values within tight hardware constraints.
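The per-log versus batched difference is dramatic even with a generic compressor. A small Python comparison (zlib as the stand-in, synthetic entries):

```python
import zlib

logs = [f"[{i}] [/api/users/{i % 100}] [GET] [{i % 997}ms]"
        for i in range(20_000)]

# Per-log: each entry compressed on its own, no shared history,
# plus fixed header/trailer overhead on every tiny message.
per_log = sum(len(zlib.compress(line.encode())) for line in logs)

# Batched: one stream, so repeated endpoints and field layouts
# become cheap back-references.
batched = len(zlib.compress("\n".join(logs).encode()))

print(f"per-log: {per_log:,} B   batched: {batched:,} B")
```

Short messages compressed individually can even come out larger than the input; batching is what lets the compressor see the patterns at all.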
Positioning in the Observability Ecosystem
Loggr is not designed to replace full-featured platforms like Datadog or Splunk, but to serve as an upstream gateway — compressing logs at the source before transmission to storage or downstream analysis pipelines. This creates a cost-efficient two-stage architecture where Loggr handles the “heavy lifting” of data reduction, dramatically cutting egress and storage costs while maintaining compatibility with existing tools.
The Moment of Truth: 250 Million Log Test
Laptop: Lenovo P14s Gen 5 (Ryzen 5 Pro, 96 GB RAM, NVMe SSD)
Environment: Windows x64, AVX2 enabled
Data: 1,000 unique URLs × 5,000 endpoints, random IP × URL distribution, combined with other randomly generated fields
Sample Format:
[249328097] [2025-10-17T15:50:20.721988Z] [/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] [59466ms] [16802b] [174]
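Since every field sits in its own bracketed group, the sample line can be pulled apart with a one-line regex (a consumer-side sketch, not part of the library):

```python
import re

sample = ("[249328097] [2025-10-17T15:50:20.721988Z] "
          "[/forum/thread/12345.html] [172.16.18.116] [506] [SEARCH] "
          "[59466ms] [16802b] [174]")

# Capture everything between each pair of brackets.
fields = re.findall(r"\[([^\]]*)\]", sample)
print(fields[0], fields[2])  # -> 249328097 /forum/thread/12345.html
```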
Results
With 6 caller threads, 2 MB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 11.52 seconds total
- 🚀 21.71 million logs/second -> to disk
- 💾 5:1 compression ratio
- 🖥️ 100% CPU (6 physical cores used)
- 📊 105 MB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
With 1 caller thread, 512 MB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 27.17 seconds total
- 🚀 9.2 million logs/second -> to disk
- 💾 5:1 compression ratio
- 🖥️ 20% CPU (1 physical core)
- 📊 1.8 GB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
With 1 caller thread, 500 KB batch size:
- 🔥 250,000,000 logs processed
- ⏱️ 40.32 seconds total
- 🚀 6.2 million logs/second -> to disk
- 💾 4.6:1 compression ratio
- 🖥️ 20% CPU (1 physical core)
- 📊 16 MB RAM footprint (stable)
- ✅ 0 lost logs & no back-pressure
Detailed Benchmarks: Here
Real-World Implications
- Edge computing: Run comprehensive logging on resource-constrained devices without worrying about RAM and CPU limits.
- Cost-conscious teams: Reduce log volume manyfold = lower storage costs, lower egress fees, and potentially lower licensing costs.
- High-throughput systems: Maintain detailed logging without becoming I/O bound or drowning in storage costs.
- Security traceability: Log all events with minimal resources, maintaining absolute temporal order with unique cross-thread atomic IDs.
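The cross-thread ID guarantee boils down to a single shared monotonic counter. A minimal Python sketch (a lock replaces the atomic fetch-add the DLL would use; names here are illustrative):

```python
import itertools
import threading

# One process-wide counter shared by every logging thread.
_next_id = itertools.count()
_id_lock = threading.Lock()

def next_log_id():
    with _id_lock:
        return next(_next_id)

ids = []

def worker():
    for _ in range(1000):
        ids.append(next_log_id())

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(ids)))  # 4000: every ID unique across all threads
```

Because the counter only ever moves forward, the IDs double as a total order over events, regardless of which thread emitted them.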
The Architecture
Producer Threads → [Lock-free Queue] → Batch Builder → [LZ4 Compression] → Disk Writer
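The pipeline above can be sketched end to end in a few dozen lines. This uses a locking stdlib queue and zlib rather than the lock-free structures and LZ4 the library uses, and a list stands in for the disk writer:

```python
import queue
import threading
import zlib

q = queue.Queue()
disk = []              # stand-in for the disk writer
BATCH = 1000
STOP = object()        # sentinel each producer sends when done

def producer(n):
    for i in range(n):
        q.put(f"[{i}] [/api/item/{i % 50}] [200]")
    q.put(STOP)

def writer(producers):
    batch, stops = [], 0
    while stops < producers:
        item = q.get()
        if item is STOP:
            stops += 1
            continue
        batch.append(item)
        if len(batch) >= BATCH:            # batch builder -> compressor
            disk.append(zlib.compress("\n".join(batch).encode()))
            batch = []
    if batch:                              # flush the final partial batch
        disk.append(zlib.compress("\n".join(batch).encode()))

threads = [threading.Thread(target=producer, args=(5000,)) for _ in range(3)]
w = threading.Thread(target=writer, args=(3,))
for t in threads:
    t.start()
w.start()
for t in threads:
    t.join()
w.join()

total = sum(len(zlib.decompress(c).split(b"\n")) for c in disk)
print(total)  # 15000 logs recovered, none lost
```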
Key Design Decisions
Single C DLL (170 KB) with no dependencies (plug and play, interoperable with most environments), featuring:
- AVX2-optimized code paths
- Highly configurable without complex setup
- On-the-fly IP anonymization
- Custom compression level (or none)
- URL param truncation
- Unique per-log cross-thread ID
- Instant backup write path
- Custom memory footprint
- Live usage statistics
Optional cryptographic signing for audit trails (on the roadmap).
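As an example of what on-the-fly IP anonymization can look like, here is one common scheme, zeroing the host octet so addresses stay aggregatable but not individually identifying. This is a sketch of the technique, not necessarily the exact masking the DLL applies:

```python
import ipaddress

def anonymize(ip: str) -> str:
    """Zero the last octet of an IPv4 address (keep the /24 network)."""
    addr = ipaddress.IPv4Address(ip)
    return str(ipaddress.IPv4Address(int(addr) & 0xFFFFFF00))

print(anonymize("172.16.18.116"))  # -> 172.16.18.0
```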
When You Might Not Need This
Let’s be honest — not every application needs this level of optimization. If you're processing 10,000 logs per second, traditional solutions work fine.
But if you're dealing with:
- Hundreds of thousands or millions of events per second
- Bandwidth-constrained environments
- Budgets where cloud costs matter
- Regulatory requirements for long-term retention
...then host-side compression becomes incredibly valuable.
The Future of Logging
We believe the next frontier in observability isn't collecting more data — it's being smarter about what we keep and how we store it.
By moving compression to the source, we can maintain detailed audit trails without the traditional cost burden.
The 250M log benchmark isn't just a party trick — it's proof that with careful engineering, we can handle orders of magnitude more data on the same hardware.
And in an era where data growth is outpacing budget growth, that might be the most important optimization of all.
Want to run your own tests? For organizations conducting formal technical evaluations, a limited demo DLL is available. We're particularly interested in hearing about edge cases and workloads where our approach does (and doesn't) work well.
Technical specs: Windows x64, AVX2 required, C API (easy bindings for most languages).
