Series

Off-Heap Algorithms in Java

10 articles in this series

344 min total read

By Arthur Costa

Part 1

Off-Heap Algorithms in Java: The Ring Buffer Foundation

From a naive heap-based queue to an off-heap ring buffer with dramatically better throughput, tail latency, and GC behavior for high-frequency trading workloads.

Nov 15, 2025•20 min read

Part 2

Wait-Free SPSC Queues in Java

How to replace synchronized queue handshakes with a wait-free Single-Producer Single-Consumer ring buffer that uses precise memory ordering instead of locks.

Dec 23, 2025•18 min read

Part 3

Lock-Free MPSC Queues: Production-Grade Implementation

A deep-dive into building production-grade Multi-Producer Single-Consumer lock-free queues in Java, with VarHandle, CAS operations, and real-world benchmarks.

Jan 15, 2026•51 min read

Part 3

Lock-Free MPSC Queues in Java

How to replace locked many-producer queues with a lock-free Multi-Producer Single-Consumer ring buffer coordinated entirely by CAS and sequence numbers.

Nov 17, 2025•18 min read

Part 4

Lock-Free MPMC Queues: Dual Contention Mastery

Master the complexity of Multi-Producer Multi-Consumer lock-free queues with per-slot sequence numbers, dual CAS coordination, and work-stealing thread pool integration.

Jan 29, 2026•50 min read

Part 4

MPMC Queues in Java: The Final Boss

How to build a dual-CAS Multi-Producer Multi-Consumer ring buffer in Java that scales on both ends without collapsing under lock contention.

Nov 18, 2025•18 min read

Part 5

The Disruptor Pattern: Multi-Stage Event Processing Pipelines

Implement LMAX Disruptor-style event processing with sequence barriers, multi-stage pipelines, and batch processing for ultra-low latency systems.

Feb 12, 2026•51 min read

Part 5

Event Pipelines in Java: The LMAX Disruptor Pattern

How to chain SPSC queues into a high-throughput event pipeline, following the LMAX Disruptor pattern for multi-stage processing with sub-microsecond latency.

Nov 19, 2025•18 min read

Part 6

Wait-Free Telemetry: Never-Blocking Observability

Build wait-free telemetry buffers that never block producers, with overwrite semantics for high-frequency trading observability that doesn't impact system performance.

Jan 4, 2026•50 min read

Part 7

Sharded Processing: Per-Core Isolation for Zero Contention

Eliminate contention entirely with per-CPU-core sharded buffers, thread affinity, and isolated processing lanes for maximum parallelism.

Jan 4, 2026•50 min read