Laytoun' thoughts!

Building RAG Update: Hybrid Search, Reranking & Production Hardening

When I published the original series in November 2025, I was happy with where the system landed. It had semantic caching, fallback strategies, distributed tracing, autoscaling, solid production patterns throughout. But as I kept working with it and preparing a talk around the same material, I kept spotting areas where

Building Production-Grade RAG Systems: Kubernetes, Autoscaling & LLMs

We finally got our first snowfall this week in Stockholm. The weather is getting colder and the days shorter. That only motivates me to continue writing my third and final post in the RAG series. In part one, we explored the production challenges of RAG systems. In part two, we

Building Production-Grade RAG Systems: Architecture Deep Dive

In the first part, we explored the production challenges of RAG systems: latency, reliability, cost, quality, and observability. Now let's get our hands dirty with the actual architecture and implementation. The codebase uses Java 25, Spring Boot 3.5.7, reactive programming with WebFlux, and follows production patterns you'd see

Building Production-Grade RAG Systems: Understanding the Problem Space

I've been quiet on this blog for a while now. Truth is, I lost my appetite for writing these past months. Between traveling to conferences, delivering talks, and shipping some cool features at work, the keyboard just didn't feel the same. There was also this nagging voice in my head:

A look into Deep Java Library!

When you think about building machine learning apps, Java is not the first language that comes to mind, probably not even in the top 3 or 5! But Java has proved time and again that it is capable of modernising itself, and even if it's not the first choice for

Laytoun' thoughts! © 2026