# Lessons learned writing LLMap

I wrote LLMap to solve code search in the Apache Cassandra repo. Cassandra is too large (~200kloc across ~2500 files, about 4.5M tokens) to throw at even the largest LLM context window. And of course there are many codebases larger still. The idea is simple: ask the LLM to do the work. But getting it to work consistently was harder than I expected. Here are a few of the hiccups I ran into and how I worked around them.

## DeepSeek V3 can't classify things without thinking first

Recall that LLMap optimizes the problem by using a multi-stage analysis to avoid spending more time than necessary analyzing obviously irrelevant files (sketched in code at the end of this section):

1. Coarse analysis using code skeletons
2. Full source analysis of potentially relevant files from (1)
3. Refine the output of (2) to only the most relevant snippets

It turns out that if you just ask DeepSeek V3 to classify the skeleton as relevant/irrelevant, you will get garbage results. Sometimes it calls everything rel...
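The workaround is in the section title: let the model think first, then pull the verdict out of what it wrote. Here's a minimal sketch of that pattern, assuming an OpenAI-compatible client pointed at DeepSeek's endpoint; the prompt wording and the parsing are illustrative, not LLMap's actual code:

```python
from openai import OpenAI

# DeepSeek serves an OpenAI-compatible API, so the standard client works.
client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

def classify_skeleton(skeleton: str, question: str) -> str:
    # Ask for free-form reasoning first, then a one-word verdict on the final
    # line. Asking for the verdict alone is what produced garbage results.
    prompt = (
        f"Here is a skeleton of a source file:\n\n{skeleton}\n\n"
        f"Question: {question}\n\n"
        "First, briefly explain how this file does or does not relate to the "
        "question. Then, on the final line, answer with exactly one word: "
        "RELEVANT or IRRELEVANT."
    )
    response = client.chat.completions.create(
        model="deepseek-chat",  # DeepSeek V3
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # Pull the verdict from the last line; everything above it is throwaway
    # "thinking" whose only job is to condition the final answer.
    verdict = text.strip().splitlines()[-1].upper()
    return "IRRELEVANT" if "IRRELEVANT" in verdict else "RELEVANT"
```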
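Stepping back to the three-stage funnel listed above, here is a sketch of how the stages might chain together. Apart from `classify_skeleton`, the helpers are hypothetical stand-ins for LLMap's real stages, stubbed out just to show the flow:

```python
def extract_skeleton(path: str) -> str:
    # Hypothetical: in practice this would keep only declarations and
    # signatures, which are far cheaper to analyze than full source.
    return open(path).read()

def analyze_full_source(path: str, question: str) -> str:
    # Hypothetical: ask the LLM to analyze the complete file against the question.
    return ""

def refine_to_snippets(analyses: list[str], question: str) -> list[str]:
    # Hypothetical: ask the LLM to keep only the most relevant snippets.
    return analyses

def search(files: list[str], question: str) -> list[str]:
    # Stage 1: coarse pass over skeletons, discarding obviously irrelevant
    # files before spending any tokens on full source.
    candidates = [f for f in files
                  if classify_skeleton(extract_skeleton(f), question) == "RELEVANT"]
    # Stage 2: full-source analysis of the stage-1 survivors.
    analyses = [analyze_full_source(f, question) for f in candidates]
    # Stage 3: boil the stage-2 output down to the most relevant snippets.
    return refine_to_snippets(analyses, question)
```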
I recently switched tasks from writing the ColBERT Live! library and related benchmarking tools to authoring BM25 search for Cassandra. I was able to implement the former almost entirely with "coding in English" via Aider. That is: I gave the LLM tasks, in English, and it generated diffs for me that Aider applied to my source files. This made me easily 5x more productive vs writing code by hand, even with AI autocomplete like Copilot. It felt amazing! (Take a minute to check out this short thread on a real-life session with Aider, if you've never tried it.)

Coming back to Cassandra, by contrast, felt like swimming through molasses. Doing everything by hand is tedious when you know that an LLM could do it faster if you could just structure the problem correctly for it. It felt like writing assembly without a compiler -- a useful skill in narrow situations, but mostly not a good use of human intelligence today.

The key difference in these two sce...