I recently added support for ColPali image search to the ColBERT Live! library. This post skips the introduction to ColPali and how it works; please check out Antaripa Saha's excellent article for that. TL;DR: ColPali lets you natively compare text queries with image-based documents, with accuracy that bests the previous state of the art. ("Natively" means there's no extract-to-text pipeline involved.)

Adding ColPali to ColBERT Live!

The "Col" in ColPali refers to performing maxsim-based "late interaction" relevance scoring, as seen in ColBERT. Since ColBERT Live! already abstracts away the details of computing embedding vectors, it is straightforward to add support for ColPali / ColQwen by implementing an appropriate Model subclass. However, ColBERT Live!'s default parameters were initially tuned for text search. To be able to give appropriate guidance for image search, I ran a grid search on the ViDoRe benchmark that
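For readers who want the gist of maxsim-based late interaction without leaving the page, here is a minimal sketch in plain Python. It is not ColBERT Live!'s implementation: the helper names (`normalize`, `dot`, `maxsim_score`) and the toy 2-d vectors are made up for illustration, and real ColBERT / ColPali embeddings have far more dimensions. The idea is that each query vector is matched against its best document vector, and those best matches are summed.

```python
from math import sqrt

def normalize(v):
    """Scale a vector to unit length so the dot product equals cosine similarity."""
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best similarity to any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy embeddings (hypothetical): two query tokens, two candidate documents.
query = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]]
doc_b = [normalize(v) for v in [[1.0, 1.0]]]

score_a = maxsim_score(query, doc_a)  # each query vector has an exact match
score_b = maxsim_score(query, doc_b)  # each query vector only partially matches
```

Here `doc_a` scores higher than `doc_b` because every query vector finds a close counterpart in it. For ColPali the document vectors come from image patches rather than text tokens, but the scoring step has the same shape.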
Transcript of airhacks.fm episode 316

Adam Bien: Hey, Jonathan, how is JVector 4 doing?

Jonathan Ellis: JVector 4?

AB: Yeah, because JVector 3 is completed, I think, now.

JE: JVector 4. Well, shoot, man. If you want a sneak preview, we may have some news to talk about with GPU acceleration for JVector 4, but that's super, super early and I can't promise any specifics yet.

0:00:28 JVector 3 features and improvements

AB: It was a joke, actually. So, I know that JVector 3 is completed. So what are the major features, or what happened between JVector 2 and 3?

JE: JVector 2 was a fairly straightforward adaptation of Microsoft Research’s DiskANN search indexing to Java and to Cassandra. That means you have a two-pass search: a core index that works on a graph, whose comparisons are done with quantized vectors kept in memory, and then you refine the results of that search by using full-resolution vectors from disk. And so JVector 3 has been, ho