I recently added support for ColPali image search to the ColBERT Live! library. This post skips the introduction to ColPali and how it works; please check out Antaripa Saha's excellent article for that. TL;DR: ColPali lets you natively compare text queries with image-based documents, with accuracy that bests the previous state of the art. ("Natively" means there's no extract-to-text pipeline involved.)

Adding ColPali to ColBERT Live!

The "Col" in ColPali refers to performing maxsim-based "late interaction" relevance scoring, as seen in ColBERT. Since ColBERT Live! already abstracts away the details of computing embedding vectors, it is straightforward to add support for ColPali / ColQwen by implementing an appropriate Model subclass. However, ColBERT Live!'s default parameters were initially tuned for text search. To be able to give appropriate guidance for image search, I ran a grid search on the ViDoRe benchmark that ...
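
For readers who have not seen maxsim-based late interaction before, here is a minimal sketch of the scoring idea in plain Python. It is not ColBERT Live!'s code or API: the function names (`normalize`, `dot`, `maxsim_score`) and the toy two-dimensional vectors are made up for illustration, and I am assuming unit-normalized embeddings so that a dot product acts as cosine similarity. Each query vector is matched against its best-scoring document vector, and those per-query maxima are summed.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length so dot products behave like cosine similarity."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take the maximum
    similarity over all document vectors, then sum those maxima."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy data (hypothetical): two query-token vectors and two candidate documents,
# each document represented by a small bag of vectors.
query = [normalize(v) for v in [[1.0, 0.0], [0.0, 1.0]]]
doc_a = [normalize(v) for v in [[1.0, 0.0], [3.0, 4.0]]]    # roughly aligned with the query
doc_b = [normalize(v) for v in [[-1.0, 0.0], [0.0, -1.0]]]  # points away from the query

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
print(score_a, score_b)  # doc_a should rank above doc_b
```

With ColPali the document vectors would come from image patches rather than text tokens, but under the assumptions above the scoring step has the same shape.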