I spent way too long debugging a dlib problem while using the Python face_recognition library, which wraps it. I couldn't find anyone online giving the correct diagnosis and solution, so I'm posting this as a public service to the next person who hits it.
Here's the error I was getting:
RuntimeError: Error while calling cudaMallocHost(&data, new_size*sizeof(float)) in file /home/jonathan/Projects/dlib/dlib/cuda/gpu_data.cpp:211. code: 2, reason: out of memory
Eventually I gave up and switched from the GPU model ("cnn") to the CPU one ("hog"). Then I started getting a different error:
RuntimeError: Unsupported image type, must be 8bit gray or RGB image.
The error persisted even after I added PIL code to convert the image to RGB.
This one was easier to track down on Google: it happens when you have numpy 2.x installed, which is not compatible with dlib. Seems like something along the way should give a warning about that!
At any rate, once I downgraded numpy to the latest 1.x release (pip install "numpy<2"), the cudaMallocHost error went away too. My guess is that something in a numpy 2.x array was getting interpreted as a Very Large image size value.
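Since nothing along the way warns about the numpy version, here is a rough sketch of a check you could run before calling into face_recognition. It only uses the standard library. The function names (major_version, numpy_problem, installed_numpy_version) are made up for this post, and the "2.x is a problem" rule is just my experience above, not something dlib documents.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading integer of a version string such as '2.1.3'."""
    return int(version_string.split(".")[0])


def numpy_problem(version_string):
    """Return a warning if this numpy version is 2.x or newer, else None.

    Assumption: the installed dlib build only works with numpy 1.x,
    which is what I observed; your build may differ.
    """
    if major_version(version_string) >= 2:
        return (
            f"numpy {version_string} is installed; my dlib build failed with it. "
            'Try: pip install "numpy<2"'
        )
    return None


def installed_numpy_version():
    """Return the installed numpy version string, or None if numpy is missing."""
    try:
        return metadata.version("numpy")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_numpy_version()
    if version is None:
        print("numpy is not installed")
    else:
        print(numpy_problem(version) or f"numpy {version} looks fine")
```

Running this at the top of a script turns a confusing dlib error into a one-line hint about what to downgrade.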
Later on I started getting cudaMallocHost errors again. These came from using an Image that had not been converted to RGB. So the unifying theme seems to be: cnn mode doesn't have the same sanity checks enabled that hog does. If you get weird errors, switch to hog, and once that works, try cnn again.
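To make that workflow concrete, here is a minimal sketch of checking the image yourself before handing it to either model. describe_problem and locate_faces are hypothetical helpers written for this post, and the check (8-bit, grayscale or 3-channel RGB) is only my reading of the "Unsupported image type" message, not dlib's actual validation code. The face_recognition, PIL and numpy imports live inside locate_faces so the check itself runs without them.

```python
def describe_problem(image):
    """Return a string saying why the array looks wrong, or None if it looks OK.

    Works on anything with numpy-style .dtype and .shape attributes.
    """
    dtype = str(getattr(image, "dtype", None))
    shape = tuple(getattr(image, "shape", ()))
    if dtype != "uint8":
        return f"dtype is {dtype}, expected uint8"
    if len(shape) == 2:
        return None  # 8-bit grayscale
    if len(shape) == 3 and shape[2] == 3:
        return None  # 8-bit RGB
    return f"shape is {shape}, expected (H, W) or (H, W, 3)"


def locate_faces(path, model="hog"):
    """Load an image as RGB, check it, then run face detection.

    Start with model="hog"; switch to "cnn" once that works.
    """
    import face_recognition
    import numpy as np
    from PIL import Image

    # convert("RGB") drops alpha channels and expands palette/grayscale images
    image = np.array(Image.open(path).convert("RGB"))
    problem = describe_problem(image)
    if problem is not None:
        raise ValueError(f"{path}: {problem}")
    return face_recognition.face_locations(image, model=model)
```

Usage would be something like locate_faces("photo.jpg") first, and locate_faces("photo.jpg", model="cnn") once the hog run succeeds ("photo.jpg" being a placeholder filename).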