Archives

Categories

Trying DeepSeek R1

I saw this document on running DeepSeek R1 [1] and decided to give it a go. I downloaded the llama.cpp source and compiled it and downloaded the 131G of data as described. Running it with the default options gave about 7 CPU cores in use. Changing the --threads parameter to 44 caused it to use 17 CPU cores (changing it to larger numbers like 80 made it drop to 2.5 cores). I used the --n-gpu-layers parameter with the value of 1 as I currently have a GPU with only 6G of RAM (AliExpress is delaying my delivery of a PCIe power adaptor for a better GPU). Running it like this makes the GPU take 12W more power than standby and using 5.5G of VRAM according to nvidia-smi so it is doing a small amount of work, but not much. The documentation refers to the DeepSeek R1 1.58bit model which I’m using as having 61 layers so presumably less than 2% of the work is done on the GPU.

Running like this it takes 2 hours of CPU time (just over 3 minutes of elapsed time at 17 cores) to give 8 words of output. I didn’t let any tests run long enough to give complete output.

The documentation claims that it will run on CPU with 20G of RAM. In my tests it takes between 161G and 195G of RAM to run depending on the number of threads. The documentation describes running on the CPU as “very slow” which presumably means 3 words per minute on a system with a pair of E5-2699A v4 CPUs and 256G of RAM.

When I try to use more than 44 threads I get output like “system_info: n_threads = 200 (n_threads_batch = 200) / 44” and it seems that I only have a few threads actually in use. Apparently there’s some issue with having more threads than the 44 CPU cores in the system.

I was expecting this to go badly and it met my expectations in that regard. But it was interesting to see exactly how it went badly. It seems that if I had a GPU with 24G of VRAM I’d still have 54/61 layers running on the CPU so even the largest of home GPUs probably wouldn’t make much difference.

Maybe if I configured the server to have hyper-threading enabled and 88 HT cores then I could have 88 threads and about 34 CPU cores in use which might help. But even if I got the output speed from 3 to 6 words per minute that still wouldn’t be very usable.

Leave a Reply