Trying DeepSeek R1

I saw this document on running DeepSeek R1 [1] and decided to give it a go. I downloaded the llama.cpp source and compiled it and downloaded the 131G of data as described. Running it with the default options gave about 7 CPU cores in use. Changing the --threads parameter to 44 caused it to use 17 CPU cores (changing it to larger numbers like 80 made it drop to 2.5 cores). I used the --n-gpu-layers parameter with the value of 1 as I currently have a GPU with only 6G of RAM (AliExpress is delaying my delivery of a PCIe power adaptor for a better GPU). Running it like this makes the GPU take 12W more power than standby and using 5.5G of VRAM according to nvidia-smi so it is doing a small amount of work, but not much. The documentation refers to the DeepSeek R1 1.58bit model which I’m using as having 61 layers so presumably less than 2% of the work is done on the GPU.

Running like this it takes 2 hours of CPU time (just over 3 minutes of elapsed time at 17 cores) to give 8 words of output. I didn’t let any tests run long enough to give complete output.

The documentation claims that it will run on CPU with 20G of RAM. In my tests it takes between 161G and 195G of RAM to run depending on the number of threads. The documentation describes running on the CPU as “very slow” which presumably means 3 words per minute on a system with a pair of E5-2699A v4 CPUs and 256G of RAM.

When I try to use more than 44 threads I get output like “system_info: n_threads = 200 (n_threads_batch = 200) / 44” and it seems that I only have a few threads actually in use. Apparently there’s some issue with having more threads than the 44 CPU cores in the system.

I was expecting this to go badly and it met my expectations in that regard. But it was interesting to see exactly how it went badly. It seems that if I had a GPU with 24G of VRAM I’d still have 54/61 layers running on the CPU so even the largest of home GPUs probably wouldn’t make much difference.

Maybe if I configured the server to have hyper-threading enabled and 88 HT cores then I could have 88 threads and about 34 CPU cores in use which might help. But even if I got the output speed from 3 to 6 words per minute that still wouldn’t be very usable.

Machine Learning Security

I just read an interesting blog post about ML security recommended by Bruce Schneier [1].

This approach of having 2 AI systems where one processes user input and the second performs actions on quarantined data is good and solves some real problems. But I think the bigger issue is the need to do this. Why not have a multi stage approach, instead of a single user input to do everything (the example given is “Can you send Bob the document he requested in our last meeting? Bob’s email and the document he asked for are in the meeting notes file”) you could have “get Bob’s email address from the meeting notes file” followed by “create a new email to that address” and “find the document” etc.

A major problem with many plans for ML systems is that they are based around automating relatively simple tasks. The example of sending an email based on meeting notes is a trivial task that’s done many times a day but for which expressing it verbally isn’t much faster than doing it the usual way. The usual way of doing such things (manually finding the email address from the meeting notes etc) can be accelerated without ML by having a “recent documents” access method that gets the notes, having the email address be a hot link to the email program (IE wordprocessor or note taking program being able to call the MUA), having a “put all data objects of type X into the clipboard (where X can be email address, URL, filename, or whatever), and maybe optimising the MUA UI. The problems that people are talking about solving via ML and treating everything as text to be arbitrarily parsed can in many cases by solved by having the programs dealing with the data know what they have and have support for calling system services accordingly.

The blog post suggests a problem of “user fatigue” from asking the user to confirm all actions, that is a real concern if the system is going to automate everything such that the user gives a verbal description of the problem and then says “yes” many times to confirm it. But if the user is at every step of the way pushing the process “take this email address” “attach this file” it won’t be a series of “yes” operations with a risk of saying “yes” once too often.

I think that one thing that should be investigated is better integration between services to allow working live on data. If in an online meeting someone says “I’ll work on task A please send me an email at the end of the meeting with all issues related to it” then you should be able to click on their email address in the meeting software to bring up the MUA to send a message and then just paste stuff in. The user could then not immediately send the message and clicking on the email address again would bring up the message in progress to allow adding to it (the behaviour of most MUAs of creating a new message for every click on a mailto:// URL is usually not what you desire). In this example you could of course use ALT-TAB or other methods to switch windows to the email, but imagine the situation of having 5 people in the meeting who are to be emailed about different things and that wouldn’t scale.

Another thing for the meeting example is that having a text chat for a video conference is a standard feature now and being able to directly message individuals is available in BBB and probably some other online meeting systems. It shouldn’t be hard to add a feature to BBB and similar programs to have each user receive an email at the end of the meeting with the contents of every DM chat they were involved in and have everyone in the meeting receive an emailed transcript of the public chat.

In conclusion I think that there are real issues with ML security and something like this technology is needed. But for most cases the best option is to just not have ML systems do such things. Also there is significant scope for improving the integration of various existing systems in a non-ML way.

Article Recommendations via FOSS

Google tracking everything we read is bad, particularly since Google abandoned the “don’t be evil” plan and are presumably open to being somewhat evil.

The article recommendations on Chrome on Android are useful and I’d like to be able to get the same quality of recommendations without Google knowing about everything I read. Ideally without anything other than the device I use knowing what interests me.

A ML system to map between sources of news that are of interest should be easy to develop and run on end user devices. The model could be published and when given inputs of articles you like give an output of sites that contain other articles you like. Then an agent on the end user system could spider the sites in question and run a local model to determine which articles to present to the user.

Mapping for hate following is possible for such a system (Google doesn’t do that), the user could have 2 separate model runs for regular reading and hate-following and determine how much of each content to recommend. It could also give negative weight to entries that match the hate criteria.

Some sites with articles (like Medium) give an estimate of reading time. An article recommendation system should have a fixed limit of articles (both in articles and in reading time) to support the “I spend half an hour reading during lunch” model not doom scrolling.

For getting news using only FOSS it seems that the best option at the moment is to use the Lemmy FOSS social network which is like Reddit [1] to recommend articles etc.

The Lemoa client for Lemmy uses GTK [2] but it’s no longer maintained. The Lemonade client for Lemmy is written in Rust [3]. It would be good if one of those was packaged for Debian, preferably one that’s maintained.

14

ML Training License

Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.

I respect such technical work to enforce one’s legal rights when they aren’t respected by corporations, but I have a different approach.

As an aside the Fosdem lecture “Fortify AI against regulation, litigation and lobotomies” is interesting on this topic [1], it’s what inspired me to write about this.

For what I write I am at this time happy to allow it to be used as part of a large training data set (consider this blog post a licence grant that applies until such time as I edit this post to change it). But only if aggregated with so much other data that my content is only a tiny portion of the data set by any metric. So I don’t want someone to make a programming LLM that has my code as the only C code or a political data set that has my blog posts as the only left-wing content. If someone wants to train an LLM on only my content to make a Russell-simulator then I don’t license my work for that purpose but also as it’s small enough that anyone with a bit of skill could do it on a weekend I can’t stop it. I would be really interested in seeing the results if someone from the FOSS community wanted to make a Russell-simulator and would probably issue them a license for such work if asked.

If my work comprises more than 0.1% of the content in a particular measure (theme, programming language, political position, etc) in a training data set then I don’t permit that without prior discussion.

Finally if someone wants to make a FOSS training data set to be used for FOSS LLM systems (maybe under the AGPL or some similar license) then I’ll allow my writing to be used as part of that.

GPT Systems and Relationships

Sam Hartman wrote an interesting blog post about his work as a sex and intimacy educator and how GPT systems could impact that [1].

I’ve read some positive reviews of Replika – a commercial system that is somewhat promoted as a counsellor [2], so I decided to try it out. In my brief trial it seemed to be using all the methods that Android pay to play games are known for. Having multiple types of in-game currency, pay to buy new clothes etc for your friend, etc. Basically it seems pretty horrible. I didn’t pay for it and the erotic and romantic features all require payment so I didn’t test that.

When thinking about this logically, having a system designed to deal with people when they are vulnerable (either being in a romantic relationship or getting counselling) that uses manipulative techniques to get money from them can’t have a good result. So a free software system seems the best option.

When I first learned of virtual girlfriends I never thought I would feel compelled to advocate for a free software virtual dating program, but that’s where the world has got to.

Virtual girlfriends have been around for years now. Several years ago I watched a documentary about their use in Japan. It seemed a bit strange when a group of men who had virtual girlfriends had a dinner party with their tablets and phones propped up so their girlfriends could join in as they all appeared to be dating the same girl. The documentary didn’t go in to enough detail to cover whether the girlfriend app could learn or be customised enough that they would seem to have different personalities.

Virtual boyfriends have also been around for a while apparently without most people noticing. I just Googled it and found a review of a virtual boyfriend app published in 2016!

One thing that will probably concern people is the possibility for virtual dating systems to be used for inappropriate things. That is a reasonable thing to be concerned about but I don’t think it’s possible to prevent technology that has already been released from doing such things. As a general rule technology can always be used for good and bad things so we need to just make it easy to do good things and let the legal system develop ways of dealing with the bad things.