Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.
I respect such technical work to enforce one’s legal rights when they aren’t respected by corporations, but I have a different approach.
As an aside, the FOSDEM lecture “Fortify AI against regulation, litigation and lobotomies” is interesting on this topic [1]. It’s what inspired me to write about this.
At this time I am happy to allow what I write to be used as part of a large training data set (consider this blog post a licence grant that applies until such time as I edit this post to change it), but only if it is aggregated with so much other data that my content is only a tiny portion of the data set by any metric. So I don’t want someone to make a programming LLM that has my code as the only C code or a political data set that has my blog posts as the only left-wing content. If someone wants to train an LLM on only my content to make a Russell-simulator then I don’t license my work for that purpose, but as that is a small enough job that anyone with a bit of skill could do it in a weekend, I can’t stop it. I would be really interested in seeing the results if someone from the FOSS community wanted to make a Russell-simulator and would probably issue them a license for such work if asked.
If my work comprises more than 0.1% of the content in a particular measure (theme, programming language, political position, etc) in a training data set then I don’t permit that without prior discussion.
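As a rough illustration of the 0.1% rule above, here is a minimal sketch of how someone assembling a data set might check it. The document layout (`author`, `size`, `labels`), the function names, and the use of a single `size` number per document are all assumptions made for this example. The post says "by any metric", so in practice the check would need repeating for each size measure (bytes, tokens, document count, and so on).

```python
from collections import defaultdict

THRESHOLD = 0.001  # 0.1%, the limit stated in the post


def author_shares(documents, author):
    """Return {(measure, value): share} for one author.

    Each document is a dict (hypothetical layout) with:
      "author": str
      "size":   int, in whatever unit is being measured
      "labels": dict mapping a measure to its value,
                e.g. {"language": "C", "politics": "left-wing"}
    """
    totals = defaultdict(int)  # size of everything in each category
    mine = defaultdict(int)    # size of this author's work in each category
    for doc in documents:
        for key in doc["labels"].items():
            totals[key] += doc["size"]
            if doc["author"] == author:
                mine[key] += doc["size"]
    return {key: mine[key] / totals[key] for key in mine}


def violations(documents, author, threshold=THRESHOLD):
    """Return the categories where the author's share exceeds the threshold."""
    shares = author_shares(documents, author)
    return {key: share for key, share in shares.items() if share > threshold}
```

An empty result from `violations` means the author's work stays at or below 0.1% in every labelled category for that size measure. Any entry in the result is a case that, under the terms above, needs prior discussion.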
Finally if someone wants to make a FOSS training data set to be used for FOSS LLM systems (maybe under the AGPL or some similar license) then I’ll allow my writing to be used as part of that.
Every time I read about someone not wanting regulation for ML/AI, I put them on the Musk side of the story (AKA the trash bin).
https://joeyh.name/blog/entry/policy_on_adding_AI_generated_content_to_my_software_projects/
That was of course Joey, who is not a DD anymore.
https://joeyh.name/blog/entry/attribution_armored_code/
“Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.”
I think that was that:
https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/
Hi Russell
I guess you referred to https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/
best wishes
Nick
Maybe you mean this post by Joey Hess:
https://joeyh.name/blog/entry/attribution_armored_code/
I think the blog post you have in mind is on https://joeyh.name/blog/ (look for “attribution armored code”).
Maybe you mean Joey’s blog post:
https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/
“Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.”
it was Joey Hess
https://joeyh.name/blog/archives/2023/11/
https://joeyh.name/blog/entry/attribution_armored_code/
might be the URL you are looking for
Hi, I put the link in the previous comment – that went into moderation, probably due to the link :D
I think the Debian Developer might be Joey Hess. Attribution armored code might be one of the relevant blog posts.
> Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.
You’re probably looking for this: https://joeyh.name/blog/entry/attribution_armored_code/
Thanks for all the comments about Joey’s posts; it seems that he wrote more about it than I realised. Also, the first comment by a user needs approval; after that it’s automatic.
I hope Fabio doesn’t think that I’m on the non-regulation side. Choosing a permissive license and acknowledging the fact that my license choices aren’t going to be respected by corporations is not the same as being against regulation. Big corporations should be broken up and/or nationalised as a precondition to regulating them properly.