ML Training License

Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.

I respect such technical work to enforce one’s legal rights when they aren’t respected by corporations, but I have a different approach.

As an aside, the FOSDEM lecture “Fortify AI against regulation, litigation and lobotomies” is interesting on this topic [1]; it’s what inspired me to write about this.

For what I write, I am at this time happy to allow it to be used as part of a large training data set (consider this blog post a license grant that applies until such time as I edit this post to change it), but only if it is aggregated with so much other data that my content is only a tiny portion of the data set by any metric. So I don’t want someone to make a programming LLM that has my code as the only C code, or a political data set that has my blog posts as the only left-wing content. If someone wants to train an LLM on only my content to make a Russell-simulator then I don’t license my work for that purpose, but my body of work is small enough that anyone with a bit of skill could do it in a weekend, so I can’t stop it. I would be really interested in seeing the results if someone from the FOSS community wanted to make a Russell-simulator, and I would probably issue them a license for such work if asked.

If my work makes up more than 0.1% of the content for a particular measure (theme, programming language, political position, etc.) in a training data set then I don’t permit that without prior discussion.
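As a rough illustration of how someone assembling a data set could check the 0.1% rule, here is a minimal Python sketch. It is not part of the license itself. The document layout (`author`, `text`, `labels`), the function names, and the choice to measure content by character count are all assumptions made for the example; the post asks for a tiny share "by any metric", so a real check would repeat this for tokens, bytes, document counts and so on.

```python
from collections import defaultdict

THRESHOLD = 0.001  # 0.1%, the limit stated in the post


def shares_by_measure(documents, author):
    """Return {(measure, value): share of content written by `author`}.

    Each document is a dict with hypothetical keys:
      "author": str, "text": str, "labels": {measure: value}
    Content size is counted in characters (an assumption, see above).
    """
    total = defaultdict(int)
    mine = defaultdict(int)
    for doc in documents:
        size = len(doc["text"])
        for key in doc["labels"].items():  # e.g. ("language", "C")
            total[key] += size
            if doc["author"] == author:
                mine[key] += size
    return {key: mine[key] / total[key] for key in total if total[key] > 0}


def over_threshold(documents, author, threshold=THRESHOLD):
    """List the (measure, value) pairs where the author's share exceeds the limit."""
    shares = shares_by_measure(documents, author)
    return sorted(key for key, share in shares.items() if share > threshold)
```

Any pair returned by `over_threshold` would be a case that needs prior discussion under the rule above; an empty list means the author's work stays under 0.1% in every labelled category, at least by this one metric.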

Finally if someone wants to make a FOSS training data set to be used for FOSS LLM systems (maybe under the AGPL or some similar license) then I’ll allow my writing to be used as part of that.

[1] https://tinyurl.com/24sptqxo

Fábio Emilio Costa

April 11, 2024 at 22:19

Every time I read about someone not wanting regulation for ML/AI, I put them on the Musk side of the story (AKA the trash bin).

Joost van Baal-Ilić

April 11, 2024 at 23:20

https://joeyh.name/blog/entry/policy_on_adding_AI_generated_content_to_my_software_projects/

anarcat

April 12, 2024 at 00:00

That was of course Joey, who is not a DD anymore.

https://joeyh.name/blog/entry/attribution_armored_code/

Bogdanow

April 12, 2024 at 00:32

“Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.”

I think it was this one:

https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/

Nick

April 12, 2024 at 01:43

Hi Russell

I guess you referred to https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/

best wishes
Nick

LightKnight

April 12, 2024 at 01:49

Maybe you mean this post by Joey Hess:
https://joeyh.name/blog/entry/attribution_armored_code/

Pierre

April 12, 2024 at 02:06

I think the blog post you have in mind is on https://joeyh.name/blog/ (look for “attribution armored code”).

Sascha

April 12, 2024 at 03:09

Maybe you mean Joey’s blog post:
https://joeyh.name/blog/entry/a_bitter_pill_for_Microsoft_Copilot/

levity

April 12, 2024 at 11:20

“Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.”

it was Joey Hess

https://joeyh.name/blog/archives/2023/11/

Richard

April 12, 2024 at 18:00

might be the URL you are looking for

April 12, 2024 at 21:28

Hi, I put the link in the previous comment – that went into moderation, probably due to the link :D

April 13, 2024 at 05:17

I think the Debian Developer might be Joey Hess. Attribution armored code might be one of the relevant blog posts.

Tomas Janousek

April 13, 2024 at 06:25

> Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I’d appreciate the URL if anyone has it.

You’re probably looking for this: https://joeyh.name/blog/entry/attribution_armored_code/

etbe

April 13, 2024 at 11:58

Thanks for all the comments about Joey’s posts; it seems that he wrote more about it than I realised. Also, the first comment by a user needs approval; after that it’s automatic.

I hope Fábio doesn’t think that I’m on the non-regulation side. Choosing a permissive license and acknowledging the fact that my license choices aren’t going to be respected by corporations is not the same as being against regulation. Big corporations should be broken up and/or nationalised as a precondition to regulating them properly.

etbe – Russell Coker
