AI, Anubis, Containers
I’ve played around with a few things this month, and considered playing around with a few other things. But first, rambling about AI.
AI
I don’t hate AI. My colleagues have given both the Visual Studio Code and ChatGPT programming aids positive reviews, and I’m inclined to trust their opinions. With that said, I very much dislike that the corpus used to train these large language models was scraped with no regard for the cost to the webhost, nor for the copyright of the material hosted. I also very much dislike that the line from both industry and the UK Government is “this entire field depends on infringing copyright, so infringing copyright should be legal”.
This paragraph from the UK Government’s consultation is illustrative:
At present, the application of UK copyright law to the training of AI models is disputed. Rights holders are finding it difficult to control the use of their works in training AI models and seek to be remunerated for its use. AI developers are similarly finding it difficult to navigate copyright law in the UK, and this legal uncertainty is undermining investment in and adoption of AI technology.
Seriously? “Rights holders are finding it difficult to control the use of their works in training AI models”? “Finding it difficult”? That’s a very conciliatory way to say “individuals who lack recourse are having the ownership of their works violated en masse”. By an industry spread almost exclusively between the USA and China, no less.
Anubis
Anyway - anubis is a proof-of-work middleware that sits in front of your site and issues challenges to clients. The short version of anubis’ proof-of-work goes like this:
- I give you a ‘challenge’. Say ‘Thursday’.
- I give you a ‘difficulty’. In anubis’ case, this is the number of leading zeroes your answer should have.
- You sha256sum my challenge with varying nonces until you have a hash with the right number of leading zeroes.
- You give me your answer, including the nonce that produced it.
- I confirm that the sha256sum of the challenge + nonce has the right number of leading zeroes.
- I hand you a cookie that you provide on future requests so you don’t need to go through this again.
Step 3 is the work, and the proof of that work is the nonce. It does work - maybe two-thirds of all requests I used to see on my dinky website are now stopped at the front door. Of course, the reason it works isn’t that AI scrapers can’t run a little script - they can and do, and some still get through. But it changes the economics. It makes scraping me just that little bit more expensive, and in a lot of cases, that seems to be enough.
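Here’s a minimal sketch of both halves in Go, assuming anubis’ scheme of SHA-256 over challenge plus nonce, with difficulty counted as leading zeroes in the hex digest - the function names and nonce format are mine, not anubis’ actual API:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// hash returns the hex-encoded SHA-256 digest of challenge + nonce.
func hash(challenge string, nonce int) string {
	sum := sha256.Sum256([]byte(challenge + strconv.Itoa(nonce)))
	return hex.EncodeToString(sum[:])
}

// solve is the client side (step 3): grind nonces until the digest
// starts with `difficulty` zeroes. The winning nonce is the proof.
func solve(challenge string, difficulty int) int {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		if strings.HasPrefix(hash(challenge, nonce), prefix) {
			return nonce
		}
	}
}

// verify is the server side (step 5): one hash to check the claim.
func verify(challenge string, difficulty, nonce int) bool {
	prefix := strings.Repeat("0", difficulty)
	return strings.HasPrefix(hash(challenge, nonce), prefix)
}

func main() {
	const challenge, difficulty = "Thursday", 4
	nonce := solve(challenge, difficulty)
	fmt.Println(nonce, verify(challenge, difficulty, nonce))
}
```

The asymmetry is the point: the client grinds through roughly 16^difficulty hashes on average, while the server spends exactly one hash checking the result.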
Downsides
I don’t really want javascript on my site. Sure, the music pages use it - there’s not much else I can do to trigger playback. But the rest has always been plain HTML and CSS. So adding javascript just to load the site is rather sad.
The little mascot I could do without. And really, I could just remove it myself: I’m compiling anubis myself instead of using the prebuilt docker image, and I’ve already changed the styling to match the rest of the site, so removing the mascot and the small advert for the author’s business would be simple. But out of respect for the author (and their specific offer to provide a white-label version for a fee), I’ve left these elements in.
Containers
When setting up anubis, I did consider using the container images provided by the author. For a while now, I’ve been thinking about migrating everything I run on my servers to containers on top of a small Kubernetes cluster. It’s my job, so I know how to do it, and I don’t think it’s much work.
This would allow me to play with things like Yoke.
Or I could skip Kubernetes entirely and use something like Kamal. Although the lack of formal Podman support is a bummer.
Of course, I run quite a few things these days. Running Misskey in a cluster isn’t that hard - it would just be two deployments with hostPath mounts to an NFS directory. Unworkable at scale, but it’ll do for a dinky private site. Migrating my existing postgresql instance would take a fair amount of time, though; I’m much more familiar with MariaDB. Komga and Jellyfin would be simpler, as they come with all their nuts and bolts pre-fitted. This website would probably just be an nginx pod.
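Concretely, the hostPath-to-NFS idea would look something like this - a hypothetical sketch, where the names, image, and paths are all made up, and the NFS share is assumed to already be mounted at /mnt/nfs on every node:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: misskey-db
spec:
  replicas: 1
  selector:
    matchLabels:
      app: misskey-db
  template:
    metadata:
      labels:
        app: misskey-db
    spec:
      containers:
        - name: postgres
          image: postgres:16
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          hostPath:
            path: /mnt/nfs/misskey-db # the node-side NFS mount
            type: Directory
```

Kubernetes treats that directory like any local path; the NFS mount underneath is the node’s problem, which is exactly the “unworkable at scale” part.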
Calico could hand out private addresses within the home subnet, but outside the DHCP pool. And kubernetes-ingress could just use the host network, so my A records can stay the same.
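On the Calico side, that would presumably be an IPPool carved out of the home subnet - again a hypothetical sketch with a made-up CIDR, and actually routing those addresses to pods would also need Calico to BGP-peer with the router, or static routes:

```yaml
# Hypothetical: a slice of the home subnet, kept outside the router's
# DHCP pool so Calico and DHCP never hand out the same address.
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: home-subnet-slice
spec:
  cidr: 192.168.1.192/26   # made-up range
  natOutgoing: false       # pods are directly reachable, no SNAT
  disabled: false
```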
But I’m always left with the same sort of hesitation. Like, it’s a lot of work. To mostly get back to my starting position. It isn’t like I need to, or even could, scale the things I run. You can’t simply spin up twenty postgres pods and have twenty-way load distribution. If everything is on one disk, then you’re bound to one disk’s worth of IO. The benefit is distributing CPU and memory load - which isn’t an issue I’m close to having.
Maybe I’ll do it at some point. I don’t know. It wouldn’t hurt to have somewhere to play with gateways.
But then again, do I really want to convert my nftables ruleset to iptables syntax, so that when kube-proxy shits all over said rules, I can at least follow everything in one place? Probably not. So maybe I won’t do it at some point.