Using lots of little tools to aggressively reject the bots

2025-05-31 · lambdacreate.com


U G H.

For some reason my quaint little piece of the internet has suddenly been inundated with unwanted guests. Now normally speaking I would be overjoyed to have more guests to this tiny part of the internet. Come inside the cozy little server room, we have podcasts to drown out the noise of the fans, and plenty to read. Probably at some point in the future there will be photography things too, and certainly plenty of company. But no, nobody can have such nice things. Instead the door to the server room was kicked down and in came a horde of robots hell-bent on scraping every bit of data they possibly could from the site.

Now, for the longest time, I've had no real issue with this. Archive.org is welcome to swing by any time, index the entire site, and stash it away for posterity. Keep up the good work folks! But instead of respectful netizens like that, I have the likes of Amazon, Facebook, and OpenAI, along with a gaggle of random friends, knocking on my door. These big corporations 1) do not need my content and 2) are only accessing it for entirely self-serving means.

Let's not even pretend it's anything else, because we know it isn't. These large companies scrape data broadly and with little regard for the effect it has on the infrastructure servicing whatever it is they're pulling from. With the brain slug that is "AI" now openly encouraging the mass consumption of data from the internet at large to train their models on, it was really only a matter of time before the scraping became more severe. This is the hype cycle at work. OpenAI needs to scrape to train, Facebook does too because they have a competing model. Amazon and Google and Microsoft all have their own reasons related to search and advertising, bending the traffic to flow through their platforms. The point is, these are not "consumers" of Lambdacreate. You, the human reading this, are! Thanks for reading.

To the bots. Roboti ite domum!

Hyperbole aside, what's our problem exactly?

Fortunately, I am well versed in systems administration, and have a whole toolkit at my disposal to analyze the issue. Let's put some numbers against all of the above hyperbole.

My initial sign that something was up came in from my Zabbix instance. I call the little server that runs my Zabbix & Loki instances Vignere, after the creator of the Vigenère cipher, hence the funky photo in Discord. Anyways, Vignere complained about my server using up its entire disc for all of my containers. Frustrating, but not a big deal since I'm using LXD under the hood.

A Zabbix alert in a Discord channel displaying disc exhaustion for multiple containers.

Fine, I'll take my lumps. I took down all of my sites briefly, expanded the underlying ZFS sparse file, and brought the world back up. No harm no foul, just growing pains. But of course, that really wasn't the issue. I was inundated with more alerts. Suddenly I was seeing my Gitea instance grow to consume the entire disc, easily generating 20-30G of data each day. Super frustrating, and enough information on the internet says that Gitea just does this and doesn't enable repo archive cleanup by default, so that must be it. I happily went and set up some aggressive cleanup tasks thinking my problems were over. Maybe I shouldn't have set up a self-hosted git forge and just stuck with Gitlab or Github.
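
For anyone chasing the same Gitea disc bloat, the cleanup I'm waving at lives in app.ini. Something roughly like this, though the section and key names here are from memory, so double check the Gitea config cheat sheet for your version, and the path and schedule are just illustrative:

# Illustrative only: enable Gitea's periodic repo archive cleanup.
# Section/key names assume a reasonably recent Gitea, verify before trusting.
cat >> /etc/gitea/app.ini <<'EOF'

[cron.archive_cleanup]
ENABLED = true
SCHEDULE = @midnight
OLDER_THAN = 24h
EOF

# Alpine/OpenRC restart, adjust for your init system
rc-service gitea restart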

But no, not at all, this thin veneer of a fix rapidly crumbled under the sudden and aggressive uptick in web traffic I started seeing. Suddenly it wasn't just disc usage, I was getting inundated with CPU and Memory alerts from my poor server. I couldn't git pull or push to my Gitea. Hell my weechat client couldn't even stay connected. Everything ground to a halt for a bit. But by the time I could get away from work, or the kids, and pull out my computer to dig into it the problem had stopped. I could access everything. Sysstat and Zabbix told me that the resource utilization issues were real, but I couldn't exactly tell why from just that.

A Zabbix alert in a Discord channel displaying extremely high cpu utilization.

This is, however, why I keep an out of band monitoring system in the first place. I need to be able to look at historic metrics to see what "normal" looks like. Otherwise it's all just guesswork. And boy did Zabbix have a story to tell me. To get a clear understanding of what I mean, let's take a quick look at the full dashboard from when I redid my Zabbix server after it failed earlier this year. Pew pew flashy graphs right? The important one here is the nginx requests and network throughput chart in the bottom left hand corner of the dashboard. Note that that's what "normal" traffic looks like for my tiny part of the internet.

An aggregate Zabbix graph that shows Nginx requests per second overlaid with in/out bound network traffic data.

And this, dear reader, is what the same graph looks like after LC was laid siege to. Massive difference right? And not a fun one either. On average I was seeing 8 requests per second come into nginx across a one month period. It's not a lot, but once again, this is just a tiny server hosting a tiny part of the internet. I'm not trying to dump hyper scale resources into my personal blog, it just isn't necessary.

The same graph, only scary.

At its worst Zabbix shows that for a period I was getting hit with 20+ requests per second. Once again, not a lot of traffic, but it is 10x what my site usually gets, and that makes a big difference!

So why the spike in traffic? Why specifically my Gitea instance? And why are there CPU and disc alerts mixed into all of this? It's not like 20+ requests a second is a lot for nginx to handle by any means. To understand that, we need to dig into the logs on the server.

Looking under the hood

But before I could even start to do that I needed a way to keep the server online long enough to actually review the logs. This is where out of band logging like a syslog or Loki server would be extremely helpful. But instead I had the joy of simply turning off all of my containers and disabling the nginx server for a little bit. After that I dug two great tools out of my toolkit to perform the analysis, lnav & goaccess.

lnav is this really great log analysis tool, it provides you with a little TUI that color codes your log files and lets you skim through them like any other pager. That in and of itself is cool, but it also provides an abstraction layer on top of common logging formats and lets you query the data inside of the log using SQL queries. That, for me, is a killer feature. I'm certainly not scared to grep, sed, and awk my way through a complex log file, but SQL queries are way simpler to grasp.
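
Getting going is as simple as pointing it at the access log. From there the ; key drops you into the SQL prompt, at least in the versions of lnav I've used:

# Open the live nginx access log in lnav's TUI
lnav /var/log/nginx/access.log

# Inside lnav, press ';' to open the SQL prompt and run queries like the ones below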

Here's the default view, it's the equivalent of a select * from access_log.

The default rendering for an nginx access log, there's a ton of colors, it's log files made pretty!

Digging through this log ended up being incredibly easy and immediately informative. I won't bore anyone with random data, but these are the various queries I ran against my access.log to try and understand what was happening.

# How many different visitors are there total?
select count(distinct(c_ip)) from access_log;

# Okay that's a big number, what do these IPs look like, is there a pattern?
select distinct(c_ip) from access_log;

# Are these addresses coming from somewhere specific (ie: has LC been posted to Reddit/Hackernews and hugged to death?)
select distinct(cs_referer) from access_log;

# Are these IPs identified by a specific agent?
select distinct(cs_user_agent) from access_log;

# There's a lot of agents and IPs, what IPs are associated with what agent?
select c_ip, cs_user_agent from access_log;

After a quick review of the log it was obvious that the traffic wasn't originating from the same referrer, ie: no hug of death. Would've been neat though right? Instead there were entire blocks of IP addresses hitting www.lambdacreate.com and krei.lambdacreate.com and scraping every single url. Some of these IPs were kind enough to use actual agent names like Amazonbot, OpenAI, Applebot, and Facebook, but there were plenty of obviously spoofed user agents in the mix. Since this influx of traffic was denying my own access to the services I host (specifically my Gitea instance) I figured the easiest and most effective solution was just to slam the door in everyone's face. Sorry, this is MY corner of the internet, if you can't play nice you aren't welcome.

So anyways, I just started banning.

Roboti ite infernum

Nginx is frankly an excellent web server. I've managed lots of Apache in my time, but Nginx is just slick. Really, my preference probably just comes from the fact that Openresty + Lapis brings you this wonderful blog, and I'm positive what I'm about to describe is entirely doable in Apache as well. Right, anyways, the easiest way to immediately change the situation is to outright reject anyone who reports their user agent and is causing any sort of disruption.

My hamfisted solution to that is to just build up a list of all of the offending agents. Sort of like this, only way longer.

map $http_user_agent $badagent {
        default         0;
        ~*AdsBot-Google 1;
        ~*Amazonbot     1;
        ~*Amazonbot/0.1 1;
}

Then in the primary nginx configuration, source the user agent list, and additionally set up a rate limit. Layering the defenses here allows me to outright block what I know is a problem, and slow down anything that I haven't accounted for while I make adjustments.

# Filter bots to return a 403 instead of content.
include /etc/nginx/snippets/useragent.rules;

# Define a rate limit of 5 requests per second, tracked in a 10MB shared memory zone
limit_req_zone $binary_remote_addr zone=krei:10m rate=5r/s;

Then in the virtual host configuration we configure both the rate limit and a 403 rejection statement.

limit_req zone=krei burst=20 nodelay;

if ($badagent) {
         return 403;
}

It really is that hamfisted and easy. If you're on the list, 403. If you're not and you start to scrape, you get the door slammed in your face! But of course this only half helps. While issuing 403s prevents access to the content of the site, my server still needs to process each of those http requests and reject them. That's less resource intensive than processing something on the backend, but it's still enough that when the server gets tons of simultaneous scraping requests it bogs down.
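
Before walking away it's worth a quick sanity check, spoof one of the blocked agents and make sure nginx actually answers with a 403. The domain here is just mine as an example, use whatever vhost you applied the rules to:

# Validate the config and reload nginx first
nginx -t && nginx -s reload

# Pretend to be Amazonbot, expect a 403 back
curl -s -o /dev/null -w '%{http_code}\n' -A 'Amazonbot' https://www.lambdacreate.com/

# A normal browser-ish agent should still get a 200
curl -s -o /dev/null -w '%{http_code}\n' https://www.lambdacreate.com/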

Now with 403 rejections in place we can start to prod the nginx access log with lnav. How about checking to see all of the unique IPs that our problems originate from?

select distinct(c_ip) from access_log where sc_status = 403;

126 distinct IPs displayed in lnav
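
lnav's SQL support is SQLite underneath, so the usual aggregates work too. If I'm remembering the headless flags right, you can even run the query straight from the shell to see who's hammering hardest:

# Headless lnav: print rejected request counts per IP, worst offenders first
lnav -n /var/log/nginx/access.log \
    -c ';select c_ip, count(*) as hits from access_log where sc_status = 403 group by c_ip order by hits desc limit 20'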

Or better yet, we can use goaccess to analyze in detail all of our logs, historic and current, and see how many requests have hit the server, and what endpoint they're targeting the most.

zcat -f access.log-*.gz | goaccess --log-format=COMBINED access.log -o scrapers.html

The Goaccess dashboard displaying the broad total statistics. The Goaccess graphs displaying the total IP and agent type graphs.
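
And if you only care about the traffic that's already being rejected, you can pre-filter on the status column before handing things to goaccess. In the combined log format the status code is field 9:

# Feed goaccess only the 403s from the current and rotated logs
{ cat access.log; zcat -f access.log-*.gz; } | awk '$9 == 403' | goaccess --log-format=COMBINED - -o scrapers-403.html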

Either of these is enough to indicate that there are hundreds of unique IPs, and to fetch lists of user agents to block. But to actually protect the server we need to go deeper, we need firewall rules, and some kind of automation. What we need is Fail2Ban.

Since we're 403 rejecting traffic based off of known bad agents, our fail2ban rule can be wicked simple. And because I just don't care anymore we're handing out 24 hour bans for anyone breaking the rules. That means adding this little snippet to our fail2ban configuration.

[nginx-forbidden]
enabled = true
port     = http,https
logpath = /var/log/nginx/access.log
bantime = 86400

And then creating a custom regex to watch for excessive 403 responses.

[INCLUDES]

before = nginx-error-common.conf

[Definition]
failregex = ^<HOST> .* "(GET|POST) [^"]+" 403
ignoreregex =

datepattern = {^LN-BEG}

journalmatch = _SYSTEMD_UNIT=nginx.service + _COMM=nginx
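
fail2ban ships a tester, so you can prove the failregex actually matches before reloading anything. The filter path assumes it landed in the usual filter.d directory:

# Dry-run the filter against the live access log
fail2ban-regex /var/log/nginx/access.log /etc/fail2ban/filter.d/nginx-forbidden.conf

# Then reload fail2ban and confirm the jail is active
fail2ban-client reload
fail2ban-client status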

And the result? Boom! A massive ban list! 735 bans at the time of writing this. Freaking ridiculous.

~|>> fail2ban-client status nginx-forbidden
Status for the jail: nginx-forbidden
|- Filter
|  |- Currently failed: 13
|  |- Total failed:     57135
|  `- File list:        /var/log/nginx/access.log
`- Actions
   |- Currently banned: 38
   |- Total banned:     735
   `- Banned IP list:   85.208.96.210 66.249.64.70 136.243.220.209 85.208.96.207 185.191.171.18 85.208.96.204 185.191.171.15 85.208.96.205 85.208.96.201 185.191.171.8 85.208.96.200 185.191.171.4 185.191.171.11 185.191.171.1 85.208.96.202 185.191.171.5 185.191.171.6 85.208.96.209 185.191.171.10 85.208.96.203 85.208.96.195 85.208.96.206 185.191.171.16 185.191.171.7 85.208.96.208 185.191.171.17 185.191.171.2 85.208.96.199 85.208.96.212 185.191.171.13 66.249.64.71 66.249.64.72 185.191.171.3 85.208.96.197 85.208.96.193 85.208.96.196 185.191.171.12 85.208.96.194
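
And if the net catches something it shouldn't have, individual bans are easy to walk back without touching the rest of the jail. Here using one of the addresses from the list above purely as an example:

# Pardon a single address from the nginx-forbidden jail
fail2ban-client set nginx-forbidden unbanip 85.208.96.210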

So what?

The end result is that you're able to enjoy this blog post, and have had access to all the other great lambdacreate things for several months now. Because it is incredibly difficult to write blog posts when you have to fend off the robotic horde. None of this was even scraping my blog, it was all aimed at generating tarballs of every single commit of every single publicly listed git repo in my Gitea instance. Disgusting and unnecessary. But I'm leaving the rule set in place; take a quick glance at the resource charts from Zabbix and you'll readily understand why.

The change displayed as a Zabbix graph, everything is looking way better now.

Long term, I'll probably want to figure out a way to extend this list, or make exceptions for legitimate services such as archive.org. And I don't want the content here to be delisted from search engines necessarily, but at the same time this isn't here to fuel the AI enshitification of the internet either. So allez vous faire foutre scrapers.


Page 2

So a while back (apparently October of two years ago?!) I wrote a shell script to automate the maintenance of my Alpine packages. It started out as a simple version checker, and grew into a full blown workflow automation tool. I love this little script, and use it literally several times a day. I wouldn't consider managing all of the packages I do by hand ever again. But for all of the love I have for this tool, it is severely flawed.

And it's by design! Entirely and utterly my fault! In fact, all of the flaws I put into this script were utterly intentional at the time I wrote it. See I was on this Python binge, had to do a whole bunch of it for work, and so it weaseled its way into my shell script. And this past year I've been rubying a ton, and that is all over the place too! I even started to rewrite my terribly janky script into ruby, but that really only made the problem worse.

You see, my main system is just not that strong, it turns out that when you rely on an old armv7 cpu and a gig of ram, it really can only do so much. And it really struggles to deal with unoptimized, resource-hungry languages like python and ruby. See, those languages trade performance for ease of development. So while I can absolutely bang out a python or ruby script in a few lines of what feels like pseudo-code, it just does not run "well". And that is exactly the jank we're dealing with now. I've suffered my own technical debt for too long.

This was my bright idea two years ago. I didn't want to deal with parsing XML inside of the shell script, I wanted to be lazy. So what if I just heredoc'd a really shitty python script into the python repl? Instead of admonishing me for this stupid idea, it actually worked, and thus my ENTIRE version checking pipeline was born!

check_feed() {
        title=$(python3 - <<EOF
import feedparser
feed = feedparser.parse("$1")
entry = feed.entries[0]
print(entry.title)
EOF
                )
        echo "$title" | sed 's/'$pkg'//g' | grep -m1 -Eo "([0-9]+)((\.)[0-9]+)*[a-z]*" | head -n1
}
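
For context, the function just gets handed a release feed URL (and relies on $pkg being set by the surrounding script) and spits back a bare version string. A hypothetical invocation, with a made up repo, looks like:

# Hypothetical usage, the URL and package name are placeholders
pkg="somepkg"
check_feed "https://github.com/example/somepkg/releases.atom"
# -> prints something like "1.2.3"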

But of course, this was a temporary solution, I'd rewrite this later right? NOPE. This solution just got WORSE, because every temporary solution is for some god awful reason permanent. And of course it turns out that my quick little oversimplified python in a shell script was not up to the task of actually parsing all of the wild things that people shove into their git forge RSS/Atom feeds. To the point where I needed to keep notes on what it could and couldn't do, what it choked on, literally duplicating the entire script with different handling, because it's pretty important to know when a release is a beta or alpha or rc. Yeah, it was terrible frankly.

check_feed() {
	title=$(python3 - <<EOF
import feedparser
from bs4 import BeautifulSoup

feed = feedparser.parse("$1")
entry = feed.entries[0]
print(entry.title)

if "-v" in "$2":
    for k in entry.content:
        if k["type"] =="text/html":
                detail = BeautifulSoup(k.value, features="lxml")
                print(detail.get_text())
elif "-d" in "$2":
    print(entry)
EOF
		 )

	if [ -z $2 ]; then
		ver=$(echo "$title" | sed 's/'$pkg'//g' | grep -m1 -Eo "([0-9]+)((\.)[0-9]+)*[a-z]*" | head -n1)
		pr=$(echo "$title" | grep -oi "alpha\|beta\|rc[0-9]\|rc\.[0-9]")
		if [ "$ver" == "" ]; then
			link=$(python3 - <<EOF
import feedparser
from bs4 import BeautifulSoup

feed = feedparser.parse("$1")
entry = feed.entries[0]
print(entry.link)
EOF
				)
			ver=$(echo "$link" | sed 's/'$pkg'//g' | grep -m1 -Eo "([0-9]+)((\.)[0-9]+)*[a-z]*" | head -n1)
			pr=$(echo "$link" | grep -oi "alpha\|beta\|rc[0-9]")
			if [ "$pr" == "" ]; then
				echo "$ver"
			else
				echo "$ver [$pr]"
			fi
		else
			if [ "$pr" == "" ]; then
				echo "$ver"
			else
				echo "$ver [$pr]"
			fi
		fi
	else
		echo "$title"
	fi
}

Quantifying the jank

We can all look at that code and immediately realize there is a major problem. It never should have made it into "production", but how bad was it exactly?

Well the last iteration of that python in a shell script jank took this long to skim through ~170 different RSS feeds. Who wants to waste 10 minutes of their life every time they run the script just to realize "oh yes I need to update things" or maybe not. I sure don't.

real    9m 46.78s
user    7m 27.79s
sys     1m 2.00s

Now some of this pain is self inflicted. I insist on using a weak armv7 system with a minimal amount of ram. And this script, as terrible as it was, ran OKish enough on x86_64 hardware. For a long while I was using my Chuwi netbook with its 4c Celeron J series processor and it couldn't have cared less about this. Couple minutes in and out at most. But when the code is just this poorly written, and the language chosen to work in is optimized for lower developer complexity and not performance, the results can be terrible. There's no reason my Droid can't handle this type of workload just as quickly, the limiting factor is that the code needs to be... better.

PEBKAC, enough said.

Let's make it awk-ward

Now my gut reaction here was to rewrite the entire tool into something that compiles real small and runs real fast. Nim is a GREAT candidate for this! Fennel would be another excellent choice if compiled statically like tkts is. Or maybe even going so far as to pickup a new language, Janet comes to mind!

But, alas, as the 9 blog posts I managed to write in 2024 indicated, I didn't really have time for that. Learning a new language is high effort and requires a lot of time. Maintained.sh is a massive glob of things wrapped around recutils, which is another bottleneck I need to address, and that would mean migrating to sqlite3 and making helper functions for manual data correction. MEH. None of these felt like they would fit. So instead, I took a quick detour into my friend AWK! It's a great language, that we all probably just think of as a tool we call to strip out text. You know ps aux | awk '{print $2}', that jazz. Well awk my friends is so much more than that.

Awk will happily chew away at several different regexp patterns in a single go, parsing the contents of XML tags and then consuming what is inside them to ultimately attempt to find several matches. All of this effort is necessary because of semver, and its inconsistent application. See, semver is a flawed system. It isn't enough to just grep [0-9] and hope you get things, semver isn't an int, nor a float, it's a string! So we get to split the string into bits and compare each int. Easy enough in theory, lots and lots of libraries out there to support it, we'll make do. But what if people treat the semver like the string it is? Software development is a messy affair and people often litter their release tags with nuggets of information, like alpha, beta, RC[0-9], a/b[0-9]+, or sometimes literally emojis. These weird edge cases can cause frustrations when developing automated package maintenance tooling.
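
To make that concrete, here's the sort of naive comparison I mean, a little sketch that splits two dotted versions and compares them field by field. It assumes full x.y.z strings and has no pre-release handling at all, which is exactly where this stuff falls over:

# Compare two plain x.y.z style versions field by field.
# Prints "newer" if $2 is greater than $1, otherwise "older" or "same".
ver_cmp() {
	old="$1" new="$2"
	for i in 1 2 3; do
		a=$(echo "$old" | cut -d. -f"$i")
		b=$(echo "$new" | cut -d. -f"$i")
		a=${a:-0}; b=${b:-0}
		if [ "$b" -gt "$a" ]; then echo "newer"; return; fi
		if [ "$b" -lt "$a" ]; then echo "older"; return; fi
	done
	echo "same"
}

ver_cmp "3.0.9" "3.0.10"  # -> newer, a plain string compare would say older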

It is extremely important to know that a tag is a release candidate and not the actual release version, and denoting that by tacking RC1 or similar onto the semver is very common. But there is no standard, perhaps 3.0.0RC1 should be 3.0.0-RC1 or 3.0.0 RC1 or maybe 3.0.0b1? These patterns are all easy enough to parse, but require logic to handle each variant. But more and more I'm seeing projects on Github and Gitlab insert meaningless emojis and other nonsense into their project's tags. And this isn't even to say anything of people who don't use a version system at all and just expect their project to be built from HEAD. It's a ridiculous state of affairs we package maintainers must deal with. But ultimately, if you're the one writing the software, and you're providing it open source and libre, then I will work around those weird edge cases to make sure I can deliver that software to people who use Alpine. Keep rocking your emojis, Mealie devs, you make a wicked cool application.

Anyways, this is the revamped awk-ward version that attempts to compensate for all of these weird edge cases. It behaves exceptionally well for repos that just follow semver as expected, and tries to massage other common patterns as best as it can. I'm positive it will be extended throughout its lifetime; I already found a couple of edge cases with this new parser.

check_feed() {
	if [ -n "$1" ]; then
		read -r -d '' parser << 'EOF'
BEGIN {
	RS="[<>]"  # Split on XML tags
	in_entry = 0
	in_title = 0
	found_first = 0
	OFS="\t"   # Output field separator
}

/^entry/ || /^item/ { in_entry = 1 }
/^\/entry/ || /^\/item/ { in_entry = 0 }
/^title/ { in_title = 1; next }
/^\/title/ { in_title = 0; next }

in_entry && in_title && !found_first {
	gsub(/^[ \t]+|[ \t]+$/, "")
	if (length($0) > 0) {
		title = $0
		version = ""
		type = ""
		
		# py3-bayesian-optimizations uses a 3.0.0b1 variant, this needs checking.
		# nyxt uses pre-release in some of their tags.

		# Pattern 1: Version with space + Beta/Alpha/RC
		if (match(title, /[vV]?[0-9][0-9\.]+[0-9]+[ \t]+(Beta|Alpha|RC[0-9]*|beta|alpha|rc[0-9]*)/)) {
			full_match = substr(title, RSTART, RLENGTH)
			split(full_match, parts, /[ \t]+/)
			version = parts[1]
			type = parts[2]
		}
		# Pattern 2: Version with hyphen + qualifier
		else if (match(title, /[vV]?[0-9][0-9\.]+[0-9]+-(Beta|Alpha|RC[0-9]*|beta|alpha|rc[0-9]*)/)) {
			full_match = substr(title, RSTART, RLENGTH)
			split(full_match, parts, /-/)
			version = parts[1]
			type = parts[2]
		}
		# Pattern 3: Just version number
		else if (match(title, /[vV]?[0-9][0-9\.]+[0-9]+/)) {
			version = substr(title, RSTART, RLENGTH)
		}
		
		# Clean up version and type if found
		if (version) {
			# Remove leading v/V if present
			sub(/^[vV]/, "", version)
			if (type) {
				# Convert type to lowercase for consistency
				type = tolower(type)
				print version, type
			} else {
				print version
			}
			found_first = 1
			exit 0
		}
	}
}
EOF
	
		# Set strict error handling
		set -eu
		
		# Configure curl to be lightweight and timeout quickly
		CURL_OPTS="-s --max-time 10 --compressed --no-progress-meter"
		local feed_url="$1"
		version=$(curl $CURL_OPTS "$feed_url" | awk "$parser")
		
		case "$version" in
			*$'\t'*)
				ver="${version%%$'\t'*}"
				pr="${version#*$'\t'}"
				echo "$ver [$pr]"
				;;
			*)
				ver="$version"
				echo "$ver"
				;;
		esac
	else
		echo "000"
	fi
}

The major optimization here is that we aren't spawning a python sub-process for every single freaking check! To nobody's surprise that works amazingly better. I could probably have gotten similar "better" results by properly using Python here, I'll admit that fully. But since this is a very personal ad hoc maintenance script, awk was the right choice for a night of hacking.

Quantifying the effort

So we made it a lot more complicated than a couple lines of python, was it worth it? On that same ~170 RSS feeds we're now looking at a much saner 3m load time. And this is still a decently inefficient system built on top of a recfile DB. We could optimize even further by migrating to sqlite3, or batching (or even better parallelizing) our requests.

real    3m 3.28s
user    1m 45.47s
sys     0m 21.26s
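
The batching/parallel idea from above could look something like this, fanning the feed checks out a few at a time instead of walking the list serially. feeds.txt and check_feed.sh are hypothetical here, a one-URL-per-line list and a tiny wrapper that sources check_feed, and it assumes an xargs that supports -P:

# Hypothetical sketch: run up to 4 feed checks concurrently
xargs -n 1 -P 4 ./check_feed.sh < feeds.txt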

So yeah, revisit those temporary solutions from time to time, they can really suck the life out of otherwise wonderful tooling. And I cannot believe I spent 2 years letting this thing churn for 10m each time it ran. Yikes!


Page 3

I'm apparently really bad at doing anything in a timely manner these days. There's never enough hours in the day to accomplish the things that I'd like to do, such is life sometimes. But that doesn't mean I shouldn't try! How thematic for this year's OCC, no? Just because something is obsolete, doesn't mean it's useless, or unworthy of effort. Sure it might be harder, the obstacles greater, but the victory is all the sweeter once they're overcome! In essence, every year us OCCers dig deep into our junk drawers and take the "obsolete" and face a difficult world that has grown away from its capabilities, and we charge forth despite the odds that we may never return!

Okay I jest, we sometimes return to newer more powerful machines at the end of the challenge, like I'm typing this post from my trusty Droid4 which has a 2c armv7 cpu, and a whole gig of ram. Literally twice the limits of the challenge! And I'm still being cheeky, because there's a whole swath of modern technology that I use day to day to accomplish daily tasks, and enjoy my hobbies, but these old and slow ones are near and dear to my heart. And worth every bit of the effort we put into them to keep them running.

Anyways, OCC for me this year was a little unrefined, or rather, it felt like I was just doing what I would normally have done anyways. I focused on doing real world practical tasks that I would have tried to work on from a more powerful system. And it more or less just worked. That's not a surprise to me at this point, but I think it should cause everyone a moment of pause, at least when we're considering the tools we use for the task at hand.

For me, most of what I use a computer for doesn't require a top notch spec. I need no GPUs, nor an i9 11th gen CPU. I don't need a massive high res screen, though my eyes thank me when I do use one. But those requirements, or lack thereof, are a direct result of the types of things I do with my computer. I don't really game, and if I do it's with my son on a dedicated system. I develop software for fun, and help maintain a Linux distro. Most of what is needed to do both of these things grew out of the same world that brought me these small x86 netbooks that I love so much, so they just fit right in.

But those systems do not run my blog, that's a nice little VPS on Digital Ocean. And I have a homelab filled with salvaged hardware, most of it far more powerful than the computers I use day to day. And these systems all have their purposes as well, little incus clusters, or custom monitoring servers. I kind of need it all, because honestly all of this old tech takes time and effort to keep together, and it's so much easier to bootstrap from more powerful systems. I'm very privileged in that manner.

Like my SBCL builds, using ECL took ~8hrs a pop on my x86, during which my system wasn't really usable due to resource load. The same build on a modern x64 system is only a couple of hours. Between the time I started this post, and its publishing now, my droid messed up again. We pushed gcc 14 in Alpine, rebuilt our firmware, and well, my linux kernel for my droid was built against an older gcc and now I can't properly load those firmware blobs. They just fail to load, so no wifi for me, and no "posting" this blog post from my droid. Nope, I need my Chuwi netbook and its built in micro sd card slot to pull this file off, just to git commit and push it up to the server.

And to fix my kernel issue? I need an arm SBC of some sort, an RPI or Pineboard comes to mind. But AWS sells their t4g.medium aarch64 servers for $0.81/hr, and a kernel build on those specs only takes 2 hours, so I might as well use the more powerful resource for the job that needs it.

All of this is to say that I am an incredibly stubborn person at times, and love the weird little niche I put myself into with my quirky old systems. And for that weird hobby I am willing to go well out of my way to do what it takes to ensure I have the tools I want and need available to me so I can enjoy them. And because of that effort, I can easily tell you that you can be productive on last years junk systems.

This probably doesn't jive with everyone, and for things like photo editing I do actually need a more powerful system, so I sit firmly in the middle with some sort of quasi physical maximalism enabling my digital minimalism. It's weird, but it's this and cameras folks, if I downsize it'll be because I finally bought a boat and have that burning a hole in my wallet instead!


Page 4

So earlier this month was the 2nd Rebble Hackathon, a little week long event where a bunch of us got together to breathe life into the "dead" Pebble smartwatch ecosystem. The flood of applications, watch faces, development tools, art, and generally fervent community development and engagement really drives home just how "dead" this community is. Over the course of a week we built a bunch of new applications, even more new watch faces, and a whole slew of tools and art! Maybe we all just felt extra encouraged by the fact that Eric Migicovsky (founder of Pebble) has decided to bring Pebble back! After 10 long years we have the PebbleOS firmware open sourced thanks to the work of former Pebblers and Google's kindness. And with that gift we have a store for that new Pebble! This is just too freaking exciting, and that comes from someone who has clung desperately to his Pebble Time since he got it, and actively points to it as one of the first areas in which he dabbled with programming productively.

But what did you do for the Hackathon?

Well, I personally painstakingly wrote ~250 lines of C and desperately tried to turn myself into a graphic designer before Mio kindly volunteered to help. And the result of a week's worth of 2am hacking sessions and me frantically trying to remember how to do any of this stuff is Pinout! This legitimately only looks as good as it does thanks to Mio's contributions, all of the art is of their creation, and I am beyond thrilled with the results!

The Pinout Banner!

Application Menu RJ45A Pinout RJ45B Pinout RJ45A Crossover Pinout RJ45B Crossover Pinout

Seriously, you can take a look at my first three attempts at this application to get an idea of just how bad this would have looked without Mio's help. My talents firmly lie inside of the realms of operating cameras when it comes to art, and my brain has an immense amount of patience for wrangling painful things like C, but seems to revolt when faced with creating something that isn't code.

This was my first crack at Pinout, I threw this together while I was setting up a new office network. I couldn't remember the Pinout for RJ45B in the moment, and had to tip cables 30ft in the ceiling so that I could affix APs to an I-Beam. The safest reference I could think of was on my Pebble.

The very first version of Pinout, a vector art thing thrown together years ago. It's just colored lines barely legible.

But of course that poorly rendered, totally inaccurate version wasn't acceptable and I eventually stopped using it. I thought that perhaps I could be lazy since I'm not particularly artistic and I could just use a cribbed image I found online. I downscaled and dithered a pinout and threw it on the pebble! It worked! But it wasn't exactly usable.

Validating my image rendering code using down-scaled dithered images, just imagine a very pixelated cable, it's terrible.

Now Mio has recommended Inkscape to me in the past for creating SVGs, and I gave it my best effort, but I struggled for a few hours only to come up with a bland, color-inaccurate, incorrectly scaled image. After loading it and realizing I had created an icon and not a serviceable pinout image, I was at a bit of an impasse and switched over to working on the actual code. Maybe I could figure out a dithering solution or vector art using PDC. I wasn't really sure, but I didn't think continuing with Inkscape was a good use of my time.

And this was my attempt at a cable diagram, it lacks the finesse of Mio's work

Admittedly, the last attempt was sort of on the right track, I think if I had had a lot more time and was just adding new cabling diagrams to the app that I could have gotten something acceptable together. Mio was kind enough to provide SVGs for the cable diagrams in Pinout, so I have a legitimately excellent starting point for the next time I try this, and I will definitely be releasing another version with more diagrams in it soon! I personally want to add RJ45 diagrams for rollover cables like you'd use for Cisco console cables, and one way passive cables. Those also open the doors to potentially adding RJ11 diagrams or maybe even serial pinouts! I think though that RJ45 is my primary use case and I just want Pinout to be as useful for as many Pebblers as is possible. It would please me to no end to know other people are using it!

So that C..ommon Lisp code?

Okay let's be real, I procrastinated until the last couple of days on this. I had ideas! But I spent the first 4 days writing Common Lisp libraries and trying to teach myself Inkscape. I think the C scared me, and my reaction when faced with "learn C" has always been "okay, I'll learn C...ommon Lisp". It's all of the ((())), the allure of a good list is just too much. Now there's nothing wrong with this process, and initially I thought it was necessary! The old Pebble tools are written in Python2, and while there has been some work done to update them to Python3 that's not really my style (though I will totally use them once the new Pebbles are released, package them for Alpine even!). There's some awesome work being done to re-implement the entire Pebble runtime in Rust so that it can run on the Playdate, which is wicked cool, and the developer, Heiko Behrens, even released a pre-built binary of his PDC tool for the hackathon, so I knew from the get go that long term there's some solid work being done to re-implement these old tools. But if you know me, you know that I don't particularly care for Rust. Nothing wrong with memory safety, but needing hundreds of MBs of libraries to compile anything is ridiculous.

Wait, back up, PDC? Oh yeah, this is the fun stuff, if you thought Pinout was cool then lets take a detour into obscure binary formats, because that is precisely what PDC is! So the Pebble smartwatch can render icons and images via vector graphics. Animations occur throughout the Pebble smartwatch, like if you delete an item from your timeline you'll see a skull that expands and bursts. If you have an alarm you get a little clock that bounces up and down. Dismissing something results in a swoosh message disappearing, or clearing your notification runs them through a little animated shredder. This functionality is wicked cool! And unfortunately was largely lost due to the tooling being just old, the PDC format being undocumented publicly, and there just not being enough motivation to revive it.

Now for Pinout specifically, I initially thought I'd do an application like the cards demo application where users could flip through cable diagrams instead of implementing a menu. This addressed two things for me: 1) menu logic and 2) I thought I could do the diagrams as PDC sequences so that the pins would pop up one after another on the screen.

Ultimately, despite documenting the PDC binary format thoroughly and even developing a parser for existent PDC files and the Pebble color space, I ruled that this was wildly out of scope for the limited time I had to make the app, and after fixating on it for 4 days straight only to be faced with the fact that I had nothing to show for the hackathon, I made a hard pivot back into C land. None of this was time wasted in my mind, this is still a viable 2.0 option for Pinout that would be wicked cool to implement! And I feel I have enough of a grasp on the PDC binary format to potentially make an SVG -> PDC conversion tool, and eventually that may lead to an animation sequencing tool! I don't expect it to be officially adopted by Rebble or the Pebble folks frankly, but I like building my own weird tools so I don't care about all that, I'm here to learn.

And I think this snippet from cl-pdc really emphasizes what all of that learning was about. Being able to describe what a PDC binary comprises is meaningful progress towards being able to translate it to either a different format (png, svg) or stitch them together into sequenced animations!

* (pdc:desc (pdc:parse "../ref/Pebble_50x50_Heavy_snow.pdc"))
PDC Image (v1): 50x50 with 14 commands
  1. Path: [fill color:255; stroke color:192; stroke width:2] open [(9, 34) (9, 30) ]
  2. Path: [fill color:255; stroke color:192; stroke width:2] open [(7, 32) (11, 32) ]
  3. Path: [fill color:255; stroke color:192; stroke width:2] open [(26, 32) (30, 32) ]
  4. Path: [fill color:255; stroke color:192; stroke width:2] open [(28, 34) (28, 30) ]
  5. Path: [fill color:255; stroke color:192; stroke width:2] open [(26, 45) (30, 45) ]
  6. Path: [fill color:255; stroke color:192; stroke width:2] open [(28, 47) (28, 43) ]
  7. Path: [fill color:255; stroke color:192; stroke width:2] open [(17, 38) (21, 38) ]
  8. Path: [fill color:255; stroke color:192; stroke width:2] open [(19, 40) (19, 36) ]
  9. Path: [fill color:255; stroke color:192; stroke width:2] open [(7, 45) (11, 45) ]
  10. Path: [fill color:255; stroke color:192; stroke width:2] open [(9, 47) (9, 43) ]
  11. Path: [fill color:255; stroke color:192; stroke width:2] open [(35, 38) (39, 38) ]
  12. Path: [fill color:255; stroke color:192; stroke width:2] open [(37, 40) (37, 36) ]
  13. Path: [fill color:255; stroke color:192; stroke width:3] closed [(42, 25) (46, 21) (46, 16) (42, 12) (31, 12) (27, 8) (16, 8) (11, 13) (7, 13) (3, 17) (3, 21) (7, 25) ]
  14. Path: [fill color:0; stroke color:192; stroke width:2] open [(12, 14) (18, 14) (21, 17) ]

And it's even cooler to see the same PDC file consumed into a struct that we could in theory pass around to various transformation functions. So close!!

* (pdc:parse "../ref/Pebble_50x50_Heavy_snow.pdc")
#S(PDC::PDC-IMAGE
   :VERSION 1
   :WIDTH 50
   :HEIGHT 50
   :COMMANDS (#S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 9 :Y 34) #S(PDC::POINT :X 9 :Y 30))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 7 :Y 32) #S(PDC::POINT :X 11 :Y 32))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 26 :Y 32)
                          #S(PDC::POINT :X 30 :Y 32))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 28 :Y 34)
                          #S(PDC::POINT :X 28 :Y 30))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 26 :Y 45)
                          #S(PDC::POINT :X 30 :Y 45))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 28 :Y 47)
                          #S(PDC::POINT :X 28 :Y 43))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 17 :Y 38)
                          #S(PDC::POINT :X 21 :Y 38))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 19 :Y 40)
                          #S(PDC::POINT :X 19 :Y 36))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 7 :Y 45) #S(PDC::POINT :X 11 :Y 45))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 9 :Y 47) #S(PDC::POINT :X 9 :Y 43))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 35 :Y 38)
                          #S(PDC::POINT :X 39 :Y 38))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 37 :Y 40)
                          #S(PDC::POINT :X 37 :Y 36))
                 :OPEN-PATH T
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 3
                 :FILL-COLOR 255
                 :POINTS (#S(PDC::POINT :X 42 :Y 25) #S(PDC::POINT :X 46 :Y 21)
                          #S(PDC::POINT :X 46 :Y 16) #S(PDC::POINT :X 42 :Y 12)
                          #S(PDC::POINT :X 31 :Y 12) #S(PDC::POINT :X 27 :Y 8)
                          #S(PDC::POINT :X 16 :Y 8) #S(PDC::POINT :X 11 :Y 13)
                          #S(PDC::POINT :X 7 :Y 13) #S(PDC::POINT :X 3 :Y 17)
                          #S(PDC::POINT :X 3 :Y 21) #S(PDC::POINT :X 7 :Y 25))
                 :OPEN-PATH NIL
                 :RADIUS NIL)
              #S(PDC::COMMAND
                 :TYPE 1
                 :STROKE-COLOR 192
                 :STROKE-WIDTH 2
                 :FILL-COLOR 0
                 :POINTS (#S(PDC::POINT :X 12 :Y 14) #S(PDC::POINT :X 18 :Y 14)
                          #S(PDC::POINT :X 21 :Y 17))
                 :OPEN-PATH T
                 :RADIUS NIL)))

So that C code?

So now that we've demoed the thing I think I'm good at, let's look at what I think I'm not that good at. Pinout is an amalgamation of several example applications, and some code stolen from the other two watch faces I had previously published. That's a bit of a recurring theme for me, and probably most people. If I figure out a way to do something I re-implement it elsewhere because that just makes sense. Maybe these aren't the best ways to do any of this, but I think that's OK.

Pinout has three key components, a menu that allows you to select a diagram which then displays an image of the selected diagram, a battery widget, and a clock widget. Of the three the only one I had previously implemented was the clock widget.

Time Handling

This code was lifted straight from my emacs watch face. And it's really simplistic, I think it's in fact from one of the original watch face tutorials that Pebble provided. The only thing unique about it is that there's a check to ensure that a text layer (s_time_layer) exists before attempting to render to the screen. Since Pinout transitions between several different screens we need to make sure we don't attempt to render either the battery or time widget while transitioning.

//Update time handler
static void update_time() {
  time_t temp = time(NULL);
  struct tm *tick_time = localtime(&temp);

  static char s_buffer[8];
  // convert time to string, and update text
  strftime(s_buffer, sizeof(s_buffer), clock_is_24h_style() ?
	   "%H:%M" : "%I:%M", tick_time);

  // only update the text if the layer exists, which won't happen until the menu item is selected
  if (s_time_layer) {
    text_layer_set_text(s_time_layer, s_buffer);
  }
}

//Latch tick to update function
static void tick_handler(struct tm *tick_time, TimeUnits units_changed) {
  update_time();
}

Inside of our image layer renderer we create a text layer in which we insert the current time. Note that we call update_time directly so that when the image is rendered we immediately have the current time rendered at the top of the application.

// Image window callbacks
static void image_window_load(Window *window) {
  Layer *window_layer = window_get_root_layer(window);
  GRect bounds = layer_get_bounds(window_layer);

  //Image handling code removed for brevity.

  //Allocate Time Layer
  s_time_layer = text_layer_create(GRect(110, 0, 30, 20));
  text_layer_set_text_color(s_time_layer, GColorBlack);
  text_layer_set_background_color(s_time_layer, GColorLightGray);
  layer_add_child(window_layer, text_layer_get_layer(s_time_layer));

  //Update handler
  update_time();
}

While the application is running we subscribe to the tick timer service, so that each time the time ticks up we get a call back to update the time in the application.

static void init(void) {
  //Subscribe to timer/battery tick
  tick_timer_service_subscribe(MINUTE_UNIT, tick_handler);

All of this works because once the application is launched its primary function is to initialize, register those tick handlers, and then wait for user input.

int main(void) {
  init();
  app_event_loop();
  deinit();
}

Battery Handling

You're probably not surprised that the battery widget works almost exactly the same as the time widget. We define an update function that checks the current state, and only renders if our containing layer exists.

// Current battery level
static int s_battery_level;

// record battery level on state change
static void battery_callback(BatteryChargeState state) {
  s_battery_level = state.charge_percent;
  
  static char s_buffer[8];
  // convert battery state to string, and update text
  snprintf(s_buffer, sizeof(s_buffer), "%d%%", s_battery_level);

  // only update the text if the layer exists, which won't happen until the menu item is selected
  if (s_battery_layer) {
    text_layer_set_text(s_battery_layer, s_buffer);

    layer_mark_dirty(text_layer_get_layer(s_battery_layer));
  }
}

And we describe another text layer inside of our image renderer with an update callback.

static void image_window_load(Window *window) {
  Layer *window_layer = window_get_root_layer(window);
  GRect bounds = layer_get_bounds(window_layer);

  //Image handling code removed for brevity.

  //Time handling code removed for brevity.

  //Allocate Battery Layer
  s_battery_layer = text_layer_create(GRect(5, 0, 30, 20));
  text_layer_set_text_color(s_battery_layer, GColorBlack);
  text_layer_set_background_color(s_battery_layer, GColorLightGray);
  layer_add_child(window_layer, text_layer_get_layer(s_battery_layer));

  //Update battery handler
  battery_callback(battery_state_service_peek());
}

And then we subscribe to the battery state service in init! Almost exactly the same!

static void init(void) {
  //Subscribe to timer/battery tick
  tick_timer_service_subscribe(MINUTE_UNIT, tick_handler);
  battery_state_service_subscribe(battery_callback);

  // Get initial battery state
  battery_callback(battery_state_service_peek());

It's really nice to see consistency like this. The Pebble C SDK is really well designed and documented with tons of examples from when Pebble was still in business. They really were something super unique, and it still shows today.

Menus & Image Rendering

Now image rendering and menu handling was new to me, but I was able to find this tutorial that helped immensely. Once I had my arms around the idea of how I thought the application might work it ended up being incredibly simple.

We define our menu as a simple enum and define the total length of the menu statically. Then we setup an array of IDs that are used to reference the PNG images for each diagram.

#define NUM_MENU_ITEMS 4

typedef enum {
  MENU_ITEM_RJ45A,
  MENU_ITEM_RJ45B,
  MENU_ITEM_RJ45A_CROSSOVER,
  MENU_ITEM_RJ45B_CROSSOVER
} MenuItemIndex;

// Images for each pinout
static GBitmap *s_pinout_images[NUM_MENU_ITEMS];
static uint32_t s_resource_ids[NUM_MENU_ITEMS] = {
  RESOURCE_ID_RJ45A,
  RESOURCE_ID_RJ45B,
  RESOURCE_ID_RJ45A_CROSSOVER,
  RESOURCE_ID_RJ45B_CROSSOVER
};

We keep track of where we are in the application by re-defining the currently displayed image upon selection. And then the menu rendering is as simple as iterating over the total length of the menu, and carving out a section of the screen for as many entries as will fit.

// Currently displayed image
static GBitmap *s_current_image;

// Menu callbacks
static uint16_t menu_get_num_sections_callback(MenuLayer *menu_layer, void *data) {
  return 1; // We're only using a single menu layer, but maybe down the line the image will be a sub menu to textual information about the diagram.
}

// Return the number of menu rows at point
static uint16_t menu_get_num_rows_callback(MenuLayer *menu_layer, uint16_t section_index, void *data) {
  return NUM_MENU_ITEMS;
}

// Get the height of the menu header from section in menu
static int16_t menu_get_header_height_callback(MenuLayer *menu_layer, uint16_t section_index, void *data) {
  return MENU_CELL_BASIC_HEADER_HEIGHT;
}

When we click on a menu item using the middle select button we trigger the image layer rendering function described in the last two sections, but this time we have a complete picture of what happens. We take the image correlated with the menu entry and render it to the screen as a bitmap layer, then overlay our text layers on top of the bitmap!

static void image_window_load(Window *window) {
  Layer *window_layer = window_get_root_layer(window);
  GRect bounds = layer_get_bounds(window_layer);
  
  // Create the bitmap layer for displaying the image
  s_image_layer = bitmap_layer_create(bounds);
  bitmap_layer_set_compositing_mode(s_image_layer, GCompOpAssign);
  bitmap_layer_set_bitmap(s_image_layer, s_current_image);
  bitmap_layer_set_alignment(s_image_layer, GAlignCenter);
  
  layer_add_child(window_layer, bitmap_layer_get_layer(s_image_layer));

  //Allocate Time Layer
  s_time_layer = text_layer_create(GRect(110, 0, 30, 20));
  text_layer_set_text_color(s_time_layer, GColorBlack);
  text_layer_set_background_color(s_time_layer, GColorLightGray);
  layer_add_child(window_layer, text_layer_get_layer(s_time_layer));

  //Allocate Battery Layer
  s_battery_layer = text_layer_create(GRect(5, 0, 30, 20));
  text_layer_set_text_color(s_battery_layer, GColorBlack);
  text_layer_set_background_color(s_battery_layer, GColorLightGray);
  layer_add_child(window_layer, text_layer_get_layer(s_battery_layer));

  //Update time handler
  update_time();
  battery_callback(battery_state_service_peek());
}

Incredibly simple right? This is the beauty of the pebble ecosystem, you can make a very nice and polished looking application incredibly simply.

The full source for Pinout can be found here and you can find it on the Rebble store. I sort of just glossed over the initialization functionality, it's pretty standard stuff. All just sequenced functions to allocate memory for our various layers, and then destroy them when done. Typical C memory allocation stuff in a nice wrapper the SDK provides.

So what does this look like in real life?

Yeah I knew you'd want to know that, I can't just build an app for a 10 year old smart watch and swear up and down I actively use it without proving that point. So just for you dear reader, here's a promotional photo of my hairy wrist in the hot Florida sun showing off the excellent functionality of Pinout! These were taken while tipping cables for a 60Ghz point to point antenna installation, gotta make sure those cables are wired up correctly so I don't have to make a second trip out!

Pinout on a Pebble Time smartwatch, showing an RJ45B Pinout in the Florida sun!

If you happen to use Pinout I would love a picture of you using it and will gleefully add it above! Shoot me an email at durrendal (at) lambdacreate.com if you want to show your support!

Last but not least

It is incredibly important for me to express my sincere and utmost thanks and gratitude to Mio for helping with the graphic design work for Pinout. If you got this far you have a great sense of what Pinout would probably have looked like without their help. I bet it would have worked, but it would have been uglier than my C code.

Thanks again Mio, I couldn't have done it without you!!


Page 5

I just released version 3.0.2 of tkts recently, which is frankly a really cool feeling. I haven't really touched this code base in a couple of years despite using the tool almost daily to track and document various little issues and tasks that come up across the breadth of projects I deal with. What's even neater to me is that tkts will be five years old at the end of October, just in time for the next Fennelconf! It's kind of absurd how quickly time flies, and it's really surprising to me that I can revisit one of my code bases year after year like this and continue to improve upon it.

Which is actually exactly what this post is about, little things I can improve upon. Did you know that in the almost 5 years I've been working on tkts I've never once written tests for it? Not a single one. Like an absolute mad man I just hammered away at the code and tested it on the little blob of sqlite that I had managed to accumulate over the years. Worst case scenario, if I broke something I'd pull a backup of the data from Restic and move on with my life a little grumpier. You know what I also did for the last two years? Engineer half-baked shell scripts to work around weird pain points that tkts couldn't handle but that bothered me greatly. Like for example, I had this fever dream to have tkts generate invoices using groff templates that I could convert into PDFs. That totally worked, if you billed a single ticket inside tkts as an item. Try and break that down over a range of dates and it wouldn't work at all. My solution was just to hammer in a fix with a random shell script that did this instead.

Primarily because extending tkts to support this functionality meant doing deep focused work and the process of implementing this without a test harness and after being away from the code base for so long was just too daunting.

I should probably pause here: if you're not sure what tkts is, you might want to read this blog post. But if you want the Cliff's Notes instead, tkts is a ticket system written for the sole purpose of learning. Yeah, I'm weird, I write business applications for fun, I know. But it's a seriously amazing way to learn!

Anyways, back to this. tkts lacked features I wanted and needed, and had lots of little broken corner cases that I just lived with, because this is a problem I created with my own hands and likely nobody else is dealing with, ergo the problem is unimportant. Getting past that daunting reality really just took swallowing the frog, plus a month or more of nights working through the code base to iteratively extend, manually test, validate, repeat. All of which could have been made 1000% easier if I had just written some Busted tests to begin with. Spoiler alert: I waited until the VERY end, when I went to update the package for Alpine, to actually implement these tests at all.

What is Busted?

Well, Busted is a really easy to use testing framework for Lua. Since we're talking about getting past daunting realities and broken corner cases, the only thing "busted" could realistically be is something that ensures your software isn't, in fact, busted. And after pouring so many hours into tkts, and finally getting it merged into Alpine's community repos, I for one do not want it to be busted!

I won't belabor the point here, but the idea is that I was manually testing that things "worked", and all I ever really accomplished by doing that was to ensure that it "worked" when I personally ran the program. That means the development of this tool was idiosyncratic to the configuration of my computer, and not even really my "computers", but the droid where I do most of my work. And any of the data I tested on was a highly massaged variant of real data that sort of worked around known issues or hand waved things away. The point is, you need to test your software, and testing frameworks like Busted give you a systematic way to do this. It's a maturity thing, and tkts is mature enough for this.

So what does a test look like for tkts specifically? Well, right now, just a black box test. I build the software, then I run it through the wringer in a clean environment. This way I know that things like database initialization work correctly, db migrations apply, and raw inserts work. To begin with I want to confirm "does the user experience of tkts hold up?"

local lfs = require "lfs"
local posix = require "posix"

-- Helper function to handle cleanup of directories in the test env
function removedir(dir)
   for file in lfs.dir(dir) do
	  local file_path = dir .. '/' .. file
	  if file ~= "." and file ~= ".." then
		 if lfs.attributes(file_path, 'mode') == 'file' then
			os.remove(file_path)
		 elseif lfs.attributes(file_path, 'mode') == 'directory' then
			removedir(file_path)
		 end
	  end
   end
   lfs.rmdir(dir)
end
   
describe("tkts CLI", function()
			--We define where our test env data will be created
			local test_home = "/tmp/tkts_test"
			local tkts_cmd = "./src/./tkts.fnl"

			-- Then we describe our tests
			describe("init operations", function()
						it("should remove existing config directory & recreate it", function()
							  -- Remove test directory if it exists
							  if lfs.attributes(test_home) then
								 removedir(test_home)
							  end
							  
							  -- Create fresh test directory
							  lfs.mkdir("/tmp/tkts_test")
							  lfs.mkdir("/tmp/tkts_test/.config")
							  posix.setenv("HOME", test_home)

							  -- Next we run tkts, read the output it creates, and compare it to what we expect it to display. Like a db migration, or ticket display.
							  local init_output = io.popen(tkts_cmd):read("*a")
							  assert.matches("DB{Migrating database from version (%d+) to (%d+)}", init_output)
							  assert.matches("Open: 0 | Closed: 0", init_output)

							  -- Then we check to make sure that any files tkts is supposed to create are created
							  assert.is_true(lfs.attributes(test_home .. "/.config/tkts/tkts.conf", 'mode') == "file")
							  assert.is_true(lfs.attributes(test_home .. "/.config/tkts/tkts.db", 'mode') == "file")
							  assert.is_true(lfs.attributes(test_home .. "/.config/tkts/invoice.tf", 'mode') == "file")
						end)
			end)

			-- And clean up our environment when we're done.
			describe("deinit operations", function()
						it("should remove existing config directory", function()
							  -- Remove test directory if it exists
							  if lfs.attributes(test_home) then
								 removedir(test_home)
							  end
						end)
			end)
end)

With this framework testing becomes simple questions that we ask in batches. When we're working with a "ticket" inside of tkts we should know that running tkts create -t 'title' -d 'description' creates a ticket. I know this, I wrote it. But whoever packages or uses the software on their system should also know that they need to ask this question and how to verify it. That's sort of what blackbox testing is all about!

Where to next?

Of course, black box testing isn't the best way to test things. It doesn't ensure that the interplay between the functions in your software is correct. Something could be materially broken in tkts itself that just isn't exposed to the end user during normal, expected operation. Like, what if I run tkts create --this-software-sucks? That's a junk flag, how does tkts respond to it? Or, maybe less dramatic, what if I pass tkts help -s a_section_that_doesnt_exist, does it gracefully handle that?

No, it doesn't. And I didn't catch that before I started writing this and really thinking about how I test my software and why. That's all still just more black box testing, but it would be more helpful to source the bits and pieces that comprise tkts and test them actively as I develop. Does this function do what I think it does? If I feed it bad data, does it react in an expected and deterministic fashion? These are questions I can answer with Busted, but cannot ask of tkts in its current state.

That's because tkts is a 2400 line monolith. It's a bad design choice I made 5 years ago and never recovered from. Every time I've revisited the tkts code base it has been to add features I never got right, or suddenly needed unexpectedly. The scope creep never allowed for a refactor and so I have more and more technical debt. It's fun! Can you imagine how badly this would suck if it wasn't a personal project and did something actually important for some business? Goodness, no thank you.

So the next steps are to address that. I plan to break up the tkts code base into modules, build a full test framework on top of that, and continue to expand the feature set. I really like how much learning tkts has enabled for me as a developer. And I want to really encourage myself to continue to learn and grow from it.


Page 6

Is this blog becoming a Ruby blog? Of all the languages I know and use, I seem to be reaching for Ruby more and more on a regular basis. Of course, we're right smack dab in the midst of the Old Computer Challenge, and I've complained in the past that Ruby really is not super optimal for the types of computers I use during that event, or really on a regular basis. But despite that fact, this little language has been my stalwart friend at work.

Just in the past few months I've written little tools to automate upgrading my rc-service scripts for some of my homelab services, to bulk convert data, and at work I've recently finished a Cradlepoint Netcloud & Zabbix API integration that brings a great level of detail into our monitoring stack. And in each of those situations Ruby has just been a breeze to work with.

I have an idea, I write what is essentially a guess at what I expect the Ruby syntax to be, and voilà, with minimal debugging I have a program. Python feels a lot like this for me too, sort of just guessing and boom, something happens! Of course, Ruby doesn't fall over because I accidentally pressed tab instead of space. And the packaging ecosystem isn't abhorrent trash.

Oh wait, this is a post about packaging, not complaining about python.

Why would we package a Gem?

I think, for a lot of developers, the idea of packaging is somewhat vague and contrary to how they think about their programs. They have a very specific version of a library they want to use, and test against. And they vendor that directly into their code base, using something like git submodules or a Gemfile. And then they develop against this very specific version.
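If you haven't seen that workflow, a Gemfile pinning exact versions looks roughly like this sketch (the gems and version numbers here are made up for illustration):

source "https://rubygems.org"

# Pin the exact versions the project was developed and tested against.
gem "httparty", "0.21.0"
gem "nokogiri", "1.16.5"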

I personally like this workflow, it makes a lot of sense. You work with a known thing that doesn't just change out from under you, so you can focus very specifically on what you want to work on and not swatting the bugs brought about by someone else changing something upstream. But what if you need to manage a whole bunch of these sorts of programs, and they all depend on roughly the same thing?

A lot of the time people just develop against whatever the latest revision is, or their code base isn't so fragile that it breaks just because a library is swapped out. Sometimes it is that fragile, and you're stuck patching away the issues to bring modernity to the code base, or you just end up with a vendored lib.

For me, I think about it from the perspective of the distribution. If I package something, then there is less likelihood that the corpus of tools depending on a specific library remains vulnerable to CVEs found in a specific version of that library. If we're using system packages, and patching our code base to work with up to date libraries, then I don't have to worry about version 2.3.1 being vulnerable to an RCE in some tool whose Gemfile tells me I absolutely must use that version. I just need to apk upgrade and move on.

Further, if we consider the use case of Alpine in containers, by relying on system packages and not Ruby Gems/Bundler we can remove a whole corpus of tools and dependencies from our container images that never really need to exist in the first place. Not in a build layer, and not in the resulting product. Plus, with a distro you have several hundred sets of eyes reviewing packages as they flow through the ecosystem, from regular users just trying to use and test something, to package maintainers like myself aggressively packaging the world, to the core distribution teams that scrutinize each package change as it happens. These are contributors that would not exist in your project if all you did was bundle install and move on.

And as you'll see in a second, we're not really deviating that much from the typical Ruby workflow when we package a Gem. We still rely on tools like Bundler, Rake, and the various ruby test frameworks to package and validate our code. We're just adding more scrutiny and rigor around the process to make it sustainable/accessible to the distro at large.

APKBUILDs for Gems

Now admittedly, I'm not a master at packaging Ruby things; I've really only recently dipped my toes into these waters. But the process is so easy that I rapidly added ~13 Ruby libraries to Alpine over the course of two nights. In fact, this is how I spent the first two days of the Old Computer Challenge: bundling up all the dependencies I've used in all the dabbling I've done with Ruby thus far, and whatever other dependencies they might have.

Let's look at what I did for ruby-resolv. This is a really simple Gem that provides an alternative DNS resolver to the default socket-based method built into Ruby. Since this project is truly just a single Ruby file, we don't actually have to do much work.

# Maintainer: Will Sinatra <wpsinatra@gmail.com>
pkgname=ruby-resolv
_gemname=${pkgname#ruby-}
pkgver=0.4.0
pkgrel=0
pkgdesc="A thread-aware DNS resolver library written in Ruby"
url="https://rubygems.org/gems/resolv"
arch="noarch"
license="BSD-2-Clause"
checkdepends="ruby-rake ruby-bundler ruby-test-unit ruby-test-unit-ruby-core"
depends="ruby"
source="$pkgname-$pkgver.tar.gz::https://github.com/ruby/resolv/archive/refs/tags/v$pkgver.tar.gz
	gemspec.patch"
builddir="$srcdir/$_gemname-$pkgver"

prepare() {
	default_prepare

	sed -i '/spec.signing_key/d' $_gemname.gemspec
}

build() {
	gem build $_gemname.gemspec
}

check() {
	rake
}

package() {
	local gemdir="$pkgdir/$(ruby -e 'puts Gem.default_dir')"

	gem install --local \
		--install-dir "$gemdir" \
		--ignore-dependencies \
		--no-document \
		--verbose \
		$_gemname

	rm -r "$gemdir"/cache \
		"$gemdir"/build_info \
		"$gemdir"/doc
}

sha512sums="
c1157d086a4d72cc48a6e264bea4e95217b4c4146a103143a7e4a0cea800b60eb7d2e32947449a93f616a9908ed76c0d2b2ae61745940641464089f0c58471a3  ruby-resolv-0.4.0.tar.gz
ed64dbce3e78f63f90ff6a49ec046448b406fa52de3d0c5932c474342868959169d8e353628648cbc4042ee55d7f0d4babf6f929b2f8d71ba7bb12eb9f9fb1ff  gemspec.patch
"

gem build and gemspec files do a wonderful job abstracting away the complexity of our packaging concerns. It's extremely common to see Rakefiles default to running tests and nothing more, with whatever framework the author likes. So we really only need to tell gem to be very particular about where it installs files and how it decides what it needs to install.

The one issue I ran into very consistently is the use of git ls-files inside of the gemspec files to figure out what kind of files the Gem actually installs. This is a neat trick, but a bit silly for a library that is literally one file, and even if it's several directories, almost everything in a Ruby library gets dumped into a directory called lib.

Fortunately this little patch (while specific to the ruby-resolv package) is a quick fix for that one tiny issue. And it's really not a big deal to carry these sorts of "make the build system work" patches. At least I don't really mind.

--- a/resolv.gemspec
+++ b/resolv.gemspec
@@ -20,9 +20,7 @@
   spec.metadata["homepage_uri"] = spec.homepage
   spec.metadata["source_code_uri"] = spec.homepage
 
-  spec.files         = Dir.chdir(File.expand_path('..', __FILE__)) do
-    `git ls-files -z`.split("\x0").reject { |f| f.match(%r{^(test|spec|features)/}) }
-  end
+  spec.files         = Dir["lib/**/*"]
   spec.bindir        = "exe"
   spec.executables   = []
   spec.require_paths = ["lib"]

Now some Gems need to be compiled, because they're actually wrappers on top of C libraries. This is a pretty common design: Ruby is used to interface with the lower level lib through an FFI, just the same as would be done in Lua. When that happens the gem build system needs to compile the native interface code as well as bundle up the Ruby code.

# Contributor: Will Sinatra <wpsinatra@gmail.com>
# Maintainer: Will Sinatra <wpsinatra@gmail.com>
pkgname=ruby-sqlite3
_gemname=${pkgname#ruby-}
pkgver=2.0.2
pkgrel=0
pkgdesc="Ruby bindings for SQLite3"
url="https://rubygems.org/gems/sqlite3"
arch="all"
license="BSD-3-Clause"
makedepends="ruby-dev sqlite-dev"
depends="ruby ruby-mini_portile2"
checkdepends="ruby-rake ruby-bundler"
source="$pkgname-$pkgver.tar.gz::https://github.com/sparklemotion/sqlite3-ruby/archive/refs/tags/v$pkgver.tar.gz"
builddir="$srcdir/sqlite3-ruby-$pkgver"
options="!check" # requires rubocop

build() {
	gem build $_gemname.gemspec
}

check() {
	rake
}

package() {
	local gemdir="$pkgdir/$(ruby -e 'puts Gem.default_dir')"
	local geminstdir="$gemdir/gems/sqlite3-$pkgver"

	gem install \
		--local \
		--install-dir "$gemdir" \
		--bindir "$pkgdir/usr/bin" \
		--ignore-dependencies \
		--no-document \
		--verbose \
		"$builddir"/$_gemname-$pkgver.gem -- \
					--use-system-libraries

	rm -r "$gemdir"/cache \
		"$gemdir"/doc \
		"$gemdir"/build_info \
		"$geminstdir"/ext \
		"$geminstdir"/ports \
		"$geminstdir"/*.md \
		"$geminstdir"/*.yml \
		"$geminstdir"/.gemtest

	find "$gemdir"/extensions/ -name mkmf.log -delete
}

sha512sums="
987027fa5e6fc1b400e44a76cd382ae439df21a3af391698d638a7ac81e9dff09862345a9ba375f72286e980cdd3d08fa835268f90f263b93630ba660c4bfe5e  ruby-sqlite3-2.0.2.tar.gz
"

But as you can see in the APKBUILD for ruby-sqlite3, that really isn't much more effort. We just need to include our -dev dependencies to ensure we can actually compile against the correct distro libraries, and then we tell gem to build and install against those deps. There's extra work cleaning up the installation directory, but it's more or less the exact same process.
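If you've never seen what one of these wrappers looks like from the Ruby side, here's a tiny, purely illustrative sketch using the ffi gem to bind a libc function. It's not how sqlite3 itself is built (that one ships a compiled C extension), it just shows the general shape of Ruby calling into a C library:

require "ffi"

# Bind strlen() from the system C library, purely as an illustration.
module LibC
  extend FFI::Library
  ffi_lib FFI::Library::LIBC
  attach_function :strlen, [:string], :size_t
end

puts LibC.strlen("lambdacreate") # => 12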

Closing Thoughts

Honestly, this is a pretty delightful find from my perspective. Not only is it really easy to add additional packages to Alpine, but I also discovered in the process that there aren't really that many Ruby things packaged for Alpine in the first place. I've often heard that people just can't use Alpine for Ruby because X dependency isn't packaged, or because when they try to add something using Bundler it fails to compile properly. In the long term this will help wave away a whole class of issues, and I'm really excited about that.

Now, the OCC stuff. This year's theme is DIY: whatever you want to do, do it! None of us could agree on what to do, so we're just doing anything and everything. I've seen some really cool posts and ideas thrown around, but with so much of my time limited by commitments at work and with my family, the best thing I can think to do is just whatever I would normally do, only from my junky little Acer ZG5. All of the packages above were built, tested, and pushed from its terrible 5400rpm IDE drive after being lovingly toasted by the heat-spewing Atom N270 CPU. And while that process was slow at times with a repo as large as aports, it was still totally doable.

Long live old machines! We're doing real work out here thanks to them!


Page 7

We're back! I've had so little time to properly blog, which is really not surprising given everything that has happened this year, but that doesn't mean I don't miss it. A huge thank you is in order though, Mio's blog post on Vaporbot is a welcome addition to Lambdacreate, and it's really just kind of awesome to have a guest article to keep things fresh and engaging while I'm trying desperately to get back into the swing of things. Unfortunately, I think for the time being I'm juggling all things baby in what little free time I have, so the intermittent posting shall continue.

Anyways, the last couple of posts have been Ruby focused; it's almost thematic at this point. How could I not write more on the matter? I mean, I must absolutely LOVE this language if I'm willing to suffer through writing code on a Palm PDA, right? Well, maybe, not exactly. I think I initially went into Ruby thinking I wouldn't like it, kind of convinced that it fits the same niche as Python or Lua, and that I really don't have any real need for yet another scripting language.

But is that actually true?

Python is the Swiss Army knife of scripting languages. It does literally anything and everything. Just consider for a moment that this language powers everything from full blown system orchestration tools like SaltStack and Ansible, to web application frameworks like Django, to nearly everyone's initial introduction to programming. It's the language we all reach for when we just want to do a thing, get it done, and get on with the next thing.

And Lua, well, it's small, incredibly small. And kind of feature barren in comparison to Python. I mean, we haven't shoved the kitchen sink inside the language, and the community isn't nearly as large as Python's. But despite that there are web application frameworks like Lapis, and the language exists nearly everywhere in some version. If you want speed, LuaJIT knocks it out of the park. And the entire language can be embedded inside other applications to enable complex runtime dynamic configuration. Heck, Love2D is a great and really cool example of this.

Hard to imagine that between these two things I could need yet another different scripting language, right?

Well I was wrong

It turns out that I DO need yet another language, and Ruby is a good fit for a particular problem that both Python and Lua share. And that problem is the ecosystem.

"But Will, the Python ecosystem is vast and impressive!" and you're not wrong. But it is woefully fragemented, riddled with circular dependenices, and suffers terribly from the competing standards problem. You can do literally anything with Python, but there's almost too many ways to accomplish that task. They're all very fragile unless you spend substantial time tracking down whatever is the supported method du jour. And that standard will change out from under you without warning.

"You don't know what you're talking about Will!" Maybe? Maybe not. I maintain a solid amount of Python packages for Alpine Linux. I write Python code for work. I think if anything I'm looking too far under the hood and not just running pip3 with --break-system-packages.

By comparison, Lua has luarocks, which also isn't awesome, but in different ways. It overwrites distro maintained libs in the same way that that pip3 command would. But more so, it's a literal wild west insofar as what code is contributed. I maintain plenty of Lua libs in Alpine as well, and have a few published to luarocks. The ecosystem just doesn't feel robust or maintained.

So Ruby must be better, right? Well, sort of. The Gem ecosystem seems a little older, a little better maintained. But I think the situation is nuanced. Ruby has a major driving force behind it thanks to the popularity of Ruby on Rails, which is used to drive major projects like Gitlab. Lua doesn't really have that, and Python has too much of it! If you use Lua and want a package manager you suffer through luarocks, there is no alternative, and the restaurant is almost entirely closed. Occasionally someone orders takeout, and maybe someone pops in to whip that food up. Python, on the other hand, has a head chef; he publishes thoughtful documentation on the what and how, the recipe so to speak. There are also several thousand other chefs ignoring it all and doing it their own way.

Ruby has one way, the Ruby way. And sometimes I just don't want to fight the horde of chefs, and I need something a little bit better maintained than the empty shop on the corner. Ruby feels like that corner deli you go to: the owner works the counter, you order a pastrami on white bread with mayo, and they give it to you on rye with mustard because that's the only way you can make a pastrami.

Sometimes the box is good

The net gain, in my mind, is that there is one way to think about doing things. I only need to track how to use the Gem ecosystem. I can expect to find a plethora of handy libraries in that ecosystem, some of which might be a little bit dated, but useful nonetheless! If one doesn't build there's a good chance that someone else has written a different library, because the ecosystem is still very much alive.

Let's look at a real example. I have a little mealie server at home that I run for the wife and me. We have hundreds and hundreds of recipes in it, and it has become a cornerstone of our budgeting and planning. Naturally that means it's somewhat important that it gets updated fairly frequently, at least when there are compelling features or security issues.

But I run that server in an incus container, which in turn runs the upstream docker container. The little web portal will tell you when you need to update, but I'm never in the portal to do administrative things; I'm in there to look up how to make spam musubi or some other tasty treat. There's also nothing that can be done from that web portal even if you do check the version. If I check it and it says it's outdated, then I put down my phone, pull out a laptop, jump into the container, and edit the openrc service for the container with the new version. It's a bit of a drudge, frankly.

Enter some not so fancy, honestly very simple, Ruby code! This took maybe 30-40 minutes to figure out, which is a nice feeling that I think comes with any language you're really familiar with, but it came faster here than my experience with previous languages.

The script itself is only meant to check whether or not the version reported by the mealie git repo is the version running in the container, and if it isn't, modify the openrc script with the newest version.

#!/usr/bin/ruby
require 'nokogiri'
require 'httparty'

# To Install:
# apk add ruby ruby-dev libc-dev gcc
# gem install nokogiri
# gem install httparty

$initcfg = "/etc/init.d/mealie"
$feed = "https://github.com/mealie-recipes/mealie/releases.atom"

# Given an Github Atom releases feed, and assuming title contains just versioning info (ie: v1.5.1), return the version of the last update
def findGitVer(url)
	resp = HTTParty.get(url)
	atom = Nokogiri::XML(resp.body)

	ver = atom.css("entry title").first
	return ver.text.gsub("v", "")
end

# Assuming an openrc init file, with an argument version=#.#.# (ie: version=1.4.0), return the currently configured version
def findInitVer(file)
	File.open(file) do |f|
		f.each_line do |line|
			if line.include? "version=" then
				return line.gsub("version=", "").strip
			end
		end
	end
end

# Compare the configured init version against the reported current version from git, and update the openrc init file to the latest version.
def updateInit(gitver, initver)
	if gitver > initver then
		text = File.read($initcfg)
		File.open($initcfg, "w") { |file| file.write(text.gsub("version=#{initver}", "version=#{gitver}")) }
		puts "#{$initcfg} has been updated to #{gitver}"
		system("rc-service mealie restart")
	elsif gitver == initver then
		puts "Init is configured to use #{initver} which is the same as the version reported by git #{gitver}"
	else
		puts "Something has gone wrong. Init says #{initver} and Git says #{gitver}?"
	end
end

git = findGitVer($feed)
init = findInitVer($initcfg)
updateInit(git, init)
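
One caveat if you borrow this pattern: the script compares version strings with >, which is fine for mealie's usual single digit minor versions, but a plain string comparison thinks "1.9.0" is newer than "1.10.0". Ruby ships Gem::Version, which compares numerically, so a more paranoid check might look like this sketch:

require "rubygems" # Gem::Version ships with Ruby itself

gitver = "1.10.0"
initver = "1.9.0"

# Plain string comparison gets this backwards, Gem::Version does not.
puts gitver > initver                                     # => false
puts Gem::Version.new(gitver) > Gem::Version.new(initver) # => true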

Nothing special going on, right? I could have easily done this with Python and BeautifulSoup, but if I had done it in Python I'm not entirely certain how reliable it would have been. Ruby has been fire and forget here: write it once, and there's very little expectation that things break terribly. Perhaps some minor syntactic changes between versions, but nothing more.

Comparatively, upgrading from Python 3.11 -> 3.12 in Alpine recently uncovered a mesh of circular dependencies, precarious and disparate build processes, and other strange errors.

Just [upgrading py3-iniconfig], a seemingly simple lib that gets imported by pytest, required dragging 6 or 7 other libraries through various changes: upgrading versions, disabling tests, or bypassing native builds in some cases. And this upgrade was a minor bump where iniconfig started using a different build system. This is an isolated problem, but unfortunately not an uncommon one. Rebuilding the couple thousand Python packages in the Alpine ecosystem ahead of Alpine 3.20's release took multiple maintainers multiple weeks to sort through. And the entire process was precarious and needed thoughtful sequencing to pull off.

[https://gitlab.alpinelinux.org/alpine/aports/-/merge_requests/61309]: upgrading py3-iniconfig

Sometimes the box is bad

Maybe I'm being unfair though, Ruby isn't all good. Notice my little build notes? That's right, I'm pulling things in from Gem! Of course it's better, if I just used Python venvs and pip I wouldn't have this problem. Or maybe Nix/Guix would be even better?

Strangely, not a lot of Ruby libs appear to be packaged for Alpine, and I'm not certain what the reason for this is. Are they materially harder to maintain? They appear to need compiling in a lot of cases, but that's no different from a lot of Lua or Python libs.

Maybe as I keep digging into this ecosystem I will find that the packaging is just as bad, and the only good option is Golang's "include the world" because the world always breaks otherwise. (But I fundamentally disagree with this take as well). There's probably no solution that meets every single use case, but I firmly believe relying on the distro's maintenance and packaging is closer to right than --break-system-packages will ever be.

So right, what's not to like about Ruby? Well, it's kind of slow. Not in a way that makes it unusable, but in the sense that it uses a massive [amount of CPU to perform]. Maybe I'm once again looking too far under the hood here, but on 32bit systems Ruby is just plain slow. No issues whatsoever on aarch64/x86_64 systems. And sometimes it doesn't really matter how long something takes to complete. Like that mealie version script, I run it with an ansible playbook when I apk upgrade, who cares if it takes 5s to run?

[https://benchmarksgame-team.pages.debian.net/benchmarksgame/fastest/lua-ruby.html]: amount of CPU to perform

But then there's something like my rat info tool. This is a rewrite of a tool I wrote in Nim during the last OCC. Ratpoison doesn't have a status bar like i3, but it can run arbitrary commands on a key combo. The idea here is to generate a little bit of info in the upper right hand window, populated almost entirely from paths in /sys and /proc to make it as portable as possible. Now, the Nim version worked flawlessly, millisecond execution; really, we can't be surprised, that's a compiled language.

On the other hand, this bit of Ruby, while incredibly easy to write and debug, takes 3s to run on my Acer ZG5.

#!/usr/bin/ruby

def readFile(path)
  f = File.open(path, "r")
  return f.read
end

def findOneStr(file, match)
  File.readlines(file, chomp: true).each do |line|
    if line.include? match then
      return line
    end
  end
end

def getFirstLine(file)
  line = File.foreach(file).first
  return line
end
  
def batt(battery)
  sys = "/sys/class/power_supply/"

  if File.exist?(sys + battery + "/capacity") then
    perc = readFile(sys + battery + "/capacity").strip.to_i
  else
    max = readFile(sys + battery + "/charge_full").strip.to_i
    now = readFile(sys + battery + "/charge_now").strip.to_i
    perc = (now * 100) / max
  end
  return perc
end

def batteries
  sys = "/sys/class/power_supply/"
  batteries = Dir.glob(sys + "*").select {
    |path|
    path.include? "BAT"}
  total = 1
  if batteries.length < 1 then
    return nil
  else
    for battery in batteries do
                     name = battery.gsub!(sys, "")
                     total += batt(name)
                   end
      perc = (total / batteries.length)
      return perc
    end
end

def ctemp
  sys = "/sys/class/thermal/thermal_zone0/"
  if File.exist?(sys) then
    now = readFile(sys + "temp").strip.to_i
    t = now / 1000
    return t.round
  else
    return nil
  end
end

def cmem
  procf = "/proc/meminfo"
  totalkb = findOneStr(procf, "MemTotal:").match(/[0-9]+/).to_s.to_i
  availkb = findOneStr(procf, "MemAvailable:").match(/[0-9]+/).to_s.to_i
  perc = ((totalkb - availkb) * 100) / totalkb
  return perc.round
end

def cdate
  time = Time.new
  now = time.strftime("%Y-%m-%d %H:%M:%S")
  return now
end

# /proc/stat
# user   nice system  idle      iowait irq    softirq steal_time virtual
#5818150 0    3852330 212448562 278164 567572 507430  477889     0
def ccpu
  start = getFirstLine("/proc/stat").match(/([0-9]+ )+/).to_s # Pull values from proc
  startmap = start.split.map(&:to_i) #map to an array of intergers
  sleep(1) # delay to generate data
  stop = getFirstLine("/proc/stat").match(/([0-9]+ )+/).to_s # Pull delta values from proc
  stopmap = stop.split.map(&:to_i) # map to an array of integers
  total = stopmap.sum - startmap.sum # delta difference between the two sums
  idle = stopmap[3] - startmap[3] # delta difference between the two idle times
  used = total - idle # subtract idle from total to get rough usage
  notidle = (100 * used / total).to_f # generate percentile usage
  return notidle.round
end

def main
  info = {
    perc: batteries,
    cal: cdate,
    temp: ctemp,
    mem: cmem,
    cpu: ccpu,
  }
  
  #C: 8% | M: 19% | T: 52.0c | B: 82.0% | 2023-06-07 13:01:06
  #if info[:batt].nil? and info[:temp].nil? then
  #  puts "C: #{info[:cpu]}% | M: #{info[:mem]}% | #{info[:cal]}"
  #elsif info[:batt].nil? and !info[:temp].nil? then
  #  puts "C: #{info[:cpu]}% | M: #{info[:mem]}% | T: #{info[:temp]}c | #{info[:cal]}"
  #elsif !info[:batt].nil? and info[:temp].nil? then
  #  puts "C: #{info[:cpu]}% | M: #{info[:mem]}% | B: #{info[:perc]}% | #{info[:cal]}"
  #else
    puts "C: #{info[:cpu]}% | M: #{info[:mem]}% | T: #{info[:temp]}c | B: #{info[:perc]}% | #{info[:cal]}"
  #end
end

main

And fundamentally this is written the same way the Nim program was. I'm willing to accept that maybe that's a me issue, that I shouldn't use computers old enough to buy their own alcohol, and that I could probably write more performant code. But I think the intent is important. With a compiled language like Nim, or Golang, or Rust, you get really solid performance without even trying to write performant code. The difference is the time invested in producing the thing.

So what now?

For me Ruby seems to fit in an interesting gap between Python, Lua, and maybe even Go. Like Golang, I find it incredibly easy to work with: I can get ideas into Emacs really quickly, start testing them immediately without worrying about compilation, and have a high degree of confidence that the tool, once written, can be reproduced and run elsewhere. It just feels a little bit more robust than Python from a long term management perspective, if that makes sense.

Will I use it further? I think yes! I've rewritten a few different tools at work from Fennel to Ruby and that has been a delightful and rewarding experience. It also helps that we have a few Ruby on Rails stacks that we manage, so the skillset won't go to waste.

But for the record, I don't think I'll be dropping Lua or Python out of my frequent rotation of languages. They each have their weaknesses and strengths, I'm just tired of cutting myself trying to chase down Python deps. And I can't fathom the idea of learning Nix just to make that problem go away, the trade off in complexity is just not worth it in my mind.


Page 8

Picture an online door-to-door salesman showing up unannounced with some crazy thing he claims will change your life. While you protest that you haven't got any authorial currency to merit his quality blogging service, he smiles and pencils in your name anyway. He suavely offers to sell you a bridge of brownies with the post and accepts vapourware as a payment method. You have no idea where to get real vapourware that isn't just a retro T-shirt, but the brownies are enticing and you figure you can leave that problem for future you. Hours later, with most of the brownies having been deliciously decimated, your sugar-saturated brain reboots and finally begins the hunt in the unlit basement of ~/junk for some non-existent wares to pay for the chocolatey treats.

As you trudge through the musty confines, you remember that you've been telling the salesman and owner of the fine establishment named Lambdacreate about an Internet Relay Chat (IRC) bot framework you were supposed to be reassembling for many weeks now which still hasn't materialised. You think with only a tiny dose of regret it would've had potential for repayment had it been made with Ruby, which is apparently his current poison of choice. You don't know a thing about Ruby (or maybe you do but it's not part of this story) except Jekyll was the blogger bee's knees a decade ago. But you decide with all the folly from the fermenting sugar in your guts to learn you some Ruby for great good! With your newly-acquired skill, you'll finally craft a vapourware and your quest for the salesman will be complete. True story! Except you're me and there were no brownies. Drats.

Step 0: acquire tools and basic skill

Here's how this quest runs. Step 0 of the quest chain is to learn just enough Ruby to make some mischief, and for that the first port of call is Learn X in Y Minutes. It's the Cliff's Notes of programming languages and the bane of Computer Science instructors everywhere, all syntax and little nuance. Next, request Ruby from a friendly neighbourhood package manager, e.g. apk add ruby in Alpine Linux. Most program paladins already have a code editor in their inventory (plus the kitchen sink if the editor is an emacs), but you can get one from the package manager. Check ruby -v and pin up the Ruby Standard Library and Core Reference for the corresponding version. There are other manuals on the official Ruby website and no time to read them, so this story will continue as if they didn't exist. Or you can still read them, but they're not necessary for the quest.

Step 0.5: come up with a recipe

One of the hallmarks of good vapourware besides non-existence is an appearance of utility without really being useful. This requires some understanding of how to be useful in order to avoid being the very thing that is unacceptable but would otherwise be desirable. Bots are all the rage nowadays so let's make an IRC bot that will connect to a server, join a channel and promptly do nothing.

Useless bot recipe

  1. Make a data list.
  2. Make it connect to an IRC server.
  3. Make it join a channel.
  4. Make it crash.
  5. Make it do nothing.

Right away you might already notice some of the things it won't do:

  • Authenticate to the server — that would make it useful in channels that require users to be registered and authenticated to the server to talk. Can't do that, Dave. You're welcome to go on an "Authenticate with Nickserv" side quest, but you're on your own there.

  • Disconnect properly from the server — unless you're one of those despicably perfect wizards who gets every spell right the first time, you'll find out quickly Ruby is great at producing runtime errors, which is like unexpectedly discovering someone took the last brownie from the buffet platter when you return for more and Ruby the waitress doesn't know what to do. You should totally take advantage of it for the crashing effect. All of this is a roundabout way to say disconnecting following protocol is redundant when it can already crash out.

  • Save settings to a config file — it would make the bot modularly reusable, more useful and therefore untenable. Instead the settings will be baked into the bot and if other code mages don't like the channels it joins they would have to fork the bot and pipe in their own settings. Since it's a frowned-upon practice, it'll have the additional benefit of mostly keeping other people's mitts off your bot. Which is arguably a useful feature but no one's counting.

  • Handle errors — it doesn't pretend to be great at what it does, which will be nothing. Like Ruby the waitress it'll look at you blankly or nope itself out if it can't handle whatever you're asking it to do.

With the Don'ts written off, what about the Do's?

  • Make a data list — this is a list of the pieces of information the bot will use to perform the other parts of the recipe, such as the IRC server's hostname and port.

  • Make it connect to an IRC server — the bot still has to do a few things to look plausibly legitimate. Of course, advertising it will do something without actually doing it will also be enough, but that'll be passing up an opportunity to whinge about Ruby's inconsistency. It'd help to know how the IRC protocol usually works (keyword being "usually", as some servers may respond with different timing for authentication commands), but connecting is generally the same across servers.

  • Make it join a channel — this was a tough choice. The bot could hang around a server without joining any chat rooms like a ghost, giving it more of the vapourware vibe in being practically non-existent even if people could message the bot if they knew its name. It'd also be a little sad. The aim here is serendipitously useless, not sad.

  • Make it crash — this can be spontaneous or on-demand.

  • Make it do nothing — the easiest way to accomplish this is to not add things it can do, while the contrarian's way is to make it do something without actually doing anything, or idle.

Some code mages will call this step "defining the scope of your application", which is just a fancy way to say you figured out what you're doing.

Step 1: craft vapourware

It's Lambdacreate's Craft Corner! Let's craft!

1. Make a data list.

Start a new file creatively called vaporbot.rb in your editor, and add a new code block:

class Vaporbot
  @@server = {
    "host" => "irc.libera.chat",
    "port" => 6697,
    "ssl"  => true,
    "nick" => "vaporbot",
    "user" => "vaporbot",
    "channels" => ["##devnull"],
    "do" => ["crash", "idle"],
    "mod" => "!",
    "debug" => true,
    }
end

This tells Ruby there's a new class object called Vaporbot which will be used to group together instructions (or functions) for the bot. (Classes can do more than that, but this time it's just being used as a box to hold the parts rather than spilling them all out onto the table.) The next line creates a new variable or item named @@server with a hash table, which is like a dictionary with a list of all the settings the bot needs to look up in order to complete certain tasks, such as the address and port of the server to connect to, the name with which it should introduce itself to the server, the channels to join, and the actions it can perform for users. Adding @@ in front of the variable name allows it to be read by functions or instructions that will be added inside the box.

The ssl key will be checked by the bot to decide whether to use SSL/TLS for the server connection. Most IRC servers will support both SSL and non-SSL connections, or encourage people to use SSL for improved security. For Libera Chat's servers, 6697 is the SSL port, and if ssl is set to false, then the port setting should be changed to 6667. mod, short for modifier, is the character that users add in front of an action word, e.g. "!ping" to signal to the bot. The debug setting will eventually tell the bot to print out all the messages it receives from the server to the terminal, which is helpful for spotting problems (this feature can be added because it makes building the bot easier, not making the bot itself more useful). The keys can have different names, but it helps to use descriptive words unless you want future you to be confused too.

2. Make it connect to an IRC server.

The bot has the data it needs, so let's give it some instructions. Lines prefixed with # are comments.

# Import the openssl and socket libraries.
require "openssl"
require "socket"

class Vaporbot
  @@server = {
    "host" => "irc.libera.chat",
    "port" => 6697,
    "ssl"  => true,
    "nick" => "vaporbot",
    "user" => "vaporbot",
    "channels" => ["##devnull"],
    "do" => ["crash", "idle"],
    "mod" => "!",
    "debug" => true,
    }

  # Add a new function named "init".
  def self.init(s = @@server)
    # Create a new connection socket.
    sock = TCPSocket.new(s["host"], s["port"])

    # If using SSL, turn it into a SSL socket.
    if s["ssl"]
      sock = OpenSSL::SSL::SSLSocket.new(sock)
      sock.connect
    end

    # Listen for messages from the server.
    # Keep running as long as the variable's value is not nil.
    while line = sock.gets
      # Print the message to the terminal.
      puts line
    end
    # Close the socket if the server ends the connection.
    sock.close
  end

end

# Call the function.
Vaporbot.init

In order to connect to the server, the bot has to set up an endpoint or socket through which it can send and receive messages from the server. Fortunately Ruby comes with an extensive built-in library of classes that have methods to provide components like sockets, so they don't need to be created from scratch. The first two lines at the top of the file ask Ruby to load the classes that provide the SSL and non-SSL sockets. Then they can be used in a new function init to connect to the server. The self keyword registers the function as belonging in the Vaporbot class box and allows it to be called outside of the class. (s = @@server) shows that the function takes one variable, represented inside the function as s. If no variable is provided to the function when it is called by name like Vaporbot.init, it will use the values from the @@server table. The first part of the init function passes the server's address and port values to the socket to connect on a local port. If it successfully contacts the server, it proceeds to a loop that runs over and over, listening for messages from the server (sock.gets) until it receives nothing. At that point the loop stops, and the function wraps up by closing the socket, freeing up the local port again.

At this point if you tried running the script with the command ruby vaporbot.rb, the bot will knock on the server's door then stand there wordlessly while the server prompts for its name. After about thirty seconds the server gets tired of waiting and shuts the door on the bot. What should little vaporbot do to be let into the party? Introduce itself to the server:

    while line = sock.gets
      puts line
      resp =
        if line.include?("No Ident")
          "NICK #{s["nick"]}\r\nUSER #{s["user"]} 0 * #{s["user"]}\r\n"
        else ""
        end
      if resp != ""
        sock.write(resp)
      end
    end

include? is a built-in string function to check whether a text string contains another string. If the message from the server contains certain keywords like "No Ident", the bot will respond with its nick and user names. The #{} is used to insert variable values, such as those from the @@server table, inside an existing text string. The \r\n marks the end of each line when sent to the server using sock.write(). Now when the bot connects, the server can greet it by name after the bot flashes its name tag:

:tantalum.libera.chat NOTICE * :*** Checking Ident
:tantalum.libera.chat NOTICE * :*** Looking up your hostname...
:tantalum.libera.chat NOTICE * :*** Found your hostname: example.tld
:tantalum.libera.chat NOTICE * :*** No Ident response
NICK vaporbot
USER vaporbot 0 * vaporbot
:tantalum.libera.chat 001 vaporbot :Welcome to the Libera.Chat Internet Relay Chat Network vaporbot

3. Make it join a channel.

Little vaporbot is ushered in, and the server enthusiastically tells the bot about the number of revellers and many rooms available. Maybe you're already in one of those rooms and you want vaporbot to join you there too. The IRC command is, yep, you guessed it, JOIN #channel. The trick however is to wait until the server winds down its welcome speech, also known as the MOTD or message of the day, before having the bot send the join request, or it won't hear it above the sound of its own happy gushing. To join multiple channels, separate each channel name with a comma.

    while line = sock.gets
      puts line
      body =
        if line != nil
          if line.split(":").length >= 3; line.split(":")[2..-1].join(":").strip
          else line.strip; end
        else ""; end

      resp =
        if body.include?("No Ident")
          "NICK #{s["nick"]}\r\nUSER #{s["user"]} 0 * #{s["user"]}\r\n"

        elsif (body.start_with?("End of /MOTD") or
          body.start_with?("#{s["user"]} MODE"))
          "JOIN #{s["channels"].join(",")}"

        elsif body.start_with?("PING")
          if body.split(" ").length == 2; body.sub("PING", "PONG")
          else "PONG"; end

        else ""
        end
      if resp != ""
        sock.write(resp)
      end
    end

Here's an example output for joining a channel called ##devnull:

:tantalum.libera.chat 376 vaporbot :End of /MOTD command.
JOIN ##devnull
:vaporbot MODE vaporbot :+Ziw
:vaporbot!~vaporbot@example.tld JOIN ##devnull
:tantalum.libera.chat 353 vaporbot @ ##devnull :vaporbot @mio
:tantalum.libera.chat 366 vaporbot ##devnull :End of /NAMES list.

While vaporbot was making its way to a channel, you might've spotted a few changes to the listening loop. The first is a new body variable with text extracted from line, specifically the section after the channel name which is the message body. This is what IRC clients usually format and display to users, including messages from other users, so it's handy and slightly more reliable to check for keywords in this part of a message from the server, e.g. line.include? is updated to body.include?. The other addition is a clause looking for a PING call in the server messages. Before this, if you've kept the script running for a while, you might've seen the poor bot getting shown the door again shortly after a similar ping. The bot needs to periodically echo back a PONG in response to keep the connection active, like this:

PING :tantalum.libera.chat
PONG :tantalum.libera.chat

While it might be funny the first time, the disconnects will eventually become annoying. Adding a ping check will enable the bot to run mostly unattended.

4. Make it crash.

The bot can connect to a server and join a channel, so far so good. Now to introduce user triggers and make it do silly things on demand. For this let's add a new variable called @@action with another hash table of keys and values like @@server, but this time with functions as values.

  @@action = {
    "crash" => -> (sock) {
      sock.write("QUIT :Crashed >_<;\r\n") },
    }

The -> here denotes an anonymous function or lambda. It's basically a small function that may be used only a few times, so it isn't worth giving it a name, or it might be handed to another function that triggers functions dynamically, such as from a user's text input. The function itself just sends a QUIT message to the server, which disconnects the bot. An optional text (Crashed >_<;) can be displayed to other users in the channel when the bot leaves.
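If lambdas are new to you, here's a tiny throwaway example, unrelated to the bot itself, showing one being defined, stashed in a hash the way @@action does, and run with call():

# A standalone lambda assigned to a variable.
shout = ->(text) { puts text.upcase }
shout.call("brownies") # prints BROWNIES

# Lambdas can also live in a hash, just like the bot's @@action table.
actions = { "shout" => shout }
actions["shout"].call("more brownies")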

Next, the @@action variable is passed into the init function like @@server, and in the listening loop, a new check is added that looks for the trigger keywords in the do list including "crash" and "idle".

class Vaporbot
  @@server = {
    # Other keys and values here [...]
    "do" => ["crash", "idle"],
    "mod" => "!",
    }

  @@action = {
    "crash" => -> (sock) {
      sock.write("QUIT :Crashed >_<;\r\n") },
    }

  def self.init(s = @@server, action = @@action)
    # Socket connect statements here [...]

    while line = sock.gets
      # body and resp variables here [...]

      # Respond to other user requests with actions.
      if body.start_with?("#{s["mod"]}")
        s["do"].each do |act|
          if body == "#{s["mod"]}#{act}"
            action[act].call(sock)
          end
        end
      end
    end

  end

end

When someone sends the trigger !crash, the bot will look up "crash" in the action table (@@action by default) and retrieve the lambda function that sends the quit notice to the server. The call() method actually runs the lambda, passing in the sock variable for sock.write() to talk to the server.

The result from an IRC client looks like this:

<@mio> !crash
 <-- vaporbot (~vaporbot@example.tld) has quit (Quit: Crashed >_<;)

A note of caution: for a seriously serious bot, you'd want to only permit the bot admins to do this, e.g. by checking the person's user name (not nick, which is easier to impersonate) and potentially matching a preset password against one provided in the server settings. However, since it's a seriously useless bot, allowing anyone to crash the bot might be funny or annoying depending on whether there are fudge brownies to be had on a given day. Which is to say, it's irrelevant.
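For the curious, such a gate might look something like this rough sketch inside the listening loop. The admins list and the sender parsing are made up here for illustration and aren't part of vaporbot:

      # Hypothetical admin check: only let known users trigger !crash.
      admins = ["mio"]
      sender = line.split(":")[1].split("!")[0]

      if body == "#{s["mod"]}crash" and admins.include?(sender)
        action["crash"].call(sock)
      end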

5. Make it do nothing.

You know the phrase "much ado about nothing"? This next and final sub-quest of a sub-quest is a literal example of this. In the previous section you may recall the do list had an "idle" keyword. Let's add a real action for it:

  @@action = {
    "crash" => -> (sock, msg) {
      sock.write("QUIT :Crashed >_<;\r\n") },
    "idle" => -> (sock, msg) {
      sock.write("PRIVMSG #{msg["to"]} :\x01ACTION twiddles thumbs\x01\r\n") },
    }

Aside from the new msg argument (more on that in a bit), the main thing here is the idle lambda that sends an ACTION message to the channel, just like a user might type /me twiddles thumbs in their IRC app to emote or roleplay. The ACTION message isn't part of the original IRC protocol specs but comes from a Client-to-Client Protocol (CTCP) draft that many IRC clients have since added support for, which flags certain messages to be displayed differently. The \x01 characters are delimiters that signal there's a special message embedded within the PRIVMSG message.

The bot needs to tell the server who the message text is for, e.g. a channel or another user. That's where the msg variable comes in. It's another hash table that lives inside the listen loop, updated as the message arrives from the server to extract values such as the user who spoke, the channel, do keywords if any and the message body. Below is the listening loop with a breakdown of the msg keys and values.

    while line = sock.gets
      # body and resp variables here [...]

      # If the message string includes an "!" character,
      # it is likely from a regular user/bot account or the server's own bots.
      # Otherwise ignore the line.
      msg =
        if body != "" and line.include?("!")
          recipient =
            if line.split(":")[1].split(" ").length >= 3
              line.split(":")[1].split(" ")[2]
            else ""; end
          sender = line.split(":")[1].split("!")[0]
          do_args =
            if body.split(" ").length >= 2; body.split(" ")[1..-1]
            else []; end
          to =
            # Names that start with "#" are channels.
            if recipient.start_with?("#"); recipient
            # Individual user.
            else sender; end
          { "body" => body, "do" => body.split(" ")[0], "do_args" => do_args,
            "sender" => sender, "recipient" => recipient, "to" => to }
        else { "body" => "", "do" => "", "do_args" => [],
          "sender" => "", "recipient" => "", "to" => "" }; end

      # Respond to other user requests with actions.
      # The `msg` variable is also passed to the `call()` method
      # so the functions in the `action` table can access its keys and values.
      if body.start_with?("#{s["mod"]}")
        s["do"].each do |act|
          if body == "#{s["mod"]}#{act}"
            action[act].call(sock, msg)
          end
        end
      end
    end

In the idle lambda, msg["to"] provides the channel name where the trigger originated so the action will be shown there:

<@mio> !idle
     * vaporbot twiddles thumbs

Putting it all together

After some minor fiddling, here's the vapourware in all its 95 lines of glorious futility:

#!/usr/bin/env ruby
# Vaporbot // Useless by design.™
# (c) 2024 no rights reserved.
require "openssl"
require "socket"

class Vaporbot
  @@server = {
    "host" => "irc.libera.chat",
    "port" => 6697,
    "ssl"  => true,
    "nick" => "vaporbot",
    "user" => "vaporbot",
    "channels" => ["##devnull"],
    "do" => ["crash", "idle", "ping"],
    "mod" => "!",
    "debug" => true,
    }

  @@action = {
    "crash" => -> (sock, msg) {
      self.respond(sock, "QUIT :Crashed >_<;") },
    "idle" => -> (sock, msg) {
      self.respond(sock, "PRIVMSG #{msg["to"]} :\x01ACTION twiddles thumbs\x01") },
    "ping" => -> (sock, msg) {
      self.respond(sock, "PRIVMSG #{msg["to"]} :pong!")},
    }

  @@state = { "nicked" => false, "joined" => false }

  def self.respond(sock, str)
    sock.write("#{str}\r\n")
  end

  def self.init(s = @@server, action = @@action)
    sock = TCPSocket.new(s["host"], s["port"])
    if s["ssl"]
      sock = OpenSSL::SSL::SSLSocket.new(sock)
      sock.connect
    end
    while line = sock.gets
      body =
        if line != nil
          if line.split(":").length >= 3; line.split(":")[2..-1].join(":").strip
          else line.strip; end
        else ""; end
      msg =
        if body != "" and line.include?("!")
          recipient =
            if line.split(":")[1].split(" ").length >= 3
              line.split(":")[1].split(" ")[2]
            else ""; end
          sender = line.split(":")[1].split("!")[0]
          do_args =
            if body.split(" ").length >= 2; body.split(" ")[1..-1]
            else []; end
          to =
            if recipient.start_with?("#"); recipient
            else sender; end
          { "body" => body, "do" => body.split(" ")[0], "do_args" => do_args,
            "sender" => sender, "recipient" => recipient, "to" => to }
        else { "body" => "", "do" => "", "do_args" => [],
          "sender" => "", "recipient" => "", "to" => "" }; end
      resp =
        # Wait for ident prompt before sending self-introduction.
        if not @@state["nicked"] and body.include?("No Ident")
          @@state["nicked"] = true
          "NICK #{s["nick"]}\r\nUSER #{s["user"]} 0 * #{s["user"]}"
        # Wait for user mode set before requesting to join channels.
        elsif not @@state["joined"] and (body.start_with?("End of /MOTD") or
          body.start_with?("#{s["user"]} MODE"))
          @@state["joined"] = true
          "JOIN #{s["channels"].join(",")}"
        # Watch for server pings to keep the connection active.
        elsif body.start_with?("PING")
          if body.split(" ").length == 2; body.sub("PING", "PONG")
          else "PONG"; end
        else ""; end
      # Respond to events and print to standard output.
      if resp != ""; self.respond(sock, resp); end
      if s["debug"] and line != nil; puts line; end
      if s["debug"] and resp != ""; puts resp; end
      # Respond to other user requests with actions.
      if body.start_with?("#{s["mod"]}")
        s["do"].each do |act|
          if body == "#{s["mod"]}#{act}"; action[act].call(sock, msg); end
        end
      end
    end
    sock.close
  end
end


Vaporbot.init
  • #!/usr/bin/env ruby is a shebang line that tells Unix-like systems to find the ruby interpreter and use it to run the file when it's invoked as ./vaporbot.rb instead of ruby vaporbot.rb.

  • Look, a new !ping trigger! It makes little vaporbot yell "pong!" in response! So excitement! Much wow!

  • @@state["nicked"] and @@state["joined"] act as flags that are set the first time the bot sends its name and joins a channel, so it won't try to do either again until the next time it's restarted and connects to the server.

As long as it's neither officially released in its own package nor deployed to the designated server, it can be considered a type of vapourware. Yay for arbitrary criteria!

Step 2: detour for a hot take

Although semi-optional, hot takes and lists are common amenities found on blogs these days, so here's a 2-in-1, free with a buy-in of this vapourware. This half-baked opinion arose from learning beginner's Ruby in an afternoon and is delivered while it's still fresh. First-impressions-thirty-years-late sort of fresh.

Things to like about Ruby:

  • Sizeable standard library — Ruby bundles a number of modules, some loaded automatically and others a require away, including ones for JSON and OpenSSL (vaporbot only briefly demoed one feature of the latter). My tour of a new programming language occasionally includes taking a peek into the built-in toolbox or checking whether it has a decent string module, as much of my scripting currently involves splicing and mangling text. Classes for primitive types like String and Array look fairly comprehensive. (It might be less of a factor for apps mostly manipulating custom objects, where you need to write conversion methods anyway.) The minimalists might shake their heads, but having a robust standard library is super helpful for getting on with the core operations of your vapourware instead of being distracted writing utility classes to fill in the most basic functionality, though it sometimes comes at the expense of a larger install size. Unfortunately a few handy ones like RSS are no longer part of the bundled libraries, but if you don't mind using the language's package manager, RubyGems, they're just one install command away. Somewhat notably, CGI is still included. (A tiny sketch of the bundled toolbox follows this list.)

  • Usually helpful errors — there haven't been a whole lot of opportunities yet for this vapourware to go wrong, but syntax errors are generally clear and include the type of the variable or argument that the problem function is operating on. Programming newbies can rejoice: it underlines the faulty segment and suggests other method or constant names, with the caveat that it doesn't always find a suggestion. A lot of languages do this now, but there was a time when some didn't, so older languages could get some credit for pioneering or modernising their error reporting.

    vaporbot.rb:50:in `init': undefined method `includes?' for an instance of String (NoMethodError)
    
            if body != "" and line.includes?("!")
                                  ^^^^^^^^^^
    Did you mean?  include?
            from vaporbot.rb:101:in `<main>'
    
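
To round out this list: as a small taste of the bundled toolbox from the first point above, here's a hedged sketch (nothing to do with vaporbot, and the data in it is made up) that leans on built-in String methods and the json module:

#!/usr/bin/env ruby
# Tiny demo of the stock toolbox: String methods plus the bundled json module.
# The log line below is invented purely for illustration.
require "json"

line = "2025-05-31 18:06 GET /feed.xml 200"
date, time, verb, path, status = line.split(" ")

puts path.start_with?("/feed")   # => true
puts verb.downcase               # => "get"

record = { "path" => path, "status" => status.to_i, "when" => "#{date} #{time}" }
puts JSON.generate(record)       # => {"path":"/feed.xml","status":200,"when":"2025-05-31 18:06"}
puts JSON.parse(JSON.generate(record))["status"]   # => 200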

Things to like less about Ruby:

  • Runtime errors — coming to Ruby after almost two years of messing around with a strongly-typed compiled language, this is arguably a major drawback of using some interpreted languages and isn't specific to Ruby. Showstopping errors from missing methods that crash the bot and drop its connection to the server are fun in useless apps, not so much if the bot is supposed to stay connected. (A small defensive sketch follows this list.)

    :tantalum.libera.chat NOTICE * :*** Checking Ident
    :tantalum.libera.chat NOTICE * :*** Looking up your hostname...
    :tantalum.libera.chat NOTICE * :*** Found your hostname: example.tld
    in `respond': undefined method `NICK vaporbot
    USER vaporbot 0 * vaporbot
    ' for an instance of OpenSSL::SSL::SSLSocket (NoMethodError)
    
        sock.send("#{str}\r\n")
            ^^^^^
            from vaporbot.rb:81:in `init'
            from vaporbot.rb:96:in `<main>'
    

    Maybe the error could have been caught earlier, before the bot got to the server door. For the bot to only find out it can't speak when the server asks for its name is a tiny bit weird? Sorry vaporbot, your crafting buddy here didn't equip you with a working mic before sending you off to meet the server. In this instance sock.send() is defined for plain sockets like TCPSocket (via BasicSocket), but OpenSSL::SSL::SSLSocket doesn't have it, so Ruby falls back to the generic Object#send, which treats the string being sent as a method name; hence the odd error message above.

    # This is for non-SSL sockets only.
    sock.send("#{str}\r\n")
    
    # One of the following works for both.
    sock << "#{str}\r\n"
    sock.write("#{str}\r\n")
    sock.puts("#{str}\r\n")
    

    Having that much syntactic sugar is fine because it increases the chances of finding a method that works, as long as you don't forget and switch to a different method for no apparent reason elsewhere in the code.

  • Verbose syntax — this is squarely in nitpicking territory. Every logic block has to be terminated with end. It's vaguely reminiscent of shell scripts: semi-colons can be used to terminate statements, and not putting the end delimiter on its own line can reduce the line count in longer scripts if you care a lot about that, which some people might appreciate as flexibility. The ending delimiter is often unneeded in space/indent-delimited languages, and lumping lines together like that makes the code less readable in some cases, so at best it's a minor advantage over being able to omit it entirely. The mix of camel case class names with snake case methods like OpenSSL::SSL::SSLSocket#connect_nonblock is a mild eyesore, again only if a coder cares about styling. Methods that return a boolean get a ? mark, as in include?, for seemingly no special reason; most times it should be clear from later usage whether a function returns a boolean. Plus tiny inconsistencies like casecmp but each_line.
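
As flagged in the runtime-errors point, one low-effort way to stop a misbehaving action from killing the connection would be to wrap the dispatch in a rescue. This is a hedged sketch of a possible tweak to the last block of init, not something vaporbot actually does:

# Hypothetical hardening, not in vaporbot as written: if an action raises
# (say, a NoMethodError), log it and stay connected instead of crashing out.
if body.start_with?("#{s["mod"]}")
  s["do"].each do |act|
    if body == "#{s["mod"]}#{act}"
      begin
        action[act].call(sock, msg)
      rescue StandardError => e
        puts "action '#{act}' failed: #{e.class}: #{e.message}" if s["debug"]
      end
    end
  end
end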

Bottom line: neither awful nor exceptional — keeping in mind this vapourware crafter is partial to languages that save coders from tripping over themselves and that pay their performance/speed costs up front.

Step 3: complete quest

If you're reading this, it means the quest is complete. Achievement got!

Hopefully you've enjoyed this intermission from the regularly scheduled programming.

Will vaporbot get a phantom update that will enable it to procure more make-believe brownies from the salesman? Or will it be thwarted by its crafter buddy's sugar-induced coma? Find out in the next instalment*!

* Available through participating authors only. Offer not valid on one-shot posts. Invisible terms and conditions apply. Not coming soon to a browser near you.


Read the original article

Comments

  • By miladyincontrol 2025-05-31 15:35 | 3 replies

    A lot of less scrupulous crawlers just seem to imitate the big ones. I feel a lot of people make assumptions because the user agent has to be true, right?

    My fave method is still just to have bait info in robots.txt that gzip bombs and autoblocks all further requests from them. Was real easy to configure in Caddy and tends to catch the worst offenders.

    Not excusing the bot behaviours but if a few bots blindly take down your site, then an intentionally malicious offender would have a field day.

    • By horsawlarway 2025-05-31 20:04 | 9 replies

      Your last comment feels pretty apt.

      Maybe I'm just a different generation than the folks writing these blog posts, but I really don't understand the fixation on such low resource usage.

      It's like watching a grandparent freak out over not turning off an LED light or seeing them drive 15 miles to save 5c/gallon on gas.

      20 requests per second is just... Nothing.

      Even if you're dynamically generating them all (and seriously... Why? Time would have been so much better spent fixing that with some caching than this effort) it's just not much demand.

      I get the "fuck the bots" style posts are popular in the Zeitgeist at the moment, but this is hardly novel.

      There are a lot more productive ways to handle this that waste a lot less of your time.

      • By whoisyc 2025-06-01 1:35 | 3 replies

        1. I fear you may be underestimating the volume of bot traffic websites are now receiving. I recommend reading this article to get an idea of the scale of the problem: https://thelibre.news/foss-infrastructure-is-under-attack-by...

        2. Not all requests are created equal. 20 requests a second for the same static HTML file? No problem. But if you have, say, a release page for an open source project with binary download links for all past versions for multiple platforms, each one being a multi megabyte blob, and a scraper starts hitting these links, you will run into bandwidth problems very quickly, unless you live in a utopia where bandwidth is free.

        3. You are underestimating the difficulty of caching dynamic pages. Cache invalidation is hard, they say. One notably problematic example is Git blames. So far I am not aware of any existing solution for caching blames, and jury rigging your own will likely not be any easier than the “solution” explored in the TFA.

        • By hartator 2025-06-01 4:37 | 1 reply

          > 2. Not all requests are created equal. 20 requests a second for the same static HTML file? No problem. But if you have, say, a release page for an open source project with binary download links for all past versions for multiple platforms, each one being a multi megabyte blob, and a scraper starts hitting these links, you will run into bandwidth problems very quickly, unless you live in a utopia where bandwidth is free.

          All of this is (and should be) cached on a CDN. You can do 1000 QPS on this in that config.

          • By tdeck 2025-06-01 5:54

            I think a person should be able to set up a small website on a niche topic without ever having to set up a CDN. Until very recently this was the case, so it's sad to see that simplicity go away purely to satisfy the data needs of shady scrapers.

        • By busymom0 2025-06-01 3:26 | 2 replies

          Shouldn't such big blobs be put on something like CloudFlare R2 or BackBlaze or even S3 with their caching in front? Instead of having your server handle such file downloads.

          • By phantomathkg 2025-06-01 4:27

            Caching also costs money. Nothing is free.

        • By charcircuit 2025-06-01 3:28 | 1 reply

          >you live in a utopia where bandwidth is free.

          It's called peering agreements and they are very common. There's a reason social media and sites like YouTube, Twitch, TikTok don't immediately go out of business. The bandwidth is free for most users.

          • By chii 2025-06-01 7:44 | 1 reply

            there's only a handful of entities in the world that are capable of peering. Most people have to pay for their bandwidth.

            • By charcircuit 2025-06-01 8:12

              It can be done by whoever is providing you internet connectivity. Not everywhere adds extra charges for bandwidth.

      • By spookie 2025-05-31 21:38 | 1 reply

        A friend of mine had over 1000 requests/sec on his Gitea at peaks. Also, you aren't taking into account some of us don't have a "server", just some shitbox computer in the basement.

        This isn't about a mere dozen requests. It gets pretty bad. It also slows down his life.

        • By jdboyd 2025-06-02 4:28

          The "shitbox computer in the basement" is something I would call a server. I mean, it is more capable than most VPSs (except in upload speed to the Internet).

      • By vladvasiliu 2025-05-31 20:46

        I sympathize with the general gist of your post, but I've seen many a bot generate more traffic than legitimate users on our site.

        Never had any actual performance issue, but I can see why a site that expects generally a very low traffic rate might freak out. Could they better optimize their sites? Probably, I know ours sucks big time. But in the era of autoscaling workloads on someone else's computer, a misconfigured site could rack up a big ass bill.

      • By eGQjxkKF6fif 2025-05-31 20:54 | 2 replies

        It's not fuck the bots, it's fuck the bot owners for using the websites as they want, and not at minimum, asking. Like 'hey cool if I use this tool to interact with your site for this and that reason?'

        No, they just do it, so that they can scrape data. AI has hit a cap on the existing knowledge it can consume, so live updates and new information are what's most valuable to them.

        So they will find tricky, evil ways to hammer resources that we as site operators own; even minimally to use site data to their profit, their success, their benefits while blatantly saying 'screw you' as they ignore robots.txt or pretend to be legitimate users.

        There's a digital battle field going on. Clients are coming in as real users using IP lists like from https://infatica.io/

        A writeup posted to HN about it

        https://jan.wildeboer.net/2025/04/Web-is-Broken-Botnet-Part-...

        A system and site operator has every right to build the tools they want to protect their systems, data, and have a user experience that benefits their audiences.

        Your points are valid and make sense, but it's not about that. It's about valuing authentic works and intellectual property; some dweeb that wants to steal it doesn't get to just run their bots against other people's resources, to those owners' detriment and their own benefit.

        • By eadmund 2025-06-01 1:12 | 1 reply

          > Like 'hey cool if I use this tool to interact with your site for this and that reason?'

          They do ask: they make an HTTP request. How the server responds to that request is up to the owner. As in the article, the owner can decide to respond to that request however he likes.

          I think that a big part of the issue is that software is not well-written. If you think about it, even the bots constantly requesting tarballs for git commits don't have to destroy the experience of using the system for logged-in users. One can easily imagine software which prioritises handling requests for authorised users ahead of those for anonymous ones. One can easily imagine software which rejects incoming anonymous requests when it is saturated. But that's hard to write, and our current networks, operating systems, languages and frameworks make that more difficult than it has to be.

          • By const_cast 2025-06-01 1:19

            Kind of, but they lie in the HTTP request - their user agent isn't true, they don't disclose they're a bot, they try to replicate other traffic as a disguise, they use many different IPs so they can't easily be blocked, etc.

            It's kind of like me asking to borrow your car to go to work and then I take your car and ship it overseas.

        • By polotics 2025-06-01 6:28 | 1 reply

          oh yeah I was horrified recently starting up a "smart TV" and going through the installable apps to find a lot of repackaged youtube content, even from creators I like, e.g. a chess channel. The app just provides the same content as the youtube channel does, but at the bottom of the long free-to-use license agreement there is a weirdly worded clause that says you grant the app the right to act as a proxy for partner traffic... So many smart TV users are unwittingly providing residential IPs for the app developer to rent out.

          • By eGQjxkKF6fif 2025-06-01 22:03 | 1 reply

            Yeah, it's a disgrace. 'bUt YoU AgReeD tO iT So I HaVe The RIGht To Do ThIS' it's just cyber warfare.

            Plain and simple.

            • By polotics 2025-06-04 7:30 | 1 reply

              Sorry I had forgotten who it was. Now time to name and shame: the culprit calls itself https://brightdata.com/. Also LG relying on the developer's own disclosure for what they call "Data Safety" is really poor: "There is no relevant information provided by the developer" is all the app reports in the LG app store... Also no way to rate or report the app, I only found a mention that I should report this to lgappsreport@lge.com

              • By polotics 2025-06-10 10:27

                So that there's a trace on the Internet on how seriously LG takes the social engineering going on with oblivious Smart-TV users to enable creation of residential-proxy armies, I did report the BrightData apps to LG at the right email address, and received this useless reply:

                Apps Support 02:49 (9 hours ago) to me

                Dear valued user,

                Hello. This is your LG webOS "Report an App" Service Representative

                Apps (LG Store, Below: Apps) is an open-market service that enhances the user experience in LG smart media products. Customers can download and use apps in a variety of categories, including education, entertainment, life, news, etc.

                With Apps, customers can download the apps of their choice using the app recommendations, search, or browse features.

                Apps that are available in Apps are registered after going through a verification according to LG's process.

                And the registered app will be applied on the terms of conditions form the app supplier’s own policy.

                If you have any further inquiries regarding the inconvenience, please contact app supplier Customer Support E-Mail

                - sdk@brightdata.com

                Thank you for using LG webOS, and we will continue to strive for better services for our customers.

                Thank you.

                LG webOS "Report an App" Service Representative

      • By nickpsecurity 2025-05-31 22:49

        Some of us have little money or optimized for something else. I spent a good chunk of this and last year with hardly any groceries. So, even $30 a month in hosting and CDN costs was large.

        Another situation is an expensive resource. This might be bandwidth hogs, CPU-heavy work, or higher per-CPU licensing in databases. Some people's sites or services don't scale well or hit their budget limits fast.

        In a high-security setup, those boxes usually have limited performance. It comes from the runtime checks, context switches, or embedded/soft processors. If no timing channels, one might have to disable shared caches, too.

        Those systems run slow enough that whatever is in front usually needs to throttle the traffic. We'd want no wasted traffic given their cost ranges from $2,000 / chip (FPGA) to six digits a system (eg XTS-500 w/ STOP OS). One could say the same if it was a custom or open-source chip, like Plasma MIPS.

        Many people might be in the poor category. A significant number are in the low-scalability category. The others are rare but significant.

      • By rozap 2025-06-01 5:29

        He said in the article there were requests that made a tarball of the entire repository for each sha in the git tree. No matter how you slice it, that's pretty ugly.

        Sure, you could require any number of (user hostile) interactions (logins, captchas, etc) to do expensive operations like that, but now the usability is compromised which sucks.

      • By Dylan16807 2025-06-01 4:40 | 1 reply

        > 20 requests per second is just... Nothing.

        Unless you're running mediawiki.

        Are there easy settings I should be messing with for that?

      • By rnmg 2025-05-31 20:33 | 2 replies

        Can you expand on the better ways of handling bots? Genuinely curious.

        • By layer8 2025-05-31 22:41 | 1 reply

          He’s saying that a modern web server setup should be able to handle the traffic without any bot-specific handling. Personally, I don’t follow.

          • By haiku2077 2025-06-01 0:37

            My server can handle it, my ISP cannot!

        • By horsawlarway 2025-06-04 15:25 | 1 reply

          My biggest recommendation is to just get familiar with the caching constructs that are available. I understand folks think CDNs are complicated and expensive, but they're honestly incredibly cheap and relatively easy to use.

          99.9% of the time, just showing static content with a good cache-control header will solve the issue. If you have a restrictive IP provider, use a CDN to do it for you for cheap.

          The more involved recommendation is trimming out features of hosted apps that aren't all that useful and are causing problems. A simple example of what I mean...

          ---

          The author here is noting that his Gitea instance is seeing huge load from 20r/s, which just isn't reasonable (I actually host a Gitea instance myself and I know it can handle 10 times this traffic, even when running on a raspberry pi). So why is his failing?

          Well - it sounds like he's letting bots hit every url of a public instance. Not the choice I'd make, but also not unreasonable, hosting public things is fine.

          Buuut - it also sounds like he's left the "Download archive" button enabled on the instance.

          That's not a good call. It's a feature that's used very rarely by real humans, but it's a tripmine that lets any bot scanning the site trigger high load and network traffic.

          Want a 5 second solution to the problem? Set `DISABLE_DOWNLOAD_SOURCE_ARCHIVES` in the Gitea config (see https://docs.gitea.com/administration/config-cheat-sheet). Problem solved. Bots are no longer an issue. They are welcome to scan and not cause problems anymore.

          What if your app doesn't have an easy config option? Nginx will happily help, with far less complexity and frustrations than trying to blindly play whack-a-mole with IP addresses (this is terrible and does not work... period).

          Configure nginx with a specific path rule that either blocks requests to that path entirely, or places it behind basic auth (it doesn't need to be clever, and you don't even need to make it secret - hell, put the basic auth user/pass directly in the repo's readme, or show it on your site.) The bots won't hit it anymore.

          ---

          So I guess what I'm saying is really - if you're finding that bots on your sites are causing a problem, consider just treating them like users and solving the problem, instead of going mad and trying to remove the bots.

          Be constructive instead of destructive.

          Ultimately, a lot of those bots are scanning that content to show to users, many are even doing it directly at the request of the user currently interacting with them.

          Falling into the trap of the "fuck the bots" mindset is a sure way to lose (although it can feel good emotionally). It's not understanding the problem, it's not solving the problem, and it's limiting access to a thing you intentionally made public. Users are on the other end of those bots.

          He's choosing to play the "everyone loses" square of the prisoner's dilemma.

          • By lingo334 2025-06-05 11:49

            > My biggest recommendation is to just get familiar with the caching constructs that are available.

            That doesn't help. They request seemingly random resources in seemingly random order. While they do often hit some links multiple times it's usually too few and far between for caching to be of any help.

            As to the rest, "Just turn features off. No one uses them, trust me bro!"

    • By ThePinion 2025-05-31 21:18 | 1 reply

      Can you further elaborate on this robots.txt? I was under the impression most AI just completely ignores anything to do with robots.txt so you may just be hitting the ones that are maybe attempting to obey it?

      I'm not against the idea like others here seem to be, I'm more curious about implementing it without harming good actors.

      • By kevindamm 2025-05-31 23:28

        If your robots.txt has a line specifying, for example

           Disallow: /private/beware.zip
        
        and you have no links to that file from elsewhere on the site, then any request for that URL means someone/something read the robots.txt and explicitly violated it, and you can send it a zipbomb or ban the source IP or whatever.
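
        A minimal sketch of the "ban the source IP" half of that idea, assuming an access log in common log format and the bait path above (the log path is a placeholder, and the output is just a list of IPs to feed into whatever ban mechanism you use):

           #!/usr/bin/env ruby
           # Minimal sketch: collect client IPs that requested the robots.txt bait path.
           # Assumes a common-log-format access log; the log path is a placeholder.
           BAIT = "/private/beware.zip"
           LOG  = "/var/log/nginx/access.log"

           offenders = File.foreach(LOG)
                           .select { |l| l.include?(BAIT) }
                           .map    { |l| l.split(" ").first }   # first field is the client IP
                           .uniq

           puts offenders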

        But in my experience it isn't the robots.txt violations being so flagrant (half the requests are probably humans who were curious what you're hiding, and most bots written specifically for LLMs don't even check the robots.txt). The real abuse is the crawler that hits an expensive and frequently-changing URL more often than reasonable, and the card-testers hitting payment endpoints, sometimes with excessive chargebacks. And port-scanners, but those are a minor annoyance if your network setup is decent. And email spoofers who bring your server's reputation down if you don't set things up correctly early on and whenever changing hosts.

    • By p3rls 2025-06-01 7:33 | 1 reply

      I run one of the largest wikis in my niche and convincing the other people on my dev team to use gzip bombs as a defensive measure has been impossible-- they are convinced that it is a dangerous liability (EU-brained) and isn't worth pursuing.

      Do you guys really use these things on real public-facing websites?

      • By pdimitar 2025-06-01 9:02

        Very curious if a bad actor can sue you if you serve them a zip bomb from an EU server. Got any links?

  • By ThomW 2025-05-31 17:39 | 3 replies

    It ticks me off that bots no longer respect robots.txt files at all. The authors of these things are complete assholes. If you’re one of them, gfy.

    • By thowaway7564902 2025-06-01 5:16 | 1 reply

      You might as well say gfy to anyone using chat bots, search engines and price comparison tools. They're the ones who financially incentivize the scrapers.

      • By mandmandam 2025-06-01 11:25 | 1 reply

        That doesn't logic.

        Giving someone a "financial incentive" to do something (by gasp using a search engine, or comparing prices) does not make that thing ethical or cool to do in and of itself.

        I wonder where you ever got the idea that it does.

        • By braiamp 2025-06-03 12:49 | 1 reply

          Because there's a market to serve that you refuse to serve, so the stopgap solution is for third parties to take on the liabilities and risks in exchange for compensation.

          • By mandmandam 2025-06-03 16:00

            Financial incentive ≠ moral justification.

            You're shifting the frame, to one where morality doesn't come into it. You're asserting some kind of market inevitability, which is probably the same sort of rationalization arms and people traffickers use to sleep at night.

    • By hinkley 2025-06-01 18:41

      Honey potting the robots file is handy for the ones who don’t just ignore it but go looking for trouble.

    • By cyanydeez 2025-05-31 20:40 | 1 reply

      [flagged]

  • By immibis 2025-05-31 19:05

    I consider the disk space issue a bug in Gitea. When someone downloads a zip, it should be able to stream the zip to the client, but instead it builds the zip in temporary space, serves it to the client, and doesn't delete it.

    I solved it by marking that directory read-only. Zip downloads, obviously, won't work. If someone really wants one, they can check out the repository and make it themselves.

    If I really cared, of course I'd fix the bug or make sure there's a way to disable the feature properly or only enable it for logged-in users.

    Also I server-side redirect certain user-agents to https://git.immibis.com/gptblock.html . This isn't because they waste resources any more but just because I don't like them, what they're doing is worthless anyway, and because I can. If they really want the data of the Git repository they can clone the Git repository instead of scraping it in a stupid way. That was always allowed.

    8 requests per second isn't that much unless each request triggers intensive processing and indeed it wasn't a load on my little VPS other than the disk space bug. I blocked them because they're stupid, not because they're a DoS.

HackerNews