After some encouraging conversations following the TEI conference in Paderborn, I readied myself to offer consulting and training for digital humanists and librarians as a side business.
- XML for Digital Humanists (XML, XSLT, ISO Schematron, RELAX NG, TEI ODD)
- Processing and publishing library data (MARC, PICA, MODS, LIDO et al.)
- Building and maintaining XML-based workflows
- Developing custom solutions (modules) for VuFind-based discovery systems
- Salvaging DH projects that are in danger of going awry
Feel free to get in touch!
The Index: Issue #103
The Index #103 is here, featuring conversion optimisation, CSS transforms, Eleventy, Safari's lethargy and how an online publisher is dealing with the bad times.
"This is open-source software written by hobbyists, maintained by a single volunteer, badly tested, written in a memory-unsafe language and full of security bugs. It is foolish to use this software to process untrusted data. As such, we treat security issues like any other bug. Each security report we receive will be made public immediately and won't be prioritized."
https://gitlab.gnome.org/GNOME/libxml2/-/issues/913
Since I left my last job, I've been thinking about the guy who used me as an alternative to ChatGPT whenever he hit a problem that he couldn't vibe code the answer out of at work.
He basically rotted his own brain by compulsively using ChatGPT in lieu of actually thinking with most any of the projects he was working on. Instead of taking the time to read through code in our framework, look up documentation, or do any sort of debugging, he just begged and pleaded with ChatGPT to try and get somewhere because "it was faster." Basically just really hammering his brain with the Programmer's Slot Machine. (@davidgerard wrote a really good article about this specific gambling addiction angle. I highly, highly recommend reading/watching the corresponding YouTube video:
https://pivot-to-ai.com/2025/06/05/generative-ai-runs-on-gambling-addiction-just-one-more-prompt-bro/ )
Back to the story: when that wasn't working, which was a significant portion of the time, he'd then just turn and use me as a "more informed alternative" to ChatGPT.
I worked fully remote and the majority of our interactions were via a Teams chat, which apparently crossed some wires in his monkey brain and made him start just... basically verbally barraging me like he would with the company ChatGPT instance. No thoughts at all, just an immediate process of:
- Ask vague question
- Get guess for an answer with a request for more details
- Try applying the guess blindly without thinking if it's applicable at all
- Have it not work and just report back that it didn't work.
- No follow-up details, no further explanation of what was going on or what he's trying to do. Nothing added past the original vague situation
- If lucky, I might get a screenshot of part of the error, meticulously sliced before it gave something useful in the output because he stopped reading error output to things and made no attempt to understand it. (Why? ChatGPT can do that part!)
- Rinse and Repeat until I get fed up and get into a call with him
- Fix the thing in less than a minute, pointing out that he should have been able to tell what was wrong almost immediately if he actually dropped a breakpoint and debugged the code at *literally any point* along the way
- Fuck off immediately after getting his fix, no thank you or anything
- start the process anew the following day when he vibe coded himself into a corner all over again
I literally had to go to leadership and make them have a talk with him and get him to leave me the fuck alone at work, after repeated attempts to establish boundaries about it, due to how much time it sucked out of me being able to work on other projects. Effectively just doubling up my work and slamming me with burn out right at the start of the year for absolutely no reason other than his belligerent insistence to just Not Do His Job Without His Hand Being Held By A Chat Window.
It rapidly went from a "He sometimes asks informed questions that I can answer and help him with. I enjoy working with him" to "The dude isn't even trying in the slightest and is now basically offloading his work onto me because he broke his capacity to actually do work independently of an external chat window. I fucking hate him and I hope he gets in a car wreck so I can get a break from the bleakness of dealing with him every goddamn morning"
ChatGPT has basically just been an absolute blight for me since its inception. Going from the team being generally pro-crypto to intensely pro-genAI/LLM because their favorite scammers (er.. I mean YouTubers) had them hooked on a fantasy of some day making it Big by jumping from one Hype cycle to the next. I sincerely was very close to just finding an entirely different career path altogether because of just how incredibly shitty it was working with that team on just about anything, but lacking the job experience on the resume to land someplace else.
Nobody wanted to be an actual expert, nobody really wanted to learn anything. They had their degree and ChatGPT, which means they learned all they ever will need. ...While working in an industry that tends to re-invent itself every half decade or so while half-assing solutions with an outsourced bullshit generator. 🫠
All in the name of "Well it got me from point A to point B faster." and leaving it at that, despite taking significantly longer than they should have from the get go over it.
I've seen and lived what an AI Fueled future looks like:
Mediocre men harassing their talented and likely autistic peers until their peers just up and fuckin leave to a different organization out of frustration and exhaustion.
I think down the road, we'll be able to measure the negative impact using LLMs has on people's cognitive faculties by comparing it to horse kicks to the head, and only be exaggerating it by a little bit.
"AI bots that scrape the internet for training data are hammering the servers of libraries, archives, museums, and galleries, and are in some cases knocking their collections offline"
#AI is ruining our digital world
(Original title: AI Scraping Bots Are Breaking Open Libraries, Archives, and Museums)
https://www.404media.co/ai-scraping-bots-are-breaking-open-libraries-archives-and-museums/
Another #FairPhone success: USB C connector broke on Friday. Spare parts ca. 20 EUR including shipping, ~10 minutes to replace connector.
Well... And now the AI scrapers have landed in the next system. At least we've developed some routine in firefighting.
Some time ago we gave a talk at a workshop, after which the talks were supposed to be written up as papers. Unfortunately, even after two rounds of review, the reviewers insisted on changes that we consider unnecessary, so we withdrew the paper.
Now it's simply on Zenodo. In case anyone is interested:
Closing the Gap in non-Latin script data: An example of #git based pragmatism
First: a clarification that the term #KI (AI) as used here primarily refers to #LLMs and generative AI. Their negative effects (enormous resource consumption, exploitation of clickworkers, pretending to be open source, etc.) concern the conference organizers so much that they are drawing the following consequences:
Internally: No use of LLMs in organizing the conference or in screening submissions. If other AI (such as OCR or audio transcription) has been used, this will be disclosed and justified.
Submissions: The conference does not want to be yet another uncritical advertising platform for LLMs. Undeclared use of LLMs in preparing contributions is treated as plagiarism. Using the output of generative AI is discouraged even when it is declared. Exceptions are contributions that deal with AI research itself and fit the conference's themes, e.g. examining its biases, resource-efficient use, or presenting models that are based on consensually created training data and can be run offline.
In general, all contributors are encouraged to consider whether the software they present harms anyone, whether or not it has anything to do with AI. Every piece of software internalizes the biases of its creators; this must be acknowledged and reflected on.
Every conference in these times needs an ethical commitment like this.
#GenerativeKI #ChatGPT #Konferenz
The slides of my MarkupUK presentations are already online:
ISO Schematron Conformance Tests
Bad news: Lost my UK-to-EU power adapter at the conference. Good news: Turns out I can charge my phone with my notebook. Who knew?
My obligatory #Eduroam post: Eduroam is an amazing infrastructure. Go to London to visit a conference, WiFi just works.
OUP hasn’t suddenly turned evil, this has been going on for years now.
Here’s another reason why not to publish in DSH.
Call for Papers: "VuFind® Berlin 2025: What’s in it for us?" (29-30 September). You can submit your proposals on practical implementations, institutional strategies, and innovative usage scenarios until 1 July 2025. All the info 👉 https://blog.sbb.berlin/vufind2025/
Hey, listen. #Codeberg is being spammed with LLM bots right now. They started like an hour ago. Each bot is opening up one issue in a random tracker every minute.
I have no idea what the spread is, but each individual bot has like seven pages full of issues now. PieFed and FEP are both starting to see issues piling up because of this.
RE: https://social.wedistribute.org/objects/771856be-078f-4022-9971-7fd4b283426d
Migrating (some things) to Codeberg. It’s the switching costs that get you. #XProc #XMLCalabash
#Moon, deer, Miyajima, by Ohara Koson (#OharaKoson), ca. 1910.
#shinhanga
If you're a middle-aged, currently rather unathletic, fat woman with occasional back trouble, little time, and a desk-bound daily routine who is thinking about a gym membership, and Kieser, NonStopGym, and PureGym are nearby, which of them would you recommend, and why?
I find it interesting to call on researchers to exercise caution when using US platforms for research data and to urge them to migrate their data to non-commercial European infrastructures—when the university depends on Microsoft, who have just demonstrated that this dependence is not merely a theoretical risk