Automattic for the people, not the AI
This post was migrated from the old WordPress blog. Some things may be broken.
Tl;dr: Automattic, the owner of Tumblr and WordPress.com, is negotiating with Midjourney and OpenAI to sell them AI training data, which would be scraped from platform users’ posts unless they opt out. It’s an ethically dubious proposition with poor messaging, not all users get treated equally, and the companies Automattic is in talks with aren’t exactly committed to an open or ethical approach to running their operations. It’s a move that may drive me back to self-hosting.
I decided I’d had enough of self-hosting my blog this year. At the new year I shut down my Linode VPS and decided to let WordPress.com run things. The idea was that I wanted a blog to write on again, not just one to do weird server and DNS things with. Overseeing some hosting infrastructure is already part of my day job. There were some tradeoffs. At my current subscription level (Explorer) I can’t really modify this theme in the way I’d want to, or turn on some elements I’d want to include. I can’t add all the plugins I usually would. And I haven’t made up my mind about the different newsletter and monetisation wingdings yet. These would require me to commit to updating this site more regularly, and I’m pretty scatty on that front so far. But I don’t need to think about security headers, backups, updates and patches, SSL stuff, traffic spikes, or what integrating the site with something like ActivityPub will do to the bandwidth, etc. It’s okay. I can pretty much do as much blogging as I want or don’t want to do, and it isn’t bad for my current hobby-horse level of effort.
But one thing I’ve noticed about premium hosted WordPress that’s a departure from free-to-be-you-and-me open source WordPress is how much generative AI has been chucked into the CMS. Consider that first paragraph at top. I wrote the first sentence and then asked WordPress’ AI Assistant to expand on it. Here’s that output:
Pretty bland stuff. One quality of AI-generated text is that it’s often got no flavour, no personality or, you know, actual point of view. That’s because it’s a set of characters put in order based on a formula of likelihood. The machine doesn’t know it’s saying anything. It’s using math to choose patterns. Everything it produces reads like a PEEL formula essay of the kind they force kids to use in British secondary schools.
WordPress.com has this generative AI stuff in the CMS. It also has it in the post meta fields: it can guess what kind of excerpt you want for your post, and it can suggest some tagging when you decide to publish something. There’s no real way to disable these features, which I found kind of annoying, even if I don’t really use them. It’s clutter in a WYSIWYG that I don’t need, but in the end not a huge problem.
But now WordPress is taking it a disturbing step further. As first reported by 404 Media (and discussed on Lifehacker without a paywall), Automattic plans to sell user data from its Tumblr and WordPress platforms to Midjourney and OpenAI. If you use either of these, that means your data: published posts, private posts, posts from accounts that have since been suspended or deleted, and any explicit posts. So if you’ve got a site with content closed off for your subscribers, it seems that Midjourney and OpenAI get that as well, and they aren’t paying you for it.
A few general thoughts…
As a WordPress.com user, I felt it was weird to read about this in the news before, you know, a communication from WordPress itself. And by weird I mean wrong. As in WordPress was most definitely in the wrong. I don’t care if its ToS was quietly updated somewhere. I don’t care if some mass communique was dispatched somewhere into the cesspool of my unread emails from service providers. I have the WP app on my phone, the CMS in my browser. I have the flippin’ desktop app. For something like this, it could have dropped a banner or a push notification into any of them.
It’s just a shitty move asking users to opt out of this instead of welcoming them to opt in if they choose. Let me repeat that: It’s a real shitty move, Automattic, to say that users need to opt out of your AI harvesting, instead of opting in. In a kind of Orwellian move on its blog, WordPress published an item titled “More Control Over the Content You Share.” Instead of announcing its data-sharing deal with Midjourney or OpenAI, covered in the news, the post says: “There are currently very few options for individual users to control how their content is used for AI training, and we want to change that. That’s why we’re launching a new tool that lets you opt out of sharing content from your public blogs with third parties, including AI platforms that use such content for training models.” Thanks… um… for that. But if I haven’t opted into it, then it would logically follow that I’m not down with it.
The above linked WordPress blog post says, “We already discourage AI crawlers from gathering content from WordPress.com and will continue to do so, save for those with which we partner.” That last part is the tricky bit. It also is kind of limiting and anti-choice. What if I want to share my content with an AI that is paying me directly? What if Automattic shared the proceeds of AI crawling with users? The what-if scenarios can continue until morale improves. Mistakes already seem to be likely.
“404 Media’s report included internal Automattic employee messages describing how engineers were tasked with compiling posts from 2014 to 2023, but had made some mistakes, according to 404’s reporting. The employees included posts from deleted or suspended blogs, private posts on public blogs, and private answers from the “Ask” function, the report said.”
Business Insider
Graham Cluley notes on his blog that Automattic is also playing favourites with regard to whose content it will take extra pains to protect. You can apparently buy protection: “BTW, if you’re a WordPress VIP customer (in other words, if you pay them the big bucks) then Automattic wants to reassure you that they’re not going to include you with the common hoi polloi,” Cluley posted. “Nick Gernert, CEO of WordPress VIP at Automattic published a rather frantic post to his customers, clearly realising they were in danger of being mightily pissed off.”
In the above-mentioned WP VIP Lobby blog post, Gernert writes: “You may see news reports of our parent company, Automattic, striking deals to sell data from WordPress.com and Tumblr to OpenAI and Midjourney. The original report appeared in 404 Media and was picked up by The Verge. … I want to assure WordPress VIP customers that your data has not been shared as part of any deal that Automattic may have negotiated and we will never share your data without explicit consent.”
“So, that’s alright then. One rule for them, something else for the rest of us,” writes Cluley. Yeah, I’d agree with that assessment as someone on the proles tier. I’m often dubious about what opt-out boxes really do, whether it’s those annoying GDPR cookie setting pop-ups or when you’re online shopping and you try to carefully select “no marketing, plz” options when checking out. I tend to think things will happen as they will regardless, and I sometimes wonder whether these have been implemented well enough to stop the opposite of what I selected from happening. Automattic warns that all you can do is put requests on your site asking AI companies (or anyone) not to harvest your content. If it’s published, there’s not really a technical way to prevent it. But at the VIP level there sure are more tools from WordPress to help prevent it.
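To make that "request, not prevention" point concrete, here is a minimal sketch of how a robots.txt style opt-out gets evaluated. The assumptions are mine, not Automattic's: I'm guessing the opt-out boils down to something like a robots.txt rule, the sample rules are made up for illustration (GPTBot is the user agent OpenAI publishes for its crawler), and `is_allowed` is a hypothetical helper, not anything WordPress.com ships.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask OpenAI's crawler to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A polite crawler runs a check like this before fetching a page.
    print(is_allowed("GPTBot", "https://example.com/my-post"))
    print(is_allowed("SomeOtherBot", "https://example.com/my-post"))
```

The catch is that this check runs on the crawler's side. Nothing on my site forces anyone to run it, so the rule only works for bots that choose to honour it.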
All of this was horribly managed by Automattic. It became a policy update as a form of damage control. The ethical thing to do would be to opt everyone out by default and let them decide to opt in. Maybe chuck in a discount on their subscription fee if they do. Automattic, along with its flagship product WordPress, aims to present itself as a good player in the open source software community. And the open source products do live up to that, but its corporate side seems to be starting to show the signs of “don’t be evil” in the way that Google uses that term.
This isn’t to say I’m a Luddite on LLM-based generative AI. I find all kinds of uses for it. I’ve described the plots of films or books I couldn’t remember the titles of, and OpenAI and Google’s Bard have reminded me what they are. When writing and groping for the right word I’ve used it as a quick thesaurus, sometimes with a few lines of extra context to help pick the best one. I also enjoy that it does seem to know grammar, particularly the dictates from Strunk & White’s Elements of Style. But that doesn’t mean these companies don’t have dodgy practices, or that there aren’t still huge ethical problems with how they consume creators’ works while remaining opaque themselves. If Automattic were working with some of the more open and ethical AI projects or frameworks it might be a different story.
Sort of like safety tools, content control shouldn’t be something that’s based on economic power. People own their own content and can do what they want with it. I like WordPress as a content management system and am a fan of a lot of Automattic products. WP is a cornerstone piece of technology in both my work and non-work life. But as far as premium services go, this was a disappointing road for Automattic to head down, and not really even a necessary one. One would hope that Automattic would invest in ways to resist the enshittification that’s taking down so many other companies.
Finding a content platform that is both usable and extendable and yet resistant to douchebag business decisions is getting trickier. I left DIY hosting to catch a break in my off-hours, but essentially the hassle hasn’t disappeared, it’s just shifted from the command line to the ToS. Maybe next time I move to Ghost. Or self-hosting… god forbid.