SEO

    Google lies

If you are interested in any way in the worlds of SEO -- black, white, and every shade in between -- you will be aware of the massive leak of what looks like internal documentation about search. For an SEO, this is almost Holy Grail level stuff. Although it doesn’t detail how, or even which of, the factors Google collects data on determine the rank of an individual page, it’s safe to say that it’s all in here, somewhere.

A lot of the initial focus of posts about the leak has been on the fact that it shows many of the factors Google has consistently claimed not to use as ranking signals are, in fact, being collected -- and so are likely to be ranking signals. This includes everything from individual author authority through to the overall authority of a site.

    So... Google has been lying. But the reality is I don’t know a single SEO practitioner in the publishing space who believed this stuff anyway. We all knew that making it obvious that a writer had authority in a particular topic would gain you ranking. We all knew that sites themselves had some kind of measure of authority. We all knew that freshness matters (who amongst us has not gamed updates to increase the freshness of content?)

    Everyone who has worked around the Google space has known which of the company’s pronouncements to take at face value and which to look at with a raised eyebrow.

    I will leave it to others to pick over the bones of this and work out what matters and what doesn’t. While I still keep an eye on the SEO world, part of me thinks that the era of publisher SEO is drawing to a close, as traffic from search inevitably declines and Google turns from the world’s biggest referrer into the global answers machine.

    Although its initial foray into AI answers on the page has run headlong into some issues, the direction of travel is clear. I would strongly advise anyone who is spending too much time snickering about dumb answers not to be too complacent. AI probably isn’t going to end up coming for everyone’s jobs, but sooner or later it is going to come for your traffic.

    Why do people try and forbid linking?

    Websites That Forbid Other Websites From Linking to Them – Pixel Envy:

    Some of these are even more bizarre than a blanket link ban, like Which? limiting people to a maximum of ten links to their site per webpage. Why would anyone want to prevent links?

I can actually answer this one: it’s an attempt (albeit a pointless one) to prevent sites from linking in a way which Google will define as spammy. Low-quality backlinks used to be a bit of an SEO nightmare, and you used to have to disavow them as toxic in Google Search Console. More than ten links to the same site from a single page is classic link spam, hence Which?’s attempt to stop it.
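
For the curious, here is a rough sketch – my own illustration of the pattern, not anything Which? or Google publishes – of how you might spot pages carrying more than ten links to the same external domain. The URL is hypothetical, and the use of requests and BeautifulSoup is an assumption.

```python
# Rough illustration: count outbound links per external domain on one page,
# and flag any domain that collects more than ten. Assumes requests and
# beautifulsoup4 are installed; the URL is hypothetical.
from collections import Counter
from urllib.parse import urlparse

import requests
from bs4 import BeautifulSoup

def external_link_counts(page_url: str) -> Counter:
    """Count outbound links per external domain on a single page."""
    html = requests.get(page_url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")
    own_domain = urlparse(page_url).netloc
    counts = Counter()
    for anchor in soup.find_all("a", href=True):
        domain = urlparse(anchor["href"]).netloc
        if domain and domain != own_domain:
            counts[domain] += 1
    return counts

for domain, links in external_link_counts("https://example.com/some-article/").items():
    if links > 10:  # the threshold Which? seems to be worrying about
        print(f"{domain}: {links} links from this one page")
```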

    Ten Blue Links, AI is bad now edition

    First up, apologies that there's been no long form post this week. I've had some family stuff which had to take priority over writing. Normal service should be resumed from next week.

    And now on to the good stuff…

    1. The last refuge of the desperate media

Ahh, low rent native ads — the kind that are designed to fool people into clicking by appearing to be genuine user or editorial posts. Always a sign that a company is desperate for revenue, any kind of revenue, and never mind the longer-term implications for quality. Now, why would Reddit want to do that?

    2. Repeat after me: AI is not a thing

    More specifically, AI is not a single technology, and what we talk about in the media as “AI” is, in fact, quite a limited, relatively new tool coming out of AI research — the Large Language Model, or LLM. Why does this matter? Because (how shall I put this?) less technically educated executives are likely to read articles like this one, about the successful use of AI in the oil industry, and think that they need to jump on the AI bandwagon by adopting LLMs. These are two very different things: Robowell, for example, is a machine learning system designed to automate specific tasks. It learns to do better as it goes along — something that LLMs don't do.

    3. Tesla bubble go pop

    The notion that Tesla was worth more than the rest of the auto industry combined was always bubble insanity, and it looks like the Tesla bubble is finally bursting. And this, of course, is why Musk is grabbing on to AI and why he proposed OpenAI merge into Tesla: AI is the current marker for a stock to end up priced based on an imaginary future rather than its current performance. Musk needs to inflate Tesla again, and just being an EV maker won’t now do that.

    4. This is fine

I'm almost boring myself now whenever I post anything about the era of mass search traffic for publishers drawing to a close. But then someone comes up with a new piece of research showing a traffic loss of between 25% and 60% because of Google's forthcoming Search Generative Experience. The fact that Google effectively does not allow publishers to opt out of SGE — you have to opt out of Googlebot completely to do so — should be an indication that Google has no intention of following the likes of OpenAI in paying to license publisher content, too. And I think SGE is just the first part of a one-two punch to publisher guts: computing, and the way we access information, is going to become more conversational and less focused on searching and going to web pages. As that happens, the entire industry will change, and it could happen faster than we think.

    5. Feudal security

    I often link to Cory Doctorow's posts, and it's not just because he's a friend -- it's because a lot of the things that he's been talking about for years are beginning to be a lot more obvious, even to stick-in-the-muds like me. This piece starts with a concept that I have struggled to articulate -- feudal security — and sprints on from there.

    6. LLMs are terrible writers

Will Pooley has written a terrific piece from the perspective of an academic on why LLMs just don't write in a way which sounds human. They don't interrogate or question terms (because they have no concepts, so they can't), there is no individual voice, they make no daring or original claims based on the evidence, and much more. My particular favourite — and one I have encountered a lot — is that LLMs love heavy sectionisation and simply adore bullet points. I've got LLMs to write stuff before, specifically telling them not to use bullet points, and they have used them anyway. As Tim succinctly put it in a post on Bluesky, LLMs create content which is “uniformly terrible, and terribly uniform”.

    7. Craig Wright is not Satoshi Nakamoto

    Craig Wright spent a lot of time claiming he was the pseudonymous creator of Bitcoin, and suing people on that basis. Finally, a court has ruled that he was lying. Whoever Nakamoto is/was, he's probably on an island somewhere drinking a piña colada.

    8. Google updates, manually hits AI-generated sites

You might have noticed that Google did a big update in early March, finally responding to what everyone had been saying — that search had become dominated by rubbish for many search terms. Smarter people than me are still analysing the impact of that update, but one thing which stood out for me is that there was a big chunk of manual actions right at the start. Manual actions are, as the name suggests, based on human review of a site, which means they are a kind of fallback for when the algorithm isn't getting it right. And guess what the manual actions mostly targeted? AI content spam. All the sites that were whacked had some AI-generated content, and half of them were 90-100% created by AI. Of course, manual action is not a sustainable strategy to combat AI grey goo, but it should be a reminder to publishers that high levels of AI-generated content are not the promised land of cheap, good content without those pesky writers. If you want to use it, do it properly.

    9. The web is 35 years old, and Tim Berners-Lee is not thrilled

The web was meant to be a decentralised system. Instead, it's led to the kind of concentration of power and control that would have made the oligarchs of the past blush. That's just the starting point of Tim Berners-Lee's article marking the web's 35th anniversary, and he goes on to provide many good suggestions. I don't know if they are radical enough — but they are steps in the right direction.

    10. A big tech diet

It's a long-standing journalistic cliché to try some kind of fad diet for a short period of time and write up the (usually hilarious) results, but in this "diet" Shubham Agarwal tried to drop products from big tech companies, and of course, it proved harder than you would think. Some things are pretty easy — swapping Gmail for Proton isn't hard (and Shubham missed out some tricks, like using forwards to redirect mail). But it's really difficult to avoid some products, like WhatsApp or LinkedIn, because there are few or no viable alternatives. That, of course, is just how the big tech companies like it, because they long ago gave up on the Steve Jobs mantra of making great products that people wanted to buy in favour of making mediocre products that people have no alternative to using.

    The end of the line for Google

    “Personally, I don’t want the perception in a few years to be, ‘Those old school web ranking types just got steamrolled and somehow never saw it comin’…’”

Google engineer Eric Lehman, in an internal email from 2018 titled “AI is a serious risk to our business”

    I should, of course, have put a question mark at the end of the title of this, but I very much do not want to fall foul of my own law. And, of course, talking about the end of the line for Google as a company is like talking about “the end of the line for IBM” in 2000, or “the end of the line for Microsoft” in 2008. Whatever happens, Google has so much impetus behind it, so much revenue, that a quick collapse is about as likely as my beloved Derby County winning League One, Championship and Premier League in three consecutive years. It isn’t happening, much as I might dream.

This is one of the reasons I quipped that Google could see the $2.3 billion that Axel Springer and other European media groups want from it over its alleged monopolisation of digital advertising as “just the cost of doing business.” It’s the equivalent of someone having to pay a £250 fine for speeding: annoying, but not the end of the world, and not actually that likely to keep you under 70mph in the future.

Google’s problems, though, do run deep. Other than, as my friend Cory Doctorow has noted, the 1.5 good products it invented itself (“a search engine and a Hotmail clone”), the most successful Google products are acquisitions. Android? Acquired. YouTube? Acquired. Adtech? Acquired. Even Chrome, which dominates web browsing in a way which many people (including me) find more than a little scary, was based on Apple’s WebKit rendering engine – which was, in turn, based on the open source KHTML.

    The fact is, Google is incredibly bad at successfully bringing products to market, to such a degree that no one trusts them to do it and stick with it for long. It continually enters markets with fanfare, only to exit not long after. 

Take social networking. You probably remember Google+ (2011–2019). You may even remember Orkut (2004–2014). Perhaps you know about Google Buzz (2010–2011). But do you remember Jaiku, an early Twitter competitor which Google bought – and buried? The resources of Google could have been used to accelerate Jaiku’s development and – perhaps – win the battle against Twitter and the nascent Facebook. Instead, the company spent two years rebuilding Jaiku on top of Google’s App Engine, with no new features or marketing spend to support the product. Then, two years later, they killed it.

What Google is pretty good at is producing research. Its 2017 paper on transformers directly led to many of the large language model breakthroughs which OpenAI used to create ChatGPT. Failing to spot the potential of your own research isn’t unknown in technology history, but really great companies don’t let others build themselves into competitors worth $80 billion on the back of it.

And particularly not when those other companies create technology which directly threatens a core business – in this case, Google’s “one good product”: search. The bad news for Google is that even in the middle of last year, research showed that people using ChatGPT for search tasks performed just as well as those using a traditional search engine, with one exception — fact-checking tasks. That, of course, is a big exception, but ordinary people use search engines for a lot more than just checking facts.

    What’s also notable about the same research is that ChatGPT levelled the playing field between different educational levels, giving better access to information for those who have lower educational achievement. That strikes at the heart of Google’s mission statement, which promotes its goal of “organis[ing] the world’s information and making it universally accessible and useful” (my italics). Search, as good as it is, has always required the user to adapt to it. Conversational interaction models, which ChatGPT follows (the clue is in the name), change that profoundly.

In The Innovator’s Dilemma, Clayton Christensen talks about the difficulties that successful companies have with disruptive innovation. Established businesses, he notes, are excellent at optimising their existing products and processes to serve current customers (this is called “sustaining innovation”). However, they often struggle when faced with a “disruptive innovation” – a new technology or business model that creates a whole new market and customer segment.

One of the potential solutions to this which Christensen looks at is structural: creating smaller, independent units or spin-offs tasked with exploring the disruptive technology can allow them to operate outside the constraints of the main company. This, of course, is probably what Google intended to do when it changed its structure to create Alphabet, a group of companies of which Google itself is just one part.

The biggest problem with this putative solution is that even if you do it well, innovation doesn’t necessarily flow to where it is most needed. Google’s search products needed to seize on the research published in 2017 and integrate it. They didn’t, and – worse still – no one saw this as a potential disruption of the core business. The blinkers were too firmly on.

Perhaps that’s changing. Notably, last year Google moved all its AI efforts into a single group, Google DeepMind. The “Google” in its name is significant: previously DeepMind was a separate business within Alphabet (and, in true Google style, it was acquired rather than built in-house). Now, on the surface, it looks likely to focus more on continuing Google’s mission, which means disrupting the traditional ten blue links.

    Can it succeed? I’m not optimistic (publishers, take note). What we have here is a company which is good at research, but not at exploiting it; whose history is of one good product and a good Hotmail clone; that has a terrible record of announcing, releasing, and killing products, often multiple efforts in different categories all of which fail; and which has failed to keep its core product – search – up to date.

Perhaps the real question isn’t whether Google has reached the end of the line, but how exactly it has made it this far.

    Adapting to the new reality of search

It’s obvious at this point that the landscape of search traffic for publishers is rapidly changing, and not generally for the better. Every SEO I know is complaining about the same patterns: Google results getting swamped by low-quality content; the rise of quick, fire-and-forget AI-generated SEO farms, which can take a heavy short-term bite out of traffic in any topic area; and user-generated content being overvalued by Google.

    Or, to summarise it: quality content is not, currently, winning the battle for attention.

And then of course there are Google’s and others’ experiments with putting more answers to search queries on the results page. I’m on record as believing that a lot of traffic, especially for pages designed to answer specific queries, is going to go away as AI gets better at answering questions. Even for affiliate content, I think the appeal of answers you can have a conversation with – so you get a completely custom answer to, say, which laptop to buy – will be so high for consumers that publishers will see declines in traffic over the coming years.

    So then, publishers are facing a few years of transition from old models – where it was possible to get a lot of traffic from terms like “when is the Super Bowl” or “how much is a Ford Fiesta?” — to a future where every single question like that can be answered on the page. 

    Knowing this, there is no point in setting a strategy for the coming year which doesn’t take account of this longer-term trend. But how can you do that, while also not losing large chunks of visits?

    SEO strategies for the next year

The starting point is to look at keyword intent and analyse how likely it is that there is a long-term future for traffic. I follow a fairly standard intent-based split into four buckets (roughly sketched in code after the list):

• Informational: Getting specific answers, usually starting with how/why/what, and commonly answered with some kind of tutorial
• Commercial: Usually shows some kind of purchase intent, at either early or late stages of the funnel. Almost always includes "best", comparisons, reviews, product categories or product/service names. Best answered by reviews and comparisons – and, of course, the heart of affiliate revenue.
• Transactional: All about completing the immediate action of purchase. Usually involves keywords like "buy", "cheap" and "quote", and is sometimes also location-based, such as "buy cheap tyres in Canterbury". 
    • Navigational: Site and brand names, typically typed in because you want to find a specific brand/product site. 
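
To make those buckets a little more concrete, here is the rough sketch mentioned above – a deliberately naive keyword classifier. The trigger words and brand list are illustrative assumptions, not a definitive taxonomy, and any real tool uses far richer signals.

```python
# A deliberately naive intent classifier, just to make the four buckets above
# concrete. The trigger words and brand list are illustrative assumptions.
TRANSACTIONAL = ("buy", "cheap", "quote", "deal", "price")
COMMERCIAL = ("best", "review", "vs", "comparison", "top")
INFORMATIONAL = ("how", "why", "what", "when", "guide", "tutorial")
BRANDS = ("acme news",)  # hypothetical: your own brand and product names go here

def classify(keyword: str) -> str:
    kw = keyword.lower()
    words = set(kw.split())
    if any(brand in kw for brand in BRANDS):
        return "navigational"
    if words & set(TRANSACTIONAL):
        return "transactional"
    if words & set(COMMERCIAL):
        return "commercial"
    if words & set(INFORMATIONAL):
        return "informational"
    return "unclassified"

for kw in ("when is the Super Bowl", "how much is a Ford Fiesta",
           "best laptop 2024", "buy cheap tyres in Canterbury"):
    print(kw, "->", classify(kw))
```

Run a keyword export through something like this and you at least get a rough sense of how exposed you are to the informational bucket that on-page AI answers are most likely to eat.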

    As SEMrush noted last year, transactional and commercial keywords are on the rise, while informational and navigational are declining. That’s good news if you’re looking to affiliate content to drive your revenue over the next year or so, but it also means that informational queries are both dropping in volume and will be answered more on the page through AI-driven features like Search Generative Experience (SGE). 

    For entertainment brands that have come to rely on informational content about, say, Love Island and have no authority at all about products, this could lead to a particularly bad short-term squeeze. 

The temptation will be to try to turn entertainment brands into product-focused ones, but it’s worth not going overboard with this, as over the long term it could dilute authority in other areas. To put it another way, if it doesn’t fit, don’t force it: no one really wants reviews of Love Island false eyelashes (sorry, Liverpool Echo).

    Where you should be focusing across the board, though, is on quality, particularly in three areas:

    • Originality
    • Authorship
    • Experience

    For a long time, one of the dirty secrets of SEO work was the amount of time you could spend trying to steal traffic from your competition by creating “me too but better” content. Check out what keywords they were ranking for, and if you didn’t have equivalents, create them and go on an updating binge to get them to rank. This had the double whammy of both getting you traffic, and weakening your competition.

    I told you it was dirty, didn’t I?

The problem was that, combined with headings targeting related keywords, everyone ended up with content which was highly optimised but unoriginal. It all looked, and often read, the same. It’s no wonder that this is the kind of approach which has worked for using AI to generate quick sites for profit: any content approach which can be reduced to a mechanical process can ultimately be done by an LLM.

    Unleash the quirk-en

To stand out, you are going to have to engage some originality in your approaches. That doesn’t mean abandoning the basics of on-page SEO, or never looking at your rivals for ideas. But it does mean that if your rival is taking one approach, thinking of an original way to answer the same audience need will help you stand out. And in a world of AI-generated grey goo content, you will need to stand out.

How do you do that? Well, that is a creative question for you to answer – and you do still have some creative journalists left in the building, right? My personal favourite is The Verge’s magnificent pastiche of an affiliate article, but your mileage may vary – and more importantly, the things which make your audience laugh, cry, and so on are something only your own experts can tell you.

    Why Roland Barthes would have been a terrible SEO

The second area is our old friend authorship, because, far from being dead, the author is back at the centre of the universe. Unless you have been hiding under a rock, you will already have good quality author pages which link to every single article by each author. You will also have purged your sites of those dreadful “Brand Byline” things, which indicate either a confused content strategy or content of such low quality that no one wants to put their name on it.
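
One standard way to make those author pages machine-readable – not something this piece depends on, but it complements it – is schema.org Person markup. A minimal sketch, with hypothetical names and URLs:

```python
# Minimal sketch (hypothetical names and URLs): schema.org "Person" JSON-LD
# for an author page, linking out to the off-site profiles and appearances
# that establish the author's authority.
import json

author = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Example",                                  # hypothetical author
    "url": "https://example.com/authors/jane-example/",
    "jobTitle": "Senior Technology Editor",
    "sameAs": [                                              # podcasts, guest posts, press quotes
        "https://podcasts.example.org/guests/jane-example",
        "https://bsky.app/profile/janeexample.example.com",
    ],
}

snippet = f'<script type="application/ld+json">{json.dumps(author, indent=2)}</script>'
print(snippet)  # paste into the <head> of the author page
```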

Now it’s time to go deeper, and that will mean using any means necessary to establish the authority of authors. Make sure that your authors are “out there” – no, not wearing tie-dye clothes and going to Grateful Dead gigs; I mean getting as many authoritative mentions on media you don’t own as possible. Guesting on podcasts, writing guest posts, being quoted by news organisations – encourage your authors to have, and raise, a professional profile. If one of your journalists is the go-to expert in a topic area, that will pay off over the long term in the increased weight Google gives their authority, and the sum of their authority is your authority as a brand.

    There is no on-page or technical SEO fix for this. If your journalists spend all their time in the office churning out “me too” articles and never actually doing any work to raise their profile, they are never going to have enough authority. Set them free. Get them out making connections. Fly, my pretties, fly! 

    Are you experienced?

    This brings us nicely to the last point: experience. Not everyone noticed when, at the end of 2022, Google stopped talking about EAT (expertise, authoritativeness, and trustworthiness) and added an extra E: Experience. As they put it at the time, “does content also demonstrate that it was produced with some degree of experience, such as with actual use of a product, having actually visited a place or communicating what a person experienced?”

Now, here’s another dirty little secret: quite a bit of affiliate-focused content out there is written with little or no actual experience of the product. Yes, that’s right, some people write reviews having never had the products in their hands. What, you think PRs are actually sending out products to hundreds and hundreds of big and small publishers for testing?

    There was a good argument for this: doing reviews based on desk research was a time-saver. Rather than consumers having to comb through spec sheets and a thousand user reviews on Amazon, one journalist could do it well and get a better result, with the application of their expertise. But… it was always a bit of a cop-out, at least for major publishers who could get the real thing in for review.

In the era of experience, desk research is dead. You need to write from first-hand experience of the product, and you need to demonstrate it as often as you can in the copy. You are using the first person, right? Not only that, but you’re not still clinging to the old-fashioned “we tested this”, are you? If you are, 2024 is the year you stop doing that. It matters.

    Adapting to the new reality

    This advice should be good for you in 2024, but it’s also vital as the foundation for the AI-driven search landscape to come. 

All three factors – originality, authorship, and experience – are things that LLMs don’t have, and importantly probably never will have. Although a human can use an LLM to achieve original results, LLMs are, essentially, unoriginal thinkers. They are also not authors in their own right (no, LedeAI, you are not a journalist), so they are unlikely to be able to build a profile outside your site. And while they have ingested a lot of expertise, LLMs are really experts in nothing – and, as good as it is, no one is going to invite Copilot on to the evening news to discuss anything (sorry, Microsoft).

But here’s the thing: all of these human factors are expensive. Too many executives, particularly ones with boards that lack experience of frontline journalism (and yes, they do exist, and you can do your own research to find them), think that when journalists spend time not writing, they aren’t being productive.

If your metrics are the number of articles, and not the quality of those articles, then you are going to struggle to adapt to the new reality. And the new reality really starts today.

    On Cnet deleting its archive

    CNet Deletes Thousands of Old Articles in an Attempt to Game Google Search – Pixel Envy:

    Google says this whole strategy is bullshit. A bunch of SEO types Germain interviewed swear by it, but they believe in a lot of really bizarre stuff. It sounds like nonsense to me. After all, Google also prioritizes authority, and a well-known website which has chronicled the history of an industry for decades is pretty damn impressive. Why would “a 1996 article about available AOL service tiers” — per the internal memo — cause a negative effect on the site’s rankings, anyhow? I cannot think of a good reason why a news site purging its archives makes any sense whatsoever.
There’s been quite a kerfuffle about this. This is an area where I have more than a little experience, and although it sounds counter-intuitive, it is completely true that there are instances where it's better for users and for the site for old content to be removed.

Although, as Nick points out, Google advises that simply deleting content does nothing for you, there are three circumstances where deleting content very definitely does improve your SEO. But you don’t just delete it. Deleting content without redirecting it, or in an unstructured manner, just leaves you with a bunch of 404s, which you don’t want. It will also almost certainly break some of the crawl paths which Google and other robots use to find their way around the site.

    But there are circumstances where you want to delete and redirect content, either because it’s a bunch of content which is actively harming your site’s authority with Google or because it no longer best serves the needs of your audience.

The first is where that content is thin. Thin content is typically old-style news-in-brief pieces which are very short. Google has always disliked short content (the rule of thumb is under 300 words) and while a few pieces are fine, if a sizeable percentage of your content is thin it can hurt you. Those kinds of stories tend to date from the early-to-mid 00s, when blasting out tonnes of content was the fashion, and a lot of news-in-brief pieces got written.
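
If you want to size the problem, a rough sketch like this – using the under-300-words rule of thumb against a hypothetical CSV export from your CMS – will do for a first pass:

```python
# Rough sketch: flag "thin" articles using the under-300-words rule of thumb.
# Assumes a hypothetical CSV export with "url" and "body_text" columns;
# adjust to however your CMS actually exports content.
import csv

THIN_WORD_COUNT = 300

with open("content_export.csv", newline="", encoding="utf-8") as f:  # hypothetical file
    rows = list(csv.DictReader(f))

thin = [(row["url"], len(row["body_text"].split()))
        for row in rows
        if len(row["body_text"].split()) < THIN_WORD_COUNT]

print(f"{len(thin)} of {len(rows)} articles come in under {THIN_WORD_COUNT} words")
for url, words in sorted(thin, key=lambda item: item[1])[:20]:
    print(f"{words:>4} words  {url}")
```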

    The second is when you have lots of repetitive or duplicate content – content which essentially says the same thing, over and over and over again. Big news sites do this a lot, because often with news you have covered the same story with more or less the same facts for a long time. But you will often also have content which is essentially the same, because people have the same idea for an article and don’t bother to check if it already exists – leading to two very similar articles.

Why does that matter? Because Google likes it when there’s one article on your site which provides a clear answer to a specific search query. If you have written two articles on, say, the history of the Mac Plus, then it doesn’t know which one to rank and so basically down-ranks both.
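
A crude way to surface candidates for that kind of consolidation – purely illustrative, comparing nothing more than headline similarity on hypothetical data – might look like this:

```python
# Crude, illustrative sketch: surface possible duplicate articles by comparing
# headline similarity. Real de-duplication would also look at body text and
# the queries each page ranks for; this just finds candidates for a human review.
from difflib import SequenceMatcher
from itertools import combinations

headlines = {  # hypothetical URL -> headline mapping from your CMS
    "/2019/mac-plus-history": "The history of the Mac Plus",
    "/2022/mac-plus-story": "The story of the Mac Plus, Apple's workhorse",
    "/2023/ford-fiesta-price": "How much is a Ford Fiesta?",
}

for (url_a, title_a), (url_b, title_b) in combinations(headlines.items(), 2):
    similarity = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
    if similarity > 0.6:  # arbitrary threshold; tune it against your own archive
        print(f"{similarity:.2f}  {url_a}  <->  {url_b}")
```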

    The third circumstance is where you have old content receiving no traffic but which is about a keyword you are targeting. Every page has authority on some topics, even if it doesn’t rank well or at all. Often, old content isn’t maintained well. Google likes content which is updated with fresh information, because that content tends to best-serve users arriving from search. If you don’t update content, it tends to gradually lose ranking over time.

Sometimes the best approach with content like this is to start fresh – particularly when you have multiple articles on the same topic. In that case, deleting the old piece and redirecting it to a new URL is the right approach. You keep what minimal authority the old page had, while sending a clear signal to Google that the new page is the right one for any search queries you previously ranked for.
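
In practice that means building a redirect map. Here is a minimal sketch that turns a hypothetical old-URL-to-new-URL spreadsheet into permanent (301) redirect rules; I have assumed nginx for the output, but the same mapping could feed .htaccess rules or a CMS redirect plugin just as easily.

```python
# Sketch: turn a "retired URL -> replacement URL" mapping into permanent (301)
# redirects. The CSV columns, file names and nginx output format are all
# assumptions; adapt to whatever serves your site.
import csv

with open("redirect_map.csv", newline="", encoding="utf-8") as f:  # hypothetical: old_path,new_url
    redirects = list(csv.DictReader(f))

with open("redirects.conf", "w", encoding="utf-8") as out:
    for row in redirects:
        # One exact-match location block per retired page, returning a 301.
        out.write(f"location = {row['old_path']} {{ return 301 {row['new_url']}; }}\n")

print(f"Wrote {len(redirects)} permanent redirects to redirects.conf")
```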

    The Cnet memo on its process is actually a model for how you should do it, with clear guidance and opt-outs for content which is of historical value. Most content isn’t – remember the old adage that today’s news is tomorrow’s fish-and-chip paper – but some stories clearly are. They also ensure that anything deleted is in the Internet Archive (which is another reason why the clear attempts of some publishers to kill it are so stupid).

As a writer, all this can be hard to take – after all, you want to see all your articles available – but there are things you can do about it. First, make sure that you keep copies of your work. If you work for a site with an SEO team, talk to them about republishing your work on your own personal blog (you can add a canonical link to your post to show where the original version was published, and this is actually good for their SEO). And use Authory to keep an archive of everything across every site you publish on.
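
For completeness, the canonical bit is a one-liner – a tiny sketch with a hypothetical URL; most blogging platforms let you set this via an SEO plugin rather than hand-editing the head.

```python
# Tiny sketch (hypothetical URL): a canonical link on the republished post,
# pointing back at the original publisher's version of the article.
original_url = "https://www.big-publisher.example/reviews/some-laptop-review/"

canonical_tag = f'<link rel="canonical" href="{original_url}">'
print(canonical_tag)  # goes inside the <head> of the republished post
```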