James Dooley: Hi, today I'm joined by Dan Petravich, and the topic of conversation is how AI has affected link building strategies for SEO.

Dan Petravich: Hey James, how we doing? You all right?

James Dooley: Doing well.

James Dooley: So with regards to link building, what's changed now that artificial intelligence is upon us? What do you think has changed with regards to link building strategies?

Dan Petravich: Yeah, look, I have a lot to say about the topic. I've presented on link building for many years. I stood on stage in front of very large audiences and I told them to clean up their act and do better. I'd like to give a little bit of history and maybe highlight where link building always fails.

Dan Petravich: Link building always comes as sort of an afterthought in the SEO process, and you're always trying to make it fit the strategy that you already have. So you start with, okay, we've got this thing we want to rank for. The page is already done. That's finished. We need to get links for it somehow. And we're just going to try to fit a square peg into a round hole. We make the content, put it somewhere else, and then force the links to exist on that page. You know what I'm talking about.

Dan Petravich: We've been doing this for a long time, to the point where people who accept our links are now aware of what we're doing and they ask for money. But not just that. They are fitting our silly narrative of one link for yourself or your client and two links to make it look natural. The most ridiculous thing I've ever heard. One for Wikipedia and one for some gov website to make it look natural. When you do the one plus two formula you're basically putting a target on your link, making it super obvious. Hello, I'm the only commercial link on this page and these two are fillers.

Dan Petravich: I get up on the stage, I think I was in Munich, and I say to people, this is what's wrong at the moment. This is what I found.
If I can spot your links, so can Google. Nothing changed. People just keep doing the same thing. And those who accept our links now have policies that mirror that. They're parroting things back to us. They're saying one link for yourself and two natural looking links.

Dan Petravich: And I was really furious about the whole thing because we ruined it for everybody. We trained the bloggers to expect that as well.

Dan Petravich: So what did I do? Let's get back into AI. I'm going to go down to the machine learning level now.

Dan Petravich: TechCrunch, Mashable, Wired. I basically took the top 10 biggest blogs in the world, regardless of the topic, just by volume and readership. And I reviewed their link integrations. One thing stood out for me straight away. Holy cow. Twelve links on a page. Twenty four links on a page. Fifty links on a page.

Dan Petravich: When you go to those spammy guest post farms, it's one link, two links, three links, maybe four or five. So that's already an immediately obvious signal.

Dan Petravich: But I was like, what if I could train a model to think about links in the same way that these top level, highest quality blogs in the world think about links and link out naturally?

Dan Petravich: It took me a couple of months. I scraped all of them. I scraped TechCrunch, gigabytes of data. I pre-processed everything, cleaned up the text, extracted it sentence by sentence, and marked up the character position of every link. I would mark everything. This is a link, this is a link, this is a link. I ended up with gigabytes of content with markup where links used to be. It doesn't matter where the link goes, but that's a link.

Dan Petravich: I pre-processed the data and I took a small off the shelf pre-trained model. I think it was Microsoft's DeBERTa V2 or V3. And I fine-tuned that model using token classification.

Dan Petravich: Token classification is not sequence classification.
Sequence classification is positive sentiment, negative sentiment: one label for the whole text. Token classification goes down to the granularity of a single token. So it predicts the spans in the text which are more likely to be links than not.

Dan Petravich: In my pre-processing I marked all the non-link text as zeros and all the link text as ones. That went into the model. The text was converted into token IDs. I did padding, batching. That machine in the background processed everything. I trained for a couple of days. Voila, a model that's intuitive about links on the web.

Dan Petravich: Now I feed it a page of text with no links, no markup, no HTML, nothing. Just plain text. It'll paint with great precision where a link falls, as learned from how the best of the best of the web link out naturally.

Dan Petravich: So how can you use AI to improve link building? Two things.

Dan Petravich: One, you're writing an editorial piece and you're trying to come up with ways to integrate your links. This will paint the spots where links fit in naturally. So when you're trying to think about where to put the link on this page, put the link there. If there's no nice place, rewrite your content, reprocess the content in the model, paint it again, pick the best spots. That's the link planning stage. Then you integrate that, then you do your outreach and place the links.

Dan Petravich: Two, you can run the processing on all the links that you've already built in the past. Extract text from all the linking pages in your link profile. You process your entire link profile, you run the analysis using this model, LinkBERT, and you predict where the links naturally fit in that narrative. You can do the scoring. Did I pick the same spot that the model picked?

Dan Petravich: That's first level research: find where the links fit naturally.

Dan Petravich: The second thing is I have another model. Since we're talking about AI and links.
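
The labelling and scoring steps described above can be illustrated with a minimal sketch. It is not the speaker's pipeline: it assumes whitespace tokens instead of a real subword tokenizer, and the names `label_tokens` and `placement_score`, the sample sentence, and the character-coverage score are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def label_tokens(text: str, link_spans: List[Span]) -> List[Tuple[str, int]]:
    """Label each whitespace token 1 if it overlaps a link span, else 0.

    Assumption: whitespace splitting stands in for the model tokenizer.
    """
    labelled = []
    pos = 0
    for token in text.split():
        start = text.index(token, pos)  # locate the token in the raw text
        end = start + len(token)
        pos = end
        is_link = any(start < s_end and end > s_start
                      for s_start, s_end in link_spans)
        labelled.append((token, 1 if is_link else 0))
    return labelled


def placement_score(actual: Span, predicted: List[Span]) -> float:
    """Fraction of the actual link's characters covered by predicted spans.

    Assumption: a simple coverage ratio; the transcript does not say how
    agreement between a placed link and the model is scored.
    """
    a_start, a_end = actual
    if a_end <= a_start:
        raise ValueError("actual span must be non-empty")
    covered = set()
    for p_start, p_end in predicted:
        covered.update(range(max(a_start, p_start), min(a_end, p_end)))
    return len(covered) / (a_end - a_start)


# Hypothetical example: the anchor text "full report" sits at characters 9..20.
text = "Read the full report on our blog for details."
link = (9, 20)
labels = label_tokens(text, [link])
```

Here `labels` marks "full" and "report" with 1 and every other token with 0, which is the zeros-and-ones markup in miniature. `placement_score(link, spans)` returns 1.0 when the predicted spans fully cover the link that was actually placed and 0.0 when they miss it entirely.
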
The second model is called Penguin and its job is to spot your link.

Dan Petravich: The sole purpose of the model is to see who wanted the link on that page. It effectively acts as a Google web spam team member. It goes, visits the page, reviews all the links. Is there one that's obviously there for commercial purposes? Who wanted a link on this page? If it cannot detect one, it says I don't know. If it can, it flags the link and it flags the filler links, the ones used to make it look natural.

Dan Petravich: I've been doing link profile analysis with this for two years now. The model outperforms human link builders on link detection.

Dan Petravich: I have an agentic flow in place now that takes a piece of text and tries to integrate the links in a certain way, and then the Penguin algorithm tries to break it. If the integration fails, it goes back in the loop, and it cycles until the link fits in such a way that it fools my link spam model.

Dan Petravich: Basically, I have a writer, a rewriter and an evaluator going in an agentic loop, constantly looping.

Dan Petravich: I wrote an article. I pretended I was posting it on moz.com. And I said, I want the link to this page to be in that article. Make it work. It went through ten iterations. Twenty iterations. Fifty iterations. One hundred iterations. It couldn't make it work. My writer model, my link integrator model, my link builder model never could find a way to fool the judge.

Dan Petravich: I want to leave this with everyone listening. If that's the case, if you cannot make it fit, don't do it. Don't make that link.

James Dooley: So you're saying relevance there is mightily important, because otherwise you're just trying to push a square into a circle and it's just not going to fit. Therefore you've got to try and do it. So almost less is more, going with quality as opposed to just trying to force it.

James Dooley: I've got another question for you then. Forget about the actual link. It's related to link building and AI.
How important now is an implied link, where no physical link is put on the page, like an unlinked mention, a branded mention or whatever? Has that become more important with AI or less important, with regards to link building or corroboration?

Dan Petravich: For ranking purposes it doesn't really matter. For training purposes it does matter. But where I find most utility is an interesting behaviour: if you're a well known brand, going back to branding, and you have a mention on somebody's website that doesn't have a link, Gemini in AI mode will fill it with a link.

James Dooley: Really?

Dan Petravich: Yeah.

James Dooley: I didn't know that.

Dan Petravich: It's like a gift.

James Dooley: So that comes back down to branding. If they're familiar and they've got confidence and clarity that they know exactly who that brand is.

James Dooley: Would it only do that if the brand has a KGM ID as a known entity? Have you looked into whether they do it for some companies that might not have a knowledge panel?

Dan Petravich: If you're not a known entity, it's not going to happen. And I suspect you also have to be a source in the grounding.

Dan Petravich: Gemini is obsessed with preventing hallucinations. The Gemini app, AI mode and AI overviews have had some recent embarrassments with glue and rocks and giving poor advice. So they are a little bit paranoid now, and I think that's the reason they're grounding everything with multiple sources.

Dan Petravich: To prevent hallucinations they are only relying on things that are already in the grounding sources. So if you're not in the grounding sources, if you're not authoritative, I don't think there's a chance you're going to get that gift.

James Dooley: Some people might be watching this and saying, what is a KGM ID? It stands for knowledge graph machine ID. You mentioned you need to be a source. Take anyone that's got a real, genuine business that isn't a source at present.
What's the easiest way to build that authority and brand? You've mentioned on every single one of the episodes that brand is key. How does someone turn that real business into a source?

Dan Petravich: Invent a time machine, go back seven years, edit Freebase before the acquisition.

James Dooley: In the UK it's Crunchbase that's a massive site.

Dan Petravich: Not Crunchbase. It was the database that Google acquired. I'm pretty sure it's Freebase.

James Dooley: Freebase. Yeah.

Dan Petravich: Joke aside, if you want to see how all this works, Google actually has a proper system of entities. They have all the entities mapped out. I have an extension that helps you. You can go on a Google search results page and check with that extension to see who is a known entity, and it gives you the entity ID from Google's knowledge graph.

James Dooley: Is that pulling in from the knowledge graph API within Google?

Dan Petravich: Yeah. It just looks at the rendered source of the page and finds that.

Dan Petravich: On dan.ai/tool, one of the many tools that I have listed there is Google entities. You can do a search. Look up a name or a brand or a product and you can see if you have an entity in Google's knowledge graph for that. That's your proof that you're a registered, known quantity within Google's knowledge graph. That's Google's MID, the machine ID.

Dan Petravich: Why is that relevant? That logic and reasoning runs throughout Google's systems. If you look at the Vertex documentation, whether you're doing custom search or general Google search, MIDs are always there and you can ground with that. They have a complete knowledge graph of all the known entities.

Dan Petravich: There is no way to just download all that and map things out, because that's proprietary now. You can only get it old school, frozen in time, from when Freebase was snapshotted.

Dan Petravich: There is an alternative but the name escapes me. Maybe we'll sync up after the call. I'll send you the link.
James Dooley: We'll put the link in the description.

James Dooley: For me, with regards to link building for AI, everything around our business model now comes back down to ranking, knowledge graph confidence and clarity, and then being cited and recommended in the LLMs. Align link building strategies to improve the confidence score for who you are and what you do in the knowledge graph, corroborate the framing for LLMs, and get the rankings. Those three together are what we're doing with link building strategies nowadays. Is there anything else related to improving link building for AI?

Dan Petravich: Before I say that, Wikidata. Get on it.

James Dooley: Yeah.

Dan Petravich: I did something really important. I used all the Wikidata entities and I've drawn a parallel between embedded known entities. I've compared the semantic similarity between the Gemini model and its little cousin Gemma, and I found they're basically in the same semantic space. The figures are different, but when you rotate the embeddings they always converge on the same semantic thing.

Dan Petravich: There's something about Wikidata, even if it's not verbatim from Google's knowledge graph, that's of really strong utility for SEOs looking to gain an edge in not just SEO but also AI visibility. Seriously, check out Wikidata.

James Dooley: Anyone who's watching this, make certain you don't go out creating a Wikidata account and editing it yourself if there isn't already some sort of knowledge of who you are online. Start building up who you are online. Get an entity home. Have a jamesdooley.com and wrap that in schema, which helps to pull everything together. Otherwise people try to create entries and have them deleted. Same with Wikidata.

Dan Petravich: It's not going to work. I refer to it as a resource for understanding the current makeup of the entities, because there's not just Google. There are other systems, and those systems will use this as both training data and grounding.
I think this is an important resource.

Dan Petravich: It didn't cross my mind that I could try to inject my own entry in there, but there has to be a parallel, actual Wikipedia entry for this to work.

James Dooley: There doesn't need to be a Wikipedia entry for it. You can inject your own information. If there's a new brand, create that new brand or business and get it a Wikidata entry. Connect it with other entities. You need to connect the entities. It's nodes and edges. The more connections you have on the web, the more likely it is to stick.

James Dooley: Many people don't add themselves to Wikidata, and it's important as long as you are a genuine business and you have those connections. If an entry hasn't been created, then go in and create one. It can trigger a knowledge panel, especially for an individual. Podcasts, an IMDb profile, all that adds to confidence and clarity.

Dan Petravich: It's very similar to Google's internal knowledge graph. You mentioned graph. I actually built the full graph.

James Dooley: Really?

Dan Petravich: Yeah. The whole of Wikidata. I downloaded the whole dataset, extracted the label information and then built up the full undirected knowledge graph. I treat the text label of each entity as a node and I've got edges. There was data clean up, because for each label you have multiple language versions as well.

Dan Petravich: It's a 68 GB file. It's a SQLite database with full connectivity. The screenshot you saw earlier was from when I did the vector embeddings of the entire knowledge graph. I now have a semantic search engine. If I type in Rand Fishkin, it gives high cosine similarities towards SEO but low cosine similarities towards cake making.

Dan Petravich: These embeddings are generated by Gemini. So you can put in your brand as a search term and it returns the concepts most aligned with that brand in the semantic space of the embedding model from Google. It's the same technology as the Gemini model in AI search.
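
The semantic search described above ranks concepts by cosine similarity between embedding vectors. The sketch below shows only that ranking step. The three-dimensional vectors are invented toy values, not Gemini embeddings, and `rank_concepts` is a hypothetical helper name; a real setup would load vectors produced by an embedding model.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_concepts(query: List[float],
                  concepts: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (label, similarity) pairs, most similar first."""
    scored = [(label, cosine_similarity(query, vec))
              for label, vec in concepts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration only.
concept_vectors = {
    "SEO": [0.9, 0.1, 0.0],
    "link building": [0.8, 0.3, 0.1],
    "cake making": [0.0, 0.2, 0.9],
}
brand_vector = [1.0, 0.2, 0.0]  # stands in for the embedding of a search term

ranking = rank_concepts(brand_vector, concept_vectors)
for label, score in ranking:
    print(f"{label}: {score:.3f}")
```

With these toy values the search term lands close to "SEO" and "link building" and far from "cake making", mirroring the Rand Fishkin example. At the scale of a full knowledge graph, a vector index would normally replace the linear scan used here.
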
Think about the utility.

Dan Petravich: It's great for keyword research, clustering, keyword classification, keyword gap analysis, content ideas, link building. It's insane that this data is free and available to us. If it wasn't for AI, I would never have been able to implement this.

James Dooley: It's crazy how nearly every single episode comes back down to building brand, getting yourself in the knowledge graph, and confidence. Dan Petravich, it's been an absolute pleasure. We hope you like the video on link building and what has changed in the AI era. Check out a couple of the links in the description. There's one about the future of SEO and there's another one, which is over 45 minutes long, about how to optimise for the LLMs: ChatGPT, Gemini, Perplexity, and all the other AI platforms. Dan Petravich, it's been an absolute pleasure. Thank you very much.

Dan Petravich: Thanks, James.