James Dooley: Hi, today I'm joined by Dan Petravich, and the topic of conversation is whether AI has affected link building strategies for SEO.

Dan Petravich: Hey James, how are we doing? You all right?

James Dooley: Doing well. So with regard to link building, what's changed now that artificial intelligence is upon us? What do you think has changed in link building strategies?

Dan Petravich: Yeah, look, I have a lot to say about the topic. I've presented on link building for many years. I've stood on stage in front of very large audiences and told them to clean up their act and do better. So I'd like to give a little bit of history, and maybe highlight where link building always fails.

Dan Petravich: Link building always comes as a sort of afterthought in the SEO process, and you're always trying to make it fit the strategy that you already have. Right? So you start with: okay, we've got this thing we want to rank for. The page is already done. That's finished. We need to get links for it somehow. And we're just going to try to fit a square peg into a round hole. We make the content, put it somewhere else, and then force the links to exist on that page. You know what I'm talking about. You've done links.

Dan Petravich: We've been doing this for so long that the people who accept our links are now aware of what we're doing, and they ask for money. Not only that, they're fitting our silly narrative of one link for yourself, for your client, and two links to make it look natural. The most ridiculous thing I've ever heard: one for Wikipedia and one for some .gov website, to make it look natural.
So guess what you're doing when you use the one-plus-two formula? You're basically putting a target on your link, making it super obvious: hello, I'm the only commercial link on this page, and these two are fillers. Right?

Dan Petravich: So I get up on the stage, I think I was in Munich, and I say to people: this is what's wrong at the moment, this is what I found. If I can spot your links, so can Google. Nothing changed. People just keep doing the same thing. And those who accept our links now have policies that mirror that. They're parroting things back to us. They're saying: one link for yourself and two natural-looking links.

Dan Petravich: I was really furious about the whole thing, because we ruined it for everybody. We trained the bloggers to expect that as well. So what did I do? Let's get back to AI. And actually, I'm going to go down to the machine learning level now.

Dan Petravich: TechCrunch, Mashable, Wired. I basically took the top 10 biggest blogs in the world, regardless of topic, just by volume and readership, and I did an ad hoc review of their link integrations. One thing stood out for me straight away. Holy cow: 12 links on a page, 24 links on a page, 50 links on a page. Wow. When you go to those spammy guest post farms, it's one link, two links, three links, maybe four or five. So that's already an immediately obvious signal. But I was like, what if I could train a model to think about links in the same way that these top-level, highest-quality blogs in the world think about links?

Dan Petravich: It took me a couple of months. I scraped all of them. I scraped TechCrunch, gigabytes of data. I pre-processed everything, cleaned up the text, extracted it sentence by sentence, and marked up the location of every link as a from-to character range. And I would mark everything.
This is a link, this is a link, this is a link. So basically, I ended up with gigabytes of content with markup where the links used to be. It doesn't matter where the link goes, but that's a link.

Dan Petravich: So I pre-processed the data and I took a small, off-the-shelf, pre-trained model. I think it was Microsoft's DeBERTa, V2 or V3. And I fine-tuned that model using token classification. Token classification is not sequence classification. Sequence classification labels a whole sequence, for example as positive or negative sentiment. Token classification goes down to the granularity of a single token. So basically it predicts the spans in the text that are more likely to be links than not. In my pre-processing, I marked all the non-link text as zeros and all the link text as ones. That went into the model, converted into token IDs. I did my padding and batching, the machine processed everything in the background, and I trained for a couple of days. Voila: a model that's intuitive about links on the web. So now I feed it a blank page of text, no links, no markup, no HTML, nothing, just plain text. It'll paint with great precision where a link falls, as learned from the best of the best of the web and how they link out naturally.

Dan Petravich: So how can you use this? How can you use AI to improve link building? Two things. One, you're writing an editorial piece and you're trying to come up with ways to integrate your links. This will already paint the spots where links fit in naturally. So when you're trying to think about where to put the link on the page, put the link there. If there's no nice place, rewrite your content, reprocess it in the model, paint it again, and pick the best spots. So that's sort of a link planning stage. Then you integrate that, do your outreach, and place the links. Then there are all the links that you've already generated in the past.
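The zeros-and-ones labeling step described above can be illustrated with a small sketch. It is not Dan's actual pipeline: the regex tokenizer stands in for a real subword tokenizer, and the function names and sample sentence are hypothetical. It only shows how character-range link markup can be turned into per-token labels for token classification.

```python
import re

def tokenize_with_offsets(text):
    """Split text into word and punctuation tokens, keeping character offsets.

    Assumption: a simple regex tokenizer stands in for the subword tokenizer
    a real fine-tuning run would use.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

def label_tokens(text, link_spans):
    """Return (token, label) pairs: 1 if the token overlaps a link span, else 0.

    link_spans is a list of (start, end) character ranges, end exclusive.
    """
    labeled = []
    for token, start, end in tokenize_with_offsets(text):
        inside = any(start < s_end and end > s_start for s_start, s_end in link_spans)
        labeled.append((token, 1 if inside else 0))
    return labeled

# Hypothetical example: the anchor text "full report" was a link in the source page.
text = "Read the full report on our blog today."
link_spans = [(9, 20)]  # character range covering "full report"

for token, label in label_tokens(text, link_spans):
    print(label, token)
```

In a real setup the same overlap test would be applied to the offsets returned by the model's own tokenizer, so that every subword inside a link gets label 1.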
Dan Petravich: You can then run text extraction on all those linking pages. You basically process your entire link profile and run the analysis using this model, Link it's called. You run the text analysis, you get the predictions of where links naturally fit in that narrative, and you can basically do the scoring: did I pick the same spot that the model picked?

Dan Petravich: So that's your first level of research, just to find where the links fit naturally. The second thing is I have another model, since we're talking about AI and links. The second model is called Penguin, and its job is to spot your link.

Dan Petravich: The sole purpose of the model is to see who wanted the link on that page. It effectively acts as a member of Google's web spam team: it visits the page and reviews all the links. Is there one that's obviously there for commercial purposes? Who wanted a link on this page? If it cannot detect one, it says "I don't know", and if it can, it flags the link, and it flags the filler links, the ones used to make it look natural.

Dan Petravich: I've been doing link profile analysis with this for two years now, and the model outperforms human link builders on link detection. I'm excited about this, and nobody actually knows this. It's the first time I'm talking about it. I have an agentic flow in place now that takes a piece of text and tries to integrate the links in a certain way, and then the Penguin algorithm tries to break it. If the integration fails, it goes back into the loop, and it cycles until the link fits in such a way that it fools my link spam model. Basically, I have a writer, a rewriter, and an evaluator going in an agentic loop, constantly looping. I tried to fit a link into an article I wrote. I pretended I was posting it on moz.com, and I said: I want the link to this page to be in that article.
Make it work. It went through 10 iterations.

Dan Petravich: It went through 20 iterations, 50 iterations, 100 iterations. It couldn't make it work. My writer model, my link integrator model, my link builder model could never find a way to fool the judge.

Dan Petravich: And okay, I want to leave this with everyone listening: if that's the case, if you cannot make it fit, don't do it. Don't build that link.

James Dooley: So you're saying relevance is mightily important, because otherwise you're just trying to push, like you always say, a square peg into a round hole, and it's just not going to fit. Therefore you shouldn't try to do it. So it's almost less is more: going with quality as opposed to trying to force it. I've got another question for you, and it's still related to link building and AI. How important now is an implied link, where no physical link is put on the page, like an unlinked mention or a branded mention? Has that become more important with AI, or less important, with regard to link building or corroboration?

Dan Petravich: For ranking purposes it doesn't really matter. For training purposes it does matter. But it's really cool that you mentioned that, because where I find the most utility is in an interesting behavior. Here we go, going back to branding: if you're a really well-known brand and you have a mention on somebody's website that doesn't have a link, Gemini in AI Mode will fill it in with a link.

James Dooley: Really?

Dan Petravich: Yeah.

James Dooley: Oh, I didn't know that.

Dan Petravich: It's like a gift.

James Dooley: Yeah. Yeah.

Dan Petravich: Yeah. So basically, you're like Nike, Adidas, Under Armour, and if it's familiar with those brands, it'll just link them up even if they're only mentioned but not linked.
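The writer-and-judge loop Dan describes earlier in the conversation can be sketched in a few lines. This is a hedged illustration of the control flow only: `integrate_link`, `toy_writer`, and `toy_judge` are hypothetical names, and the toy functions are trivial stand-ins for the language-model writer and the link spam detector, which are not public in this transcript.

```python
from typing import Callable, Optional

def integrate_link(
    text: str,
    url: str,
    writer: Callable[[str, str, int], str],
    judge: Callable[[str, str], bool],
    max_iterations: int = 100,
) -> Optional[str]:
    """Return the first draft the judge cannot flag, or None if every attempt is caught."""
    for attempt in range(max_iterations):
        draft = writer(text, url, attempt)
        if not judge(draft, url):  # judge returns True when it spots the planted link
            return draft
    return None  # the link never fit: the advice in the interview is not to build it

# Toy stand-ins (assumptions, not the real models).
def toy_writer(text: str, url: str, attempt: int) -> str:
    templates = ["{t} Click here: {u}", "{t} Buy now at {u}", "{t} (background: {u})"]
    return templates[attempt % len(templates)].format(t=text, u=url)

def toy_judge(draft: str, url: str) -> bool:
    return "Click here" in draft or "Buy now" in draft

article = "An article about trail running."
target = "https://example.com/shoes"

accepted = integrate_link(article, target, toy_writer, toy_judge)
rejected = integrate_link(article, target, toy_writer, lambda d, u: True)
print(accepted)
print(rejected)
```

The iteration cap matters: when the judge flags every draft, the function returns `None` instead of looping forever, which mirrors the "if you cannot make it fit, don't do it" conclusion.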
Yeah, but this comes back down to branding again: whether they're familiar with the brand and have the confidence and clarity to know exactly who that brand is.

James Dooley: Would it only do that if the brand has a KGMID, as a known entity? Have you looked into whether they do it for companies that might not have a knowledge panel?

Dan Petravich: If you're not a known entity, it's not going to happen. And I suspect you also have to be a source in the grounding. Not necessarily in that spot, where I'm saying it will fill in the link, but you have to be a source in the grounding, because Gemini is obsessed with preventing hallucinations.

Dan Petravich: Not Gemini. Gemini is a model. I should say the Gemini app, or AI Mode, or AI Overviews. They've had some recent embarrassments with glue and rocks, and with giving people poor health advice. So they're a little bit paranoid now, and I think that's the reason they're grounding everything with multiple sources. To prevent hallucinations, they rely only on things that are already in the grounding sources. So if you're not in the grounding sources, if you're not authoritative, I don't think there's a chance you're going to get that gift of an AI Mode result with a link in there for you to click on. That I haven't seen yet.

James Dooley: Some people might be watching this now and asking what a KGMID is. It stands for Knowledge Graph Machine ID. And you mentioned there that you need to be a source. For anyone with a real, genuine business that isn't a source at present, what's the easiest way to build that authority and brand? Because a key takeaway from every single one of the episodes is brand, brand, brand, brand, and everything seems to relate back to brand.
The trust signals that come with a brand, the confidence that comes with a brand, the clarity that comes with a brand. How does someone make that real business into a source?

Dan Petravich: Invent a time machine, go back, I don't know, seven years, and edit Freebase [laughter] before the acquisition. I'm glad I spammed Freebase when I did, because I got in where I wanted to get in. But was it Freebase? Am I getting the right name?

James Dooley: In the UK, Crunchbase is a massive site.

Dan Petravich: Not Crunchbase. It was the database that Google acquired. I'm pretty sure it is Freebase. Yeah.

James Dooley: Freebase. Yeah.

Dan Petravich: I could be wrong. It was a long time ago that they did that acquisition. And this is interesting. I mean, jokes aside, time machines and everything, if you want to see how all this works, Google actually has a proper system of entities, not just for the knowledge graph and knowledge panels. They actually have all the entities mapped out. I have an extension that helps with this: you can go to a Google search results page and use the extension to see who is a known entity, and it gives you the entity ID from Google's knowledge graph. And I also have, let me see if I can find it, an entity...

James Dooley: Where is that pulling in from? The Knowledge Graph API within Google?

Dan Petravich: Yeah. It just looks at the rendered source of the page and finds that. Okay. So basically, on don.ai/tool, one of the many tools I have listed there is Google Entities. You can do a search, look up a name or a brand or a product, and see whether you have an entity in Google's knowledge graph for that. That's basically your proof that you're a registered, known quantity
with Google within the knowledge graph. So that's Google's MID, the machine ID.

James Dooley: Yeah.

Dan Petravich: So why is that relevant? Because that sort of logic and reasoning runs throughout Google's systems. If you look at the Vertex documentation, whether you're doing custom search or general Google search, MIDs are always there, and you can ground with that. They have a complete knowledge graph of all the known entities.

Dan Petravich: Now, there is no way to just download all that and map things out, because that's proprietary now. You can get it old school, frozen in time from when Freebase was snapshotted. But there is an alternative. I'm wondering if I can think of it, because just recently I was working on it and trying to map out all the entities in it. Maybe we'll sync up after the call and I'll send you the link. The name escapes me, but it's a pretty well-known entity database.

James Dooley: Yeah, send it to me in a bit and we'll put the link in the description. But for me, with regard to link building for AI, I know we're talking a little bit about known entities and being a source or having a KGMID.

James Dooley: Everything around our business model now comes back down to not just ranking, where it used to be. We used to be obsessed with just ranking in Google, and then we realized many years back that to rank in Google you want brand, social media, real traffic, engagement, and everything else that comes with it. The second part is the knowledge graph: trying to improve the confidence score in the knowledge graph, for confidence and clarity. And the third is the LLMs: trying to be not just cited but recommended in the LLMs.
And I think that with everything you do in your link building strategies, you should try to align it so it helps your confidence score for who you are and what you do in the knowledge graph, corroborates and gets the framing right for LLMs, and also gets the rankings. Those three falling in line with each other is kind of what we're doing with our link building strategies nowadays. Is there anything else related to improving link building for AI?

Dan Petravich: Before I answer that: Wikidata.

James Dooley: Wikidata, yeah, yeah. Get on it.

Dan Petravich: Yeah. So basically, I did something really important. I'll do a quick screen share just to show you what I've found. I used all the Wikidata entities, embedded the known entities, and compared the semantic similarity between the Gemini model and its little cousin, Gemma. I found they're basically in the same semantic space. All the figures are different, but when you rotate the embeddings, when you mix things up, they always converge on the same semantics. And I think there's something about Wikidata, even if it's not verbatim from Google's knowledge graph, that's of really strong utility for SEOs looking to gain an edge in not just SEO but also AI visibility. I'm glad I didn't forget about this. So yeah, seriously, check out Wikidata. It's a great...

James Dooley: The only thing I would say on that, for anyone who's watching this, is to make certain you don't go out creating a Wikidata account and editing it yourself if there isn't already some knowledge of who you are online. I would start by building up who you are online.
Ideally, get an entity home, like having a jamesdly.com. Schema helps to pull everything together. Otherwise, I know a lot of people who have tried to create entries and had them deleted, and once you try to create one again it becomes hard. It's almost like trying to create yourself a Wikipedia page before you actually deserve one. It's the same with Wikidata. A lot of people have tried to create an entry and had it deleted.

Dan Petravich: Yeah, it's not going to work. I refer to it as a resource for understanding the current makeup of the entities, because it's not just Google. There are other systems, and those systems will use this both as training data

James Dooley: Yeah.

Dan Petravich: and as sort of a crutch to lean on for grounding those models. So I think this is an important resource. It didn't cross my mind that I could try to inject my own entry in there, because I think there has to be a parallel, actual Wikipedia page for this to work. Yeah.

James Dooley: There doesn't need to be a Wikipedia page for it. You can inject your own information. So if I've got a new brand or business, I can create it and get it a Wikidata entry. Ideally I need to be connecting it with other entities. So if I say James Dooley is the founder of Petravich SEO, as an example, and I say that that's a business, then because it's got a connection with me, and I am an entity, it works better. Whereas if you're just trying it and you don't have any sort of relevance online, it becomes a lot more difficult. You need to connect the entities. It's almost like nodes and edges.
You need to be connecting those relationships together, and the more connections you have on the web, the more likely it is to stick. Rather than treating it as just a hack and saying everyone should go and add themselves to Wikidata, what I would say is that so many people don't add themselves to Wikidata, and it's so important to do it, as long as you are a genuine business and you have those connections. But yeah, if an entry hasn't been created, then go in and create one, for sure. Getting that in there is huge, because a lot of the time it triggers, in time, a knowledge panel, especially for an individual. If you can author a book, or even be on a podcast like this, you can get an IMDb profile and so on, and all of that adds to the confidence and clarity score of who Dan Petravich is. It's repeating who you are and what you do, and that builds the confidence score. The knowledge graph has its own little algorithm for who entities are, and I think Gemini is going to be leaning on it more and more in time.

Dan Petravich: It's very similar to Google's internal knowledge graph. And you mentioned graphs: I actually built the full graph.

James Dooley: Really?

Dan Petravich: Yeah, the full graph, so the whole of Wikidata. I'm showing my age there, where I couldn't recall the name. To be fair, it is 8:00 p.m. I'm done, mentally foggy already. But what I did is I basically downloaded the whole data set, extracted the label information, and built up the full undirected knowledge graph, where I treat the text label of each entity as a node. Obviously I've got edges too, but there was some data cleanup in there, because for each label you have multiple language versions as well.

James Dooley: Yeah. Yeah.
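As a rough illustration of the graph Dan describes, here is a minimal sketch that stores labeled nodes and undirected edges using Python's built-in `sqlite3` module. Everything in it is an assumption for illustration: the record layout, the `E1`-style IDs, the table names, and the rule of preferring the English label when several language versions exist. The real data set and cleanup rules are not given in the interview.

```python
import sqlite3

# Hypothetical records: an ID, labels keyed by language code, and linked entity IDs.
entities = [
    {"id": "E1", "labels": {"en": "Search engine optimization", "de": "Suchmaschinenoptimierung"}, "links": ["E2"]},
    {"id": "E2", "labels": {"en": "Web search engine"}, "links": ["E1", "E3"]},
    {"id": "E3", "labels": {"fr": "Hyperlien"}, "links": []},
]

def pick_label(labels, preferred="en"):
    """Choose one label per entity: the preferred language, else the first language code."""
    if preferred in labels:
        return labels[preferred]
    return labels[sorted(labels)[0]]

def build_graph(records):
    """Create an in-memory SQLite graph with one row per node and one row per undirected edge."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, label TEXT)")
    db.execute("CREATE TABLE edges (a TEXT, b TEXT, PRIMARY KEY (a, b))")
    for record in records:
        db.execute("INSERT INTO nodes VALUES (?, ?)", (record["id"], pick_label(record["labels"])))
        for other in record["links"]:
            a, b = sorted((record["id"], other))  # store each undirected edge once
            db.execute("INSERT OR IGNORE INTO edges VALUES (?, ?)", (a, b))
    db.commit()
    return db

def neighbours(db, entity_id):
    """Return the labels of all entities connected to entity_id, in either direction."""
    rows = db.execute(
        "SELECT n.label FROM edges e "
        "JOIN nodes n ON n.id = CASE WHEN e.a = ? THEN e.b ELSE e.a END "
        "WHERE e.a = ? OR e.b = ? ORDER BY n.label",
        (entity_id, entity_id, entity_id),
    )
    return [row[0] for row in rows]

db = build_graph(entities)
print(neighbours(db, "E2"))
```

Storing each pair in sorted order keeps the edge table free of duplicates, so a link declared from both sides still counts as a single undirected connection.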
Dan Petravich: So then you have to think about how to treat that, but I'm not going to go into the details. I'm just reading from my screen now: it's a 68 GB file, right? It's a SQLite database with full connectivity, and the screen share that you saw earlier was...

Dan Petravich: I actually did vector embeddings of the entire knowledge graph.

James Dooley: Yeah.

Dan Petravich: So I now have a semantic search engine. If I type in Rand Fishkin, for example, it'll give me high cosine similarities towards SEO but low cosine similarities towards cake making.

James Dooley: Yeah.

Dan Petravich: Right. So basically this gives you a window of insight, and these embeddings are generated by Gemini. So it's how Gemini thinks about brands. You can put your brand in as a search term, and it will return the concepts most aligned with that brand in the semantic space of Google's embedding model.

Dan Petravich: Same technology as Gemini, the Gemini model in AI search. Just think about the utility of that.

James Dooley: Yeah, it's unbelievable.

Dan Petravich: It's great for keyword research, great for clustering, great for keyword classification. It's good for keyword gap analysis, content ideas, link building. It's just insane that this data is free and available to us to use. But if it wasn't for AI, I would never have been able to implement this. So I'm just super grateful that we live post AI revolution, where we can do all these things.
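The cosine-similarity lookup mentioned above can be sketched with plain Python. The three-dimensional vectors below are made-up toy numbers, not real Gemini embeddings, and `most_aligned` is a hypothetical helper name. The sketch only shows the ranking step that a semantic search over entity embeddings relies on.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def most_aligned(query_vec, index, top_k=3):
    """Rank indexed concepts by cosine similarity to the query vector, highest first."""
    scored = [(label, cosine(query_vec, vec)) for label, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

# Toy embeddings (assumed values for illustration only).
index = {
    "SEO": [0.9, 0.1, 0.0],
    "link building": [0.8, 0.3, 0.1],
    "cake making": [0.0, 0.2, 0.9],
}
brand_vec = [0.9, 0.1, 0.05]  # stands in for the embedding of a brand or person name

ranked = most_aligned(brand_vec, index)
for label, score in ranked:
    print(f"{score:.3f}  {label}")
```

With real embeddings, the index would hold one vector per entity label, and at the scale Dan mentions an approximate nearest-neighbour index would likely replace this brute-force scan.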
James Dooley: It's crazy how nearly every single episode we've recorded, and this one is about link building for AI, comes back down to building brand. It is genuinely about getting yourself into the knowledge graph. In my opinion, there are only brands in there, or obviously individuals, people, businesses, and so on, and the more confident they are in you, the better. We hope you liked the video on link building and what has changed in the AI era. I strongly recommend checking out a couple of the links in the description. There's one about the future of SEO, and there's another one, over 45 minutes long, about how to optimize for the LLMs: ChatGPT, Gemini, Perplexity, and all the other AI platforms. Dan, it's been an absolute pleasure.

Dan Petravich: Thank you very much. Thanks, James.