00:00:00 Dr Genevieve Hayes Hello and welcome to Value Driven Data Science, brought to you by Genevieve Hayes Consulting. I'm your host, Dr Genevieve Hayes, and today I'm joined by guest Dr Alessandro Negro to talk about graph-powered data science. Alessandro, welcome to the show.
00:00:19 Dr Alessandro Negro Thank you, and thank you for inviting me.
00:00:22 Dr Genevieve Hayes Alessandro has literally written the book on graph data science. In addition to being the chief scientist at GraphAware, the world's number one Neo4j consultancy, and managing director at GraphAware Italy, he is also the author of Graph-Powered Machine Learning
00:00:42 Dr Genevieve Hayes and the author of the recently released Knowledge Graphs Applied.
00:00:47 Dr Genevieve Hayes Now, I haven't read Knowledge Graphs Applied yet, but I have read Graph-Powered Machine Learning, and I'll just say this is an excellent book that I would recommend to any data scientist looking to get started with graph data science. Not only does it provide all the necessary theory in a manner that's
00:01:08 Dr Genevieve Hayes very easy to understand,
00:01:10 Dr Genevieve Hayes it also gives worked examples, including the Python and Cypher source code that can be used to produce them. And when I worked through that book, I created my own use case, a Sherlock Holmes knowledge graph, and I was able to use that
00:01:30 Dr Genevieve Hayes code to build it on my own home laptop. Something you should be very proud of, Alessandro.
00:01:36 Dr Alessandro Negro Yeah, I'm definitely happy that
00:01:37 Dr Alessandro Negro you found it useful for concrete use cases and for practising with your graphs.
00:01:45 Dr Genevieve Hayes One thing it's probably worth calling out before we go too far is that when we're talking about graphs, we're not talking about histograms or pie charts here, are we?
00:01:56 Dr Alessandro Negro Yes, definitely, because this is something where the language can sometimes generate issues. When we say graph, it is just nodes and relationships; generally we refer to histograms as charts. So, just to be clear, we will use "chart" for a histogram or pie chart or whatever else
00:02:17 Dr Alessandro Negro is a graphic, and "graph" for whatever is a model that represents our business use cases through nodes and relationships.
00:02:29 Dr Genevieve Hayes Yeah, so it's sort of like a network, like a social network.
00:02:33 Dr Alessandro Negro Exactly, a social network is an example of a graph application.
00:02:37 Dr Genevieve Hayes Yeah, so if everyone just thinks Twitter or Facebook or LinkedIn, then they're probably going to be fine.
00:02:43 Yep, that's OK.
00:02:44 Dr Genevieve Hayes How did you first become interested in working with graphs?
00:02:48 Dr Alessandro Negro It happened many years ago, I would say
00:02:51 Dr Alessandro Negro 2012.
00:02:53 Dr Alessandro Negro And the first time I was reasoning in terms of graphs was because I was designing a sort of multi-layer hierarchy representing this agents and subagents
00:03:09 Dr Alessandro Negro world, in which you have agents having under themselves many other people, per area for example, and such.
00:03:17 Dr Alessandro Negro And I found out that the best way to represent this was of course using a graph, because this allowed me to represent
00:03:28 Dr Alessandro Negro the reality exactly as it is. And so I was exposed for the first time to graphs, and specifically to Neo4j, which at the time
00:03:37 Dr Alessandro Negro was at version 0.9 or such, and from this first meeting many, many other ideas
00:03:49 Dr Alessandro Negro came, and so I built my first recommendation engine
00:03:54 Dr Alessandro Negro on top of
00:03:55 Dr Alessandro Negro Neo4j, as my first experience of applying data science, let's say, to the graph world.
00:04:03 Dr Genevieve Hayes Did the organisation you were doing work for already have Neo4j, or did you have to do the exploration to discover that Neo4j was the best tool for this use case?
00:04:15 Dr Alessandro Negro Well, at that time
00:04:16 Dr Alessandro Negro this was just a nights-and-weekends project. It was my personal interest in the field of data science first, and then graphs. So I was just playing around and I built a career by this and item.
00:04:35 Dr Alessandro Negro In prod.
00:04:36 Dr Genevieve Hayes So Neo4j, that's a graph database which, I take it, as the name would suggest, is specially designed for holding graph data, so the nodes and the relationships.
00:04:48 Dr Genevieve Hayes Is it feasible to work with graph data if you don't have a graph database underpinning it?
00:04:55 Dr Alessandro Negro Well, in theory
00:04:56 Dr Alessandro Negro it is possible. It depends on the size, from my point of view, in the sense that
00:05:03 Dr Alessandro Negro there are many graph databases that offer, let's say, a graph interface, so they reason in terms of nodes and relationships,
00:05:13 Dr Alessandro Negro but behind the scenes they have, I don't know, a relational database or a key-value data store and such. Of course there are
00:05:22 Dr Alessandro Negro pros and cons in any approach. Let's say that the so-called graph-native
00:05:30 Dr Alessandro Negro databases, like Neo4j, store the data as a graph: literally, they have a list of nodes, and for each node they store its relationships, and so on and so forth. So literally they have this adjacency list storage mechanism that
00:05:48 Dr Alessandro Negro makes the traversal of this graph
00:05:52 Dr Alessandro Negro faster, because of course, while you are for example finding a shortest path, or navigating a graph starting from a node,
00:06:00 Dr Alessandro Negro this is much faster, because you don't have to go to a table or a key-value store and look up that node and all its relationships, and then the next node and all its relationships, and so on, because
00:06:12 Dr Alessandro Negro you have all of these attached to each node. So you start from a node, then you see all its relationships,
00:06:17 Dr Alessandro Negro navigate these relationships and move farther from there. So in terms of graph traversal, this type of storage mechanism is much faster. The drawback is of course that it
00:06:29 Dr Alessandro Negro cannot be, let's say, sharded, so you cannot spread it across multiple machines, because that is much more complicated. There is no easy way unless the graph can be easily split into independent subgraphs. So other graph databases leverage different data structures.
00:06:49 Dr Alessandro Negro These are again based
00:06:50 Dr Alessandro Negro on key-value stores, for example, for sharding the database, which means dividing it into pieces and storing them on different servers.
00:07:00 Dr Alessandro Negro That of course has some other advantages, definitely not for graph traversal, but for certain types of graph analytics. So the way in which you store this graph
00:07:11 Dr Alessandro Negro has a direct impact on the efficiency of certain types of use cases versus others.
00:07:19 Dr Genevieve Hayes What's the largest graph database you've come across in your work?
00:07:23 Dr Alessandro Negro Well, we definitely created and stored big databases. A few of them had billions of nodes and relationships, and they were related to certain law enforcement use cases
00:07:43 Dr Alessandro Negro in which you have to collect data from a huge number of data sources, and hence you have a big database to handle.
00:07:55 Dr Genevieve Hayes I'm guessing that something like Twitter or Facebook has a Neo4j or similar database underpinning their operations.
00:08:04 Dr Alessandro Negro Well, yeah, they have a graph database, but in both cases they created their own graph database data structure.
00:08:15 Dr Alessandro Negro Both Twitter and Facebook have, let's say, their own version of a graph database that they created by themselves. Other companies
00:08:24 Dr Alessandro Negro rely on
00:08:26 Dr Alessandro Negro Neo4j, but these, let's say,
00:08:29 Dr Alessandro Negro big social network providers have their own, because they have very specific types of analysis to do, and so they created their own.
00:08:39 Dr Genevieve Hayes Yeah, and I mean a big tech company has the financial capability to develop its own graph database, whereas your average company doesn't.
00:08:49 Dr Alessandro Negro Yeah, of course.
00:08:51 Dr Alessandro Negro They have the resources they need to build it themselves, and let me say also that they started a bit earlier than Neo4j. Twitter was there before Neo4j, so they had this need before Neo4j sort of democratised
00:09:09 Dr Alessandro Negro the concept of the graph database for all the other companies. As usual you have the early adopters, and definitely Facebook and Twitter
00:09:18 Dr Alessandro Negro were in this area, and Neo4j literally took this idea and made a product out of it that other companies
00:09:29 Dr Alessandro Negro can use. And the same goes for Twitter, in one way or another: if you want, you can access the software they use for storing the graph database, but it's not
00:09:40 Dr Alessandro Negro intended, you know, for any use case. It has a very specific set of features and a very specific set of tasks that you can accomplish with that database. Neo4j, instead,
00:09:52 Dr Alessandro Negro let's say, since they were making a business out of this, made it, and are still making it,
00:10:00 Dr Alessandro Negro generic, let's say: it's for solving multiple types of problems rather than just one.
00:10:06 Dr Genevieve Hayes So far we've touched on two use cases for graph databases: the social network use case, and you also mentioned a law enforcement use case that you'd come across. What other use cases have you come across for graph databases?
00:10:22 Dr Alessandro Negro Well, I would say many, really. Recommendation, which I mentioned before because it is close to my heart, since I started my career in this area with a recommendation engine, is definitely something that is,
00:10:40 Dr Alessandro Negro you know, very active, not only because a graph empowers complex types of recommendation engine, but because it also solves complex issues around this type of machine learning task. I'm thinking specifically of cold start or contextual recommendation. These kinds of problems can be
00:11:00 Dr Alessandro Negro solved in an easier way if you are using a graph database. But apart from recommendation, which is still a hot topic in the graph space, there are many others that are popping up. I'm thinking about fraud detection, for example. I'm thinking of the criminal intelligence that we were discussing before.
00:11:21 Dr Alessandro Negro But also, very recently, there is a new trend that I would define as the knowledge graph trend, in which the semantic web encountered the graph space, and from this
00:11:36 Dr Alessandro Negro merging of ideas the knowledge graph idea was born in some way, and this literally opened many other domains to this graph
way of thinking, because imagine that
00:11:55 Dr Alessandro Negro you have a medical use case. The knowledge graph can literally help you in gathering data
00:12:05 Dr Alessandro Negro from various types of data sources, literature as well as ontologies, as well as structured data sources, and combining them into this big single source of knowledge that a clinician or researchers can, let's say, rely on
00:12:24 Dr Alessandro Negro for making any type of analysis, but also for exploration purposes and for speeding up current research. This is another very relevant use case: in the biological, or specifically biomedical, space,
00:12:40 Dr Alessandro Negro graphs, so this specific
00:12:42 Dr Alessandro Negro type of graph, are becoming a sort of standard. And the same is true, for example, in the financial sector, in the banking sector, where again they are using this knowledge graph,
00:12:54 Dr Alessandro Negro again this single source of knowledge, for offering not only the fraud detection that I mentioned before, but also advanced services
00:13:01 Dr Alessandro Negro to their customers. There is this concept of customer 360
00:13:05 Dr Alessandro Negro that is popping
00:13:06 Dr Alessandro Negro up here and there, in which what they are doing is collecting all the information around a user and performing cross-selling, for example, or performing advanced types of suggestion, recommendation again, as well as tailoring a certain type of offering
00:13:26 Dr Alessandro Negro to them, based on their specific needs
00:13:29 Dr Alessandro Negro or on the needs that they could have in the future. In all these cases there is something in common, which is the ability of the graph, and specifically the knowledge graph, to aggregate data from different types of data sources, both structured and unstructured, and to offer
00:13:50 Dr Alessandro Negro a unique view, let's say a global view, on the...
00:13:55 Dr Genevieve Hayes So I'm just still trying to visualise this. It's very easy to visualise the idea of a social network, because you've got the nodes being people and the edges being the connections between me and someone who's my Facebook friend, for example. But with a knowledge graph,
00:14:16 Dr Genevieve Hayes I'm guessing that the nodes would be individual concepts,
00:14:21 Dr Genevieve Hayes for example a person or a place, or a disease if we're talking about medical research.
00:14:29 Dr Alessandro Negro Yes, exactly.
00:14:30 Dr Genevieve Hayes Would a relationship be something like, you know, if we're talking about a tennis player, say, Novak Djokovic has played tennis at the Wimbledon tennis court, for example? Would that
00:14:41 Dr Genevieve Hayes be right?
00:14:42 Dr Alessandro Negro Well, that's exactly what it is. Let me give you a broad overview. The graph as it is, is a very, very simple mathematical concept.
00:14:53 Dr Alessandro Negro It is just a set of nodes and relationships, or a set of vertices and edges if you prefer, so as a mathematical concept it is super simple.
00:15:04 Dr Alessandro Negro Everybody can understand it. Then what happens with the social network that you were mentioning before is that we are adding a sort of semantics on top of this.
00:15:13 Dr Alessandro Negro Yeah, so
00:15:15 Dr Alessandro Negro we are saying that nodes represent people and relationships represent social relationships between people, friendship or working together or whatever. The knowledge graph is exactly the same concept. It is a graph, nothing more, nothing
00:15:35 Dr Alessandro Negro less, but we apply much more semantics on top of it. And according to the domain you are in, these nodes represent different concepts. So if we are in the biomedical space, as we were mentioning, nodes can be genes, diseases, proteins,
00:15:52 Dr Alessandro Negro treatments, whatever, and the relationships are, for example, the biological connection between a gene and the related protein, or a relationship between proteins because they interact, or between diseases because they are connected somehow.
00:16:10 Dr Alessandro Negro And then these diseases can
00:16:12 Dr Alessandro Negro be connected to the relevant
00:16:14 Dr Alessandro Negro genes; there are well-known connections, for example, between genes and diseases, and so on. So we can say that the knowledge graph is literally a set of interconnected entities with their attributes
00:16:27 Dr Alessandro Negro and the relevant relationships between these nodes and concepts that are specific to a domain.
00:16:36 Dr Genevieve Hayes And with that example of the diseases and the genes, I'm imagining you could have something like a disease like COVID which is connected to this particular, I don't know, protein,
00:16:47 Dr Genevieve Hayes and then you could have that protein also linked to this other protein and this other disease, and that would allow you
00:16:53 Dr Genevieve Hayes to find similarities between diseases. And presumably, and I know nothing about medical research, so I'm just making this up, but presumably that would help a medical researcher to identify connections which
00:17:08 Dr Genevieve Hayes might help them to create some sort of novel treatment for this particular disease.
00:17:14 Dr Alessandro Negro Yes, exactly. There is an interesting point about this example, because first of all there is a concept that I like to mention all the time when I
00:17:24 Dr Alessandro Negro am speaking about graphs.
00:17:25 Dr Alessandro Negro Once you have all your data stored in the form of
00:17:29 Dr Alessandro Negro a graph, every single node and every single relationship could be an access point for your
00:17:34 Dr Alessandro Negro analysis, for your exploration, exactly as you mentioned. I have a specific disease in mind, and I would like to explore the surroundings, let's say, around this COVID.
00:17:44 Dr Alessandro Negro That's a perfect use case, and I would say one of the most common use cases for graphs, or usages
00:17:52 Dr Alessandro Negro if you prefer. Then of course there are other, more complex types of analysis, and again you mentioned these in your example. One of the
00:18:04 Dr Alessandro Negro major use cases, I would say, in the biological space is so-called drug
00:18:09 Dr Alessandro Negro repurposing, which means that you have drugs or compounds that already exist, and you would like to see whether those existing drugs or compounds can be used for a new disease.
00:18:22 Dr Alessandro Negro This is exactly what happened for COVID, if you remember when we were using hydroxychloroquine, for example, as a way of
00:18:29 Dr Alessandro Negro treating COVID. Then we discovered that was not the case, but unfortunately at the time we didn't have the knowledge that we have right now.
00:18:38 Dr Alessandro Negro This is the classical use case in which we are using complex machine learning tasks, let's say, for
00:18:47 Dr Alessandro Negro performing this type of drug repurposing, which, translated, is no more and no less than so-called link prediction. You have a graph with existing links, existing relationships, and you would like to predict unseen or unused relationships. In this case,
00:19:07 Dr Alessandro Negro instead of doing what we were mentioning before, a simple exploration, you're doing a deep analysis of your graph
00:19:14 Dr Alessandro Negro in order to accomplish a much more complex task, in this case link prediction. So you are literally discovering
00:19:26 Dr Alessandro Negro new relationships that are hidden somewhere in the structure of the graph.
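Editor's sketch: the link prediction idea described above can be illustrated in a few lines of Python. This is only a minimal sketch under stated assumptions, not the method from the book or from any company mentioned in the conversation. The toy graph, the node names and the choice of a simple common-neighbours score are all hypothetical; real drug repurposing work typically relies on far richer graphs and learned models.

```python
from itertools import combinations

# Hypothetical toy graph: every node name below is invented for illustration.
edges = [
    ("drug_A", "protein_1"),
    ("drug_A", "protein_2"),
    ("drug_B", "protein_2"),
    ("disease_X", "protein_1"),
    ("disease_X", "protein_2"),
    ("disease_Y", "protein_2"),
]


def build_adjacency(edge_list):
    """Store the undirected graph as node -> set of neighbouring nodes."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def common_neighbour_scores(adjacency):
    """Score each pair of nodes that is NOT yet linked by its number of shared neighbours."""
    scores = {}
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # the relationship already exists, nothing to predict
        shared = len(adjacency[u] & adjacency[v])
        if shared:
            scores[(u, v)] = shared
    return scores


adjacency = build_adjacency(edges)
scores = common_neighbour_scores(adjacency)

# Highest score first; ties are broken alphabetically so the order is stable.
ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
for (u, v), score in ranked:
    print(f"{u} -- {v}: {score} shared neighbour(s)")
```

In this toy data the pair disease_X and drug_A shares two proteins and has no direct link, so it ranks at the top as a candidate relationship worth investigating. A candidate of this kind is a hypothesis for an expert to check, not a finding.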
For example, drug repurposing is a classical example.
00:19:34 Dr Genevieve Hayes So if a new version of COVID came out, so COVID-23, God help the world, let's hope that doesn't happen, that would be something that had not previously existed in that graph.
00:19:46 Dr Genevieve Hayes But given whatever limited information we had on it, we could then predict which existing drugs it is linked to,
00:19:55 Dr Genevieve Hayes and then hopefully come up with some treatment for it very quickly, so that the world doesn't end up in another series of lockdowns.
00:20:03 Dr Alessandro Negro Yeah, exactly. This is in reality what already happened with COVID-19. If you think of AstraZeneca, for example, which was one of the first companies producing a vaccine, they literally used a knowledge graph for producing their vaccine. There are plenty of talks about this specific topic, so it has happened already.
00:20:24 Dr Alessandro Negro Of course, hopefully,
00:20:25 Dr Alessandro Negro as you said, we wish it won't happen again, but in that case knowledge graphs can definitely play another key role in, let's say, the discovery of new cures for diseases, or in finding existing
00:20:45 Dr Alessandro Negro drugs that can
00:20:46 Dr Alessandro Negro help.
00:20:47 Dr Genevieve Hayes One knowledge graph that I'm familiar with is the Google Knowledge Graph. For any listeners out there who aren't familiar with it, whenever you search for something on Google, like a person's name or a city, you'll often get that box down the side of the page if you're using the desktop version,
00:21:08 Dr Genevieve Hayes or at the top of the screen if you're using it on mobile, and it'll give you key facts about that person or that location. What is the practical application of that Google Knowledge Graph beyond providing interesting facts about locations and people when you search?
00:21:27 Dr Alessandro Negro Well, I would definitely say that knowledge graphs were introduced to the world, for this specific type of usage, by Google for the first time. If you search for "knowledge graph" on Google Trends, the,
00:21:46 Dr Alessandro Negro let's say, feature
00:21:46 Dr Alessandro Negro available in Google, you will notice that around
00:21:49 Dr Alessandro Negro 2012
00:21:51 Dr Alessandro Negro you will see, let's say, a spike that is related to the introduction of this concept for the first time. After that,
00:21:58 Dr Alessandro Negro nothing was the same. And they had an interesting way of introducing this concept, which was searching for things instead of searching for strings. That is exactly what you described.
00:22:12 Dr Alessandro Negro If I'm searching for a specific concept, I don't want to get only the list of documents mentioning
00:22:19 Dr Alessandro Negro this specific word or set of words, that is, searching for strings; I would like to get exactly that specific thing. So the box on the side of the search,
00:22:30 Dr Alessandro Negro well, is literally the thing, hopefully, or the things that we were searching for, and that changed dramatically
00:22:39 Dr Alessandro Negro the way in which they were offering these search results. It is entirely powered by the knowledge graph, and it is definitely one of the most relevant usages
00:22:51 Dr Alessandro Negro of the knowledge graph for their specific case.
00:22:54 Dr Genevieve Hayes At the end of your book Graph-Powered Machine Learning,
00:22:58 Dr Genevieve Hayes you go through a use case of how to build a knowledge graph from scratch. Would you be able to give the listeners a condensed version of how they'd go about building a knowledge graph?
00:23:10 Dr Genevieve Hayes Because one of the things that I thought was really cool about that book was, even though obviously someone like me couldn't build
00:23:17 Dr Genevieve Hayes a knowledge graph the size of Google's, it was pretty cool to be able to build my own Sherlock Holmes knowledge graph just on my laptop on the weekends.
00:23:27 Dr Alessandro Negro Let me say that these two chapters were so useful to many people that we decided to write another entire book on that topic.
00:23:38 Dr Alessandro Negro So I would say that
00:23:39 Dr Alessandro Negro Knowledge Graphs Applied, that is the
00:23:41 Dr Alessandro Negro book we are
00:23:42 Dr Alessandro Negro working on in these months, started
00:23:48 Dr Alessandro Negro exactly from this idea, from the last two chapters of the previous book, in which I was building this knowledge graph, and extended it to, let's say, another 600 pages, more or less.
00:24:01 Dr Alessandro Negro The reason is what you mentioned. This is definitely one of the major concerns that many people and many companies have:
00:24:10 Dr Alessandro Negro how can I build a knowledge graph? Well, let me say that there are two major, not issues, but approaches,
00:24:20 Dr Alessandro Negro and both of them are valid, and they should in some way merge. On one side, of
00:24:24 Dr Alessandro Negro course, you can have
00:24:25 Dr Alessandro Negro structured data sources: CSV files, relational databases, or many other sources that are structured.
00:24:35 Dr Alessandro Negro And for these it's relatively simple. Once you identify
00:24:40 Dr Alessandro Negro the key
00:24:41 Dr Alessandro Negro entities, or the key concepts as we were saying before, that you would like to store in the knowledge graph,
00:24:48 Dr Alessandro Negro and you have also identified the relationships and the global schema, then it's pretty
00:24:55 Dr Alessandro Negro straightforward to
00:24:57 Dr Alessandro Negro load this data in the form of a graph.
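Editor's sketch: the structured-source path described above (identify the entities, the relationships and the schema, then load) can be shown with Python's standard csv module. Everything in this example is an assumption made for illustration: the CSV columns, the Person and Company labels and the WORKS_FOR relationship type are hypothetical, and the graph is kept in plain Python structures rather than loaded into Neo4j or any other database.

```python
import csv
import io

# Hypothetical CSV content; in practice this would be a file or a database export.
CSV_TEXT = """person,company,role
Alice,Acme,engineer
Bob,Acme,analyst
Alice,Globex,advisor
"""


def load_graph(csv_text):
    """Turn each CSV row into two nodes and one relationship between them."""
    nodes = {}          # (label, name) -> node properties
    relationships = []  # (start node, relationship type, end node, properties)
    for row in csv.DictReader(io.StringIO(csv_text)):
        person = ("Person", row["person"])
        company = ("Company", row["company"])
        # setdefault keeps one node per entity even when it appears in many rows
        nodes.setdefault(person, {"name": row["person"]})
        nodes.setdefault(company, {"name": row["company"]})
        relationships.append((person, "WORKS_FOR", company, {"role": row["role"]}))
    return nodes, relationships


nodes, relationships = load_graph(CSV_TEXT)
print(len(nodes), "nodes and", len(relationships), "relationships")
```

The schema decision is the part that matters: here each distinct person and company becomes a single node, and each row becomes a relationship carrying the role as a property. With a different schema the same rows would produce a different graph.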
Generally everybody can do it.
00:25:04 Dr Alessandro Negro Then there is another interesting area that is much more complicated, but definitely more satisfying: the conversion of so-called unstructured sources into a knowledge graph, and that's where you could have more fun. Imagine that you have a text.
00:25:24 Dr Alessandro Negro Text, as I said, is generally referred to as unstructured, but in reality our
00:25:29 Dr Alessandro Negro languages have a lot of structure inside: grammar, syntactic dependencies and such. So you can leverage this structure and literally extract an enormous amount of information from the text. A typical example is so-called named entities, which means that you should recognise in a text
00:25:51 Dr Alessandro Negro whether, let's say, a couple of words are a person rather than a location rather than a company, and so on and so forth, or a disease. And apart from recognising these entities, you should also be able to recognise the relationships
00:26:10 Dr Alessandro Negro between these entities. Some are simple to extract and others are more complicated. The simplest example is the connection between subject, verb and object: you can extract the easy relationship between the subject and the object, for example,
00:26:29 Dr Alessandro Negro of a specific sentence. Others are a bit more complicated to extract,
00:26:34 Dr Alessandro Negro but still doable. This task is called entity relationship extraction, and it can be accomplished by using rules, as presented in the book.
00:26:44 Dr Alessandro Negro But you can also create complex, let's say, machine learning models to extract these sorts of relationships between
00:26:54 Dr Alessandro Negro entities.
00:26:55 Dr Alessandro Negro And there is, I'll say, a third task,
00:26:58 Dr Alessandro Negro again related to this area of conversion from unstructured sources to a knowledge graph.
00:27:03 Dr Alessandro Negro That is so-called named entity disambiguation, or entity linking, which means that you are connecting these extracted entities to a sort of knowledge base. So, for example,
00:27:15 Dr Alessandro Negro if you are extracting the word "diabetes", then you should be able to connect it to the right
00:27:22 Dr Alessandro Negro type of diabetes, and so on and so forth. This connection between an entity extracted from a text and, let's say, the well-known entity in a knowledge base allows you not only to know more about that specific entity than just its name,
00:27:42 Dr Alessandro Negro but also to extract connections between this entity and other entities inside the text or inside the knowledge base that you have.
00:27:51 Dr Genevieve Hayes I'm just guessing this is how Google did their knowledge graph. They could have taken basically every web page, extracted the named entities from those web pages, or even just from something like Wikipedia, and then used that to connect nodes and entities and build their knowledge graph.
00:28:09 Dr Alessandro Negro Yes, exactly. I would say that Wikipedia, which you mentioned, is still the most relevant knowledge base that everybody is using
00:28:18 Dr Alessandro Negro in many cases, I would say in many generic cases like Google's, for example. You know this already: whenever you search for some well-known name,
00:28:30 Dr Alessandro Negro the first box that we were discussing before will be a Wikipedia page. So Wikipedia pages definitely represent the main source of this knowledge graph for things. There is only one drawback, which is related,
00:28:50 Dr Alessandro Negro let's say, to the specific domains that you could have on your path.
For example, if you are speaking about a medical domain or other very narrow domains,
00:29:03 Dr Alessandro Negro unfortunately, the availability of a well-known and well-structured knowledge base is, let's say, less probable, which means that you need to build your own knowledge base.
00:29:16 Dr Alessandro Negro You need to build your own mechanism for extracting relevant information from text. Google, of course, built its own
00:29:24 Dr Alessandro Negro named entity recognition models and entity relationship extraction models for generic applications, for generic domains, not very specific ones.
00:29:33 Dr Genevieve Hayes So it's basically the same as what you find with any of those pre-built models.
00:29:37 Dr Genevieve Hayes The pre-built models are designed to cater for the generic use case that the majority of people want to use,
00:29:44 Dr Genevieve Hayes but if you have a very specific organisation-based application, you're going to have to build your own use case.
00:29:53 Dr Alessandro Negro Yes, exactly, that perfectly represents what happens every day. It's rare for a specific company, say
00:30:02 Dr Alessandro Negro in the financial sector or in the law enforcement sector, to be able to rely on existing models, because they are too generic. They have specific needs; they would like to recognise
00:30:12 Dr Alessandro Negro specific entities in the text that are just not available in the generic language models available on Hugging Face, for example. So they have to build their own.
00:30:23 Dr Genevieve Hayes Yeah, I was recently at a conference where there was a woman from Ambulance Victoria speaking, and she was saying how Ambulance Victoria had to build their own named entity recognition model, or no, sorry, it was a sentiment analysis model, because the generic models did not understand the way paramedics speak.
00:30:43 Dr Alessandro Negro Exactly, that is a very, very common problem.
That's why we are partnering with a company called
00:30:50 Dr Alessandro Negro UBIAI,
00:30:52 Dr Alessandro Negro which offers a sort of annotation tool in which domain experts can just go through tonnes of documents, or fewer documents, and annotate entities and relationships that are relevant for their specific domain, and automatically build language models
00:31:12 Dr Alessandro Negro out of these annotations. This is a very, very relevant task because, as we said, once you approach specific domains with specific problems, you need to build your own language model, and these tools
00:31:30 Dr Alessandro Negro allow you to do that specific task: through annotation you can create your own model to recognise what matters for you, for the domain that you are trying to handle.
00:31:42 Dr Genevieve Hayes Yeah, in my previous job we were working in a very specific domain, and one of the biggest challenges we found was finding individuals within the organisation who understood the data well enough to annotate it and who were prepared to spend all the hours or days that
00:32:02 Dr Genevieve Hayes would be required in order to annotate that data.
00:32:06 Dr Alessandro Negro Well, I can't say what is more difficult: to find people with the right expertise, or to convince them to spend time on a laptop or a computer
00:32:17 Dr Alessandro Negro performing the annotation. Let me say that this is hard everywhere. What we are trying to do is to make
00:32:27 Dr Alessandro Negro this process more automated,
00:32:29 Dr Alessandro Negro which means that through, I don't know, a dictionary, for example, you can feed the first annotations, creating a sort of feedback loop.
00:32:43 Dr Alessandro Negro While you are updating, the language model is built, and this language model can be used for pre-annotating the next set of documents, so that
00:32:50 Dr Alessandro Negro Really, the amount of time concretely required for the real users, real people, to provide feedback could be reduced, you know, and so they will be less 00:33:02 Dr Alessandro Negro annoyed by this task. So really it's hard to say what is more complicated, because you are right, you know, convincing them to spend hours 00:33:13 Dr Alessandro Negro in front of a 00:33:13 Dr Alessandro Negro computer to annotate, 00:33:15 Dr Alessandro Negro it's not that simple. 00:33:17 Dr Genevieve Hayes So what you're saying is, if someone's already annotated Australia as the name of a country 00:33:23 Dr Genevieve Hayes before, and every time Australia comes up it's always annotated as a country name, 00:33:29 Dr Genevieve Hayes then it could skip over that and just focus on, I don't know, if it's never come across the name of a small country like, I don't know, Liechtenstein, for example, which doesn't come up as often. 00:33:42 Dr Alessandro Negro Yeah, exactly. That's basically the idea. 00:33:45 Dr Alessandro Negro So, let's say, the dictionary 00:33:47 Dr Alessandro Negro based approach is much simpler because 00:33:50 Dr Alessandro Negro if the name 00:33:51 Dr Alessandro Negro matches then you know what it is, and this can be used for training a more complex language model, not dictionary based. 00:33:59 Dr Alessandro Negro And again, this language model can be used to pre-annotate. Of course the dictionary generally, 00:34:06 Dr Alessandro Negro let's say, has 00:34:07 Dr Alessandro Negro a higher precision. 00:34:10 Dr Alessandro Negro So you know, if Australia is recognised as a 00:34:15 Dr Alessandro Negro key entity, it will always be like this. You know, there are few chances that it is wrong, but the recall is very limited. 00:34:24 Dr Alessandro Negro You know, which means that you won't be able to recognise all the names of the locations, for example, because you don't have a dictionary containing all this.
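The precision and recall trade-off of dictionary-based annotation described above can be illustrated with a minimal Python sketch. The dictionary contents, the example sentence and the gold annotations are all hypothetical, invented purely for illustration; they are not from the episode or from any tool mentioned in it.

```python
# Hypothetical gazetteer: the only locations this dictionary knows about.
LOCATION_DICTIONARY = {"australia", "italy"}


def dictionary_annotate(tokens):
    """Return the positions of tokens tagged as locations by exact lookup."""
    return {i for i, tok in enumerate(tokens) if tok.lower() in LOCATION_DICTIONARY}


def precision_recall(predicted, gold):
    """Compare predicted positions with human (gold) annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical sentence; a human annotator marked four locations in it.
tokens = ["Flights", "from", "Australia", "to", "Liechtenstein",
          "and", "Italy", "via", "Samoa"]
gold = {2, 4, 6, 8}

predicted = dictionary_annotate(tokens)
precision, recall = precision_recall(predicted, gold)
print(precision, recall)  # 1.0 0.5
```

Everything the dictionary tags is correct (precision 1.0), but it misses the two locations it has never seen (recall 0.5). In the feedback loop discussed in the conversation, a trained model would try to catch the missing names and a human would correct its mistakes.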
00:34:36 Dr Alessandro Negro Of course, locations are not a good example, but I think you understood 00:34:39 Dr Alessandro Negro what I mean. 00:34:39 Dr Alessandro Negro That's why, on the other side, the language model could give you the opposite: it could have a higher recall. So in theory this language model is capable of, 00:34:50 Dr Alessandro Negro let's say, catching more names, but at the same time it could be wrong, you know, because the structure of the 00:34:58 Dr Alessandro Negro sentence could suggest 00:34:59 Dr Alessandro Negro that that specific entity is a location, for example, but it might not be. 00:35:04 Dr Alessandro Negro It's just that it seems to be a location, but it's not, and that's, again, where the humans can not only 00:35:11 Dr Alessandro Negro add annotations but can also correct annotations, you know, and then in this process you can have this sort of human in the loop in which you are, let's say, concretely helping the machine 00:35:24 Dr Alessandro Negro to understand human language. I will say that, based on my personal experience, all this work has a really good payoff. 00:35:33 Dr Alessandro Negro You know, because what you can get out of 00:35:35 Dr Alessandro Negro this is a 00:35:36 Dr Alessandro Negro custom language model that nobody else has, for example. So there is a lot of value resulting from this 00:35:44 Dr Alessandro Negro effort, really, specifically for the tiny domains that you were mentioning before. You know, this is a key step to extract relevant information and then build a knowledge graph out of your text. 00:36:01 Dr Genevieve Hayes And I could imagine some startup company, for example, going to the trouble of building one of those knowledge graphs, and then they could build some sort of product around that which would presumably, if it's the right product and people really want it, be unique and allow them to charge quite a high.
00:36:23 Dr Alessandro Negro Yes, yes, there are many, many companies, you know, that are doing this for a living. We are in contact with a few of them, in which, you know, what they have as business value is literally the right expertise. So they have on one side domain experts that 00:36:43 Dr Alessandro Negro are, say, engaged for annotating documents and also for building ontologies, for example. So not only annotating documents, but also creating relevant information in the form of 00:37:00 Dr Alessandro Negro ontologies, right, connections between key concepts. And on the other side they also have technical people that could help, I don't know, a pharmaceutical company, for example, to leverage these language models and these ontologies in the right way for building complex, 00:37:20 Dr Alessandro Negro let's say, applications, for example. So that's an enormous, 00:37:25 Dr Alessandro Negro let's say, 00:37:27 Dr Alessandro Negro there are enormous opportunities for many small companies, you know, to build a niche domain language model, for example, and offer this to their customers. So it's a new world, I would say, of opportunities for these companies. 00:37:42 Dr Genevieve Hayes I've come across graph databases in my own work, and that was in a relatively large organisation within Australia. But from speaking to other people, I know many data scientists have never come across graph databases. 00:38:00 Dr Genevieve Hayes How prevalent are graph databases at the moment? 00:38:03 Dr Alessandro Negro Still not that much, in the sense that it is growing. I mean, in the last 10 years, definitely. It's much easier now to find people that are expert in, or at least aware of, this new area. But still, you know, I would say that the data science field is so huge that 00:38:24 Dr Alessandro Negro everything is very specific. For example, you have many data scientists working in the NLP space, you know.
00:38:34 Dr Alessandro Negro Like not only extracting the relevant information, but also building question answering systems and so on and so forth. 00:38:41 Dr Alessandro Negro Then you have, let's say, data scientists that are experts in fraud detection, and again this is a huge area, you know, in which people are 00:38:54 Dr Alessandro Negro really specialised in that specific field, or in recommendation, or in many other, let's say, high level sets of applications. 00:39:04 Dr Alessandro Negro Let's say that 00:39:04 Dr Alessandro Negro a graph in this space could be a help in each of these verticals, but it is still not so well understood. You know, because it's not only a niche, it's really a new arrow in their bow. 00:39:24 Dr Alessandro Negro They could use, for example, graphs for improving a recommendation engine, solving, I don't know, a cold start problem, for example. The same could be said for fraud detection. You know, they can use graphs for revealing 00:39:42 Dr Alessandro Negro fraud rings, which means people that are connected to each other and, you know, are trying to accomplish a 00:39:48 Dr Alessandro Negro certain type of fraud. 00:39:50 Dr Alessandro Negro So it's not that you have to use one or the other, but they can be combined in many domains for offering better services to 00:40:02 Dr Alessandro Negro their internal company or to their users. Unfortunately, this is still not perceived like this. You know, there are not that many conferences speaking about graphs or knowledge graphs. 00:40:15 Dr Alessandro Negro And it is not trending just yet; it will take time. But definitely I see that the trend is very clear. 00:40:24 Dr Alessandro Negro You know, you can see the number of companies using graphs or leveraging graph technologies for their advanced services. It will come. It's just that we need.
00:40:35 Dr Alessandro Negro More time, and definitely, you know, books like our books or other people's books can help in this process. 00:40:45 Dr Genevieve Hayes Does the prevalence of graph database uptake differ by country? 00:40:50 Dr Alessandro Negro Well, we definitely noticed that there are some differences between countries. For example, when we first landed in Australia, we noticed that it was a sort of greenfield for us, you know, unlike the US, where this concept was very well established. 00:41:10 Dr Alessandro Negro But you know, in the US they are always cutting edge 00:41:15 Dr Alessandro Negro in 00:41:15 Dr Alessandro Negro technology. In Australia it was a bit more complicated for us to convince people that this could be the way to go, but I will say that after 00:41:25 Dr Alessandro Negro a while we noticed that this generated a lot of interest, and now we have different 00:41:31 Dr Alessandro Negro companies working with us. 00:41:33 Dr Alessandro Negro And we are offering our services, and even our, let's say, teaching effort, to them in order to educate them to use graphs as, again, another technology that can be 00:41:46 Dr Alessandro Negro useful in many, many different scenarios. 00:41:50 Dr Genevieve Hayes The next thing I want to explore is how machine learning can be applied to a graph database. 00:41:57 Dr Alessandro Negro OK, this is an interesting question, because I see graph databases and machine learning as things that can, let's say, use each other 00:42:07 Dr Alessandro Negro in different ways. 00:42:09 Dr Alessandro Negro Let me say that on one side the graph databases can be used for organising 00:42:17 Dr Alessandro Negro your data before applying any machine learning model. You know, one of the major 00:42:26 Dr Alessandro Negro tasks in machine learning is data preparation, data cleaning and, let's say, feature engineering. These are complex tasks.
You know, that sometimes take more than 80% of the data scientist's time. In this sense, graphs can help you, as I mentioned before, to collect 00:42:47 Dr Alessandro Negro the data, but not in the same way in which, for example, the data lake was doing before, because in the data lake what happened was just that people 00:42:55 Dr Alessandro Negro were putting all their data in whatever structure, you know, and then the data scientists, the poor data scientists, had to literally go through an enormous set of tasks for cleaning, improving and enriching before even starting to 00:43:15 Dr Alessandro Negro think about any machine learning model. 00:43:20 Dr Alessandro Negro Graphs, and knowledge graphs specifically, have semantics applied to this data, so it's not only data, it's organised data, which means that you know that a person is 00:43:34 Dr Alessandro Negro a person with the 00:43:35 Dr Alessandro Negro relevant, let's say, connections, and with the relevant 00:43:39 Dr Alessandro Negro attributes. It's totally different. You know, it's a really well organised source 00:43:44 Dr Alessandro Negro of truth that you can then use for performing data cleaning, but also for extracting the features that you need for the next step. 00:43:53 Dr Alessandro Negro So in this case, graphs can really help you in the early stages of your processes or your analysis. 00:44:01 Dr Alessandro Negro Other than that, what you can do, 00:44:04 Dr Alessandro Negro the other way around, is to literally leverage graphs 00:44:08 Dr Alessandro Negro for building your machine learning models. You know, you can use graph algorithms, for example, directly. If you imagine the social network case, you can easily use the network to identify key people, for example. This is a classical example, but you can also use it 00:44:28 Dr Alessandro Negro for identifying 00:44:30 Dr Alessandro Negro clusters, like communities, inside the graph.
00:44:35 Dr Alessandro Negro Of course this 00:44:35 Dr Alessandro Negro is useful not only in social network analysis. 00:44:39 Dr Alessandro Negro Imagine, for example, that you are storing protein to protein interactions in your graph database, you know, and you perform a community detection. 00:44:49 Dr Alessandro Negro In this case, what you are recognising are sets of proteins that are generally well connected together, and they can be, for example, connected to a well defined 00:45:03 Dr Alessandro Negro set of diseases. 00:45:04 Dr Alessandro Negro So you can literally create models of your reality based on graph algorithms. Recently, for example, there is this new trend called graph neural networks. 00:45:16 Dr Alessandro Negro You know, in this case what you do is to store your information, again, in the form of a graph. Then you apply this neural network 00:45:24 Dr Alessandro Negro model and you are able to, let's say, move literally from the graph space to a multidimensional space. You are vectorizing, for example, these nodes, and these vectors 00:45:36 Dr Alessandro Negro are the input of a complex model that you can, for example, use for building a classifier. You can also build link prediction, as we were discussing before. So literally, you know, you can use graphs in many areas of your machine learning tasks. 00:45:56 Dr Alessandro Negro As I said, you can use them as an input, or you can use them as a core element 00:45:59 Dr Alessandro Negro of your, let's say, 00:46:03 Dr Alessandro Negro machine learning tasks. Or you can even use them for exploration, you know, sometimes even for understanding how certain types of models are working. You know that recently there is also this new trend related to explainable AI, you know, because 00:46:22 Dr Alessandro Negro if you are offering recommendations, nobody cares. You know, nobody even asks. 00:46:29 Dr Alessandro Negro You know how the Netflix recommendation engine is working?
I don't care, you know, if Netflix will recommend this or that. I can say, oh, wow, this is very relevant for me. Or I don't care if a self driving car is driving me somewhere. Well, I have no idea how it works, you know, how this car 00:46:49 Dr Alessandro Negro reads all the environment variables and converts these into a path. You know, of course I care that I'd 00:46:56 Dr Alessandro Negro like to reach a specific place in a safe way, but no more than that. But imagine if you are a doctor and you have to recommend a specific treatment to a patient based on a machine learning model. Well, you would like to know how this 00:47:16 Dr Alessandro Negro specific treatment has been produced by the machine. In this sense, graphs can help you to better understand certain types of internals, let's say, of the models, and so they can, 00:47:29 Dr Alessandro Negro since they apply this semantic layer on top of data, make it easier for you and for the machine to explain how certain types of decisions have been taken by the machine. That, you know, allows, in this case, the doctor to understand why this is 00:47:49 Dr Alessandro Negro coming out from the machine, and of course to be more confident before treating the patient with a specific treatment, for example. 00:47:57 Dr Genevieve Hayes Yep, so that's because people can actually look at the graph itself and say, 00:48:02 Dr Genevieve Hayes yeah, this node connects to this node, et cetera. 00:48:07 Dr Alessandro Negro Well, that's basically the most relevant one, but of course, you know, in certain cases you can explore a huge area of the graph in one shot and understand exactly where these decisions are coming from. 00:48:21 Dr Alessandro Negro You know, so, 00:48:22 Dr Alessandro Negro but yeah, it is exactly exploration that allows you to discover 00:48:27 Dr Alessandro Negro certain types of decisions.
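The idea of following "this node connects to this node" to see where a recommendation comes from can be sketched as a path search over a small graph. This is only an illustrative Python sketch: the node names, relationship types and the toy graph are hypothetical assumptions, and a real knowledge graph would normally be queried inside the graph database rather than held in a dictionary.

```python
from collections import deque

# Hypothetical toy knowledge graph: node -> list of (relationship, target node).
GRAPH = {
    "Patient_42": [("HAS_SYMPTOM", "Chest pain"), ("HAS_CONDITION", "Hypertension")],
    "Hypertension": [("TREATED_BY", "Drug_A")],
    "Chest pain": [("INDICATES", "Angina")],
    "Angina": [("TREATED_BY", "Drug_B")],
}


def explain(graph, start, goal):
    """Breadth-first search returning the shortest chain of
    (source, relationship, target) hops from start to goal, or None."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relationship, target in graph.get(node, []):
            if target not in visited:
                visited.add(target)
                queue.append((target, path + [(node, relationship, target)]))
    return None


# Why might "Drug_B" be suggested for this (hypothetical) patient?
path = explain(GRAPH, "Patient_42", "Drug_B")
for source, relationship, target in path:
    print(f"({source}) -[{relationship}]-> ({target})")
```

The printed chain of hops is the kind of human-readable trace a domain expert could inspect before acting on a suggestion.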
00:48:28 Dr Genevieve Hayes And I could imagine that's also very important in the financial and legal 00:48:33 Dr Genevieve Hayes domains because, well, if someone's going to be sent to gaol for something, they wanna know why. And if someone's gonna be penalised financially, obviously. 00:48:43 Dr Alessandro Negro Yeah, absolutely. This is a critical aspect, you know. Really, this explainable AI is coming up here and there more and more often, even in the criminal intelligence that you were mentioning. You know, there are some studies in which, for several reasons, they noticed that certain types of machine learning algorithms were a bit 00:49:03 Dr Alessandro Negro biased, you know. 00:49:04 Dr Alessandro Negro So by introducing this explainability they were able to understand why these models were biased by certain types of, I don't know, characteristics of the people, 00:49:17 Dr Alessandro Negro for example, you know, and this allows them to fine tune, for example, and such. So this is becoming really relevant information to know about, you know, 00:49:26 Dr Alessandro Negro how these models are working. Because the more we are using these tools, of course, the more ethical issues are jumping out, and explainability is a key aspect that allows people 00:49:39 Dr Alessandro Negro to judge once the machine is providing a certain type of output, and then take the right decisions, you know, whether it's biased or not. This will allow them to really make the best use of 00:49:54 Dr Alessandro Negro these tools. 00:49:55 Dr Genevieve Hayes Yeah, and avoid a data scandal in the process. 00:49:58 Dr Alessandro Negro Of course, of course, because you know what happens then is that, because of a mistake, all the processes are then considered not valuable. You know, even though you spent years and years, and just 00:50:11 Dr Alessandro Negro because for a certain reason the system is not performing well, because the data that we provided is not correct.
Then, you know, the entire process is thrown away, and this is definitely not what we want as data scientists or as machine learning engineers. 00:50:28 Dr Genevieve Hayes I was reading a book earlier today, and in one of the quotes in it the author was saying that he couldn't believe the number of times he'd been asked, 00:50:39 Dr Genevieve Hayes if the wrong data goes into a particular model, will the model still spit out the right answer? 00:50:45 Dr Alessandro Negro Yeah, well, this is specifically mentioned in my book, you know, and I like to mention this in many of my talks, that of course the final quality of your model is definitely dependent on the quality of the input data. 00:51:01 Dr Alessandro Negro Yeah, that's absolutely true, and unfortunately not all people, even in the data scientist role, think of the, let's say, input data at the earlier stages, and again, that's where I really see that the value of graphs can 00:51:22 Dr Alessandro Negro shine, you know? Because of course if you can look at the data from a different perspective and navigate it in a simple way, maybe this will 00:51:32 Dr Alessandro Negro force many of us to think of the data from a different perspective, you know, before using it for, let's say, feeding complex machine learning, because unfortunately machine learning, 00:51:46 Dr Alessandro Negro as a 00:51:46 Dr Alessandro Negro generic concept, is an inductive process. You know, it tries to generalise 00:51:52 Dr Alessandro Negro from sample data. There is this nice example in which, you 00:51:57 Dr Alessandro Negro know, you have a 00:51:58 Dr Alessandro Negro bag and you are taking pennies out of it. 00:52:02 Dr Alessandro Negro After three runs, you know, three tests, the machine learning will say OK, all the coins in the bag are pennies. 00:52:11 Dr Alessandro Negro It's because.
00:52:12 Dr Alessandro Negro You know, there is nothing else that it can say, but in reality it is not like this. So unfortunately this data input problem should be considered more and more rather than less and less. 00:52:24 Dr Genevieve Hayes Yeah, and just because something's happened every day, forever, still doesn't mean it will happen tomorrow. I remember I used to teach Bayesian statistics, and one of the questions I used to get the students to answer was what is the probability that the sun will rise tomorrow given it's risen every day since the world. 00:52:45 Dr Alessandro Negro That's an interesting question. 00:52:49 Dr Genevieve Hayes Suppose a data scientist who's listening to this programme got really interested in graph data science and knowledge graphs. What steps could they take to get started in this field? 00:53:00 Dr Alessandro Negro Well, I would say that there are plenty of books in this area, so not only my book, but there are many others 00:53:08 Dr Alessandro Negro in which you can, 00:53:09 Dr Alessandro Negro you know, find useful beginner examples, in which you can just start looking at a small data set and start working with this data set and understand the basic algorithms, for example, like, I don't know, PageRank or community detection like Louvain 00:53:29 Dr Alessandro Negro and such, and I think once you start looking at the power of these, 00:53:35 Dr Alessandro Negro let's say, tools, 00:53:38 Dr Alessandro Negro so not only the graph databases, but also 00:53:40 Dr Alessandro Negro the algorithms, you will 00:53:41 Dr Alessandro Negro fall in love with them and start using them more and more. Again, 00:53:47 Dr Alessandro Negro I don't want to say that graph databases can solve all the issues, but definitely they should be part of any data scientist's background. You know, knowing that there is a third way of doing such types of things.
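As a starting point for the "small data set plus a basic algorithm" route suggested above, here is a minimal pure-Python sketch of PageRank by power iteration. The tiny "follows" network, the names in it, and the parameter values (damping factor 0.85, 100 iterations) are illustrative assumptions; this is a teaching sketch, not the implementation used by Neo4j or any other library.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    `graph` maps every node to the list of nodes it links to;
    every link target must also appear as a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                # Split this node's rank evenly across its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical "who follows whom" network.
FOLLOWS = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}

ranks = pagerank(FOLLOWS)
most_central = max(ranks, key=ranks.get)
print(most_central)  # carol
```

Carol is followed by everyone else, so she ends up with the highest score; this is the "identify key people" use case from earlier in the conversation, reduced to a few lines.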
00:54:03 Dr Alessandro Negro And maybe with time, you know, certain types of practices will become a sort of standard, let's say, for certain types of applications like, 00:54:15 Dr Alessandro Negro as I said, recommendation, for example, or fraud detection. So start with simple, basic databases, and then you will see that you will ask for more and more. 00:54:24 Dr Genevieve Hayes Well, one thing I found really useful when I was getting started with graph databases was the Neo4j sandboxes. 00:54:31 Dr Alessandro Negro Oh yes. 00:54:32 Dr Genevieve Hayes Yeah, so these are temporary environments that you can create that have Neo4j preloaded. 00:54:38 Dr Genevieve Hayes And they come with test data, and you can experiment with the various graph algorithms in them. 00:54:44 Dr Alessandro Negro Yeah, absolutely, there are plenty of examples in that sense. But also, you know, if you search for data sets, there are plenty of examples of simple graph databases that you can easily import. You know, they are almost all in a CSV format, 00:55:05 Dr Alessandro Negro with a clear explanation of what they 00:55:08 Dr Alessandro Negro contain, and you can really import them in the easiest way. So, like, run a LOAD CSV command, for example, in Neo4j, and once you have the database, and you also have this Graph Data Science library that is available with Neo4j, you can easily run 00:55:28 Dr Alessandro Negro over 70 different algorithms 00:55:31 Dr Alessandro Negro on top of 00:55:32 Dr Alessandro Negro the database and see what comes out, and you will definitely find something interesting, you know, some story, let's say, around the data to tell, and that's a good start. 00:55:46 Dr Genevieve Hayes So is there anything on your radar in the AI, data and analytics space that you think is going to become important in the next three to five years?
00:55:54 Dr Alessandro Negro Well, definitely, as I mentioned, there is this explainable AI that is clearly recurring more and more often. You know, because, let's say, more domains are approaching the graph space specifically, but also machine learning in general, 00:56:14 Dr Alessandro Negro questions are coming up in this sense, you know: how can I explain why the machine is taking a certain type of decision? Another very relevant aspect is 00:56:25 Dr Alessandro Negro also what we were discussing before about this annotation process, you know, with the specific goal of extracting relevant information out of text. Annotation is a key, let's say, step to help people to, let's say, teach the 00:56:46 Dr Alessandro Negro machine how to recognise certain types of things. Once you have the entities 00:56:50 Dr Alessandro Negro in the graph, 00:56:50 Dr Alessandro Negro well, you know, many, many things can be done on top of it. But unfortunately, without this step, clearly it will be difficult to do so. So again, annotation can. 00:57:05 Dr Alessandro Negro Explainable AI I mentioned already. I also see a lot of interest around these question answering systems, you know, and in this area specifically it's not all about, 00:57:19 Dr Alessandro Negro you know, there is this new trend about 00:57:21 Dr Alessandro Negro ChatGPT, in which you ask something and you get 00:57:24 Dr Alessandro Negro a very big paragraph that describes what you asked for, but there is another, 00:57:32 Dr Alessandro Negro let's say, tiny 00:57:34 Dr Alessandro Negro area in which, when you ask a question, you would like to get a precise 00:57:40 Dr Alessandro Negro answer, like a number, like the name of a disease, and not an entire paragraph, you know, and in this area specifically, graphs have a key role. 00:57:50 Dr Alessandro Negro Because what happens?
00:57:51 Dr Alessandro Negro In many of these, let's say, studies, what they do is to take the question and convert it literally into a query, 00:58:00 Dr Alessandro Negro generally a SPARQL query or a Cypher query, and then they use this query to access a graph and get literally the answer. So a set of nodes or a set of 00:58:12 Dr Alessandro Negro relationships coming out from this, let's say, from this 00:58:16 Dr Alessandro Negro graph. So it's a 00:58:17 Dr Alessandro Negro totally different type of question answering system, because in the first case you ask ChatGPT and obtain an explanation. It is cool. 00:58:25 Dr Alessandro Negro It is fine, absolutely. I also tried it a couple of days ago and it's super, you know, fun 00:58:33 Dr Alessandro Negro to get the answers even to complex questions like what is the meaning 00:58:37 Dr Alessandro Negro of life. You know, we 00:58:38 Dr Alessandro Negro played a lot. Definitely useful, but there are many, many other use cases in which you don't want to get a paragraph, you would like to get an answer, a number, again, a specific set of numbers, 00:58:50 Dr Alessandro Negro for example, you know, based on your question. In this case, knowledge graphs are playing a key role because 00:58:59 Dr Alessandro Negro they contain information, let's say, structured in a way that is not text. You know, it's nodes and relationships, so it's much easier for the model to extract out of these specific answers that are 00:59:11 Dr Alessandro Negro not paragraphs. 00:59:13 Dr Genevieve Hayes So if I asked what's the population of Australia, it would extract the keywords population and Australia, convert that into some sort of query and return 20 something million. 00:59:25 Dr Alessandro Negro Yeah, exactly. That's the purpose of 00:59:27 Dr Alessandro Negro this question answering system. 00:59:28 Dr Alessandro Negro You know, totally different than.
00:59:30 Dr Alessandro Negro A chat bot, that of course has, let's say, other types of issues, like keeping the conversation going, you 00:59:35 Dr Alessandro Negro know, eventually keeping 00:59:36 Dr Alessandro Negro context. But in this specific case you would like to get the number, you know, because you have a specific question you would like answered. 00:59:44 Dr Alessandro Negro You don't want a paragraph describing where Australia is and when it was discovered. You know, you would like to have a number, like: OK, this is the population. 00:59:56 Dr Alessandro Negro You know, for these types of questions, let's say, ChatGPT or similar types of conversational AI bots cannot be helpful. 01:00:08 Dr Genevieve Hayes What final advice would you give to data scientists looking to create business value 01:00:12 Dr Genevieve Hayes from data? 01:00:14 Dr Alessandro Negro First of all, focus on the, let's say, on 01:00:17 Dr Alessandro Negro the business case. 01:00:19 Dr Alessandro Negro You know, because what I noticed 01:00:21 Dr Alessandro Negro in the past, 01:00:23 Dr Alessandro Negro even for us, is that when you look at a certain type of domain or a problem, what 01:00:28 Dr Alessandro Negro you do first is to 01:00:29 Dr Alessandro Negro collect all the data that you can and then try to get 01:00:32 Dr Alessandro Negro out some answer. 01:00:34 Dr Alessandro Negro We should start from a totally different approach. You know, we should use this CRISP-DM, let's say, approach, that is the Cross-Industry Standard Process for Data Mining. 01:00:46 Dr Alessandro Negro That is a very good standard, you know, even when you come to machine learning in general. The interesting thing is 01:00:53 Dr Alessandro Negro that you should 01:00:54 Dr Alessandro Negro always start from the business case, so you should first understand the business and the goal that this business has. So ask yourself, OK.
01:01:03 Dr Alessandro Negro What is the value that they would like to get out of this? Once you have this information, you should look at the data and extract only the relevant information that you need. 01:01:12 Dr Alessandro Negro You know, the relevant portion of this data, that is the bare minimum to accomplish the task that you would like to accomplish. It is totally different than before. 01:01:22 Dr Alessandro Negro I mentioned already this data lake issue. It was exactly this bottom up approach in which what you do is to say OK, I have this bunch of data. 01:01:30 Dr Alessandro Negro Let's put everything together and then data scientists will do their job, and it was a 01:01:35 Dr Alessandro Negro nightmare, you know? 01:01:36 Dr Alessandro Negro Because you have this data lake full of useless data, whatever, transactional data, unstructured data, and these poor data scientists have to really go through this lake and find a very tiny amount of data, you know, distributed across this huge 01:01:57 Dr Alessandro Negro set of information. Start with the problem, you know, and focus on the value that the solution to this problem can deliver. Then go back to the data and say OK, 01:02:09 Dr Alessandro Negro where is the minimum set of information that I can extract, and how can I extract and use this for solving my problem? 01:02:18 Dr Alessandro Negro Most probably it won't be enough. 01:02:20 Dr Alessandro Negro But at least you will reach 01:02:21 Dr Alessandro Negro your scope immediately, and then you can reiterate and, for example, extend the set of data that you're using, or verify that the results are correct. 01:02:30 Dr Alessandro Negro And iterate again and again and again, and finally you will get the result much faster than starting instead by looking first for your data and spending 80% of your time just cleaning 01:02:39 Dr Alessandro Negro things that you don't care about. So that's my personal suggestion.
01:02:43 Dr Genevieve Hayes Basically, you're better off trying to fish in a barrel rather than trying to fish in the whole Pacific Ocean. 01:02:50 Dr Alessandro Negro Absolutely yes. 01:02:51 Dr Genevieve Hayes So that's about all we've got time for today, Alessandro. For listeners who want to learn more about you or get in contact, what can they do? 01:03:01 Dr Alessandro Negro Well, definitely they can send me an email at alessandro@grabber.com or they can search for my name on LinkedIn or just on Google, because there are plenty of my talks, and use my book if they want, or my books now. I mean, the second one is still in MEAP, which means that it's still not fully available. 01:03:22 Dr Alessandro Negro And you know, there are plenty of references for them to learn from. And if they have any questions they can reach out to me through LinkedIn or through email or whatever other means they prefer. Even come to lecture and visit me in the office. 01:03:40 Dr Genevieve Hayes And I'll put a link to your LinkedIn page and to your books in the show notes. 01:03:44 Dr Alessandro Negro Thank you for that. 01:03:46 Dr Genevieve Hayes So thanks very much for joining me here today, Alessandro. 01:03:51 Dr Alessandro Negro Thank you for inviting me again. It was a great pleasure speaking with you. Definitely a lot of interesting questions. 01:03:57 Dr Genevieve Hayes And for those in the audience, thank you for listening. I'm Dr Genevieve Hayes and this has been Value Driven Data Science, brought to you by Genevieve Hayes Consulting.