00:00:00 Dr Genevieve Hayes Hello and welcome to Value Driven Data Science, brought to you by Genevieve Hayes Consulting. I'm Dr Genevieve Hayes, and today I'm joined by Dr Jeroen Vendrig to discuss the challenges of making AI commercially viable. 00:00:17 Dr Genevieve Hayes Jeroen is the chief technology officer of ProofTec, 00:00:22 Dr Genevieve Hayes an Australian technology startup specialising in the development of AI-driven software for damage detection and assessment for high-value assets. 00:00:33 Dr Genevieve Hayes He has over 20 years' experience in video analytics for world-leading R&D labs 00:00:40 Dr Genevieve Hayes and has over 25 patents in force. 00:00:43 Dr Genevieve Hayes Jeroen, welcome to the show. 00:00:46 Dr Jeroen Vendrig Thanks for having me, Genevieve. I've listened to some episodes and they're very interesting, and I hope we can give the same value to your listeners. 00:00:53 Dr Genevieve Hayes I hope so too. Many data scientists dream of using their skills to develop groundbreaking AI technology, but how do you manage to translate those dreams into commercially viable products? To be honest, I suspect most data scientists haven't got the faintest idea of where to even begin 00:01:13 Dr Genevieve Hayes doing so. Yet this is something you've managed to successfully achieve through your own startup, ProofTec, 00:01:21 Dr Genevieve Hayes and that's something I'd really like to explore in this episode. 00:01:25 Dr Genevieve Hayes However, for listeners who haven't come across ProofTec before, could you begin by telling us a bit about it and how it makes use of AI in its products? 00:01:36 Dr Jeroen Vendrig So as you said, my background is in computer vision, so I've always been looking for problems to attack in that domain, and I found a co-founder who is not technical at all 00:01:48 Dr Jeroen Vendrig but had a problem, basically, with damages that needed to be detected, and so we joined forces to do that. 
00:01:57 Dr Jeroen Vendrig And what we basically do, technically, comes down to two key things. We detect anomalies in our datasets, and those anomalies eventually are likely to correspond to damages, which is what 00:02:08 Dr Jeroen Vendrig our users are interested in. But the second aspect to it, which is often overlooked, is that not only do we detect anomalies, 00:02:17 Dr Jeroen Vendrig we try to project them back to the real-world object, if you like. To make it concrete, we do a lot of work with cars as assets, 00:02:26 Dr Jeroen Vendrig and we monitor them over time. If we find an anomaly in one part of an image, and next week we do it again and we find an anomaly in a part 00:02:35 Dr Jeroen Vendrig of the image, we somehow need to say that's the same spot on the car. 00:02:39 Dr Jeroen Vendrig And we do a lot of work on that as well. So those are the key things that our AI engines work on. 00:02:47 Dr Jeroen Vendrig At my last count, I don't actually even know, but we have four or five neural networks plus some other, more traditional computer vision techniques in our system. So there's a big pipeline where each part of it 00:03:00 Dr Jeroen Vendrig has a different task to fulfil, so it's not just one AI module that is being deployed. 00:03:07 Dr Genevieve Hayes So it's four or five sequential, I assume, convolutional neural networks? 00:03:12 Dr Jeroen Vendrig That's correct. There are different ways to acquire data, for example, so different networks are 00:03:17 Dr Jeroen Vendrig built for that, and eventually they all come together and go through the same process. 00:03:22 Dr Genevieve Hayes Do you have to use different types of networks for different types of vehicles? So, for example, for a car versus a motorbike? 00:03:30 Dr Jeroen Vendrig You could, but I don't think so. We haven't tried motorbikes. You could specialise, but in the end there is a power in having it more generic, 00:03:40 Dr Jeroen Vendrig and an interesting example here is this. 
So we've been doing this from the start for the use case of cars, and when we started out, we were actually driving cars on our own driveway, and we got a question from somebody: well, can we detect damage on infrastructure assets like mobile phone towers? 00:04:00 Dr Jeroen Vendrig Well, it hasn't been made for that. But you know what? When we do this in our driveway, we actually detect dents on the fence. We don't do that anymore because we mask out the car at the moment. 00:04:11 Dr Jeroen Vendrig But that's how we know: yes, we can do that, because in the end it's an anomaly in the surface, and it's a very different use case. 00:04:19 Dr Jeroen Vendrig You can use the same model for that. If you have a big enough dataset, and, spoiler alert, usually in AI you don't have that, but if you do, yes, then you can specialise in that and there would be a benefit. 00:04:32 Dr Jeroen Vendrig But when you have smaller datasets, it's actually better to use the same model for different purposes. 00:04:39 Dr Genevieve Hayes I remember, this was a couple of years ago, reading, and I think this might be an urban myth, about one of the earliest neural networks, which was used for detecting American tanks versus enemy tanks. Apparently one of the reasons why it didn't work was because the images they had of the American 00:05:00 Dr Genevieve Hayes tanks were all nice, clear images that were taken in good light 00:05:04 Dr Genevieve Hayes and up close, whereas the images of the enemy tanks were at a distance, in battle, et cetera. Did you find when you first started building your neural networks that there were issues to do with where the photos of the cars were taken? 00:05:23 Dr Jeroen Vendrig No, but on that topic, what you say, I think, is a true 00:05:27 Dr Jeroen Vendrig story, and in fact yesterday one of my staff members came with exactly that problem. 
00:05:33 Dr Jeroen Vendrig We found out that when we put something in the neural networks, we masked out the background to focus on our area of interest. 00:05:42 Dr Jeroen Vendrig And this particular neural network that we tried actually started to learn the shapes of what we masked out rather than the content. 00:05:50 Dr Jeroen Vendrig And that's a similar case. Now, we happen to know from domain knowledge that in this case the shape doesn't actually matter. It doesn't correlate to the labels we want to detect. So yes, these problems happen regularly. 00:06:02 Dr Genevieve Hayes You've had experience working in both academia and the commercial world. What are the key differences you've found between doing data science or AI in an academic setting compared to doing it in the commercial world? 00:06:17 Dr Jeroen Vendrig Let me narrow it down a bit, because that's a very broad question. I've been in commercial R&D with a focus on computer 00:06:23 Dr Jeroen Vendrig vision, and there are plenty of differences there, so we can start there. It will be different in other fields, or it will be different if you just apply AI as a component rather than doing the R&D on it. 00:06:35 Dr Jeroen Vendrig The key difference won't surprise you: there's a business problem or product vision. And unlike in academia, you can't really pick your own problem. 00:06:44 Dr Jeroen Vendrig So it's the use case that drives everything. Having said that, you still have some freedom in that, to go in the right direction, where the best business opportunity matches the technical feasibility. 00:06:56 Dr Jeroen Vendrig And there are many consequences of that, which drive everything you do, basically. The first one is your problem definition. The second one, 00:07:05 Dr Jeroen Vendrig and this is a bit more specific to computer vision, is the constraints that you can apply. The third one is more generic. I call it budget, but you should think of it as 
00:07:15 Dr Jeroen Vendrig scalability of the deployment. The fourth one is data, your favourite topic, and then there is context. Maybe we can go through them one by one, but that is something that's just not explored in academia. 00:07:29 Dr Jeroen Vendrig The first one I mentioned is the problem definition, and I know you've been talking with other guests about it as well, and it's a very general problem. 00:07:36 Dr Jeroen Vendrig In fact, it's not even specific to AI, but it's a real battle to find out what the actual business objectives are behind the stated objectives. Often the kind people in the business 00:07:49 Dr Jeroen Vendrig will think along with you, and they basically say, well, the problem is that I want a solution and I don't have this 00:07:55 Dr Jeroen Vendrig solution. So you've got to get them back to: well, what's the actual problem? Because they often don't understand that themselves. 00:08:02 Dr Jeroen Vendrig To be frank, even though I've been in this business world for quite a while, it still surprises me that the business people don't understand their own problem. You really need to guide them through it. And one of the reasons, I believe, 00:08:16 Dr Jeroen Vendrig is that in the business world everything is quite fluid. Everything is negotiable. There's a large extent of hand-waving going on. But in the end, what we're going to end up with is a loss function that represents this business objective, and those are very far apart. 00:08:32 Dr Jeroen Vendrig So you need to formalise what these problems are step by step, and I actually have a plan of steps to do that, which I try to apply if I can, to basically ease people in and get a nice transition to the point where the business people can sign off on something that is formalised and where the technical people understand it enough that they can take it over. 00:08:53 Dr Jeroen Vendrig And usually I would be the bridge between that. 
00:08:56 Dr Genevieve Hayes That's a bridge between a business problem and an analytics problem. 00:09:00 Dr Jeroen Vendrig Correct. So I use something which I call key technology indicators. They're basically high-level evaluation criteria that measure success. They're not the actual evaluation criteria, but they're the ones that business people can understand. The simple ones are like: I want results within 5 seconds. So the 00:09:20 Dr Jeroen Vendrig dimension there is time. But usually you get more involved in the use case. Everybody likes to use accuracy, but accuracy itself, as a technical measure, often is not suitable, so you've got to really go in depth: 00:09:33 Dr Jeroen Vendrig what does that mean? What you get then, first, is a way to assess the project success in the end. 00:09:39 Dr Jeroen Vendrig But usually those key technology indicators are not easy to quantify. You usually can't use them to run your experiments on, so you have to translate them into proxies. 00:09:52 Dr Jeroen Vendrig And if you've made that translation, then basically you have the business side and the technical side happy, and both of them 00:09:59 Dr Jeroen Vendrig can then do their work on top of that. 00:10:02 Dr Genevieve Hayes Could you give an example of one 00:10:03 Dr Genevieve Hayes of the proxies? 00:10:04 Dr Jeroen Vendrig Well, the proxies, for example, the simple ones that everybody knows are recall and precision. That is very hard to understand for people on the business side. I mean, I pick a hard one here because there's immediately a trade-off in there, 00:10:19 Dr Jeroen Vendrig and trade-offs are not something that they're very comfortable with. But what you usually want to do, if that's a suitable measure, is fix one of those. 00:10:28 Dr Jeroen Vendrig So you go to the use case and you can say, OK, well, maybe we can fix the precision at, say, 50%, right? And translate it differently. 
And now you can use recall 00:10:39 Dr Jeroen Vendrig as a measure, because one measure they can understand, and that is how you marry those. 00:10:45 Dr Jeroen Vendrig But often you get more complex ones. This was not with ProofTec, but we had one where we actually considered patenting the evaluation criterion, because it was quite convoluted, but it was representing what the business needed. This had to do with tracking people, a particular way of tracking 00:11:05 Dr Jeroen Vendrig people that was suitable for the applications that that business had. 00:11:10 Dr Genevieve Hayes Did you end up patenting it in the end? 00:11:12 Dr Jeroen Vendrig To be frank, I don't remember. We might not have. There are some enforceability issues with that. But the reason we considered it was that we thought we could even use it as a marketing thing. We could set the baseline with this and basically force competitors to use this evaluation criterion, 00:11:34 Dr Jeroen Vendrig which would give us an advantage because we'd been thinking about it for a year longer than they had. But sorry, I don't remember how it ended up. 00:11:41 Dr Genevieve Hayes As someone from an academic background, what I'm hearing you say 00:11:46 Dr Genevieve Hayes is that the way you look at things in the commercial or business world is in terms of patents, how you can patent IP, whereas in the academic world a lot of the focus is on publishing research in academic papers. Is that a good analogy? 00:12:05 Dr Jeroen Vendrig Yes, in some ways. I was previously working for Canon. Canon is top three in patenting in the world, so at least in the R&D departments, everything is about 00:12:13 Dr Genevieve Hayes OK. 00:12:17 Dr Jeroen Vendrig that, and you're absolutely right about saying that. If we looked at a problem, sometimes you would say, oh, we can solve that, but there's no way to patent it. 
00:12:26 Dr Jeroen Vendrig Then we wouldn't do it. We would send it to another department, but it would not be done 00:12:30 Dr Jeroen Vendrig by the R&D department. 00:12:32 Dr Jeroen Vendrig But the other side of the story is that what you need to do for patents is not so different from publishing a paper, and in fact it actually forces you to do better science. 00:12:44 Dr Jeroen Vendrig Because the thing about patents is, it's not going to peer review, it's going to a reviewer from the Patent Office. 00:12:52 Dr Jeroen Vendrig And basically, as a standard response, he's going to say 00:12:56 Dr Jeroen Vendrig the work that you did is obvious, and this is a particular legal term, though the word reflects what it means. 00:13:04 Dr Jeroen Vendrig What you can't do in patents is just take some existing techniques, put them together and say, OK, I've got something new, which you do, but that's not quite good enough for a patent. On the other hand, in academia, you can do that. 00:13:17 Dr Jeroen Vendrig And there are lots and lots of papers that do that. Maybe sometimes there's value in it, but I stopped reading them myself, while in patents you really have to focus on making a contribution 00:13:30 Dr Jeroen Vendrig yourself. So you can still put things together, but there needs to be something difficult about putting them together, right? 00:13:37 Dr Jeroen Vendrig Something where you need to be, well, creative or intelligent about doing that. I call it the glue. You can patent the glue. 00:13:45 Dr Genevieve Hayes It's the spark of inspiration. 00:13:47 Dr Jeroen Vendrig Yes, although you know it's 90% perspiration. And those are the academic papers that get the high citations, right? That would be the equivalent. 00:13:59 Dr Genevieve Hayes Like Larry Page and Sergey Brin's Google algorithm paper. 
00:14:04 Dr Jeroen Vendrig Yeah, and there are still enough papers like that, but that's out of the maybe millions of papers that are being published. And so that really focuses you on your contribution to the art. 00:14:18 Dr Genevieve Hayes Once you've got the patent, do companies typically publish any of the research as research papers, or do you just keep that patent locked away in a safe so that no one knows 00:14:30 Dr Genevieve Hayes about it? 00:14:31 Dr Jeroen Vendrig Getting a patent is very hard and it takes a very long time. It can take like five years, so by then nobody is interested in the paper anymore. But what usually happens is you file for the patent, 00:14:43 Dr Jeroen Vendrig so you have the application, and it's kind of like timestamping it. Once you've timestamped it, you can publish it. Now, Canon 00:14:51 Dr Jeroen Vendrig was very conservative, so we didn't do that much. But other companies are much quicker in 00:14:56 Dr Jeroen Vendrig that, and then you can basically publish that paper and it goes through the community before the application is even made visible. 00:15:06 Dr Genevieve Hayes And you often see that with things like Facebook or Google. Often there'll be research papers where every single researcher belongs to one of those companies. 00:15:15 Dr Jeroen Vendrig And you can come back a year later and search for the applications, and you'll 00:15:20 Dr Jeroen Vendrig find that there's something underlying that paper, which is usually multiple patents. 00:15:25 Dr Genevieve Hayes When you're performing commercial research, do you draw much on existing academic research? 00:15:30 Dr Genevieve Hayes So, on other journal articles? 00:15:33 Dr Jeroen Vendrig Yes. Things now are different from when I did my PhD. Right now it's kind of the expectation that people publish their code. 
00:15:42 Dr Jeroen Vendrig It's not required, but most people do, so it's even better than a paper, right? You can go to the actual code, and they usually have a nice explanation of it as well. 00:15:53 Dr Jeroen Vendrig So that might be the first port of call, and then you say, well, this is really going somewhere, 00:15:59 Dr Jeroen Vendrig now I'm going to read the paper. So it's kind of the other way around. There's definitely a lot of interesting stuff out there. 00:16:07 Dr Jeroen Vendrig And I also love to have a look at the site Papers with Code. I'm not sure if you're familiar with that. 00:16:12 Dr Genevieve Hayes Oh yeah, yeah, I know that. 00:16:14 Dr Jeroen Vendrig It has some disadvantages, so you have to be careful using it. 00:16:18 Dr Jeroen Vendrig But on the one hand, it's great. It's where everything comes together. You can see how they benchmarked, and you go straight into the GitHub from there, or to the papers on arXiv. So it's a fantastic world to work in at the moment. 00:16:35 Dr Genevieve Hayes One of my previous guests, who works as an AI engineer for a metaverse, 00:16:40 Dr Genevieve Hayes commented that he often would look at the code associated with research papers, but he often found that because the code was written by an academic and not a professional software developer, if he was going to incorporate that into his work, he had to rewrite it so that it was 00:17:01 Dr Genevieve Hayes suitable for production. 00:17:04 Dr Genevieve Hayes Is that a challenge that you've encountered? 00:17:07 Dr Jeroen Vendrig Throughout my career? Well, definitely. We've actually done a little work with universities, and it may surprise you, but sometimes we wouldn't even run the code that they produced. 00:17:19 Dr Jeroen Vendrig We could already see that it was going to be a lot of work running this. Best case, we would use it as a 
00:17:26 Dr Jeroen Vendrig reference. It would never come anywhere near a product. But then even the R&D code, the commercial R&D code, which 00:17:33 Dr Jeroen Vendrig is closer to production quality, would usually be thrown away or also used as a reference when we put something in a product, especially if we put something on the chip, where you can't really afford any mistakes. 00:17:46 Dr Jeroen Vendrig There might be three or four versions of the code before it ends up in the product, but nowadays with software it's much shorter. 00:17:54 Dr Jeroen Vendrig And I have to say, I understand your guest's observation, but some of the stuff that is published in academia is actually pretty good. 00:18:04 Dr Jeroen Vendrig The software quality of what PhD students do has much improved, so some stuff is actually usable and it can go in production, maybe not on the chip, but if you run it on the cloud 00:18:15 Dr Jeroen Vendrig you can easily replace it. And to be frank, we have some open source code that we use. You can't launch a project without having 300 open source dependencies, and there are not many issues 00:18:27 Dr Jeroen Vendrig with that. However, some of the published things look interesting and they never make it, because indeed the code is just not good enough to try it. 00:18:36 Dr Jeroen Vendrig And then, yeah, you can improve it, or you can just take something that is slightly different but that's working. So you take the path of least resistance. 00:18:48 Dr Jeroen Vendrig I do find that sometimes these academic publications, or sorry, the code for them, is very much geared to the publication, and it's not so easy to repurpose it for something else. 00:19:01 Dr Jeroen Vendrig But that's not why these people made it. So if you want it, that should be your job. And if you're nice, you share that with the world as well. 00:19:10 Dr Genevieve Hayes What programming languages do you typically use? 
00:19:13 Dr Jeroen Vendrig As the CTO, I'm pretty hands-on. I'm all over everything. So I've used many, and on a given day I can be in like three different languages. 00:19:23 Dr Jeroen Vendrig Python is the most important one for our back end, of course with the understanding that the libraries that we use, OpenCV, TensorFlow, Torch, where all the grunt work is done, are written in other languages. For our front end we use TypeScript, which is basically JavaScript, and then we also have apps 00:19:43 Dr Jeroen Vendrig where we use Java and Swift. Then we are fully on the Amazon cloud. There are sometimes some weird Amazon languages 00:19:52 Dr Jeroen Vendrig that we have to use to do our thing. They're phasing them out, 00:19:55 Dr Jeroen Vendrig thank goodness. 00:19:56 Dr Genevieve Hayes Oh yeah. 00:19:56 Dr Jeroen Vendrig But the data science part of it is basically done in Python. 00:20:01 Dr Genevieve Hayes OK, for the data scientists who are listening, what are the most important Python packages from your point of view? 00:20:08 Dr Jeroen Vendrig Well, I already mentioned OpenCV, TensorFlow, Torch, but we actually use a lot of pandas as well. With pandas, we don't create models or anything, 00:20:21 Dr Jeroen Vendrig but to understand the data, pandas is a very powerful tool, the Swiss Army knife for data. So we use that a lot. 00:20:29 Dr Genevieve Hayes Does sklearn come up? 00:20:31 Dr Jeroen Vendrig Yes, we use that as well. And again, it's not really for the actual models, but it has some tools in there, especially for evaluation metrics, that we use. And sometimes we even use it for some image processing. So that's scikit-image. 00:20:46 Dr Genevieve Hayes OK, I haven't used scikit-image. I've usually used OpenCV for image 00:20:51 Dr Genevieve Hayes processing though. 
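[Editor's illustration, not part of the recording] The idea raised earlier in the conversation, fixing precision at an agreed level and then reporting recall as the single number the business sees, can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not ProofTec's evaluation code: the function name `recall_at_precision`, the example scores and labels, and the 0.75 target are all hypothetical. In practice a library routine such as scikit-learn's `precision_recall_curve` would likely supply the precision and recall values.

```python
from typing import List, Optional, Tuple


def recall_at_precision(
    scores: List[float], labels: List[int], min_precision: float
) -> Tuple[float, Optional[float]]:
    """Best recall over score thresholds whose precision is >= min_precision.

    Returns (recall, threshold), or (0.0, None) if no threshold qualifies.
    Hypothetical helper written for this illustration only.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("need at least one positive label")
    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    best_recall, best_threshold = 0.0, None
    true_pos = 0
    for rank, (score, label) in enumerate(ranked, start=1):
        true_pos += label
        # A threshold accepts every item with that score, so only
        # evaluate at the last item in a run of tied scores.
        if rank < len(ranked) and ranked[rank][0] == score:
            continue
        precision = true_pos / rank
        recall = true_pos / total_positives
        if precision >= min_precision and recall > best_recall:
            best_recall, best_threshold = recall, score
    return best_recall, best_threshold


# Made-up detector outputs: 1 = real damage, 0 = false alarm.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
recall, threshold = recall_at_precision(scores, labels, min_precision=0.75)
print(recall, threshold)  # 0.75 0.7
```

With precision pinned, the trade-off disappears from the business conversation: the only remaining question is what fraction of real damages is found.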
00:20:52 Dr Genevieve Hayes One thing that I was thinking: often you hear about these academic-commercial collaborations, and I've seen a couple of them in organisations that I've worked for, but I've never really seen them being all that successful. What are your thoughts on those? 00:21:10 Dr Jeroen Vendrig It's hard, but I have managed to do a couple of successful ones, and in fact when I left the University of Amsterdam, where I did my PhD, one of the things I worked on actually got commercialised. So that's my experience from the university side. 00:21:26 Dr Jeroen Vendrig And after that I've been doing it from the other side. When I was at university, that was a bit ad hoc, and it was actually quite funny, because I made something, 00:21:36 Dr Jeroen Vendrig I went on a sabbatical, and I came back after a few months, and the commercial guys that we were working with gave a very enthusiastic story about this product. 00:21:46 Dr Jeroen Vendrig They'd sold it to a big German outfit, 00:21:50 Dr Jeroen Vendrig and it was all great. It had a name and everything. And I said, well, that's great. And this happened in a few months? I was like, can you 00:21:57 Dr Jeroen Vendrig show me a 00:21:57 Dr Jeroen Vendrig demo? And they looked at me and said, no, this is your stuff that you made before you left. And that is the key point. If you come from a university viewpoint, 00:22:09 Dr Jeroen Vendrig you've got to have those good sales guys, or maybe business development would be a better way to name them, because they can really match that and reformulate what you did in a way 00:22:21 Dr Jeroen Vendrig that as an academic you just can't, because as an academic you want to say, oh yeah, but this and this, and, 00:22:27 Dr Jeroen Vendrig no, that's not exactly how it is. So they skim over all of that, get the essence out, and package it in a way that companies respond to. They express it as the value for the company, right? 
They're not going to talk about CNNs or 00:22:41 Dr Jeroen Vendrig the technology behind it, but they say, this is going to be the value for you as a company, and that's how it's done. 00:22:47 Dr Jeroen Vendrig Now, they productised it. They actually took the prototype. I just told you, don't do that. But they did take the university code. 00:22:54 Dr Jeroen Vendrig But I checked on them last year and it's still a successful business. They've diversified into completely different things now, 00:23:00 Dr Jeroen Vendrig but it's a company with 50 people now that basically came out of that original 00:23:05 Dr Jeroen Vendrig product. That's pretty good. That's a very successful spin-off, yes. 00:23:10 Dr Genevieve Hayes Where I've seen them be unsuccessful is when it ends up in this sort of buck-passing exercise. 00:23:18 Dr Genevieve Hayes The organisation wants the academics to come up with brilliant stuff, and then the academics want to be told what to do by the organisation. 00:23:27 Dr Genevieve Hayes And it just ends up 00:23:29 Dr Genevieve Hayes with money being thrown at the academics and everyone trying to pretend that nothing's happening, and yeah, it gets swept under the rug eventually. 00:23:39 Dr Jeroen Vendrig That's an interesting experience, because when I come from the commercial side, I see it's exactly the opposite. 00:23:46 Dr Genevieve Hayes OK. 00:23:47 Dr Jeroen Vendrig Going back to your question, what makes a successful one? You actually need an overlap between the research interests of the university group or the individuals 00:23:57 Dr Jeroen Vendrig and the potential to solve that problem for the company. 00:24:00 Dr Jeroen Vendrig But it requires the company to be quite mature in thinking about that. They're dealing with R&D there, so there's no certainty. 00:24:07 Dr Jeroen Vendrig So if they can't accept the risk, then something is wrong. 
You shouldn't deal with the university for that, or there are special departments in universities that do those things, but not with a research part. On the other hand, 00:24:20 Dr Jeroen Vendrig what happens if you're not aligned? In my experience, 00:24:23 Dr Jeroen Vendrig because there are some very good salespeople amongst the professors as well, they're just going to tell the company, 00:24:30 Dr Jeroen Vendrig yeah, we can do that, to whatever they say. And then once they've got the contract in, they just kind of shape the problem until it fits whatever they intended to do for their research. And that is not necessarily a happy marriage 00:24:44 Dr Jeroen Vendrig later. So you want to get those things clear from the start, but it requires the company side to understand the academics to some extent. 00:24:54 Dr Jeroen Vendrig So if the academics say, these are the problems we want to pursue, this is what we're doing, 00:25:00 Dr Jeroen Vendrig then they have to be able to understand that enough to say, oh, it can match this range of problems. 00:25:07 Dr Jeroen Vendrig And if you get that together, then you can actually have a very successful collaboration, because everybody's aligned. 00:25:12 Dr Genevieve Hayes I think that's an interesting point, that idea of the skill set matching a particular range of problems, because one of the things that I always tell my students when I'm teaching data science is that the business problem has to drive the solution, not the other way round. But you also have that situation where you've got someone who specialises 00:25:34 Dr Genevieve Hayes in a particular 00:25:36 Dr Genevieve Hayes skill set. For example, you specialise in computer vision; it would be a waste for you to go to work for a company that wasn't trying to solve computer vision problems. 00:25:48 Dr Genevieve Hayes So I think even though the problem has to drive the solution, people with a particular skill set need to 
00:25:56 Dr Genevieve Hayes match their skill set with a company that requires that skill set to solve the range of problems they're looking at. 00:26:03 Dr Jeroen Vendrig Yes, although you always have to be careful, because maybe a computer vision approach can be applied to a non-computer-vision problem, right? 00:26:11 Dr Jeroen Vendrig And if you want to do that, actually going to a university would be the right thing to do, because they can think at a bit more abstract level to jump between domains. But yes, in general, you're right. 00:26:23 Dr Genevieve Hayes When would you apply a computer vision solution to a non-computer-vision problem? 00:26:28 Dr Jeroen Vendrig So in the end, in computer vision, well, let's say in an image, you have two-dimensional data where the elements of the data are related to each other. 00:26:38 Dr Jeroen Vendrig So if you have other problems like that, you actually can call them an image, right, even though it might not be the traditional image as we know it. 00:26:47 Dr Jeroen Vendrig And there might be other problems like that. Sorry, off the top of my head I can't think of any, but there is often migration between topics, and computer vision itself borrows a lot from natural language processing. 00:27:01 Dr Jeroen Vendrig So basically, if you want to know what's going to happen two years from now in computer vision, just check the state of the art 00:27:07 Dr Jeroen Vendrig in natural language processing. 00:27:09 Dr Genevieve Hayes OK, so in natural language you've got a one-dimensional sequence of letters, whereas in computer vision you've got a two-dimensional matrix of pixels. Is that right? 00:27:19 Dr Jeroen Vendrig Yeah, yeah. Or at least, yeah, sometimes, yeah. 00:27:21 Dr Genevieve Hayes Or at least two-dimensional, possibly three if you've got colour. 00:27:25 Dr Jeroen Vendrig Yes, so that's why these techniques can't be used one-to-one. 
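[Editor's illustration, not part of the recording] The point that any two-dimensional data whose neighbouring elements are related can be treated as an "image" can be made concrete with a small sketch. Everything here is an assumption made for illustration: the grid of readings is made up, the names `neighbourhood_mean` and `local_anomalies` are hypothetical, and the simple neighbourhood comparison merely stands in for the far richer image techniques discussed in the episode.

```python
from typing import List, Tuple

Grid = List[List[float]]


def neighbourhood_mean(grid: Grid, row: int, col: int) -> float:
    """Mean of the up-to-8 neighbours of (row, col), excluding the cell itself."""
    values = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if (dr, dc) != (0, 0) and inside:
                values.append(grid[r][c])
    # A 1x1 grid has no neighbours; fall back to the cell's own value.
    return sum(values) / len(values) if values else grid[row][col]


def local_anomalies(grid: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Cells that differ from their neighbourhood mean by more than threshold."""
    return [
        (r, c)
        for r in range(len(grid))
        for c in range(len(grid[0]))
        if abs(grid[r][c] - neighbourhood_mean(grid, r, c)) > threshold
    ]


# Not a photo: imagine rows as time steps and columns as adjacent sensors.
readings = [
    [1.0, 1.0, 1.0, 1.0],
    [1.0, 1.0, 9.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]
print(local_anomalies(readings, threshold=4.0))  # [(1, 2)]
```

The same "compare each element with its spatial neighbours" idea is what a convolution exploits in a photograph, which is why such data can borrow image techniques.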
00:27:30 Dr Jeroen Vendrig But you see, for example, the transformers that have shown great success in natural language processing; they already have been transformed, if you like, 00:27:41 Dr Jeroen Vendrig into vision 00:27:42 Dr Jeroen Vendrig transformers, and they're not exactly the same, but the concepts behind them, the ideas behind them, are the same. 00:27:48 Dr Jeroen Vendrig At a very high level, they look at the context and the self-attention between parts of the input, and that can be a word in a sentence or it can be a spot in an image, and that's how these things 00:28:02 Dr Jeroen Vendrig transition. Now, not everything migrates, so I'm not sure. The convolutional neural networks, I think they're used in natural language processing as well, 00:28:11 Dr Jeroen Vendrig but those might actually have gone the other way around, being applied 00:28:15 Dr Jeroen Vendrig to those situations. 00:28:16 Dr Genevieve Hayes I always thought that image processing was the easier use case and that it went from image processing to natural language. So that's interesting that you say it's the other way around. 00:28:26 Dr Jeroen Vendrig Oh, tell me why, because I would think that natural language processing is much easier. 00:28:31 Dr Genevieve Hayes It just feels like there are more image processing use cases, or successful image processing use cases, but maybe that's just 00:28:40 Dr Genevieve Hayes an outside observer's point of view rather than an insider's point of view. 00:28:44 Dr Jeroen Vendrig OK. Well, that's very interesting. I think image processing is very hard. You said that in natural language processing you start with letters, but actually you start with words. 00:28:57 Dr Jeroen Vendrig So there's already so much semantic information, and basically you've narrowed it down so much already. While when you get an image, you have 00:29:04 Dr Jeroen Vendrig all these pixels 
00:29:06 Dr Jeroen Vendrig that by themselves are meaningless, and then you have these groups of pixels and you have to give them a meaning. 00:29:11 Dr Jeroen Vendrig But if the light is slightly different, these pixels completely transform, even though what you see has the same meaning. 00:29:18 Dr Jeroen Vendrig Those kinds of problems, you don't really have them in natural language. That's why I think it's a much harder problem. 00:29:24 Dr Genevieve Hayes What you're just saying reminds me of that example I've seen on the Internet where people get an image and they change one pixel 00:29:32 Dr Genevieve Hayes and it causes an object detector to think a dog's a banana or something insane, yeah. 00:29:37 Dr Jeroen Vendrig Absolutely, yeah. And this is very hard, and if I go back to the first question that you had: 00:29:44 Dr Jeroen Vendrig in computer vision, constraints are very important, and I actually have, well, it's not formal, but I have a checklist, right, of constraints that can apply to a problem, and there's like 40 00:29:56 Dr Jeroen Vendrig of them that generally happen, and when I come to a new domain, I actually go over all these 40 and check which ones do apply. 00:30:03 Dr Jeroen Vendrig Not all of them apply, but you often end up with like 20 constraints that you need to put in place just to be able to tackle this computer vision problem in some way. 00:30:14 Dr Genevieve Hayes Could you give us some examples of some of those constraints? 00:30:17 Dr Jeroen Vendrig At the extreme side of computer vision, you have machine vision, so that's where you control everything. You control the lighting, for example, and lighting is a very important one. Not only can it blind your cameras, and of course you can't do anything with that, 00:30:32 Dr Jeroen Vendrig but it can completely change how you represent what is happening in the
00:30:37 Dr Jeroen Vendrig real world. When you work more in the wild, as we call it, and that's the topic I work on, 00:30:43 Dr Jeroen Vendrig you don't have a whole lot of control over those circumstances. You can still do things about that. You can provide some shade, for example. 00:30:52 Dr Jeroen Vendrig If people take images, you tell them to do it in the shade. Everybody knows it: don't take photos against direct sunlight. So that's already constraining it a 00:31:01 Dr Jeroen Vendrig bit, but you can do much more than that. In fact, one of our applications at proof tech is where people take the images. 00:31:09 Dr Jeroen Vendrig So they go around the car and take images, and if that's the only brief we give them, it's going to be a complete disaster. 00:31:15 Dr Jeroen Vendrig We can't do much with that from a computer vision perspective. So we put constraints on them; we tell them 00:31:20 Dr Jeroen Vendrig how to go 00:31:22 Dr Jeroen Vendrig along the car. We actually have a little neural network that tells them how close they should be to the car and makes sure they have the right angle, because as soon as you change the angle you get all sorts of 3D effects that make the problem, well, not infinitely, but a whole lot more complex. And what you see, for example, 00:31:42 Dr Jeroen Vendrig in the manual, and nobody reads those pages, 00:31:45 Dr Jeroen Vendrig but if you buy a Canon product, there are a lot of exclusions, simple ones that say don't use this at night, or sometimes they say yes, you can use it at night. 00:31:53 Dr Jeroen Vendrig So those are all constraints under which the application works and has been tested, and maybe you can try what happens if you don't put a constraint in place, right? What happens if you do use it at night? 00:32:05 Dr Jeroen Vendrig Maybe it works, maybe it doesn't, but that's not what the work has been focusing on. And when we look at product releases,
00:32:12 Dr Jeroen Vendrig we often say: OK, we applied those 20 constraints, and in version two we can now lift some of those constraints. 00:32:18 Dr Jeroen Vendrig In version three we lift more of those constraints, but at some point we reach the limit and say: OK, now certain constraints need to be in place, otherwise the problem is too open. And that is actually what you 00:32:32 Dr Jeroen Vendrig spend a lot of time 00:32:33 Dr Jeroen Vendrig on. And in terms of evaluation criteria: constraints are not evaluation criteria, but they're kind of hanging in the same space, so they're very important. And when we go to the difference between business and academia, this is an important difference, because you can't pick your constraints. You have to negotiate what is reasonable with 00:32:54 Dr Jeroen Vendrig users. You can't just determine them, and it's probably completely valid to do that in academia. But if you put the wrong constraints in, then nobody will buy your product. 00:33:03 Dr Genevieve Hayes With these neural networks that underpin proof tech's products, there must have been a point where you just had no data to work with. 00:33:13 Dr Genevieve Hayes How do you deal with the situation where you have absolutely no data at all? I mean, how did you create your original data set? 00:33:22 Dr Jeroen Vendrig Yeah, absolutely. So at first at proof tech we had zero data. It didn't scare me because it was not the first time I was in that situation. In fact, I did my PhD in the 90s, and back then there were no standard data sets. 00:33:37 Dr Jeroen Vendrig In fact, I think there was one standard image, Lena, but I was in video so I couldn't even use that. 00:33:48 Dr Jeroen Vendrig So I've always been working on making my own datasets. I can't even think of a situation where I started out with a usable dataset.
00:33:57 Dr Jeroen Vendrig If you're lucky, there is data, but often you don't have any data, and if there is data, there are no 00:34:03 Dr Jeroen Vendrig labels. We just talked about how all these pixels are pretty meaningless, so without labels you're really flying blind. 00:34:10 Dr Jeroen Vendrig You're not going to make any sense of it. And then if you do have labels, in the very lucky situation, they're inconsistent, and usually you end up throwing them away and relabeling them. So, creating your own 00:34:23 Dr Jeroen Vendrig data set: it depends a bit on the situation. I've done a lot of work in surveillance, so security settings, 00:34:31 Dr Jeroen Vendrig where it's very hard to get your hands on actual footage. So we actually reenact. Sometimes we walked in front of video cameras ourselves, and once, with a talent agency, we hired a bunch of actors to walk around, dressed in different things, et cetera. This was for a particular topic that we were doing. 00:34:52 Dr Jeroen Vendrig I can tell you I understand now what's going on on a film set, because the logistics are enormous. 00:34:58 Dr Jeroen Vendrig But basically we did a couple of days of recording like that just to create our own data set and have that variety in it. 00:35:06 Dr Jeroen Vendrig For what we're doing now, we were interested in damages, so we actually went to a yard near the airport and we just photographed a lot of rental cars. Fortunately, they do have a lot of damages. 00:35:19 Dr Jeroen Vendrig Initially I also sometimes went around in my neighbourhood here. I don't know what it is, I think there are bad drivers in my neighbourhood, there are lots of interesting damages on the cars. 00:35:29 Dr Jeroen Vendrig And that's how we start things and we can build our 00:35:32 Dr Jeroen Vendrig model. Now, at the same time, and I guess in parallel, I worked with a company that had this concept of a P0, and I'm
00:35:40 Dr Jeroen Vendrig I'm not sure if it's a common term. Have you heard of it? The P stands for product. 00:35:43 Dr Genevieve Hayes No, I haven't. 00:35:46 Dr Jeroen Vendrig So it's product zero. Product zero doesn't have any fancy data models in it. Its only purpose is to collect data. 00:35:55 Dr Jeroen Vendrig I would say it goes into the data stream of some use case and taps it off. What they were very good at is making that P0 very 00:36:04 Dr Jeroen Vendrig nice, so that people were compelled to use it, because that's the whole point: use our P0 and, in return, you give us your data. We start collecting it, and maybe we ask some questions 00:36:16 Dr Jeroen Vendrig so we get some labels from it, and then we can power up models. And once you have a first model that is halfway decent, 00:36:25 Dr Jeroen Vendrig you can show and convince people: hey, look, this is what we can do with it. It's not quite there yet, but there is a reward if you share your data with us, if you help us with it. 00:36:36 Dr Jeroen Vendrig And it keeps growing and growing, and you get this cycle of getting data in. What it means, compared to academia or data science where you have big data sets already, is that there's a bit of bias in your data sets, inevitably, because you're creating them; you're going with the flow, so to speak, 00:36:57 Dr Jeroen Vendrig where you can get the data. 00:36:58 Dr Jeroen Vendrig So it's not necessarily the distribution of the real world. That's one of the dangers that you have there. 00:37:06 Dr Jeroen Vendrig I do indeed find that hard. So now we have a good model and we want to make it better. 00:37:11 Dr Jeroen Vendrig False positives are very easy to get feedback on, but with missed detections, basically you don't know what you missed. 00:37:18 Dr Jeroen Vendrig You don't get feedback on it usually, so those are the hard ones when you have this kind of biased data set acquisition approach.
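The asymmetry described here, where feedback on shown detections is easy to get but missed detections stay invisible, can be sketched in a few lines of Python. This is an illustration only: the detection IDs, the two scenarios and the function names are made up for the example and do not describe proof tech's system.

```python
def precision_from_feedback(feedback):
    """Share of shown detections that the user confirmed as real damage."""
    return sum(feedback) / len(feedback)


def recall(detected, actual):
    """Share of the actual damages that were detected (needs the full truth)."""
    return len(detected & actual) / len(actual)


# Hypothetical detections shown to the user for one car.
detected = {"d1", "d2", "d3", "d4"}

# Two hypothetical versions of reality that user feedback cannot tell apart.
actual_a = {"d1", "d2", "d3"}                    # every damage was found
actual_b = {"d1", "d2", "d3", "d5", "d6", "d7"}  # three damages were missed

# The user only sees the detections, so feedback covers only those items.
feedback_a = [d in actual_a for d in sorted(detected)]
feedback_b = [d in actual_b for d in sorted(detected)]

precision_a = precision_from_feedback(feedback_a)
precision_b = precision_from_feedback(feedback_b)
recall_a = recall(detected, actual_a)
recall_b = recall(detected, actual_b)

print(precision_a, precision_b)  # same in both scenarios: the feedback is identical
print(recall_a, recall_b)        # differs, but only computable with complete labels
```

In both scenarios the user feedback is the same, so precision looks the same; the difference in recall only shows up if someone has labelled every damage on the car, which is exactly the expensive part.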
00:37:27 Dr Genevieve Hayes If you've got missed detections, would you even know that you had a missed detection in order to record it as a false negative? 00:37:36 Dr Jeroen Vendrig Absolutely not. And in fact, two months ago we made a new model. So not just another iteration; we actually started from scratch with it. 00:37:46 Dr Jeroen Vendrig We ran it on our data sets and we had false positives. So we looked at 00:37:49 Dr Jeroen Vendrig the false positives. 00:37:51 Dr Jeroen Vendrig Except they're not false positives. They're true detections; we just hadn't labelled them. In fact, 00:37:56 Dr Jeroen Vendrig we have one very small data set, a test set, and it's small because it's very expensive to make, that we've gone through multiple times. 00:38:07 Dr Jeroen Vendrig And say after going through it four times, the fifth time we still find new things that we have missed before as humans. 00:38:15 Dr Jeroen Vendrig We've basically gone through these things with a microscope and we still find new things in there. And just for context, when I say that we detect damage, this is not a big crash. 00:38:25 Dr Jeroen Vendrig We're detecting very small damages. They can be a few millimetres, maybe half a centimetre, and that's exactly why we make these things, because that's too hard for humans. 00:38:36 Dr Jeroen Vendrig A human 00:38:36 Dr Jeroen Vendrig can do 00:38:37 Dr Jeroen Vendrig it, but not at scale, and this is very common. People can do these tasks for about 10 minutes and after that they tune out. 00:38:45 Dr Jeroen Vendrig And that's what makes it so hard to do that labelling. So yeah, we don't know what we missed. 00:38:51 Dr Genevieve Hayes I was gonna ask, do humans do your labelling? So there's no magic way of doing it other than humans? 00:39:00 Dr Genevieve Hayes I found when I was building machine learning models in a particular organisation I was working at,
the hardest thing was always convincing people to label the data set for me, and you could never get it done. 00:39:14 Dr Jeroen Vendrig And that's why. So we haven't cracked that. What we're trying is similar to that P0 concept: you have to reward them for it. 00:39:21 Dr Jeroen Vendrig And I don't mean playing music or whatever, but getting them to have business value from that. So in our case we say: well, if you do label this, 00:39:31 Dr Jeroen Vendrig we automatically generate the reports that you need, and that's how we're trying to convince them to give that kind of feedback. 00:39:39 Dr Genevieve Hayes Yeah, I've heard some people have tried using things like Amazon Mechanical Turk, and I've never used it myself, but I've heard the results you get from it are terrible. 00:39:49 Dr Jeroen Vendrig I think it depends on what you try to do. If you want to label cats versus dogs, I'm sure 00:39:55 Dr Jeroen Vendrig it works fine. But in our case, experts do not agree on what the label should be, so there's no way a Mechanical Turk is going to do 00:40:04 Dr Jeroen Vendrig it. And in fact, we do use parties overseas, so basically low-cost countries, but they're not random people. These people have been trained for a few days 00:40:15 Dr Jeroen Vendrig in order to 00:40:16 Dr Jeroen Vendrig digest the guidelines that we provide them. It's like 25 pages of guidelines right now. There are a lot of pages because there are images in there, but there's a lot in it and it's still not enough. So it's worth spending a lot of time on that. 00:40:33 Dr Genevieve Hayes With the products that you're developing, they're obviously 00:40:37 Dr Genevieve Hayes products that are used by real people in the end. At what point do you get your potential end users involved in looking at your products?
00:40:48 Dr Jeroen Vendrig There are two parts to the product, right? There is how people interact with it, and then there is technically having, basically, the chops to do that properly. 00:40:58 Dr Jeroen Vendrig I think this was mentioned in one of your other podcasts, but I'm a big fan of the lean startup and the minimum viable product. You shouldn't take it literally, but there are lots of good ideas in it. 00:41:08 Dr Jeroen Vendrig And it actually has a bit of a scientific basis, and one of the lessons from that is: get early user feedback. 00:41:16 Dr Jeroen Vendrig You don't have to build the actual product as you envision it. It's very easy to do nowadays with clickable prototypes. 00:41:23 Dr Jeroen Vendrig Anybody can use Figma; you don't need to be a technical person for it. You can basically give some select users a clickable prototype and you get a lot of feedback on what they like and what they don't. 00:41:37 Dr Jeroen Vendrig In our case, I always work business to business, and most of the discussion is about how that is going to affect the workflow. How does it fit your workflow? 00:41:46 Dr Jeroen Vendrig And those things surface, and in fact there are some questions you could ask them and not get a good answer to, like we discussed 00:41:53 Dr Jeroen Vendrig in the beginning. 00:41:54 Dr Jeroen Vendrig If you show them this clickable prototype, then they'll say to you: oh no, that's not how we do it, this is how we do it. And you'll finally get your answer. 00:42:02 Dr Jeroen Vendrig So they're very powerful and that's what we do. Actually, one of my staff is making one right as 00:42:08 Dr Jeroen Vendrig we speak, which 00:42:10 Dr Jeroen Vendrig we've used before. We're making a second version of it to get more feedback from our clients on how they are going to use our technology, because the key component that we're working on in parallel, that actually hasn't changed.
00:42:23 Dr Jeroen Vendrig But how we present it to the users is going to be different based on the feedback. 00:42:29 Dr Genevieve Hayes Does knowing your product will ultimately be used by real people impact the way you look at AI product development right from the start? 00:42:38 Dr Jeroen Vendrig Yes, and that simply comes back to those evaluation criteria. I go very far with that, and probably most people don't, but I really think about how this is going 00:42:50 Dr Jeroen Vendrig to be used. 00:42:52 Dr Jeroen Vendrig Basically, what is the UX that's going to be on top of the technical components, and then you work your way back. 00:42:59 Dr Jeroen Vendrig A simple example of that: let's say you have rankings, and in academia you say: these are our top five or top ten ranking results. 00:43:10 Dr Jeroen Vendrig I, on the other hand, might look at top nine or top 12. Why 9 or 12? 00:43:16 Dr Jeroen Vendrig Why not a round number? Well, usually we present things in a three-by-three grid or a four-by- 00:43:22 Dr Jeroen Vendrig three grid, right? 00:43:23 Dr Genevieve Hayes Oh yeah. 00:43:23 Dr Jeroen Vendrig That's what the user is going to see. They're not going to see 10 results; they're going to see that. But that's a small shift, right? 00:43:31 Dr Jeroen Vendrig But there are bigger ones. A very interesting example is actually coming from our live system. We got some feedback that said the false positives are no good. We said: well, yeah, we know that, nobody likes those. No, no, no, that's not 00:43:46 Dr Jeroen Vendrig what I mean. 00:43:47 Dr Jeroen Vendrig It turns out there are acceptable false positives and non-acceptable false positives. It's not so easy to deal with, but it's very important to the user acceptance of your system, and we're trying to take some measures to deal with that. And when I heard that, it reminded me of something 00:44:07 Dr Jeroen Vendrig a very long time ago.
So this was 25 years ago: a television manufacturer added voice recognition to their televisions. You could give them some very simple commands. 00:44:17 Dr Jeroen Vendrig I mean, back then this was very advanced, and it had an avatar, so a little human that would then respond to your voice. 00:44:26 Dr Jeroen Vendrig But the voice recognition wasn't that good, so it messed up and people didn't like it. And the interesting thing that they did is they then replaced the avatar. So the actual technology was exactly the same. 00:44:37 Dr Jeroen Vendrig Instead of a little virtual human, it was now a dog, and all of a sudden people accepted it and said: yeah, this is great. 00:44:44 Dr Jeroen Vendrig Well, what about the fact that it doesn't always get it right? They said: oh, yeah, but everybody knows dogs don't understand. 00:44:51 Dr Jeroen Vendrig So no change in technology, just in how it's presented, can make a huge difference. 00:44:58 Dr Genevieve Hayes I keep thinking of, do you remember Clippy, the Microsoft Word virtual assistant? Everyone hated Clippy, and yet we're 00:45:06 Dr Genevieve Hayes all happy using virtual assistants now. It's just 00:45:10 Dr Genevieve Hayes there was something about Clippy. 00:45:14 Dr Jeroen Vendrig Yes. Well, viewers can't see it, but I see some nostalgia on your face. 00:45:22 Dr Genevieve Hayes So what do you believe are the most valuable skills for data scientists who are looking to build a career in developing commercial AI-based technology? 00:45:33 Dr Jeroen Vendrig Well, continuing on the topic that we just talked about, I actually think the data preparation, and all that preparation that you do, is more important than the actual data science that you do on it. 00:45:47 Dr Jeroen Vendrig And partly this is because we now have AutoML and all those things that can do a lot of
00:45:54 Dr Jeroen Vendrig what used to be manual work. But if you put the wrong data in, or you put the wrong evaluation criteria in, they're not going to help. 00:46:02 Dr Jeroen Vendrig And I think that is where, maybe that's not a new skill, but more focus needs to be on that. 00:46:08 Dr Jeroen Vendrig And I notice that people in academia and data scientists get very distracted by getting 0.01 more out of their model. 00:46:18 Dr Jeroen Vendrig That's my experience, and Andrew Ng of Landing AI, famous for his Stanford course, has actually got a presentation in which he backs it up with data. He can show that, compared with an amount of effort spent on improving the model and getting a 0.02 00:46:34 Dr Jeroen Vendrig improvement, less effort spent on massaging the data a bit better gives a 10 percentage point improvement. That's what I think data scientists should focus on. 00:46:46 Dr Jeroen Vendrig Now, it doesn't happen in my world, but I've seen that in data science there are data engineers that are separate from data scientists. 00:46:55 Dr Jeroen Vendrig I don't completely understand the difference, but where I did see it I would say: let's give the data engineer a bit of training. You get better results than training the data scientists to be data engineers, partly because they don't want to. 00:47:11 Dr Jeroen Vendrig So if you're a pure data scientist, you have to be very good or, you know, you might be surpassed by the data engineers that have a bit of additional training. 00:47:22 Dr Jeroen Vendrig Another point is: look at the actual data. That's not a skill, but that should 00:47:27 Dr Jeroen Vendrig be a habit. I do find even in my team I have to tell people. 00:47:31 Dr Jeroen Vendrig They come with all these numbers, et cetera, and I say: yes, but have you gone to the actual data and looked at it? Because, as we discussed, we don't have a distribution of the actual world, right?
00:47:43 Dr Jeroen Vendrig So there might be something wrong with the distribution. Your numbers are not going to tell you that; you have to apply some human judgement 00:47:49 Dr Jeroen Vendrig on that. But also, we often work in a feature space and we might not have the right features. 00:47:55 Dr Jeroen Vendrig So you have to go back and look there rather than spend months and months on squeezing something out, something impossible. 00:48:03 Dr Genevieve Hayes There's a paper that another guest pointed me in the direction of, and the people who wrote the paper actually demonstrated how you could 00:48:12 Dr Genevieve Hayes create all these different data sets with exactly the same summary statistics. Some are just, you know, straight lines of data, that type of thing, but one of them is actually a dinosaur. 00:48:23 Dr Jeroen Vendrig OK. 00:48:29 Dr Genevieve Hayes And I actually show that to my class, because the point I want to make is: if you're just looking at the means and standard deviations, it does not tell you that you've got a dinosaur there. 00:48:41 Dr Jeroen Vendrig That's right. Yeah. So make that a habit. Of course, you should still look at the numbers 00:48:47 Dr Jeroen Vendrig as well, no doubt about that, yeah. 00:48:48 Dr Genevieve Hayes Oh yes, definitely. 00:48:50 Dr Genevieve Hayes They look after the dinosaurs. 00:48:53 Dr Jeroen Vendrig So the other thing is, it doesn't happen in AI that often, but in more traditional data science I noticed that some data scientists, even fresh ones from uni, can't really code, which surprised me. I hope that the new intake is not like that anymore. But even if they can code, 00:49:13 Dr Jeroen Vendrig software engineering skills can be quite useful, and I think you talked about your own journey on that and that it was a bit of a revelation.
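The point about identical summary statistics hiding very different data can be checked in a few lines of Python. The paper is not named in the conversation, so as a stand-in this sketch uses the first two sets of Anscombe's quartet, a classic small example of the same idea (not the dinosaur data itself). The second-difference check is just one simple way to expose the shape and is an assumption of this sketch, not something from the episode.

```python
from statistics import mean, stdev

# x values shared by sets I and II of Anscombe's quartet.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]


def second_differences(xs, ys):
    """Second differences of y after sorting by x (x is evenly spaced here)."""
    ordered = [y for _, y in sorted(zip(xs, ys))]
    first = [b - a for a, b in zip(ordered, ordered[1:])]
    return [b - a for a, b in zip(first, first[1:])]


for name, ys in (("I", y1), ("II", y2)):
    # A smooth curve has small, steady second differences; noisy data does not.
    wobble = max(abs(d) for d in second_differences(x, ys))
    print(f"set {name}: mean={mean(ys):.2f} stdev={stdev(ys):.2f} "
          f"max |second difference|={wobble:.2f}")
```

Both sets report the same mean and standard deviation to two decimal places, yet set II is a smooth curve while set I is a noisy upward trend. Only looking at the points themselves, or a shape-aware check like the one above, tells them apart.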
00:49:23 Dr Jeroen Vendrig So I'm not saying data scientists should be software engineers, but some of the thinking in there can be very useful. 00:49:31 Dr Jeroen Vendrig And what surprised me when I worked with data science students is that, for me, coming from a computer science background, there's not really a difference between scripts and programs. 00:49:44 Dr Jeroen Vendrig And that's because I learned how to program, and hence scripting comes naturally; I don't need to make any effort for it. What I hadn't realised is that the other way around 00:49:53 Dr Jeroen Vendrig it's not obvious at all. We often start out with scripts, right? You're just toying around with things, and scripts are much better for that than a completely software-engineered thing. 00:50:03 Dr Jeroen Vendrig But at some point you say: oh, yeah, let's take this to the next level. And you turn it into more software-engineered 00:50:09 Dr Jeroen Vendrig code, so you can do repeatable experiments, parameterize it and everything like that. And I noticed that these data scientists weren't able to do that, and it's not just ability: they had never thought about doing that. 00:50:23 Dr Jeroen Vendrig It's not a hard skill to learn. If you can code, basic software engineering is something you should be able to pick up quite quickly. 00:50:30 Dr Genevieve Hayes I think it's because a lot of data scientists do all their work in Jupyter notebooks, so they've never had that experience of actually working with direct script files. 00:50:41 Dr Jeroen Vendrig Yeah. So I would encourage them to take those scripts and turn them into code. The actual programming won't be that big a deal now that you have Copilot, et cetera, 00:50:51 Dr Jeroen Vendrig which can help you a lot with that. But what you need to do is basically know what you want to achieve with this and somehow express that. 00:51:01 Dr Genevieve Hayes Knowing what you now know about startup life, would you recommend it to our listeners?
00:51:08 Dr Jeroen Vendrig Yes, I would recommend it, because otherwise I would 00:51:11 Dr Jeroen Vendrig leave it, of 00:51:12 Dr Jeroen Vendrig course, but it's not for the faint of heart. 00:51:15 Dr Jeroen Vendrig I've actually had some discussion with our investors, because data scientists have found that the best age for startup founders is like 42. It's a beautiful 00:51:27 Dr Jeroen Vendrig number, of course. 00:51:28 Dr Jeroen Vendrig That shouldn't deter anybody, young or old. But what it does tell me, as an interpretation, is that getting some experience under your belt 00:51:39 Dr Jeroen Vendrig in a bigger company probably helps you a lot when you do your startup. So technically you might be ready for it. 00:51:46 Dr Jeroen Vendrig Let's say you're the CTO and you focus on the AI part of it. But there is a lot of organisational stuff that you may not have been exposed to as a younger person, and that's coming at you. 00:51:59 Dr Jeroen Vendrig Right. Everything comes at you when you're the founders of the business, 00:52:03 Dr Jeroen Vendrig until you're big enough to hire specialists for that. I wouldn't really trade in my experience at larger companies; I'm happy for that to be part of my journey. 00:52:14 Dr Jeroen Vendrig Maybe I should have left a little bit earlier, but at the same time it's very exciting to work at startups because of the flexibility you have. I have actually done a release 00:52:24 Dr Jeroen Vendrig while on the phone with a client who had an issue with something, so you can just do 00:52:29 Dr Jeroen Vendrig that. Those are the less interesting things, of course, but you can basically hear something from a client and say: OK, yeah, that's an interesting problem, let's do something about 00:52:38 Dr Jeroen Vendrig that. There's no paperwork involved, you just do it. There's no skunkworks; everything is skunkworks.
Being that close to customers makes it very rewarding to do those things, whereas in bigger companies you're usually very far away from customers. So that's the part that I would recommend to pursue. 00:52:58 Dr Genevieve Hayes That statistic you gave about how the optimal age to start a startup is 42: I've heard that statistic before, and what I think is interesting is, are you familiar with Erikson's stages of life work? 00:53:14 Dr Genevieve Hayes So Erikson was a psychologist, and he basically divided the human lifetime into all these different stages, and there are different things that you achieve at different stages. 00:53:25 Dr Genevieve Hayes It starts with basically all the different stages of infancy, and the stages after that are basically consistent with primary school, 00:53:34 Dr Genevieve Hayes secondary school and stuff like that. 00:53:37 Dr Genevieve Hayes But once you get beyond eighteen, he divides adult life into three stages: early adulthood, middle adulthood, and late adulthood. Late adulthood is basically your retirement stage, so let's just forget about that. 00:53:54 Dr Genevieve Hayes But with early and middle adulthood, early adulthood is basically from when you're about 18 until you're about 40, and that's doing all the things that you need to do to set yourself up for the rest of your 00:54:08 Dr Genevieve Hayes life: getting an education, getting experience, working in jobs, if you're interested in having a family, finding someone, you know, things like that. And then at around age 40 it transitions into middle adulthood, which is when you're doing whatever it is that you're 00:54:28 Dr Genevieve Hayes going to achieve in your life. 00:54:32 Dr Genevieve Hayes It might be raising a family, or it might be starting a startup. So it just felt to me, when I heard that statistic, that 42 is consistent with where Erikson puts the start of the middle adulthood phase. 00:54:47 Dr Jeroen Vendrig Yeah, that's very interesting.
And you might be right there. Listeners who are young shouldn't be discouraged by that. But what I think might happen is 00:54:56 Dr Jeroen Vendrig you might actually be well suited to bring a startup to a certain level and then merge with a bigger company at the time when you need that life experience. 00:55:06 Dr Jeroen Vendrig So that's another way to do it. And there are advantages to being young as 00:55:11 Dr Jeroen Vendrig well, because you have more energy. To be frank, I can't do it. 00:55:16 Dr Jeroen Vendrig I can't pull an all-nighter anymore. 00:55:17 Dr Genevieve Hayes I could never pull an all-nighter, even when I was in high 00:55:20 Dr Genevieve Hayes school. OK. 00:55:24 Dr Genevieve Hayes What final advice would you give to data scientists looking to create business value from data? 00:55:30 Dr Jeroen Vendrig Yeah, to repeat: don't take the data as a given. Don't take it as fixed. You can make your own data set. 00:55:36 Dr Jeroen Vendrig I think with language AI breaking through recently to the bigger public, there's been a lot of discussion 00:55:42 Dr Jeroen Vendrig that these engines are basically stuck in their feature space, so people say they're not conscious. That's one way of saying it. 00:55:49 Dr Jeroen Vendrig They can't go and sense new data, right? They can't, and maybe they shouldn't, but as a data scientist you can do that for them, and you can do that in a responsible way. 00:55:59 Dr Jeroen Vendrig So that's what I would recommend paying attention to. 00:56:04 Dr Genevieve Hayes For listeners who want to learn more about you or get in contact, what can they do? 00:56:09 Dr Jeroen Vendrig Yeah, LinkedIn is the best way to reach me. My name is pretty unique, so I will be easy to find. And yeah, if you want to chat more about AI or 00:56:20 Dr Jeroen Vendrig startups, I'm happy to do that.
And if you're in Sydney, you can find me at several events as well that are happening here in the ecosystem. 00:56:29 Dr Genevieve Hayes And I'll link to your LinkedIn page in the show notes. 00:56:33 Dr Jeroen Vendrig Thank you. 00:56:33 Dr Genevieve Hayes Well, thank you for joining me today. 00:56:36 Dr Jeroen Vendrig Thanks for the interesting questions. 00:56:38 Dr Genevieve Hayes I had a great time. I learned a lot from this, and for those in the audience, thank you for listening. 00:56:44 Dr Genevieve Hayes I'm Dr Genevieve Hayes and this has been Value Driven Data Science, brought to you by Genevieve Hayes Consulting.