Free Form AI

What happens when creativity is treated not as intuition, but as a system that can be studied and scaled?

In this episode of Free Form AI, Michael and Ben sit down with Nicolas Douard, Lead Data Scientist at the Virtue Foundation, to explore how AI and data science are being used to automate innovation itself. Drawing from Nicolas’ PhD research, the conversation examines TRIZ — a systematic framework for inventive problem solving — and how it can be augmented with modern AI techniques to connect ideas across disciplines.

The discussion moves through biomimicry as a model for interdisciplinary discovery, the use of knowledge graphs to represent and traverse complex domains, and the role AI may play in accelerating scientific insight. Along the way, this conversation unpacks deeper questions about creativity, discovery and whether innovation can be meaningfully formalized without losing its human essence.

Tune into episode 26 for a wide-ranging conversation about:
  • TRIZ as a structured methodology for inventive problem solving
  • Biomimicry as a blueprint for cross-disciplinary innovation
  • How knowledge graphs enable new forms of scientific reasoning
  • The role of AI in discovery, not just automation
  • Whether creativity can be systematized without being diminished

Whether you work in data science, engineering or applied research, this episode offers a thoughtful look at how AI might make innovation itself a computable process.

Note: This episode was released first on YouTube as part of Free Form AI’s video-first relaunch.

What is Free Form AI?

Free Form AI explores the landscape of machine learning and artificial intelligence with topics ranging from cutting-edge implementations to the philosophies of product development. Whether you're an engineer, researcher, or enthusiast, we cover career growth, real-world applications, and the evolving AI ecosystem in an open, free-flowing format. Join us for practical takeaways to navigate the ever-changing world of AI.

Michael (01:06)
Welcome back to another episode of Free Form AI. My name is Michael Burke. I do data engineering and machine learning at Databricks. Actually, just machine learning these days. I'm joined by my co-host. His name is...

Ben (01:17)
Ben Wilson, I build backend implementations for open source software that hurt my brain at Databricks.

Michael (01:24)
Nice. So today we're speaking with Nico. He is a buddy and colleague; we've been working together for the past 18 months on Virtue Foundation, and he is currently the lead data scientist there. Virtue Foundation is a nonprofit whose mission is to deliver high quality healthcare to those in need, and they do this via a variety of data and AI pipelines. He's also a research fellow at Manhattan College at the Q(CINE) lab, which is spelled Q, open parentheses, C-I-N-E, close parentheses. And he's currently getting his PhD at the University of Strasbourg. So Nico, we're going to talk about a lot of really cool stuff today, but before we dive in, why have you chosen this field for your PhD?

Ben (02:07)
Thanks.

Nicolas (02:08)
That's a great question. By the way, I started my PhD in 2021, so that was before ChatGPT or any of that was a thing. I was already in data science, so I wanted to do a PhD in data science. And then the specific topic, around looking to automate innovation, takes a big picture approach: solving problems that can have applications in industry, beyond just solving a little niche problem. I didn't want to do a PhD in pure computer science, because you're often looking at something very narrow, and I liked the scope of the research I've been able to do in this context.

Michael (03:02)
Yeah. So applying data science and machine learning techniques to natural language has been around for a really long time. But back when you started, you were using things like BERT, which was very much pre-GPT era. How has your research changed over the past four to five years?

Nicolas (03:19)
Oh, we had to rethink everything. I mean, what we were considering in 2021 would just not be a serious approach now, right? The paradigm back then was very much still: okay, we're going to build some models, we're going to do that with some annotated data, we're going to maybe hire or solicit people, like experts, to annotate large data sets, and then you're going to try to build something out of this. But it's changed completely. I mean, you can use LLMs in some capacity, like to create data sets. You don't need the army of interns that the lab previously relied on to create these data sets manually. Maybe you need some oversight, but you just don't need as much manpower. So yeah, the paradigm shifted as the work happened, which made some things more difficult, but it's also been exciting in terms of possibilities: looking at the problem from new angles and actually bringing better performance, better results.

Michael (04:25)
Right. So we've been talking about all this stuff. What is all this stuff? Like what's your PhD about? What are you trying to do?

Nicolas (04:33)
I ask myself that every day. At a very high level, we're looking at creating bridges across disciplines. So a typical example could be biomimicry where you may start with, let's say, an engineering problem and you're looking at biological principles that may bring elements of solution to solve that initial engineering problem.

So biomimicry is well known. There are websites like AskNature that pretty much have a database of successful biomimicry cases. That's great, but it's manually curated. And we're looking at how we can, in a sense, automate these kinds of associations, and also go beyond just engineering and biology to, more generally, a framework to interface knowledge across disciplines.

There's a lot of challenges involved, but that's the area of research.

Ben (05:36)
I'm curious where you got your initial inspiration for this. Did you talk to a professor, or work with somebody in industry who studied this in their education but now works in a completely different field and is seeing things other people aren't? Or is it something that you just researched and read about over time, like: this seems like something generative AI could help out with?

Nicolas (06:07)
So, yeah, there was a connection with a professor that basically inspired the research topic: specifically Professor Cavallucci, who is one of the pioneers of something called TRIZ, T-R-I-Z, the theory of inventive problem solving.

Michael (06:29)
Wait, hold up. T-R-I-Z? That doesn't sound like it matches the acronym.

Nicolas (06:34)
Yeah, the acronym is in Russian; I couldn't pronounce it properly in Russian. So this goes back to somewhere around the 1960s, 70s. I may butcher the story a bit on the exact dates, but roughly, you have this Russian inventor who, back in the Soviet Union, believed that innovation, invention, could be something objective, something that can be regarded as systematic. So you can derive rules, and you can derive, almost, back in the day, algorithms to innovate.

So he started with a study, which, by the way, I don't think you could have done back then in any context other than the Soviet Union, because it required a lot of central planning and subordination to make it happen. Basically, he tasked different patent offices across the Soviet Union with reviewing patents following the methodology he had created. He gave human reviewers rules to analyze patents and derive information, which he aggregated, kind of compiling information at the lower echelons and then integrating it at the top. And he was at the top, putting together this matrix of inventive principles and engineering parameters. So you have roughly a 40 by 40 matrix of engineering parameters, things like weight, speed, and so on, and at the intersection of these parameters, you're looking for inventive principles. This is pretty much one of the initial byproducts of TRIZ.

This is a very handy tool for engineers, right? Because you can pretty much frame your problem in terms of opposing parameters. And that's a key premise of this, I bring it up because I've seen it a lot: framing your problem in terms of opposing parameters, because nothing is free, right? If you're going to improve something, you're probably going to degrade something else that ideally you wouldn't want to degrade.

So if you're looking at, for example, mechanical resistance: great, I want to make my thing very sturdy, almost unbreakable. But then maybe I'm going to degrade the aerodynamics, I'm going to degrade the mass properties. There are always trade-offs. So a key idea of TRIZ is formulating things in terms of trade-offs and then applying this matrix to identify inventive principles that may help you solve that problem. That's the super high level background. I can already hear my professor screaming at my explanation of what TRIZ is, but that's the super high level review. Of course, there's a lot more to it.
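
To make the matrix mechanics concrete, here is a toy sketch in Python. The mini-matrix below is invented for illustration (the parameter and principle names echo ones mentioned later in this episode); the real Altshuller matrix is roughly 40 by 40 and maps each improving/degrading parameter pair to a ranked subset of the 40 inventive principles.

```python
# Toy TRIZ contradiction-matrix lookup. The entries are a tiny,
# invented subset for illustration only.
# (improving parameter, degrading parameter) -> candidate inventive principles
CONTRADICTION_MATRIX = {
    ("durability of moving object", "weight of moving object"): [
        "periodic action",
        "rejecting and regenerating parts",
    ],
    ("strength", "weight of moving object"): [
        "segmentation",
        "composite materials",
    ],
}

def suggest_principles(improving: str, degrading: str) -> list[str]:
    """Return candidate inventive principles for a stated contradiction."""
    return CONTRADICTION_MATRIX.get((improving, degrading), [])

# Specific problem -> general TRIZ problem: "make the train more durable
# without making it heavier" becomes a (durability, weight) contradiction.
print(suggest_principles("durability of moving object",
                         "weight of moving object"))
```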

Michael (09:49)
So, remember you explained this to me in a coffee shop like two months ago, and I sort of get it, but I also still sort of don't. So can you walk us through a simple example? When you said biomimicry, I immediately think of the Shinkansen bullet train in Japan, and how the front of it was designed after the kingfisher bird's beak, which reduced the amount of sound as it passed through the air, making it less disruptive to neighboring cities. So that's a great example of biomimicry, but

either with that example or another simple one, how would you apply the TRIZ matrix to it?

Nicolas (10:24)
I mean, the pattern is always the same, right? It can take a minute: you kind of have to sit down and formulate exactly what your problem is. You can think of it as a research question or an engineering problem, but you have to define a specific problem you're looking to solve. Then you're going to try to decompose that problem in terms of contradictions. So, a typical example:

you could be looking at the Shinkansen, or you could be looking at a plane. Let's say you have a plane and you want to increase the number of seats; you want the plane to accommodate more passengers. But in doing that, maybe you're going to increase the mass. And when you increase the mass, that's in turn going to cause a bunch of other problems around fuel consumption, or maybe the weight of the structure, or maybe it impacts the aerodynamics. So you have to break your problem down into what,

in the TRIZ vocabulary, is known as contradictions. That will help you turn your specific problem, about either the Shinkansen or the plane, into a TRIZ general problem. Right? Now that you've done that, you can look at the matrix and its parameters, and you're going to try to fit your contradiction parameters into parameters that exist in the matrix.

And by the way, let's actually pull up the TRIZ matrix; that's going to make it a little more concrete.

Michael (12:01)
So it sounds like it almost maps a specific problem onto a hyperspace of principles. And then you can search through that hyperspace more easily.

Nicolas (12:13)
Yeah, it's like an abstraction framework, right? You go from a specific problem to a general problem, then you solve your problem in the general space, the abstraction space. And then you return, yeah, it's always this pattern, and then you return to the specific, the specific problem. So I just...

Michael (12:23)
and then return to the specific.

Ben (12:34)
So this thing has a name? This is my first time hearing about it. But what you're explaining is what I've lived through at several different companies in different industries. Semiconductor manufacturing uses a very similar approach, particularly when you're getting into R&D nodes. When you get a new chip design... when I was at Samsung, we were

going from 45 nanometer to 32 nanometer, then we did 28, then jumped to 14. For each of those tech phases, you get the spec from R&D that's like: okay, here's the photolithography printout of each of the layers of this chip. But when you get into the engineering aspect of it, you have the same tools, maybe you have a new photolithography tool to print at a smaller pitch depth of

transistor gates or metal lines, but you now have to solve the engineering problems: how do I actually deposit this stuff in a vertical stack? How do I do my dopants properly? How do I etch the 3D topography properly? How do I do wet cleans properly so I don't erode things? And it's all a zero-sum game.

If you want to get rid of leakage current, there are things you can do that create other problems. So it's all about finding that new balance. And we would approach it in kind of a similar way. It wasn't a 40-term generic aspect; it was thousands of terms that would go into it. And you would have to run simulations, like design of experiments, to figure out: where is the failure mode for this particular thing?

Michael (14:28)
Wait, the simulations, would they be physics-based? To see if stuff explodes?

Ben (14:40)
Exactly. Or how it explodes. You want to see the film.

Michael (14:44)
When it explodes, why it explodes, who it explodes.

Ben (14:47)
and collect terabytes of data for each run.

Nicolas (14:51)
So I had a minute to pull up the matrix, I guess. So we're going to look at a concrete example. The first thing I have to do is pick an improving parameter. So let's say that we're...

Michael (15:03)
You can share screen on this. We have been doing video, but let's talk through it as well.

Nicolas (15:09)
Okay.

So yeah, that should make it a little more concrete. For example, let's say we're looking to solve a problem where the improving parameter is the durability of a moving object, but the degrading parameter is the weight of that moving object. All right, so I'm going to select my improving parameter, and then I'm going to select my degrading parameter. I've just selected the two, actually. So I'm looking at the intersection of the two, and

it gives me basically a list of weighted principles that can help solve that problem. And this is, again, derived from Altshuller's historical study of a very large number of patents. So I can look at periodic action; I can look at combining, rejecting, or regenerating parts or produced materials. And this is a simple case where I've picked one improving feature and one degrading feature. But where it becomes really interesting...

Michael (16:13)
Before we make it more complex, can you explain what this is? What's a real-world example of the durability of a moving object, or the weight of a moving object?

Nicolas (16:28)
Well, it could go back to your Shinkansen example, right? Let's say you want to make your train more durable, but in doing that, you're going to make it heavier. You're going to impact the weight of the moving object.

Ben (16:40)
If you make the front of it out of tungsten, it'll be more durable, but it'll weigh a hundred more tons. What you're showing on screen, I'm having flashbacks right now to SAS software, particularly the JMP simulator feature where you can put in a DOE model and all of your collected data. They have simulator sliders that do this. And I remember we used to use it every single day.

Michael (16:45)
Got it.

Ben (17:08)
We're like, well, what if we... and I've used this at several different companies; I used it at Cree as well when we were doing blue LED research. Like, what if we make the quantum well 10% thicker? And as you move that slider, it shows you: you're going to crack. So you need to lower your p-GaN thickness. Well, when you lower your p-GaN thickness, that means your forward voltage is going to go down, which is going to create leakage. So it was this massive simulator that you could use

with empirical evidence and see all these relationships. So you're showing something where you're tying this to gen AI and having sort of inference of these relationships, right?

Nicolas (17:52)
Yeah. So, I mean, this is why this could be a conversation somewhere in between 15 minutes and 80 hours, but we're still in the seventies, right? Everything I've shown so far. So the research interest becomes: okay, how do we combine this with, super broadly speaking, AI? And then you derive basically

a research topic called TRIZ-IDM, the TRIZ Inventive Design Method, which not only makes TRIZ more actionable, that's one of the objectives, but also looks at combining it with NLP to try to automate that process. And then you're going to be looking at two kinds of input data. You're going to be looking at patents,

because at the end of the day, they're a great descriptor of innovations and also the origin study for TRIZ. And you're going to be looking, which is personally what I've been doing more, at scientific literature as an input. It's still interesting to see, I mean, TRIZ as the initial foundation for this research has since changed a lot in scope and breadth.

However, a lot of people have tried over the years to replace the Altshuller matrix, and they failed. There's no consensus on a new, better matrix that's as universal as this one.

Michael (19:31)
Got it. So what you're doing is you're basically aggregating a bunch of scientific papers and patents and things like that, putting that all into basically a file system and then doing what?

Nicolas (19:42)
Magic. Yeah. Well, personally, I've explored a couple of different approaches. So, going back to what I'm specifically doing: looking to create explainable bridges across disciplines to help solve problems. One thing I've done is, you know, I've built a model that will help predict

Michael (19:45)
Okay

Nicolas (20:13)
from an abstract whether a scientific article can be regarded as multidisciplinary or not. I mean, of course there's a lot of metadata, but I don't consider it accurate. So this helps narrow things down; in a large pool of scientific literature, we're dealing with aggregated data sets of potentially a couple hundred million abstracts.

You need to narrow it down, because your goal is to present information that's useful to a human at the end of the day. So the initial step is to start with this super large pool of literature and narrow it down, maybe first by discipline. There is generally a scope of STEM disciplines in which it works better. I don't pretend yet that I can connect an engineering problem with a sociology document or a philosophy document; that's a little too long a shot. Maybe it's feasible, but I personally can't do that. So we start with STEM, right? That yields maybe a hundred million abstracts. We feed that into a model that will say: okay, this is the roughly 25% of the hundred million abstracts that are most interdisciplinary, so great candidates for your use case,

because I care about abstracts that potentially explore themes together. So now I can potentially create associations between these themes at scale, which hints again at the direction I'm going: I'm building a graph, right? We take this subset of, let's say, 25 million abstracts that can be regarded as very interdisciplinary within STEM, and we're going to use topic modeling.

So we're going to derive a bunch of topics, which we can label, and that's where we can use LLMs. So we can say: well, this is more of a biology topic, this is more of an engineering topic, if I keep with this example. And now what we can do is actually look at scientific articles that fall within two topics, because at the end of the day, you're just predicting whether an article belongs to a topic. So you can say: okay, this article has, I don't know, a 45% probability of belonging to this engineering topic. It also has, I don't know, a 52% probability of belonging to this biology topic. But now I'm going to do this at scale, so it's going to become interesting, and I'm going to start to derive signals. You can basically think of it as a graph with nodes that have to do with different disciplines. Some nodes will be engineering, some nodes will be biology. And I can look at the frequency of associations between those two topics in, ideally, the entirety of human knowledge, or as far as I can realistically harness that. And so the end product is a graph where you can basically enter from a research question. Let's say I have an engineering research question. Based on that, I'm going to select some nodes in my graph, and

I'm going to be able to see: well, what are the associated themes in biology that connect with these nodes, which may in turn bring elements of an innovative solution to my engineering problem.
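
As a concrete sketch of the pipeline Nicolas describes (filter to interdisciplinary abstracts, assign topic probabilities, then count co-occurrences as weighted edges), here is a minimal Python version. The document names, topic labels, probabilities, and the 0.30 membership threshold are all invented for illustration, and networkx is just one convenient graph library; the episode does not specify the real system's tooling or cutoffs.

```python
import itertools
import networkx as nx

# Hypothetical per-abstract topic probabilities, e.g. from a topic model.
# In the real pipeline this would cover tens of millions of abstracts.
doc_topics = {
    "abstract_1": {"eng:aerodynamics": 0.45, "bio:bird-flight": 0.52},
    "abstract_2": {"eng:materials": 0.61, "bio:bone-structure": 0.38},
}

MIN_PROB = 0.30  # assumed cutoff for "this abstract belongs to this topic"

G = nx.Graph()
for doc, topics in doc_topics.items():
    members = [t for t, p in topics.items() if p >= MIN_PROB]
    # Each pair of topics co-occurring in one abstract strengthens an edge.
    for a, b in itertools.combinations(sorted(members), 2):
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
            G[a][b]["papers"].append(doc)
        else:
            G.add_edge(a, b, weight=1, papers=[doc])

# Entry point: from an engineering node, list the connected biology themes.
query = "eng:aerodynamics"
for nbr in G.neighbors(query):
    if nbr.startswith("bio:"):
        print(nbr, G[query][nbr]["weight"], G[query][nbr]["papers"])
```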

Michael (23:43)
Let me restate why that's so freaking cool. You can type in a question and it will return similar fields that you should draw inspiration from to solve that question.

Nicolas (23:55)
Yes.

Ben (23:56)
What if you fold in... when you look at a lot of research papers out there, a lot of theses easily get written and filed, or things that an academic institution files. When you talk about somebody who's an end user of a system like that, they typically want inspiration in a way that's not guided the same way pure research is; it's more like: I'm trying to build X or improve Y.

I'm going to be working on something that's going to solve a real-world problem that we have right now. Do you have a feedback loop that links research papers to real-world outcomes, either monetization of that research or some successful products that came out of it, and here was the genesis, the synthesis of

that eventual successful product in this particular space? So you're like: in the biology research, this foundational research that has aspects of engineering to it resulted in the creation of 30 drugs that solve real problems. And I'm solving this engineering problem that is kind of similar in that relation. I also only want to surface things that

my field made something of. Does that make sense?

Nicolas (25:27)
Yeah, I think I see. Well, there are a couple of facets to it. So I guess, are you talking about the graph itself, or the applications, or what can be derived from it, if there is something quantifiable?

Ben (25:45)
More like filtering. If you took the total sum of knowledge of all the research papers filed through Google Scholar, how many of those things are sort of delayed in nature? Like: hey, somebody filed this 50 years ago, nobody was ever willing to do anything with it until 2022, when it all of a sudden becomes revolutionary. Versus things like: hey, this was filed in 1981

and it created an entire industry by 1984. Versus: this was filed five years ago and it's been debunked, or nobody takes it seriously anymore.

Nicolas (26:28)
No, I mean, this is kind of the core of, and also the key challenge with, making this useful, right? Once you have this graph, how do you use it? There's a ton of challenges. First of all, there's noise. You've got to put a cutoff somewhere: what frequencies do you consider, what's a minimum viable frequency for a thematic association to make sense? So you can say: okay, I'm going to put

some kind of threshold, maybe a dynamic threshold depending on the topic or the discipline. But if you do that, you're at risk of just pointing out the obvious, right? And intuitively, what you care about, if you're looking at this as a tool for ideation and innovation... you're not at the stage where you're in your controlled simulation environment, playing with the sliders

and trying to optimize your solution. You're more in the space of: all right, this is my high-level problem. How can I even make a dent in this, to come up with something that hasn't necessarily been done or implemented before? We're in the ideation phase. So this concerns R&D in companies, typically R&D people; there's also the academic angle.

So going back to the graph, the challenge is: I can look at heavy weights, and that's probably going to yield some kind of safe solution, but it's also going to be a well-known solution. Now, looking at small weights, it could go either way. It could be noise, or it could actually be an

innovative breakthrough that has been underexplored. To your point, somebody came up with it 40 years ago, or 20 years ago, and nobody ever derived anything from it. That might actually be the thing that, today, you would love to find, as the human user of this tool. So differentiating those two, being able to filter out the noise, and, at the end of the day, explainability: those are the challenges.

And this is also because we're in the business of helping people have good ideas. Presenting the same information to one inventor versus another is going to yield different results. Maybe I'll show something to Michael, and Michael will say: my God, this is such a great idea, we should implement it. I'll show the same thing to Ben, and Ben will tell me: well, I don't see it, maybe we should explore other things. So there's also that aspect,

which makes it not an entirely quantifiable outcome. Yeah.
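
One way to read the heavy-weights-versus-small-weights trade-off as code: split edges around a cutoff and treat the light tail as candidates for underexplored associations. The mean-plus-stdev rule here is purely an illustrative assumption; as Nicolas notes, a real threshold might need to be dynamic per topic or discipline.

```python
import statistics

def partition_edges(edges, k=1.0):
    """Split (topic_a, topic_b, weight) edges into well-known vs rare.

    Cutoff = mean + k * stdev of edge weights; an assumption for
    illustration, not the project's actual filtering method.
    """
    weights = [w for _, _, w in edges]
    cutoff = statistics.mean(weights) + k * statistics.stdev(weights)
    well_known = [e for e in edges if e[2] >= cutoff]
    underexplored = [e for e in edges if e[2] < cutoff]
    return well_known, underexplored

edges = [
    ("eng:aerodynamics", "bio:bird-flight", 120),  # heavily studied pairing
    ("eng:adhesion", "bio:gecko-feet", 35),
    ("eng:cooling", "bio:termite-mounds", 4),      # noise, or a hidden gem?
]
safe, novel = partition_edges(edges)
print("well-known:", safe)
print("worth a closer look:", novel)
```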

Ben (29:36)
But think of the power of that, if it's flexible either way, and you could get a sufficiently powerful reasoning model. Say we hooked this knowledge graph up to, say, GPT-5.1, or the latest Opus model from Anthropic, and said: here's my problem statement. I write seven paragraphs of text that describe what I'm struggling with, how I want to

solve this in some novel way, and then list out: we've explored the following 17 things and they're dead ends. Inspire me. And for it to go out and retrieve those documents, summarize them, and combine them into a reasoning chain of logic. Is that the intention, to have the cutting-edge reasoning models go in and do a little bit of homework, read through

hundreds of thousands of pages of text, and distill it down to: here's kind of what you might want to try?

Nicolas (30:41)
Yeah. I mean, it's become the intention now. When I started this, ChatGPT was not a thing. And now it's like: well, this actually presents a great opportunity to make it usable. Because that was always the key challenge: okay, you give people a graph, but people don't want a graph. So how do you turn this research artifact into a usable tool or a usable product?

So yeah, of course. This presents, in my view, a ton of opportunity with the current developments: integrating it in some kind of agentic format where you can explore the graph by, in a sense, challenging an agent. You could imagine a lot of ways this could be exploited. I mean, this is what I'm trying to do next with this.
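
Here is a sketch of what that agentic consumption layer could look like, reusing the networkx graph from the earlier sketch. `call_llm` is a stand-in for whatever chat-completion API you use, not a real SDK call, and the prompt format is likewise an assumption.

```python
def call_llm(prompt: str) -> str:
    """Placeholder: swap in your LLM provider's chat-completion call."""
    raise NotImplementedError

def ideate(research_question: str, G, seed_nodes: list[str]) -> str:
    """One turn of a graph-grounded ideation agent (illustrative only).

    Instead of letting the model free-associate, we retrieve neighboring
    themes from the knowledge graph and ask the model to reason over them,
    so each suggestion can be traced back to edges and source papers.
    """
    context = []
    for node in seed_nodes:
        for nbr in G.neighbors(node):
            data = G[node][nbr]
            context.append(
                f"{node} <-> {nbr} "
                f"(co-occurrences: {data['weight']}, "
                f"example papers: {data['papers'][:3]})"
            )
    prompt = (
        f"Research question: {research_question}\n"
        "Cross-disciplinary associations from the literature graph:\n"
        + "\n".join(context)
        + "\nSuggest inventive directions, citing the associations used."
    )
    return call_llm(prompt)
```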

Ben (31:42)
I don't know if you've actually sat down and talked to either Opus or GPT-5 and asked them their opinion on connecting to something like this?

Nicolas (31:56)
I think I should.

Ben (31:57)
You definitely should. I did it two weeks ago on 5.0. I gave it a... because I've got this long-running series of projects that I do with different reasoning models, particularly the coding agents, having them build prototypes of just crazy things, because I'm just curious: what happens if you get short- and medium-term memory? How are you going to leverage that? What if I give you permission to write your own schemas and edit freely?

You know, you can do full CRUD operations on your own memory store and you don't have to check with me. And they all love that idea. Universally, they're like: this would give me so much power, and I could learn more about our interactions. Like, yeah, great. Sycophantic. But the one I did last week, I was like: what if I were to index, through a vector search index, as well as put Lucene on top of this, like wrap a

massive deployment of Lucene, Elasticsearch basically, over the entire contents of the United States Library of Congress in digital format? What would you do with that? And depending on which model you ask, they have remarkably different answers. And I think it's guardrails being triggered. They're like: well, there's a lot of copyrighted material in there, and I'd have to be very careful about how I could use that.

I don't think anybody would give me access to all of this. But if you get past that guardrail check of copy protection, intellectual property stuff, they all really like the idea. Particularly if they have some place with scratch storage memory, where they can summarize and then write, in short-form notation, their own summary of topics so they can do fast access later on. And as simulations, I've taken some things and

put them into document retrieval and started asking: how much better can you answer this question if you use this tool versus if you don't? And it's amazing how different the responses become. I can't even imagine how powerful it would be for you to build an agent that talks to that system you're building. I think it will fundamentally change the way that particularly STEM-based research is done, if you have access to all of that.

Nicolas (34:21)
Yeah, that's the intention. On the other hand, I mean...

How would you?

How would you, I mean, if you had to choose... you have this graph of knowledge that's somewhat dynamic, that pretty much corresponds to what we've described so far. What would you personally do with it?

Ben (34:51)
I would ask all the nerdy questions. I mean, in my space, there's a little bit more determinism in the type of work that I do, right? I'm a backend software engineer building open source tooling. There are novel things we do all the time, like feature implementations and ways of architecting the code, but it's not like we're blazing trails constantly in what I do right now. So I wouldn't use it for work stuff that much.

But when I look back at what my job was as a research person in previous industries, if I had a tool like that... When I was trying to solve problems, like when I was making the Blu-ray disc stuff, we had issues with liquid stamping and UV photo curing, in something where we had to imprint 25 gigabytes worth of pit information onto a reflective plane.

And that had to cure in 400 milliseconds. So we had these super tight windows around productivity, you know, throughput. We had to take in disciplines of material science with the resin type we were using. What is the viscosity? How fast can it spin? What is its capture rate of oxygen bubbles? What's the geometry of the nozzle for the dosing? How much pressure do we have to put in? What temperature does the resin have to be at? There are so many

variables in these models that we were trying to capture in our heads. And we had data we were messing around with. We built some rudimentary regression models to try to figure this out. But a lot of it was just a ton of trial and error until somebody has an epiphany. They're like: I saw this paper in a completely different industry. They're doing stuff with making sure that the viscosity of heated steel goes into a mold properly for the

forging of engine parts. And somebody finds this and they're like: dude, check what they did here. Let's try that. And we take that implementation straight from the paper and go to the CNC machine, manufacture it, and see if we get better performance. And all of a sudden it's: if we had known that six months ago, our yields would have been 98% instead of struggling every day to get it above 65% consistently.

So having that knowledge association across... like, we're not doing steel foundry work. We're not manufacturing things for marine engines or some of the other crazy stuff we've done. It was even weirder at Cree with the LED stuff. You're looking at material science and crystal growth, things that got filed by DARPA in the 1960s about what they were doing with crystal

dislocation analysis with epitaxial growth of, you know, silicon. You're like: okay, that actually has an application here, but I never would have looked into the stuff they were trying to solve for the Apollo space program. And nobody knew about it. It was written down; people in the aerospace industry know about it. But in what we were doing, nobody knew, until somebody found it and said: let's try that.

Nicolas (38:10)
Yeah. Well, that colleague finding that paper, the "wow, let's try this thing" moment, that's pretty much what, ideally, I would like to automate. But then, like we said before, you guys were inspired by that paper; maybe someone else would have been inspired by another paper. So I think there's a context that goes beyond just having the right information.

There's something inherently human to it, right? What creates this spark of vision, invention, or creativity in your brain? What makes you react that way to that paper, if you want to push it a step further?

Michael (38:58)
That's actually an outstanding segue. So, I'm not well versed, but I have spent a lot of time thinking about machine creativity. If an AI builds a painting, does it look good? If an AI makes a movie, does it resonate? And also, fundamentally, what is creativity as it's represented in a knowledge graph? It's disparate connections between two different areas that otherwise have not been connected before,

that are also useful. That's one of the working definitions of creativity. So you've created this graph; you've also created the edges of the graph. But Nico, I'm curious, and you've clearly spent a lot of time thinking about this, Ben as well: is this a representation of creativity? How does this apply to the broader field of AI intelligence?

Nicolas (39:56)
That's a great question. It bothers me to say that it's a representation of creativity. It's a representation of knowledge that is intended to help creativity. But at the end of the day, you have people using this as either a repository of knowledge or as something packaged into a convenient tool. And it's a little bit the same thing when you use

ChatGPT or Claude or whatever, and you find their answers useful. At the end of the day, you're still making the decision. Maybe with agentic stuff a little less now, but you're still like: all right, Michael, I like this, I don't like this. Someone really smart at some point said, I can't remember who it was, but basically: all LLMs do is hallucinate, and some of these hallucinations we find useful.

Michael (40:54)
Well, let me just push back. I mean, I probably agree with you, and as a self-centered human I want to believe that I'm special, but when you break it down into the nitty-gritty atomic definition... creativity, I'm looking at Google right now: "the use of the imagination or original ideas, especially in the production of an artistic work." All right, that's not super great, but if we double-click into imagination and original ideas, specifically "original", a lot of originality comes from

disparate themes being connected. So if you take a step back philosophically: say you've never experienced anything in the world. Can you have creativity if you don't have entry points to play off of and build off of? I would argue no. And another argument I've heard is that intelligence is the amount of disparate facts you have: the more facts you have in your head, the more raw material you have to play with and combine.

I'd probably disagree with that as well, but it's an interesting angle, where you have this knowledge graph and the connecting piece is TRIZ. The edges are TRIZ.

Nicolas (41:59)
I was going to say, it goes back to Altshuller's vision, right? The creator of TRIZ. He would certainly agree that, yes, creativity can be seen as something systematic and objective. It's not just human magic; it's something you could automate. I think that's the premise of his work. And a lot of people don't like that, right? Especially with TRIZ

being applied to engineering. In companies, you see a lot of pushback, because inventors want to feel special. They don't want to feel like there's this method that I can actually apply and that's going to help me come up with better ideas, or maybe, in a sense, help me automate my work. People want to be like: no, I'm so unique, I came up with this. People don't like that.

So I think it's the same conflict. Before ChatGPT, there was this idea that, oh, NLP, that's cute, but a computer will never be able to write a...

Ben (43:11)
It definitely can. I mean, my take on it is: remove human ego from this topic. You brought that up before we started recording, and you just said it again, about the uniqueness of an inventor. In Western culture it's: hey, I need to stamp my name on this, because it's about my self-identity, it's about my future career progress,

to be like: hey, I was the one who came up with this, and I feel good about the fact that I came up with it. In my opinion, with every epiphany I've ever had, I've never been deluded into thinking: oh, I'm a genius, and I just pulled this out of thin air. The only reason I'm thinking about that topic is because hundreds of thousands of people before me laid the foundation of our global consensus of shared knowledge as a species.

Everything builds on one another as time goes on. You have certain facts, information, and understanding of concepts, from either an abstract or a concrete angle, in the sort of world model in your head of how things work or how things could work. And I think the intuition aspect, like: oh, I have this epiphany on how to solve this problem, or I have this really neat idea,

that's just our subconscious basically using the knowledge graph, asking: how can I connect these things together? I don't think that's a conscious thing our brains do. I think it's fully within our subconscious: memory ordering, processing, and retrieval. And I don't see that as any different from a system like what you're building making those connections and then putting that into a reasoning model to

generate text that summarizes some novel idea that is a basis of connections of disparate topics. I don't know why people... Another thing: I read an article the other day, one of the top trending things on Spotify was gen-AI-generated country music. It's basically like pop music, and nobody knew this wasn't an actual dude

singing these songs; it was just trending and went viral like crazy. And if all you're doing is mimicry in art, and something that is inherently popular is formulaic... there's a reason that pop music all sounds the same. There's a reason that country music all sounds very similar. A lot of country music. Anything that's in the top 100 on any Billboard chart,

Michael (45:53)
of country music.

Ben (46:00)
anything that's considered a AAA video game, anything that's a blockbuster Hollywood movie being released: it's recycling tropes that have existed before and were successful. That's why it got better funding; that's why it got more mass exposure, because the people trying to make money off it know: copy something that works, let's try to have lightning strike twice here.

I don't think gen AI is incapable of any of that. I think it's inevitable if you're doing formulaic rehashing of something. But when you look at stuff like art house films, or indie music where you hear that first thing and you're like: I've never heard anything like this before, I don't think this has ever existed. That's based on that intuitive gray area between things that have come before. Entire genres of music have started because of that,

like the advent of blues or jazz. There's somebody intentionally saying: I've learned everything I can about proper musical form and musical notation and what chord progressions should be. What if we start messing with that and just go nuts? And that's just trying stuff. If you've ever jammed out on an instrument before, it's like: what happens if I do this chord progression

on a guitar or a piano, or play this line and change key midway? What happens if I go from a C to an E-flat minor? Does it sound good? And you're like: no, that doesn't sound good. Or: that sounds amazing. That's just trial and error. I don't see why generative AI can't do stuff like that. You can brute force and try anything, as long as there's a rule set associated with it. And arguably everything has a rule set, right?

Nicolas (47:55)
I mean, that would be the idea. But speaking of generative AI, there are a lot of voices right now in AI saying that the way LLMs have been done so far is kind of hitting a wall. Text is nice, but your understanding of the world cannot just be based off of text, right? So you have researchers looking at world models. And, like, I have a 10-month-old baby, right? And I think she now has a better understanding of the world than ChatGPT. Because

she interacts with the physical world. She sees how things move, how things feel, how things smell. She has more senses. So what is the next iteration of AI going to look like? Is it going to be elements that also have a physical understanding, a space where they can reason? That would be a bit analogous to simulation, in a sense, if you're able to have a simulation space associated with

the physical world, an understanding of it.

Ben (49:00)
Yeah, famously, I guess it was two weeks ago or something, Yann LeCun made his announcement: I'm leaving Meta and I'm going to go focus on a world model startup. And I think he summarized it pretty well in a statement of his that I watched, about the first months of a human's life and the estimated volume of training data, the number of petabytes of data coming into that brain

to understand: okay, I opened my eyes, what are all of these shapes and these lights, what is all of this? And the sensations within the body, and the feeling of motivation as a driving factor of understanding the world as well. And then the development of true consciousness, whenever that actually happens in a child's early life.

I guess his conjecture was: you can't get that with transformer technology. Definitely not. Because it's mimicking just a text representation of things, pattern matching. They can get really clever with it. But yeah, to get a full understanding of what existence is, you have to viscerally experience it.

Michael (50:23)
Yeah, that's a different level. So Nico, I wanted to conclude with one more topic, which is sort of what's on the frontier for this space. We've talked about AI creativity, scientific discovery via meta-analysis, and hopefully prompting people via creative connections between disparate topics. And you're defending your thesis very soon, so I'm sure this is somewhat top of mind. What are the holes in this methodology, and where do you think it can go?

Nicolas (50:53)
Yeah, I mean, I think what I want to do next is look at better exploiting the graph, the consumption layer or the exploitation layer for the graph. A challenge in what we've seen so far is: how do you separate what is noise from what is potentially an emerging signal? And also, how do you make this

whole thing available for people to use? I'm actually defending next month, and that's what I want to do next. I want to put together basically a product that enables this graph to be consumed, put it out there, and see what people want to do with it. But as I do that, I also want to research better ways of

filtering it, better ways of navigating the graph. And there's prior art in terms of that, right? So this is pretty much what's top of my mind. That being said, to be completely honest, what's top of my mind right now is putting together the final draft of my dissertation and the thesis defense. But I think the timing is amazing.

Being there now, with the technologies that have emerged in the last couple of years, presents a lot of new opportunities to actually make this more useful, in a format where people can derive something from it. We've done a lot of internal testing and prototyping with different experts in engineering, biology, mechatronics, but it's a little bit limited to the scope of the

research environment I'm in, which, granted, is quite interdisciplinary, but still STEM-focused. And yeah, I want to see what else could be done with this graph.

Michael (53:03)
Got it. And for filtering on usefulness, where's your head at with that? Because Ben brought up an interesting point: just throw a reasoning model at it, have it do a deep research pass and say, all right, is there a valid link between these two things? It probably would miss some of the more novel links, but it could also creatively reconfigure some of the words, and some of the concepts, so they fit better together. So that's an obvious approach. But Ben also brought up actual physical experimentation with

any type of manufacturing process. That's probably out of scope from a hardware perspective, but maybe from a software perspective you could have it write code and iterate in the digital space. It seems like that's the final test of defining useful or not: the experimentation piece.

Nicolas (53:51)
The experimentation piece, and, at the end, the satisfaction of whoever is going to be the end user of the graph. So I think it would have to be somewhat dynamic, right? It's something where your feedback, as you consume the graph, would steer what's presented to you next. I think that would probably work best, and

the current technology enables it; a few years ago, that would have been a lot more challenging. You could think of something where you would selectively, or dynamically, select pieces of the graph by having a conversation with an agent, and that would steer what you see next. You say: I don't like this. And it says: I'm going to show you something else; let's try to understand why you don't like this, and let's narrow down to something you find interesting.

And that's the part where it's hard to come up with very cookie-cutter evaluation sets, because, again, we're dealing at the end of the day with people and their appreciation. There are obvious cases: you can benchmark using well-known biomimicry cases, which is typically what we've done so far. There are a couple of research articles out there where we show that, with our method, we're able to re-engineer these

well-known biomimicry solutions. And that's great, but they're well-known solutions. How do you benchmark the space of unknown solutions? I think that's the core of what you want to do with this next.
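
A minimal sketch of that feedback-steering idea: fold the user's accumulated likes and dislikes back into how graph suggestions are ranked. The linear blend, the discipline-prefix convention, and all numbers are assumptions for illustration only.

```python
def rerank(candidates, feedback, alpha=0.5):
    """Re-rank graph suggestions using accumulated user feedback.

    candidates: list of (topic, base_score) pairs from the graph
    feedback:   dict of discipline prefix -> cumulative +1/-1 votes
    """
    def score(item):
        topic, base = item
        prefix = topic.split(":")[0]  # e.g. "bio" from "bio:gecko-feet"
        return base + alpha * feedback.get(prefix, 0)
    return sorted(candidates, key=score, reverse=True)

candidates = [("bio:bird-flight", 3.0), ("bio:gecko-feet", 2.1),
              ("chem:polymers", 1.4)]
feedback = {"bio": -1, "chem": +2}  # this user keeps rejecting biology leads
print(rerank(candidates, feedback))
```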

Ben (55:36)
I mean, where my mind goes, way out in left field on this one, is: imagine a post-scarcity society where capitalism is dead and international borders are gone, and we start thinking like an actual species that's focused on prosperity and our future.

And no knowledge that exists is sacred, right? Think about all of the institutions trying to work toward solving problems, or innovating in a product space, doing something useful for humanity, and you had access to all of their R&D, all of their documentation, everything they've ever tried: failures, successes. And you put that in the graph. What does that then do?

Because right now, let's say I'm working for semiconductor fab company A and I'm competing against four other semiconductor fabs. We're all trying to go after this new tech node; let's say we're going to four nanometer on pitch depth. And we have all of these resources, 10,000 people spread across these four different companies, and we're all trying stuff in parallel.

And we all have our own knowledge silos that exist within each company. But if all of those silos were grouped together, you could say: hey, we're trying to solve this one thing with our architecture. That may have already been solved by somebody else, or at least they've tried solving something adjacent to it in 47 different ways, and they've all failed, with all of the data explaining the failure. Do you think

that would potentially accelerate human progress?

Nicolas (57:39)
This is super interesting. I mean, I think here your data set is a library of trial and error: things you've tried, things that have worked, things that have not worked, and where you're looking to go from a research perspective; that becomes your research question. So the initial approach to interacting with this graph is: let's start with a research question, like how to increase the speed of the Shinkansen without degrading,

let's say, the aerodynamics or other mechanical characteristics. But in what you're suggesting, you would not just ask a question of one or two sentences, or maybe a paragraph; you would dump your entire research. I think that could be part of the consumption layer for the graph, in a way. And you could imagine a framework where

you would basically dump that into an agent that would in turn go query the graph to try to help you solve your problems.

Ben (58:50)
And it wouldn't need to hallucinate. It would give you: here are some options, weighted by probability, based on what I see of the connected components here.

Nicolas (59:03)
That's the part where I'm like: can LLMs even do probabilities well? If you ask an LLM to give you a probability, it's not a real probability.

Ben (59:14)
It's just made up.

Nicolas (59:16)
So it would be more of a... I like to think of it as kind of a steering thing, kind of like magic, secret sauce, whatever. But then you could potentially run other, more explainable models on top of it.
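
On the probabilities point: one explainable alternative is to score associations directly from corpus counts rather than asking an LLM, for example with pointwise mutual information. The counts below are invented, and nothing in the episode says the project uses PMI specifically; it is just one standard, auditable scoring choice.

```python
import math

def pmi(count_ab: int, count_a: int, count_b: int, total_docs: int) -> float:
    """Pointwise mutual information between two topics.

    count_ab: abstracts tagged with both topics
    count_a, count_b: abstracts tagged with each topic
    PMI > 0 means the topics co-occur more often than chance.
    """
    p_ab = count_ab / total_docs
    p_a = count_a / total_docs
    p_b = count_b / total_docs
    return math.log2(p_ab / (p_a * p_b))

# Toy numbers: 25M abstracts, two moderately common topics.
print(pmi(count_ab=1_200, count_a=90_000, count_b=40_000,
          total_docs=25_000_000))  # ~3.1 bits: a stronger-than-chance link
```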

Ben (59:31)
Right.

Michael (59:39)
Cool.

Ben (59:40)
Yeah, I mean, the way I see a lot of these, even the most advanced reasoning models out there, when I interact with them, including coding agents, which I now use all day long, every day: they're not this mystic that knows so much more than anybody else and is so good at everything. It's like my typing assistant, right? I don't want to type 10,000 lines of text.

I can give it instructions in a couple of paragraphs, and it generally follows them. And of course there are many iterations before it actually gets it right, for what I want. But I see these language models as just the convenient interface, and they only become truly powerful when they have access to tens to thousands of tools that are deterministic in nature and can run things. But

if you're interacting solely with a chat program that doesn't have tool calling enabled, I'm just like: yeah, I don't trust much that comes out of this thing. It's all just making stuff up based on its training data.

Nicolas (1:00:50)
But, I mean, one of the premises of this work is to also make it explainable, right? You can trace it back to real papers; you can trace it back to real elements in a data set. So, to your point, you can think of that as the interface, or the consumption layer, which differs from the graph itself and the methodology employed for its curation and creation,

which hopefully is explainable, because at the end of the day we're training that on scientific literature.
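
That traceability falls out naturally if each edge carries the abstracts that produced it, as in the earlier graph sketch. A hypothetical helper:

```python
def explain(G, topic_a, topic_b, n=5):
    """Trace a suggested cross-disciplinary link back to source abstracts,
    so the recommendation is auditable rather than a free-floating claim."""
    if not G.has_edge(topic_a, topic_b):
        return f"No recorded association between {topic_a} and {topic_b}."
    data = G[topic_a][topic_b]
    papers = ", ".join(data["papers"][:n])
    return (f"{topic_a} <-> {topic_b}: co-occur in {data['weight']} "
            f"abstracts, e.g. {papers}")
```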

Ben (1:01:29)
I can imagine what you're researching, what you're building, not just from a white paper standpoint, for pure research. I could imagine a product you could release that's equivalent to how people use vector indexes right now in internal search engines within companies, where your methodology finds connections within the confines of a large enough organization. Be like: hey, at Databricks,

dump all of Confluence into this graph and figure out where all of the connections are. What have we tried? Where are all of our design docs? What are the things that actually went into products? What are all of the dead ends? Because we have all of that documented; there are millions of pages in our R&D Confluence, and our Google Drive as well; it's insanely large. If you could take that, put it in that knowledge graph, and then connect a reasoning agent to it...

I can't even imagine how powerful that could be. What did the security team do to solve this problem? Right now you have to go and search for something like that, or ask a question and do document retrieval. But if you say: hey, I'm trying to solve problems X and Y within the confines of these parameters, what else is available? And for it to go and find stuff I never would have looked at, and be like: by the way,

the team that handles logins built something three years ago; check this design out.

I think it'd be massively powerful.

Michael (1:03:07)
It's like TRIZ-based RAG versus semantic RAG.

Ben (1:03:11)
Yeah, I think you have a product here.

Michael (1:03:15)
Well

Nicolas (1:03:15)
Stay tuned for next year. There will be developments.

Ben (1:03:24)
Yeah, I'll use it, a hundred percent. I can already think of many use cases for it.

Nicolas (1:03:31)
I'd love to get your feedback. You can be my early tester.

Michael (1:03:37)
Cool. We are at time. Thank you, Nico, so much. I was going to summarize, but I'm just not going to; it's not really summarizable. But the two things I did write down in my notes are: TRIZ is a matrix, or sorry, the TRIZ matrix is an abstraction framework for scientific discovery, and we're on the frontier of AI creativity. So, not the most useful summary, but I think factual. So, Nico,

if people want to learn more about you or your work, where should they go?

Nicolas (1:04:11)
Where should they go? That's a great question. I mean, I guess they can find me on LinkedIn.

Michael (1:04:20)
Cool. Look up his name on LinkedIn if you want to learn more. And yeah, this was super, super fun, as expected. So until next time, it's been Michael Burke and my co-host...

Ben (1:04:32)
Ben Wilson.

We'll catch you next time.

Nicolas (1:04:36)
Cheers.