The secret sauce to your sales success? It's what happens before the sale. It's the pre-sales. And it's more than demo automation. It's the work that goes on to connect technology and people in a really thoughtful way. If you want strong revenue, high retention, and shorter sales cycles, it starts with pre-work centered around the humans who make the dream work. But you already know that.
The Unexpected Lever is your partner in growing revenue by doing what you already do best—combining your technical skills with your strategic insights. Brought to you by Vivun, this show highlights the people and peers behind the brands who understand what it takes to grow revenue. You're not just preparing for the sale—you're unlocking potential.
Join us as we share stories of sales engineers who make a difference, their challenges, their successes, and the human connections that drive us all, one solution at a time.
Chen Liang [00:00:00]:
What it lacks is the insider knowledge, the tricks of the trade, and the years of experience. Whereas our knowledge graph is the ultimate playbook used to guide these LLMs not only on the what but also on the how, in order for them to nail it and perform at the top level.
Jarod Greene [00:00:18]:
You're listening to The Unexpected Lever, your partner in growing revenue by doing what you already do best: combining your technical skills with your strategic insights. This episode was taken from a LinkedIn Live series about sales engineering with our CEO, Matt Darrow. We hope you enjoy it.
Matt Darrow [00:00:32]:
I'm Matt Darrow, co-founder and CEO of Vivun. I started the company after a career running sales engineering teams at private and publicly traded companies. I'm here today with Chen, Vivun's machine-learning expert. Chen, we've learned some really interesting, maybe niche things about you, but how about a more general background about yourself for the audience?
Chen Liang [00:00:55]:
Yeah. Thank you, Matt. Hi everyone, this is Chen Liang, currently calling in from Stamford, Connecticut, which is about an hour's drive from New York City. A little bit about myself: I earned my PhD in Statistics from Vanderbilt University and have spent the last 10 years almost exclusively in the machine learning, AI, and data science space. Before joining Vivun, I was working as an AI scientist at Bridgewater Associates, this small hedge fund here in Connecticut, where I was using data science and machine learning techniques to codify the founders' management and investment philosophies into a systematic decision algorithm. I joined Vivun about two years ago, and ever since I've worked on quite a few interesting, exciting projects, including the Hero Score, Smart Team Member assignments, and most recently Ava, your AI Sales Engineer.
Matt Darrow [00:01:49]:
It's such a perfect background for our discussion today, because it's all about ensuring trust and credibility in AI systems, and it's amazing that you've been doing this for the last 10-plus years. It's all about preventing hallucination. That's the topic of our discussion on the eve of spooky Halloween, and specifically we're going to dive into how we prevent AI agents from messing things up. There are so many talking points and discussions around agentic AI, and nearly every leader I have a chance to speak with is worried about the quality of work AI agents can do. So if you are in sales or pre-sales and you're looking to keep up with AI's impact on your role, you are in the right place. Last time I spoke with Russell, one of Vivun's product leaders, about the most important AI work we can do for sales engineers, and that's solutioning, which is this amazing intersection of your product's capabilities and differentiation with your customers' goals and problems. But the big question is, well, how do we build and train an AI agent to do the work that sales engineers do, to play the role of a team member on your team, and to do that type of complex work in a way that's really accurate and trustworthy? Luckily...
Matt Darrow [00:03:04]:
Chen, you're here, and you've tackled this question in really novel ways, several of which we want to discuss. So before we jump into the deep end on how we're preventing hallucination, how you're ensuring quality work in AI systems, let's just start with the basics. For folks out there who are maybe familiar with LLMs, or large language models, like Claude from Anthropic or ChatGPT from OpenAI: why do LLMs hallucinate? Let's start there and then we'll unpack it further.
Chen Liang [00:03:33]:
Well, yeah, Matt. Hallucination is such a centerpiece of discussion in today's world, right? In AI and machine learning, remember, everything has to tie back to the probabilistic nature of these language models. What they are trying to do is basically predict the most likely next word in a sequence, based on the patterns they were trained on across a variety of data sets. So you can imagine, if the training data itself is of low quality or is biased, which is almost always true, especially in the enterprise setting where the proprietary data was never part of the training dataset, hallucination will arise from either overgeneralizing or trying to interpolate across those vast datasets they were trained on. And what makes things worse is the presence of ambiguity in the user input. You can imagine, if the model is presented with uncertain, ambiguous, or even industry-specific jargon from the user that has a very different definition from general public knowledge, the model will fill in the gaps with guesses. And because those models are designed to give some answer anyway, instead of answering "I don't know," that behavior leads to the fabrication of facts.
Chen Liang [00:04:50]:
So eventually, as you can see, we need a well-structured knowledge framework, driven by domain expertise and a thoughtful selection of data sets from the industry or enterprise setting. Those are the key guardrails that prevent LLMs from hallucinating.
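To make the probabilistic point concrete, here is a minimal sketch, with an invented toy vocabulary and made-up probabilities, of why a next-token sampler always produces some answer rather than abstaining:

```python
import random

# Toy next-token distribution. The vocabulary and probabilities are invented;
# the point is that a language model always samples *some* continuation.
next_token_probs = {
    "CEO": 0.32,       # plausible-sounding guesses...
    "CRO": 0.28,
    "engineer": 0.25,
    "unknown": 0.15,   # ..."I don't know" is just another unlikely token
}

def sample_next_token(probs: dict[str, float]) -> str:
    """Sample the next token; the model never abstains, it just picks one."""
    tokens, weights = zip(*probs.items())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))  # always answers, right or wrong
```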
Matt Darrow [00:05:09]:
I hear you on the probabilistic nature: just guessing the next best word in a sequence based on data that has a lot of gaps, and on the inability to say "I don't know" instead of filling in those gaps. You said something really important there around structured knowledge being a really big key to preventing hallucination. How do you achieve that? How do you get structured knowledge in an AI system?
Chen Liang [00:05:32]:
Well, here at Vivun, we developed a top-down procedural knowledge graph to guardrail the AI from hallucinating. This is the approach proposed by our co-founder and chief data scientist, Joseph Miller. He holds a PhD in neuroscience and physics, and before Vivun he was my manager at Bridgewater, where he led a team building all sorts of AI products. Basically, a knowledge graph is a structured representation of information where data is organized into nodes, which represent entities, and edges, which represent relationships. So it models real-world objects and how they are interrelated into a network of facts, or knowledge. Here we are building our proprietary knowledge graph based on the 20-plus years of domain expertise accumulated by our founders, you, Matt, and our CTO John Bruce, as well as the data we've collected from our customers since the inception of the company. We gather all that information and construct a procedural graph that represents the industry standard of an excellent sales engineering motion. So in those graphs there are two types of knowledge.
Chen Liang [00:06:49]:
Declarative knowledge, which is the critical facts that a good SE should definitely know, but also procedural knowledge of how things ought to be done: how to draft a good solution doc, as you mentioned, how to do a good demo, how to conduct a good discovery call. All of that information is embedded in the knowledge graph, which is then referenced by the AI agents to complete tasks in various use cases.
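As a rough illustration of the structure Chen describes, a knowledge graph holding both declarative and procedural knowledge might look like the sketch below. The node and edge types and the example facts are invented for illustration; Vivun's actual schema is proprietary.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    kind: str                      # e.g. "company", "persona", "procedure"
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    source: str
    relation: str                  # e.g. "decides_purchase_for", "targets"
    target: str

graph_nodes = [
    # Declarative knowledge: facts a good SE should know.
    Node("acme", "company", {"industry": "logistics"}),
    Node("buyer", "persona", {"titles": ["CEO", "CRO"]}),
    # Procedural knowledge: how things ought to be done.
    Node("discovery_call", "procedure",
         {"steps": ["research the account", "confirm pain points",
                    "map capabilities to goals"]}),
]
graph_edges = [
    Edge("buyer", "decides_purchase_for", "acme"),
    Edge("discovery_call", "targets", "buyer"),
]
```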
Matt Darrow [00:07:19]:
So if I want to hallucinate, I lean directly on the LLM; that's not the best brain, is what I'm hearing you say. What I need is the structured knowledge, top down, that has the representation of what ought to be done. That's going to keep everything on the rails. So maybe talk a little bit more about that: how do you actually keep an AI agent that's using top-down knowledge on the rails, with this graph-based approach working in conjunction with LLMs and everything else happening in this AI space?
Chen Liang [00:07:44]:
Yeah, totally. Remember, the structure of the knowledge graph is based on domain expertise. It is a domain model, and it contains all the key pieces of information and concepts that we need the agents to learn from our customers. After we clarify those concepts, we identify the data sources where those concepts are most likely to be stored, sources like call transcripts, onboarding material, or company websites. In that way, we provide a more targeted data set for the agents to retrieve relevant knowledge from, in a less noisy environment. And in the knowledge extraction process, we provide the agents with rich prompts that specify detailed, step-by-step instructions for how to extract that knowledge, along with clear examples of what good looks like for each entity or concept we want them to extract. That way we further remove the uncertainty that arises from ambiguous concepts and align the agents' actions with what we want them to do. So you can see, having a knowledge graph provides a clear view of the status of knowledge acquisition.
Chen Liang [00:09:02]:
And this has two benefits. The first is that we know what we already know, so we can govern the LLM and make sure the responses it generates are based only on knowledge we have already acquired. And second, we also know what we don't know. That way, we can guide our agents to continue exploring that knowledge through additional data sources such as web search, or even direct user engagement through the chats we're currently building. So those are the ways we use the knowledge graph to guardrail the AI.
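A minimal sketch of that guardrail idea, assuming a simplified graph represented as a plain lookup of acquired facts: answer only from what is known, and route unknowns to further acquisition instead of letting the LLM guess.

```python
def answer_with_guardrail(entity: str, graph: dict) -> str:
    """Respond only from acquired knowledge; escalate instead of guessing.

    `graph` is a simplified stand-in mapping entity names to known facts.
    """
    if entity in graph:  # benefit one: we know what we already know
        return f"{entity}: {graph[entity]}"
    # Benefit two: we know what we don't know, so trigger acquisition
    # (web search, a clarifying chat question) rather than fabricating.
    return f"'{entity}' is not in the knowledge graph yet; queuing it for further exploration."

knowledge = {"users": "product managers and sales engineers"}
print(answer_with_guardrail("buyer", knowledge))  # routed, not hallucinated
```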
Matt Darrow [00:09:36]:
An LLM gives an agent a very, very powerful ability, the interactivity, almost the mouthpiece, to be able to speak. But this knowledge representation is actually what prevents the hallucination, and it's actually where all the intelligence comes from. Because what you're describing is almost how I learn as a human: I have a reference point for doing a particular job, this is the work that ought to be done, in a structured way. And when I start to learn materials, to your point, onboarding, product, competitors, this is all information that gets appended to my knowledge and used in the appropriate manner, without falling off the rails. What's been fun, though, is that you've taken this so far beyond just theory, Chen, as we've brought an AI SE to life at Vivun, and we've actually seen it in the wild, because our customers are using it. So how do you know all the theory is actually working when this is out there in the wild?
Chen Liang [00:10:28]:
Well, this is a great point: everything has to go back to evaluation, which is work that has run throughout my whole academic and industry career. Evaluation is particularly critical in the AI agent space to prevent hallucination, and it is required throughout the entire development cycle to ensure that whatever models we're building perform reliably across the board. In my mind, there are two aspects of the model that we're trying to evaluate. The first is model fidelity, which measures whether the knowledge graph, and subsequently the agents, can capture all the domain knowledge in the sales engineering space comprehensively enough to do the job well. The other aspect is model robustness, which measures whether the model, the graphs, the agents can reliably perform under different circumstances and use cases, such as solution generation, demo creation, user onboarding, et cetera. At Vivun, this is really a joint effort of multiple people, including Ryan Conklin, John Salvatore, Srisha, and myself. We're all tackling this evaluation process through different modules and different processes along the way.
Chen Liang [00:11:45]:
But we share techniques in common, and at a high level I bucket them into two categories: quantitative assessment and qualitative assessment. For the quantitative assessment, we leverage statistical metrics such as accuracy, precision, and recall, which are used in the knowledge retrieval process to make sure the model captures the right information and the right relationships from the ocean of docs and conversations without hallucination. Basically, every time we update the model or add a new concept, we probe the new variant with a predefined question set and compare the responses it generates with the ground truth we've established, in order to measure the model's accuracy. One example I can give you is a user/buyer hallucination generated by the LLM that we identified during the knowledge research work. After we built one version of the knowledge graph, we fed it a call transcript we had with the target personas, the sales engineer and the product manager from a prospect company; no CEO, no CRO was present on that call. After we trained the knowledge graph with that transcript, we asked who the buyer of the company is. And if you work in the sales space, you know that the buyer is the final decision maker who has a say in whether to purchase a service or goods.
Chen Liang [00:13:25]:
It's typically the CEO or CRO of the company. But without that information explicitly identified in the knowledge graph, the LLM and the agent just try their best to match the closest concept, in this case the users, our targeted personas, and respond that the product manager and sales engineer are the buyers, which is a classic hallucination due to the lack of domain knowledge. Honestly, I wish those were the buyers.
Matt Darrow [00:13:54]:
It would make life a lot easier.
Chen Liang [00:13:55]:
It would be much easier if they were the buyers, right? But no, they are the users. So eventually we added a buyer node into the graph, along with the relationship connecting it to the company, to remove that problem and close the loop of modeling, evaluation, and improvement. So you can see the full cycle of how those quantitative metrics help us improve model quality. The other aspect is the qualitative assessment, which is harder to measure, because everyone has their own opinions and preferences. We tackle that challenge by first leveraging subjective criteria such as relevance, coherence, and creativity, which are commonly used to gauge user experience in modules like solution creation or user Q&A. And what we did is build a rubric-based grading system that clearly defines the scale for each criterion, so that the scoring process is consistent.
Chen Liang [00:15:05]:
We then offer detailed instructions, with examples of high-, medium-, and low-quality responses, to further guide both human and machine evaluators in grading the responses. Those are the high-level approaches we're taking to evaluation here at Vivun.
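A simplified sketch of the probe-and-compare loop Chen describes, with invented questions, ground truth, and an exact-match comparison standing in for the real scoring:

```python
# Each new model variant is asked a predefined question set and scored
# against an established ground truth; here a stub model plays the variant.

probe_set = {
    "Who is the buyer?": "CEO or CRO",
    "Who are the users?": "product manager and sales engineer",
}

def evaluate_variant(generate_answer, probes: dict[str, str]) -> float:
    """Return accuracy of a model variant over the probe set."""
    correct = sum(
        1 for question, truth in probes.items()
        if generate_answer(question).strip().lower() == truth.lower()
    )
    return correct / len(probes)

# A stand-in "model" that still exhibits the user/buyer confusion.
stub_model = lambda q: "product manager and sales engineer"
print(f"accuracy: {evaluate_variant(stub_model, probe_set):.2f}")  # 0.50
```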
Matt Darrow [00:15:23]:
Well, I think what's so powerful about this evaluation description too, Chen, is on the user side. We often talk about working with an AI agent, a team member, a doppelganger, as unlike anything you've ever experienced, because it's not like SaaS. SaaS operates on a database through a GUI, and then some reports get, you know, spat out the other side; think about how we all use salesforce.com. But when you're actually working with an AI sales engineer or an AI SDR or an AI lawyer, you want to be working with the best manifestation of your team member, and that's not going to be accomplished with the LLM alone. What you're describing here is the importance of this top-down knowledge structure. How you evaluate this is also unlike SaaS. The things you need to do to ensure an AI system can be trusted, that it's accurate, that you're preventing hallucination, are really unlike any type of engineering and QA process that's existed up until this point in time. And I want to lead you this way because I think you guys are doing some things that are really, really novel, even above and beyond the different types of assessment models you were talking about: you're also pioneering real-time evaluation.
Matt Darrow [00:16:33]:
What is that and how does it work? Why is that important for us?
Chen Liang [00:16:36]:
There are several ongoing efforts initiated by our team, Ryan Conklin, Salvatore, Shrey, as I mentioned. One of the efforts is to collect user feedback in real time and build a dashboard to track those indices. The example here is a frustration index we created. Basically, what it does is track the times when users repetitively type in the same question again and again, just because the agent couldn't get it right. We visualize that trend over time, and we do see improvement, meaning a decrease in the frustration index, as our model improves. In the meantime, we also track common user engagement metrics, such as daily and monthly active users as well as screen time, and use them as macro-level indicators of how our model behaves. And as we onboard more customers and get a deeper understanding of our user behavior, we'll be developing more metrics and tracking them through time, so we have a real-time monitoring system for the model.
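One way such a frustration index could be computed; the similarity threshold and the exact definition below are assumptions for illustration, not Vivun's actual formula:

```python
from difflib import SequenceMatcher

def frustration_index(questions: list[str], threshold: float = 0.85) -> float:
    """Fraction of questions that closely repeat an earlier question."""
    repeats = 0
    for i, q in enumerate(questions):
        # A question counts as a repeat if it is near-identical to any
        # earlier question in the same session.
        if any(SequenceMatcher(None, q.lower(), prev.lower()).ratio() >= threshold
               for prev in questions[:i]):
            repeats += 1
    return repeats / len(questions) if questions else 0.0

session = [
    "How do I export the solution doc?",
    "How can I export the solution doc?",   # re-asked: the agent missed it
    "What integrations do you support?",
]
print(f"frustration index: {frustration_index(session):.2f}")  # ~0.33
```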
Chen Liang [00:17:36]:
So that is one aspect: metrics and tracking. The other thing, which is really innovative and also really, really challenging, is the human-in-the-loop, and eventually fully automated, quality evaluation machine we're building. How we're tackling that is we first manually compare the generated content with the source material and then score it with all the quantitative and qualitative metrics I just described. Once we've scanned through a large enough sample size, hundreds of documents reviewed by people, we identify repeating patterns, from which we can establish a rubric and orchestrate a set of agents to automate the grading effort. This process is actually suggested by LangSmith as a best practice. And here at Vivun we have success stories.
Matt Darrow [00:18:30]:
Right.
Chen Liang [00:18:30]:
We implemented it in the product gap extraction in the OS system, which successfully captured the gaps with high accuracy. So we're going to apply the exact same methodology here to automate agentic model evaluation in real time.
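A simplified sketch of the rubric-driven grading step: once a rubric is established from human review, the same scoring call can be handed to either a human or an LLM-judge agent. The criteria, scales, and stub judge below are illustrative placeholders:

```python
RUBRIC = {
    "relevance": "1 = off-topic ... 5 = directly answers from source material",
    "coherence": "1 = disjointed ... 5 = logically ordered and consistent",
    "creativity": "1 = boilerplate ... 5 = novel but grounded framing",
}

def grade_response(response: str, source: str, judge) -> dict[str, int]:
    """Ask a judge (human or agent) for a 1-5 score per rubric criterion."""
    return {
        criterion: judge(response, source, criterion, scale)
        for criterion, scale in RUBRIC.items()
    }

# Stub judge standing in for a human reviewer or an LLM-as-judge agent.
stub_judge = lambda response, source, criterion, scale: 4
print(grade_response("Draft solution doc...", "Call transcript...", stub_judge))
```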
Matt Darrow [00:18:46]:
These are sort of the agents behind the agents doing all this real-time evaluation. How are they informing the adjustments you go and make back to the knowledge graph?
Chen Liang [00:18:55]:
So remember, a knowledge graph is a representation, an embodiment, of the domain knowledge and our understanding of the customer. As we onboard more customers, our insight expands across user feedback, business requirements, data quality, as well as scalability and performance challenges. The consequence is that we'll have more real-world data sets and acceptance criteria to fine-tune our knowledge graph and knowledge modeling approach, and we can then create a pipeline for automatic model updates. Some of that automation has actually been achieved even during the research process. For example, we created an agent that can automatically mutate part of the knowledge graph and then run the model evaluation in a semi-automatic way, which saves us a lot of time and manual effort.
Matt Darrow [00:19:51]:
Well, if you've got these real-time evaluators working behind the scenes to morph and change the graph, pushing and pulling in different ways, what are some additional examples of what that looks like and what those workers are doing for you?
Chen Liang [00:20:05]:
Yeah, so in the research phase, the automated or semi-automated evaluation process is achieved by breaking the evaluation process down into serialized tasks and orchestrating multiple agents to take on and complete those tasks in the sequence we ask: loading the original graph, modifying the properties of the nodes and edges, assembling a new graph with the updated knowledge, embedding that into the LLM, generating responses for the predefined questions, and then comparing those responses with the ground truth. So you can see, this process is just like training a bunch of new engineers, new hires, on a complex project. You set up the vision, you outline the steps, you define the role and responsibility for each person, and at the same time you give them some room for creativity. Then you can step back and watch the magic happen, oftentimes with surprising but delightful results.
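A sketch of that serialized pipeline, with every step stubbed out; in the real system each task would be handled by an orchestrated agent rather than a local function:

```python
def load_graph() -> dict:
    return {"buyer": {"titles": ["CEO", "CRO"]}}       # placeholder graph

def mutate_graph(graph: dict) -> dict:
    new_graph = dict(graph)
    new_graph["user"] = {"titles": ["PM", "SE"]}       # modify node properties
    return new_graph

def generate_responses(graph: dict, questions: list[str]) -> list[str]:
    # Stand-in for embedding the graph into the LLM and answering the probes.
    return ["CEO or CRO" if "buyer" in q else "PM or SE" for q in questions]

def compare(responses: list[str], ground_truth: list[str]) -> float:
    hits = sum(r == t for r, t in zip(responses, ground_truth))
    return hits / len(ground_truth)

# Run the tasks in the sequence described: load -> mutate -> generate -> score.
questions = ["Who is the buyer?", "Who are the users?"]
truth = ["CEO or CRO", "PM or SE"]
graph = mutate_graph(load_graph())
print(f"accuracy: {compare(generate_responses(graph, questions), truth):.2f}")  # 1.00
```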
Matt Darrow [00:21:10]:
That seems like something that's going to help my frustration index too. That's another great term I hope people take away from this, along with the fact that using agentic AI in truly agentic solutions is unlike anything you've experienced, and building it is unlike anything we've done to date. And I love the breakdown. I really appreciate it, Chen: how you're describing everything from where LLMs fall short, to how you need to apply top-down procedural knowledge, to how that knowledge can be tested, made robust, and ensured, not just with different measures, but in real time as well. So if I try to wrap this all up for the audience, let me take a crack at a maybe simplistic analogy. If I think about the LLM landscape as sort of like quicksand underneath my feet, it's not necessarily solid ground, because every time a new model is released, everything potentially changes behaviorally. That's a lot of the challenge that leads to hallucinations and agentic systems going off the rails. But the knowledge representation, that top-down procedural understanding of what ought to be done in the right knowledge representation, is almost like a really sturdy and robust banyan tree growing down deep within the quicksand. That is really what is driving the agent, that's what needs to drive the agent, and that's what's ultimately going to prevent the hallucination.
Matt Darrow [00:22:33]:
How close did that analogy hit home? Where am I off the rails? How did I do?
Chen Liang [00:22:37]:
My God, it's so good. I can almost vividly picture that banyan tree right in front of me. I don't think I can top that, but that's really, really good. The way I think of it, as I mentioned before, is that the LLM is like a super intelligent intern, right? It is brilliant at picking up new knowledge that's publicly available out there. It has pretty impressive reasoning capabilities; it can think logically and get things done. But what it lacks is the insider knowledge, the tricks of the trade, and the years of experience. Whereas our knowledge graph is the ultimate playbook used to guide these LLMs not only on the what but also on the how, in order for them to nail it and perform at the top level.
Matt Darrow [00:23:23]:
That's awesome, Chen. It's been a blast seeing you and the team cook on this in a variety of ways and really push the envelope with novel approaches. And for everybody out there, you could probably tell from this conversation that not only are we deeply passionate about all things AI agents, but at Vivun, our whole goal is to change B2B selling for good. We know AI is going to do that, and that's why Chen and the rest of our team are working to ensure the future of AI drives technical sales teams forward. So if you liked this discussion, follow us on LinkedIn. We'll be doing more of these over time. You can also subscribe to our podcast, The Unexpected Lever, on YouTube, Spotify, or Apple.
Matt Darrow [00:24:02]:
We'll send you these after the fact. But again, Chen, thank you so much and until next time, great to have everybody join us today.
Chen Liang [00:24:09]:
Thank you.
Jarod Greene [00:24:10]:
For additional resources, check out vivun.com and be sure to check out V5, our five-minute soapbox series on YouTube. If there's a V5 you'd like us to talk about longer, let us know by messaging me Jarod Greene on LinkedIn.