{"type":"rich","version":"1.0","provider_name":"Transistor","provider_url":"https://transistor.fm","author_name":"Embracing Digital Transformation","title":"#355 From Mars to Data Centers: AI that Prevents Cloud Outages.","html":"<iframe width=\"100%\" height=\"180\" frameborder=\"no\" scrolling=\"no\" seamless src=\"https://share.transistor.fm/e/c5276c2e\"></iframe>","width":"100%","height":180,"duration":2044,"description":"Cloud outages don’t have to be a mystery or a recurring fire drill. Host Dr. Darren interviews Dr. Helen Gu, professor at North Carolina State University and founder/CEO of InsightFinder, about how AI for cloud operations can detect, predict, and automatically fix outages before users feel the impact.\n\n## Key Takeaways\n- AI can move beyond simple alerting to **predictive outage prevention**, spotting early warning signs before they become incidents.\n- **Unsupervised machine learning** helps discover hidden patterns in noisy machine data without requiring large sets of labeled examples.\n- Real-world cloud environments are complex, with thousands of parameters, dynamic workloads, and interacting microservices that make manual troubleshooting difficult.\n- A **closed-loop feedback system** lets teams review AI predictions, correct mistakes, and continuously improve model accuracy.\n- InsightFinder’s **composite AI** approach combines predictive AI, causal inference, behavior learning, and small language models for more reliable operations.\n- The same data-driven approach can support **cloud monitoring, edge environments, critical infrastructure, and other machine-generated data streams**.\n\n## Chapters\n- 00:00 Introduction to AI that prevents cloud outages\n- 01:05 Helen Gu’s origin story in NASA-funded Mars research\n- 04:10 From video streaming on Mars to machine learning for reliability\n- 07:00 Why machine data is harder than it looks\n- 09:20 Unsupervised learning vs. supervised learning\n- 12:10 From research to Google Cloud anomaly detection\n- 14:40 Detection, prediction, and automatic remediation\n- 17:10 Why cloud systems are so complex\n- 19:45 The future of AI agents, models, and infrastructure monitoring\n- 23:10 Hallucinations, false positives, and feedback loops\n- 26:00 Composite AI and online learning in production\n- 29:10 Adapting AI models to different environments\n- 32:05 Fast deployment and time to value","thumbnail_url":"https://img.transistorcdn.com/IRrW2aizIeoZDn3gKLEax-JYQ8V_WzaFpHdgsslDx3k/rs:fill:0:0:1/w:400/h:400/q:60/mb:500000/aHR0cHM6Ly9pbWct/dXBsb2FkLXByb2R1/Y3Rpb24udHJhbnNp/c3Rvci5mbS9jM2Ji/MDk1OTdiYzA4ZWMw/NWNlOTY0N2RhMWQ3/YmY5Mi5wbmc.webp","thumbnail_width":300,"thumbnail_height":300}