AI agents in the industry | Part 2

Dr. Andrii Melashchenko

Data Architecture as a Foundation in the AI Era

When an AI Agent recommends replacing critical network infrastructure based on subtle performance degradation patterns, human operators face a trust challenge. Should they allocate budget and resources based on advice from the "digital colleague"? What if the recommendations involve substantial investments in time, equipment replacement, or system modernization?


As Microsoft CEO Satya Nadella recently observed, "Traditional business or SaaS applications may dramatically change in the agent era. These applications are essentially CRUD (create, read, update, and delete) databases with business logic. But in the future, this logic will migrate to AI agents." This transformation means AI will soon operate directly on enterprise data to make operational recommendations, making the quality and comprehensiveness of that data integral.

The key to building trust between human operators and AI "skilled workers" lies in securing a mature data engineering infrastructure. When AI Agents offer solutions, those solutions need to be validated against comprehensive data: not single snapshots, but longitudinal analysis spanning weeks, months, or even years.

Use case: Predictive maintenance in Network Operations

Consider this real-world dialogue between an AI Agent and a network administrator:

AI Agent: "Based on analysis of network telemetry data, I recommend replacing the optical receivers in Switch-Core-3. I've observed a consistent 0.3 dB degradation in signal strength over the past month, with no corresponding link-down events. This pattern indicates aging optical components rather than physical connection issues."

Network Admin: "That's a $15,000 investment. What's your confidence level?"

AI Agent: "High confidence (92%). The degradation follows a linear pattern inconsistent with fiber contamination (which shows step changes when fibers are reconnected). Historical data from 18 similar switches shows this pattern preceded receiver failures with 87% accuracy within 60 days. No maintenance logs indicate fiber cleaning or reconnection in this period."

Network Admin: "Show me the trend data and similar cases for comparison."

AI Agent: "Here's the analysis:

  • 30-day signal strength trend showing linear degradation from -2.1 dBm to -2.4 dBm
  • Zero link-down events in maintenance logs
  • 18 historical cases with similar patterns: 16 resulted in receiver failure within 60 days
  • Estimated downtime cost if failure occurs: $50,000/hour
  • Current replacement window: scheduled maintenance next weekend
  • Risk analysis: 87% probability of failure before next maintenance window in 45 days"

This level of sophisticated analysis requires years of operational data stored in a system that can support complex queries, time-series analysis, and pattern matching across heterogeneous data sources. The $15,000 investment for high-end optical modules in a core switch becomes justified when compared against potential downtime costs of $50,000 per hour. This is where modern data infrastructure becomes critical.
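The distinction the agent draws in the dialogue, between a gradual linear decline and the step changes typical of fiber reconnection, can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the synthetic daily readings, and the 0.1 dB step threshold are assumptions made for the example, not part of any particular monitoring product.

```python
def linear_slope(values):
    """Least-squares slope of the values against their index (units per sample)."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def largest_step(values):
    """Largest absolute change between two consecutive samples."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


def classify_degradation(values, step_threshold_db=0.1):
    """Label a series as 'step change', 'gradual decline', or 'stable'.

    step_threshold_db is an assumed value chosen for illustration.
    """
    if largest_step(values) >= step_threshold_db:
        return "step change"
    if linear_slope(values) < 0:
        return "gradual decline"
    return "stable"


# Hypothetical daily readings falling linearly from -2.1 dBm to -2.4 dBm.
readings = [-2.1 - 0.01 * day for day in range(31)]
print(classify_degradation(readings))
print(round(linear_slope(readings), 4), "dB per day")
```

A production system would work on noisy measurements and would need a more robust change-point test, but the principle is the same: the slope and the largest single jump are both derived from the stored history, which is why that history has to be retained.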

The Lakehouse Solution: Unified Data Architecture for AI-Driven Operations

Before we dive deeper, it is important to define what the Data Lakehouse architecture is and how it differs from other models. The Lakehouse architecture unifies data access through three basic components: the storage engine managing data persistence, the compute engine processing queries, and the catalog maintaining metadata. These core elements work together to transform information through standardized table and file formats before final storage, which establishes a consistent data flow pattern. The integrated approach eliminates the usual separation between data warehouses and data lakes, enabling analytics and AI workloads to operate on the same reliable data foundation. At its core, the Lakehouse paradigm delivers four main capabilities:

  1. Cost-Efficient Storage at Scale

    Lakehouse architectures rely on cloud object storage (S3, Azure Data Lake Storage, Google Cloud Storage) as their foundation, achieving storage costs of approximately $20-23 per terabyte per month, which is dramatically lower than the cost of traditional solutions. This economic benefit allows organizations to retain the historical data necessary for AI pattern recognition and validation.
    For our network monitoring example, storing detailed telemetry from thousands of switches, routers, and optical components becomes economically feasible. A single enterprise network might generate terabytes of performance metrics per year. At Lakehouse storage costs, however, multi-year retention pays off through a higher long-term prediction success rate.

  2. Schema Evolution

    Unlike rigid databases, Lakehouses support flexible schemas. Built on open table formats like Delta Lake, Apache Iceberg, and Apache Hudi, these systems can add new metrics or sensor data without disrupting existing operations, evolve data structures as equipment and monitoring capabilities are upgraded, and maintain backward compatibility for historical analysis.
    This flexibility proves essential as networks evolve. When new optical monitoring parameters become available, or when AI models require additional contextual data, the Lakehouse adapts without expensive migrations or system downtime, and the organization's historical data stays intact.

  3. Vendor-Neutral Open Standards

    The Lakehouse ecosystem is built on open standards, which avoids the vendor lock-in that could otherwise jeopardize long-term data accessibility. Organizations can store data in open formats (Parquet, ORC) readable by multiple tools and choose from various compute engines (Spark, Trino, DuckDB) based on their specific needs. Using a standard interface such as the S3 API for storage simplifies migration between cloud providers or on-premises infrastructure as requirements change.

  4. Unified Access Patterns

    Perhaps most important for AI applications, the Lakehouse provides unified access to a wide array of data types through standard interfaces. Users can query time-series telemetry from network equipment, store and analyze unstructured inputs like maintenance logs and technician notes, and pull in semi-structured and structured data such as configuration files and network topology maps. With built-in time travel functionality, it’s also possible to revisit historical performance baselines or past anomalies. AI agents can access all of this through standard SQL or Python DataFrames.
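Two of these capabilities, schema evolution and unified SQL access, can be illustrated with a small sketch. The example below uses Python's built-in sqlite3 module purely as a stand-in so that it runs anywhere; it is not a Lakehouse engine, and the table names, column names, and sample rows are hypothetical. The point is only the behavior: a new metric is added without rewriting old rows, and telemetry is queried together with maintenance logs in one statement.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a Lakehouse query engine.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE telemetry (switch TEXT, day INTEGER, rx_power_dbm REAL)"
)
conn.executemany(
    "INSERT INTO telemetry VALUES (?, ?, ?)",
    [("Switch-Core-3", 1, -2.10), ("Switch-Core-3", 2, -2.11)],
)

# Schema evolution: add a new metric. Existing rows stay readable and
# simply report NULL (None in Python) for the new column.
conn.execute("ALTER TABLE telemetry ADD COLUMN temperature_c REAL")
conn.execute(
    "INSERT INTO telemetry VALUES (?, ?, ?, ?)",
    ("Switch-Core-3", 3, -2.12, 41.5),
)

# A second, differently shaped data source: maintenance log entries.
conn.execute("CREATE TABLE maintenance_log (switch TEXT, day INTEGER, note TEXT)")
conn.execute(
    "INSERT INTO maintenance_log VALUES (?, ?, ?)",
    ("Switch-Edge-7", 2, "fiber cleaned"),
)

# Unified access: one query spans telemetry and maintenance records.
rows = conn.execute(
    """
    SELECT t.day, t.rx_power_dbm, t.temperature_c, m.note
    FROM telemetry AS t
    LEFT JOIN maintenance_log AS m
           ON m.switch = t.switch AND m.day = t.day
    WHERE t.switch = 'Switch-Core-3'
    ORDER BY t.day
    """
).fetchall()

for row in rows:
    print(row)
```

In this toy data set the maintenance note is empty for every Switch-Core-3 row, which mirrors the agent's observation that no cleaning or reconnection took place during the period in question.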

Building Trust Through Comprehensive Data

The Lakehouse architecture specifically addresses the trust challenge in the AI-human relationship through several mechanisms:

Longitudinal Data Analysis 

By economically storing years of operational data, the Lakehouse enables AI to identify subtle patterns that might escape the human eye. In our optical receiver example, the AI detected early-stage degradation by connecting multiple data streams into a coherent analysis.

The system tracked daily signal strength measurements over months, revealing a gradual decline too subtle for standard alerts, while at the same time correlating these readings with environmental factors like temperature and humidity to rule out facility-related causes.

The credibility of this analysis was confirmed through automatic comparison with historical failure patterns from similar equipment, where the system identified a signature degradation curve that had preceded past failures.

The absence of maintenance events (fiber cleaning or reconnection) in the system logs also confirmed that the signal loss was due to genuine component degradation and not temporary conditions.
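The step of ruling out facility-related causes can be sketched as a simple correlation check between the signal series and an environmental series. The code below is an illustrative assumption, not the method of any specific product: the function names, the 0.5 correlation threshold, and the synthetic temperature readings are all hypothetical, and a low correlation is supporting evidence rather than proof.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def environment_explains_decline(signal, environment, threshold=0.5):
    """True if the environmental series tracks the signal closely.

    The threshold is an assumed cut-off chosen for illustration.
    """
    return abs(pearson(signal, environment)) >= threshold


# Hypothetical data: signal declines steadily, temperature only oscillates.
signal_dbm = [-2.1 - 0.01 * day for day in range(30)]
temperature_c = [22.0 + (1.0 if day % 2 else -1.0) for day in range(30)]

print(environment_explains_decline(signal_dbm, temperature_c))
```

Here the temperature fluctuates around a constant level while the signal falls steadily, so the check returns False and the environmental explanation is set aside.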

Explainable Recommendations

When AI agents have access to rich historical data, they can explain the predictions they make. Instead of offering vague results, they break down their recommendations with clear reasoning backed by data. For example, if an AI agent suggests replacing a piece of equipment, it’s not a guess: it’s citing patterns like signal degradation or declining performance metrics seen in similar past cases. It’s the AI equivalent of saying, “Here’s what we’ve seen before, and here’s what happened next.”

That transparency goes further with confidence levels based on actual operational data. Decision-makers get a clearer picture of how much trust they can place in each of the recommendations.

Perhaps most importantly, the AI connects technical risk to business outcomes. It weighs the cost of proactive steps against the potential expense of downtime, grounded in real operational data. This elevates the AI agent from a black box to an active advisor whose reasoning can be seen, challenged, and trusted.
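The cost comparison behind such a recommendation is simple expected-value arithmetic. The sketch below uses the figures from the dialogue ($15,000 replacement cost, $50,000 per hour of downtime, 87% failure probability). The outage duration is not given in the dialogue, so the one-hour value is an assumption made for this example, as are the function and field names.

```python
def expected_downtime_cost(failure_probability, cost_per_hour, outage_hours):
    """Probability-weighted cost of an unplanned outage."""
    return failure_probability * cost_per_hour * outage_hours


def recommend(replacement_cost, failure_probability, cost_per_hour, outage_hours):
    """Return a recommendation together with the numbers that justify it."""
    expected = expected_downtime_cost(
        failure_probability, cost_per_hour, outage_hours
    )
    return {
        "action": "replace" if expected > replacement_cost else "monitor",
        "replacement_cost": replacement_cost,
        "expected_downtime_cost": expected,
        "net_benefit": expected - replacement_cost,
    }


# Figures from the dialogue; outage_hours=1.0 is an assumed outage duration.
result = recommend(
    replacement_cost=15_000,
    failure_probability=0.87,
    cost_per_hour=50_000,
    outage_hours=1.0,
)
print(result)
```

Even with an outage of only one hour, the expected downtime cost of about $43,500 exceeds the replacement cost, and the returned dictionary carries the evidence alongside the action so that an operator can challenge each input.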

Continuous Learning and Validation

The Lakehouse architecture brings an important advantage to AI development: it keeps a complete history of both data and decisions. That means organizations can measure how well AI predictions held up against actual outcomes. This kind of transparency allows teams to adjust models when new patterns or edge cases appear, without having to pause live operations. It’s a self-improving system that gets smarter with every cycle.

Because every recommendation and action is traceable, the system also supports compliance and organizational learning. Engineers can perform root-cause analysis when things go wrong, or when the AI gets it exceptionally right.

Most importantly, this setup allows for ongoing validation across a wide range of operating conditions, from routine workflows to rare anomalies. Unlike static, black-box systems that rely on periodic updates, Lakehouse AI evolves continuously. It grows more reliable, more explainable, and more valuable.
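Measuring how predictions held up against outcomes can be as simple as keeping one record per recommendation and computing precision and recall over the log. The following is a minimal sketch under stated assumptions: the record structure, field names, and sample entries are hypothetical and do not reflect real measurements.

```python
from dataclasses import dataclass


@dataclass
class PredictionRecord:
    asset: str
    predicted_failure: bool  # what the agent predicted
    actual_failure: bool     # what was later observed


def precision(records):
    """Share of predicted failures that actually occurred (None if none predicted)."""
    predicted = [r for r in records if r.predicted_failure]
    if not predicted:
        return None
    return sum(r.actual_failure for r in predicted) / len(predicted)


def recall(records):
    """Share of actual failures that were predicted (None if none occurred)."""
    actual = [r for r in records if r.actual_failure]
    if not actual:
        return None
    return sum(r.predicted_failure for r in actual) / len(actual)


# Hypothetical decision log, for illustration only.
log = [
    PredictionRecord("switch-a", True, True),
    PredictionRecord("switch-b", True, True),
    PredictionRecord("switch-c", True, False),
    PredictionRecord("switch-d", False, False),
    PredictionRecord("switch-e", False, True),
]
print(precision(log), recall(log))
```

Tracking both figures over time shows whether the agent raises false alarms (low precision) or misses failures (low recall), which is the kind of evidence teams need before adjusting a model.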

Experience with Lakehouse Architecture

Readers may rightfully ask whether the facts and ideas outlined in this article have been tested in practice or remain in the ideation phase. The short answer is: yes, but with caveats.

In this section, we are going to outline the pros and cons of Lakehouse architecture based on our company's real-world implementation experience.

The Challenges: Cold, Hard Truths

Operating a Lakehouse on-premises is not as simple as deploying a "single database executable" that handles everything as a black box. Traditional RDBMSs bundle storage, cataloging, and querying into one system, while Lakehouse architecture deliberately disaggregates these components. In practice, this means operating separate systems: an S3-compatible object storage cluster, a catalog server with a relational database backend, and a query engine of your choice. This disaggregation introduces operational complexity in non-cloud settings. In cloud environments, providers abstract away much of this complexity, but on-premises implementations require additional expertise and resources.

Another significant challenge is the mindset change required. RDBMSs have dominated the market for the last 50 years, establishing well-defined notions of indexes and clear trade-offs between ingress and egress speeds (OLTP vs. OLAP). The Lakehouse architecture changes these rules by separating reads from writes. Organizations can simultaneously achieve both high ingress and high egress performance, but only if they correctly implement partitioning, compaction, and file schema designs. Developers need to understand new optimization approaches that differ significantly from the RDBMS world. These on-premises operational challenges and mindset shifts represent the core hurdles we've observed during implementation.
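To give a feel for what compaction involves, the sketch below groups many small files within each partition into fewer, larger ones. It is a deliberately simplified greedy illustration and not the algorithm used by Delta Lake, Apache Iceberg, or Apache Hudi; the function name, the 128 MB target size, and the sample file list are assumptions.

```python
from collections import defaultdict


def plan_compaction(files, target_mb=128):
    """Group small files per partition into batches of at most target_mb.

    files is a list of (partition_key, size_mb) tuples. The result maps each
    partition to a list of batches, where every batch would be rewritten as
    one larger file. A single file larger than target_mb forms its own batch.
    """
    by_partition = defaultdict(list)
    for partition, size_mb in files:
        by_partition[partition].append(size_mb)

    plan = {}
    for partition, sizes in by_partition.items():
        batches, current, current_size = [], [], 0
        for size in sorted(sizes):
            # Close the current batch when adding this file would exceed the target.
            if current and current_size + size > target_mb:
                batches.append(current)
                current, current_size = [], 0
            current.append(size)
            current_size += size
        if current:
            batches.append(current)
        plan[partition] = batches
    return plan


# Hypothetical files written by a streaming pipeline, partitioned by date.
files = [
    ("2025-01-01", 40), ("2025-01-01", 60), ("2025-01-01", 50),
    ("2025-01-02", 10), ("2025-01-02", 20),
]
print(plan_compaction(files))
```

Partitioning by date keeps queries over a time window from touching unrelated files, and compaction keeps the file count per partition low. Both have to be planned deliberately, which is part of the mindset change described above.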

The Benefits: Measurable Advantages

Regarding benefits, cost efficiency stands out the most. We have observed that decoupling compute and storage in the Lakehouse model, combined with S3 storage utilization, drives costs down by orders of magnitude. These aren't theoretical projections but actual cost reductions we've measured in staging environments.

Along with cost improvements, we've experienced significant query performance enhancements when Lakehouse architecture was properly implemented. However, it's important to note that performance evaluation requires an end-to-end perspective, from data source to downstream consumer. While downstream applications can query data rapidly from the Lakehouse, the speed of data ingestion, compaction, and sorting may cause seconds of delay for newly arrived data. Lakehouse is not a silver bullet for all use cases, but it excels as a destination for streaming pipelines where these trade-offs are acceptable.

Another substantial benefit we've observed is the native time-travel functionality. Lakehouse provides this capability out of the box, drastically improving our ability to understand how network infrastructure behaved at specific points in time without writing complex SQL queries. This feature has proven invaluable for both troubleshooting and compliance requirements.
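Conceptually, time travel works because every commit produces a new immutable snapshot and older snapshots remain readable. The toy class below illustrates only that idea; real table formats track snapshots through metadata that points to data files rather than copying rows, and the class and method names here are hypothetical.

```python
class VersionedTable:
    """Toy append-only table that keeps every committed snapshot."""

    def __init__(self):
        # Version 0 is the empty table.
        self._snapshots = [[]]

    def append(self, rows):
        """Commit new rows as a new snapshot and return its version number."""
        self._snapshots.append(self._snapshots[-1] + list(rows))
        return len(self._snapshots) - 1

    def read(self, version=None):
        """Return the rows as of the given version (latest if omitted)."""
        if version is None:
            version = len(self._snapshots) - 1
        return list(self._snapshots[version])


# Hypothetical readings committed at two points in time.
table = VersionedTable()
v1 = table.append([("Switch-Core-3", -2.1)])
v2 = table.append([("Switch-Core-3", -2.4)])

print(table.read(v1))  # state after the first commit
print(table.read())    # current state
```

Reading an earlier version answers the question of what the data looked like at that point in time without any reconstruction logic in the query itself.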

Our company has experienced both the advantages and challenges of Lakehouse architecture. Organizations need to balance these factors based on their specific use cases, but the performance improvements and cost efficiencies definitely incentivize deeper integration and gradual evolution toward this model. As with any architectural shift, the key lies in understanding the trade-offs and aligning them with business priorities.

Article series "AI agents in industry"

AI agents are revolutionizing industrial processes: As digital employees, they combine AI with autonomous functions, act independently and enable new efficiency potential. This series of articles provides answers to questions relating to implementation, suitable architectures and the necessary database, among other things.

Part 1 of issue 05-2025 examined the basic concepts of AI agents, their implementation using AWS Bedrock and the development of inference as the fourth fundamental building block alongside compute, storage and databases.

In our next article in issue 08-2025 (on sale August 5), we will explore how AI tools and frameworks build on this data foundation to create trustworthy systems for critical infrastructure. We will look at the specific APIs, processing patterns and validation techniques used to turn raw operational data into actionable information that people can trust.
