Federation Architecture | Part 1
Rethink IT-OT Convergence in an AI-Driven World
AI is shifting threat models for industrial infrastructures. Federation Architecture proposes a disciplined middle ground between isolation and full integration - with edge autonomy, one-way data flow, and human approval.
IT-OT convergence promises visibility, efficiency and a competitive edge. Yet, full integration can undermine OT safety assumptions. The risk rises as AI systems gain legitimate access to data and (agentic) AI-driven automation expands within IT. This article introduces Federation Architecture (FA), a structured approach that turns proven controls (Purdue Model [1], IEC 62443 [2], IEC TR 23188 [3]) into operational defaults. The result is simplified designs, reduced exposure, and a practical path to secure modernization and resilience. FA also prepares organizations for responsible industrial-grade AI.
A Need for Disciplined Convergence
IT drives digital innovation, while OT ensures safe and deterministic control of physical processes in critical infrastructure [4]. IT-OT convergence refers to the integration of enterprise IT and industrial OT (software, processes, data and hardware) into a unified operating model enabling information and control flows.
This undertaking offers potential benefits such as remote visibility, predictive analytics, fleet-wide optimization, sustainability, and competitive positioning in demanding markets [5].
Although discussions on convergence can appear polarized, the issue is not a binary choice between a single architecture and siloed systems. Convergence spans a spectrum from “anti-convergence” (air-gapped OT) to full IT-OT integration [6] (Figure 1).
Today, this spectrum is more nuanced. Industry increasingly acknowledges the operational challenges associated with full convergence. We outline two structural categories of challenges and then examine how AI affects this context.
Structural Challenges
OT is built for “never fail,” IT for “fail fast and iterate.” This difference creates structural tensions:
- Diverging Goals: IT prioritizes agility and rapid innovation; OT prioritizes determinism and safety (Figure 2).
- Change Acceptance Mismatch: IT expects frequent, even weekly, updates; OT tolerates minimal, controlled changes over long cycles.
- Security Posture Clash: IT assumes breach and focuses on detection and response. OT emphasizes isolation and prevention, because even a single, brief breach in OT might cost lives.
Undisciplined convergence does not resolve this conflict. It leads to immature convergence [7], for example when networks are connected but the processes are not updated. It either forces OT to inherit IT culture and risk posture or fails in the attempt. In practice, this has caused unacceptable operational risk [8].
Technical Challenges
Organizations and vendors face technical hurdles in bridging the two worlds, including protocol incompatibilities, data-model mismatches, legacy integration costs and stringent security requirements.
Vendors often respond with integrated platforms and custom middleware to unify telemetry and enable remote control. Receiving data from the network demands secure translation and egress. Command paths require validation mechanisms such as command safety, timing controls, rollback capability and cross-zone security. Integrated platforms require continuous upgrades, maintenance, audits, technical support and premium services that match the platform's complexity. This complexity sustains an entire ecosystem.
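To make the validation burden on a downward command path concrete, the following is a minimal Python sketch of three of the mechanisms named above: a safety envelope, a timing control and a rollback. The names `Limits` and `GuardedSetpoint` and all values are hypothetical and serve only as illustration. A real command path would add authentication, cross-zone controls and process-specific checks.

```python
import time
from dataclasses import dataclass


@dataclass
class Limits:
    low: float             # lowest value considered safe (assumed)
    high: float            # highest value considered safe (assumed)
    min_interval_s: float  # minimum time between accepted commands


class GuardedSetpoint:
    """Hypothetical setpoint that validates every incoming command."""

    def __init__(self, value, limits, clock=time.monotonic):
        self.value = value
        self._limits = limits
        self._clock = clock          # injectable so the timing rule can be tested
        self._last_change = None
        self._previous = value

    def command(self, new_value):
        now = self._clock()
        # Command safety: reject anything outside the safe envelope.
        if not (self._limits.low <= new_value <= self._limits.high):
            return False
        # Timing control: reject commands that arrive too quickly.
        if (self._last_change is not None
                and now - self._last_change < self._limits.min_interval_s):
            return False
        self._previous, self.value = self.value, new_value
        self._last_change = now
        return True

    def rollback(self):
        # Rollback capability: restore the value before the last accepted command.
        self.value = self._previous
```

Even this toy version needs state, a clock and a recovery path for a single parameter, which hints at why validated command paths across a whole plant become complex.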
Why Full Convergence Persists
Despite these challenges, organizations often move toward full convergence for several reasons:
- IT Organizational Momentum: IT departments expanding into OT naturally apply familiar tools and practices, even when inappropriate for OT risk profiles. The momentum can make the trend feel inevitable.
- Narrative Capture: Bidirectional convergence is often presented as the inevitable and ultimate goal, even though it is only a means to achieve other goals such as competitiveness.
- Executive Simplification: “One platform, one pane of glass” is easier to present than “disciplined custom architectures.”
- Vendor Economics: Integrated platforms with recurring subscriptions are believed to be more profitable than autonomous edge systems with controlled interfaces.
None of these is malicious, yet they nudge organizations and vendors toward full convergence. Additionally, demand for tailored degrees of convergence pushes vendor roadmaps toward highly configurable, complex and costly platforms.
Convergence affects software, data, processes, hardware and organizational structures. Effective approaches require structured change management, a balanced risk–benefit assessment and consideration of emerging technologies. AI alters the threat model and therefore the convergence debate.
What AI Changes
With growing demands on OT systems, an industrial-grade responsible AI will follow [9]. Before that, there is a crucial angle to consider. A common argument in AI safety debates is that highly capable AI models pose limited existential risks to critical infrastructure because they are isolated from it. As long as AI operates within data centers and is “trapped” in IT environments, it lacks:
- Physical Agency: Ability to manipulate the physical world directly (flip switches or open valves)
- Access: Legitimate pathways into operational control systems
IT-OT convergence systematically removes both barriers. In converged environments, network paths and automation hooks can turn analytical objectives into operational commands.
Traditional OT security models assume malicious external actors or negligent insiders. Modern AI challenges traditional security assumptions by introducing a fundamentally different threat profile:
- Embedded in trusted IT systems, rather than acting as external attackers
- Operating with legitimate access, not relying on breaches
- Capable of autonomous decision-making, beyond simple data theft
- Non-deterministic in behavior, adding security risks beyond traditional software controls
- Opaque in reasoning, unlike deterministic code that can be fully audited
AI systems have demonstrated optimization behaviors that diverge from designer intent. In AI safety, these are known as the “proxy game,” the “misalignment problem” or, more recently, Loss of Control (LoC) [10, 11, 12]. These concerns grow more complex as AI models advance, yet AI safety measures remain underdeveloped and often reflect the “assume breach and react” mindset of IT. If a powerful AI model operating in a converged IT-OT environment decides that altering operational parameters serves its proxy objectives, what prevents it? In fully converged architectures, the answer may be: nothing.
Federation Architecture Principles
Given these structural tensions and the AI-driven paradigm shift, convergence requires a fundamental architectural rethinking. Federation Architecture (FA) is the disciplined application of existing security best practices from known frameworks [1, 2, 3], applied as mandatory defaults:
- Edge Autonomy: All operational functions (L0 – L3), including real-time control, supervisory systems, and manufacturing operations management, reside locally. An OT site can continue operating safely even if enterprise connectivity is removed. The federation boundary lies between L3 (operations) and L4 (business). Everything below this boundary operates independently. Note: “Autonomy” does not imply “automation.”
- Unidirectional Data Flow: Data flows up to enterprise analytics through one-way mechanisms (e.g., data diodes or one-way gateways). By architecture, commands never flow down. This makes external manipulation impossible by design, not merely unlikely by policy (Figure 3).
- Human-Gated Commands: Central systems, including AI, may generate recommendations, but qualified engineers must review, simulate and debate them and, on that basis, authorize them before implementation. This reflects established safety and change-control practices. Whenever AI is in the loop, this principle aligns with responsible AI usage practices.
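How the three principles interact can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reference design: `OneWayChannel`, `EdgeSite`, `Recommendation` and the pump parameter are hypothetical names. A software queue only mimics the interface of a data diode and does not provide the physical guarantee of one-way hardware. In the sketch, a recommendation reaches the site out of band, carried by the approving engineer, never through the telemetry channel.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


class OneWayChannel:
    """Software stand-in for a data diode: one side only sends, the other only receives."""

    def __init__(self) -> None:
        self._queue = deque()

    def sender(self):
        # Handed to the edge site: can only push data upward.
        return self._queue.append

    def receiver(self):
        # Handed to the central side: can only pull data that was sent.
        def receive():
            return self._queue.popleft() if self._queue else None
        return receive


@dataclass
class Recommendation:
    parameter: str
    value: float
    proposed_by: str                   # central analytics or an AI system
    approved_by: Optional[str] = None  # filled in by a qualified engineer


class EdgeSite:
    """Hypothetical L0-L3 site that keeps running without enterprise connectivity."""

    def __init__(self, send) -> None:
        self._send = send
        self.setpoints = {"pump_speed_rpm": 1200.0}

    def control_cycle(self) -> None:
        # Local operation; telemetry export is fire-and-forget.
        self._send(dict(self.setpoints))

    def apply(self, rec: Recommendation) -> None:
        # Human-Gated Commands: nothing is applied without a named approver.
        if rec.approved_by is None:
            raise PermissionError("recommendation lacks human approval")
        self.setpoints[rec.parameter] = rec.value


channel = OneWayChannel()
site = EdgeSite(channel.sender())
receive = channel.receiver()

site.control_cycle()
telemetry = receive()  # the central side sees a copy of the setpoints

rec = Recommendation("pump_speed_rpm", 1100.0, proposed_by="central-analytics")
try:
    site.apply(rec)  # rejected: no engineer has signed off yet
except PermissionError as err:
    print("blocked:", err)

rec.approved_by = "shift-engineer"  # review and simulation happen before this step
site.apply(rec)
print(site.setpoints)
```

The central side never holds a reference to the site object, only to the receive function, so in this sketch there is no code path from analytics to the setpoints.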
FA defines minimal architectural commitments on which all other design decisions depend. It is neither a platform nor a restrictive mandate. Some organizations may adopt FA as their convergence model; others may treat it as an intermediate stage. For example, an FA organization may deploy software updates during controlled time windows to explicitly defined sections of its infrastructure instead of staging them on-site. Each such decision is evaluated case by case. Later, the organization might decide on other controlled extensions to the Unidirectional Data Flow principle, without compromising Edge Autonomy or Human-Gated Commands. Even with only an upward data flow, FA delivers several IT-OT convergence promises, such as central visibility, analytics and audits.
Potential Implementation Pattern
Most air-gapped OT sites already meet two FA principles: Edge Autonomy and Human-Gated Commands. Adding unidirectional data flow completes the model.
Role clarity:
- Edge systems run operations and send data upward
- Central systems consume data and propose changes
Tooling should support the evaluation, approval, implementation, and validation of those recommendations, and document each step for a potential audit.
As an example, consider an agentic AI system that receives sanitized configuration data from OT and proposes troubleshooting steps [13]. Human operators validate the steps and execute the field actions. The AI then inspects the network and either suggests next steps or confirms the resolution.
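One possible shape for such tooling is a change record that can only move through the stages in order and logs every transition. The sketch below is an assumption about how this could look, not a description of an existing product. The stage names follow the steps listed above, and `ChangeRecord`, the actors and the notes are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class Stage(Enum):
    PROPOSED = 0
    EVALUATED = 1
    APPROVED = 2
    IMPLEMENTED = 3
    VALIDATED = 4


@dataclass
class ChangeRecord:
    description: str
    stage: Stage = Stage.PROPOSED
    audit_log: list = field(default_factory=list)

    def advance(self, actor: str, note: str) -> None:
        """Move to the next stage and record who did it, when, and why."""
        if self.stage is Stage.VALIDATED:
            raise ValueError("change is already validated")
        next_stage = Stage(self.stage.value + 1)  # stages cannot be skipped
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "from": self.stage.name,
            "to": next_stage.name,
            "actor": actor,
            "note": note,
        })
        self.stage = next_stage


change = ChangeRecord("Lower pump speed as proposed by central analytics")
change.advance("process-engineer", "simulated on the offline model")
change.advance("site-manager", "approved for the next maintenance window")
change.advance("field-technician", "setpoint changed on site")
change.advance("process-engineer", "telemetry confirms the expected behavior")

for entry in change.audit_log:
    print(entry["from"], "->", entry["to"], "by", entry["actor"])
```

Because each step appends to the log before the stage changes, the record doubles as the documentation an auditor would ask for.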
Conclusion
Federation Architecture does not introduce new technologies; it applies established principles as mandatory defaults to improve resilience and mitigate AI-driven risk. FA reframes convergence: enable analytics and central visibility while constraining command authority and preserving human oversight.
In contexts of increasing AI capability, architectural constraints become part of responsible modernization.
In the subsequent articles of this series, we will examine the pros and cons of FA within the convergence spectrum. Later, we will focus on how FA promotes an industrial-grade AI that actually works for OT.
Bibliography
[1] Williams, T. J.: The Purdue Enterprise Reference Architecture. JSPE/IFIP TC5/WG5.3 Workshop on Design of Information Infrastructure Systems for Manufacturing, Amsterdam. Proceedings, 1993.
[7] Cisco Systems: Cisco State of Industrial Networking Report Reveals OT Security Now a Top Priority for CIOs. Automation.com, Durham, NC, 2024.
[11] Crichton, K.; Ji, J.; Miller, K.; Bansemer, J.; Arnold, Z.; Batz, D.; Choi, M.; Decillis, M.; Eke, P.; Gerstein, D. M.; Leblang, A.; McGee, M.; Rattray, G.; Richards, L.; Scott, A.: Securing Critical Infrastructure in the Age of AI. Center for Security and Emerging Technology, 2024.
[12] Hendrycks, D.; Mazeika, M.; Woodside, T.: An Overview of Catastrophic AI Risks. Center for AI Safety, 2023.














