Context & Scope
The Data Streamhouse is architected as a modular system, with each component responsible for a distinct operational concern. This separation enables horizontal scalability, fault isolation, and performance optimization—allowing the platform to support environments with hundreds of Kafka clusters and millions of active data streams.
Portal serves as the central control plane for distributed real-time data systems. It connects to Apache Kafka clusters and related components to unify access, governance, and operations across teams and environments. Users interact with Portal through a web-based interface.
Argus monitors one or more Apache Kafka clusters. It collects metrics, performs system health checks, and runs the Health Assistant to detect issues across the cluster and its associated components. Argus operates headlessly and reports back to Portal. Cluster assignments are managed via Portal.
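Conceptually, a Health Assistant of this kind runs a set of independent probes and rolls their results up into one cluster status. The sketch below is illustrative only; the `CheckResult` and `run_checks` names, and the example probes, are assumptions for this sketch, not Argus's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical probe result; the real health model may differ.
@dataclass
class CheckResult:
    name: str
    ok: bool
    detail: str = ""

def run_checks(checks: List[Callable[[], CheckResult]]) -> dict:
    """Run each probe and roll the results up into one cluster status."""
    results = [check() for check in checks]
    failed = [r for r in results if not r.ok]
    return {
        "status": "healthy" if not failed else "degraded",
        "failed": [r.name for r in failed],
    }

# Example probes: a real monitor would query broker metrics here
# (e.g. under-replicated partitions, controller availability).
def replication_ok() -> CheckResult:
    under_replicated = 0  # placeholder for a fetched metric
    return CheckResult("replication", under_replicated == 0)

def controller_ok() -> CheckResult:
    return CheckResult("controller", True)

status = run_checks([replication_ok, controller_ok])
```

Because each probe is self-contained, a failing check degrades the reported status without blocking the others, which matches the fault-isolation goal described above.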
Machina is the computational core of the Data Streamhouse. It executes data streaming applications and processes, such as data integration, processing, or custom analytics. Machina instances are deployed and managed independently for workload isolation and scale.
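A streaming workload of the kind Machina runs is, at its core, a pipeline of transformations over a stream of records. As a language-agnostic illustration (none of these names come from Machina itself), a data-integration step followed by a processing step might look like:

```python
from typing import Iterable, Iterator

def parse(records: Iterable[str]) -> Iterator[dict]:
    """Integration step: turn raw delimited records into structured events."""
    for record in records:
        user, amount = record.split(",")
        yield {"user": user, "amount": float(amount)}

def large_only(events: Iterable[dict], threshold: float) -> Iterator[dict]:
    """Processing step: keep only events above a threshold."""
    return (e for e in events if e["amount"] > threshold)

# A small in-memory stand-in for a stream of Kafka records.
stream = ["alice,12.5", "bob,3.0", "carol,40.0"]
result = list(large_only(parse(stream), threshold=10.0))
```

Deploying such pipelines as separately managed instances is what gives the workload isolation and independent scaling mentioned above.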
The Data Streamhouse runs entirely within your controlled infrastructure, whether in private clouds, on-premises data centers, or containerized environments on public cloud platforms. The diagram below outlines the core components of the system (Portal, Machina, Argus) and the external systems and actors that interface with it.
A single Data Streamhouse typically connects to multiple, separate data streaming environments—each with its own Kafka clusters, Schema Registries, Connect clusters, and associated applications. This multi-environment architecture supports hybrid setups and allows organizations to centralize governance, monitoring, and control across all streaming domains from one unified platform.
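One way to picture this multi-environment model is as an inventory in which each environment bundles its own Kafka clusters, Schema Registries, and Connect clusters. The structure below is a hypothetical sketch for illustration, not the platform's actual configuration schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Environment:
    """One streaming domain governed centrally (all names are illustrative)."""
    name: str
    kafka_clusters: List[str] = field(default_factory=list)
    schema_registries: List[str] = field(default_factory=list)
    connect_clusters: List[str] = field(default_factory=list)

environments = [
    Environment("prod", ["kafka-prod-eu", "kafka-prod-us"],
                ["sr-prod"], ["connect-prod"]),
    Environment("staging", ["kafka-staging"], ["sr-staging"], []),
]

# Central governance means one place to answer cross-environment
# questions, e.g. "which Kafka clusters do we operate anywhere?"
all_clusters = [c for env in environments for c in env.kafka_clusters]
```

The point of the single inventory is that monitoring, access control, and operations can iterate over every environment from one place instead of per-cluster tooling.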
Key external integrations include:
Authentication providers (e.g., LDAP, OpenID Connect)
Automation and provisioning systems (IaC, CI/CD)
Alerting and messaging platforms
Public APIs for integration with internal tooling and workflows
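As an example of the alerting integration, a platform-generated alert typically has to be reshaped into whatever payload the target messaging platform's webhook expects. The function below is a hypothetical sketch of that mapping; the field names are assumptions, not a documented Portal schema:

```python
def to_webhook_payload(alert: dict) -> dict:
    """Map an internal alert record to a generic chat-webhook payload.

    The input keys ("severity", "cluster", "message") are illustrative.
    """
    severity = alert.get("severity", "info").upper()
    return {"text": f"[{severity}] {alert['cluster']}: {alert['message']}"}

payload = to_webhook_payload({
    "severity": "critical",
    "cluster": "kafka-prod-eu",
    "message": "Under-replicated partitions detected",
})
# payload["text"] → "[CRITICAL] kafka-prod-eu: Under-replicated partitions detected"
```

In practice the resulting payload would be POSTed to the messaging platform's incoming-webhook URL; keeping the mapping a pure function makes it easy to test without network access.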