How does CADA enable data pooling for automotive AI models?

As proposed, the Cloud and AI Development Act (CADA) creates a specific legal and operational framework to overcome data silos in the automotive sector.

Summary

Recital 19 of the proposal states that the Commission should facilitate data pooling across industrial sectors through trusted third parties to train specialised AI models, while strictly preserving intellectual property rights. This aim is operationalized through the Cloud and AI Leadership Initiatives (Title II), which would enable secure large-scale data pooling using privacy-enhancing technologies. The proposal also calls for exploring secure and verifiable compute approaches so that AI can be deployed in sensitive contexts, such as autonomous driving and safety-critical manufacturing, without collaboration compromising proprietary data or operational security.

Detail

The automotive industry stands at a critical juncture. Advanced AI for autonomous driving, predictive maintenance, and industrial robotics requires vast, diverse datasets that no single manufacturer possesses. The proposed Cloud and AI Development Act (CADA), COM(2026) 502 final, does not address this bottleneck by mandating data sharing. Instead, it builds a sovereign, secure infrastructure that makes pooling viable. The proposal targets the "industrial AI" and "physical AI" domains. It aims to give manufacturers, suppliers, and research entities the legal certainty and technical enablers they need to collaborate without fear of IP leakage or sovereignty risks.

The Legal Mandate: Recital 19 and the Role of Trusted Third Parties

The most direct textual basis for automotive data pooling in the proposal is Recital 19, which sits in the preamble of the proposed Regulation. This recital identifies the automotive sector as a priority for the Apply AI Strategy and explicitly outlines the Commission's role in facilitating collaboration. It states:

"In manufacturing, the Commission should facilitate data pooling across industrial sectors through trusted third parties to train specialised AI models, ensuring a sufficient volume of training data, while strictly preserving intellectual property rights."

This provision is significant because it shifts the paradigm from direct peer-to-peer data exchange (which carries high legal and commercial risk) to a model mediated by trusted third parties. In the context of the automotive stack, these third parties would likely be entities certified under the Union's sovereignty framework or designated as Experience and Acceleration Centres for AI (Centres for AI) under Article 5. These entities would act as neutral custodians or technical facilitators, managing the pooling process in a way that ensures data remains protected while contributing to the collective training of models.
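To make the custodian role more concrete, here is a short sketch of one way a trusted third party could mediate pooling. It is a minimal illustration under stated assumptions, not a mechanism the proposal specifies. The custodian accepts model updates from registered participants rather than raw data. It refuses to release an aggregate until a minimum number of parties have contributed. The class name `PoolingCustodian`, the three-party threshold and the simple averaging rule are all hypothetical. A contributor threshold is a policy control, not a complete defence against inference attacks.

```python
from dataclasses import dataclass, field


@dataclass
class PoolingCustodian:
    """Hypothetical neutral custodian: collects model updates (never raw data)
    from registered participants and releases only an aggregate."""

    registered: set[str]
    min_contributors: int = 3  # assumed policy threshold, not taken from the proposal
    _updates: dict[str, list[float]] = field(default_factory=dict, init=False, repr=False)

    def submit(self, party: str, update: list[float]) -> None:
        if party not in self.registered:
            raise PermissionError(f"{party} is not a registered participant")
        # A newer update from the same party replaces its earlier one.
        self._updates[party] = update

    def release_aggregate(self) -> list[float]:
        # Withhold output until enough parties have contributed, so the aggregate
        # cannot simply be read back as one participant's update.
        if len(self._updates) < self.min_contributors:
            raise RuntimeError("not enough contributors to release an aggregate")
        updates = list(self._updates.values())
        dim = len(updates[0])
        if any(len(u) != dim for u in updates):
            raise ValueError("all updates must have the same length")
        # Element-wise mean across participants.
        return [sum(u[i] for u in updates) / len(updates) for i in range(dim)]


custodian = PoolingCustodian(registered={"oem_a", "oem_b", "supplier_c"})
custodian.submit("oem_a", [1.0, 2.0])
custodian.submit("oem_b", [3.0, 4.0])
custodian.submit("supplier_c", [5.0, 6.0])
print(custodian.release_aggregate())  # [3.0, 4.0]
```

In this design the custodian never handles raw sensor data. It only sees the updates each participant chooses to submit.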

The recital further emphasizes the dual objective: ensuring a sufficient volume of training data (critical for overcoming the "long tail" of edge cases in autonomous driving) while strictly preserving intellectual property rights. This balance is the cornerstone of the proposal's approach to industrial AI, acknowledging that data is a core competitive asset for automotive OEMs and suppliers.
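A toy calculation shows why pooling helps with the long tail. Assume a hypothetical rule that a scenario category needs at least 500 labelled examples before it is usable for training. The sketch checks which categories each manufacturer covers on its own, then checks again after the counts are pooled. All fleet names, scenario categories and figures are invented for illustration.

```python
from collections import Counter

# Assumed minimum number of labelled examples a scenario category needs
# before it is useful for training (illustrative threshold only).
MIN_EXAMPLES = 500


def covered(counts: Counter, min_examples: int = MIN_EXAMPLES) -> set[str]:
    """Returns the scenario categories that meet the example threshold."""
    return {category for category, n in counts.items() if n >= min_examples}


# Hypothetical per-manufacturer counts of labelled driving scenarios.
fleets = {
    "oem_a": Counter({"highway": 90_000, "snow_night": 120, "tunnel_glare": 300}),
    "oem_b": Counter({"highway": 70_000, "snow_night": 250, "wrong_way_driver": 40}),
    "oem_c": Counter({"highway": 50_000, "snow_night": 200, "tunnel_glare": 260}),
}

# Pooling adds the counts category by category.
pooled = sum(fleets.values(), Counter())

for name, counts in fleets.items():
    print(name, sorted(covered(counts)))
print("pooled", sorted(covered(pooled)))
```

With these numbers, no fleet on its own covers anything beyond highway driving. The pooled counts also clear the threshold for snow_night and tunnel_glare. wrong_way_driver still falls short, which is the long-tail problem in miniature.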

Operationalizing Pooling: Article 4 and Secure Technologies

The aim set out in Recital 19 is operationalized through Article 4, which defines the operational objectives of the Cloud and AI Leadership Initiatives. Specifically, Article 4(5) sets out the goals for "industrial AI" across the Union's strategic sectors, including automotive.

Under Article 4(5)(c), the initiative would:

"enable secure large-scale data pooling for collaborative AI training through technologies enhancing privacy and preserving confidentiality."

This clause provides the technical mandate for the deployment of Privacy-Enhancing Technologies (PETs). For automotive AI, this implies a regulatory push towards architectures such as:

Federated Learning: Models are trained locally on vehicle data, and only model updates (gradients or weights) are shared, so raw sensor data never leaves the manufacturer's control.
Secure Multi-Party Computation (SMPC): Allowing multiple parties to jointly compute a function over their inputs while keeping those inputs private.
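Trusted Execution Environments (TEEs): Hardware-isolated secure enclaves in which data is decrypted and processed only inside protected memory. The data is shielded from the host operating system and the infrastructure operator, and remote attestation lets parties verify which code is running.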
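Federated learning is commonly implemented with federated averaging (FedAvg). The minimal sketch below uses a one-dimensional linear model and synthetic data, both assumptions made for illustration. Each hypothetical client runs gradient descent on its own samples. It returns only its parameters and sample count. The coordinator then averages those parameters, weighted by sample count. This illustrates the general technique, not an architecture the proposal prescribes. Shared updates can still leak information about the underlying data, so FedAvg is often combined with secure aggregation or differential privacy.

```python
import random

Params = tuple[float, float]  # (weight, bias) of a 1-D linear model


def local_train(params: Params, data: list[tuple[float, float]],
                lr: float = 0.05, epochs: int = 20) -> Params:
    """Runs gradient descent on mean squared error using only local data."""
    w, b = params
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fed_avg(updates: list[tuple[Params, int]]) -> Params:
    """Averages client parameters weighted by local sample count (FedAvg)."""
    total = sum(n for _, n in updates)
    w = sum(p[0] * n for p, n in updates) / total
    b = sum(p[1] * n for p, n in updates) / total
    return w, b


def make_client_data(n: int, seed: int) -> list[tuple[float, float]]:
    """Synthetic local data drawn from y = 2x + 1 plus noise (illustrative only)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)) for x in xs]


# Three hypothetical participants with different amounts of local data.
clients = [make_client_data(n, seed) for n, seed in [(50, 1), (200, 2), (120, 3)]]
global_params: Params = (0.0, 0.0)
for _ in range(30):
    # Each client trains locally; only (params, sample_count) leaves the client.
    updates = [(local_train(global_params, data), len(data)) for data in clients]
    global_params = fed_avg(updates)

print(global_params)  # close to the generating values (2.0, 1.0)
```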

By explicitly linking data pooling to "technologies enhancing privacy," the proposal ensures that the infrastructure supporting automotive AI is built on a foundation of confidentiality. This directly addresses the primary hesitation of industry players regarding data sharing.
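Additive secret sharing is one of the simplest building blocks of SMPC, and it illustrates how a pooled figure can be computed without any party revealing its input. This sketch simulates all parties in a single process and assumes honest-but-curious participants. Each party splits a private count into random shares modulo a prime. A single share reveals nothing about the value, yet the published partial sums add up to the true total. The function names and the example counts are hypothetical.

```python
import secrets

PRIME = 2**61 - 1  # Mersenne prime used as the field modulus


def share(value: int, n_parties: int) -> list[int]:
    """Splits `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Simulates n parties computing the sum of their private values."""
    n = len(private_values)
    # Party i splits its value and sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received; individually these look random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # Only the partial sums are published; combining them yields the total.
    return sum(partial_sums) % PRIME


near_miss_counts = [412, 1033, 87]  # hypothetical private counts, one per party
print(secure_sum(near_miss_counts))  # 1532
```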

Specialised AI Models and the Preservation of IP

The proposal distinguishes between generic AI and specialised AI models tailored to the operational requirements of specific industries. Recital 19 notes that these models are designed to meet the needs of prioritised sectors, such as healthcare, transport, and manufacturing. In the automotive context, this includes:

Autonomous Driving Systems: Models requiring diverse driving scenarios from multiple geographies and manufacturers to ensure robustness in all weather and traffic conditions.
Industrial Robotics: Models for manufacturing lines that require pooling of telemetry data to optimize production efficiency.

The preservation of Intellectual Property Rights (IPR) is framed as a condition of pooling, not an afterthought. The proposal aims to ensure that the pooling mechanism itself does not erode the data providers' competitive advantage. Through "trusted third parties" and privacy-enhancing technologies, the data is intended to remain under the legal and technical control of its original owner. Because the resulting models are specialised, the output can serve as a shared asset for safety and efficiency, while the underlying training data remains proprietary.
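A participant may want to keep its data proprietary and still be able to show later what it contributed. One option is a salted hash commitment. Only the digest goes into a shared ledger, and the data and salt are disclosed only if a dispute requires it. This technique is chosen here purely for illustration. The proposal does not prescribe a provenance mechanism, and the function names and ledger fields are assumptions.

```python
import hashlib
import hmac
import secrets


def commit(dataset: bytes) -> tuple[str, bytes]:
    """Returns (commitment, salt). Only the commitment goes into the shared ledger."""
    salt = secrets.token_bytes(32)  # random salt prevents guessing small or known datasets
    return hashlib.sha256(salt + dataset).hexdigest(), salt


def verify_opening(commitment: str, dataset: bytes, salt: bytes) -> bool:
    """Checks a later, voluntary disclosure against the recorded commitment."""
    recomputed = hashlib.sha256(salt + dataset).hexdigest()
    return hmac.compare_digest(recomputed, commitment)


# Hypothetical ledger entry recorded by the pool; the data itself is never sent.
ledger: list[dict[str, str]] = []
data = b"serialized lidar batch 07"  # placeholder for a real dataset
commitment, salt = commit(data)
ledger.append({"party": "supplier_c", "dataset_id": "lidar_batch_07", "commitment": commitment})

print(verify_opening(commitment, data, salt))         # True
print(verify_opening(commitment, data + b"!", salt))  # False
```

The salt is fixed-length, so simple concatenation with the data is unambiguous. The salt must be stored alongside the data, or the commitment can never be opened.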

Secure and Verifiable Compute for Sensitive Contexts

A critical enabler for this pooling ecosystem is the computational infrastructure itself. Recital 19 explicitly states:

"Secure and verifiable compute approaches should be explored to enable the use of AI in sensitive contexts."

This provision recognizes that automotive data often falls into "sensitive contexts," ranging from commercially sensitive manufacturing data to safety-critical information that could impact public order or national security (e.g., high-precision mapping data). The proposal supports the development of secure and verifiable compute to ensure that AI can be deployed in these contexts without compromising data integrity or confidentiality.

This is reinforced by Article 4(2), which supports the development of:

"AI-optimised servers and baseline software based on processors, accelerators and quantum accelerators designed and manufactured in the Union."

By promoting EU-designed and manufactured hardware, CADA aims to reduce supply chain vulnerabilities, such as backdoors or unauthorized access by third-country actors. AI systems used as safety components in vehicles can be classified as high-risk under the AI Act. This makes the proposal's emphasis on sovereign and verifiable compute directly relevant to automotive AI. "Verifiable compute" generally means that the integrity of a computation can be proven, for example through cryptographic attestation. Participants can then check that a model was trained with the agreed code and data and was not tampered with.
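Verifiable compute can be sketched as a signed training record. The record binds three things: the hash of the training code, the commitments of the agreed datasets, and the hash of the resulting model. In a real TEE, a hardware-rooted asymmetric key would sign the record, and verifiers would check it against the vendor's certificate chain. This sketch instead uses an HMAC with a shared demo key so it runs on the Python standard library alone. The trade-off is that anyone holding the key could forge records. The names, record fields and key are hypothetical.

```python
import hashlib
import hmac
import json

# Stand-in for a hardware-rooted attestation key (hypothetical demo value).
# Real TEEs sign with an asymmetric key chained to the hardware vendor.
ATTESTATION_KEY = b"demo-key-not-for-production"


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _sign(body: dict) -> str:
    # Canonical JSON so that signer and verifier MAC identical bytes.
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(ATTESTATION_KEY, payload, hashlib.sha256).hexdigest()


def attest_training_run(code: bytes, dataset_commitments: list[str], model: bytes) -> dict:
    """Produces a signed record binding training code, input datasets and output model."""
    body = {
        "code_measurement": sha256_hex(code),
        "datasets": sorted(dataset_commitments),
        "model_hash": sha256_hex(model),
    }
    return {**body, "signature": _sign(body)}


def verify_training_run(record: dict, allowed_code: set[str],
                        agreed_datasets: list[str], model: bytes) -> bool:
    """Accepts the record only if it is authentic and matches the agreed inputs."""
    body = {k: v for k, v in record.items() if k != "signature"}
    return (
        hmac.compare_digest(_sign(body), record.get("signature", ""))
        and body.get("code_measurement") in allowed_code
        and body.get("datasets") == sorted(agreed_datasets)
        and body.get("model_hash") == sha256_hex(model)
    )


code = b"train.py, pinned version"                     # placeholder training code
datasets = ["commitment_oem_a", "commitment_oem_b"]  # placeholder dataset commitments
model = b"trained model weights"                       # placeholder model artefact
record = attest_training_run(code, datasets, model)

print(verify_training_run(record, {sha256_hex(code)}, datasets, model))    # True
tampered = {**record, "datasets": ["commitment_oem_a", "commitment_oem_x"]}
print(verify_training_run(tampered, {sha256_hex(code)}, datasets, model))  # False
```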

The Role of Experience and Acceleration Centres for AI

The Centres for AI, established under Article 5, serve as the physical and operational nodes for this ecosystem. Built on the network of European Digital Innovation Hubs, these centres are tasked with:

Helping organizations accelerate digital transformation through access to AI technologies.
Connecting organizations with European providers of cloud and AI technologies.
Facilitating the transfer of expertise across regions.

In the context of automotive data pooling, these centres would act as the trusted third parties referenced in Recital 19. They would provide the neutral ground, technical infrastructure, and expertise necessary for SMEs and larger manufacturers to collaborate. For smaller automotive suppliers who may lack the resources to implement complex federated learning architectures, the Centres for AI would offer the necessary support to participate in the pooling initiatives, ensuring that the benefits of CADA are distributed across the entire value chain.

Alignment with the AI Act and Data Act

CADA does not operate in a vacuum; it complements the existing regulatory framework. The AI Act (Regulation (EU) 2024/1689) imposes strict requirements on high-risk AI systems, including those used as safety components in vehicles. CADA's data pooling mechanisms are intended to help the resulting AI models meet the AI Act's requirements for data governance, accuracy, and robustness.

Furthermore, the Data Act (Regulation (EU) 2023/2854) provides the foundation for data access and switching. By reducing vendor lock-in and enabling data portability, the Data Act makes it easier for automotive entities to move data between providers and participate in the pooling initiatives facilitated by CADA. Together, these instruments create a cohesive ecosystem where data can flow securely and efficiently to drive innovation.

What this means for you

For Chief Technology Officers, architects, and legal counsel in the automotive sector, the proposed CADA offers a strategic roadmap for navigating the data challenges of the AI era:

Engage with Trusted Third Parties: Begin evaluating potential partnerships with entities that could serve as "trusted third parties" under the CADA framework. This may include national Centres for AI, certified cloud providers, or specialized data trusts. Early engagement can position your organization as a leader in the emerging European automotive AI ecosystem.
Prioritize Privacy-Enhancing Technologies: Invest in and deploy technologies such as federated learning, SMPC, and TEEs. The proposal explicitly links data pooling to these technologies, suggesting that future funding and regulatory support will favor architectures that guarantee data confidentiality.
Leverage the Centres for AI: SMEs and suppliers should actively engage with the national Experience and Acceleration Centres for AI. Under the proposal, these centres would provide access to computing resources, expertise, and neutral ground for collaborative AI training, lowering the barrier to entry for smaller players.
Strengthen IP Protection Frameworks: As you prepare for data pooling, make sure your contractual and technical frameworks are robust enough to strictly preserve intellectual property rights. Recital 19 signals a clear legislative intent to protect IP. However, recitals are not binding operative provisions, so partnership agreements should secure these protections explicitly rather than rely on the recital alone.
Adopt Sovereign Compute Standards: Consider the provenance of your compute hardware and software. The proposal's emphasis on EU-designed and manufactured processors and accelerators suggests that sovereign infrastructure may become a competitive advantage, particularly for projects involving sensitive data or public sector contracts.

Common misconceptions

Misconception 1: CADA mandates data sharing between competitors.
- Reality: CADA facilitates and incentivizes data pooling through funding and infrastructure support, but it does not mandate data sharing between private entities. The use of "trusted third parties" and privacy-preserving technologies is designed to make voluntary sharing safer and more attractive, not compulsory.
Misconception 2: Data pooling requires centralizing raw data.
- Reality: The proposal explicitly supports "secure large-scale data pooling" through technologies that enhance privacy and preserve confidentiality. Federated learning and similar methods fit this description because raw data does not leave the owner's premises. Centralization is therefore not a prerequisite for pooling.
Misconception 3: Only large OEMs can benefit from these initiatives.
- Reality: The proposal specifically mentions supporting SMEs and SMCs (small mid-caps) through Centres for AI and simplified procurement strategies. SMEs can use these centres to access the compute resources and expertise needed for collaborative AI training, which helps level the playing field.
Misconception 4: CADA replaces the AI Act's data requirements.
- Reality: CADA complements the AI Act. While CADA provides the infrastructure and incentives for data pooling, the AI Act continues to govern the quality, governance, and safety of the data used in high-risk AI systems. Compliance with both is necessary.


This is general information about a draft EU regulation, not legal advice.