Author: AIR Team
Data is the lifeblood of today’s digital transformation—from connected cars to high-performance scientific computing—driving decisions, training models, and underpinning innovation. But with volumes exploding and privacy paramount, how we create and collect data is as important as how we analyze it. Within NOUS, the Data Life Cycle Framework (DLCF) defines six phases, from creation to deletion, to keep data trustworthy, secure, and usable across edge, cloud, HPC, and future quantum infrastructures. This blog zooms in on the first and most decisive phase—Data Creation/Collection/Acquisition—and shows how four Use Cases bring unique approaches to gathering the data that powers Europe’s digital future.
Use Case 1 – Connected Vehicle Perception Using Camera Data
NOUS fuses multi-modal inputs, including roadside and in-vehicle video, GPS/IMU telemetry, network KPIs (latency, 4G/5G), and environmental context (weather, traffic), to detect static objects (signs, barriers) and dynamic agents (cars, cyclists, pedestrians). Model training and validation draw on established datasets (COCO, UA-DETRAC, BDD100K). The pipeline is engineered for sub-100 ms end-to-end response so that safety events translate into timely driver and pedestrian alerts.
The challenge is to deliver sub-100 ms processing while staying GDPR-compliant: no raw frames are stored, and anonymization happens at the source so that only metadata and event records persist. The pipeline must also ensure EIF/EIRA interoperability, secure transport via MQTT or WebSockets over TLS 1.3, and transparent, tamper-evident auditing through a permissioned blockchain.
NOUS addresses these challenges with multi-access edge nodes that run real-time object detection and on-source anonymization, use lightweight protocols to cut latency, apply end-to-end encryption for secure transport, and record tamper-evident event trails on a permissioned blockchain. Outputs are emitted in interoperable JSON/GeoJSON and governed by rolling-retention policies. As a result, safety alerts reach road users within the sub-100 ms budget, combining high technical performance with social responsibility.
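To make the "metadata only" idea concrete, here is a minimal Python sketch of how one detection could be reduced to a GeoJSON event record before it leaves the edge node. The function name, the field names, and the shape of the raw detection are illustrative assumptions, not the NOUS schema.

```python
import json
from datetime import datetime, timezone

def detection_to_event(label, confidence, lon, lat, observed_at):
    """Turn one detection into a GeoJSON Feature that carries metadata only."""
    return {
        "type": "Feature",
        # GeoJSON positions are ordered longitude, then latitude.
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
        "properties": {
            "label": label,
            "confidence": round(confidence, 3),
            "observed_at": observed_at.astimezone(timezone.utc).isoformat(),
        },
    }

# Hypothetical raw detection as an edge detector might produce it.
raw = {
    "label": "pedestrian",
    "confidence": 0.91234,
    "lon": 23.7275,
    "lat": 37.9838,
    "frame": b"\x00\x01",  # raw pixels: never copied into the event
}

event = detection_to_event(
    raw["label"], raw["confidence"], raw["lon"], raw["lat"],
    datetime(2025, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
)
payload = json.dumps(event)  # this string is what would be published
```

Because the event is built from an explicit list of fields rather than by copying the raw detection, the frame cannot leak into the payload by accident.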
Use Case 2 – Energy Prediction and Data Management in HPC Infrastructures
NOUS aggregates high-frequency energy signals—PV and wind production, grid load, market prices—together with meteorological variables (irradiance, wind speed, temperature, humidity) and open datasets (ADMIE, Meteostat, PVGIS). Streams arrive at ~15-minute cadence; each packet is timestamped and hashed at source to secure authenticity and traceability.
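Timestamping and hashing at source can be sketched in a few lines of Python. The packet layout and the helper names `seal_packet` and `verify_packet` are assumptions made for illustration; the post does not specify which hash function or serialization NOUS uses, so SHA-256 over canonical JSON stands in here.

```python
import hashlib
import json
from datetime import datetime, timezone

def _canonical(body):
    # Sorted keys and fixed separators give a stable byte representation.
    return json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")

def seal_packet(reading, observed_at):
    """Attach a UTC timestamp to a reading and hash the result."""
    body = dict(reading, observed_at=observed_at.astimezone(timezone.utc).isoformat())
    return {"body": body, "sha256": hashlib.sha256(_canonical(body)).hexdigest()}

def verify_packet(packet):
    """Recompute the hash and compare it with the one sent along."""
    return hashlib.sha256(_canonical(packet["body"])).hexdigest() == packet["sha256"]

packet = seal_packet(
    {"site": "pv-01", "pv_kw": 12.5},  # hypothetical reading
    datetime(2025, 1, 1, 12, 15, tzinfo=timezone.utc),
)
```

A bare hash only detects accidental or careless modification. Proving who produced a packet would additionally need a keyed MAC or a digital signature, which this sketch leaves out.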
Energy data arrives in incompatible formats, units, and sampling rates, creating integration and time-synchronization issues that degrade forecasts, and it must be handled in line with GDPR. To address this, NOUS enforces EIF/EIRA, normalizes schemas and aligns time bases via an Auto-Standardizer, explores a permissioned blockchain for tamper-evident audits of production/consumption records, and uses federated learning to train models without sharing raw data, preserving privacy while improving accuracy.
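The internals of the Auto-Standardizer are not described here, so the following Python sketch only illustrates the general idea of unit normalization and time alignment: every sample is converted to kW and averaged into 15-minute UTC slots. The conversion table, function names, and sample values are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Conversion factors to kW for the units assumed in this sketch.
UNIT_TO_KW = {"W": 0.001, "kW": 1.0, "MW": 1000.0}

def slot_start(ts):
    """Floor a timezone-aware timestamp to the start of its 15-minute UTC slot."""
    ts = ts.astimezone(timezone.utc)
    return ts - timedelta(minutes=ts.minute % 15, seconds=ts.second,
                          microseconds=ts.microsecond)

def standardize(samples):
    """Map (timestamp, value, unit) samples to {slot start: mean power in kW}."""
    buckets = defaultdict(list)
    for ts, value, unit in samples:
        buckets[slot_start(ts)].append(value * UNIT_TO_KW[unit])
    return {slot: sum(vals) / len(vals) for slot, vals in sorted(buckets.items())}

# Hypothetical samples from three sources with different units and timing.
samples = [
    (datetime(2025, 1, 1, 12, 1, 30, tzinfo=timezone.utc), 1000.0, "W"),
    (datetime(2025, 1, 1, 12, 9, 0, tzinfo=timezone.utc), 3.0, "kW"),
    (datetime(2025, 1, 1, 12, 16, 0, tzinfo=timezone.utc), 0.002, "MW"),
]
aligned = standardize(samples)
```

Averaging within a slot is one simple choice; a production standardizer might instead interpolate or weight samples by duration.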
By combining diverse sensor inputs, external data platforms, and federated learning techniques, this use case shows how careful data acquisition transforms raw energy signals into actionable intelligence. With blockchain for auditability, encryption for security, and interoperability standards for data harmonization, the system stays technically robust and compliant with European regulations. As a result, NOUS delivers more accurate energy predictions, strengthens grid management, and supports the transition to cleaner, smarter, and more resilient energy infrastructures.
Use Case 3 – Crisis Management and Civil Protection with the CRIMSON Platform
CRIMSON ingests a multi-source, real-time view of incidents: drone/CCTV video, IoT telemetry (e.g., temperature, hygrometry), weather feeds, and high-fidelity geospatial assets that form a digital twin (3D terrain, aerial imagery, cadastral layers, BIM). Field teams add tactical inputs—commands, sitreps, photos, videos, audio notes—captured in interoperable formats (JSON/XML; GeoTIFF, GML, 3D Tiles, DXF) and streamed via modality-appropriate protocols (MQTT/REST for sensors, RTSP for live video). This ensures that situational awareness is both timely and machine-readable.
From the first hop, the collection pipeline enforces high-stakes reliability, security, and traceability. Personal and location data are anonymized and encrypted in line with GDPR. The platform operates under NIS2-aligned zero-trust with strict role-based access control, and it immutably logs critical actions and selected events on a permissioned blockchain with clear permission policies. OGC geospatial services (WMS/WFS/WMTS) are mandated so that agencies can exchange and overlay layers without bespoke conversions or delay.
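The property that makes such a log tamper-evident can be shown with a plain hash chain. The Python sketch below is not a permissioned blockchain (there are no peers, consensus, or permission policies) and is not the CRIMSON implementation; the entry fields and function names are illustrative assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record

def _digest(prev_hash, entry):
    payload = json.dumps({"prev": prev_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log, entry):
    """Append an entry whose hash also covers the previous record's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"entry": entry, "prev": prev_hash, "hash": _digest(prev_hash, entry)})

def verify(log):
    """Walk the chain and recompute every hash; any edit breaks a link."""
    prev_hash = GENESIS
    for record in log:
        if record["prev"] != prev_hash or record["hash"] != _digest(prev_hash, record["entry"]):
            return False
        prev_hash = record["hash"]
    return True

audit_log = []
append(audit_log, {"actor": "operator-17", "action": "open_incident"})    # hypothetical
append(audit_log, {"actor": "operator-17", "action": "dispatch_drone"})   # hypothetical
```

Since each hash covers the one before it, changing an earlier entry invalidates every later record, which is what lets an auditor detect after-the-fact edits.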
Edge-aware optimizations and AI assistance keep collection fast. Near-source gateways pre-filter streams into actionable events, and image/video analytics prioritize attention under EU AI Act transparency and oversight requirements. Comprehensive logs and versioned sessions ensure every item (live feeds, crowdsourced annotations, GIS layers) enters with provenance intact. The result is a coherent, audit-ready operational picture in real time, together with post-incident replay and analysis that turns heterogeneous inputs into trustworthy evidence and strengthens coordination and accountability across civil protection stakeholders.
Use Case 4 – Scientific Data Storage in HPC and AI Analytics
Scientific discovery thrives on diverse, high-quality data. In NOUS, datasets originate from molecular simulations, nanometrology and structural experiments, large-scale imaging/3D outputs, and synthetic data from computational models—augmented by textual corpora (publications) and rich experiment metadata. We also capture process provenance (user interactions, resource metrics, AI training history) to couple results with the conditions that produced them—key for reproducibility.
The challenge is heterogeneity: CSV simulation logs, real-time sensor streams, model checkpoints, and multi-TB images are not immediately interoperable. NOUS addresses this from the point of acquisition by enforcing FAIR principles and EU interoperability frameworks (EIF/EIRA), assigning persistent identifiers (DOIs, ORCIDs) early, and recording ISO-aligned metadata. Privacy and compliance are preserved with encryption and GDPR-aware handling of any personal data in workflows.
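Checking persistent identifiers at acquisition time could look like the Python sketch below. The ORCID check character follows the ISO 7064 MOD 11-2 scheme that ORCID iDs use; the metadata fields, the function names, and the minimal DOI check (only the "10." prefix) are assumptions for illustration, not the NOUS metadata model.

```python
import re

ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")

def orcid_check_char(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid):
    if not ORCID_PATTERN.match(orcid):
        return False
    digits = orcid.replace("-", "")
    return orcid_check_char(digits[:15]) == digits[15]

def build_metadata(title, doi, creator_orcid):
    """Return a minimal metadata record, rejecting malformed identifiers."""
    if not doi.startswith("10."):
        raise ValueError("a DOI name starts with the '10.' directory indicator")
    if not is_valid_orcid(creator_orcid):
        raise ValueError("ORCID iD fails format or checksum validation")
    return {"title": title, "doi": doi, "creator_orcid": creator_orcid}

record = build_metadata(
    "Example simulation run",   # hypothetical title
    "10.1234/example-dataset",  # placeholder DOI, not a real dataset
    "0000-0002-1825-0097",      # ORCID's documented sample iD
)
```

Rejecting malformed identifiers at the point of entry is cheaper than repairing them later, once datasets have been cited or shared.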
This approach ensures scientific data enters the lifecycle already structured for long-term value: automated preprocessing and validation pipelines plus advanced compliance measures make datasets immediately usable, secure, and interoperable across HPC and AI. The result is a foundation that accelerates current scientific discovery, keeps data accessible for future research, supports cross-disciplinary collaboration, and ensures Europe’s investments in supercomputing and AI infrastructures deliver lasting impact.
Common Threads: Quality, Privacy, and Trust
Across all use cases, NOUS enforces privacy-by-design and security-by-default: data is anonymized at the source, encrypted in transit, and exposed only via role-based access controls, aligning with GDPR, the EU AI Act, and NIS2 from the very first hop. Data quality is ensured through schema validation, plausibility/sanity checks, and time synchronization, while each record is enriched with contextual metadata (who, when, where, and under what conditions) to remain discoverable, interoperable, and reusable. For traceability, blockchain-backed audit trails yield tamper-evident logs, so data enters as a trustworthy digital asset that is ready to be integrated, analyzed, and safely shared across vehicles, energy systems, crisis platforms, and scientific workflows. Together, these measures lay the groundwork for a transparent, reliable, and future-ready digital infrastructure.
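As a rough illustration of the quality gate described above, the Python sketch below combines a schema check with a plausibility check on a single record. The field names, the required types, and the temperature bounds are hypothetical; each use case would define its own.

```python
from datetime import datetime

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"source_id": str, "observed_at": str, "temperature_c": float}
PLAUSIBLE_TEMPERATURE_C = (-60.0, 60.0)  # assumed sanity bounds

def validate(record):
    """Return a list of problems; an empty list means the record passes."""
    errors = []
    for field, expected in REQUIRED.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"wrong type for {field}")
    if errors:
        return errors  # skip value checks when the shape is already wrong
    try:
        ts = datetime.fromisoformat(record["observed_at"])
    except ValueError:
        return ["observed_at is not an ISO 8601 timestamp"]
    if ts.tzinfo is None:
        errors.append("observed_at lacks a timezone")
    low, high = PLAUSIBLE_TEMPERATURE_C
    if not low <= record["temperature_c"] <= high:
        errors.append("temperature_c outside plausible range")
    return errors

good = {"source_id": "sensor-7", "observed_at": "2025-01-01T12:00:00+00:00", "temperature_c": 21.5}
bad = {"source_id": "sensor-7", "observed_at": "2025-01-01T12:00:00", "temperature_c": 480.0}
```

Returning every problem found, rather than stopping at the first, gives the data producer one complete report per rejected record.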
Conclusion
NOUS data starts at the edge: when a roadside camera sees traffic, a smart meter logs output, a drone surveys a flood, or an experiment emits readings. Those first moments set fidelity, provenance, and compliance for the entire data life cycle. By pairing edge AI and federated learning with selective blockchain audit trails and European standards (GDPR, NIS2, FAIR, OGC), we show that collection can be both high-performance and accountable. The payoff is practical: safer mobility, smarter energy grids, faster and more transparent crisis response, and open, reproducible science. Next, we’ll show how this chain of trust continues through processing, storage, sharing, and long-term preservation. Follow the series as Europe builds a secure, transparent, and future-ready digital data infrastructure.