Technical Intro to Dataspaces
What's a Dataspace?
The simplest (informal) definition is: a trusted peer-to-peer network in which participants share data in a sovereign way. Dataspaces are crucial for organizations looking to securely share and manage data across multiple parties while maintaining control over their information.
Dataspaces enable organizations to collaborate on data-driven projects while preserving data sovereignty and ensuring compliance with data protection regulations. By implementing robust governance and trust mechanisms, such as identity management, configuration, and contract management, dataspaces ensure that data is shared securely and in compliance with legal and regulatory requirements. This approach allows data providers to maintain control over their data, aligning with the principles of data sovereignty. The International Data Spaces Association (IDSA) and similar frameworks provide guidelines and standards for setting up and managing dataspaces, ensuring that all participants follow open standards and compatible rules for data exchange, thereby building trust among stakeholders.
In detail:
- trusted: participants can trust one another because onboarding is not open to everyone; it is an activity regulated by the dataspace governance entity.
- peer-to-peer: there is no client-server relationship; every participant can be both a data provider and a data consumer.
- participants: typically organizations, companies, and so on.
- sovereign: data is shared only with the participants that are allowed to receive it, which is ensured through policy evaluation.
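To make the "sovereign" point concrete, here is a minimal sketch of policy evaluation. It assumes a policy is simply a set of claims that a consumer must hold; the `Participant`, `Policy`, and `share` names are hypothetical and are not taken from any dataspace specification or library.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    # Hypothetical participant record; "claims" stands in for verified credentials.
    participant_id: str
    claims: dict = field(default_factory=dict)


@dataclass
class Policy:
    # Hypothetical policy: every required claim must match exactly.
    required_claims: dict

    def permits(self, participant: Participant) -> bool:
        return all(
            participant.claims.get(key) == value
            for key, value in self.required_claims.items()
        )


def share(asset: str, policy: Policy, consumer: Participant) -> str:
    """Return the asset only when the consumer satisfies the provider's policy."""
    if not policy.permits(consumer):
        raise PermissionError(
            f"{consumer.participant_id} is not allowed to access this data"
        )
    return asset
```

In this sketch the provider keeps control because the check runs on its side before any data leaves; real policies are usually richer than exact claim matching.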
Architecture
A dataspace is typically designed to be decentralized, so that there is no single point of failure: the network needs to remain operational even if some components are unavailable. In practice, some components may still be centralized (such as the identity provider or the discovery service), but the long-term path should lead to a fully decentralized network. This decentralized approach enhances data security and promotes data sovereignty.
Key components of dataspace architecture include identity providers, discovery services, and data exchange protocols. Identity providers manage the identity of participants and authentication methods, ensuring a secure environment for data sharing. Discovery services enable users to search for and share data efficiently, making data access more streamlined. Data exchange protocols, such as those defined by the IDSA Dataspace protocol, outline the technical standards and methods for data sharing, maintaining control over data, and ensuring interoperability between participants.
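As a rough illustration of how two of these components could fit together, the sketch below models an identity provider that tracks onboarded participants and a discovery service that only serves trusted ones. All class and method names are hypothetical, and a single in-memory registry is a simplification of what a real (ideally decentralized) dataspace would use.

```python
from dataclasses import dataclass


class IdentityProvider:
    """Hypothetical registry of participants onboarded by the governance entity."""

    def __init__(self):
        self._onboarded = set()

    def onboard(self, participant_id):
        self._onboarded.add(participant_id)

    def is_trusted(self, participant_id):
        return participant_id in self._onboarded


@dataclass
class Offer:
    # Hypothetical catalog entry describing a dataset a provider is offering.
    provider_id: str
    asset_id: str
    description: str


class DiscoveryService:
    """Hypothetical catalog that only accepts and serves trusted participants."""

    def __init__(self, identity_provider):
        self._idp = identity_provider
        self._offers = []

    def publish(self, offer):
        if not self._idp.is_trusted(offer.provider_id):
            raise PermissionError("provider is not onboarded")
        self._offers.append(offer)

    def search(self, requester_id, keyword):
        if not self._idp.is_trusted(requester_id):
            raise PermissionError("requester is not onboarded")
        return [
            offer for offer in self._offers
            if keyword.lower() in offer.description.lower()
        ]
```

The discovery service returns only descriptions of what is on offer; the data itself would still be exchanged directly between participants.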
Connector
The connector is the fundamental piece of the dataspace network. Every participant needs at least one in order to be part of the network. It is responsible for:
- interacting with the other connectors through the wire protocol, DSP (Dataspace Protocol)
- presenting and verifying credentials through the identity and trust protocol, DCP (Decentralized Claims Protocol)
Understanding the role of connectors is essential for organizations looking to implement or participate in a dataspace. Connectors act as the bridge between an organization's internal systems and the broader dataspace ecosystem.
Connectors play a crucial role in enforcing data usage policies, managing access rights, and facilitating secure data transfers between participants. They support usage policies written in dedicated policy languages and ensure that data sharing complies with organizational policies. Because transfers only happen according to predefined rules, connectors act as the guardians of data sovereignty within the dataspace ecosystem.
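The connector's responsibilities can be sketched as a toy flow loosely modeled on the catalog, contract negotiation, and transfer steps that DSP covers. This is a simplification under stated assumptions: the `Connector` class, its methods, and the two-value `NegotiationState` are hypothetical and do not reproduce the actual protocol messages or the EDC API.

```python
from dataclasses import dataclass, field
from enum import Enum


class NegotiationState(Enum):
    # Simplified outcomes; the real protocol defines more states.
    AGREED = "agreed"
    TERMINATED = "terminated"


@dataclass
class Connector:
    participant_id: str
    credentials: dict
    # asset_id -> (payload, required credentials); hypothetical storage layout
    assets: dict = field(default_factory=dict)
    agreements: set = field(default_factory=set)

    def offer(self, asset_id, payload, required):
        """Provider side: expose an asset together with its access policy."""
        self.assets[asset_id] = (payload, required)

    def catalog(self):
        """List the asset identifiers this connector offers."""
        return sorted(self.assets)

    def negotiate(self, consumer, asset_id):
        """Record an agreement only if the consumer's credentials satisfy the policy."""
        _, required = self.assets[asset_id]
        satisfied = all(
            consumer.credentials.get(key) == value
            for key, value in required.items()
        )
        if not satisfied:
            return NegotiationState.TERMINATED
        self.agreements.add((consumer.participant_id, asset_id))
        return NegotiationState.AGREED

    def transfer(self, consumer, asset_id):
        """Release the payload only when an agreement exists."""
        if (consumer.participant_id, asset_id) not in self.agreements:
            raise PermissionError("no contract agreement for this asset")
        payload, _ = self.assets[asset_id]
        return payload
```

Because every participant runs the same kind of component, any `Connector` here can play the provider role (calling `offer`) or the consumer role (being passed to `negotiate` and `transfer`), which mirrors the peer-to-peer nature of the network.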
Learn more about Think-it's connector as a service for dataspaces.
Real world dataspaces
Dataspaces are not just a theoretical concept: several of them are live and running (a radar is also available).
Open Source
Open source initiatives play a crucial role in the development and adoption of dataspace technologies. They provide transparent, community-driven solutions that organizations can leverage and contribute to, something Think-it is proudly part of.
- Community-Driven Innovation: The open-source environment in dataspace technology is characterized by its efficiency, agility, and transparency. This collaborative atmosphere encourages the contribution of ideas, feedback, and continuous improvement, as seen in the International Data Spaces (IDS) open-source ecosystem.
- Reduced Vendor Lock-In: Open-source software helps prevent vendor lock-in by ensuring that data sharing can occur independently of the underlying IT infrastructure. This is a key benefit highlighted by solutions like sovity, which emphasize the importance of avoiding lock-in effects through open-source technologies.
- Increased Interoperability: The use of open-source frameworks such as the Eclipse Dataspace Components (EDC) ensures interoperability between different dataspace implementations. EDC is designed to provide a standards-based framework that can be reused and customized, aligning with the Gaia-X AISBL Trust Framework and the IDSA Dataspace protocol.
- Ethical Engineering and Sustainable Practices: Open-source projects are governed by principles of openness, transparency, and meritocracy, which are essential for ethical engineering. Additionally, these solutions promote sustainable practices by fostering collaborative development and reducing dependence on proprietary technologies.
Eclipse Dataspace Components (EDC)
EDC is an open source framework hosted by the Eclipse Foundation for building secure, globally-scalable data-sharing services. EDC provides highly customizable components for creating control planes, data planes, decentralized identity systems, and federated data catalogs. (ref.)
In short, it is a framework written in Java for building the components used in a dataspace.
Tractus-X
Tractus-X is the official open-source project in the Catena-X data space. (ref.)
Joining the Catena-X network does not require using the Tractus-X products: Catena-X defines the standards, and any organization can implement its own products and still share and fetch data in the network. Tractus-X is often called the "reference implementation" for the Catena-X network.
Tractus-X EDC
Tractus-X EDC is the reference implementation of the dataspace connector in the Catena-X network, and it is based on EDC. Website: https://github.com/eclipse-tractusx/tractusx-edc
For organizations interested in implementing or joining a dataspace, start by identifying your data sharing needs and potential use cases. Familiarize yourself with the key components like connectors and protocols. Consider exploring open-source solutions like EDC or Tractus-X as a starting point, as well as the prescriptive guidance architecture we've provided on AWS.