While many Salesforce products like Sales Cloud, Service Cloud, Education Cloud, Health Cloud, and various industry clouds are built on a common ‘core’ platform that shares the same datastore (an Oracle relational database), Salesforce Data Cloud operates on a distinct architecture and technology stack. This insight explores the data models and related concepts that underpin the platform.
Data Cloud Architecture and Storage
Starting with the storage layer, Salesforce Data Cloud integrates multiple services:
- Amazon DynamoDB for hot storage, ensuring rapid data retrieval.
- Amazon S3 for cold storage.
- A SQL metadata store for indexing all metadata.
This combination enables Data Cloud to provide a petabyte-scale data store, overcoming the scalability and performance constraints of traditional relational databases.
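To make the hot/cold split concrete, here is a minimal Python sketch of a tiered read path: check a low-latency key-value store first, then fall back to bulk object storage on a miss. This is a conceptual illustration only, not Salesforce’s internal implementation; all names and data are invented.

```python
from typing import Optional

# Invented stand-ins for the two tiers: a key-value "hot" store
# (DynamoDB-like) and a bulk "cold" object store (S3-like).
HOT_STORE: dict[str, dict] = {"profile:42": {"name": "Ada", "tier": "gold"}}
COLD_STORE: dict[str, dict] = {"profile:7": {"name": "Lin", "tier": "silver"}}

def read_record(key: str) -> Optional[dict]:
    """Tiered read: serve from hot storage when possible, else fall back to cold."""
    record = HOT_STORE.get(key)       # low-latency path
    if record is None:
        record = COLD_STORE.get(key)  # slower bulk-storage path
    return record

print(read_record("profile:42"))  # hit in hot storage
print(read_record("profile:7"))   # falls back to cold storage
```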
On top of this physical architecture, Data Cloud exposes a set of data objects that form the foundation for how data is ingested, harmonized, and activated within the platform.
Key Data Cloud Objects
Data Source
A Data Source is the initial data layer in Data Cloud, representing an external platform or system from which your data originates. Data Sources can include:
- Salesforce platforms like Sales Cloud, Commerce Cloud, Marketing Cloud, and Marketing Cloud Personalization.
- Object storage platforms such as Amazon S3, Microsoft Azure Storage, and Google Cloud Storage.
- Ingestion APIs and Connector SDKs for programmatically loading data from websites, mobile apps, and other systems (see the ingestion sketch after this list).
- SFTP for file-based transfer.
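For example, here is a minimal Python sketch of pushing records through the Ingestion API’s streaming endpoint. The tenant endpoint, access token, and the `website_events`/`page_view` API names are placeholders; substitute the values from your own connector setup and ingestion schema.

```python
import requests

# Placeholders: replace with your org's tenant endpoint, a valid OAuth
# access token, and the API names from your Ingestion API connector.
TENANT_ENDPOINT = "https://<tenant>.c360a.salesforce.com"
ACCESS_TOKEN = "<oauth-access-token>"
SOURCE_API_NAME = "website_events"   # connector API name (assumed)
OBJECT_NAME = "page_view"            # object defined in the schema (assumed)

def send_events(records: list[dict]) -> None:
    """Push a batch of records to the streaming ingestion endpoint."""
    url = f"{TENANT_ENDPOINT}/api/v1/ingest/sources/{SOURCE_API_NAME}/{OBJECT_NAME}"
    resp = requests.post(
        url,
        json={"data": records},  # streaming payloads wrap records in a "data" array
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        timeout=30,
    )
    resp.raise_for_status()

send_events([{"event_id": "e-1001", "url": "/pricing", "ts": "2024-05-01T12:00:00Z"}])
```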
Data Stream
A Data Stream is an entity extracted from a Data Source, such as ‘Orders’ from Commerce Cloud, ‘Contacts’ from Sales Cloud, or ‘Subscribers’ from Marketing Cloud. Once a Data Source is connected to Data Cloud, Data Streams provide access to these entities and require assignment to a category: Profile, Engagement, or Other. A single Data Source can contain multiple Data Streams.
Data Source Object (DSO)
Data Streams are ingested into Data Source Objects (DSOs), which serve as temporary staging stores containing data in its raw, native format (e.g., CSV files). Minor transformations can be applied to fields during data ingestion.
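In the product, these transformations are defined as formula expressions on the data stream. As a loose Python analogy (field names invented), an ingest-time transform derives and normalizes values per record before the raw data is staged:

```python
def transform_row(row: dict) -> dict:
    """Loose analogy of an ingest-time formula field: derive and normalize
    values per record before staging. Field names are invented."""
    out = dict(row)
    # Derived field, like a formula concatenating two source columns.
    first = row.get("first_name", "").strip()
    last = row.get("last_name", "").strip()
    out["full_name"] = f"{first} {last}".strip()
    # Normalization, e.g. a consistent case for identity matching later on.
    out["email"] = row.get("email", "").strip().lower()
    return out

print(transform_row({"first_name": " Ada ", "last_name": "Lovelace",
                     "email": "ADA@Example.com"}))
```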
Data Lake Object (DLO)
The next step in the data flow is the Data Lake Object (DLO). DLOs are typed, schema-based, materialized views that reside in the data lake (Amazon S3) and are generally stored as Apache Parquet files. This format, along with Apache Iceberg’s abstraction layer, supports efficient data storage and retrieval. DLOs are the first objects available for inspection and preparation of data, including field mapping and additional transformations.
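To make the storage format concrete, the short pyarrow sketch below writes and reads a typed Parquet file, the same columnar format DLOs are stored in. The schema is invented for illustration, and the Iceberg table layer that Data Cloud manages on top is not shown.

```python
from datetime import datetime

import pyarrow as pa
import pyarrow.parquet as pq

# Invented example schema; a real DLO's schema comes from your stream mappings.
orders = pa.table({
    "order_id": ["o-1", "o-2"],
    "amount": [129.99, 54.50],
    "ordered_at": [datetime(2024, 5, 1, 12, 0), datetime(2024, 5, 2, 9, 30)],
})

pq.write_table(orders, "orders.parquet")  # columnar, typed, compressed

# Reading back only the columns you need is cheap in a columnar format.
subset = pq.read_table("orders.parquet", columns=["order_id", "amount"])
print(subset.to_pydict())
```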
Data Model Object (DMO)
Unlike DSOs and DLOs, which are backed by a physical data store, Data Model Objects (DMOs) provide a virtual, non-materialized view into the data lake. Queries against a DMO return results based on the current data snapshot in the underlying DLOs, without materializing those results. DMOs can aggregate attributes from different Data Streams, Calculated Insights, and other sources.
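For example, a DMO can be queried with ANSI SQL through the Data Cloud Query API. In the sketch below, the endpoint and token are placeholders, and the `ssot__Individual__dlm` object and field names follow the standard DMO naming convention; verify the exact API names in your org.

```python
import requests

TENANT_ENDPOINT = "https://<tenant>.c360a.salesforce.com"  # placeholder
ACCESS_TOKEN = "<oauth-access-token>"                      # placeholder

# Standard DMO API names use the ssot__ namespace and a __dlm suffix;
# confirm the object and field names in your org's Data Model tab.
sql = """
    SELECT ssot__Id__c, ssot__FirstName__c
    FROM ssot__Individual__dlm
    LIMIT 10
"""

resp = requests.post(
    f"{TENANT_ENDPOINT}/api/v2/query",
    json={"sql": sql},
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
# Rows reflect the current DLO snapshot; response shape may vary by API version.
print(resp.json()["data"])
```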
DMOs inherit their category from the first DLO mapped to them, and only DLOs with the same category can be mapped to that DMO. Like core Salesforce objects, DMOs follow a canonical data model with predefined attributes (standard DMOs), and you can also create your own (custom DMOs). DMOs can have one-to-one or many-to-one relationships with other DMOs. Currently, there are 89 standard DMOs in Data Cloud, and this number keeps growing to support more use cases.
DMOs are organized into various subject areas, including the following (see the naming sketch after this list):
- Case: for service and support cases.
- Engagement: for tracking interactions, like email activity.
- Loyalty: for managing rewards and recognition programs.
- Party: for representing individual attributes, such as contact or account information.
- Privacy: for tracking data privacy and consent preferences.
- Product: for defining product and service attributes.
- Sales Order: for managing past and forecasted sales.
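To connect these subject areas to concrete objects: standard DMO API names use the `ssot__` namespace and a `__dlm` suffix. The mapping below is illustrative only; confirm the exact names in your org’s Data Model tab.

```python
# Illustrative only: example standard-DMO API names per subject area.
# Verify the exact object names in your org before querying them.
SUBJECT_AREA_EXAMPLES: dict[str, str] = {
    "Case": "ssot__Case__dlm",
    "Engagement": "ssot__EmailEngagement__dlm",
    "Party": "ssot__Individual__dlm",
    "Sales Order": "ssot__SalesOrder__dlm",
}

for area, dmo in SUBJECT_AREA_EXAMPLES.items():
    print(f"{area:12s} -> {dmo}")
```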
Data Spaces
Data Spaces offer logical partitions, allowing data to be segregated between different brands, regions, or departments without the need for multiple Data Cloud instances. They also support a software development lifecycle (SDLC), enabling you to stage and test data objects in separate environments without affecting production data. Data Sources, Data Streams, and DLOs can be shared across Data Spaces, while DMOs and other platform features are isolated based on user permissions.
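As a conceptual sketch of these sharing rules (object names invented), shared ingestion-layer objects are visible to every Data Space, while DMO-level artifacts stay scoped to their own space:

```python
from dataclasses import dataclass, field

# DLOs (and the sources/streams behind them) can be shared across spaces.
SHARED_DLOS = {"Orders_DLO", "Contacts_DLO"}

@dataclass
class DataSpace:
    """Conceptual model of a Data Space; names are invented for illustration."""
    name: str
    dmos: set = field(default_factory=set)  # DMOs stay isolated per space

def visible_objects(space: DataSpace) -> set:
    """Everything a user scoped to this space can work with: shared DLOs
    plus the space's own (isolated) DMOs."""
    return SHARED_DLOS | space.dmos

emea = DataSpace("EMEA", dmos={"Individual_DMO", "SalesOrder_DMO"})
apac = DataSpace("APAC", dmos={"Individual_DMO"})

print(visible_objects(emea))  # shared DLOs + EMEA-only DMOs
print(visible_objects(apac))  # same shared DLOs, different DMO set
```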
Final Thoughts
With the drop in storage costs and advancements in technology, companies now have vast datasets at their disposal. Every customer interaction, whether it’s a purchase, an email open, or a web page view, can be captured and stored. When properly organized, this data allows businesses to understand customers better, predict their needs, personalize interactions, and much more.
However, one of Salesforce’s challenges is that its original platform, built over 20 years ago, was designed for a different era and struggles with handling voluminous, “big” data. Salesforce Data Cloud addresses these limitations with a modern architecture that overcomes the constraints of relational databases. Data Cloud is poised to become the backbone of the Salesforce platform, supporting the data needs of customers and products for the next two decades.
If you’re interested in learning how to implement and integrate Salesforce Data Cloud in your organization while following best practices, reach out today.