Cortex Framework: Integration with Salesforce (SFDC)
This insight outlines the process of integrating Salesforce (SFDC) operational workloads into the Cortex Framework Data Foundation. Cortex Framework brings Salesforce data into BigQuery through Dataflow pipelines, while Cloud Composer schedules and monitors these pipelines, allowing you to gain insights from your Salesforce data.
Prerequisite: Before configuring any workload integration, ensure that the Cortex Framework Data Foundation is deployed.
Configuration File
The `config.json` file in the Cortex Framework Data Foundation repository manages settings for transferring data from various sources, including Salesforce. Below is an example of how Salesforce workloads are configured:
```json
{
  "SFDC": {
    "deployCDC": true,
    "createMappingViews": true,
    "createPlaceholders": true,
    "datasets": {
      "cdc": "",
      "raw": "",
      "reporting": "REPORTING_SFDC"
    }
  }
}
```
Explanation of Parameters:
Parameter | Meaning | Default Value | Description |
---|---|---|---|
SFDC.deployCDC | Deploy CDC | true | Generates Change Data Capture (CDC) processing scripts to run as DAGs in Cloud Composer. |
SFDC.createMappingViews | Create mapping views | true | Creates views in the CDC processed dataset to show the “latest version of the truth” from the raw dataset. |
SFDC.createPlaceholders | Create placeholders | true | Creates empty placeholder tables if they aren’t generated during ingestion, ensuring smooth downstream reporting deployment. |
SFDC.datasets.raw | Raw landing dataset | (user-defined) | The dataset where replication tools land data from Salesforce. |
SFDC.datasets.cdc | CDC processed dataset | (user-defined) | Source for reporting views and target for records processed by DAGs. |
SFDC.datasets.reporting | Reporting dataset for SFDC | “REPORTING_SFDC” | Name of the dataset accessible for end-user reporting, where views and user-facing tables are deployed. |
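As a quick pre-deployment sanity check, the parameters in the table can be validated with a short script. The following is a minimal sketch, not part of Cortex Framework: the function name `check_sfdc_config` is hypothetical, and it assumes that an empty string under `SFDC.datasets` means the dataset name has not been filled in yet. The deployment itself may apply different or stricter checks.

```python
import json

# Keys taken from the SFDC section of config.json.
BOOLEAN_FLAGS = ("deployCDC", "createMappingViews", "createPlaceholders")
DATASET_KEYS = ("raw", "cdc", "reporting")


def check_sfdc_config(config: dict) -> list:
    """Return human-readable problems found in the SFDC section (hypothetical helper)."""
    sfdc = config.get("SFDC")
    if sfdc is None:
        return ["missing 'SFDC' section"]

    problems = []
    for flag in BOOLEAN_FLAGS:
        if not isinstance(sfdc.get(flag), bool):
            problems.append(f"SFDC.{flag} should be true or false")

    datasets = sfdc.get("datasets", {})
    for key in DATASET_KEYS:
        # Assumption: an empty string means the dataset name still has to be set.
        if not datasets.get(key):
            problems.append(f"SFDC.datasets.{key} is empty")
    return problems


example = json.loads("""
{
  "SFDC": {
    "deployCDC": true,
    "createMappingViews": true,
    "createPlaceholders": true,
    "datasets": {"cdc": "", "raw": "", "reporting": "REPORTING_SFDC"}
  }
}
""")

for problem in check_sfdc_config(example):
    print(problem)
```

Run against the example configuration, it reports that the raw and CDC dataset names are still empty.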
Salesforce Data Requirements
Table Structure:
- Naming: Tables use snake_case and are plural (e.g., `some_objects`).
- Data Types: Columns keep the same data types as in Salesforce.
- Empty Tables: Missing required tables are automatically created as empty during deployment to ensure smooth CDC execution.
- CDC Requirements: The `Id` and `SystemModstamp` fields are crucial for CDC tracking; they ensure that changes are captured correctly.
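The table-structure rules above can be checked mechanically. This is a hypothetical sketch, not Cortex code: `to_table_name` only illustrates the snake_case, plural naming convention with naive English pluralization (in practice the table name is set explicitly, as with `base_table` in the ingestion settings), and `missing_cdc_columns` reports which of the two CDC-required fields a schema lacks.

```python
import re

# Fields the CDC scripts depend on (see "CDC Requirements").
REQUIRED_CDC_COLUMNS = {"Id", "SystemModstamp"}


def to_table_name(api_name: str) -> str:
    """Hypothetical helper: Salesforce API name -> snake_case plural table name.

    Uses naive English pluralization; irregular plurals are not handled.
    """
    # Insert "_" wherever a lowercase letter or digit is followed by an uppercase letter.
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", api_name).lower()
    if re.search(r"[^aeiou]y$", snake):
        return snake[:-1] + "ies"  # opportunity -> opportunities
    if snake.endswith(("s", "x", "ch", "sh")):
        return snake + "es"  # address -> addresses
    return snake + "s"  # account -> accounts


def missing_cdc_columns(columns: list) -> set:
    """Return the CDC-required columns that a table schema lacks."""
    return REQUIRED_CDC_COLUMNS - set(columns)


print(to_table_name("OpportunityLineItem"))  # opportunity_line_items
print(missing_cdc_columns(["Id", "Name", "LastModifiedDate"]))  # {'SystemModstamp'}
```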
Loading SFDC Data into BigQuery
The Cortex Framework offers several methods for loading Salesforce data into BigQuery:
- API Calls: Directly access Salesforce data via API calls and load it into a BigQuery “raw” dataset.
- Structure Mapping Views: Translate existing data in BigQuery into a format compatible with Cortex Framework’s reporting features.
- CDC Processing Scripts: For constantly changing data, CDC scripts can track changes and update BigQuery accordingly.
CDC Processing
The CDC scripts rely on two key fields:
- Id: Unique identifier for each record.
- SystemModstamp: Timestamp indicating when a record was last modified.
You can adjust the CDC processing to handle different field names or add custom fields to suit your data schema.
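The rule behind CDC processing and the mapping views' "latest version of the truth" can be illustrated in a few lines: for every `Id`, keep the row with the greatest `SystemModstamp`. This is a minimal in-memory sketch under that assumption, not the Cortex CDC scripts, which operate on BigQuery tables; the function name `latest_versions` and the sample rows are hypothetical, and deleted records are not handled. The `id_field` and `ts_field` parameters show where different field names would plug in.

```python
from datetime import datetime


def latest_versions(records, id_field="Id", ts_field="SystemModstamp"):
    """Return {id: record}, keeping only the newest version of each record.

    When two versions share the same timestamp, the first one seen is kept.
    """
    latest = {}
    for record in records:
        key = record[id_field]
        current = latest.get(key)
        if current is None or record[ts_field] > current[ts_field]:
            latest[key] = record
    return latest


# Hypothetical append-only raw rows: 001A was modified after its first load.
raw_rows = [
    {"Id": "001A", "Name": "Acme", "SystemModstamp": datetime(2024, 1, 1, 9, 0)},
    {"Id": "001B", "Name": "Globex", "SystemModstamp": datetime(2024, 1, 1, 9, 5)},
    {"Id": "001A", "Name": "Acme Corp", "SystemModstamp": datetime(2024, 1, 2, 8, 0)},
]

for row in latest_versions(raw_rows).values():
    print(row["Id"], row["Name"])  # 001A Acme Corp, then 001B Globex
```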
Configuration of API Integration and CDC
To configure Salesforce data integration into BigQuery, Cortex provides the following methods:
- Cortex API Scripts: Replicate data from Salesforce with Cortex's replication scripts, or use a replication tool of your choice.
- Append Records Only: For tools that only append records, create CDC scripts to manage changes.
Example Configuration (settings.yaml):
```yaml
salesforce_to_raw_tables:
  - base_table: accounts
    raw_table: Accounts
    api_name: Account
    load_frequency: "@daily"
```
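Before a new table is added to `salesforce_to_raw_tables`, each entry can be checked for the four keys used in the example and for a schedule that Airflow understands. This sketch is an illustration, not part of Cortex Framework: it works on entries as they would look after parsing the YAML (for example with PyYAML's `yaml.safe_load`), it assumes `load_frequency` is passed to Airflow as a DAG schedule (a preset such as `@daily` or a five-field cron expression), and the `cases` entry is hypothetical.

```python
import re

REQUIRED_KEYS = ("base_table", "raw_table", "api_name", "load_frequency")
# A subset of Airflow's schedule presets; extend as needed.
COMMON_PRESETS = {"@hourly", "@daily", "@weekly", "@monthly", "@yearly"}
# Rough shape check for a five-field cron expression; not a full cron parser.
CRON_FIVE_FIELDS = re.compile(r"^\S+(\s+\S+){4}$")


def check_entry(entry: dict) -> list:
    """Return problems found in one salesforce_to_raw_tables entry."""
    problems = [f"missing '{key}'" for key in REQUIRED_KEYS if key not in entry]
    if "load_frequency" in entry:
        freq = entry["load_frequency"]
        if not isinstance(freq, str) or (
            freq not in COMMON_PRESETS and not CRON_FIVE_FIELDS.match(freq)
        ):
            problems.append(f"unrecognized load_frequency {freq!r}")
    return problems


entries = [
    {"base_table": "accounts", "raw_table": "Accounts", "api_name": "Account", "load_frequency": "@daily"},
    {"base_table": "cases", "raw_table": "Cases", "load_frequency": "daily"},  # hypothetical, broken
]

for entry in entries:
    for problem in check_entry(entry):
        print(f"{entry.get('base_table', '?')}: {problem}")
```

Only the second entry is reported: it lacks `api_name`, and `daily` (without the `@`) is neither a preset nor a cron expression.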
Data Mapping and Polymorphic Fields
Cortex Framework supports mapping data fields to the expected format. For example, a field named `unicornId` in your source system would be mapped to `AccountId` in Cortex with the string data type.
Polymorphic Fields: Fields whose names vary but have the same structure can be mapped in Cortex using `[Field Name]_Type`, such as `Who_Type` for the `Who.Type` field in the Task object.
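The two mapping rules above can be sketched together: nested relationship fields are flattened with an underscore, so `Who.Type` becomes `Who_Type`, and renamed fields are looked up in a mapping table and cast to the target type. This is a hypothetical illustration, not Cortex's mapping implementation; `FIELD_MAP`, `flatten`, `map_record`, and the nested-dictionary shape of the sample Task record are assumptions.

```python
# Hypothetical mapping: source field -> (Cortex field, Python type to cast to).
FIELD_MAP = {"unicornId": ("AccountId", str)}


def flatten(record: dict, parent: str = "") -> dict:
    """Flatten nested objects, joining names with "_" (Who.Type -> Who_Type)."""
    flat = {}
    for key, value in record.items():
        name = f"{parent}_{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def map_record(record: dict) -> dict:
    """Rename and cast fields according to FIELD_MAP; other fields pass through."""
    mapped = {}
    for key, value in flatten(record).items():
        target, cast = FIELD_MAP.get(key, (key, None))
        mapped[target] = cast(value) if cast is not None and value is not None else value
    return mapped


task = {"Id": "00T1", "unicornId": 42, "Who": {"Type": "Contact"}}
print(map_record(task))  # {'Id': '00T1', 'AccountId': '42', 'Who_Type': 'Contact'}
```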
Modifying DAG Templates
You can customize DAG templates as needed for CDC or raw data processing. To disable CDC or raw data processing from API calls, set `deployCDC=false` in the configuration file.
Setting Up the Extraction Module
Follow these steps to set up the Salesforce to BigQuery extraction module:
- Set up Credentials: Ensure your Salesforce user has the correct permissions and security token.
- Create a Connected App: Enable API integration for Salesforce with OAuth settings.
- Secret Manager: Use Google Secret Manager to securely store Salesforce credentials.
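When Salesforce credentials are stored in Secret Manager for use by Airflow, two details are easy to get wrong: the secret name and the encoding of special characters in the password or security token. This sketch assumes Airflow's Secret Manager backend with its default `airflow-connections` prefix and `-` separator, and Airflow's generic `conn_type://login:password@host?extras` connection URI format. The connection ID `salesforce-conn`, the host, and the `security_token` extra are hypothetical placeholders; check the Cortex and Airflow Salesforce provider documentation for the exact fields your deployment expects.

```python
from urllib.parse import quote, unquote, urlencode, urlsplit


def airflow_secret_id(conn_id, prefix="airflow-connections", sep="-"):
    """Secret name the Airflow Secret Manager backend looks up for a connection ID."""
    return f"{prefix}{sep}{conn_id}"


def connection_uri(conn_type, login, password, host, extras=None):
    """Build an Airflow-style connection URI, percent-encoding each component."""
    uri = (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{quote(host, safe='')}"
    )
    if extras:
        uri += "?" + urlencode(extras)
    return uri


# Hypothetical values; never hard-code real credentials.
secret_id = airflow_secret_id("salesforce-conn")
uri = connection_uri(
    "salesforce",
    "user@example.com",
    "p@ss:word",
    "example.my.salesforce.com",
    {"security_token": "abc123"},
)
print(secret_id)  # airflow-connections-salesforce-conn
print(uri)
# Round trip: the encoded password decodes back to the original value.
print(unquote(urlsplit(uri).password))  # p@ss:word
```

Without the encoding, the `@` and `:` in the password would be read as URI delimiters and the credentials would be parsed incorrectly.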
Cloud Composer Setup
To run Python scripts for replication, install the necessary Python packages depending on your Airflow version.
For Airflow 2.x:
```bash
gcloud composer environments update my-composer-instance \
  --location us-central1 \
  --update-pypi-package "apache-airflow-providers-salesforce>=5.2.0"
```
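To confirm that the package was installed with a suitable version, a check like the following could be run inside the Composer environment (for example, from a task). This is a minimal sketch with hypothetical helper names: it parses only plain dotted release numbers such as `5.2.0`, and production code would more robustly use `packaging.version.Version`.

```python
from importlib import metadata
from typing import Optional

PACKAGE = "apache-airflow-providers-salesforce"
MINIMUM = "5.2.0"  # minimum version required by the install command


def meets_minimum(installed: str, minimum: str = MINIMUM) -> bool:
    """Compare plain dotted versions numerically, so 5.10.0 > 5.2.0."""
    a = [int(part) for part in installed.split(".")]
    b = [int(part) for part in minimum.split(".")]
    width = max(len(a), len(b))
    # Pad with zeros so "5.2" and "5.2.0" compare as equal.
    a += [0] * (width - len(a))
    b += [0] * (width - len(b))
    return a >= b


def installed_version(package: str = PACKAGE) -> Optional[str]:
    """Return the installed distribution version, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print(f"{PACKAGE} is not installed")
else:
    print(f"{PACKAGE} {found}: {'OK' if meets_minimum(found) else 'too old'}")
```

Comparing numerically rather than as strings matters here: as plain strings, `"5.10.0" >= "5.2.0"` is false.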
Security and Permissions
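Ensure that Cloud Composer can read secrets from Secret Manager (for example, by granting the environment's service account the Secret Manager Secret Accessor role), so that sensitive data such as passwords and API keys is retrieved at runtime instead of being stored in DAG code or configuration files.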
Conclusion
By following these steps, you can successfully integrate Salesforce workloads into Cortex Framework, ensuring a seamless data flow from Salesforce into BigQuery for reporting and analytics.