Cortex Framework: Integration with Salesforce (SFDC)

This insight outlines how to integrate Salesforce (SFDC) operational workloads into the Cortex Framework Data Foundation. Salesforce data is loaded into BigQuery through Dataflow pipelines, and Cloud Composer schedules and monitors those pipelines so that you can gain insights from your Salesforce data.

Prerequisite: Before configuring any workload integration, ensure that the Cortex Framework Data Foundation is deployed.

Configuration File

The config.json file in the Cortex Framework Data Foundation repository manages settings for transferring data from various sources, including Salesforce. Below is an example of how Salesforce workloads are configured:

"SFDC": {
    "deployCDC": true,
    "createMappingViews": true,
    "createPlaceholders": true,
    "datasets": {
        "cdc": "",
        "raw": "",
        "reporting": "REPORTING_SFDC"
    }
}
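
Before deploying, it can help to check that this section is filled in. The following is a minimal sketch, not part of Cortex Framework: the helper name `check_sfdc_config` is hypothetical, and it assumes that the three flags must be JSON booleans and that dataset names left as empty strings need to be set before deployment.

```python
import json


# Hypothetical helper, not part of Cortex Framework: lists problems found in
# the "SFDC" section of an already-parsed config.json.
def check_sfdc_config(config: dict) -> list[str]:
    sfdc = config.get("SFDC")
    if sfdc is None:
        return ["missing SFDC section"]
    problems = []
    # Assumption: the three deployment flags must be JSON booleans.
    for flag in ("deployCDC", "createMappingViews", "createPlaceholders"):
        if not isinstance(sfdc.get(flag), bool):
            problems.append(f"SFDC.{flag} should be true or false")
    # Assumption: dataset names left as "" still need to be filled in.
    datasets = sfdc.get("datasets", {})
    for name in ("raw", "cdc", "reporting"):
        if not datasets.get(name):
            problems.append(f"SFDC.datasets.{name} is empty")
    return problems


# The example section from above, wrapped in braces so it parses as JSON.
example = json.loads("""
{"SFDC": {"deployCDC": true, "createMappingViews": true,
          "createPlaceholders": true,
          "datasets": {"cdc": "", "raw": "", "reporting": "REPORTING_SFDC"}}}
""")
for problem in check_sfdc_config(example):
    print(problem)
```

Run against the example above, the check reports the raw and cdc datasets as empty, which is a reminder to set them to your own dataset names.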

Explanation of Parameters:

| Parameter | Meaning | Default Value | Description |
| --- | --- | --- | --- |
| SFDC.deployCDC | Deploy CDC | true | Generates Change Data Capture (CDC) processing scripts to run as DAGs in Cloud Composer. |
| SFDC.createMappingViews | Create mapping views | true | Creates views in the CDC processed dataset to show the “latest version of the truth” from the raw dataset. |
| SFDC.createPlaceholders | Create placeholders | true | Creates empty placeholder tables if they aren’t generated during ingestion, ensuring smooth downstream reporting deployment. |
| SFDC.datasets.raw | Raw landing dataset | (user-defined) | The dataset where replication tools land data from Salesforce. |
| SFDC.datasets.cdc | CDC processed dataset | (user-defined) | Source for reporting views and target for records processed by DAGs. |
| SFDC.datasets.reporting | Reporting dataset for SFDC | “REPORTING_SFDC” | Name of the dataset accessible for end-user reporting, where views and user-facing tables are deployed. |

Salesforce Data Requirements

Table Structure:

  • Naming: Tables use snake_case (e.g., some_objects) and are plural.
  • Data Types: Columns maintain the same types as in Salesforce.
  • Empty Tables: Missing required tables are automatically created as empty during deployment to ensure smooth CDC execution.
  • CDC Requirements: The Id and SystemModstamp fields are crucial for CDC tracking, ensuring changes are captured properly.

Loading SFDC Data into BigQuery

The Cortex Framework offers several methods for loading Salesforce data into BigQuery:

  1. API Calls: Directly access Salesforce data via API calls and load it into a BigQuery “raw” dataset.
  2. Structure Mapping Views: Translate existing data in BigQuery into a format compatible with Cortex Framework’s reporting features.
  3. CDC Processing Scripts: For constantly changing data, CDC scripts can track changes and update BigQuery accordingly.

CDC Processing

The CDC scripts rely on two key fields:

  • Id: Unique identifier for each record.
  • SystemModstamp: Timestamp indicating when a record was last modified.

You can adjust the CDC processing to handle different field names or add custom fields to suit your data schema.
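
To illustrate how these two fields work together, here is a minimal sketch in plain Python. It is not the Cortex implementation (the actual CDC scripts run as DAGs in Cloud Composer); the function name `latest_records` and the sample rows are hypothetical. It keeps one row per Id, choosing the row with the most recent SystemModstamp, and lets you pass different field names if your schema uses them.

```python
from datetime import datetime


# Illustrative only: mirrors the CDC idea of keeping the latest version of
# each record. The function name and sample data are hypothetical.
def latest_records(raw_rows, key_field="Id", modified_field="SystemModstamp"):
    """Return one row per key, keeping the most recently modified version."""
    latest = {}
    for row in raw_rows:
        key = row[key_field]
        current = latest.get(key)
        # Keep the row if the key is new or this version is more recent.
        if current is None or row[modified_field] > current[modified_field]:
            latest[key] = row
    return list(latest.values())


raw = [
    {"Id": "001A", "Name": "Acme", "SystemModstamp": datetime(2024, 1, 1)},
    {"Id": "001A", "Name": "Acme Corp", "SystemModstamp": datetime(2024, 3, 5)},
    {"Id": "001B", "Name": "Globex", "SystemModstamp": datetime(2024, 2, 2)},
]
for row in latest_records(raw):
    print(row["Id"], row["Name"])
```

Here the two versions of record 001A collapse into the later one, while 001B passes through unchanged.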

Configuration of API Integration and CDC

To configure Salesforce data integration into BigQuery, Cortex provides the following methods:

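  • Cortex API Scripts: Use the scripts provided by Cortex to replicate data from Salesforce, or use a replication tool of your choice.
  • Append Records Only: If your replication tool only appends records instead of updating them in place, create CDC scripts to manage the changes.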

Example Configuration (settings.yaml):

salesforce_to_raw_tables:
  - base_table: accounts
    raw_table: Accounts
    api_name: Account
    load_frequency: "@daily"

Data Mapping and Polymorphic Fields

Cortex Framework supports mapping data fields to the expected format. For example, a field named unicornId in your source system would be mapped to AccountId in Cortex with the string data type.

Polymorphic Fields: Fields whose names vary but have the same structure can be mapped in Cortex using [Field Name]_Type, such as Who_Type for the Who.Type field in the Task object.
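
The two mapping rules above can be sketched as follows. This is a hedged illustration, not Cortex code: the `FIELD_MAPPING` dictionary and the `to_cortex_fields` function are hypothetical, and the sketch assumes a polymorphic field arrives as a nested dictionary with a `Type` key.

```python
# Hypothetical mapping from source field names to the names Cortex expects.
FIELD_MAPPING = {"unicornId": "AccountId"}


def to_cortex_fields(record, mapping=FIELD_MAPPING):
    """Rename mapped fields and flatten polymorphic Type values."""
    result = {}
    for name, value in record.items():
        if isinstance(value, dict) and "Type" in value:
            # Polymorphic field: Who.Type is stored as Who_Type.
            result[f"{name}_Type"] = value["Type"]
        else:
            # Rename the field if a mapping exists, otherwise keep its name.
            result[mapping.get(name, name)] = value
    return result


# Sample Task record (made-up values) with one renamed and one polymorphic field.
task = {"unicornId": "001A", "Subject": "Call", "Who": {"Type": "Contact"}}
print(to_cortex_fields(task))
```

In this sample, unicornId is renamed to AccountId, Subject is left as-is, and the nested Who.Type value is stored under Who_Type.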

Modifying DAG Templates

You can customize the DAG templates as needed for CDC or raw data processing. If you do not need CDC or raw data processing from API calls, set SFDC.deployCDC to false in config.json.

Setting Up the Extraction Module

Follow these steps to set up the Salesforce to BigQuery extraction module:

  1. Set up Credentials: Ensure your Salesforce user has the correct permissions and security token.
  2. Create a Connected App: Enable API integration for Salesforce with OAuth settings.
  3. Secret Manager: Use Google Secret Manager to securely store Salesforce credentials.

Cloud Composer Setup

To run Python scripts for replication, install the necessary Python packages depending on your Airflow version.

For Airflow 2.x:

gcloud composer environments update my-composer-instance \
  --location us-central1 \
  --update-pypi-package "apache-airflow-providers-salesforce>=5.2.0"

Security and Permissions

Ensure that Cloud Composer has access to Google Secret Manager so that it can retrieve the stored secrets. Keeping passwords and API keys in Secret Manager protects this sensitive data.

Conclusion

By following these steps, you can successfully integrate Salesforce workloads into Cortex Framework, ensuring a seamless data flow from Salesforce into BigQuery for reporting and analytics.
