Apache Iceberg Archives - gettectonic.com
Data Integration with AWS Glue

The rapid rise of Software as a Service (SaaS) solutions has led to data silos across different platforms, making it challenging to consolidate insights. Effective data analytics depends on the ability to seamlessly integrate data from various systems by identifying, gathering, cleansing, and combining it into a unified format. AWS Glue, a serverless data integration service, simplifies this process with scalable, efficient, and cost-effective solutions for unifying data from multiple sources. By using AWS Glue, organizations can streamline data integration, minimize silos, and enhance agility in managing data pipelines, unlocking the full potential of their data for analytics, decision-making, and innovation.

This insight explores the new Salesforce connector for AWS Glue and demonstrates how to build a modern Extract, Transform, and Load (ETL) pipeline using AWS Glue ETL scripts.

Introducing the Salesforce Connector for AWS Glue

To meet diverse data integration needs, AWS Glue now supports SaaS connectivity for Salesforce. This enables users to quickly preview, transfer, and query customer relationship management (CRM) data, while dynamically fetching the schema. With the Salesforce connector, users can ingest and transform CRM data and load it into any AWS Glue-supported destination, such as Amazon S3, in preferred formats like Apache Iceberg, Apache Hudi, and Delta Lake. It also supports reverse ETL use cases, enabling data to be written back to Salesforce.

Solution Overview

For this use case, we retrieve the full load of a Salesforce account object into a data lake on Amazon S3 and capture incremental changes. The solution also enables updates to certain fields in the data lake and synchronizes them back to Salesforce. The process involves creating two ETL jobs using AWS Glue with the Salesforce connector. The first job ingests the Salesforce account object into an Apache Iceberg-format data lake on Amazon S3. The second job captures updates and pushes them back to Salesforce.

Creating the ETL Pipeline

Step 1: Ingest Salesforce Account Object

Using the AWS Glue console, create a new job to transfer the Salesforce account object into an Apache Iceberg-format transactional data lake in Amazon S3. The script checks if the account table exists, performs an upsert if it does, or creates a new table if not.

Step 2: Push Changes Back to Salesforce

Create a second ETL job to update Salesforce with changes made in the data lake. This job writes the updated account records from Amazon S3 back to Salesforce.

Example Query

    SELECT id, name, type, active__c, upsellopportunity__c, lastmodifieddate
    FROM "glue_etl_salesforce_db"."account";

Additional Considerations

You can schedule the ETL jobs using AWS Glue job triggers or integrate them with other AWS services like AWS Lambda and Amazon EventBridge for advanced workflows. Additionally, AWS Glue supports importing deleted Salesforce records by configuring the IMPORT_DELETED_RECORDS option.

Clean Up

After completing the process, clean up the resources used in AWS Glue, including jobs, connections, Secrets Manager secrets, IAM roles, and the S3 bucket to avoid incurring unnecessary charges.

Conclusion

The AWS Glue connector for Salesforce simplifies the analytics pipeline, accelerates insights, and supports data-driven decision-making. Its serverless architecture eliminates the need for infrastructure management, offering a cost-effective and agile approach to data integration, empowering organizations to efficiently meet their analytics needs.
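
The create-or-upsert decision described for the first ETL job (ingesting the Salesforce account object) can be sketched as a small helper that builds the Spark SQL such a job might run. This is only an illustration under stated assumptions, not the script from the original walkthrough: the helper name `build_account_sql`, the catalog name `glue_catalog`, and the staging view `account_staging` are hypothetical. The database and table names come from the example query, and the `MERGE INTO` and `CREATE TABLE ... AS SELECT` forms follow the Spark SQL syntax that Apache Iceberg supports.

```python
def build_account_sql(
    table_exists: bool,
    target: str = "glue_catalog.glue_etl_salesforce_db.account",  # catalog name is an assumption
    staging_view: str = "account_staging",  # hypothetical temp view holding the Salesforce rows
    key: str = "id",
) -> str:
    """Return the Spark SQL for the create-or-upsert decision of the ingest job."""
    if table_exists:
        # Table already exists: upsert the staged rows, matching on the record key.
        return (
            f"MERGE INTO {target} AS t "
            f"USING {staging_view} AS s "
            f"ON t.{key} = s.{key} "
            "WHEN MATCHED THEN UPDATE SET * "
            "WHEN NOT MATCHED THEN INSERT *"
        )
    # First run: create the Iceberg table from the staged full load.
    return f"CREATE TABLE {target} USING iceberg AS SELECT * FROM {staging_view}"


if __name__ == "__main__":
    print(build_account_sql(table_exists=False))
    print(build_account_sql(table_exists=True))
```

In a Glue job, the returned statement would typically be passed to `spark.sql(...)` after the Salesforce data has been registered as a temporary view. Keeping the statement builder separate from the Spark session makes the branching logic easy to check without an AWS environment.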

AI Data Foundation

In the era of AI, the Data Foundation is crucial for empowering AI-driven customer experiences. Data Cloud emerges as a unifying force, seamlessly integrating data to fuel transformative AI encounters and elevate customer-centricity. Beyond mere data management, Data Cloud represents a significant advancement, enabling profound insights by harmonizing diverse data sources with CRM data from the Salesforce platform. This convergence facilitates the unlocking of actionable insights critical for informed decision-making.

In a strategic collaboration, Salesforce and AWS extend their partnership to enhance AI capabilities. AWS AI services are integrated into Salesforce’s Einstein Trust Layer, empowering Data Cloud with seamless access to AWS data services and compute resources. Additionally, Data Cloud and other Salesforce offerings are now accessible through the AWS Marketplace, streamlining procurement processes.

This insight explores how Data Cloud unifies vast and varied business data with CRM data from the Salesforce Einstein Platform. It serves as a robust foundation for AI-powered customer experiences, providing businesses with unprecedented insights into their data universe. With Data Cloud, businesses can seamlessly combine CRM data with diverse sources, including transactional data, IoT device data, and social media interactions. This consolidation fosters a single source of truth, enhancing decision-making and the relevance of AI models.

Unlike traditional approaches that involve laborious data movement, Data Cloud operates on AWS infrastructure, enabling seamless data connectivity and preparation without the need for ETL processes. Leveraging Apache Iceberg and Salesforce’s contributions, Data Cloud ensures data consistency, flexibility, and interoperability, essential for AI-driven insights. Moreover, Data Graphs offer a novel approach to assemble and rapidly access data collections from disparate sources, facilitating grounded AI experiences. Through Model Builder and Einstein Copilot Studio, businesses can seamlessly access Data Cloud data in Amazon SageMaker for custom AI model creation without ETL overheads.

This partnership between Salesforce and AWS represents a paradigm shift in data management and AI integration. By combining Salesforce’s customer-centric approach with AWS’s scalable infrastructure, Data Cloud empowers businesses to harness AI as a practical tool for growth and innovation in the digital era.
