Apache Iceberg Archives - gettectonic.com

Once Upon a Time in Data Land

Once Upon a Time in Data Land: Building the Artificial Intelligence-Ready Warehouse

In the early days of data, businesses simply wanted to know what had already happened. Questions like “How many units shipped?” or “What were last month’s sales?” drove the first major digital settlements: the Digitally Filed Data Warehouse. Looking back, this seems like the aluminum carport you can have erected in your driveway.

The Meticulously Organized Library (The Digitally Filed Data Warehouse Era)

Imagine a grand, meticulously organized library. Data from sales, finance, and inventory wasn’t just dumped inside. It went through ETL (Extract, Transform, Load), where it was cleaned, standardized, and structured into predefined formats. Need quarterly sales figures? They were always in the same place, ready for reliable reporting.

But then the world outside got messy. Suddenly, businesses weren’t just dealing with neat rows and columns. They faced website clicks, customer emails, sensor data, social media streams, images, and videos. The rigid Digitally Filed Data Warehouse struggled to adapt. Trying to force unstructured data through ETL was like trying to shelve a waterfall: slow, expensive, and often impossible.

The Everything Shed (The Rise of the AI-Powered Warehouse)

Enter the AI-Powered Warehouse (what the industry usually calls a data lake), a vast, flexible storage space built for raw, unstructured data. Instead of forcing structure upfront, it embraced “store first, organize later” (schema-on-read). Data scientists could explore everything, from tweets to video transcripts, without constraints.

But freedom had a cost. Without governance, many AI-Powered Warehouses became “data swamps”: cluttered, unreliable, and slow. Finding clean, trustworthy data was a treasure hunt, and building reliable AI pipelines was a challenge.

Organizing the Shed (The AI-Ready Warehouse Paradigm)

The solution? Structure without sacrifice. The AI-Ready Warehouse (usually called a data lakehouse) kept the flexibility of raw storage but added intelligence on top. Technologies like Delta Lake, Apache Iceberg, and Apache Hudi introduced:

✔ ACID transactions (failed or concurrent writes no longer corrupt tables)
✔ Data versioning (“time travel” to past states)
✔ Schema enforcement (order without rigidity)
✔ Performance optimizations (speed at scale)

A key innovation was the Medallion Architecture, which organizes data by quality into Bronze (raw), Silver (cleaned and validated), and Gold (business-ready) layers. This hybrid approach unified BI dashboards, analytics, and machine learning, all on the same foundation.

The AI Factory (The Modern AI-Functioning Warehouse)

Just as businesses adapted, AI evolved. Generative AI, autonomous agents, and real-time decision-making demanded more than batch-processed data. The AI-Ready Warehouse transformed into a fully integrated AI factory, built for:

🔹 Real-Time & Streaming Data
🔹 Seamless MLOps Integration
🔹 Vector Databases & Embeddings
🔹 Robust AI Governance

Why This Matters for AI Agents

Autonomous AI agents don’t just analyze data; they act on it. The AI-Functioning Warehouse gives them:

✔ Context: Real-time data + historical insights
✔ Consistency: Features match training data
✔ Memory: Logged actions for continuous learning

The Future: An AI-Native Data Ecosystem

The journey from Digitally Filed Data Warehouse to AI-Powered Warehouse to AI-Functioning Warehouse reflects a shift from static reporting to dynamic intelligence. For businesses embracing AI, the question is no longer “Do we need a data strategy?” but “Is our data foundation AI-ready?” The answer will separate the leaders from the laggards in the age of AI.

The future belongs to those who build not just for data, but for AI.
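To make the Medallion Architecture from the post above more concrete, here is a minimal sketch in plain Python. It assumes the common Bronze (raw), Silver (cleaned), Gold (aggregated) layering. The sample rows and the `to_silver` and `to_gold` helpers are hypothetical illustrations, not part of Delta Lake, Apache Iceberg, or Apache Hudi.

```python
# Hypothetical raw events as they might land in the Bronze layer (all strings).
BRONZE = [
    {"order_id": "1", "region": "east", "amount": "20.50"},
    {"order_id": "2", "region": "EAST ", "amount": "4.50"},
    {"order_id": "3", "region": "west", "amount": "not-a-number"},  # malformed
    {"order_id": "2", "region": "EAST ", "amount": "4.50"},         # duplicate
]


def to_silver(bronze):
    """Enforce a schema: cast types, normalize values, drop bad and duplicate rows."""
    seen, silver = set(), []
    for row in bronze:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that do not fit the expected schema
        if row["order_id"] in seen:
            continue  # keep only the first occurrence of each order
        seen.add(row["order_id"])
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": amount,
        })
    return silver


def to_gold(silver):
    """Aggregate cleaned rows into a business-ready summary per region."""
    totals = {}
    for row in silver:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(gold)  # {'east': 25.0}
```

In a real lakehouse each layer would typically be a table in one of the formats named above rather than a Python list, but the flow from raw to validated to aggregated data is the same idea.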

May 18, 2025 in Artificial Intelligence, Data, Generative AI

08 Oct

Data Cloud Model

While many Salesforce products like Sales Cloud, Service Cloud, Education Cloud, Health Cloud, and various industry clouds are built on a common ‘core’ platform that shares the same datastore (an Oracle relational database), Salesforce Data Cloud operates on a distinct architecture and technology stack. This insight explores the data models and related concepts that underpin the platform.

Data Cloud Architecture and Storage

Starting with the storage layer, Salesforce Data Cloud integrates multiple services. This combination enables Data Cloud to provide a petabyte-scale data store, overcoming the scalability and performance constraints of traditional relational databases.

The physical architecture of Data Cloud is represented as a set of data objects, which form the foundation for how data is ingested, harmonized, and activated within the platform.

Key Data Cloud Objects

Data Source

A Data Source is the initial data layer in Data Cloud, representing an external platform or system from which your data originates.

Data Stream

A Data Stream is an entity extracted from a Data Source, such as ‘Orders’ from Commerce Cloud, ‘Contacts’ from Sales Cloud, or ‘Subscribers’ from Marketing Cloud. Once a Data Source is connected to Data Cloud, Data Streams provide access to these entities and require assignment to a category: Profile, Engagement, or Other. A single Data Source can contain multiple Data Streams.

Data Source Object (DSO)

Data Streams are ingested into Data Source Objects (DSOs), which serve as temporary staging stores containing data in its raw, native format (e.g., CSV files). Minor transformations can be applied to fields during data ingestion.

Data Lake Object (DLO)

The next step in the data flow is the Data Lake Object (DLO). DLOs are typed, schema-based, materialized views that reside in the data lake (Amazon S3) and are generally stored as Apache Parquet files. This format, along with Apache Iceberg’s abstraction layer, supports efficient data storage and retrieval. DLOs are the first objects available for inspection and preparation of data, including field mapping and additional transformations.

Data Model Object (DMO)

Unlike DSOs and DLOs, which use a physical data store, Data Model Objects (DMOs) provide a virtual, non-materialized view into the data lake. Queries against a DMO return results based on the current data snapshot in DLOs, without storing them. DMOs can aggregate attributes from different Data Streams, Calculated Insights, and other sources.

DMOs inherit their category from the first DLO mapped to them, and only DLOs with the same category can be mapped to the DMO. Like Salesforce objects, DMOs offer a canonical data model with pre-defined attributes (standard objects), and you can also create custom DMOs (custom objects). DMOs can have one-to-one or many-to-one relationships with other DMOs. At the time of writing, there are 89 standard DMOs in Data Cloud, and this number is growing to support more use cases. DMOs are organized into various subject areas.

Data Spaces

Data Spaces offer logical partitions, allowing data to be segregated between different brands, regions, or departments without the need for multiple Data Cloud instances. They also align with a Software Development Lifecycle (SDLC), enabling you to stage and test Data Objects in separate environments without affecting production data. Data Sources, Data Streams, and DLOs can be shared across Data Spaces, while DMOs and other platform features are isolated based on user permissions.

Final Thoughts

With the drop in storage costs and advancements in technology, companies now have vast datasets at their disposal. Every customer interaction, whether it’s a purchase, an email open, or a web page view, can be captured and stored. When properly organized, this data allows businesses to understand customers better, predict their needs, personalize interactions, and much more.

However, one of Salesforce’s challenges is that its original platform, built over 20 years ago, was designed for a different era and struggles with handling voluminous, “big” data. Salesforce Data Cloud addresses these limitations with a modern architecture that overcomes the constraints of relational databases. Data Cloud is poised to become the backbone of the Salesforce platform, supporting the data needs of customers and products for the next two decades.

If you’re interested in learning how to implement and integrate Salesforce Data Cloud in your organization while following best practices, reach out today.
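The relationship between DLOs and DMOs described in this post can be sketched as a small Python model. This is only an illustration of two stated rules: a DMO is a virtual view that stores no rows, and it inherits its category from the first DLO mapped to it. The class names, fields, and methods are hypothetical and are not the Salesforce Data Cloud API.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataLakeObject:
    """Stand-in for a DLO: typed, materialized rows with a category."""
    name: str
    category: str  # "Profile", "Engagement", or "Other"
    rows: list = field(default_factory=list)


@dataclass
class DataModelObject:
    """Stand-in for a DMO: a virtual view over mapped DLOs that stores no rows."""
    name: str
    category: Optional[str] = None
    sources: list = field(default_factory=list)

    def map_dlo(self, dlo: DataLakeObject) -> None:
        if self.category is None:
            self.category = dlo.category  # inherited from the first mapped DLO
        elif dlo.category != self.category:
            raise ValueError(
                f"{dlo.name} is {dlo.category}, but {self.name} is {self.category}"
            )
        self.sources.append(dlo)

    def query(self) -> list:
        # Evaluated at read time, so results reflect the current DLO contents.
        return [row for dlo in self.sources for row in dlo.rows]


contacts = DataLakeObject("Contacts", "Profile", [{"id": 1}])
subscribers = DataLakeObject("Subscribers", "Profile", [{"id": 2}])
orders = DataLakeObject("Orders", "Engagement", [{"id": 9}])

individual = DataModelObject("Individual")
individual.map_dlo(contacts)
individual.map_dlo(subscribers)
contacts.rows.append({"id": 3})  # new DLO data is visible without copying

print(individual.category, len(individual.query()))

try:
    individual.map_dlo(orders)  # different category, so the mapping is rejected
except ValueError as err:
    print("rejected:", err)
```

Because `query` reads the mapped DLOs on every call, the sketch mirrors the non-materialized behavior the post attributes to DMOs.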

October 8, 2024 in Data, Google, Salesforce, Salesforce Data Cloud, Snowflake, Technology, Unified Knowledge

04 Sep

Data Integration with AWS Glue

The rapid rise of Software as a Service (SaaS) solutions has led to data silos across different platforms, making it challenging to consolidate insights. Effective data analytics depends on the ability to integrate data from various systems by identifying, gathering, cleansing, and combining it into a unified format.

AWS Glue, a serverless data integration service, simplifies this process with scalable, efficient, and cost-effective solutions for unifying data from multiple sources. By using AWS Glue, organizations can streamline data integration, minimize silos, and manage data pipelines with more agility, making their data more useful for analytics, decision-making, and innovation.

This insight explores the new Salesforce connector for AWS Glue and demonstrates how to build a modern Extract, Transform, and Load (ETL) pipeline using AWS Glue ETL scripts.

Introducing the Salesforce Connector for AWS Glue

To meet diverse data integration needs, AWS Glue now supports SaaS connectivity for Salesforce. This enables users to quickly preview, transfer, and query customer relationship management (CRM) data, while dynamically fetching the schema. With the Salesforce connector, users can ingest and transform CRM data and load it into any AWS Glue-supported destination, such as Amazon S3, in preferred formats like Apache Iceberg, Apache Hudi, and Delta Lake. It also supports reverse ETL use cases, enabling data to be written back to Salesforce.

Solution Overview

For this use case, we retrieve the full load of a Salesforce account object into a data lake on Amazon S3 and capture incremental changes. The solution also enables updates to certain fields in the data lake and synchronizes them back to Salesforce.

The process involves creating two ETL jobs using AWS Glue with the Salesforce connector. The first job ingests the Salesforce account object into an Apache Iceberg-format data lake on Amazon S3. The second job captures updates and pushes them back to Salesforce.

Creating the ETL Pipeline

Step 1: Ingest Salesforce Account Object

Using the AWS Glue console, create a new job to transfer the Salesforce account object into an Apache Iceberg-format transactional data lake in Amazon S3. The script checks whether the account table exists, performs an upsert if it does, and creates a new table if it does not.

Step 2: Push Changes Back to Salesforce

Create a second ETL job to update Salesforce with changes made in the data lake. This job writes the updated account records from Amazon S3 back to Salesforce.

Example Query

    SELECT id, name, type, active__c, upsellopportunity__c, lastmodifieddate
    FROM "glue_etl_salesforce_db"."account";

Additional Considerations

You can schedule the ETL jobs using AWS Glue job triggers or integrate them with other AWS services like AWS Lambda and Amazon EventBridge for advanced workflows. Additionally, AWS Glue supports importing deleted Salesforce records by configuring the IMPORT_DELETED_RECORDS option.

Clean Up

After completing the process, clean up the resources used in AWS Glue, including jobs, connections, Secrets Manager secrets, IAM roles, and the S3 bucket, to avoid incurring unnecessary charges.

Conclusion

The AWS Glue connector for Salesforce simplifies the analytics pipeline, accelerates insights, and supports data-driven decision-making. Its serverless architecture eliminates the need for infrastructure management, offering a cost-effective and agile approach to data integration that helps organizations meet their analytics needs efficiently.
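The create-or-upsert decision described in Step 1 of this post can be sketched as a small helper that picks the Spark SQL statement to run against an Iceberg table. This is a hedged illustration, not the script from the original walkthrough. The catalog name `glue_catalog`, the temporary view `account_updates`, and the function `build_write_statement` are assumptions. The database and table names are taken from the example query.

```python
def build_write_statement(
    table_exists: bool,
    target: str = "glue_catalog.glue_etl_salesforce_db.account",  # catalog name is assumed
    source_view: str = "account_updates",                         # hypothetical temp view
    key: str = "id",
) -> str:
    """Return the Spark SQL to run: create the table on first load, else upsert."""
    if not table_exists:
        # First run: materialize the full load as a new Iceberg table.
        return f"CREATE TABLE {target} USING iceberg AS SELECT * FROM {source_view}"
    # Later runs: update matching accounts and insert new ones.
    return (
        f"MERGE INTO {target} t USING {source_view} s ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


print(build_write_statement(table_exists=False))
print(build_write_statement(table_exists=True))
```

In a Glue job the returned string would typically be passed to the Spark session's `sql` method after the Salesforce records are registered as a temporary view. How the existence check itself is performed is left open here.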

September 4, 2024 in Salesforce

14 May

Zero Copy from Salesforce

When was the last time you relocated? Remember the hassle of packing up all your stuff, loading it onto a truck, and unpacking everything at your new digs, hoping it all survived the journey? Now, imagine if your furniture and possessions could simply teleport to your new location in perfect condition. While this may not be possible (yet) in the physical world, the concept of zero copy integration offers a similar experience for handling customer data.

Zero copy integration, also known as zero ETL (extract-transform-load), allows data to be shared among multiple data stores without the need for actual movement or duplication. This is particularly advantageous for companies utilizing cloud data warehouses like Snowflake or Google BigQuery. Some companies hesitate to adopt a customer data platform (CDP) due to concerns about duplicating data. However, with zero copy integration, users can get the benefits of a CDP, such as data harmonization, identity management, built-in analytics, and activation, without the drawbacks of physical data movement.

Some functions related to data have advanced with astonishing speed: generation, dissemination, storage, even security (and yes, that one still has many challenges). But there are others still mired in the stone age of the digital era (call it the 80s). For example, think copies. How many business professionals still make multiple copies of sensitive data, just as it’s been done for the last four decades? What does this endless duplication do for regulatory risk, security threats, and privacy liabilities?

This is a symptom of a larger problem: far from the data-centric universe we all envisioned, we’re beholden instead to the apps used to create, store, and distribute data. There’s an app for everything, but also a database for every app, leading to dozens of data silos undergoing hundreds of integrations, increasing fragmentation, complexity, and costs.

The basic idea, according to Dan DeMers, Cinchy’s CEO, is that the Zero Copy Integration framework aims to remove application data silos by using access-based data collaboration rather than standard API-based data integration, which involves copying data and branding it with complex app-specific coding. This would be done by access controls set in the data layer.

What is Zero Copy Integration?

Zero copy integration enables access to data stored in multiple databases simultaneously without the need to move, copy, or reformat the data. This approach not only makes access to data faster and easier but also reduces costs and minimizes the risk of errors associated with data movement or transformation.

How It Works: From CDP to Data Warehouse

In this scenario, data is accessed from the CDP and shared with the data warehouse, a process known as data sharing.

How It Works: From Data Warehouse to CDP

Conversely, in this scenario, data is accessed from the data warehouse and integrated into the CDP, a process referred to as data federation.

Real-World Application: How Buyers Edge Utilizes Zero Copy Technology

Buyers Edge, a leading procurement optimization company in the food service industry, uses zero copy technology to access purchase data stored in a data warehouse while building a unified customer profile in a CDP. This allows them to provide better customer insights to their sales and marketing teams.

By eliminating the need for data movement, duplication, or reformatting, zero copy technology streamlines data access, harmonizes data for insights and analytics, and provides a real-time holistic view of customers.

As data volumes continue to grow, technologies like zero copy integration will play a crucial role in simplifying data management and giving teams access to valuable insights. Whether it’s optimizing business processes or enhancing customer experiences, zero copy integration helps organizations work with their data with agility and efficiency. Be a zero copy hero and unlock the full potential of your data management strategy.

“In today’s digital landscape, companies struggle with islands of data spread across various systems. With this global ecosystem of partners, companies can access all of their data, no matter where it resides, and unlock the power of all of that data within Salesforce — creating more personalized customer interactions and establishing a foundation for trusted AI, in less time and at lower cost.”
Brian Millham, President and Chief Operating Officer at Salesforce

The Zero Copy Partner Network features initial partners Amazon Web Services (AWS), Databricks, Google Cloud, and Snowflake, and adds Microsoft, all committed to zero copy integrations with Salesforce that give customers a secure and cost-effective way to connect and take action on all of their data.

New zero copy support for data warehouses and data lakehouses built on open table formats, like Apache Iceberg, with Salesforce Data Cloud removes the need to copy or move data, so customers can unlock their data, powering Customer 360 experiences with AI, automation, and analytics.

The network includes Salesforce ISV partners, building new Zero Copy Data Kits to bring valuable new datasets to Salesforce Data Cloud, and Salesforce SI partners including Accenture, Cognizant, Deloitte Digital, and PwC, helping customers with zero copy Salesforce Data Cloud implementations.
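The difference between copying data and accessing it in place, as discussed in this post, can be illustrated with a toy Python model. It only shows the staleness trade-off. It does not represent how Salesforce Data Cloud, Snowflake, or any other product implements data sharing or federation, and every class name here is hypothetical.

```python
import copy


class Warehouse:
    """Hypothetical system of record holding purchase rows."""

    def __init__(self):
        self.purchases = [{"customer": "acme", "total": 100}]


class CopiedDataset:
    """Traditional integration: take a physical copy at load time."""

    def __init__(self, warehouse):
        self.rows = copy.deepcopy(warehouse.purchases)

    def total_for(self, customer):
        return sum(r["total"] for r in self.rows if r["customer"] == customer)


class FederatedDataset:
    """Zero copy: hold only a reference and read the source at query time."""

    def __init__(self, warehouse):
        self._warehouse = warehouse

    def total_for(self, customer):
        return sum(
            r["total"]
            for r in self._warehouse.purchases
            if r["customer"] == customer
        )


warehouse = Warehouse()
copied = CopiedDataset(warehouse)
federated = FederatedDataset(warehouse)

# A new purchase arrives in the warehouse after both datasets were set up.
warehouse.purchases.append({"customer": "acme", "total": 50})

print(copied.total_for("acme"))     # 100: the copy is stale until re-loaded
print(federated.total_for("acme"))  # 150: the federated read sees current data
```

The copied dataset needs another load to catch up, which is the duplication and synchronization cost the post argues zero copy integration avoids.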

May 14, 2024 in Salesforce, Salesforce Data Cloud, Snowflake

07 May

AI Data Foundation

In the era of AI, the Data Foundation is crucial for AI-driven customer experiences. Data Cloud acts as a unifying layer, integrating data to support AI and customer-centricity. Beyond data management, Data Cloud harmonizes diverse data sources with CRM data from the Salesforce platform, which surfaces the actionable insights needed for informed decision-making.

In a strategic collaboration, Salesforce and AWS are extending their partnership to enhance AI capabilities. AWS AI services are integrated into Salesforce’s Einstein Trust Layer, giving Data Cloud access to AWS data services and compute resources. Additionally, Data Cloud and other Salesforce offerings are now accessible through the AWS Marketplace, streamlining procurement.

This insight explores how Data Cloud unifies vast and varied business data with CRM data from the Salesforce Einstein Platform. It serves as a foundation for AI-powered customer experiences, giving businesses broader insight into their data.

With Data Cloud, businesses can combine CRM data with diverse sources, including transactional data, IoT device data, and social media interactions. This consolidation fosters a single source of truth, improving decision-making and the relevance of AI models.

Unlike traditional approaches that involve laborious data movement, Data Cloud operates on AWS infrastructure, enabling data connectivity and preparation without the need for ETL processes. Building on Apache Iceberg and Salesforce’s contributions to it, Data Cloud provides the data consistency, flexibility, and interoperability that AI-driven insights depend on.

Moreover, Data Graphs offer a way to assemble and rapidly access data collections from disparate sources, supporting grounded AI experiences. Through Model Builder and Einstein Copilot Studio, businesses can access Data Cloud data in Amazon SageMaker for custom AI model creation without ETL overhead.

This partnership between Salesforce and AWS represents a shift in data management and AI integration. By combining Salesforce’s customer-centric approach with AWS’s scalable infrastructure, Data Cloud lets businesses use AI as a practical tool for growth and innovation.

May 7, 2024 in Data, Einstein 1 Platform, Salesforce, Salesforce Einstein
