Description
In this session at DevOps Dreamin’, Ryan Cox, Technical Architect at Salesforce, explores how Data Cloud, Salesforce’s data platform that unifies enterprise data, integrates into enterprise architecture and the best practices for implementing DevOps around it. Ryan covers:
- What Data Cloud is and the benefits it brings
- Architecture/Data Cloud Components
- How to configure Data Cloud
- Data Cloud DevOps Pipelines
- Resources to get started with Data Cloud
Learn more:
- Salesforce Data Cloud
- Salesforce Data Cloud: key features and DevOps considerations
- Understanding Salesforce Data 360 (formerly Data Cloud) architecture, capabilities, and benefits
- Salesforce Agentforce: A complete guide
- Gearset’s Salesforce Agentforce deployment solution
- How to deploy Einstein bots in Salesforce
- How to deploy Agentforce: A simple guide for effective implementation
- Salesforce DevOps at enterprise scale eBook
Transcript
My name is Ryan Cox. I'm a Technical Architect at Salesforce, and I've been here for six years. For the last three or four years of that, I've been covering high-tech accounts: very large enterprises like Amazon, Google, Cisco, and DoorDash.
But for the last year and a half or so, since Data Cloud has really been GA, I've been focusing pretty much only on Data Cloud, living and breathing it, and it's still pretty nascent. There's not a whole lot of people actually in production with it yet, though the number is growing, and this year there's a very big push, as you may have seen, where Data Cloud is kind of the underpinning of all the Einstein generative AI stuff that Salesforce has been promoting. So what I'm going to do is go through a high-level overview of Data Cloud very quickly, but primarily to show how Data Cloud fits into an enterprise architecture and what the primary components inside of Data Cloud are that you need to know about to incorporate into your DevOps process.
I haven't really seen much, and there haven't been a whole lot of materials yet, around how Data Cloud fits into a DevOps process like your normal Salesforce deployment process. So that's what I'm starting to address here, to show you what that would look like.
And I'll talk about what's coming on the roadmap.
So, forward-looking statement: I do have a few things in here that are roadmap items, which I'll point out along the way, so just be aware.
So, if you've seen any of the newer announcements, starting back earlier this year at TrailblazerDX (I think that's when this was actually announced), Salesforce did a renaming of its platform, which will probably change again two weeks from now.
The overall platform as a whole is now called Einstein 1, but really that's speaking to the fact that Salesforce is aggressively trying to make Data Cloud the primary large-volume, big-data data lake underpinning all of our Salesforce products.
So it's quickly becoming the main data tier servicing your Salesforce org, or maybe multiple Salesforce orgs, bringing in large amounts of data from across your enterprise so you can use it back inside of your CRM. You can also clean up and curate your data so it can drive the predictive and generative AI services being rolled out with Einstein, like Einstein Copilot and Prompt Builder, and it's also going to be the underpinning data tier for Marketing Cloud, Tableau, and CRM Analytics.
So Data Cloud is really meant to unlock your trapped data. It's a way to bring in large volumes of data from across your enterprise, from your Salesforce org or multiple Salesforce orgs, from any data store, really, and surface it back to drive work for your sellers inside of CRM, for your marketers in Marketing Cloud, and for your analytics across Tableau and CRM Analytics. So you bring the data together, clean it up, and organize it, curating the data in the way you need to serve your users in all the places they're living.
You may have seen something like this; some version of this diagram has been floating around for a long time now. The way Data Cloud works, going from left to right, is around bringing data in in batch or streaming mode. By the end of this year, we'll have a hundred-plus connectors for bringing in data from all the Salesforce products (core Salesforce, Marketing Cloud, Commerce Cloud) as well as data sources across AWS, GCP, and Azure.
We have this concept called Bring Your Own Lake, where you can actually just point to data in Snowflake, Databricks, BigQuery, things like that; Azure Synapse and Fabric are on the roadmap. So you're connecting to all these data sources, bringing them together, and harmonizing them. What harmonize means here is that you're organizing the data into a common data model.
So no matter where the data is coming from, you can organize it into a common, normalized data model inside of Data Cloud, and then you can unify your data. What that means is you're doing identity matching for your accounts, contacts, and leads to arrive at a unique set of customers. That kind of speaks to deduping; not exactly deduping, but you're matching these profiles together, with relationships to all the other behavioral and engagement data you might have ingested from all these other data sources, so you can then act on it in interesting ways. You can build insights on top of it, and you can build more accurate marketing segmentation and activate those segments out to your marketing tools.
You have one place for analytics over your Data Cloud data, and you also have more accurate grounding of the generative AI services we're surfacing back inside of Salesforce, and not just in Salesforce but in Marketing Cloud, across all the touchpoints that Salesforce has. Even Tableau Pulse, for example, can utilize grounding data coming from Data Cloud.
So I put together this architecture to talk through the main components involved with Data Cloud. This is actually how you go about configuring Data Cloud and how it fits into your enterprise architecture, and since this presentation is for DevOps, it's also so you understand the basics of the components that are going to be metadata inside of Data Cloud, which you would then need to move from a lower environment into production, similar to how you do your deployment process for Salesforce metadata today.
Right, so the way you get Data Cloud is that you have a production Salesforce org and you enable Data Cloud. Data Cloud is then a separate instance on a different tech stack (a Data Cloud tenant, we call it), and the Salesforce org from which you enable Data Cloud is now called your Data Cloud home org. That home org has the Data Cloud app you can see here. That's the UI for configuring Data Cloud, which gets enabled along with the Data Cloud instance, and this is the org your admins log in to for all the configuration of Data Cloud.
So all the other things I'm going to show in this build-out of the architecture are driven through this Data Cloud app interface inside of your home org. To get started inside of Data Cloud, you set up data streams to ingest data from any of your data sources. This could include data from any number of Salesforce orgs, where each data stream represents a different object inside of a Salesforce org.
We have out-of-the-box connectors for bringing data in in batch, on a schedule, from Salesforce orgs, from B2C Commerce, from Marketing Cloud, and from a growing number of data stores inside of AWS, GCP, and Azure, like S3, Redshift, and the equivalents for GCP and Azure.
Let me back up a second. Some main components to pay attention to for Data Cloud nomenclature: you're setting up data streams, and each data stream maps to a different connector, a different object, or a different data table inside of one of these source systems. The data lands in what's called a data lake object. These are objects inside of Data Cloud, mostly just for storing the raw records of the data being ingested; that's where the data lands. Then we have a mapping tool and a data transform tool where you do a mapping into the data model objects. A data model object is another type of data object, but that one is your common data model: it's like a materialized view across your data lake objects that represents your common, normalized data model.
So if you're bringing in product usage data or accounts or whatever from all these different systems, you're mapping everything into a common data model so everybody's looking at and thinking about your data in the same way across your whole enterprise.
As an example, a gentleman from Bank of Montreal was saying that they have a Financial Services Cloud org and another org that's not on Financial Services Cloud. You could bring the data from multiple orgs together and map it into the objects you use for Financial Services Cloud, without having to do anything in the org that doesn't have it. So you have one single common data model regardless of where the data is coming from.
So we have three different methods of ingesting data. The first one I mentioned was batch ingestion through these connectors. We also have near-real-time streaming ingestion, so you can bring data in through a variety of streaming APIs; we have APIs for receiving events from any website and from mobile apps, for things we don't have connectors for yet.
We have the Ingestion API, where you can write your own integration to bring data in in streaming and batch mode, and then we also have the Bring Your Own Lake technology. This is a way to bring data together into Data Cloud from existing data lakes, like Snowflake, Redshift, BigQuery, and Databricks. You're virtualizing the data from those source systems. We kind of refer to it as metadata unification: we're bringing metadata about those tables into Data Cloud, and they can still participate in mapping into your canonical data model, but the data is only retrieved when it's queried from within Data Cloud or used by anything else you're doing in Data Cloud. It's not actually copied into Data Cloud.
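For context on what one of those custom Ingestion API integrations might look like, here's a minimal sketch in Python. It only builds the request body for a streaming insert; the tenant host, connector name, and object name are made-up placeholders, and a real call would also need OAuth authentication, so treat the Ingestion API documentation as the source of truth.

```python
import json

# Sketch: build a streaming-insert request body for the Data Cloud
# Ingestion API. The connector and object names in the endpoint path are
# illustrative assumptions; the real contract (plus OAuth authentication)
# comes from the Ingestion API documentation and your data stream setup.

TENANT = "https://example-tenant.c360a.salesforce.com"  # hypothetical host
ENDPOINT = f"{TENANT}/api/v1/ingest/sources/my_connector/my_object"

def build_ingest_body(records: list) -> str:
    """Wrap the records in the JSON envelope the streaming endpoint expects."""
    return json.dumps({"data": records})

body = build_ingest_body([
    {"id": "u-1", "email": "a@example.com", "event": "page_view"},
    {"id": "u-2", "email": "b@example.com", "event": "signup"},
])
print(ENDPOINT)
print(body)
```

In practice you'd POST that body to the endpoint with your deployment-specific credentials; the point here is just that the integration surface is a plain JSON-over-HTTPS contract you can script against.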
And then from there, there are all these capabilities that Data Cloud has, like identity resolution for matching accounts and individuals from contacts and leads.
You can build insights: streaming and batch calculated insights.
Ultimately, for this audience, these are really just SQL statements. They can be joins across datasets regardless of where the data is coming from, kind of the equivalent of roll-ups inside of Salesforce.
So you can do calculations over large volumes of data, and it materializes a different view with the calculations in it. But in terms of things you're actually moving from a lower environment into a production environment in a Data Cloud configuration, these are really just SQL statements wrapped into what we call insights.
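Since a calculated insight is ultimately a SQL statement, one practical DevOps habit is keeping those statements in source control and sanity-checking them before promoting a configuration. A minimal sketch follows, with the caveat that the object and field names are invented for illustration (real insights reference your actual data model objects, which carry the `__dlm` suffix):

```python
# Hypothetical calculated-insight SQL you might keep in version control.
# Object/field names (UnifiedIndividual__dlm, EmailEngagement__dlm) are
# illustrative; real names come from your Data Cloud data model.
INSIGHT_SQL = """
SELECT
    u.Id__c AS customer_id,
    COUNT(e.Id__c) AS email_opens
FROM UnifiedIndividual__dlm u
JOIN EmailEngagement__dlm e
    ON e.IndividualId__c = u.Id__c
GROUP BY u.Id__c
"""

def sanity_check_insight(sql: str) -> list:
    """Cheap pre-deploy checks: catch obviously broken insight SQL
    before pushing the configuration to a higher environment."""
    problems = []
    upper = sql.upper()
    if "SELECT" not in upper:
        problems.append("missing SELECT")
    if "GROUP BY" not in upper and "COUNT(" in upper:
        problems.append("aggregate without GROUP BY")
    if sql.count("(") != sql.count(")"):
        problems.append("unbalanced parentheses")
    return problems

print(sanity_check_insight(INSIGHT_SQL))  # prints []
```

A check like this obviously doesn't validate the SQL against the deployed data model; it's just the kind of cheap gate a pipeline can run before the manual or packaged promotion steps described below.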
And then from those, and from anything in your data model, you can fire data actions. This is a way to do eventing off of data changing inside of Data Cloud: when things are created or updated inside of Data Cloud, you can fire data actions that send a platform event into any Salesforce org; it doesn't have to be that home org I mentioned.
You can send a payload to a webhook, or you can fire off a marketing journey inside of Marketing Cloud or send an email.
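To make the webhook option concrete, here's a small sketch of a receiver for a data action payload. The payload shape is an assumption for illustration; the actual event schema for data action targets is defined by Salesforce, so verify it against the documentation for your target type.

```python
import json

# Minimal sketch of a webhook handler consuming a Data Cloud data action.
# The payload shape below is an assumption for illustration; check the
# data action target's actual event schema in your own environment.

def handle_data_action(raw_body: str) -> str:
    """Parse a hypothetical data action payload and summarize the change."""
    event = json.loads(raw_body)
    object_name = event.get("objectName", "unknown")
    changed_ids = [rec["recordId"] for rec in event.get("records", [])]
    # In a real service you'd enqueue downstream work here.
    return f"{object_name}: {len(changed_ids)} record(s) changed"

sample = json.dumps({
    "objectName": "UnifiedIndividual__dlm",
    "records": [{"recordId": "001"}, {"recordId": "002"}],
})
print(handle_data_action(sample))  # UnifiedIndividual__dlm: 2 record(s) changed
```

The design point is that a webhook target turns Data Cloud changes into plain HTTP events, so any service in your architecture can react without a Salesforce-specific integration.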
So one of the other main things you're doing with Data Cloud in the Salesforce ecosystem is surfacing that unified view of your customer, along with any of the engagement or behavioral data you've collected from across your enterprise, back into your Salesforce org or orgs. We call that CRM enrichment. It allows you to visualize the data, say, as a related list on an account page.
You can use that data to ground prompts driving Einstein Copilot, so you have more accurate data to go on for your generative AI services.
You can fire flows and build custom Lightning web components that retrieve data from Data Cloud. Also, you now have one place to go for all that data across your enterprise, and that's Data Cloud, instead of having to do point-to-point integrations to all these different systems from your individual Salesforce orgs.
You can also build predictive ML models off your Data Cloud data in what we call Einstein 1 Studio.
You can bring your own model from other sources to execute predictive models inside of Data Cloud, and you can also build marketing segments and activate those segments out to different marketing tools and ad services.
And then you ultimately have one place to go, where everybody's looking at the same thing for analytics: CRM Analytics and Tableau all understand the data model inside of Data Cloud.
Now, let's stay here for a second. Everything I'm showing you inside of the Data Cloud box represents new metadata types that Salesforce has brought along with Data Cloud. When you're building or configuring Data Cloud, these are the components that will have to be reconfigured between a lower environment and a production environment through the course of a DevOps process.
That's the breadth of the core of Data Cloud, the main use cases. Here's a bit more detail on how Data Cloud fits into our generative AI services: you can also bring unstructured data into what's called a vector database inside of Data Cloud and set up a search index over it. Then, from Einstein Copilot and Prompt Builder inside of your core Salesforce org, you can do retrieval-augmented generation for those generative AI services, pulling data and answering queries across both the unstructured and structured data you have inside of Data Cloud.
So, all those things inside of the Data Cloud box: how do you configure them? You configure them primarily through the admin UI inside of the home org I mentioned. That's the main Salesforce org from which you're controlling everything in Data Cloud, including the configuration of all those capabilities, as well as security: which users have access to Data Cloud, and which users can configure different things. There are different roles people play; you might offer admin access for marketers to build segmentation inside of Data Cloud, while other admins just care about the data model, for example.
Beyond that Data Cloud admin UI, the Salesforce Metadata API is starting to offer pieces of Data Cloud metadata, as a way to retrieve and update the metadata through that API.
I'll show you more about that in a second. We also have a way to package different pieces of Data Cloud together, primarily the data model. We have a new concept called data kits: a data kit is a new kind of container that you can package through the normal packaging interface we have inside of Salesforce. You can group together the things I'm highlighting here: primarily your data model, your data streams, and the mappings between the data lake objects and the data model objects.
Data transforms are included there, as well as streaming calculated insights. These can all be put into a data kit, and the data kit is then put into a package, either an unmanaged or a managed package.
Data kits support both first- and second-generation packaging, and then you can take that package and install those data streams and data models into another Data Cloud instance.
Then there's the Metadata API. I didn't say this at the beginning, but DevOps for Data Cloud is very, very new, and you can tell that this picture shows it. The Metadata API only supports very few things in that overall picture, primarily the data model, which is what you're focusing on to get started. Everything else in this picture that's not in either the data kits or the Metadata API has to be done manually through the admin UI, including segmentation, data actions, the predictive objects, things like that.
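Where the Metadata API does cover a component, the usual retrieve/deploy flow applies. As a sketch, a pipeline could generate a package.xml manifest like the one below. The Data Cloud metadata type names here are assumptions for illustration; check the Metadata API developer guide for the types actually exposed in your API version.

```python
# Sketch: generate a package.xml manifest for retrieving Data Cloud
# data-model metadata with the Metadata API. The type names listed at the
# bottom are illustrative assumptions; consult the Metadata API reference
# for the Data Cloud types supported in your API version.
from xml.etree import ElementTree as ET

NS = "http://soap.sforce.com/2006/04/metadata"

def build_manifest(types: dict, api_version: str = "60.0") -> str:
    """Serialize a {type_name: [members]} mapping into package.xml."""
    ET.register_namespace("", NS)
    pkg = ET.Element(f"{{{NS}}}Package")
    for type_name, members in types.items():
        node = ET.SubElement(pkg, f"{{{NS}}}types")
        for member in members:
            ET.SubElement(node, f"{{{NS}}}members").text = member
        ET.SubElement(node, f"{{{NS}}}name").text = type_name
    ET.SubElement(pkg, f"{{{NS}}}version").text = api_version
    return ET.tostring(pkg, encoding="unicode")

manifest = build_manifest({
    "DataStreamDefinition": ["*"],  # assumed type name, verify in the docs
    "DataSourceObject": ["*"],      # assumed type name, verify in the docs
})
print(manifest)
```

A generated manifest like this is what you'd hand to your retrieve step, with everything not in the manifest falling back to the manual admin-UI checklist the talk describes.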
But there's another component to this. All the configuration of Data Cloud is just the beginning; then there's what you do with Data Cloud back inside of your Salesforce orgs. Those are all the normal components you'd be configuring and building inside of Salesforce: flows that retrieve data from Data Cloud, and record-triggered flows that are triggered off of Data Cloud objects.
Those can also be how you ground Einstein Copilot actions and prompt templates, things like that. These are all just metadata components that are normally part of the standard Salesforce process of deploying metadata from one org to another.
There are a few new metadata types that come along with the Einstein pieces, like prompt templates and Copilot actions. These are new metadata types, but they're all supported like normal Salesforce metadata that you could deploy with Gearset or any other deployment tool.
So how do you get started with a lower environment in Data Cloud? What I was showing you before was actually a production org that you enable a Data Cloud instance from.
That's basically been the story up until now, or at least up until summer of this year. We're still in a time period where we don't really have the concept of a lower environment in Data Cloud at all. The way people have been running POCs and pilot implementations, to do something that's not in production, is by talking to their account executive, buying a new Salesforce org, setting that up and configuring it as the Data Cloud home org, and doing the configuration there. Then, when you're ready to move to production, you do all the things I just talked about to move or reconfigure components inside of the new production environment. In summer, which is soon (the end of June or July timeframe), we'll finally have the concept of a Data Cloud sandbox. What that means is that when you spin up a sandbox from your production org, your Data Cloud instance will come along with it, and you'll have a true Data Cloud sandbox tied to that core sandbox org.
So I put together a small example. There are many use cases for Data Cloud, but just to show one process, one example of why somebody might use Data Cloud. This is actually a real example from several of my customers. Say you have just one Salesforce org, you have Snowflake, and you have any number of data services you might be running on AWS. Typically, people will have some sort of ETL process in the picture, where you're shuttling data on a schedule, like every eight hours, from Salesforce to Snowflake, doing something with it, and then pushing it back into Salesforce.
So this is a whole ETL process you have to go through. There are a lot of problems people sometimes run into, like governor limits, API limits, and storage limits, that are reasons you wouldn't want to do this, and then you also have all the point-to-point integrations you might be doing with all these bespoke services to pull data in and out of other external services. I'm not going to go through all of them, but I put this in here as an example of the challenges and benefits.
Maybe this will help you with your promotion of Data Cloud in your company. And once you start talking about multiple orgs in your environment, Data Cloud really starts to shine as a way to future-proof your data architecture. In a multi-org environment without Data Cloud, the integration challenges get a lot hairier.
You've got API integrations between the two different orgs, or you're pushing data someplace else and then pulling it back. There are all kinds of integration patterns you have to think through, and Data Cloud helps solve a lot of that. Now you just have one place, and you ingest data very quickly on a schedule. I didn't really talk about this, but when you're bringing data in from Salesforce, we have something in pilot right now that'll be GA soon for ingesting data as quickly as possible from your Salesforce orgs. Within a minute, you can have data updated from your Salesforce orgs into Data Cloud, and it's all in that common data model.
So to walk through an example: today, with a new production org, like I was saying, you spin up a new production org.
You configure Data Cloud from it and designate that as your development environment: your POC environment, your pilot implementation, whatever you want to call it.
You do all the configuration. This production home org here is dev, but the data you'd be ingesting from your Salesforce org would have to be, or should be anyway, sandbox data. So you've spun up a sandbox from your normal Salesforce org, and if you want to ingest that into Data Cloud, you connect that up and ingest it.
Once you've built it and tested it and you want to go to production, you have all these options. You deploy your Data Cloud configuration using data kits and packaging.
You can use some amount of the Metadata API if you want to, and then the rest is manual configuration to reconfigure everything inside of your production environment.
And then, for the CRM enrichment components you might have built to surface that data back into your Salesforce org, you go through the normal lifecycle of deploying those metadata components into your production core org, the ones that are dependent on Data Cloud. This is actually a sequence, which is important to note, because many times, with things like record-triggered flows off of Data Cloud data, you won't be able to deploy them into your org until the data model they operate off of is there inside of Data Cloud.
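That ordering constraint is easy to encode in a pipeline script. Here's a minimal sketch; the step names are invented placeholders, and in a real pipeline each step would shell out to your deployment tool (a package install, a Metadata API deploy, and so on):

```python
# Sketch: enforce Data Cloud-first ordering in a deployment pipeline.
# Step names are illustrative placeholders; real steps would invoke your
# deployment tooling (packaging install, Metadata API deploy, etc.).

STEPS = [
    ("install_data_kit_package", "data_cloud"),   # data model, streams, mappings
    ("deploy_data_model_via_mdapi", "data_cloud"),
    ("manual_checklist_segments_actions", "data_cloud"),
    ("deploy_flows_and_lwc", "core"),             # depends on the data model
    ("deploy_prompt_templates", "core"),
]

def validate_order(steps):
    """All data_cloud steps must come before any core step, because core
    components (e.g. record-triggered flows on Data Cloud objects) fail
    to deploy until the data model exists."""
    seen_core = False
    for name, tier in steps:
        if tier == "core":
            seen_core = True
        elif seen_core:
            raise ValueError(f"Data Cloud step '{name}' after core steps")
    return True

print(validate_order(STEPS))  # True
```

Even while much of the Data Cloud side is still manual, a gate like this documents the dependency so nobody kicks off the core deploy before the data model is in place.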
So here's another example.
If you look at this, it's exactly the same process; this is just what it would look like in a multi-org environment. Instead of one org, you hook up your multiple orgs, hopefully sandboxes. And in this environment, you might be building other CRM enrichment components that will live in your second org to surface the same data in both orgs, so you're getting a single view of your customer for different reasons.
This is a very common pattern and low-hanging fruit for Data Cloud. A lot of times people will have a sales org and a service org. The service org is for doing case management for customer support, and then you've got a sales org. A common problem people have is: all my sellers in the sales org want to see the customer support cases for my accounts.
Data Cloud makes that super easy. You just bring all the data into Data Cloud and then surface it as a related list on your account page, and the other way around, too. Inside of customer support, if you're looking at a case, you might want to understand what products the customer owns, coming from your sales org, or what opportunities you have available for co-sell and upsell, that kind of thing.
So as you can imagine, this is a process.
What's coming in summer allows you to have a true sandbox environment. Data Cloud still has to be tied to a production org, so inside of your production org you enable Data Cloud, and then when you create a sandbox from it (any type of sandbox: dev, partial copy, or full copy), it'll create a normal core sandbox and bring along with it a new Data Cloud instance tied to that core sandbox instance. Your sandbox itself becomes the sandbox home org for configuring your Data Cloud instance.
Now you're going to go through the same process: you build, you do your configuration, and then you have all these mechanisms for deploying the Data Cloud config. Hopefully, when sandboxes go GA (meant to be winter this year), we will also support DevOps Center for moving Data Cloud metadata from sandboxes into production, which means we'll support Data Cloud metadata from the SFDX CLI.
Alright.
I put this together to talk about different options. The previous example was like your first Data Cloud implementation.
Now say you have Data Cloud in production already, you've been running there for a while, and you have a new project, a new use case you want to implement.
What happens then is, if you spin up a new sandbox from your production org, all of the configuration that was in production will get copied into your Data Cloud sandbox instance, but without the data; there won't be any actual Data Cloud data in there.
Depending on your type of sandbox, you'd still have data inside of your core sandbox, but not inside of the Data Cloud instance. What you have to do then is reconfigure the connections for your data streams, meaning the connections to the actual source systems. You basically go through a reauthentication, because those connections might need to point to lower environments: your other sandbox orgs, lower environments for Snowflake, or different instances that represent the data. And then you go through the same process. So that's the overall process for what a sandbox looks like. Now, same picture; this is the last one.
Just to show you multiple orgs in the picture: you'd do the same thing with multiple orgs if you had multiple orgs in your environment.
My time's up, but I have some resources in this presentation, and I'll also have a link to the presentation that you can download later.