Data storage optimization for Salesforce

Join Alex Walter, Development Team Lead at Gearset, and Sunny Matharu, Chief Technology Officer at ThirdEye Consulting, as they discuss data storage optimization through archiving. In this breakout session from Gearset’s virtual summit, discover how to effectively manage Salesforce data and the benefits of using Gearset’s archiving solution.

In this webinar:

Understand the need for data archiving in Salesforce
Learn how to optimize storage and improve performance
Explore compliance considerations and regulatory requirements
Get a live demo of Gearset’s data archiving capabilities

Okay, let's get underway. Welcome to breakout room three, where we're going to be talking about data storage optimization.

So I'm Alex Walter. I'm one of the dev team leads here at Gearset, and I've been mainly working on the backup product and now on a new product, data archiving, which we'll be talking about through this session. And I'm delighted to be joined by Sunny Matharu from ThirdEye Consulting. Sunny, did you wanna do a quick intro?

Thank you, Alex. For those who don't know me, I'm Sunny. Hello. Lovely to meet you. I'm the CTO at ThirdEye Consulting, a platinum Salesforce partner, now Summit in modern speak. And I'm really happy to be talking about probably one of my top three architectural subject areas, data, as crazy as it sounds. So thanks for having me, Alex.

Alright. Well, let's hope we can push it up to the top one by the end of the talk today. So, just to give a bit of an overview of what we're planning to talk about: we're mainly gonna be talking about data archiving and Gearset's data archiving tool, why people might want such a tool, and what kind of needs it addresses.

And we'll open it up at the end for any questions that anyone has.

So, firstly, a bit about what data archiving is. It's good to give you just a very brief overview.

It's a place to store Salesforce data off platform. So you can take the data that's in a Salesforce format, stored on Salesforce, and move it to somewhere outside of your org, but you'll still be able to view it safely and securely, and it will be something that you could push back into Salesforce if you needed to.

So I've got a little example here. Let's pretend Salesforce is a bit like a file system, and these are records.

In Salesforce, we only care about these ones in the blue dotted line here.

Whereas these couple at the bottom here, we don't really care about them too much.

They're like old stale records.

So what do we want to do about that?

So, if we buy an archiving solution, we'd be able to select the files that we don't care about too much, the stale files, copy them over into our archiving solution, remove them from Salesforce, and then we end up with a whole load of free space. To discuss why you might want that free space, let's jump into a couple of thoughts, and I'll run through this with the help of Sunny. So, Sunny, I'll open up with this question.

What reasons do you typically come across for people wanting an archiving solution?

It's the main question, isn't it? So I've been in the Salesforce ecosystem and space since two thousand ten, two thousand eleven.

I was a Java and data warehouse person before that. So lots of experience with underlying data, and there are two main reasons that stick out.

One is sheer volume. Right? Having lots of data inhibits you in a number of different ways, either performance wise, limits wise, whatever else it might be, and I can get into that detail. And the other is regulatory. So we talked about GDPR in the session just before this. Obviously, the UK has the Data Protection Act, which is the equivalent of GDPR since we left the EU. Same same, but different.

Those two reasons are the ones that stick out the most. I've seen them in the wild. I've seen different implementations and ways to address those reasons or problems. Happy to go into detail.

Yeah. Should we start with the storage concerns? What kind of examples have you come across in your vast experience of consulting?

You must have come across many different customers.

Did you want to run through a few examples that you've come across there?

Yeah. For sure. So I've spent a lot of time working in insurance and financial services, alongside health care and life sciences and a few other things.

But insurance is one of those industries where data replicates and gets created very quickly, almost instantly.

And the company, I won't mention names, but I'll talk about their business case and the way that they operate.

They provide insurance for organizations. So let's say they're providing insurance for ThirdEye staff. So health insurance for every employee of ThirdEye. We're only seventy people, so perhaps not that big, but imagine we were twenty thousand. That's not unrealistic.

This company has contacts for twenty thousand people, policies for twenty thousand people, perhaps some policy lines, maybe five per policy, so fifty to a million policy lines, maybe a case per person, a claim per person, maybe some junction objects. Activities per employee are twenty thousand times perhaps four or five, so you're touching a hundred k activities, claims, files, and uploads. And I did some maths, some crude maths, before jumping on this call.

Twenty k is not an unrealistic number of employees for this company to take on. That's a million records in one year. One point two million conservatively if someone made one claim.

And I've got health insurance. I've got a.

I broke my leg playing football.

I've got RSI in my back and my neck. I get lots of things. Don't talk to my wife. She might say I complain a lot, but it's not unrealistic to have that data escalate and extrapolate really quickly.

And that's one customer. That's over a million records in one year. This company could have a hundred customers. And so that's a hundred and twenty million records in one year of not very much activity. Obviously, there are ways to manage that, using things like big objects and external objects, or not keeping things in Salesforce in the first instance.
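As a rough illustration of the crude maths described above, the sketch below scales the quoted per-customer figure up to a hundred customers. The record counts come from the talk; the 2 KB per record storage figure and all function and constant names are assumptions made for this example, so treat the storage number as indicative only.

```python
# Figures quoted in the talk.
EMPLOYEES_PER_CUSTOMER = 20_000
RECORDS_PER_CUSTOMER_PER_YEAR = 1_200_000
CUSTOMERS = 100

# Assumption for illustration: each record counts as roughly 2 KB of data storage.
KB_PER_RECORD = 2


def yearly_records(customers: int, records_per_customer: int) -> int:
    """Total records created in a year across all customers."""
    return customers * records_per_customer


def storage_gb(records: int, kb_per_record: int = KB_PER_RECORD) -> float:
    """Approximate storage in GB (1 GB = 1024 * 1024 KB) for a record count."""
    return records * kb_per_record / (1024 * 1024)


total_records = yearly_records(CUSTOMERS, RECORDS_PER_CUSTOMER_PER_YEAR)
records_per_employee = RECORDS_PER_CUSTOMER_PER_YEAR / EMPLOYEES_PER_CUSTOMER

print(f"Records per year: {total_records:,}")
print(f"Records per insured employee: {records_per_employee:.0f}")
print(f"Approximate storage: {storage_gb(total_records):.0f} GB")
```

Even with conservative inputs, the total grows linearly with both customers and years, which is why stale records are worth moving off platform.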

Equally, you need that data in Salesforce to start with. You know, when you're onboarding your company, when you wanna look at that claimant's previous claim history or personal data. You want to see that in Salesforce. It's transactional and current and important.

However, that can easily become stale.

So if ThirdEye chooses a different insurance company or provider, what relevance does that data have for operational staff, you know, support or sales, thereafter?

It perhaps might skew the view of the world in terms of how you look at reports in Salesforce. One, in terms of just correctness and validity of the data, but also performance. I'm sure there are people that are very familiar with reports running slowly because your org is too big and you've got too many records.

So cleaning that data out, not just from a validity and correctness standpoint, but also from a performance management standpoint, is really important. So that's the volume one. Hopefully, that answers the question a little. If you've got some more, I'm happy to answer.

Yeah, I think volume is definitely one, especially with the B2C kind of customers. So ones where you have huge numbers of customers and a lot of interactions. You might have a lot of cases.

So say every time someone sends an email, it creates a new case. These can seriously build up at large companies with a lot of interactions.

Exactly.

And from the compliance side, what examples have you got of customers who have faced problems there?

One of my favorite projects was for an HLS, health care and life sciences, customer based in the UK and the US.

Again, won't provide the names, but we implemented a multi org solution for them. They have fifty thousand internal employees, and they were all using the internal org. Then they had something called the patient org.

I won't give it a name, but this was an org that contained data about drugs that were being developed, PII data for patients, treatment data, and how that treatment was affecting certain people, the efficacy of it. So ultra sensitive data that's very identifiable. And if that got out into the world, that's really dangerous.

Equally, if that's living in an org when it shouldn't, after that trial perhaps is closed or not relevant, that is also not a good idea. I did some homework, which isn't typical of me, but I was looking at the top twenty GDPR fines so far in the world, and this list is twenty years, oh, sorry, two years old.

But Meta was surprising, along with Amazon and a few other companies that we all know the names of, but fined one point two billion dollars last year for a GDPR breach.

And, overall, I think it was something like two billion in the last two or three years in terms of fines across organizations that we are all very familiar with, because of breaches with respect to GDPR.

So there's the risk of data getting out, and the financial and reputational risk of what you could be hit with in terms of litigation and lawsuits.

There's a current class action lawsuit against an organization in the US over a data breach, a pharma, health care, and life sciences competitor of that company I worked with earlier.

A class action lawsuit in the US because of a data breach, and we just don't want that. Right? So the cleaner you can make your environments, and the more you can, I guess, decouple data that might have been relevant once upon a time but isn't anymore in terms of GDPR, because you're not acting on those patients anymore, the better and safer it is for you as an organization.

Yeah, that's a great point. If you can move things away from people having access to the data, then you're much less likely to have any breaches at all.

Yeah.

Yeah. So what would you say are the most important requirements when it comes to having an archiving solution?

Yeah. Two schools of thought here. Every company has to make a choice, and that choice is: do you spend your time, effort, and money on what makes you money, you know, what your secret sauce is as an organization, the reason you exist, or do you focus on everything that underpins it, your technical architecture, your systems?

The company I just talked about, the health care one, they chose the latter. They had a heavy internal dev team, so lots of Java developers, .NET developers, data warehouse developers, the list goes on. And they chose to build something out that was home baked, home brewed, but also something that affected them not just in terms of CapEx, the initial spend to stand the solution up, but also the considerable OpEx of ongoing internal support, because it's a bespoke solution that they built. For them it worked really well. I'd left by the time that it started, but that's option one.

The other is you focus on what makes you money. You let your sales team sell. You let your business leaders figure out why and how you should make money, how you compete against your competitors, so on and so forth. And you let tools and architecture that are built to handle all the complexity around data archiving do the job for you, so that you can have your business users focus on what you should be doing, what data should be archived based on either dates or field statuses, whatever else it might be.

Let a tool like this, a reliable data solution, take care of the architecture, the how.

That's, I think, one of the biggest decisions. Do you choose to build something yourself and take care of the how, or do you let the tools and the companies that know more about how to do it in the first place take care of that problem for you?

Yeah.

This is the buy or build your own question. And I guess if you've got specialization in other areas of the business, maybe it works out better to make it a bit easier. Oh, okay, just seeing a message there. But, yeah, to be able to pass that responsibility on to people who are experts in working on it.

Are there any other difficulties to keep in mind that you've seen people come across when they've been building their own solution?

Yeah. I mean, there are some very particular ones, for people that are familiar with Salesforce.

Again, we talked about profiles and permission sets earlier.

Having or not having visibility of certain fields or records can make all sorts of things go wrong when you try to pull data out of an org.

Not having access to one field, for example, can make your entire backup job fall over if you haven't built guardrails around it. So it helps to have tools and solutions that can either fix the problems for you around that or at least identify them so you can do something about fixing them.
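The guardrail idea described above can be sketched as a pre-flight check that refuses to extract an object when any of its fields is unreadable. This is a minimal illustration, not how Gearset or any particular tool implements it: `FieldAccess`, `unreadable_fields`, and `assert_safe_to_extract` are hypothetical names, and the field list is assumed to have been fetched already from the org's field metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldAccess:
    """Hypothetical summary of one field's visibility for the integration user."""
    name: str
    readable: bool


def unreadable_fields(fields: list[FieldAccess]) -> list[str]:
    """Return the names of fields the integration user cannot read, sorted."""
    return sorted(f.name for f in fields if not f.readable)


def assert_safe_to_extract(object_name: str, fields: list[FieldAccess]) -> None:
    """Fail fast with a clear message instead of falling over mid-job."""
    missing = unreadable_fields(fields)
    if missing:
        raise PermissionError(
            f"{object_name}: no read access to {', '.join(missing)}"
        )


lead_fields = [
    FieldAccess("Name", True),
    FieldAccess("Rating", True),
    FieldAccess("Internal_Notes__c", False),  # hypothetical custom field
]

try:
    assert_safe_to_extract("Lead", lead_fields)
except PermissionError as err:
    print(f"Skipping extract: {err}")
```

Reporting every unreadable field at once, rather than stopping at the first, gives an admin a complete list to fix in one pass.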

We've been part of the closed beta for the archival solution. So a lot of our team have fed back to the Gearset team about some of the things that we'd like to see and that we have seen.

I think that's a very specific one. The other is if you've got a solution that is hard to decipher.

If the complexity is behind a code layer rather than a declarative layer, it becomes very hard for a business person to have a snapshot or visual view of what's happening to their data.

So, again, having tooling that can pull out the complexity around either scheduling, filtering, shape of the data, all that, so you can see that declaratively, is really helpful because you're not abstracting or obfuscating it behind a code layer. Nothing wrong with that, but it means that the skill set needed to interrogate the data or even see at a high level what's happening is very different.

Okay. Yeah, it was pretty good to get this feedback from you and your colleagues as part of the pilot. Speaking of which, shall we jump into a quick demo of the data archiving solution?

Can you see the Salesforce screen at the moment?

I can.

K. Cool. Well, let's run through what we have in here. So I've got an example here of a Salesforce org, and it has a whole load of junk data in there. So these are some leads.

And as you can see, the scan date is a couple of years ago, and the rating's cold.

They're not really doing very much, and we just want to remove these from the org and leave these important ones that we've got down at the bottom here.

So I'll jump into an archive that I've got set up previously and just kick it off for the time being.

And what this is gonna do is copy the data and remove it from my Salesforce org.

And now I'll just run through how you go about actually setting up one of these. So we had one that was set up for the EU production.

Now I'll go and set one up for the US production.

So first of all, we'll check whether we've got the right permissions to view the data and delete the data. We could put that. We'll give it a name. Go ahead and create the archive.

So the next step is to check that we've got all the permissions to view all the data. And in this case, we have got read access for all the fields in Salesforce.

You wouldn't want to archive data that you couldn't actually see, or see all the values in, because you would end up with lost data if you were to delete those records. And it can also play havoc with dependencies if you don't have access to all the master-detail relations and things like that.

And we can also set it up so that if anyone adds any more fields and accidentally forgets to add any permissions, we can automatically pick them up.

Or if you have real concerns about what should be exposed to this integration user, you have got some other options of manually doing it yourself and not running the archive if they don't have the permissions.

So now that we've got that set up, let's think about creating a policy.

So I'll do this again for leads, and then we'll think about how we want to filter them.

Usually, you want to do something time based.

So I'll start with, say, a scan date older than two years, and say that I want a rating of cold as well. And so I can call these cold leads.

And I can do a quick preview, just to check in my mind that we've got something that seems about right. So we've got twenty five lead records. They're all down as cold and with a scan date that's two years in the past.

And we've still got quite a number of records left in place.
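The filter and preview step just described can be sketched in a few lines. This is an illustration only, not the tool's implementation: the `Lead` class, the `scan_date` attribute, and the function names are hypothetical, and the records are held in memory rather than queried from an org.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Lead:
    name: str
    rating: str
    scan_date: date  # hypothetical date field used by the filter


def years_before(day: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def is_cold_lead(lead: Lead, today: date, min_age_years: int = 2) -> bool:
    """Policy: rating is Cold and the scan date is older than the cutoff."""
    return lead.rating == "Cold" and lead.scan_date < years_before(today, min_age_years)


def preview(leads: list[Lead], today: date) -> tuple[list[Lead], list[Lead]]:
    """Split leads into (would be archived, would stay in the org)."""
    matched = [lead for lead in leads if is_cold_lead(lead, today)]
    kept = [lead for lead in leads if not is_cold_lead(lead, today)]
    return matched, kept


sample = [
    Lead("Old cold lead", "Cold", date(2021, 1, 15)),
    Lead("Recent cold lead", "Cold", date(2023, 11, 2)),
    Lead("Old hot lead", "Hot", date(2020, 6, 30)),
]
to_archive, to_keep = preview(sample, today=date(2024, 6, 1))
print(f"{len(to_archive)} to archive, {len(to_keep)} left in place")
```

Previewing both sides of the split mirrors the check in the demo: the matched count should look right, and so should the count of records left behind.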

So if we're okay with that, we can then work out which relations we need to get in place as well. Gearset will automatically pick up on all of those. We've only got some content document links here, but we could have a whole number, such as for accounts, where you could have many, many items that are all linked to it.

I can then go ahead and run a validation, which will fetch all the data and make a copy of it, but won't actually do any deletion. So you can check on it first of all before you actually do any deletion from your org.
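The validation run described above follows a copy first, delete later pattern. A minimal sketch, assuming the org and the archive are plain dictionaries keyed by record ID (both stand-ins invented for this example, as is the `run_archive` name):

```python
def run_archive(
    org: dict[str, dict],
    archive: dict[str, dict],
    record_ids: list[str],
    validate_only: bool = True,
) -> list[str]:
    """Copy records into the archive; delete from the org only on a real run."""
    copied = []
    for record_id in record_ids:
        archive[record_id] = dict(org[record_id])  # copy, never move
        copied.append(record_id)

    if not validate_only:
        for record_id in copied:
            # Delete only what is confirmed to be safely stored in the archive.
            if archive.get(record_id) == org[record_id]:
                del org[record_id]
    return copied


org = {"00Q1": {"Name": "Cold lead A"}, "00Q2": {"Name": "Active lead B"}}
archive: dict[str, dict] = {}

# Validation: the data is copied, but the org is left untouched.
run_archive(org, archive, ["00Q1"])
print(sorted(org), sorted(archive))

# Real run: the copy is verified, then the source record is removed.
run_archive(org, archive, ["00Q1"], validate_only=False)
print(sorted(org), sorted(archive))
```

Defaulting `validate_only` to `True` means a caller has to opt in to deletion explicitly, which is the safer failure mode.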

So now let's go back to the one that we set up previously.

So we can jump in and look at the policies that we had on the EU production.

So it looks like it's now completed. We had a policy for the leads, where we'd get rid of cold leads older than two years.

We had a similar thing for cases, where it's low priority cases that have been resolved for more than a year, and then accounts as well. So we had some expired accounts that had a low priority.

And if you wanted this to just run in the background: we clicked to kick it off manually, but you could set it to run automatically on a schedule. So it runs on a weekly basis, say every Tuesday at twelve forty five in the morning.
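For anyone reproducing a weekly schedule like this in their own tooling, the sketch below works out the next Tuesday 00:45 run from a given moment. It uses naive datetimes and ignores time zones, which is a simplifying assumption; the function name is made up for this example.

```python
from datetime import datetime, time, timedelta

TUESDAY = 1  # datetime.weekday(): Monday is 0


def next_weekly_run(now: datetime, weekday: int, at: time) -> datetime:
    """Return the first scheduled slot strictly after `now`."""
    days_ahead = (weekday - now.weekday()) % 7
    candidate = datetime.combine(now.date() + timedelta(days=days_ahead), at)
    if candidate <= now:
        # This week's slot has already passed, so roll over to next week.
        candidate += timedelta(days=7)
    return candidate


now = datetime(2024, 1, 3, 9, 0)  # a Wednesday morning
print(next_weekly_run(now, TUESDAY, time(0, 45)))
```

The same schedule written as a standard five-field cron expression would be `45 0 * * 2`.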

So let's have a look. When we just kicked off that archiving run, it popped up on here, and we can have a look to see what it did. So it looks like it's archived twenty lead records.

So we've got a few down here.

They're named, and we can see a bit more detail if we click on them.

You can see the details there, and you could either download these to a CSV or restore them back into Salesforce if you wanted to.

Let's just check in our Salesforce org. So we previously had forty two leads, and, hopefully, it should have gone down to twenty two. So we have made a copy of those records, pulled them into Gearset, and removed them from our org.

Another thing that we can do in here: just say that we had something that we archived quite a long time ago.

So it was archived, say, a couple of weeks ago, and all that we know is the name of the record.

So let's say I have a lead, and all I know is that it had the words Thompson Lee in there. If I do a search for it, I can then find my record.

I can see all the details here. I can either download it, restore it to Salesforce, or, say there is a GDPR concern, you could actually remove it from the archive completely if someone has a right to be forgotten request.

So it shows that it's quite a simple, easy thing to set up.

It's easy to find the data once it's all been archived away, and it's very easy to set up multiple policies. So, say, I could put a policy on cold leads if they're older than two years, but I could have warm leads expire if they're older than five years.

So that was a quick run through of some of the things that we have in the demo of the archiving tool, and I'll open it up for any questions that anyone has.

Not a question, but just an observation, Alex. It's really nice to see the same things that make Gearset good for metadata DevOps being applied here when it comes to data archiving. So, oh, I've chosen grandchild object X, but I need to make sure that I also have the parent object or this other related object. Identifying that and building out the tree of data is really helpful. And we see that every day with Gearset when it comes to metadata. It's really nice seeing the same thing, and the shape of data being backed up accordingly, when it comes to data archiving.

Yeah. That's something that's vitally important. And it's just been raised as a question here, about archiving records with relations. So you asked about an account with some contacts.

So let me see if I can find one in here.

So I had a policy that was based on accounts, and I had thirty accounts in here.

But along with the account, we also have some related contacts.

We also have some opportunities.

And then if we keep on going down the tree, we've got some opportunity line items, opportunity history, and we can keep on going further and further, working out the dependencies. We need to make sure that we've got all of them and they're all grouped together.
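Walking down the tree from an account can be pictured as a breadth-first traversal over parent to child relationships. The sketch below is illustrative only: the relationship map is a hand-written, simplified assumption covering just the objects mentioned in the demo, where a real tool would presumably read the relationships from the org's metadata.

```python
from collections import deque

# Simplified, hand-written parent -> children map (an assumption for illustration).
CHILD_RELATIONSHIPS = {
    "Account": ["Contact", "Opportunity"],
    "Opportunity": ["OpportunityLineItem", "OpportunityHistory"],
}


def objects_to_archive(root: str, children: dict[str, list[str]]) -> list[str]:
    """Breadth-first walk from the root object through all dependent objects."""
    ordered = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in ordered:  # guard against cycles and repeats
                ordered.append(child)
                queue.append(child)
    return ordered


print(objects_to_archive("Account", CHILD_RELATIONSHIPS))
```

Because parents come before their children in the result, the same order is a reasonable starting point for a restore, where a parent record has to exist before anything that references it.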

And if we wanted to restore all of these as well, we can set up a restoration where we show how these are linked. So I've just selected a single account here, and I've got my related opportunities, opportunity line items, contacts.

So it works in a similar manner to the Gearset metadata handling that you just mentioned, Sunny. So, yes, thanks for noting that.

And, yeah, has anyone got any other questions that they'd like to ask?

What else have we got in here that we could quickly show? A few other things that we do have: so I did mention the field permissions.

So we do have this as a little report, and you can check these at any time on your org to make sure that they're all there.

Then there's also permissions for other members of your team, so you can have multiple users.

So you could have your whole Gearset team have access, or maybe different levels for different users within there.

Other things around GDPR as well: if you needed to completely remove the records, we can set a retention policy.

I've got it set up for seven years here, but we can specify that in a bit more detail as well.

Were there any further things that people would like to know about the tool?
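A retention period like the seven years mentioned here boils down to a date calculation per archived record. The sketch below shows one way to work out when a record becomes due for permanent removal; the function names are hypothetical and the leap-day handling is a choice made for this example.

```python
from datetime import date


def purge_date(archived_on: date, retention_years: int = 7) -> date:
    """Date on which an archived record reaches the end of its retention period."""
    try:
        return archived_on.replace(year=archived_on.year + retention_years)
    except ValueError:
        # Archived on Feb 29 and the target year is not a leap year.
        return archived_on.replace(year=archived_on.year + retention_years, day=28)


def due_for_purge(archived_on: date, today: date, retention_years: int = 7) -> bool:
    """True once the retention period has fully elapsed."""
    return today >= purge_date(archived_on, retention_years)


print(purge_date(date(2024, 3, 15)))
print(due_for_purge(date(2017, 3, 15), today=date(2024, 3, 15)))
```

A right to be forgotten request is the separate case where a record is removed from the archive immediately, regardless of how much of the retention period is left.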

Okay.

Well, we're coming up to the end of this session, and we've got a panel coming up talking about the future of DevOps in Salesforce.

So we'll get some more views from Sunny in there, and you'll also be joined by a few of our other DevOps leaders, Sammy and Johnny, along with VP of product, Ben, who will be giving some more views on the future of Salesforce DevOps.
