How to design your Salesforce disaster recovery plan


Description

Are you prepared for a data loss incident in your Salesforce environment? Join Richard Jones, Software Engineer at Gearset, as he walks you through the essential steps to design an effective Salesforce disaster recovery plan. In this webinar, you’ll learn about:

  • Common causes of data loss and how to mitigate them
  • Best practices for backing up and restoring Salesforce data and metadata
  • Key metrics like RPO and RTO to measure disruption
  • The three-stage restore process to ensure a smooth recovery
  • How Gearset’s backup solution can enhance your disaster recovery strategy


Transcript

Okay. Hello, everyone, and welcome to this Gearset webinar, Designing your disaster recovery plan. So over the next twenty to twenty five minutes or so, I'll be walking you through the steps that should be taken when a data or metadata loss incident occurs in your Salesforce environment.

So just as a brief introduction, my name is Richard, and I've been a software engineer for just over two and a half years now at Gearset.

Throughout that entire time, I've had the privilege of working on Gearset's backup solution.

And I spend the majority of my day building features that allow users to back up and restore their Salesforce data and metadata in the best way possible.

As well as this, I also spend a lot of time on calls with users who are experiencing data disasters to help them figure out what went wrong in their Salesforce orgs and what they need to do to get them back to normality.

Just worth noting as well, in the chat, my colleague, Alex, will be answering any questions you might have. He's also a a software engineer here at Gearset. So if you want to ask anything as we go, please type your questions in the chat. We'll do our best to answer you. Okay? So let's get started.

So what can cause a data loss?

Picture the scene. It's a normal working day. Everything is going pretty well until one of your sales reps comes to you, as the Salesforce expert, panicking because their contacts and opportunities have been deleted from their key accounts.

Right now, what would the impact be for you and your org?

How would you recover your data? Where would you start? How long would it take?

What would happen to the rest of the business whilst you were doing that? In this case, maybe you didn't lose or corrupt the data, but now it's your responsibility to fix it.

When you come across an incident in your Salesforce org, your number one priority is to get the information back in as little time as possible and with as little disruption to business as usual.

When faced with this kind of scenario, it's easy to slip into a panic mode and not always make the best decisions.

It takes time to fully analyze an incident. Establishing the root cause can be difficult, and there can be subtle nuances to the mode of data corruption.

It's vital to build an accurate picture of the nature and scale of the incident as well as the specifics of the root cause.

So a couple of examples of this might be Salesforce outages.

Any system, even a world-class CRM like Salesforce, suffers outages.

Scripts are released in error. Errors are released in scripts.

And web applications using cloud based software still ultimately run on hardware. So Salesforce is not exempt from the problems of securing physical infrastructure.

Some people in the audience may be unfortunate enough to remember an issue that came up in May of twenty nineteen that was cruelly nicknamed the Permageddon.

In it, Salesforce released an update that impacted all user profiles in all orgs with the Pardot product integrated, which gave all users access to view and modify all data.

An issue that didn't just lead to corrupted security models, but corrupted data from users editing records that they shouldn't have been allowed to. So all in all, not an ideal situation, and it's a great example of how issues with the platform behind your data can have potentially drastic effects.

There's also human error, so honest mistakes do happen. And when you have a sizable company with a few Salesforce instances, it's nearly impossible to keep track of everything.

Accidental deletions can cause issues ranging from a small inconvenience to a major disaster that can take weeks to recover from.

There's also malicious deletions. So, sadly, data is sometimes deliberately deleted, whether this is from a hack, from somebody outside of the organization looking to cause mayhem, or a disgruntled employee on the inside who might be upset because they had to be let go.

Both these situations can happen in any organization, and it's much easier to have the data secured already than to be scrambling around trying to salvage it afterwards.

There's also misapplied Salesforce integrations.

So there are various Salesforce integrations designed to alter or move data.

These are powerful tools that can have devastating consequences when mistakes are made. And what if somebody introduced a programmatic bug? You might not notice for a few days or weeks, and a fix needs to be implemented, tested, and then released before the data can be restored.

Otherwise, you could just corrupt it all over again.

I've personally been on calls with many users who have been caught out by not fully understanding the problem that they're trying to solve, leaving them in a much worse situation than when they started. Sadly, sometimes even permanently.

So without a deep understanding, it's easy for your recovery efforts to actually end up making things worse rather than better.

However, you can plan for likely scenarios.

A great starting point is a previous incident you've experienced, learning from previous pain and scenarios to plan for the future.

But brainstorming potential scenarios in advance is where everyone should really get to.

With that being said, planning these scenarios is not just something you do at the start of your data backup journey. It should be an ever changing process, but we'll talk more about this later.

Having a really well grounded shared understanding of your org's data model will also be a valuable tool. The longer that your org is affected by an incident, the greater the disruption to the business.

But how do you measure disruption?

Measuring disruption during a data incident requires a multifaceted approach as it affects all areas of the company to varying degrees.

Naturally, you could calculate this as a dollar value for your business, but there are also less tangible impacts, like damage to reputation, people working with plausible but incorrect data, or even potential fines.

A good set of objectives that you can aim for to reduce disruption during data incidents are the recovery point objective, or RPO, and the recovery time objective, or RTO.

The RPO is the amount of data that could be lost between your most recent backup finishing and a data loss incident occurring, and the RTO is the amount of time from the incident occurring, to it being discovered, to the org being fully restored.

This is all shown in this lovely clock illustration here. You can see that if a backup finishes at nine o'clock and an incident occurs at midday, your RPO will be three hours of data lost. The incident occurred at midday, but we are only made aware of it at two PM.

Say it took another two hours to fully restore the org. Our RTO would be four hours total.

By reducing both your RPO and RTO, you can ensure a more timely restore process, reducing the amount of disruption these incidents can cause for your company.
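
To make the clock example concrete, here's a minimal Python sketch, purely illustrative and not Gearset functionality, that computes the RPO and RTO from the timestamps described above (the date itself is arbitrary).

```python
from datetime import datetime

# Timestamps from the clock example above; the date itself is arbitrary.
last_backup_finished = datetime(2023, 5, 1, 9, 0)   # last successful backup completes at 9am
incident_occurred    = datetime(2023, 5, 1, 12, 0)  # data loss incident happens at midday
incident_discovered  = datetime(2023, 5, 1, 14, 0)  # somebody notices the problem at 2pm
org_fully_restored   = datetime(2023, 5, 1, 16, 0)  # restore completes two hours after that

# RPO: the window of data at risk, from the last good backup to the incident.
rpo = incident_occurred - last_backup_finished

# RTO: the total disruption, from the incident occurring (including the time
# taken to notice it) until the org is fully restored.
rto = org_fully_restored - incident_occurred

print(f"RPO: {rpo}")  # 3:00:00 -> three hours of data lost
print(f"RTO: {rto}")  # 4:00:00 -> four hours of disruption
```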

By analyzing what matters most to your business and what data you have readily available, you should be able to build a method of measuring disruption that works best for your organization. And once you have a good set of measurements for disruption in place, you can set out on reducing it as much as possible.

In this talk, we'll walk you through the best practice guide to backing up and restoring your Salesforce data and metadata quickly and safely. And we'll also make the case for why you should treat this as an integral part of your DevOps strategy.

So how common is data loss?

Salesforce data and metadata can be lost or corrupted at any time due to things like human error, malicious attacks, or integration bugs.

With tools like Salesforce's Data Loader, it's really easy to mass delete or update records that you didn't mean to. And a simple mistake in your source file or field mapping could spell disaster for your data.

It's not just teams who are new to the DevOps way of thinking that experience this, either.

Teams who are advanced in their DevOps practices also experience data and metadata loss.

It's not really a question of if, but when it will happen to you.

Each year, we conduct the state of Salesforce DevOps survey.

This year, we had just over twelve hundred Salesforce professionals take part.

Salesforce backups are crucial, so we asked whether they had experienced data and metadata loss in the last year.

Does anybody want to have a guess on what percentage said yes?

So sixty seven percent of respondents said their business had suffered metadata or data loss in twenty twenty two.

There are a lot of misconceptions among Salesforce professionals about the safety of their metadata and data. Many have no backups in place because they believe Salesforce has secured their data in the cloud and will be able to restore it in case of emergency.

Another issue is that many Salesforce professionals also don't think backups are their responsibility.

Who in your organization is responsible for backups?

Do you know?

Is it you?

But as the people who look after the company's Salesforce data and metadata, you are the experts in this area, and the business could look to you to sort out an incident.

And it's not enough to only protect your data. It's much harder to restore data if the metadata describing the structure of your org has been lost or corrupted, so you'll need to ensure your metadata is backed up too.

Even more worryingly, only six percent of respondents were actually aware whether or not they'd suffered a metadata or data loss incident in twenty twenty two. This is a common issue that we see from users with and without backup solutions.

Having the strongest restore strategy doesn't matter if you aren't able to spot when you need to use it, and it's easy to overlook issues that aren't immediately apparent to your everyday Salesforce users.

It's always good to remember that even the smallest data breach can become the biggest thorn in your side if it is able to exist in your Salesforce org for long enough. So having tools that enable this transparency in your org is absolutely vital.

So, ways to back up your orgs. There are two main ways to back up your data, and the type you choose will have an effect on how you implement the three-stage process we'll talk about in a few moments.

One way is to use manual backups, like a data export, and the other is to use a third-party backup solution, like Gearset, OwnBackup, Spanning, GRAX, CloudAlly, or any of the other third-party offerings out there.

So let's start with using Salesforce data export tool.

So using this tool, data backups can be scheduled to run at regular intervals, either weekly or monthly.

For Professional and Developer Edition orgs, exports can only be monthly.

Exports can be run manually on demand, but not more often than the limit for scheduled exports. So it's not possible to export data more than once a week. Take a second to consider how much the data in your org changes in a single week or how much it changes within a single day.

What would the impact to your business be if you reset the org to how it looked six days ago? For example, if you have a data integration bug introduced to your production org eight days ago, you could be so close to a perfect data snapshot that happened seven days ago.

Unfortunately, because your backups are weekly, you'd have to settle for an older snapshot from fourteen days ago, leaving you to chase up the lost data from those six unaccounted days.

So in short, the more granular your backups are, the more you can reduce your RPO, and therefore raise the accuracy of your recovery process.
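
To put some rough numbers on that point, here's an illustrative Python sketch. The cadences and the thirty-day month are assumptions for the example, not product behaviour; the idea is simply that the worst-case RPO is bounded by the gap between backups.

```python
from datetime import timedelta

# Worst case, an incident happens just before the next backup would have run,
# so the worst-case RPO is simply the interval between backups.
cadences = {
    "monthly export (Professional/Developer Edition)": timedelta(days=30),
    "weekly data export": timedelta(days=7),
    "daily backup job": timedelta(days=1),
    "hourly backup of critical objects": timedelta(hours=1),
}

for name, interval in cadences.items():
    print(f"{name:<48} worst-case RPO ~ {interval}")
```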

These exports only generate CSV files for your data, so no metadata is downloaded.

Any objects, fields, and relationships that have been lost would need to be rebuilt from scratch, unless they have been separately backed up and successfully restored first.

This is a time-consuming and error-prone process. And while you may be very happy to perform this kind of manual metadata change normally, believe me when I say that you don't want to be jumping through any extra hoops while you're negotiating a data loss incident. This manual approach is also bound to be much slower, increasing your RTO considerably.

Furthermore, by using the data export tool, you're accepting storage responsibility for all of your orgs' backups. Keeping these in a centralized, secure, and easy to access location is no easy feat and will often require a larger team to keep the backups in an accessible state.

Luckily, there's also the third party solution.

So using a third party solution to back up and restore data and metadata is the safest option.

These are faster, more efficient, and will give you more insight into what data or metadata you've lost.

While the upfront cost puts some teams off, it only takes one data disaster to get a return on the investment.

When choosing a backup solution, it's best to have one that will alert you to specific data loss or corruption straight away and via your preferred method.

Maybe this is a Slack or Teams message or a text about changes to specific critical objects.

The first step in data recovery is noticing the data needs recovering.

The quicker you identify a loss, the earlier you can assess the damage, build an effective plan, and reduce the risk to your business.

By utilizing features like this, you can avoid being one of those who are unaware of the data incidents that could be happening around them.
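
As an illustration of the kind of alerting being described here (this isn't how Gearset's alerts are implemented; it's a hypothetical sketch with made-up record counts and a placeholder Slack webhook URL), you could flag a suspicious drop in record counts between two consecutive backup snapshots like this:

```python
import requests  # pip install requests

# Hypothetical record counts from two consecutive backup snapshots.
previous_counts = {"Contact": 48_210, "Opportunity": 9_540}
current_counts  = {"Contact": 31_005, "Opportunity": 9_512}

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
DROP_THRESHOLD = 0.10  # flag any object that has lost more than 10% of its records

for obj, previous in previous_counts.items():
    current = current_counts.get(obj, 0)
    drop = (previous - current) / previous if previous else 0
    if drop > DROP_THRESHOLD:
        message = (f":rotating_light: {obj} record count fell from {previous:,} "
                   f"to {current:,} ({drop:.0%}) since the last backup.")
        # Slack incoming webhooks accept a simple JSON payload with a "text" field.
        requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=10)
```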

As well as helping you understand how your data changes, third party tools are also capable of helping you visualize how your data relates to itself.

Salesforce orgs can get convoluted, and the links between objects with them: some that you will know about, but also some that you don't, especially when you enter the wonderful world of managed packages.

So having a tool that can help you see how you need to restore your data to preserve these relationships is vital.

Again, no one wants to be figuring out how the data is structured from CSV columns when a data incident is underway.
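
To illustrate the idea of restoring parents before children so their relationships can be preserved, here's a minimal sketch using Python's standard graphlib module. The object-to-parent map is a hypothetical slice of a data model, not the output of any particular tool.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical slice of a data model: each object maps to the parents its
# lookup or master-detail fields point at. Parents need to exist in the
# target org before their children can be restored with relationships intact.
lookups = {
    "Account": [],
    "Contact": ["Account"],
    "Opportunity": ["Account"],
    "OpportunityLineItem": ["Opportunity"],
    "Case": ["Account", "Contact"],
}

restore_order = list(TopologicalSorter(lookups).static_order())
print(restore_order)
# e.g. ['Account', 'Contact', 'Opportunity', 'Case', 'OpportunityLineItem']
```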

On top of this, you will want your backup solution to run on at least a daily basis to ensure the most recent possible snapshot, and possibly even an hourly cadence for a select set of really critical objects that have a higher rate of change.

The more granularity that you have over the changes to your data, the better you can pinpoint where things have happened and what state you'd like to come back to.

It's also worth ensuring that your solution has best-in-class deployment engines for both data and metadata restoration.

Oftentimes, metadata restoration can be underestimated in terms of its importance in the process, but this is really important to remember.

Your backups are only as good as your ability to restore from them.

So while backups are your first line of defense when recovering from data incidents, that's just one part of the backup and recovery story.

As we will never tire of saying, backups are only as good as your ability to restore from them.

So according to our state of Salesforce DevOps report, thirty four percent of respondents are able to restore in a day or less. This year's data shows a slight improvement compared to last year's results as more teams are starting to implement backup solutions.

However, recovery time is also a crucial metric for teams. When trying to safeguard Salesforce environments, teams want to avoid allowing bugs and errors to linger for extended periods.

Rollback is the vital functionality that teams need to recover quickly. Once an issue has been spotted, a rollback enables teams to get rid of the problem and restore service as quickly as possible.

Some teams take several weeks to fix their last bug or error, and some teams who took part in the survey didn't want to disclose how long it took them.

It isn't enough to just have a backup solution in place, but this is a great start. You and your team need to be confident in the recovery process for different scenarios and in your ability to restore the data.

This is where the three stage restore process comes in. So first, you start by assessing your damage, then you plan the restore sequence, and finally, you go ahead and restore the data.

So let's look into these three steps in more detail.

So assessing.

Slipping into a panic and blindly restoring data is easy to do when you're under pressure.

It may also be tempting to ignore best practices to save time. For example, deploying metadata straight into production, bypassing your release process.

Restoring the wrong data will leave you worse off than when you started. Rather than having to unpick one data incident, you now have to unpick two.

We don't tend to make the best decisions when we're stressed. So once an incident has been spotted, stop everything and just breathe.

Taking the time to assess the damage now will save you time and headaches in the long run.

What caused the incident?

Has it been identified, contained, and eliminated to stop the risk of more data being corrupted or lost?

What's the extent of the damage? What has happened? Check the latest backups. Were they successful?

Which backup are you planning to restore from?

It's also crucial to be communicating to the rest of your colleagues that something doesn't seem right. It can be as simple as telling them that something is wrong and that you're investigating it.

So moving on to planning.

Now you know what's been lost, you can plan an effective restore process. It might help to think of restoring as just running a deployment with your backups as the source and your org as the target.

It's always best to restore metadata first and then data.

For a simple example, if a field has changed, you may not be able to restore the data without it being present in the correct form in the target.

If there's a huge amount of metadata corrupted, it's advised to restore the metadata in this order. So starting with the data tier, which is typically the core components that set the data structure of the org, like your custom objects, fields, and custom apps.

Then programmability, so the custom code that you've built on top of the platform. This is your Apex classes, components, tests, and triggers.

Then the presentation layer. So with the code in place, you can begin restoring your modifications to how your end users interact with the data platform, such as Visualforce, Lightning pages, components, and layouts.

Then permissions and security.

So it's time to add in the security model to ensure everyone has the correct access.

This includes field-level security, profiles, permission sets, security settings, roles, and sharing rules. And finally, anything else.
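
As a rough illustration of that ordering (the metadata type names in each tier are only examples, and the exact set will vary from org to org), you could express the tiers and work through whatever was corrupted like this:

```python
# The tiers described above, with example metadata type names.
METADATA_RESTORE_TIERS = [
    ("data tier", ["CustomObject", "CustomField", "CustomApplication"]),
    ("programmability", ["ApexClass", "ApexComponent", "ApexTrigger"]),
    ("presentation", ["ApexPage", "FlexiPage", "LightningComponentBundle", "Layout"]),
    ("permissions and security", ["Profile", "PermissionSet", "Role", "SharingRules"]),
]

# Hypothetical set of metadata types found to be corrupted during assessment.
corrupted = {"CustomField", "ApexTrigger", "Layout", "Profile"}

# Deploy tier by tier, so each batch can rely on the one before it being in place.
for tier, example_types in METADATA_RESTORE_TIERS:
    batch = [t for t in example_types if t in corrupted]
    if batch:
        print(f"{tier}: deploy {', '.join(batch)}")
```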

And for data, the way you restore will be different depending on whether you have a manual backup or a third party backup solution in place.

So data is slightly different to metadata.

But it's also really important to plan your data restore, which can differ enormously from situation to situation.

So it's best to take a step back and think what you actually want to achieve.

What are the root objects that you want to have in your org by the end of it? How will you verify that the data has been fully restored? Are there any related objects that need to be restored too? Will their relationships to the base objects be preserved?

Have there been any live changes happening on the data that you don't want to lose? How will you restore the necessary data without overwriting anything you need to preserve?
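
One hypothetical way to answer the verification question, sketched here with the open-source simple_salesforce client and made-up credentials and counts rather than any Gearset tooling, is to compare record counts in the restored org against the backup snapshot you restored from:

```python
from simple_salesforce import Salesforce  # pip install simple-salesforce

# Hypothetical credentials; in practice pull these from a secure secrets store.
sf = Salesforce(username="admin@example.com", password="...", security_token="...")

# Record counts per object taken from the backup snapshot you restored from
# (made-up numbers for illustration).
expected_counts = {"Account": 12_400, "Contact": 48_210, "Opportunity": 9_540}

for obj, expected in expected_counts.items():
    # SOQL COUNT() queries return the total in the 'totalSize' field.
    actual = sf.query(f"SELECT COUNT() FROM {obj}")["totalSize"]
    status = "OK" if actual >= expected else "MISSING RECORDS"
    print(f"{obj}: expected {expected:,}, found {actual:,} -> {status}")
```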

It's always very tempting to try and build a one-size-fits-all approach for restoring data that you can use in any restore scenario.

However, every scenario has its own nuances, and those must be addressed.

After working on the tools for restoring Salesforce data for some time now, I can honestly say that approaching a restore with a clear goal in your head is the best way to ensure a successful and timely recovery from a data loss incident.

Of course, your time to restore can be dramatically improved by having an all in one tool that can back up both data and metadata with knowledge about how the two link together.

So, finally, moving on to restore.

We said earlier that a restore is like a deployment, so you should follow the DevOps principles.

You wouldn't deploy metadata straight to production without testing. The same applies to restoring metadata and data.

So once you know what you're restoring, and in what order, make sure that you restore it to a freshly cloned full sandbox.

Check that everything is as it should be, and then restore to production.

Then check everything again just to be sure.

Yes, it takes a bit of time, but don't think of this process as slowing down your recovery time. By taking the time to make sure your metadata and data have been restored correctly, you avoid mistakes and reduce the risk of more damage to your org. It's quicker to take your time here than to rush it and have to spend more time fixing things up later.

Also, practicing your recovery process might be something you think is worth skipping over.

But when time is of the essence in a disaster situation, you don't want your team to have to grapple with a tool that they're not familiar with or don't have access to.

We suggest testing once a year or when there are any significant changes to the team so that everyone is familiar with the process.

It's important to remember that testing a backup strategy is a lot like a fire drill. If you only test it when you first install it, a lot could go wrong. Doors and access points could have changed, and employees with important roles in the drill may have left. The same can be said about a Salesforce org.

By practicing regularly, you'll be able to iron out the kinks in your restore process. This will not only increase your chances of success, but also reduce the tension of the entire process. If people know where they need to be, at what time, and what they need to be doing, there'll be much less panic floating around the company.

When this kind of disaster strikes, teams who work with a mature DevOps setup, like having a backup solution integrated with their release process and planning for potential incidents and data loss scenarios, are going to come out the winners when it comes to getting things back in order quickly.

So when you leave here today and begin thinking about implementing a backup solution or tidying up your current restore process, think about how it'll work for you and your team. Do you want to be in a stressed team, nervously scrambling to salvage what's left of your data and metadata, or a calm team that can clearly and methodically restore their data and their org back to its exact, original, happy state?

Here at Gearset, we have a backup solution that can help you with your whole backup and restore process. You can run daily backup jobs, see a full history of every backup you've ever done, and restore quickly and reliably.

You can set up smart alerts to tell you if there's a worrying change in your org and keep you happy and relaxed knowing that your data and metadata is in safe hands.

You can get a free trial of our backup solution by scanning this QR code.

Okay.

So, thank you for taking the time out of your day to attend this webinar. If you want to do some further reading, then the resources page on our website is a great place to start. You can download the backup ebook or the state of Salesforce DevOps report from there, both of which are super rich sources of information for all things backup and DevOps related.

So, hopefully, now you'll be leaving this webinar with the knowledge and confidence to start working on your own disaster recovery plan for you and your team. Have a great rest of your day.

Thank you very much.