Catch Flow and Apex errors fast: A guide to proactive Salesforce monitoring

Description

Salesforce errors in Flows and Apex can quickly escalate, impacting your users and slowing down your team’s productivity. Don’t wait until it’s too late — catch errors proactively!

Join Kenny Vaughn, Development Team Lead at Gearset, in this practical webinar as he explores how Salesforce teams can implement robust, proactive monitoring strategies for Flows and Apex code. Discover how early error detection prevents minor issues from turning into major headaches. Kenny shares actionable insights, tools, and best practices to help you:

  • Understand common Flow and Apex errors and their root causes
  • Implement proactive monitoring techniques to catch errors early
  • Leverage Gearset’s intuitive error-tracking solutions to streamline debugging
  • Minimize downtime and maximize productivity for your team

Transcript

I'm Kenny, and it is a pleasure to be here. I'm one of the engineers in Gearset, and I've been leading the charge on building our Apex and Flow monitoring solution.

I've been an engineer for about eleven years, and I joined Gearset a bit over a year, maybe a year and a half, ago.

During my eleven years in the industry, I've been acutely aware of, you know, what happens when something goes down in production.

I built something great. It fell over, and it bruised my ego. The business didn't love it either. So I've spent my time here at Gearset building solutions that make it a bit easier to monitor errors in Salesforce and help Salesforce teams incorporate observability into their processes.

So there's a few things I wanna talk about today. We're gonna talk about the problem with flow and Apex error monitoring. You may have heard the term observability. I wanna touch on what that is and why it's a really important part of the DevOps life cycle.

From there, I'm gonna talk to you about some of the techniques that you can use to solve the problem, ways to bring observability into your orgs.

And then I'd love to give you a demo of what my team within Gearset have built, framing that around what we're trying to do to add observability to Salesforce for Flow and Apex exception errors.

I do want to stress one thing, though. I am not a salesperson. I'm definitely not a public speaker, and this is my first ever webinar. I will be showing you what we've built in Gearset to help you understand observability. But if you don't take anything else away from this, I want you to know how important observability is for your org.

So let's get stuck in and start talking about flow and Apex errors. So you'll probably already know that flows handle logic that is critical to your business systems. Think something like generating an invoice. A flow error is when something happens in that flow in Salesforce that Salesforce wasn't expecting.

And that can be the result of, you know, a user error or a bit of bad input, and the user might see that error on the screen. The flow can stop, and then the user might not be able to continue doing their job.

It could also be a background process.

The flow runs, the user's not really interacting with it, it's happening sort of off-system, and it silently fails.

In Apex, unhandled exceptions occur for very similar reasons to flows. Something unexpected has happened.

Those that aren't caught using what we call a try-catch block will cause the Apex to fail. Or in some cases, we've seen that a try-catch can just catch the error, log what's going on, and then not do anything with it. That's a pattern most commonly known as swallowing exceptions. It can leave a system in an inconsistent state.
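
To make that concrete, here's a minimal Apex sketch of the swallowed-exception pattern next to a safer variant. The class and object names (InvoiceService, Invoice__c) are hypothetical, purely for illustration.

```apex
public with sharing class InvoiceService {
    // Anti-pattern: the exception is "swallowed" -- logged and never surfaced,
    // so the caller carries on as if the invoice was created.
    public static void generateInvoiceQuietly(Id opportunityId) {
        try {
            insert new Invoice__c(Opportunity__c = opportunityId); // hypothetical custom object
        } catch (Exception e) {
            System.debug(LoggingLevel.ERROR, 'Invoice failed: ' + e.getMessage());
        }
    }

    // Safer: record the failure, then re-throw so the transaction rolls back
    // and the error is actually visible instead of leaving data half-written.
    public static void generateInvoice(Id opportunityId) {
        try {
            insert new Invoice__c(Opportunity__c = opportunityId);
        } catch (Exception e) {
            System.debug(LoggingLevel.ERROR, 'Invoice failed: ' + e.getMessage());
            throw e;
        }
    }
}
```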

This is the fun bit. Even if you've done everything right, you may still get an uncatchable exception. It's those dreaded governor limits, and I think we've probably all seen those. And it is the same thing in flows. You can configure a fault path, and you can try to do everything right, but you still may get some emails with errors in them.
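
And just to illustrate the governor limit point, here's a hedged sketch: even with a try-catch in place, a query-in-a-loop like this will eventually hit the synchronous limit of a hundred SOQL queries, and the resulting System.LimitException can't be caught, so the transaction fails anyway.

```apex
public with sharing class LimitDemo {
    public static void countContacts(List<Account> accounts) {
        try {
            for (Account a : accounts) {
                // SOQL inside a loop: with enough accounts this blows past the
                // synchronous limit of 100 queries per transaction.
                Integer n = [SELECT COUNT() FROM Contact WHERE AccountId = :a.Id];
                System.debug('Contacts for ' + a.Id + ': ' + n);
            }
        } catch (Exception e) {
            // Never reached for a governor limit breach -- System.LimitException
            // is uncatchable, so the whole transaction simply fails.
            System.debug(LoggingLevel.ERROR, e.getMessage());
        }
    }
}
```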

So Salesforce will send you emails, and these emails are better than nothing. The error emails go to the last person who modified the flow, as an example, or you can configure them to go to maybe a central inbox or a group of people.

You have to remember to check, dig through all the noise of sometimes hundreds of emails, and really triage.

And with all of that, you need to start thinking, what issue needs my attention right now? Which issue can wait? I will tell you that many of you on the call, and many of the admins and devs that we've spoken to, can probably say, well, we just know. That's great institutional knowledge to have, but when someone else comes along and tries to solve the problem, or to triage it and make well-reasoned decisions, that's harder to do without information in front of you. I think the flow emails are pretty good. They give a really nice breakdown of what's happening in the flows and where things started to go wrong.

Apex emails, I'm not such a fan of. They don't have the same detail as the flow error emails.

For example, they might be missing things like who was the user that experienced it. The more complicated part is they're not sent in real time. We have seen them delayed by minutes or, in some cases, even hours.

You will get a stack trace. So that's the part of the code where the error has occurred, but also the parts that called that part of the code. And that's quite useful when you're starting to dig in deeper.

If you're really lucky, a user may have come along to you and said, hey, something's gone wrong, I'm gonna report this issue. But in the case of some Apex, asynchronous stuff in particular where you're doing maybe scheduled or future methods, you're gonna have to pay really close attention to those emails because you might not know something went wrong at all.
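
For that asynchronous case, here's a small hedged sketch; the object and field names are made up. A future method like this runs in the background, so if it throws, no user sees an error on screen; the failure only shows up in the Apex Jobs list and, eventually, in one of those exception emails.

```apex
public with sharing class OrderTotals {
    @future
    public static void recalculate(Set<Id> orderIds) {
        // No user is watching this run. If the query or the update throws an
        // unhandled exception, the job is marked Failed in Apex Jobs and an
        // exception email goes out later -- nothing is surfaced in real time.
        List<Order__c> orders = [SELECT Id, Total__c FROM Order__c WHERE Id IN :orderIds];
        for (Order__c o : orders) {
            o.Total__c = 0; // placeholder for real recalculation logic
        }
        update orders;
    }
}
```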

For most teams, these error emails are where they start when they think about monitoring Flows and Apex for errors.

So you get this email coming in. What are the first couple of questions you need to ask yourself? Well, what's really useful when we're thinking about observability and solving those problems is: how many errors are on the same flow? How many users are impacted?

Are these the same error or are they new ones? Have I seen a spike or a reduction? What's changed and when? Finding that information from emails can be a bit manual and time consuming, and it sometimes involves quite a bit of effort to get the overview before you know where to start digging.

Observability is supposed to provide answers to these questions and more. That's because with observability tooling, whichever you choose, you should have more than just visibility of the errors happening across your orgs. You should know more than, hey, something's gone wrong. You should have a fuller context, and it should really help you diagnose: what's the root cause? How do I fix this? How do I prevent it happening in the future?

It's observability practices that you really wanna incorporate into the way that you work day to day and the workflows that you follow, and that'll help you take control of things like Flow and Apex errors. So I've mentioned and talked about observability.

Let's get a little bit deeper about what observability is.

Simply put, it's about observing and understanding the state of your system.

Are things running properly? Do we have errors? Is the system getting slower? And then, are there issues coming up that maybe there are trends in? It's not just about logs and having alerts, but if we're gonna be really practical, that's the best starting point.

The concept of observability is a really standard practice on other development platforms. We've been talking about it for years, but it's not as widely implemented in Salesforce. And that can be a challenging problem to solve.

So imagine you have a critical production outage. It started about an hour ago, and you've just found out because the CFO has told you the entire sales team are blocked.

I wish I was making that up, but that's a real example somebody told us about.

You want to get things back to normal as soon as possible and let people get back to their work. It's costing the business money, and it's costing people time. As well as making things a bit less stressful, so it doesn't feel like things are always on fire for you and your teams.

When you have a full understanding of what the error is, what's changed, how it relates to other errors, then you can start diagnosing and fixing the issues faster.

Observability does another thing, though. It also helps you get recognized and your team get recognized for the hard work that you're probably already doing to improve the health of your org.

I mentioned the CFO earlier, raising an issue. If you've got really good observability, you can be proactive about that, and you can be the one to send a message to the sales team saying, hey, we're actually aware that an issue is emerging. We're already working on a fix.

We'll keep you updated. They're not coming to you. You're coming to them. And that's really using the tools to go beyond that reactive firefighting and starting to think a bit more proactively.

How can my observability, the metrics and numbers that I'm seeing, help me make my work healthier, more efficient, or even just anticipate problems before they impact users or critical business processes? Now, it's useful getting an alert when something goes wrong, but, also, once you put out all those fires, what's next?

I'm sure none of us are ever idle and sitting around. But imagine a future where you can take a look at things and say, I see twenty new errors in the last two weeks for our flows. It's impacting thirty customers. Well, we should focus here. That's a really powerful thing to be able to say for you and your teams. That's a signal that you're really starting to mature and you're really starting to embrace the DevOps life cycle.

Let me give you a personal example of observability in practice. A little adjacent to Salesforce, but very relevant for what we're about to talk about. Early in the project for error monitoring, we were in pilot with a new feature that was gonna start extracting useful information from emails.

Before, we didn't really need many resources or processes to handle the thousands of emails that we were receiving.

The new feature was working great until it wasn't. In a really rare case, three huge emails came in all at once, and the service just ran out of memory. It crashed, and then it restarted.

We weren't too worried because we knew we had redundancy in place, so no data was gonna be lost. But you really don't want a feature to crash every time a big piece of data comes in. Our alerting had already let us know that the crash was about to happen, and we found out about it before a user got impacted in production.

So we had enough time to actually think about it, investigate it thoroughly, almost at our leisure. We rolled out a fix. We weren't panicked. Things weren't on fire.

Observability gave us notice that things were wrong and space to fix it before it became an issue, and that's what you want.

So let's talk a couple of numbers.

Seventy four percent of the teams that don't have observability tools most often learn about the issues from their end users or the CFO like we talked about earlier.

Teams with an observability solution are fifty percent more likely than other teams to catch bugs within a day and forty eight percent more likely to fix them within a day. Those are some great numbers to achieve.

You may have heard us talk about, and I mentioned earlier, the DevOps life cycle. I'll give you a little brief primer on that, but I don't wanna go into too much detail.

It's a way of looking at everything that goes into building and releasing software. It sort of visualizes that software delivery as an infinity loop. It combines the sort of traditional software development, the dev part, and IT operations, the ops.

And projects move through it continuously and iteratively. And you really just look at it as a visual road map to think, where can I improve and refine processes to make all of what we do to deliver software better?

You'll see on my slide that Observe is highlighted here. It is such a key part of the DevOps life cycle, and often it's treated as an afterthought.

It shouldn't be.

When you're planning your software, before you've even written a flow or written any Apex, you should be asking yourself and your team should ask, when we have built this, how can we observe it? How do we know when it's working well, and how do we know when it's not working well? Now like I said, this session is not about the DevOps life cycle. I could probably talk about that topic for an hour, but I'll spare you that. But some people do ask, well, where should I start?

And the cool thing about the infinity symbol is there's no best place to start. A lot of it will depend on your needs as an organization, your team, your technical capability, and budgets.

And I know that's not really an answer, but I'm biased. So I will tell you that if you don't know where to start, observability truly is a good place.

There is more than one way to add observability to your orgs, and I wanna talk about some of the things that teams have already done.

And this is based off a lot of the customer research and a lot of the talks that we've had with real engineers, real admins, real developers working in Salesforce day to day.

The first place that many teams start is those error emails I was just talking about.

By default, those flow error emails go to the admin who last modified the flow.

You can change this default behavior in Salesforce and have the emails go somewhere else. First tip: I recommend you make that change. Even if you move on to other solutions, you should probably change that anyway.

Sending them on to something like a distribution list means that, you know, you won't be stuck if the admin is off on holiday that day, or maybe they're busy and just not able to keep an eye on their inbox.

So you've got these emails coming in. Set up some inbox rules. That can be kind of time-consuming. You're not gonna be able to catch every variation of the emails, but it can help kick some of the noise out of your main inbox. There's no real way to curate or prioritize.

And with hundreds of emails, and we have seen people with hundreds a day, you might see that there's a lot of noise, and that risks them being ignored.

The other thing that you're gonna miss, and what I talked about with observability just earlier, is you're not getting real insights. It's fine that you know something's going wrong, but what you're not gonna see is whether there's a spike, or whether there's any way to correlate these errors to recent deployments. And remember, those insights and that extra context are what your tools should be offering you, not just telling you things are going wrong.

We've seen teams use custom objects to log errors. And the basic setup, and I'll just run through it quickly, is you create a custom object and give it a meaningful name, something like Application Log. And then you'll define a series of fields for it. So at the minimum, you'd have something like log level, log message, and timestamp.

Then your Apex classes can write to this object in your code. And in your flows, you can use the Create Records action to write into it as well.

Now you've got some logs, and they're being persisted.
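
As a rough sketch of what the Apex side of that can look like (assuming an Application_Log__c object with Log_Level__c, Message__c, and Timestamp__c fields; adjust the API names to whatever you actually create):

```apex
public with sharing class AppLogger {
    // Minimal write to the hypothetical custom log object described above.
    // Timestamp__c is assumed to be a Date/Time field.
    public static void log(String level, String message) {
        insert new Application_Log__c(
            Log_Level__c = level,
            Message__c   = message,
            Timestamp__c = System.now()
        );
    }
}
```

A catch block would then call something like AppLogger.log('ERROR', e.getMessage()) before re-throwing, and a flow would do the equivalent with a Create Records element on a fault path.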

Because they're being persisted on the Salesforce platform, you will have to set up some process to manage that, because you're consuming Salesforce storage. If you use a lot of DML calls, you may also be consuming some of your governor limits. And, I'm using the word consuming a lot, but it's also time-consuming to set all of this up.

There is a risk of performance overhead, though that can be mitigated. So if you're looking at this method, look at a way to do it asynchronously for that write.
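
One hedged way to take that write off the synchronous path is a Queueable along these lines. Worth noting: if the original transaction dies with an unhandled exception, anything it enqueued rolls back with it, which is one reason some logging frameworks publish platform events instead.

```apex
public with sharing class AsyncLogWriter implements Queueable {
    private final List<Application_Log__c> entries;

    public AsyncLogWriter(List<Application_Log__c> entries) {
        this.entries = entries;
    }

    public void execute(QueueableContext context) {
        // The DML runs in its own transaction, keeping the insert (and its
        // governor limit consumption) out of the user's synchronous request.
        insert entries;
    }
}
```

You'd then call System.enqueueJob(new AsyncLogWriter(entries)) instead of inserting the log records inline.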

But what you're now in the space of doing is you've written yourself a logging framework. You're gonna have to manage it and maintain it.

You could also use an existing framework like the open source tool Nebula Logger. There's others too.

Now I'm gonna be a bit of a fanboy here. I've seen what they've built, and I've seen it in action, and it's natively built in Salesforce. I think it's a pretty cool thing.

You can use it to capture Apex errors and Flow errors, amongst other things. Now I'm gonna be a bit reductive, so don't criticize me for not getting it exactly detailed. But behind the scenes, they do use some custom objects in order to be able to log things, a bit like the earlier technique that we talked about.

It does mean that you need to keep an eye on the storage since it is on the Salesforce platform.

It takes away a lot of the pain of writing and maintaining framework code, and they followed some really great practices.

Like I said, I'm a fan. To make the most of it, though, you will have to modify every piece of Apex code that you have and start using the logger class that they provide.

And you're gonna have to modify all of your flows and start using logging actions that they've made available. It is a project worth checking out if you have the technical skill and the capacity to implement it.
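
To give a feel for what that modification looks like, here's a rough sketch of a catch block using Nebula Logger's Logger class. The method names here are from memory of the project, so treat them as an assumption and check its documentation for the exact API.

```apex
public with sharing class AccountSyncService {
    public static void syncAccounts(List<Account> accounts) {
        try {
            update accounts;
        } catch (Exception e) {
            // Nebula Logger buffers entries and persists them once saveLog()
            // is called, rather than writing on every log statement.
            Logger.error('Account sync failed', e);
            Logger.saveLog();
            throw e;
        }
    }
}
```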

Once you've got all that data and all that logging in place, though, you're also then gonna have to start feeding it somewhere to give you those two key bits that we mentioned before, alerts and insights.

Services like Splunk, Datadog, and New Relic are just a few that we can mention.

Let's think about Salesforce and some on-platform tools. If you've been paying attention to the Winter '25 release, you may have noticed they have released a free tier for Event Monitoring. Now I'm gonna give you the briefest of overviews, but it's one to investigate as well.

So it gives you access to a subset of event types that are really essential for capturing information about, you know, running Flows and Apex as it happens.

Specifically, you're gonna care about the EventLogFile object, and you can query this and pull the information that you care about. You'll be looking for flow execution errors and Apex execution errors, those event types in particular.

Now the logs are generated by Salesforce.

They're retained, though, for only twenty four hours, often in a CSV format, and you will have to then take those files, process them, store them, and generate insights yourself. So there's a lot of work that still needs to be done even if you're not the one generating the actual logs themselves.
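
As a hedged example of what that querying can look like, runnable as anonymous Apex. The exact EventType values you can use depend on your edition and what the free tier exposes, so treat the names below as assumptions and check the EventLogFile documentation.

```apex
// Pull yesterday's log files for the Flow and Apex error event types.
List<EventLogFile> files = [
    SELECT Id, EventType, LogDate, LogFileLength, LogFile
    FROM EventLogFile
    WHERE LogDate = YESTERDAY
      AND EventType IN ('FlowExecution', 'ApexUnexpectedException')
];
for (EventLogFile f : files) {
    // LogFile is a base64-encoded CSV: you still have to parse it, store it
    // somewhere, and build the insights and alerting yourself.
    String csv = f.LogFile.toString();
    System.debug(f.EventType + ': ' + csv.abbreviate(300));
}
```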

I would be remiss if I didn't mention some other third parties out there who are working on some interesting solutions. One example, you'll see it on the slide right there, is Farris.ai. They've built a native Salesforce solution, and it's all on platform.

Again, and we've touched on it before, that does mean that you start thinking about storage costs that you're using within Salesforce. And we know that, you know, breaching those storage limits can be quite expensive.

You will then have to set up retention policies for your orgs to clean up older logs, and then you have to make that decision on do we archive things, do we delete things, where do we go from there? There's a lot of configurable options in Farris.ai, so it takes a bit of time to get set up and working, especially working with the error emails.

Now for those who've been paying attention, you'll probably notice that some of the things that I've talked about really just talk about logs. It's missing those additional bits of observability that we care about, and that is, you know, getting insights and getting more information.

I hope that's given you some ideas on where you can start with your observability journey.

I'm obviously biased, but I do think that the simplest and most comprehensive solution is what my team and I have built, which is using Gearset's error monitoring.

So I'm gonna take the chance to show you a bit of what we've built here in Gearset.

I hope that it serves as an example of what you can do to get started monitoring flow and Apex errors in your org. Don't get me wrong. I would love you to get really excited and buy our product. I'm very proud of what we've built. But if you don't and you only take away one thing from this session, I want you to walk away and think, I need to set up observability.

So before we launch straight into the demo, let me set the scene for a moment.

I'm a Salesforce admin. Last week, a user reported an issue with a flow. I have since fixed it and deployed that flow through Gearset.

Everyone with me so far?

Great. And I wanna keep an eye on things to make sure that I've actually resolved the issue. We've all put fixes out there that haven't made anything better; they've only made it worse. But then I also wanna figure out, like, where am I gonna spend my energy next? So with that in mind, let's move over to Gearset.

So it's a one-click setup to monitor your org. I've already done that, and I'm showing you a sample org that I have here that we've been monitoring for a while, just so we can get a sense of what everything's gonna look like. So I'm gonna look at the chart first, but what I'm gonna do is focus solely on the flows.

The chart tells us something interesting. Already, you can see there's maybe a story being told here. You'll see that somewhere between sort of May the twenty third and May the twenty fourth, there is a huge spike in flow errors. That's, as you can see, the blue line here on the graph.

On the graph, we've also got gray lines representing when deployments happened. It is clear at a glance that something went wrong. So in the real world, what would have happened after I set this up would have been my Slack alerts would have been firing, and we would have caught the issue and probably done a deployment later that day. And you can see that on the graph with the second gray line.

So you can see that that second gray line is when we did a release, and we tried to fix the error. And you can see, thankfully, that things are starting to calm down again. All those spikes of errors are starting to taper off. Let me just add Apex in one more time.

So the purple spike, that's the Apex errors. And it looks like, thankfully, we fixed those while we were in there. But you are seeing that maybe there's a correlation between when Apex errors happened, when flow errors happened, and when we did deployments. And that's really useful to know. It may not always point you to exactly what's going wrong, but it's gonna give you that extra context and that extra insight to try and figure out what's going on. So I'm gonna just go back to the flows-only view again.

Great. We successfully solved the problems. But, again, if you keep following the chart all the way to the right, all the way to present day, we still have some work to do. I've realized that some things are causing us trouble. We fixed most of it. So now where do we spend our energy next?

And this speaks more to those insights that we talked about.

I know that my org has various screen flows that have validation rules. Users will see the error, they'll be able to correct it, and they'll be able to move on with their day. I, unfortunately, get email notifications about those.

I don't care about it. The user self corrected, and they can move on. So I wanna spend my time on things that are really causing trouble. We do have ignore functionality.

So if it's a screen validation error or it's a random timeout, we're not gonna do anything about it. Stop alerting us. So we move it into ignore.

But I've not had time to do it for all of them. So I've sorted things by users descending, and I can take a look here. In the table that we have along the bottom, for the active errors that I'm looking at, we can see that the generate individual invoice flow has a little bit of information: when we first saw it within this seven-day period, if we keep our eyes up to the top right here, then when it was last seen within that period, and how many users were affected.

Thirteen. And an error count of seven in that time period. Well, I think that's probably something I should dig into. If we have thirteen users being impacted in a seven-day period, well, that could be damaging, you know, critical business processes.

I'm gonna expand that table further, and we can dig down. And I'm not gonna labor the point of what everything's showing you, but this is then a further breakdown of, well, how many of those errors occurred for which element in the flow and what type of error. Let's just look at one of them as an example.

This is probably a familiar view for anyone who's seen it. It's essentially the flow exception email that you would normally get in your inbox. And as I said, often a place where many people start. We can take a look through this, and it'll give you all of the information that you would normally get, all that rich information about when the interview started and what happened and who was involved.

Remember I mentioned before about additional context? I think this is quite cool.

You can go and open your flow in Salesforce, or you can open it here in Gearset in what we call our flow navigator.

Again, you're looking for your observability tools to give you more context. So, you know, should I spend my time here? No. Anybody who's seen a flow before will know I'm caught out here: this is purely just an example flow for the sake of a demo. But it gives you an idea of what the erroring element was.

Whenever I've written Flows or Apex, two weeks later I can't remember what any of them look like. So being able to do that in your tooling, or at least have your tooling direct you to it, that's giving you additional context.

I wanna talk about one more thing just before we move on. I mentioned notifications and, you know, alerts coming in.

You could set up alerts for absolutely everything that comes in, but then you're just recreating the noise of email inboxes. I don't think that makes sense.

What you really want to do is you want to start curating a list of useful errors that you're gonna do something about.

So, again, I'd love to talk to you about all the features that we've built, but I really want you to take away the idea of them more than just what we're doing here.

Essentially, you can add a notification rule within Gearset, and we'll capture whatever kind of conditions that you're putting in place to say, hey, this is important, I want to know about it. These notifications will get piped to where we do our work the most. For us in Gearset, it's Slack, so we'll get it into a Slack channel, but we support MS Teams as well. But I don't wanna know every time this goes wrong. I really wanna talk about thresholds.

So we get an occasional read timeout. That's fine. If we get ten of them in one hour, for example, that's when I need to know. That tells me this isn't our normal behavior.

Something anomalous is happening. We need to do something about it. So I can save that notification rule, and that'll let me know what's going on. Just before we move on, one small point about notification rules.

It's great to set alerts, and it's great to get all that information right in front of everybody.

But I have one tip for you. When a notification comes in, ask yourself, what am I gonna do about it?

If you're not gonna do anything about that notification, either now or at some point in the near future, do you really need it for now? And that's one thing that's very hard for us to do. My team and I, when we're first building a piece of, you know, software or a project, we would turn on notifications for everything. And it is a lot of noise, and it can kinda be overwhelming at first.

But we will actively, as part of our process, look through them and say, right, day by day, is this serving us? Do we get something useful from it? Are we going to action it in any way? Often, the answer is we would love to, but we don't have time.

So it's not breaking anything. It's just annoying. Don't need an alert for it. We really want to get to those most specific business cases for the most critical business processes first, and then we can be, you know, talking about where we're gonna spend our energy afterwards.

Speaking of where you put your energy, I just wanna show you one last thing. And, again, take these concepts away with you. Even if it's not the tool for you, the tool that you're looking for should be giving you an opportunity to be proactive about where you spend your energy.

I've had the pleasure of speaking to leads, devs, and admins. And, honestly, I'm so impressed by how many people know the most problematic areas of their systems, be it flows, Apex, or some other system.

What I do find, though, is they don't really have the data to back it up. It's like, hey, I think we shouldn't work on this feature right now because some of our flows are problematic.

That's not quite as good as, hey, we've seen a thirty percent increase in errors across our flows, two of them in the last seven days in particular, and we think it was a change in the most recent release. It's impacting thirty three users. We're gonna pause feature work and get this fixed. Having those numbers is a really powerful story to tell. So let's take a very brief look at the dashboard.

You'll see that we're comparing this seven days with the previous seven days. And at a glance, I can start deciding how am I gonna spend my energy? Where am I going with this? We can look at the total errors and we'll see it's decreased sixteen percent.

The flow errors, seven percent. Total users impacted, up by two. Don't love that one. And Apex errors, we're sitting at twenty four percent down.

We can take a look at the most common error types. We can take a look at, for example, the flows with the most errors. I can keep scrolling and you get the same idea for Apex, some charts to go along with it. Maybe my leadership's given me, you know, time to go ahead with some things and fix it.

That's great. But, really, it's those insights that matter.

So remember that the goal is to be able to understand the overall health of your system, or parts of your system. Know when they're going well, but also know when things are failing. And then have the data to back up your intuition, which you probably already have, about your org and your org's health. And then when it comes to dealing with those errors, we know getting to the root cause is really hard. So pull in information and extra context to help you start your investigation, that triage step.

Listen. We know debugging is debugging, and we've all been there, and it's tricky. And when a critical incident's going on and everything feels like it's on fire, users have reported the issue, your CFO's involved, your manager's involved, it's really hard to focus and know what you're working on. A good observability tool should let you start broad and then narrow your focus, adding in those additional context clues. And then you can be like, yeah, that's where we need to start debugging, or that's where our problem is, or this is the release where things went wrong. Let's start there.

So that was a brief whirlwind tour of observability and Flow and Apex error monitoring. I hope I've helped you understand some of the challenges with it, given you a couple of ideas on what to do about solving those problems, helped you understand observability and how it fits into the DevOps life cycle and why it's so important. And, you know, I've shown you a little bit of Gearset's error monitoring. I'm gonna do a shameless plug here. We do have a fourteen-day free trial if you wanna hop on to that URL. But, honestly, if you're starting your observability journey today, just do something, any alert, any error, and you'll find that your team will be able to grow with that and mature there too. Thank you very much for coming.