Error handling and notifications are a basic requirement of any piece of technology. In #Flow, error handling is still in kindergarten. I'll explain why, and what we can do to bring more transparency to the user and notify the right person when things go wrong.
This doesn't mean that it doesn't exist or doesn't do what it should. You just need to implement it yourself, and out of the box (OOB) you have to combine different sources of information to get a global picture of all #Flows doing their thing, or to debug an error.
We often see users struggling to debug JSON response messages from actions and then bugging IT with the issue, so the operational efficiency gained through "modern" is lost.
Basically there are two types/sources of errors:
Flow run errors (Inner loop)
These are the errors, timeouts and cancellations that happen within a certain action in a running #Flow instance. I call them "inner loop", because the error is happening within the #Flow.
An example would be an action that calls a web service which is not available. The action will retry according to your retry policy and then end in an error.
As a consequence, your #Flow will most probably fail, unless you implement some logic to catch and handle this, which makes your process more stable in general.
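To illustrate the retry behaviour described above, here is a minimal Python sketch. It is not how #Flow implements its retry policy, which you configure on the action itself. The function name, the exponential backoff and the default values are my own assumptions, chosen only to show the pattern "retry a few times, then surface the error".

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    action: Callable[[], T],
    max_attempts: int = 4,          # assumption: illustrative default
    base_delay_s: float = 1.0,      # assumption: illustrative default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `action`; on failure wait and retry, doubling the delay each time.

    After the last attempt the original exception is re-raised, which
    corresponds to the action finally "going into an error".
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                raise  # retries exhausted: let the caller handle the error
            sleep(base_delay_s * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The `sleep` parameter is only there so the waiting can be replaced in tests. Whatever catches the re-raised exception plays the role of the error handling logic in the #Flow.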
The issue with Flow run errors is that they are technical errors and don't actually provide process insights. See the example screenshot with an error, where we get to read JSON response data.
This is not meant for users. The visual way to browse through the flow activities and see their completion state with inputs and outputs is nice, but it is also too hard for users as soon as the process has a bit more logic than just two activities. Process error messages are sent to owners only, so users don't get any feedback from the #Flow, which runs outside of #SharePoint, back in their workplace.
We see that #Flow owners use the Flow approval center to try to find out if something went wrong, but mostly they don't reach that goal either.
Some error messages like "Bad Gateway" are difficult to understand, sometimes even for devs, because the meaning is very wide. I expect that we will see massive optimization in this area soon. They spoke about using #MachineLearning to kind of self-heal #Flows and offer auto fixes to the creator. In classic #SharePoint there wasn't a central, cross site collection task rollup either, only search, and that needed customization too.
In connection with the 30-day maximum #Flow run duration, there are also limits on how long your Flow run data is kept. So your approval history might be deleted before you actually need it.
Flow analytics (Outer loop)
So, this is my approach to provide more transparency to the user in case of any unforeseen circumstances.
Goal
While #Microsoft is reinventing the wheel, we need a solution that bubbles the error up to the user in a more meaningful way. Error resolution time and ping-pong between business and IT have to be minimized.
Instead of confronting the user somewhere deep down in the darkness of nested actions with a "Server error 500 and some #JSON goodness for breakfast", we need to say more concretely what went wrong and offer action steps. For example: "Couldn't set #SharePoint approval state, because document is checked out by Demo User 1. Would you like to override the check-out with the risk that some content gets lost, or would you like to notify the person?" would be a clear and meaningful message, which the user might even be able to resolve themselves without any IT involvement.
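As a rough sketch of what "a more meaningful message" could look like in practice, the following Python snippet maps a technical error key to a user-facing text plus suggested next steps. Everything in it is hypothetical: the error key `DocumentCheckedOut`, the `UserMessage` type and the wording are placeholders I made up for illustration, not something #Flow or #SharePoint returns in this form.

```python
from typing import Dict, List, NamedTuple


class UserMessage(NamedTuple):
    text: str
    next_steps: List[str]


# Hypothetical mapping from technical error keys to friendly templates.
FRIENDLY_MESSAGES: Dict[str, UserMessage] = {
    "DocumentCheckedOut": UserMessage(
        text=(
            "Couldn't set the SharePoint approval state, "
            "because the document is checked out by {user}."
        ),
        next_steps=[
            "Override the check-out (some content may get lost)",
            "Notify {user}",
        ],
    ),
}

# Used whenever we have no specific explanation for an error.
FALLBACK = UserMessage(
    text="Something went wrong in the process. The owner has been notified.",
    next_steps=[],
)


def explain(error_key: str, **details: str) -> UserMessage:
    """Return a readable message for `error_key`, filling in the details."""
    template = FRIENDLY_MESSAGES.get(error_key, FALLBACK)
    return UserMessage(
        text=template.text.format(**details),
        next_steps=[step.format(**details) for step in template.next_steps],
    )
```

Calling `explain("DocumentCheckedOut", user="Demo User 1")` fills the user name into both the text and the next steps, while an unknown key falls back to the generic message.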
If something goes wrong, the process initiator and the owners should be notified via e-mail, so they can make sure that the processes keep flowing.
The #Flow should be as robust as possible and make use of custom retry policies to overcome some of the known limitations, especially file locking issues in #SharePoint. The goal is to make the donut green and happier.
Approach
One approach to error handling in #Flow is to make use of scopes and define their "run after" condition:
Create a first (action) scope to wrap your actions; with that you define your error scope and the granularity of error notification. In principle every action could have its own error handling condition, but obviously that doesn't make sense, and you wouldn't do that in code either.
Create a second scope for your error handling activities, such as logging to a #SharePoint list or another repository, or sending e-mail notifications.
Configure the "run after" condition on the second (error handling) scope so that it runs only if the first scope has failed.
This simply catches the error, and you can notify the process initiator and owner so they can inspect the issue in the #Flow run history.
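The two-scope pattern above can be modelled in a few lines of Python. This is only a sketch of the logic, not an export of a real #Flow. The scope names `Try_scope` and `Catch_scope` are hypothetical, and I'm assuming that a "run after" setting is a mapping from the previous step to the list of outcomes that allow the next step to run.

```python
from typing import Dict, List

# Simplified model of two scopes. "runAfter" lists, per predecessor,
# the outcomes after which this scope is allowed to run.
flow_actions: Dict[str, Dict] = {
    "Try_scope": {          # hypothetical name: wraps the real actions
        "type": "Scope",
        "runAfter": {},     # first step, no predecessor
    },
    "Catch_scope": {        # hypothetical name: logging and notifications
        "type": "Scope",
        "runAfter": {"Try_scope": ["Failed", "TimedOut"]},
    },
}


def should_run(action_name: str, results: Dict[str, str]) -> bool:
    """Decide whether `action_name` runs, given predecessor outcomes.

    `results` maps already finished actions to their outcome, for example
    {"Try_scope": "Failed"}. An action runs only if every predecessor
    finished with one of the outcomes listed in its "runAfter" setting.
    """
    run_after: Dict[str, List[str]] = flow_actions[action_name]["runAfter"]
    return all(
        results.get(predecessor) in allowed
        for predecessor, allowed in run_after.items()
    )
```

With this model, `should_run("Catch_scope", {"Try_scope": "Failed"})` is true and the same call with `"Succeeded"` is false, which is exactly the "catch" behaviour we want. I included `TimedOut` next to `Failed` so timeouts are caught as well; whether you also want to react to skipped steps is a design decision.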
Known Limitations
This doesn't give you an error notification if the #Flow is canceled by the system somehow, but in that case the #Flow owner gets an admin notification. Such cancellations can be detected via #FlowAnalytics and I will implement this soon, but for now it's not part of the story.
Learnings
When designing a #Flow, we should always plan error handling in advance. Think about what really is an error for you and handle it accordingly. Some error circumstances may be expected and shouldn't kill your whole process. Think about which roles should be responsible for error tracking, so processes can flow without issues and long delays. There will be errors, so be prepared.
Microsoft Flow Analytics
When talking about error handling, #FlowAnalytics can provide you with debugging data, so have a look at it.
#Microsoft delivered a first version of #FlowAnalytics, which brought some nice #PowerBI dashboards and data collection across the service, so we can get more insights. This data is intended more for admins than for #PowerUsers and can be very useful when exported to Excel or used via #API to trigger some activities, for example to send a notification on error.
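Once run data has been exported, picking out the failed runs is straightforward. The sketch below works on a CSV text whose column names (`FlowName`, `RunId`, `Status`, `StartTime`) and rows are made up for this example. A real export will most likely look different, so treat the names as assumptions and adjust them to your data.

```python
import csv
import io
from typing import Dict, List

# Made-up sample data; the columns and values are assumptions.
SAMPLE_EXPORT = """FlowName,RunId,Status,StartTime
Approve document,001,Succeeded,2018-05-01T08:00:00Z
Approve document,002,Failed,2018-05-01T09:30:00Z
Archive invoices,003,Cancelled,2018-05-02T11:15:00Z
"""


def failed_runs(csv_text: str) -> List[Dict[str, str]]:
    """Return all rows whose Status column says the run failed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["Status"].strip().lower() == "failed"
    ]


def notification_lines(rows: List[Dict[str, str]]) -> List[str]:
    """Build one short, readable line per failed run for an e-mail body."""
    return [
        f"Flow '{row['FlowName']}' failed (run {row['RunId']}, "
        f"started {row['StartTime']})"
        for row in rows
    ]
```

The output of `notification_lines(failed_runs(SAMPLE_EXPORT))` could then be used as the body of a notification e-mail to the owner.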
Flow Studio by John Liu
On his own initiative, and I think in his spare time, the master #JohnLiu is developing #FlowStudio, a community solution that aggregates all #Flow run data, provides a more advanced cockpit and makes it much easier to manage and export data for further analysis.
Visit his page: http://johnliu.net/flow-studio/
Key features
Additional Navigation, Visibility
Additional Power, Convenience for Makers
Additional Bird's Eye View, Ease of Mind for Admins