Every step you take, every call you make - Restate’s fresh take on distributed apps observability
Posted February 19, 2025 by Giselle van Dongen, Nik Nasr, and Igal Shilman ‐ 6 min read
Applications have become more and more distributed over time: logic is split across services and intertwined with other components like message queues, K/V stores, and workflow orchestrators. This had some clear benefits, but also some costs. One of them is that it became much harder to keep an overview of the status of your application across all its components. The sheer number of companies in the observability space is a testament to how challenging this has become in modern infrastructure.
Restate lets you write mission-critical applications, like payment workflows, user management, and AI agents, without worrying about resiliency. It does so by integrating an SDK that wraps operations that might fail and captures your application’s activity (RPC calls, timers, promises, API calls) into a single, reliable log. The log is used to drive retries and recovery of failed executions. While traditional observability tools stop at the request boundary, Restate takes it a step further into the handler’s execution itself. This means that Restate knows a lot about your application and its distributed status.
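To make the idea of a log that drives retries and recovery more concrete, here is a minimal sketch in plain Python. It is not the Restate SDK: `Journal`, `Context`, `invoke_with_retries`, and the `signup` handler are hypothetical names invented for illustration, and the journal lives in memory, whereas a real engine would persist it durably.

```python
from typing import Any, Callable


class Journal:
    """Toy in-memory journal (assumption: a real engine persists this log)."""

    def __init__(self) -> None:
        self.entries: list[Any] = []


class Context:
    """Hypothetical handler context that records the result of each step."""

    def __init__(self, journal: Journal) -> None:
        self.journal = journal
        self.position = 0  # index of the next step within this attempt

    def run(self, action: Callable[[], Any]) -> Any:
        if self.position < len(self.journal.entries):
            # The step completed in an earlier attempt: replay its result.
            result = self.journal.entries[self.position]
        else:
            # First time this step is reached: execute it and record the result.
            result = action()
            self.journal.entries.append(result)
        self.position += 1
        return result


def invoke_with_retries(handler: Callable[[Context], Any], journal: Journal,
                        max_attempts: int = 3) -> Any:
    """Re-run the handler from the top; journaled steps are not re-executed."""
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(Context(journal))
        except Exception:
            if attempt == max_attempts:
                raise


calls = {"charge": 0, "email": 0}


def charge_card() -> str:
    calls["charge"] += 1
    return "charged"


def send_email() -> str:
    calls["email"] += 1
    if calls["email"] == 1:
        raise RuntimeError("mail server unavailable")  # fails on the first try
    return "sent"


def signup(ctx: Context) -> str:
    payment = ctx.run(charge_card)
    mail = ctx.run(send_email)
    return f"{payment}, {mail}"


journal = Journal()
result = invoke_with_retries(signup, journal)
print(result, calls, journal.entries)
```

In this toy run the first attempt fails at the email step. The second attempt replays the recorded payment result instead of charging again, so `charge_card` executes once while `send_email` executes twice. The same journal is also what makes it possible to say which step an execution has reached.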
With the Restate 1.2 release, we are adding a powerful graphical UI to Restate that makes this information accessible and lets you manage, configure, understand, and debug applications. You can answer questions like “at what line of code is my handler blocked?”, “where in the call chain did my handler get stuck?”, “what step is failing with what error?”, “which code path did my handler follow?”, “what other handlers are experiencing the same issues?”, and “are there still ongoing executions on this deployment or can I remove it safely?”.
View how requests move through your application
Figuring out what is going wrong, and where, is not an easy task in distributed applications. You need to know whether an invocation is failing, its error message, the deployment it is running on, the last action it completed successfully, how many times it has been retried, the call chain from service to service, and so on.
The Restate UI now gives you access to all that information in an eye-pleasing overview. This is a treasure trove of information for understanding what is going on. Afterwards, you can either fix the issue or cancel the invocation via the UI and let it roll back.
Finding the culprit
Sometimes you notice an issue but you don’t really know which invocation to look at. Maybe you scheduled an invocation but it remains pending, or a customer asked you why they still haven’t received a confirmation email. For these kinds of scenarios, the UI lets you query and filter invocations. For example, search for the invocation that is blocking a certain Virtual Object or Workflow, or list all invocations that haven’t made progress in the last hour.
Let’s have a look at how we can figure out why the user signup process of userid2 hasn’t completed yet. Once we have filtered down to the relevant invocation, we notice that we are still waiting for the user to click an email confirmation link:
An aid for versioning
Versioning is one of the hardest parts of operating applications, and this applies especially to Durable Execution engines like Restate, because you need to make sure that the log does not get out of sync with the code execution path when you update your code. Restate’s approach to this is described in a series of blog posts, if you are interested in the details.
The UI can actually help you out significantly here:
- When you deploy a new version, you can register it via the UI.
- You get an overview of the different versions that exist for each service, and the deployments on which they run.
- You can filter the ongoing invocations by deployment ID, to see whether an old deployment has drained, and can be safely removed.
- You can filter invocations that are scheduled to execute in the future. These will execute on the latest available deployment when their execution gets triggered, so you will know that you need to make sure your new version can handle these.
- You can delete deployments from within the UI, once drained.
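The drain check described in the list above can be sketched as a simple filter. This is an illustration only, not how the UI is implemented: the `Invocation` record, the deployment IDs (`dp_old`, `dp_new`), and every status label other than `completed` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Invocation:
    id: str
    status: str                   # hypothetical labels, except 'completed'
    deployment_id: Optional[str]  # None: not yet bound to a deployment


def ongoing_on(invocations: list[Invocation], deployment_id: str) -> list[Invocation]:
    """Invocations that are bound to the deployment and have not finished."""
    return [
        inv for inv in invocations
        if inv.deployment_id == deployment_id and inv.status != "completed"
    ]


def is_drained(invocations: list[Invocation], deployment_id: str) -> bool:
    """A deployment is safe to remove once nothing is still running on it."""
    return not ongoing_on(invocations, deployment_id)


def scheduled(invocations: list[Invocation]) -> list[Invocation]:
    """Future invocations; they will run on the latest deployment when triggered."""
    return [inv for inv in invocations if inv.status == "scheduled"]


invocations = [
    Invocation("inv_1", "completed", "dp_old"),
    Invocation("inv_2", "suspended", "dp_old"),  # still pinned to the old version
    Invocation("inv_3", "running", "dp_new"),
    Invocation("inv_4", "scheduled", None),
]

print(is_drained(invocations, "dp_old"))          # inv_2 keeps dp_old alive
print([inv.id for inv in scheduled(invocations)])
```

Here `dp_old` cannot be removed yet because `inv_2` is still in flight on it, and `inv_4` is a reminder that the newest version must be able to handle invocations that were scheduled before it was deployed.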
Manage services from the comfort of a UI
We aim to make the UI your single source to manage your Restate applications. To get closer to this goal, you can also manage and configure your services from within the UI. A lot easier than figuring out the right curl request to our Admin API!
Restate services have some configuration settings to influence their timeouts, retention periods, etc. These can now be set via the UI:
A smooth experience from the first call
But the UI isn’t only useful further along in the development process. It also smooths out some of the more tedious getting-started steps.
Until now, to register and invoke a Restate service, you had to craft a curl request and make sure the request body followed the right schema. Pretty annoying. With the UI, you can now easily register services and inspect their schema:
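For comparison, this is roughly what the hand-crafted registration request looks like when built programmatically. Treat it as a sketch under assumptions: the `/deployments` path and the `uri` field follow the Restate quickstart as we understand it, the service address `http://localhost:9080` is a placeholder, and `build_register_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import json
from urllib.request import Request

ADMIN_URL = "http://localhost:9070"  # same port the UI is served on


def build_register_request(service_uri: str, admin_url: str = ADMIN_URL) -> Request:
    """Build (but do not send) a deployment registration request.

    Assumption: the Admin API accepts POST /deployments with a JSON body
    of the form {"uri": "<service endpoint>"}.
    """
    body = json.dumps({"uri": service_uri}).encode("utf-8")
    return Request(
        url=f"{admin_url}/deployments",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder address of a locally running service endpoint.
request = build_register_request("http://localhost:9080")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would send it to a running server. Getting the path, headers, and body shape right by hand is exactly the kind of detail the UI now takes care of.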
To make the experience even better, you can also invoke services from the UI playground. Send a curl request with the right schema straight from the UI or copy over the code template of your preferred language to invoke them programmatically:
SQL over your distributed app status
By now, you may be wondering what’s powering all of this. Restate Servers expose all persisted events and their relationships as SQL tables: state, journals, deployments, etc. This also includes non-persisted state such as failures, error causes, and retry counts. We have leveraged the excellent Apache DataFusion library, and integrated it with our distributed runtime to power this introspection API, giving you SQL over the distributed status of your application!
The information rendered in the UI is powered simply by SQL queries behind the scenes. For example, to display the invocations that haven’t had a step transition for a while, the UI issues the following query:
SELECT * FROM sys_invocation WHERE modified_at < '2025-02-18T12:00:00.000Z' AND status != 'completed'
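To see what this filter selects, the sketch below runs the same `WHERE` clause against a toy table. Several things here are assumptions for illustration: SQLite stands in for the DataFusion-backed introspection API, the table has only three columns with timestamps stored as ISO-8601 text, the invocation IDs and the `running` and `suspended` status labels are made up, and the cutoff is computed as "one hour ago" from a fixed clock.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def iso_millis(moment: datetime) -> str:
    """Format a timestamp like the literal in the query, e.g. 2025-02-18T12:00:00.000Z."""
    utc = moment.astimezone(timezone.utc)
    return utc.isoformat(timespec="milliseconds").replace("+00:00", "Z")


# Fixed "now" so the example is reproducible.
now = datetime(2025, 2, 18, 13, 0, tzinfo=timezone.utc)
cutoff = iso_millis(now - timedelta(hours=1))

conn = sqlite3.connect(":memory:")
# Toy stand-in for sys_invocation; the real table has more columns.
conn.execute("CREATE TABLE sys_invocation (id TEXT, status TEXT, modified_at TEXT)")
conn.executemany(
    "INSERT INTO sys_invocation VALUES (?, ?, ?)",
    [
        ("inv_a", "running", "2025-02-18T11:15:00.000Z"),    # stale, not done
        ("inv_b", "completed", "2025-02-18T10:00:00.000Z"),  # stale, but done
        ("inv_c", "suspended", "2025-02-18T12:30:00.000Z"),  # recent progress
    ],
)

# Same predicate as the query above, with the cutoff bound as a parameter.
stalled = conn.execute(
    "SELECT id FROM sys_invocation "
    "WHERE modified_at < ? AND status != 'completed' ORDER BY id",
    (cutoff,),
).fetchall()
print(cutoff, stalled)
```

Only `inv_a` is returned: it was last modified before the cutoff and has not completed. Comparing the timestamps as text works in this toy setup because they all share the same fixed-width UTC format.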
We will share more about our journey of implementing real-time observability with DataFusion in another blog post, so stay tuned!
Try it out yourself!
We have been looking forward to having a UI ever since we started Restate, so we are excited to see this first version out there! The data stored in Restate is super powerful for understanding the behavior of your application and this is only the beginning. We will be improving and extending the UI further in the upcoming releases!
The UI is bundled together with the Restate Server. You will find the UI on port 9070 (http://localhost:9070). Follow the Restate quickstart and take it for a spin.
Star the GitHub project, and join the community on Discord or Slack. Be the first to know about new releases and technical posts (like the upcoming deep-dive on Restate’s architecture) by following us on X, LinkedIn, Bluesky, or subscribing to email updates.