The simplest way to write workflows as code

Posted July 29, 2024 by Giselle van Dongen ‐ 10 min read

TL;DR: Restate lets you write lightweight, flexible, durable workflows-as-code. Combine the resiliency of workflows with the speed and flexibility of regular functions. Restate orchestrates and manages their execution until completion, whether it’s millis or millennia.


Workflows are a powerful tool for writing business processes, cloud orchestration, and data transformations. You express them as a set of steps, and a workflow orchestrator ensures those steps are executed one by one, with retries and recovery per step. Many use cases benefit from these semantics, like order processing flows, user sign-up flows, infrastructure provisioning, logistics, file processing, etc. But workflow orchestrators come with sacrifices. Self-hosting them requires running a database and dedicated workflow executors. You often need to express your workflow in a non-transparent, hard-to-debug configuration language (nobody likes coding in YAML), and the latency impact makes you reconsider workflows for many use cases.

Restate lets you execute workflows via its Durable Execution mechanism, giving you some unique properties. When code is executing, the process itself logs its progress in Restate. Restate then uses this information to drive retries and recovery, and to ensure that code always runs until completion. When services recover, they restore the results of already completed operations and code blocks without re-executing them. That way, functions continue from after the last completed step, just like a workflow. It’s as if you enable workflow-like semantics for any function in your application. No new quirky YAML expression languages to learn, no restrictions on how and where you run your services, and no extra infra needed to support workflows, as it’s already built into the Restate core.

Let’s have a look at a workflow implemented with Restate.

Workflows with Restate

The example below shows a sign-up workflow. It has a run handler which implements the workflow. For each incoming user sign up, the workflow creates a user entry in a system, then sends an email to the user with a link that needs to be clicked to verify the email address. If this succeeds, then the sign-up was successful.

import * as restate from "@restatedev/restate-sdk";

const signUpWorkflow = restate.workflow({
    name: "signup",
    handlers: {
        run: async (ctx: restate.WorkflowContext, user: { name: string, email: string }) => {

            // Durably executed action; write to other system
            await ctx.run(() => createUserEntry(user))

            // Store some K/V state; can be retrieved from other handlers
            ctx.set("onboarding_status", "Created user");

            // Send user email with verification link
            const secret = ctx.rand.uuidv4();
            await ctx.run(() => sendEmailWithLink({ email: user.email, secret }));
            ctx.set("onboarding_status", "Verifying user");

            // Wait until user clicked email verification link
            // Resolved or rejected by the other handlers
            const clickSecret = await ctx
                .promise<string>("email.clicked");
            ctx.set("onboarding_status", "Link clicked");

            return clickSecret === secret;
        },

        click: (ctx: restate.WorkflowSharedContext, secret: string) =>
            // Resolve the promise with the result secret
            ctx.promise<string>("email.clicked")
               .resolve(secret),

        getStatus: (ctx: restate.WorkflowSharedContext) =>
            // Get the onboarding status of the user
            ctx.get<string>("onboarding_status"),
    },
});

export type SignUpWorkflow = typeof signUpWorkflow;

restate.endpoint().bind(signUpWorkflow).listen();

@Workflow
public class SignupWorkflow {

    // References to K/V state and promises stored in Restate
    private static final DurablePromiseKey<String> EMAIL_CLICKED =
            DurablePromiseKey.of("email_clicked", JsonSerdes.STRING);
    private static final StateKey<String> ONBOARDING_STATUS =
            StateKey.of("status", JsonSerdes.STRING);

    @Workflow
    public boolean run(WorkflowContext ctx, User user) {

        // Durably executed action; write to other system
        ctx.run(() -> createUserEntry(user));

        // Store some K/V state; can be retrieved from other handlers
        ctx.set(ONBOARDING_STATUS, "Created user");

        // Send user email with verification link
        String secret = ctx.random().nextUUID().toString();
        ctx.run(() -> sendEmailWithLink(user.getEmail(), secret));
        ctx.set(ONBOARDING_STATUS, "Verifying user");

        // Wait until user clicked email verification link
        // Resolved or rejected by the other handlers
        String clickSecret =
                ctx.promise(EMAIL_CLICKED)
                        .awaitable()
                        .await();
        ctx.set(ONBOARDING_STATUS, "Link clicked");

        return clickSecret.equals(secret);
    }


    @Shared
    public void click(SharedWorkflowContext ctx, String secret) {
        // Resolve the promise with the result secret
        ctx.promiseHandle(EMAIL_CLICKED).resolve(secret);
    }

    @Shared
    public String getStatus(SharedWorkflowContext ctx) {
        // Get the onboarding status of the user
        return ctx.get(ONBOARDING_STATUS).orElse("Unknown");
    }


    public static void main(String[] args) {
        RestateHttpEndpointBuilder.builder()
                .bind(new SignupWorkflow())
                .buildAndListen();
    }
}

Flexibility of code

The first thing you notice is that the workflow is implemented in pure code. No hard-to-debug YAML/JSON configuration files. Instead, workflows run like normal services with the Restate SDK embedded as a library. The Restate SDK notifies a central Restate Server of the progress the handler is making. As you see here, the code uses a WorkflowContext. Every time this context is used (e.g. ctx.run), the SDK sends an event to the Restate Server. Once an event is persisted in the Server, the corresponding step will not re-execute on retries. It gets skipped and the previous result is returned by reading it from the event log (read more).
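To make the skip-on-retry behavior concrete, here is a minimal sketch of the idea in plain TypeScript. It is a toy in-memory model, not the Restate SDK or its actual implementation: ToyContext, JournalEntry, and the simulated email outage are all made up for illustration, and the journal lives in a local array rather than in the Restate Server.

```typescript
// Toy in-memory model of a durable-execution journal (NOT the Restate SDK).
type JournalEntry = { name: string; result: unknown };

class ToyContext {
  private position = 0;
  private journal: JournalEntry[];

  constructor(journal: JournalEntry[]) {
    this.journal = journal;
  }

  // Run a step once; on replay, return the recorded result instead of re-executing.
  run<T>(name: string, action: () => T): T {
    const recorded = this.journal[this.position];
    if (recorded !== undefined) {
      if (recorded.name !== name) {
        throw new Error(`journal mismatch: expected ${recorded.name}, got ${name}`);
      }
      this.position++;
      return recorded.result as T;
    }
    const result = action(); // if this throws, nothing is recorded for the step
    this.journal.push({ name, result });
    this.position++;
    return result;
  }
}

// Hypothetical side effects, counted so we can see how often each really ran.
const calls = { createUser: 0, sendEmail: 0 };
let emailServiceDown = true;

function signUp(ctx: ToyContext): string {
  const userId = ctx.run("createUser", () => {
    calls.createUser++;
    return "user-1";
  });
  ctx.run("sendEmail", () => {
    calls.sendEmail++;
    if (emailServiceDown) throw new Error("email service unavailable");
    return true;
  });
  return userId;
}

// The journal outlives individual attempts, like the log kept by the server.
const journal: JournalEntry[] = [];

let firstAttemptFailed = false;
try {
  signUp(new ToyContext(journal));
} catch {
  firstAttemptFailed = true;
}

// Retry after the email service recovers: createUser is read from the journal,
// sendEmail is executed again because it never completed.
emailServiceDown = false;
const userId = signUp(new ToyContext(journal));
console.log(userId, calls);
```

The first attempt fails while sending the email. The retry reads the createUser result from the journal, so that step executes only once, while sendEmail executes a second time and is then recorded.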

With this mechanism, we can write the sign-up workflow in just code, by wrapping key steps of the workflow in ctx.something commands, for example, calls to other systems and services, timers, results of code blocks, etc. Every step becomes durable, and gets skipped or retried on failures.

Being able to write workflows in code gives a lot of flexibility because you can do things like:

  • Parallelize work via a simple for loop which schedules work. Each of the scheduled tasks is guaranteed to run till the end. You can also delay tasks.
  • Flexible failure handling and rollbacks by recording compensations in your try block and running them in the catch block.
  • Time-based escalation by sleeping or delaying service calls. Restate tracks timers and makes sure they fire.
  • Use your regular testing and debugging tools.
  • And anything else you can pour into code.
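As an illustration of the compensation bullet above, the following sketch shows the try/catch pattern in plain TypeScript. The trip-booking steps and their names are hypothetical and there is no Restate SDK involved; in a real workflow, each step and each undo action would presumably be wrapped in ctx.run so that it becomes durable.

```typescript
// Plain TypeScript sketch of the compensation pattern; all names are hypothetical.
type Compensation = () => void;

const log: string[] = [];

function reserveFlight(): void { log.push("flight reserved"); }
function cancelFlight(): void { log.push("flight cancelled"); }
function reserveHotel(): void { log.push("hotel reserved"); }
function cancelHotel(): void { log.push("hotel cancelled"); }
function chargeCard(): void { throw new Error("card declined"); }

function bookTrip(): boolean {
  const compensations: Compensation[] = [];
  try {
    // Register the undo action before each step, so a step that fails halfway
    // is also rolled back. The undo actions must tolerate a missing reservation.
    compensations.push(cancelFlight);
    reserveFlight();

    compensations.push(cancelHotel);
    reserveHotel();

    chargeCard(); // fails in this sketch
    return true;
  } catch {
    // Undo the recorded steps in reverse order.
    for (const undo of compensations.reverse()) undo();
    return false;
  }
}

const booked = bookTrip();
console.log(booked, log);
```

Because the compensations are ordinary values in an ordinary array, you can build them up conditionally, skip some, or reorder them with regular control flow.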

💡 Workflows as regular, lightweight functions

Restate’s workflow functionality is integrated in its core. You don’t need to spin up any extra infrastructure. The Restate Server runs as a single binary. You can run it locally in a few seconds with a single command, self-host it on a container platform, or get started immediately with the Restate Cloud free tier. Your workflows can also run anywhere: locally, on K8s, FaaS, or mix-and-match.

Low-latency Durable Execution

The Restate Server logs all events that come in from your workflows and services. It’s like an event broker with some add-ons for storing K/V state and tracking timers. Restate has an event-driven foundation that is designed to process events at high speed. Submitting a workflow is just an RPC call that gets proxied and persisted by Restate, and executed immediately.

This means that you can use Restate workflows in the latency-sensitive paths of your application, which is something that workflows usually weren’t considered for. This opens up a box of use cases that benefit from snappy interaction, such as using workflows to track, reserve, and expire shopping cart items.

Durable building blocks

To make it easy to express workflows, the SDK provides a distributed, durable version of common workflow building blocks.

Orchestration

The SDK lets you call other services or workflows via durable RPC calls or messages. You can optionally add a delay to the request to schedule tasks for later. Restate handles retries and recovery for the nested invocations.

Signaling workflows

An example of a common action is signaling the workflow. In the example, we want to notify the run handler when the email link has been clicked. We implemented this via Restate’s Durable Promises. The run handler creates a Promise which then gets resolved via the click handler. This promise is tracked by Restate and recovered on failures.
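The signaling pattern can be pictured as a promise that two handlers find by name. The sketch below is a toy in-memory model in plain TypeScript, not the Restate SDK: NamedPromises and onResolved are hypothetical names, nothing here is persisted, and keeping only the first resolution is a choice of this sketch rather than a statement about Restate's behavior.

```typescript
// Toy model of a named promise shared between handlers (NOT the Restate SDK).
class NamedPromises {
  private values = new Map<string, string>();
  private waiters = new Map<string, Array<(value: string) => void>>();

  // Called by the waiting side; fires immediately if the promise is already resolved.
  onResolved(name: string, callback: (value: string) => void): void {
    const value = this.values.get(name);
    if (value !== undefined) {
      callback(value);
      return;
    }
    const list = this.waiters.get(name) ?? [];
    list.push(callback);
    this.waiters.set(name, list);
  }

  // Called by the signaling side; in this toy model the first resolution wins.
  resolve(name: string, value: string): void {
    if (this.values.has(name)) return;
    this.values.set(name, value);
    for (const callback of this.waiters.get(name) ?? []) callback(value);
    this.waiters.delete(name);
  }
}

const promises = new NamedPromises();
const secret = "s3cr3t";
const outcomes: boolean[] = [];

// "run" handler side: wait for the click and compare the secrets.
promises.onResolved("email.clicked", (clicked) => {
  outcomes.push(clicked === secret);
});

// "click" handler side: resolve the promise with the secret from the link.
promises.resolve("email.clicked", "s3cr3t");

// A second click is ignored because the promise is already resolved.
promises.resolve("email.clicked", "something-else");
console.log(outcomes);
```

The two sides never reference each other directly. They only agree on the promise name, which is what lets the click arrive through a separate handler call.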

Querying workflows

We can also query the workflow via the getStatus handler. The run handler stores K/V state in Restate via ctx.set. The getStatus handler exposes this state to the outside world. State updates take part in durable execution, so they happen exactly once. You can store any K/V pair in Restate.

Have a look at the documentation (TS/Java) to discover the other SDK features.

SDK clients

Restate supplies SDK clients to submit workflows. You can also use this to retrieve their results later on, or to re-attach if the caller lost the connection:

import * as clients from "@restatedev/restate-sdk-clients";

const rs = clients.connect({url: "http://localhost:8080"});
// "someone" is the workflow ID; myUser is the input to the run handler
await rs.workflowClient<SignUpWorkflow>({name: "signup"}, "someone")
    .workflowSubmit(myUser);

// Do something else, with workflow running in the background

// Attach back to the workflow
const result = await rs.workflowClient<SignUpWorkflow>({name: "signup"}, "someone")
    .workflowAttach();


Client restate = Client.connect("http://localhost:8080");
SendResponse handle =
    SignupWorkflowClient.fromClient(restate, "someone")
        .submit(myUser);
        
        
// Do something else, with workflow running in the background
        
// Attach and wait for result
boolean result =
    SignupWorkflowClient.fromClient(restate, "someone")
        .workflowHandle()
        .attach();


Server(less)

Some workflows run for a long time. In our example, we are waiting for a user to click a link. They could do this right away, but also tomorrow or next week. This doesn’t work well with Function-as-a-Service platforms such as AWS Lambda, where you get charged for execution time. Restate’s Durable Execution mechanism can help us out in an interesting way here. We can let the function register what it’s waiting for in Restate, and then artificially crash. Restate then does the waiting, and invokes the function again when the wait is over (sleep has been completed, call response received, etc.). On the re-invoke, Restate fast-forwards the function to where it was and lets it continue from there on.
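The suspend-and-resume idea can be sketched in a few lines of plain TypeScript. This is a toy model under simplifying assumptions, not how the Restate SDK is implemented: ToyRuntime, awaitSignal, and the SUSPENDED marker are hypothetical, the journal is an in-memory Map, and the "server" is just a function that calls the handler again.

```typescript
// Toy model of suspend-and-resume (NOT the Restate SDK).
const SUSPENDED = { reason: "waiting for a signal" };

class ToyRuntime {
  // Completed steps and received signals, keyed by name.
  journal = new Map<string, unknown>();
  invocations = 0;

  // Execute a step once; on later invocations, return the recorded result.
  run<T>(step: string, action: () => T): T {
    if (this.journal.has(step)) return this.journal.get(step) as T;
    const result = action();
    this.journal.set(step, result);
    return result;
  }

  // Return the signal's value if it has arrived, otherwise end this invocation.
  awaitSignal(name: string): string {
    if (!this.journal.has(name)) throw SUSPENDED;
    return this.journal.get(name) as string;
  }
}

const emailsSent: string[] = [];

function signUp(rt: ToyRuntime): boolean {
  const secret = rt.run("secret", () => "s3cr3t");
  rt.run("sendEmail", () => {
    emailsSent.push(secret);
    return true;
  });
  const clicked = rt.awaitSignal("email.clicked"); // may suspend here
  return clicked === secret;
}

// "Server" side: invoke the handler; undefined means the handler suspended.
function invoke(rt: ToyRuntime): boolean | undefined {
  rt.invocations++;
  try {
    return signUp(rt);
  } catch (e) {
    if (e === SUSPENDED) return undefined; // nothing keeps running while we wait
    throw e;
  }
}

const rt = new ToyRuntime();
const first = invoke(rt); // suspends: the link has not been clicked yet
rt.journal.set("email.clicked", "s3cr3t"); // the click arrives, possibly days later
const second = invoke(rt); // replays the completed steps, then finishes
console.log(first, second, emailsSent.length, rt.invocations);
```

Between the two invocations no handler code is running at all, which is the property that matters on a platform that bills for execution time. The email is sent only once because the second invocation reads that step from the journal.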

With this ability in your toolbox, you can write functions in the same way as you would for long-running infrastructure (with sleeps, waiting for request-response calls, human-in-the-loop callbacks, etc.), and deploy them on AWS Lambda without any changes and without increasing costs. All it requires is changing a single line of code (TS/Java). Read more in this dedicated blog post.

Gradual adoption

Restate makes it easy to add Durable Execution to a workflow:

  • Your workflows run on the same platform as the rest of your services / functions.
  • You can start by wrapping the orchestration function of your application in a Restate handler. This already gives you retries, durability of the invocations, and idempotency.
  • You can gradually define more fine-grained durable steps within the workflow itself. For example, by wrapping some of the key parts in ctx.run.
  • You can invoke workflows either by HTTP, Kafka, or with client libraries.
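For the HTTP option in the list above, a request could be assembled roughly as follows. This sketch rests on assumptions that are not stated in this post: that the Restate ingress listens on http://localhost:8080 (as in the client examples) and that workflow handlers are addressed as /<workflow name>/<workflow ID>/<handler>. The helper names workflowUrl and signUpRequest are hypothetical, so check the Restate documentation for the exact HTTP API before relying on this.

```typescript
// Assumption: the ingress path layout is /<workflow name>/<workflow ID>/<handler>.
function workflowUrl(
  ingress: string,
  workflow: string,
  workflowId: string,
  handler: string
): string {
  const base = ingress.replace(/\/+$/, ""); // drop trailing slashes
  return `${base}/${encodeURIComponent(workflow)}/${encodeURIComponent(workflowId)}/${encodeURIComponent(handler)}`;
}

// Describe the request that starts the sign-up workflow for a given workflow ID.
// Pass the result to any HTTP client of your choice.
function signUpRequest(workflowId: string, user: { name: string; email: string }) {
  return {
    url: workflowUrl("http://localhost:8080/", "signup", workflowId, "run"),
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify(user),
  };
}

const request = signUpRequest("someone", { name: "Jane", email: "jane@example.com" });
console.log(request.url);
```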

You can use Restate for more than workflows. Optionally, you can also enable Durable Execution for the other services and functions in your application and get end-to-end durability and resilience. There are a few parts of the application for which Durable Execution can be particularly useful: async tasks, microservices orchestration, and even event processing from Kafka.

Use cases

Have a look at the food ordering example with Restate, including workflows, Kafka event processing, and a digital twin pattern, all fully resilient. Or try to build a Restate application yourself with the Tour of Restate.

What can you build with Workflows and Restate?

There is no real limit to what kind of workflows you can build with Restate, but here are a few ideas:

  • User sign up flows like the one we showed: Create user in the system, wait until email confirmation, schedule a reminder, send welcome email, etc.
  • Order processing and logistics (TS/Java): Handle the payment, request the order preparation, wait for driver acceptance callback, etc.
  • Infrastructure provisioning: Go through a set of steps to provision a setup. Retry until resources are up, handle timeouts, rollbacks, etc.
  • Workflow interpreters (TS): Dynamically compose workflows based on user input. For example, an image transformer that lets users specify the steps to be taken. Each workflow execution takes a different path.
  • Discuss your use case with us on Discord or Slack

Wrapping up

A short recap of the most important bits. Restate lets you write workflows as code and execute them as lightweight, flexible, durable handlers. You write your handlers in plain Java/TS and use a simple, intuitive SDK at the key points throughout your code that you don’t want to re-execute (e.g. non-deterministic code, interaction with other services/systems, etc.). The Restate Server is a single binary that records actions that were taken, and recovers services to where they were before the crash. Your workflows and services can run on K8s, AWS Lambda, or wherever they are running now. Restate’s event-driven foundation lets you put workflows in the latency-sensitive path of your applications and make these resilient out of the box. Get started by enabling Restate for the main orchestration flow of your application. Have a look at the documentation to discover more!