Graceful cancellations: How to keep your application and workflow state consistent 💪

Posted January 24, 2024 by Till Rohrmann ‐ 5 min read

Restate allows you to build modern distributed applications that are resilient against all kinds of infrastructure faults. The basic building block to express your application is stateful services that can interact with each other and the outside world. Restate guarantees that invocations run to completion and that the state stays consistent even in the presence of infrastructure faults. It achieves this by capturing state changes, code execution progress, and effects in a single, durable log, and by reliably delivering requests to their destination.

Restate overview

Because every invocation holds a lock on its state and will eventually complete, the overall application state will transition from one valid state into the next. But what happens if you want to stop a service invocation because it is taking too long to complete or is no longer of interest? Ideally, you could interrupt it and undo all the changes the service invocation has made so far. With Restate’s new graceful cancellation feature this is now possible.

To make this a bit more concrete, let’s assume we want to build a travel booking application. The core of our application is the Travel service that first tries to book the flight, then rents a car, and finally processes the customer’s payment before confirming the flight and car rental.

const reserveTrip = async (ctx: restate.RpcContext, tripID: string) => {
	// set up RPC clients
    const flights = ctx.rpc(flightsService);
    const carRentals = ctx.rpc(carRentalService);
    const payments = ctx.rpc(paymentsService);

    const flightBooking = await flights.reserve(tripID);
    const carBooking = await carRentals.reserve(tripID);
    const payment = await payments.process(tripID);

    // confirm the flight and car
    await flights.confirm(tripID, flightBooking);
    await ctx.rpc(carRentalService).confirm(tripID, carBooking);
}

If this reminds you of a workflow, then you are not mistaken. In Restate, every service can act as a coordinator for other services. Depending on the business logic, an invocation can last from a few milliseconds up to many months if not years. That’s the power of Restate!

Now let us assume that we want to offer our customers a way to cancel their booking. If a user cancels her booking while the Travel service is still trying to rent a car, then, ideally, we would be able to cancel the invocation exactly at this point so that we don’t start processing the payment. Moreover, we need to cancel the already booked flight. Otherwise, the airline might charge us for the plane ticket and we remain stuck with the costs.

Graceful cancellations were introduced in Restate 0.7, and allow you to stop an ongoing invocation. Restate cancels an invocation by recursively canceling all children of the invocation (direct calls, sleeps, and awakeables). Once the child invocations have been canceled, they respond with a TerminalError.

Cancellation happens recursively, starting with the leaves of the call tree.

On receiving the TerminalError Restate allows running compensating actions for an invocation. The compensating actions can undo steps that have been completed before. The cool thing is that these compensations are executed with the same guarantees as the service code. So Restate ensures that the compensations will complete. As part of these compensations, it is for example possible to reset the state of the service, call other services to undo previously executed calls, or run sideEffects to delete previously inserted rows in a database.

So how would our example code from above look like if we wanted to keep the application state consistent in the presence of graceful cancellations?

const reserveTrip = async (ctx: restate.RpcContext, tripID: string) => {
    // set up RPC clients
    const flights = ctx.rpc(flightsService);
    const carRentals = ctx.rpc(carRentalService);
    const payments = ctx.rpc(paymentsService);

    // create an compensation stack
    const compensations = [];
    try {
        // call the flights Lambda to reserve, keeping track of how to cancel
        const flightBooking = await flights.reserve(tripID);
        compensations.push(() => flights.cancel(tripID, flightBooking));

        // RPC the rental service to reserve, keeping track of how to cancel
        const carBooking = await carRentals.reserve(tripID);
        compensations.push(() => carRentals.cancel(tripID, carBooking));

        // RPC the payments service to process, keeping track of how to refund
        const payment = await payments.process(tripID);
        compensations.push(() => payments.refund(tripID, payment));

        // confirm the flight and car
        await flights.confirm(tripID, flightBooking);
        await carRentals.confirm(tripID, carBooking);
    } catch (e) {
        // undo all the steps up to this point by running the compensations
        for (const compensation of compensations.reverse()) {
            await compensation();
        }

        // exit with an error
        throw new restate.TerminalError(`Travel reservation failed with err '${e}'`);
    }
}

You can find the full example code in our examples repository. It is also available in Java!

After every successful step, we register a compensation which will undo the effect in case of a failure (see the catch block). Because these compensations are guaranteed to complete, we can be sure that the overall system will return to a consistent state in the event of errors. Also, note that the code only contains pure business logic and no handling of infrastructure failures because Restate will take care of that for you.

An interesting observation is that the code snippet above can handle any TerminalErrors from called services (e.g. also those used for control flow logic). The cancellation mechanism was designed to blend in naturally with the general error propagation mechanism. So cancellations and TerminalErrors follow the same compensation steps to keep the application state consistent.

Restate will always try to run invocations to completion, but in some cases, this is just not possible. For example, if the service deployment with your service code is no longer present, then Restate will be unable to complete an invocation and run its compensations. Similarly, if you deploy new code that is incompatible with the journal of in-flight invocations, then Restate will not be able to make progress anymore. Instead Restate will keep retrying those invocations infinitely until the problem gets fixed or the user manually intervenes. For these situations, Restate allows to kill an invocation which immediately stops it without running its compensations. The callers of this invocation will still receive a TerminalError which they can handle appropriately (e.g. running their compensations).

You can use our new CLI to cancel or kill an invocation as follows:

# graceful cancellation
restate inv cancel [INVOCATION_ID]
# hard kill
restate inv cancel --kill [INVOCATION_ID]

If this sounds too good to be true, then give Restate a spin and try it yourself. You can get the binaries via our download page and with our quickstart you are up and running in only a few minutes.


Join the community and help shape Restate!

Join our Discord!