Coralogix builds autonomous AI observability agent with Restate

Observability & Monitoring | TypeScript | Next.js | Vercel AI SDK | Postgres / pgvector | ElectricSQL
Coralogix is an observability platform. They built Olly, an autonomous AI observability agent that interprets natural-language questions and pulls insights directly from logs, metrics, traces, and alerts.

Restate powers the resiliency of Olly's agentic workflows and the complex background jobs behind it — including building the semantic layer for each customer's data and indexing code repositories.

We've implemented retry logic and durable execution from scratch at least 10 times throughout our careers. With Restate, we finally have infrastructure that just handles it. It's a waste of time to build this again.

-- Alon Gubkin, Engineering Lead at Coralogix

Before Restate: Architecture & Challenges

Coralogix wanted to build an AI-powered log analysis tool for their observability platform. The tool depended on complex background processing:

  • Building semantic layers for each customer's data schema
  • Indexing and embedding large code repositories

They also needed the system to be resilient:

  • Processing gigabytes of log data without re-analyzing already-processed data after a failure
  • Maintaining state across long-running workflows

If I'm indexing a 10GB knowledge base and it fails in the middle, I need to continue from where it failed. I need retries with exponential backoff. We've built this logic many times before — it's always a lot of work.

-- Alon Gubkin, Engineering Lead at Coralogix
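
To make the two requirements in the quote concrete, here is a minimal TypeScript sketch of checkpointed processing combined with exponential backoff. It is not Coralogix's code and does not use Restate's API; every name in it (backoffDelayMs, Checkpoint, indexFromCheckpoint, runWithRetries) is hypothetical, the checkpoint is a plain in-memory object standing in for durable storage, and the retry loop records the delay it would wait rather than actually sleeping.

```typescript
// Hypothetical sketch: these names are illustrative, not Restate's API.

/** Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs. */
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 30000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

/** Progress marker. A real system would keep this in durable storage. */
interface Checkpoint {
  nextChunk: number;
}

/**
 * Process chunks starting at the checkpoint. The checkpoint advances only
 * after a chunk succeeds, so a rerun after a failure skips finished chunks.
 */
function indexFromCheckpoint(
  chunks: string[],
  checkpoint: Checkpoint,
  processChunk: (chunk: string) => void,
): void {
  while (checkpoint.nextChunk < chunks.length) {
    processChunk(chunks[checkpoint.nextChunk]); // may throw
    checkpoint.nextChunk += 1; // record progress only on success
  }
}

/** Rerun until done. Returns the backoff delay chosen before each retry. */
function runWithRetries(
  chunks: string[],
  checkpoint: Checkpoint,
  processChunk: (chunk: string) => void,
  maxAttempts = 5,
): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      indexFromCheckpoint(chunks, checkpoint, processChunk);
      return delays;
    } catch (err) {
      // A real system would sleep for this long; the sketch only records it.
      delays.push(backoffDelayMs(attempt));
    }
  }
  throw new Error("gave up after " + maxAttempts + " attempts");
}
```

If processChunk fails on the third of four chunks, the next attempt starts at the third chunk rather than the first. Every retry loop of this kind also needs the checkpoint to survive a process crash, which is the part a Durable Execution framework takes over.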

The team started looking for a Durable Execution framework that could give them this resilience without rebuilding it from scratch again.

Why Restate?

Restate gave the team what they were looking for without the complexity they wanted to avoid:

  • Serverless support: Restate lets them deploy business logic as serverless functions.
  • Self-hosting: critical for handling sensitive customer log data and controlling costs at scale, and something the competing platforms they evaluated didn't offer.
  • Long-running workflows: workflows can run for hours or days, suspending during external waits (LLM calls, retries with backoff, scheduled delays) and resuming with their state preserved.
  • Programming model: HTTP handlers with normal control flow (if/else, loops, etc.), giving the team the flexibility to implement customized agentic logic and workflows.
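
The idea behind the last two points can be sketched in a few lines of TypeScript: ordinary control flow, with each side effect wrapped in a named step whose result is journaled so that a rerun replays recorded results instead of repeating the work. This is a simplified illustration under stated assumptions, not Restate's SDK or its internals. DurableContext, Journal, answerQuestion, and the fetchLogs and summarize callbacks are all hypothetical, and the journal here is an in-memory Map rather than durable storage.

```typescript
// Hypothetical sketch of step journaling; not Restate's API or internals.

/** Results of completed steps, keyed by step name. Stands in for durable storage. */
type Journal = Map<string, unknown>;

class DurableContext {
  private readonly journal: Journal;

  constructor(journal: Journal) {
    this.journal = journal;
  }

  /** Run `action` once; on a rerun, return the recorded result instead. */
  run<T>(name: string, action: () => T): T {
    if (this.journal.has(name)) {
      return this.journal.get(name) as T;
    }
    const result = action();
    this.journal.set(name, result); // recorded only if the action succeeded
    return result;
  }
}

/** Plain if/else and a loop; each side effect is a named, journaled step. */
function answerQuestion(
  ctx: DurableContext,
  question: string,
  fetchLogs: (q: string) => string[],
  summarize: (lines: string[]) => string,
): string {
  const lines = ctx.run("fetch-logs", () => fetchLogs(question));
  if (lines.length === 0) {
    return "no matching logs";
  }
  const summaries: string[] = [];
  for (let i = 0; i < lines.length; i += 2) {
    const batch = lines.slice(i, i + 2);
    summaries.push(ctx.run("summarize-" + i, () => summarize(batch)));
  }
  return summaries.join("; ");
}
```

Calling answerQuestion a second time with the same journal after a failure skips every step that already completed, so an expensive call such as an LLM request is not repeated. Step names must be stable across reruns for the replay to line up.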

Restate is much better from an engineering perspective. It's more flexible than alternatives, and the self-hosting option was crucial for us.

-- Alon Gubkin, Engineering Lead at Coralogix

The Results

Coralogix migrated their data processing pipelines and agentic workflows to Restate. The key capabilities they now rely on:

  • Durable Execution: the semantic-layer pipelines use Restate to persist progress when building comprehensive knowledge bases for the agent, with embeddings stored in Postgres. When failures occur, pipelines automatically resume exactly where they left off.
  • Agentic workflows: the team integrated the Vercel AI SDK with Restate to persist LLM responses and manage token updates in Postgres, with results streamed to users via ElectricSQL for real-time updates.
  • Resilient background processing: all agentic workflows run asynchronously through Restate, letting users reconnect to in-progress workflows at any time — even after accidentally closing their browser or losing connection.
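
The reconnect behavior in the last point rests on one idea: the workflow's output is keyed by a workflow ID and stored independently of any client connection, so a returning client asks for everything after the last event it saw. The TypeScript below is a minimal sketch of that idea only. It is not how Restate or ElectricSQL implement it; WorkflowRegistry, WorkflowRecord, and their methods are hypothetical names, and the Map stands in for durable storage.

```typescript
// Hypothetical sketch: illustrative names, not Restate's or ElectricSQL's API.

interface WorkflowRecord {
  events: string[]; // output produced so far, in order
  done: boolean;
}

class WorkflowRegistry {
  // Stands in for durable storage that outlives any client connection.
  private readonly workflows = new Map<string, WorkflowRecord>();

  /** Start a workflow, or return the existing record if the ID is already known. */
  start(id: string): WorkflowRecord {
    let record = this.workflows.get(id);
    if (record === undefined) {
      record = { events: [], done: false };
      this.workflows.set(id, record);
    }
    return record;
  }

  /** Called by the background job as it makes progress. */
  emit(id: string, event: string): void {
    this.start(id).events.push(event);
  }

  /** Called by the background job when it finishes. */
  complete(id: string): void {
    this.start(id).done = true;
  }

  /** Called by a connecting or reconnecting client: events from `fromIndex` on. */
  attach(id: string, fromIndex = 0): WorkflowRecord {
    const record = this.start(id);
    return { events: record.events.slice(fromIndex), done: record.done };
  }
}
```

A client that saw two events before closing its browser would call attach with fromIndex set to 2 after reopening, and receive only what it missed along with the completion flag.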

When indexing gigabytes of logs, if something fails in the middle, we don't want to re-analyze the initial data. Restate handles this perfectly.

-- Alon Gubkin, Engineering Lead at Coralogix

With these pipelines in place, the team is now extending Olly from answering user questions to proactively analyzing telemetry on its own.

We want the agent to automatically go through logs and find potential issues. Building proactive agents is a very useful use case for Restate.

-- Alon Gubkin, Engineering Lead at Coralogix
