Coralogix builds autonomous AI observability agent with Restate
“We've implemented retry logic and durable execution from scratch at least 10 times throughout our careers. With Restate, we finally have infrastructure that just handles it. It's a waste of time to build this again.”
Alon Gubkin
Engineering Lead, Coralogix
Coralogix is an observability platform. They built Olly, an autonomous AI observability agent that interprets natural-language questions and pulls insights directly from logs, metrics, traces, and alerts.
Restate powers the resiliency of Olly's agentic workflows and the complex background jobs behind it — including building the semantic layer for each customer's data and indexing code repositories.
Before Restate: Architecture & Challenges
Coralogix wanted to build an AI-powered log analysis tool for their observability platform, and that tool required complex background processing:
- Building semantic layers for each customer's data schema
- Indexing and embedding large code repositories
They also needed the system to be resilient:
- Processing gigabytes of log data without re-analyzing it after a failure
- Maintaining state across long-running workflows
If I'm indexing a 10GB knowledge base and it fails in the middle, I need to continue from where it failed. I need retries with exponential backoff. We've built this logic many times before — it's always a lot of work.
-- Alon Gubkin, Engineering Lead at Coralogix
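To make the requirement concrete, here is a minimal TypeScript sketch of the kind of hand-rolled logic the quote describes: progress is checkpointed per chunk, a failing chunk is retried with exponential backoff, and a rerun resumes from the checkpoint instead of starting over. It is an illustration under stated assumptions, not Coralogix's code: the names `Checkpoint`, `backoffDelayMs`, and `indexWithResume` are hypothetical, the checkpoint lives in memory rather than durable storage, and the backoff delays are recorded rather than slept so the example runs synchronously.

```typescript
// Hypothetical hand-rolled resumable indexing with exponential backoff.
// Not Coralogix's or Restate's code; all names here are illustrative.

type Checkpoint = { nextChunk: number };

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 100, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Indexes chunks starting at checkpoint.nextChunk and advances the checkpoint
// only after a chunk succeeds. A failing chunk is retried up to maxAttempts
// times; the backoff delays are appended to `delays` instead of being slept.
function indexWithResume(
  chunks: string[],
  checkpoint: Checkpoint,
  indexChunk: (chunk: string) => void,
  maxAttempts = 5,
  delays: number[] = [],
): Checkpoint {
  while (checkpoint.nextChunk < chunks.length) {
    const chunk = chunks[checkpoint.nextChunk];
    for (let attempt = 0; ; attempt++) {
      try {
        indexChunk(chunk);
        break; // chunk succeeded, stop retrying
      } catch (err) {
        // Out of attempts: give up, leaving the checkpoint on this chunk
        // so a later run resumes here rather than at chunk 0.
        if (attempt + 1 >= maxAttempts) throw err;
        delays.push(backoffDelayMs(attempt));
      }
    }
    // A real system would persist this durably (e.g. in a database).
    checkpoint.nextChunk++;
  }
  return checkpoint;
}

// Usage: chunk "b" fails twice with a transient error, then succeeds.
let transientFailures = 2;
const delays: number[] = [];
const processed: string[] = [];
indexWithResume(["a", "b", "c"], { nextChunk: 0 }, (chunk) => {
  if (chunk === "b" && transientFailures-- > 0) throw new Error("transient");
  processed.push(chunk);
}, 5, delays);
console.log(processed, delays); // [ 'a', 'b', 'c' ] [ 100, 200 ]
```

Even this toy version has to get several details right (where the checkpoint advances, when to give up, how delays grow), and a production version must also store the checkpoint durably and keep it consistent with the work already done. This is the logic the team did not want to write yet again.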
The team started looking for a Durable Execution framework that could give them this resilience without having to build it from scratch yet again.
Why Restate?
Restate gave the team what they were looking for without the complexity they wanted to avoid:
- Serverless support: Restate lets them deploy business logic as serverless functions.
- Self-hosting: critical for handling sensitive customer log data and controlling costs at scale, and something the competing platforms they evaluated didn't offer.
- Long-running workflows: workflows can run for hours or days, suspending during external waits (LLM calls, retries with backoff, scheduled delays) and resuming with their state preserved.
- Programming model: HTTP handlers with normal control flow (if/else, loops, etc.), giving the team the flexibility to implement customized agentic logic and workflows.
Restate is much better from an engineering perspective. It's more flexible than alternatives, and the self-hosting option was crucial for us.
-- Alon Gubkin, Engineering Lead at Coralogix
The Results
Coralogix migrated their data processing pipelines and agentic workflows to Restate. The key capabilities they now rely on:
- Durable Execution: the semantic-layer pipelines use Restate to persist progress when building comprehensive knowledge bases for the agent, with embeddings stored in Postgres. When failures occur, pipelines automatically resume exactly where they left off.
- Agentic workflows: the team integrated the Vercel AI SDK with Restate to persist LLM responses and manage token updates in Postgres, with results streamed to users in real time via ElectricSQL.
- Resilient background processing: all agentic workflows run asynchronously through Restate, letting users reconnect to in-progress workflows at any time — even after accidentally closing their browser or losing connection.
When indexing gigabytes of logs, if something fails in the middle, we don't want to re-analyze the initial data. Restate handles this perfectly.
-- Alon Gubkin, Engineering Lead at Coralogix
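Resuming "exactly where they left off" rests on the idea of a journal: each completed step's result is recorded, and when the workflow is retried, recorded steps are replayed from the journal instead of being executed again. The TypeScript sketch below is a simplified, in-memory model of that idea, assuming a positional journal and synchronous steps. It is not Restate's implementation or API: in Restate's SDK, steps are wrapped in a context call such as `ctx.run`, and the journal is persisted by Restate rather than held in memory. The names `Journal`, `runDurably`, and `analyzeBatches` are hypothetical.

```typescript
// Toy, in-memory model of journaled (durable) execution.
// Not Restate's implementation or API; names are hypothetical.

class Journal {
  private entries: unknown[] = [];
  private cursor = 0;

  // Runs `action` once; on later attempts, returns the recorded result instead.
  run<T>(action: () => T): T {
    if (this.cursor < this.entries.length) {
      return this.entries[this.cursor++] as T; // replay a completed step
    }
    const result = action(); // if this throws, nothing is recorded
    this.entries.push(result); // "persist" the result before moving on
    this.cursor++;
    return result;
  }

  // Rewinds the replay position before a new attempt; recorded entries are kept.
  rewind(): void {
    this.cursor = 0;
  }
}

// Retries the whole workflow; completed steps are replayed, not re-executed.
function runDurably<T>(journal: Journal, workflow: (j: Journal) => T, maxAttempts = 3): T {
  let attempt = 0;
  while (true) {
    attempt++;
    journal.rewind();
    try {
      return workflow(journal);
    } catch (err) {
      if (attempt >= maxAttempts) throw err;
    }
  }
}

// Hypothetical pipeline: analyze each log batch, summing a per-batch score.
function analyzeBatches(journal: Journal, batches: string[], analyze: (b: string) => number): number {
  let sum = 0;
  for (const batch of batches) {
    sum += journal.run(() => analyze(batch));
  }
  return sum;
}

// Usage: batch "b2" fails once (e.g. a timeout); batch "b1" is not re-analyzed.
const analyzed: string[] = [];
let failNext = true;
const total = runDurably(new Journal(), (j) =>
  analyzeBatches(j, ["b1", "b2", "b3"], (batch) => {
    analyzed.push(batch);
    if (batch === "b2" && failNext) {
      failNext = false;
      throw new Error("timeout");
    }
    return batch.length;
  }),
);
console.log(total, analyzed); // 6 [ 'b1', 'b2', 'b2', 'b3' ]
```

Because entries are matched by position, replay only works if the workflow issues the same steps in the same order on every attempt; durable execution engines generally require workflow code to be deterministic for this reason.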
With these pipelines in place, the team is now extending Olly from answering user questions to proactively analyzing telemetry on its own.
We want the agent to automatically go through logs and find potential issues. Building proactive agents is a very useful use case for Restate.
-- Alon Gubkin, Engineering Lead at Coralogix
More Customer Stories
Advisoa Achieves Zero-Error, Durable Fintech Workflows with Restate
Advisoa relies on Restate to power Paypilot's most critical, error-sensitive systems, such as onboarding and bookkeeping workflows.
Aient builds a production-aware AI DevOps agent on Restate
Aient turns runtime telemetry into merged code fixes — detecting production problems, finding the root cause, and opening pull requests automatically. Restate powers the agent harness, durable tool execution, and streaming.
Deliveru builds serverless AI-powered recruiting platform on Restate
Learn how Deliveru built a serverless recruiting platform on Restate Cloud and AWS Lambda that automates candidate screening and document processing.