Building a Resilient Background Worker in Elixir (Without Overthinking It)

Background jobs sound simple — until they aren’t.

Retries pile up.
One failure takes down the whole worker pool.
Jobs disappear silently or run twice.

What I like about Elixir is that it encourages you to design for failure early, instead of patching it later.

This post walks through a simple but production-friendly background worker using GenServer and supervision — no frameworks, no magic.

[Figure: BEAM VM Architecture]

The Mental Model: Small Processes, Clear Responsibility

In Elixir, the goal isn’t to create one “smart” worker.
It’s to create many small, replaceable processes.

Each process should:

  • Do one thing
  • Fail loudly if it can’t
  • Be restarted automatically

This is the foundation of fault tolerance on the BEAM.
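
To make that concrete, here is a minimal sketch of process isolation using nothing but `spawn_monitor/1`. The module name `IsolationDemo` and the job functions are made up for illustration; the point is only that one process exiting abnormally has no effect on its neighbours.

```elixir
defmodule IsolationDemo do
  @doc """
  Runs each zero-arity function in its own monitored process and
  returns the exit reason of every process, in input order.
  """
  def run(funs) do
    funs
    # Each function gets its own process; the caller monitors it
    # instead of linking, so a crash cannot take the caller down.
    |> Enum.map(fn fun -> spawn_monitor(fun) end)
    |> Enum.map(fn {pid, ref} ->
      receive do
        {:DOWN, ^ref, :process, ^pid, reason} -> reason
      end
    end)
  end
end
```

Calling `IsolationDemo.run/1` with a mix of functions that return normally and functions that call `exit(:boom)` gives back one exit reason per process: the failing one reports `:boom` and the rest still finish with `:normal`.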

[Figure: Elixir Supervision Tree]

Step 1: Define the Worker (GenServer)

Let’s start with a worker that processes a single job and exits.

```elixir
defmodule MyApp.Worker do
  # :temporary means the supervisor never restarts this worker, whether it
  # finishes normally or fails. The default (:permanent) would rerun the job
  # after every exit, including successful ones.
  use GenServer, restart: :temporary
  require Logger

  ## Public API

  def start_link(job) do
    GenServer.start_link(__MODULE__, job)
  end

  ## Callbacks

  @impl true
  def init(job) do
    # Defer the work so init/1 returns right away and start_link/1 does not block.
    send(self(), :process)
    {:ok, job}
  end

  @impl true
  def handle_info(:process, job) do
    case perform(job) do
      :ok ->
        Logger.info("Job completed successfully")
        {:stop, :normal, job}

      {:error, reason} ->
        Logger.error("Job failed: #{inspect(reason)}")
        {:stop, reason, job}
    end
  end

  # Placeholder for real work: succeeds roughly 30% of the time.
  defp perform(_job) do
    if :rand.uniform() > 0.7 do
      :ok
    else
      {:error, :random_failure}
    end
  end
end
```

Key Ideas

  • The worker does its job and exits
  • Success and failure are explicit
  • No retry logic inside the worker
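
If you want to poke at these ideas without the randomness, here is a small variant I would reach for. It is a sketch, not the worker above: `ContinueWorker` is a hypothetical name, the job is assumed to be a zero-arity function returning `:ok` or `{:error, reason}`, and it uses `handle_continue/2` instead of sending itself a message. The `restart: :temporary` option is also my assumption about the policy you want for one-shot jobs.

```elixir
defmodule ContinueWorker do
  # Hypothetical one-shot worker: runs the given function once, then stops.
  # restart: :temporary tells a supervisor not to restart it after it exits.
  use GenServer, restart: :temporary

  def start_link(job_fun) when is_function(job_fun, 0) do
    GenServer.start_link(__MODULE__, job_fun)
  end

  @impl true
  def init(job_fun) do
    # The :continue tuple makes handle_continue/2 run right after init/1,
    # before any other message is handled.
    {:ok, job_fun, {:continue, :process}}
  end

  @impl true
  def handle_continue(:process, job_fun) do
    case job_fun.() do
      # Success: a normal exit.
      :ok -> {:stop, :normal, job_fun}
      # Failure: the error reason becomes the exit reason.
      {:error, reason} -> {:stop, reason, job_fun}
    end
  end
end
```

Because `start_link/1` links the worker to the caller, a failing job will also bring down whoever started it unless that process traps exits. That is exactly the job a supervisor takes over in the next step.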

Step 2: Supervise the Worker

Now we introduce a supervisor to manage worker lifecycles.

```elixir
defmodule MyApp.WorkerSupervisor do
  use DynamicSupervisor

  def start_link(_) do
    DynamicSupervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  def start_job(job) do
    spec = {MyApp.Worker, job}
    DynamicSupervisor.start_child(__MODULE__, spec)
  end
end
```

Why This Works Well

  • Each job runs in isolation
  • One crash doesn’t affect others
  • Restart behavior is consistent and observable

What Happens When a Job Fails?

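When perform/1 returns {:error, reason}:

  • The GenServer stops with that reason as its exit reason
  • The supervisor sees the exit and applies the worker's restart policy
  • The Logger.error call and the GenServer's termination report both record the reason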

No silent retries.
No hidden state.
No cascading failures.
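
If a failed job should be retried, that decision can live in the caller instead of the worker. The sketch below is one way to do it, under some assumptions of mine: `RetryRunner` is a hypothetical helper, the job is a zero-arity function returning `:ok` or `{:error, reason}`, and retries are in-memory only, so they do not survive a restart.

```elixir
defmodule RetryRunner do
  @moduledoc """
  Hypothetical helper (not part of the worker above): runs a job in a
  monitored process and retries it a bounded number of times.
  """

  @doc """
  Returns `:ok` on the first successful attempt, or `{:error, reason}`
  from the last attempt once `attempts` is used up.
  """
  def run(job_fun, attempts) when is_function(job_fun, 0) and attempts > 0 do
    # Each attempt runs in a fresh process, so no state leaks between tries.
    {pid, ref} =
      spawn_monitor(fn ->
        case job_fun.() do
          :ok -> :ok
          {:error, reason} -> exit(reason)
        end
      end)

    receive do
      {:DOWN, ^ref, :process, ^pid, :normal} ->
        :ok

      # Attempts remain: try again with one fewer.
      {:DOWN, ^ref, :process, ^pid, _reason} when attempts > 1 ->
        run(job_fun, attempts - 1)

      # Out of attempts: surface the last failure to the caller.
      {:DOWN, ^ref, :process, ^pid, reason} ->
        {:error, reason}
    end
  end
end
```

The retry count is visible at the call site, for example `RetryRunner.run(job_fun, 3)`, which keeps the policy explicit rather than buried inside the job.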

This is one of the biggest advantages of building background work directly on the BEAM.


When I Use This Pattern

This approach works well for:

  • API-triggered async work
  • Data enrichment
  • External service calls
  • Internal tooling and pipelines

I wouldn’t use it for:

  • Massive job queues
  • Exactly-once delivery
  • Retries that must survive a restart (nothing here is persisted, so those need storage)

Elixir gives you primitives — not opinions.

Final Thoughts

Elixir didn’t just give me better concurrency.
It changed how I think about failure as a design input.

Small processes.
Clear supervision.
Predictable recovery.

When production gets boring, you’re doing it right.