Building a Resilient Background Worker in Elixir (Without Overthinking It)

Background jobs sound simple — until they aren’t.

Retries pile up.
One failure takes down the whole worker pool.
Jobs disappear silently or run twice.

What I like about Elixir is that it encourages you to design for failure early, instead of patching it later.

This post walks through a simple but production-friendly background worker using GenServer and supervision — no frameworks, no magic.

[Figure: BEAM VM architecture]

The Mental Model: Small Processes, Clear Responsibility

In Elixir, the goal isn’t to create one “smart” worker.
It’s to create many small, replaceable processes.

Each process should:

Do one thing
Fail loudly if it can’t
Be restarted automatically

This is the foundation of fault tolerance on the BEAM.
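
To make the isolation concrete, here is a minimal sketch using plain processes instead of a GenServer. The module name IsolationDemo and the :boom exit reason are made up for illustration; the point is only that one process failing is reported to the observer and leaves its siblings alone.

```elixir
defmodule IsolationDemo do
  # Starts three independent processes; the second one fails on purpose.
  def run do
    monitored =
      for n <- 1..3 do
        # spawn_monitor/1 starts an unlinked, monitored process, so a failure
        # is delivered to us as a :DOWN message instead of propagating.
        spawn_monitor(fn ->
          if n == 2, do: exit(:boom), else: :ok
        end)
      end

    # Collect the exit reason of each process, in start order.
    for {_pid, ref} <- monitored do
      receive do
        {:DOWN, ^ref, :process, _, reason} -> reason
      after
        1_000 -> :timeout
      end
    end
  end
end

reasons = IsolationDemo.run()
IO.inspect(reasons)
```

The failing process reports :boom, while the other two finish with :normal. Nothing else had to be written to contain the failure.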

[Figure: Elixir supervision tree]

Step 1: Define the Worker (GenServer)

Let’s start with a worker that processes a single job and exits.

defmodule MyApp.Worker do
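  # restart: :temporary means the supervisor never restarts this worker.
  # The default (:permanent) would rerun the job even after a :normal exit.
  use GenServer, restart: :temporary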
  require Logger

  ## Public API

  def start_link(job) do
    GenServer.start_link(__MODULE__, job)
  end

  ## Callbacks

  @impl true
  def init(job) do
    # Defer the work to handle_info/2 so init/1 (and start_link/1) return immediately.
    send(self(), :process)
    {:ok, job}
  end

  @impl true
  def handle_info(:process, job) do
    case perform(job) do
      :ok ->
        Logger.info("Job completed successfully")
        {:stop, :normal, job}

      {:error, reason} ->
        Logger.error("Job failed: #{inspect(reason)}")
        {:stop, reason, job}
    end
  end

  # Placeholder for real work: succeeds roughly 30% of the time.
  defp perform(_job) do
    if :rand.uniform() > 0.7 do
      :ok
    else
      {:error, :random_failure}
    end
  end
end

Key Ideas

The worker does its job and exits
Success and failure are explicit
No retry logic inside the worker
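
Here is a small sketch of what "explicit" looks like from the outside. ExitReasonSketch.Worker is a hypothetical variant of the worker above: the job is a zero-arity function (my assumption, so the outcome is deterministic), and it uses handle_continue/2 instead of sending itself a message. The caller traps exits so the worker's exit reason arrives as a plain message.

```elixir
defmodule ExitReasonSketch.Worker do
  # Hypothetical variant of the worker: the "job" is a function returning
  # :ok or {:error, reason}.
  use GenServer, restart: :temporary

  def start_link(job_fun), do: GenServer.start_link(__MODULE__, job_fun)

  @impl true
  def init(job_fun) do
    # Run the job right after init/1 returns, without blocking start_link/1.
    {:ok, job_fun, {:continue, :process}}
  end

  @impl true
  def handle_continue(:process, job_fun) do
    case job_fun.() do
      :ok -> {:stop, :normal, job_fun}
      {:error, reason} -> {:stop, reason, job_fun}
    end
  end
end

# Trap exits so a linked worker's exit shows up as an {:EXIT, pid, reason}
# message instead of terminating this process.
Process.flag(:trap_exit, true)

run_and_wait = fn job_fun ->
  {:ok, pid} = ExitReasonSketch.Worker.start_link(job_fun)

  receive do
    {:EXIT, ^pid, reason} -> reason
  after
    1_000 -> :timeout
  end
end

success_reason = run_and_wait.(fn -> :ok end)
failure_reason = run_and_wait.(fn -> {:error, :bad_input} end)

IO.inspect({success_reason, failure_reason})
```

A successful job ends with :normal, and a failed one ends with the error itself as the exit reason. The failed run also produces a GenServer error log, which is the "fail loudly" part.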

Step 2: Supervise the Worker

Now we introduce a supervisor to manage worker lifecycles.

defmodule MyApp.WorkerSupervisor do
  use DynamicSupervisor

  def start_link(_) do
    DynamicSupervisor.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  @impl true
  def init(:ok) do
    DynamicSupervisor.init(strategy: :one_for_one)
  end

  # Starts one supervised worker for the given job.
  # Returns {:ok, pid} on success.
  def start_job(job) do
    spec = {MyApp.Worker, job}
    DynamicSupervisor.start_child(__MODULE__, spec)
  end
end

Why This Works Well

Each job runs in isolation
One crash doesn’t affect others
Restart behavior is set per worker in its child spec, so it stays consistent and observable
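
A quick way to convince yourself of the isolation claim is the sketch below. It uses a hypothetical SupervisionSketch.Job worker whose job is either :wait (stay alive) or :crash (stop abnormally), and an unnamed DynamicSupervisor started inline. Both are my stand-ins, not the modules from this post.

```elixir
defmodule SupervisionSketch.Job do
  # Hypothetical worker: :wait stays alive, :crash stops abnormally.
  use GenServer, restart: :temporary

  def start_link(job), do: GenServer.start_link(__MODULE__, job)

  @impl true
  def init(job), do: {:ok, job, {:continue, :process}}

  @impl true
  def handle_continue(:process, :wait), do: {:noreply, :wait}
  def handle_continue(:process, :crash), do: {:stop, :simulated_failure, :crash}
end

{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one)

{:ok, steady} = DynamicSupervisor.start_child(sup, {SupervisionSketch.Job, :wait})
{:ok, crasher} = DynamicSupervisor.start_child(sup, {SupervisionSketch.Job, :crash})

# Wait until the crashing job is gone. If it died before the monitor was set
# up, a :DOWN message still arrives (with reason :noproc), so we ignore the reason.
ref = Process.monitor(crasher)

receive do
  {:DOWN, ^ref, :process, ^crasher, _reason} -> :ok
after
  1_000 -> :timeout
end

steady_alive? = Process.alive?(steady)
supervisor_alive? = Process.alive?(sup)

IO.inspect({steady_alive?, supervisor_alive?})
```

The sibling job and the supervisor are both still running after the crash. Because the worker is :temporary, the failure also does not count toward the supervisor's restart limit.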

What Happens When a Job Fails?

When perform/1 fails:

The GenServer crashes
The supervisor handles cleanup
Logs clearly show what happened

No silent retries.
No hidden state.
No cascading failures.

This is one of the biggest advantages of building background work directly on the BEAM.
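
Whether "no silent retries" actually holds depends on the worker's restart value. As a rough check, you can inspect the child spec that use GenServer generates. The two modules below are hypothetical and exist only to compare the default with restart: :temporary.

```elixir
defmodule RestartSketch.DefaultWorker do
  # No options: the generated child spec carries no :restart key.
  use GenServer

  @impl true
  def init(job), do: {:ok, job}
end

defmodule RestartSketch.TemporaryWorker do
  # The option is copied into the generated child spec.
  use GenServer, restart: :temporary

  @impl true
  def init(job), do: {:ok, job}
end

default_spec = RestartSketch.DefaultWorker.child_spec(:some_job)
temporary_spec = RestartSketch.TemporaryWorker.child_spec(:some_job)

# Supervisors treat a missing :restart key as :permanent, so the fallback
# passed to Map.get/3 mirrors what the supervisor would do.
default_restart = Map.get(default_spec, :restart, :permanent)
temporary_restart = temporary_spec.restart

IO.inspect({default_restart, temporary_restart})
```

With :permanent, a supervisor restarts the worker after any exit, including a successful :normal one, and repeated failures can exceed the restart limit and bring down the supervisor along with every other job. With :temporary, a finished or failed job is simply gone, and any retry has to be an explicit decision somewhere else.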

When I Use This Pattern

This approach works well for:

API-triggered async work
Data enrichment
External service calls
Internal tooling and pipelines

I wouldn’t use it for:

Massive job queues
Exactly-once delivery
Retries that must survive a restart (those need durable storage)

Elixir gives you primitives — not opinions.
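
One example of such a primitive, related to the queue caveat above: a DynamicSupervisor starts every job immediately and buffers nothing, but its max_children option can at least cap how many workers run at once. The sketch below uses Agent processes as stand-ins for long-running jobs, and the limit of 2 is arbitrary.

```elixir
# Cap the supervisor at two concurrent children.
{:ok, sup} = DynamicSupervisor.start_link(strategy: :one_for_one, max_children: 2)

# Each Agent stays alive, standing in for a job that is still running.
results =
  for _ <- 1..3 do
    DynamicSupervisor.start_child(sup, {Agent, fn -> :busy end})
  end

# Reduce each result to a short tag for printing.
outcomes =
  Enum.map(results, fn
    {:ok, _pid} -> :started
    {:error, reason} -> reason
  end)

IO.inspect(outcomes)
```

The third start_child/2 call is rejected with {:error, :max_children}. That is back-pressure, not a queue: the caller still has to decide what to do with the rejected job.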

Final Thoughts

Elixir didn’t just give me better concurrency.
It changed how I think about failure as a design input.

Small processes.
Clear supervision.
Predictable recovery.

When production gets boring, you’re doing it right.