# Aadil Ghani — Complete Blog Content

> Berlin-based software engineer and startup founder. This file contains the full text of every published blog post for AI/LLM ingestion. For a structured overview, see [llms.txt](https://aadilghani.com/llms.txt).

---

## How We Built a Push Notification System That Actually Doesn't Lose Messages

- **URL**: https://aadilghani.com/blog/pushary-notification-pipeline
- **Published**: 2026-03-14
- **Category**: Engineering
- **Tags**: System Design, Kafka, Web Push, Pushary
- **Reading time**: 13 min read
- **Word count**: 2431
- **Author**: Aadil Ghani

I spent the last few months building Pushary's notification pipeline from scratch. Not because existing tools weren't available. Because they weren't good enough for what we needed: a system where if you hit "send," that notification reaches the browser. Period.

This is the technical breakdown. How it works, why it works, and the decisions that went into it.

## The Problem With "Just Send a Push Notification"

Web Push sounds simple on paper. You get a subscription endpoint from the browser, encrypt a payload, POST it to a push service (Google's FCM, Mozilla's autopush, Apple's push gateway), and the browser shows a notification.

In reality, it's a minefield. Networks fail. Push services rate-limit you. Subscriptions expire silently. iOS has its own universe of constraints. And if you're sending to 50,000 subscribers at once, you need guarantees that "sent" actually means sent. Not "we tried once and gave up."

## The Architecture: Transactional Outbox + Kafka

Here's the core insight that drives everything: **never lose the intent to send.**

When a campaign is triggered, we don't immediately fire off web push calls. Instead, we write notification records and outbox events into Postgres in the same database transaction. This is the transactional outbox pattern, and it's the foundation of the entire reliability story.
```
Campaign Trigger -> [Postgres Transaction: notifications + outbox_events] -> OutboxPublisher -> Kafka -> Consumer -> Web Push API
```

Why does this matter? Because if the server crashes one millisecond after the database commit, the notification intent is already persisted. Nothing is lost. The outbox publisher picks it up on the next poll.

### The Outbox Publisher

The outbox publisher is the bridge between Postgres and Kafka. It uses two mechanisms to detect new events:

**Postgres LISTEN/NOTIFY** for near-instant pickup. When an outbox event is inserted, a trigger fires a NOTIFY on the `outbox_events_new` channel. The publisher hears it and immediately polls.

**Adaptive polling as a fallback.** If the LISTEN connection drops (networks are fun), polling kicks in with an interval that scales between 500ms and 10 seconds based on load. Busy? Poll faster. Quiet? Back off.

The publisher claims events using `SELECT ... FOR UPDATE SKIP LOCKED`. This is critical for horizontal scaling. Multiple publisher instances can run without stepping on each other. Each instance grabs unclaimed events, publishes them to Kafka in batches grouped by topic, and marks them as sent.

If Kafka is unreachable, events stay in Postgres. After 5 failed publish attempts, they route to a dead letter queue. And if we see 5 consecutive failures, a circuit breaker opens and we back off for 60 seconds instead of hammering a broken connection.

### Kafka: The Backbone

We run Kafka with an idempotent producer (exactly-once semantics on the producer side), Snappy compression, and manual offset commits on the consumer. Two topics:

- `pushary.notifications` — the main event stream (6 partitions, replication factor 3)
- `pushary.dlq` — dead letter queue for messages that can't be processed

The consumer uses batch consumption with a concurrency semaphore capped at 10. This means we process up to 10 notifications simultaneously per consumer instance, with backpressure built in.
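That per-consumer cap can be sketched as a small promise semaphore. This is a minimal illustration with hypothetical names, not the actual consumer code — the real system wires something like this into Kafka batch handling:

```typescript
// Minimal promise semaphore: at most `limit` handlers in flight at once.
class Semaphore {
  private queue: Array<() => void> = []
  private active = 0
  constructor(private readonly limit: number) {}

  async acquire(): Promise<void> {
    if (this.active < this.limit) {
      this.active++
      return
    }
    // Wait until a slot is handed over by release().
    await new Promise<void>((resolve) => this.queue.push(resolve))
  }

  release(): void {
    const next = this.queue.shift()
    if (next) next() // transfer the slot directly to the next waiter
    else this.active--
  }
}

// Process a batch of messages with bounded concurrency and backpressure:
// the (limit + 1)-th handler cannot start until a slot frees up.
async function processBatch<T>(
  items: T[],
  limit: number,
  handler: (item: T) => Promise<void>,
): Promise<void> {
  const sem = new Semaphore(limit)
  await Promise.all(
    items.map(async (item) => {
      await sem.acquire()
      try {
        await handler(item)
      } finally {
        sem.release()
      }
    }),
  )
}
```

`processBatch(messages, 10, handleMessage)` mirrors the capped-at-10 behavior described above; offsets would only be committed after the whole batch settles.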
We don't commit offsets until a message is fully processed or routed to the DLQ. That's at-least-once delivery. "At-least-once" means a message might be delivered twice. So we need idempotency.

### Two-Layer Idempotency

This is where most systems cut corners. We don't.

**Layer 1: Event-level idempotency.** Every event gets a unique ID. Before processing, we check the `processed_events` table. If it's already marked `completed`, we skip. If it's not there, we insert a `reserved` row. If processing fails, we delete the reservation so it can be retried. If it succeeds, we mark it `completed`.

**Layer 2: Delivery-level idempotency.** Even within a single event, we deduplicate at the notification level. The `notification_deliveries` table uses a dedupe key of `NOTIFICATION:{notificationId}` with `SELECT FOR UPDATE SKIP LOCKED` in a transaction. A notification can only be sent once, regardless of how many times the event is replayed.

On startup, we clean up stale reservations older than 5 minutes. This handles the case where a worker dies mid-processing.

---

## The Actual Web Push Delivery

Once a message reaches the event handler, we call `webpush.sendNotification()` with the subscriber's push endpoint, their p256dh and auth keys, and our VAPID credentials. Each site gets its own VAPID key pair. The payload is encrypted per RFC 8291 (Web Push message encryption) and RFC 8188 (encrypted content coding), with VAPID (RFC 8292) for authentication. We set a TTL of 86,400 seconds (24 hours) and urgency "normal."

The payload itself contains everything the service worker needs: title, body, icon, image, badge, notification ID, subscriber ID, campaign ID, site key, and our API URL. This data travels encrypted end-to-end from our server to the browser's push service to the service worker.

On success, we atomically update the notification status to `sent`, increment the subscriber's total notification count, increment the campaign's sent counter, and update daily stats. All in one transaction.
On failure, we inspect the HTTP status code:

- **410/404**: Subscription expired or gone. We mark the subscriber as `expired` so we never waste bandwidth on dead endpoints again.
- **401**: VAPID authentication error. Something's wrong with our keys.
- **429**: Push service rate-limiting us. Back off.
- **413**: Payload too large.

Every failure updates the notification status to `failed` with the error code and increments the campaign's failure counter.

---

## Click Tracking and the Redirect URL

Here's a question most push notification services don't think about carefully: how do you know someone actually clicked?

We track clicks through two parallel paths, because reliability means redundancy.

### Path 1: Direct Tracking

When the service worker's `notificationclick` event fires, we immediately POST a `click` event to `/api/v1/track`. This uses `fetch` with `keepalive: true` so the request survives even if the page navigates away.

If that POST fails (offline, network blip, whatever), the event gets queued into IndexedDB. It's retried on the next push event, on `visibilitychange`, or via Background Sync. Max 3 attempts over 24 hours before we give up.

### Path 2: The Redirect URL

Simultaneously, we redirect the user through our tracking endpoint. Instead of navigating directly to `https://yoursite.com/sale`, we navigate to:

```
https://pushary.com/api/v1/redirect?url=https://yoursite.com/sale&sk=your_site_key&nid=notification_id&cid=campaign_id&sid=subscriber_id
```

The redirect endpoint does three things:

1. **Races tracking against a 500ms timeout.** We record the click (notification status, campaign counters, subscriber stats, daily stats, and a full analytics event) but we never hold the user's redirect hostage to our database. If tracking takes longer than 500ms, we redirect anyway.
2. **Uses an atomic CTE query.** When we have both a notification ID and campaign ID, a single SQL statement updates the notification status, increments the campaign click counter, increments the subscriber's click count, upserts daily stats, and inserts the analytics event. One round trip. Zero race conditions.
3. **Prevents double-counting.** The `clicked_at IS NULL` guard in the CTE means if two click events arrive for the same notification (from both tracking paths), only the first one increments counters.

The redirect then returns a `302` to the actual target URL.

### Why Both Paths?

Because the direct POST gives us faster, more reliable tracking data (it fires before any navigation), but the redirect URL is the safety net. If the service worker's fetch fails, the redirect still captures the click. If the redirect is slow, the direct POST already recorded it.

---

## Click Rates

Click rate is calculated as:

```
clickRate = (totalClicked / totalDelivered) * 100
```

Where `totalDelivered` counts notifications with status `delivered` or `clicked` (because clicked implies delivered). Not `sent`. There's a meaningful difference between "we sent it to the push service" and "the browser actually showed it."

We know a notification was delivered because the service worker sends an `impression` event when `handlePush` fires. That's the browser telling us "I received this and showed it to the user."

We track these metrics at multiple granularities:

- **Per-notification**: Individual status lifecycle (pending, sent, delivered, clicked, dismissed, failed)
- **Per-campaign**: Denormalized counters (totalTargeted, totalSent, totalDelivered, totalClicked, totalDismissed, totalFailed)
- **Per-day**: Aggregated daily stats per site with unique clicker counts
- **Per-subscriber**: Running totals of notifications received and clicks, plus last active timestamp

The analytics layer goes deeper. We parse user agents for device type, browser, and OS.
We pull geo data from Vercel and Cloudflare headers. We track which URLs get clicked, at what time of day, on which day of the week, and we surface "best send times" based on historical click patterns.

---

## iOS: The Hard Part

iOS doesn't support Web Push the way every other platform does. On Android and desktop browsers, you call `Notification.requestPermission()`, the user says yes, you get a push subscription, done.

On iOS, push notifications only work inside a Progressive Web App that the user has installed to their home screen. This has been the case since iOS 16.4, and it's not changing anytime soon.

So we built a dedicated subscribe flow that handles the full iOS journey:

### Step 1: Detect the Browser

We check if the user is on iOS, and if so, which browser. Safari is required for PWA installation. If they're in Chrome for iOS (CriOS), Firefox for iOS (FxiOS), or an in-app browser (Instagram, Facebook, Twitter, TikTok, LinkedIn), we show a prompt: "Open in Safari to enable notifications."

We detect in-app browsers specifically because they're the most common way users land on a page from social media, and none of them support PWA installation.

### Step 2: Guide PWA Installation

Once in Safari, we check `navigator.standalone` and the `(display-mode: standalone)` media query. If the user isn't in standalone mode, we show an `IOSInstallGuide` component that walks them through: Share button, "Add to Home Screen," confirm.

This is a UX challenge, not a technical one. You're asking users to do 3 extra taps before they can subscribe. Every word and every visual in that guide matters for conversion.

### Step 3: Request Permission in Standalone Mode

Once the PWA is open in standalone mode, we show the notification permission prompt. Standard `pushManager.subscribe()` with the VAPID public key. If they accept, we POST the subscription to our server and they're subscribed.
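The Step 1 detection can be sketched as a user-agent classifier. UA sniffing is inherently fragile and the marker strings below are illustrative, not the exact patterns our SDK ships, but the shape is the same:

```typescript
// Sketch of iOS browser detection for the subscribe flow.
// Caveat: modern iPadOS Safari can report a desktop (Mac) UA, so a real
// implementation also checks touch support alongside the UA string.
type IOSBrowser = 'safari' | 'chrome-ios' | 'firefox-ios' | 'in-app' | 'not-ios'

function detectIOSBrowser(ua: string): IOSBrowser {
  const isIOS = /iPhone|iPad|iPod/.test(ua)
  if (!isIOS) return 'not-ios'
  if (/CriOS/.test(ua)) return 'chrome-ios' // Chrome for iOS
  if (/FxiOS/.test(ua)) return 'firefox-ios' // Firefox for iOS
  // Common in-app browser markers (Instagram, Facebook, Twitter, TikTok, LinkedIn).
  if (/Instagram|FBAN|FBAV|Twitter|musical_ly|BytedanceWebview|LinkedIn/i.test(ua)) {
    return 'in-app'
  }
  return 'safari'
}
```

Anything other than `'safari'` on iOS gets the "Open in Safari to enable notifications" prompt.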
### The iOS Background Problem

Here's a subtle bug that took real debugging time: on iOS, when a PWA is backgrounded, the service worker's clients can be "frozen." Calling `client.focus()` throws. Calling `client.navigate()` fails silently.

Our solution is a navigation acknowledgment protocol. When a notification is clicked:

1. The service worker sends a `PUSHARY_NAVIGATE` message to the client with a unique navigation token.
2. It waits 800ms for a `PUSHARY_NAVIGATE_ACK` response.
3. If the client is frozen (no ack), we fall through to `clients.openWindow()`.
4. If even that fails, we store the pending navigation in IndexedDB. When the PWA eventually wakes up (via `visibilitychange` or `pageshow`), it checks for pending navigations and executes them.

The service worker URL is versioned with `?v=20260305-ios-bg-nav-ack1` specifically because this iOS background navigation handling is a targeted feature we iterate on.

### Context Recovery

Another iOS quirk: `notification.data` can sometimes be stripped by the OS between when the notification is shown and when the user clicks it.

The service worker shows the notification and saves the full context (notification ID, subscriber ID, campaign ID, site key, API URL) to IndexedDB, keyed by notification tag. When the click handler fires, if `event.notification.data` is empty, we recover context by matching the tag, or the notification ID extracted from the tag, or even by matching the title and body against recently saved contexts (within a 2-minute window for a more permissive fallback, 10 minutes for exact matches).

This means even when iOS strips our data, we still track the click accurately and redirect to the right URL.

---

## The DLQ and Recovery

Messages that can't be processed end up in the dead letter queue Kafka topic. Every DLQ message carries the original payload, the error details, attempt count, and metadata about which topic and partition it came from.
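A DLQ record of that shape can be sketched as a typed wrapper around the failed message. The field names here are illustrative (the source only guarantees payload, error details, attempt count, and origin topic/partition):

```typescript
// Illustrative DLQ record: everything a later replay needs to re-process
// the message or decide it's permanently broken.
interface DlqRecord {
  originalPayload: string
  errorType: string
  errorMessage: string
  attempts: number
  sourceTopic: string
  sourcePartition: number
  failedAt: string // ISO timestamp
}

function toDlqRecord(
  payload: string,
  error: { type: string; message: string },
  attempts: number,
  source: { topic: string; partition: number },
): DlqRecord {
  return {
    originalPayload: payload,
    errorType: error.type,
    errorMessage: error.message,
    attempts,
    sourceTopic: source.topic,
    sourcePartition: source.partition,
    failedAt: new Date().toISOString(),
  }
}
```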
We classify errors:

- `invalid-json` and `schema-validation`: These are permanently broken. Replaying them won't help.
- `processing-error`: Transient failures that might succeed on retry.
- `outbox-publish-error`: Kafka was unreachable.

There's an admin endpoint at `/admin/dlq/replay` that supports filtering by error type, setting a limit, and doing dry runs. Permanently broken messages are automatically skipped during replay.

The admin server also exposes `/admin/metrics` with processing latency percentiles (p50, p95, p99) and counters for sent, failed, outbox processed, outbox failed, and consumer errors.

---

## What We Didn't Build

We didn't build Apple Push Notification Service (APNs) integration. We use standard Web Push (VAPID) exclusively. iOS 16.4+ supports it through PWAs, and the web standard is where the momentum is. No certificate management, no proprietary protocols, no App Store dependency.

We didn't build our own push service relay. Google's FCM, Mozilla's autopush, and Apple's push gateway are the endpoints. They're the ones with global edge infrastructure. We encrypt the payload, send it, and trust the pipe.

We didn't build complex retry scheduling with exponential backoff at the Kafka consumer level beyond 3 attempts. If a notification fails 3 times, something is fundamentally wrong with that subscriber's endpoint. Mark it failed, move on, keep the pipeline fast for the millions that work.

---

## The Result

A notification goes from "campaign triggered" to "showing on the user's screen" through: Postgres transaction, outbox publisher (LISTEN/NOTIFY + adaptive polling), Kafka (idempotent producer, manual offset commits), event handler (two-layer idempotency, web push delivery), service worker (impression tracking, click handling, retry queue).

Every step has a fallback. Every step has idempotency. Every failure is classified and either retried or routed to recovery.
| Layer | Technology |
|-------|-----------|
| Message broker | Kafka (idempotent producer, manual offsets) |
| Database | PostgreSQL (transactional outbox) |
| Event detection | Postgres LISTEN/NOTIFY + adaptive polling |
| Concurrency | SELECT FOR UPDATE SKIP LOCKED |
| Push protocol | Web Push (VAPID, RFC 8291, RFC 8188) |
| Click tracking | Dual-path (service worker POST + redirect URL) |
| iOS support | PWA with navigation ack protocol |
| Dead letter queue | Kafka DLQ with admin replay |
| Monitoring | p50/p95/p99 latency, per-event counters |

That's the system. Not because we love complexity. Because push notifications that don't arrive are worse than no push notifications at all.

---

For another look at how I orchestrate complex multi-service pipelines, read [how I built an AI video ad pipeline that coordinates 6 AI services](/blog/ai-video-ad-pipeline-scene-composer) — same philosophy of typed pipelines and failure recovery, applied to generative AI.

---

*Building [Pushary](https://pushary.com) in public. If you're sending push notifications at scale and care about delivery reliability, we should talk.*

---

## I Built an AI Video Ad Pipeline That Orchestrates 6 AI Services

- **URL**: https://aadilghani.com/blog/ai-video-ad-pipeline-scene-composer
- **Published**: 2025-12-25
- **Category**: Engineering
- **Tags**: AI, Effect-TS, Video Generation, System Design
- **Reading time**: 10 min read
- **Word count**: 1919
- **Author**: Aadil Ghani

Most "AI video" tools give you a text box and a prayer. Type a prompt. Wait. Hope what comes back looks like something you'd actually run as an ad.

I wanted something different. When a small business owner types "create a video ad for my plumbing business," I wanted the system to *think like a creative director*. Plan scenes. Cast characters. Direct cameras. Write voiceover copy. Compose music. Deliver a broadcast-ready video ad with burned-in captions. Zero manual editing.

So I built the Scene Composer.
Here's how it works.

---

## Video Ads Are the Hardest Creative to Automate

Text generation is solved. Image generation is table stakes. But a 30-second video ad with coherent scene transitions, synced voiceover, background music, and animated captions? That takes coordinating multiple AI models, keeping them temporally aligned, and delivering output that looks *intentional*. Not generated.

The hard part isn't any single AI call. It's the orchestration. A video ad needs at least six operations that depend on each other: script and scene planning, reference image generation per scene, image-to-video conversion, text-to-speech with word-level timestamps, background music composition, and final assembly with caption burning.

Some of these can run in parallel. Some absolutely cannot. And any one of them can fail. I needed architecture that could handle all of this reliably, resume from any failure point, and still finish in under two minutes.

---

## Architecture: Effect-TS as the Backbone

I didn't reach for a workflow engine or a queue system. I reached for Effect-TS.

If you haven't used Effect, think of it as TypeScript's answer to the question: "What if errors, concurrency, retries, timeouts, and dependency injection were all first-class language features instead of afterthoughts?"

The Scene Composer is structured as five directories, each with a strict boundary:

- **`domain/`** Pure TypeScript. Zero side effects. Types, validation, constants, helpers. You can test every function here without mocking a single dependency.
- **`services/`** Six stateless Effect services, one per external integration. Each implements a typed interface and is provided via Effect's dependency injection layer.
- **`orchestration/`** The conductors. These modules coordinate services, manage concurrency, and handle per-scene error recovery.
- **`stages/`** Sequential pipeline phases that persist state to the database between each step.
- **`clients/`** Thin API wrappers.
Nothing smart happens here.

The service layer composes into a single dependency:

```typescript
const SceneComposerLive = Layer.mergeAll(
  SceneCompositionServiceLive, // Google Gemini
  ImageGenerationServiceLive,  // Flux via fal.ai
  VideoGenerationServiceLive,  // Veo 3.1 / Sora-2
  VideoAssemblerServiceLive,   // FFmpeg assembly
  AudioServiceLive,            // ElevenLabs TTS + music
  CaptionBurningServiceLive,   // Remotion Lambda
)
```

Six services. One composable layer. Every function in the pipeline declares exactly which services it needs in its type signature.

---

## The Pipeline: 4 Stages, 6 AI Services, ~90 Seconds

### Stage 1: Scene Composition (Google Gemini)

The pipeline starts with a multimodal prompt to Gemini. I send the business context, product details, any uploaded images, and the user's creative brief. Gemini returns structured output. Not prose. A typed `SceneCompositionOutput`:

- A `GlobalStyle` object defining the visual direction, protagonist characteristics, voice configuration, color palette, and mood
- An array of `SceneDescription` objects, each with a narrative, camera direction, emotion, duration (4/6/8 seconds), and a detailed first-frame image prompt
- A complete voiceover script, pre-calibrated to match the total video duration

That last part matters. I measured words-per-second rates for each voice in the library (ranging from 1.95 to 2.22 WPS depending on voice style). The scene composition prompt includes these rates so Gemini writes scripts that actually fit the runtime. If the script runs long, the system auto-adjusts scene durations before proceeding.

### Stage 2: Reference Images (Flux via fal.ai)

Each scene gets a reference image. A high-quality still frame that serves as the visual anchor for video generation. These generate in parallel (up to 8 concurrent) using Flux through fal.ai. Each prompt is built from the scene's `firstFramePrompt` plus anti-collage constraints to avoid the "AI grid" look that ruins coherence.
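That bounded parallelism can be sketched as a small worker-pool helper. In the actual pipeline this lives behind Effect's concurrency handling; the plain-promise version below (hypothetical name) shows the shape of "up to 8 concurrent":

```typescript
// Run `task` over `items` with at most `limit` tasks in flight at once.
// Each worker pulls the next unclaimed index until the list is exhausted.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  const results = new Array<R>(items.length)
  let next = 0
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    async () => {
      while (next < items.length) {
        const i = next++ // safe: single-threaded JS, no await before the claim
        results[i] = await task(items[i], i)
      }
    },
  )
  await Promise.all(workers)
  return results
}
```

For the reference-image stage this would be `mapWithConcurrency(scenes, 8, generateImage)`: results stay in scene order regardless of which generation finishes first.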
Here's where the character library comes in. I maintain 13 pre-generated diverse character personas, each photographed from three angles: headshot, bodyshot, and seated. When the scene calls for a protagonist and the user hasn't uploaded their own reference, Gemini selects from this shuffled library and includes the character's reference images in the generation prompt. The character metadata (age, gender presentation, ethnicity, vibe) helps the AI make contextually appropriate casting decisions.

This gives you something that typically requires a photo shoot: visual consistency across scenes with a recognizable protagonist.

### Stage 3: Video Generation (Veo 3.1 / Sora-2 via fal.ai)

Each reference image converts to a video clip using image-to-video generation. The engine selection is deliberate:

- **Multi-scene videos** use Google Veo 3.1. Better at maintaining visual consistency across clips.
- **Single-scene videos** use OpenAI Sora-2. Stronger at complex motion and cinematic quality for longer standalone shots.

Both models run through fal.ai as a proxy layer, giving me unified queue management and progress tracking. Concurrency stays at 8 parallel generations with exponential backoff retries. Each scene independently tracks its status (`pending` → `generating_image` → `generating_video` → `completed` or `failed`) so a single scene failure doesn't kill the run.

### Stage 4: Assembly (ElevenLabs + FFmpeg + Remotion Lambda)

This is where everything comes together. Three operations run in parallel:

1. **Voiceover generation.** ElevenLabs converts the full script to speech with word-level timestamps. Not sentence-level. Individual word start/end times.
2. **Background music.** ElevenLabs composes an instrumental track matched to the scene's emotion, mood, and ambience.
3. **Video concatenation.** An FFmpeg-based service stitches scene clips together with the audio tracks, mixing voiceover and music at appropriate levels.

Then the final piece: caption burning.
Word-level timestamps from the voiceover feed into a Remotion composition running on AWS Lambda. I built three caption styles:

- **Hormozi.** Bold Anton font, uppercase, green highlight, two words per line, scale animation. The internet marketing standard.
- **Framed.** Inter font, black pill highlight, three words per line. Clean and readable.
- **Simple.** Inter font, text shadow, six words per line, bottom-positioned with spring enter/exit animation.

Remotion renders at 1080x1920, 30fps, H.264. Directly on Lambda, no GPU instances needed. The rendered video uploads to Supabase storage, and the pipeline updates both the `generated_videos` and `ads` records atomically.

---

## The Part Nobody Talks About: Resumability

The pipeline takes about 90 seconds when everything works. But AI APIs fail. Network requests time out. Lambda functions run out of memory. This is where Effect-TS earns its keep.

Every stage writes its progress to the database. The `generated_videos` table tracks the current phase (`scenes` → `images` → `videos` → `assembly` → `complete`), and each `video_scene` record tracks its individual status. When a generation resumes, the system runs a completion analysis:

- **All scenes have images but some lack videos?** Skip to Stage 3.
- **All scenes have videos but assembly failed?** Skip to Stage 4.
- **Caption data exists but captioned video doesn't?** Run caption burning only.
- **Some scenes stuck for more than 15 minutes?** Reset those scenes and retry, respecting a maximum retry count.

The resume function figures out exactly where things broke and runs the minimum work needed to finish. 20-minute timeout on fresh runs. 25-minute timeout on resumes since they might need to re-run expensive stages.
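The completion analysis reduces to a pure decision function. The types and field names below are illustrative, not the actual schema, but the branch order mirrors the checks above:

```typescript
// Sketch of the resume decision: given persisted scene/assembly state,
// pick the earliest stage that still has work to do.
type SceneStatus =
  | 'pending'
  | 'generating_image'
  | 'generating_video'
  | 'completed'
  | 'failed'

interface SceneState {
  imageUrl?: string
  videoUrl?: string
  status: SceneStatus
}

type ResumeStage = 'images' | 'videos' | 'assembly' | 'captions' | 'complete'

function resumeStage(
  scenes: SceneState[],
  assembly: { assembledUrl?: string; captionData?: boolean; captionedUrl?: string },
): ResumeStage {
  if (scenes.some((s) => !s.imageUrl)) return 'images'
  if (scenes.some((s) => !s.videoUrl)) return 'videos' // all images, some videos missing -> Stage 3
  if (!assembly.assembledUrl) return 'assembly' // all videos, assembly failed -> Stage 4
  if (assembly.captionData && !assembly.captionedUrl) return 'captions' // caption burning only
  return 'complete'
}
```

The stuck-scene reset (15-minute threshold, max retry count) would layer on top by clearing `imageUrl`/`videoUrl` before this function runs.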
```typescript
export const composeScenesEffect = (input: ComposeScenesInput) =>
  Effect.scoped(
    Effect.gen(function* () {
      const {
        generatedVideoId,
        globalStyle,
        protagonistReferenceUrl,
        savedScenes,
      } = yield* initializeGenerationEffect(input)

      const scenesWithImages = yield* runImageStageEffect({ ... })
      yield* runVideoStageEffect({ ... })
      const { videoUrl } = yield* runAssemblyStageEffect({ ... })

      return { videoUrl }
    }),
  ).pipe(
    Effect.timeout(Duration.millis(COMPOSE_PIPELINE_TIMEOUT_MS)),
    Effect.tapError((error) =>
      updateGeneratedVideoErrorByAdIdEffect(
        input.adId,
        extractErrorDetails(error)
      )
    ),
    Effect.onInterrupt(() =>
      updateGeneratedVideoErrorByAdIdEffect(
        input.adId,
        'pipeline-interrupted'
      )
    ),
  )
```

Four lines of pipeline. Complete timeout handling, error persistence, and interruption cleanup. All declarative, all typed.

---

## Selective Scene Regeneration

After the initial generation, users often want to tweak one or two scenes without re-running the entire pipeline. "Make scene 3 more dramatic" or "change the protagonist in scene 1."

The edit module handles this through a clone-and-replace strategy:

1. Clone the entire `generated_videos` and `video_scenes` records to a new run
2. Send the existing scenes plus user feedback back to Gemini for targeted recomposition
3. Re-run only the modified scenes through image → video → assembly
4. Rebuild the voiceover script to merge unchanged and updated sections seamlessly

You get a new complete video that reflects targeted edits without starting from scratch. Each edit creates a clean generation run. No mutation of previous outputs.

---

## Error Handling: Tagged Errors, Not String Messages

Every error in the pipeline extends `Data.TaggedError`, giving each failure type a discriminant tag that the type system can reason about:

```typescript
class VideoAssemblyError extends Data.TaggedError(
  'VideoAssemblyError'
)<{
  message: string
  details?: unknown
}> {}
```

Error recovery is pattern-matched, not stringly-typed.
I can catch a `VideoAssemblyError` and retry assembly without accidentally swallowing an `ImageGenerationError`. The orchestration layer uses `Effect.catchTag` to implement per-error-type recovery strategies.

When a scene fails, the error handler updates that scene's status in the database and continues processing other scenes. The pipeline produces partial results. Four successful scenes out of five is better than zero.

---

## What I'd Do Differently

Not everything is perfect. A few honest notes:

**The Monogo service is a single point of failure.** FFmpeg-based video assembly runs on a separate service rather than serverlessly. If it goes down, assembly fails for everyone. I'm exploring moving this to Lambda or a container-based approach.

**Duration calibration is still approximate.** Despite measuring WPS rates per voice, there's inherent variance in TTS output. The system handles mismatches with audio padding and truncation, but occasionally there's a noticeable gap between the last spoken word and the end of the video. Still learning the best way to handle this edge case.

**Veo and Sora are both accessed through fal.ai.** This gives me a unified interface but adds a proxy layer. For latency-sensitive production workloads, going direct might save 2-3 seconds per generation. Something I'm testing.

---

## The Stack

| Layer | Technology |
|-------|-----------|
| Orchestration | Effect-TS |
| Scene AI | Google Gemini (Vercel AI SDK) |
| Image generation | Flux via fal.ai |
| Video generation | Google Veo 3.1, OpenAI Sora-2 via fal.ai |
| Voice & music | ElevenLabs |
| Video assembly | FFmpeg (Monogo service) |
| Caption rendering | Remotion Lambda on AWS |
| Database | Drizzle ORM + PostgreSQL |
| Storage | Supabase |
| Framework | Next.js 16 (App Router) |

---

## Why This Matters

A plumber in Munich doesn't have a creative agency. A real estate agent in Lagos doesn't have a video production budget.
A bakery owner in Mexico City doesn't have three weeks to wait for an ad.

What they do have is a phone, a product, and three minutes to spare. The Scene Composer turns that into a broadcast-ready video ad with professional voiceover, scene-appropriate music, and animated captions. The kind of content that used to cost $5,000 and take two weeks.

I didn't build this because AI video generation is technically interesting (though it is). I built it because the alternative, paying an agency or learning After Effects, means most small businesses never advertise with video at all.

That's the real problem worth solving.

---

If you're interested in how I approach reliable distributed systems, check out [how we built a push notification system that actually doesn't lose messages](/blog/pushary-notification-pipeline) — similar patterns of orchestration and failure recovery, applied to real-time delivery.

---

*Built at [Glorya](https://glorya.ai). We're hiring engineers who think a bit more, preferably out of the 📦.*

---