Rendi vs RunPod vs Vast.ai for FFmpeg at scale
Managed FFmpeg API or raw GPU on RunPod / Vast.ai? Failure modes, ops footprint, total cost, and when each is the right pick.
Two ways to run FFmpeg in the cloud at scale:
- Managed FFmpeg API. POST a job with an FFmpeg command, get an output URL. The platform owns the runtime, the scaling, the on-call.
- Raw GPU compute (RunPod, Vast.ai). Rent a box, build the pipeline. You own the runtime, the scaling, the on-call.
In production the two behave very differently: they diverge on failure modes, latency, ops footprint, reliability, and total cost.
Failure modes on RunPod and Vast.ai
Build a Docker image with FFmpeg against NVIDIA Video Codec SDK 13. Spin up an RTX 4090 on Vast.ai. Your first NVENC job dies:
[h264_nvenc @ 0x...] Driver does not support the required nvenc API version
The host’s driver is older than your SDK’s floor. Vast.ai hosts run whatever driver the operator installed; you inherit it. Options: try another host, rebuild against an older SDK, or modprobe -r nvidia-uvm && modprobe nvidia-uvm on a rented box and hope (NVIDIA forum 1, forum 2).
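One way to fail fast on a mismatched host is to compare the driver version it reports against the minimum your SDK build needs, before any job is queued. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH of the rented box. `MIN_DRIVER` is a placeholder rather than the documented floor for SDK 13, so take the real value from the SDK release notes. The function names are illustrative.

```python
import subprocess

# Placeholder floor (an assumption): look up the real minimum driver version
# in the release notes of the Video Codec SDK your FFmpeg was built against.
MIN_DRIVER = "570.0"


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '550.54.14' into (550, 54, 14), padding short versions with zeros."""
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def driver_is_new_enough(host_driver: str, minimum: str = MIN_DRIVER) -> bool:
    """True if the host driver is at least the minimum the SDK build needs."""
    return parse_version(host_driver) >= parse_version(minimum)


def read_host_driver() -> str:
    """Ask nvidia-smi for the driver version of the first GPU on this host."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0].strip()
```

Calling `driver_is_new_enough(read_host_driver())` as the first step of worker startup lets you release a bad host before it costs a failed job.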
The hourly rate hides a pattern that shows up everywhere:
- “Why is my job pending for 90 seconds before it starts?” — cold-start tax
- “Why are no RTX 4090s available right now?” — marketplace inventory
- “Why did my interruptible instance disappear mid-job?” — bid auction eviction
- “Why is my image taking 8 minutes to pull?” — each new worker has to download your multi-GB container image before it can run anything
- “Where did my data go?” — 48-hour expiration deletion on Vast.ai
None of it appears on a pricing page.
How many videos an RTX 4090 can encode at once
NVIDIA caps consumer GeForce GPUs (RTX 4090 included) at 8 concurrent NVENC sessions per system, driver-enforced (NVENC App Note). An unofficial patch (keylase/nvidia-patch) removes the cap; neither RunPod nor Vast.ai publishes whether its RTX 4090 hosts ship stock or patched drivers.
The RTX 4090 chip has 3 NVENC hardware encoders built in, but NVIDIA only enables 2 on the consumer card. The workstation RTX 6000 Ada and datacenter L40 use the same chip with all 3 enabled (videocardz).
A single RTX 4090 runs 8 parallel encode pipes max. Scaling up means more cards, more cold starts, more orchestration. The escape hatch is datacenter cards (L4 / L40S / A100 / H100) without the session cap — at 3–10x the hourly rate, which wipes out the consumer-cost advantage.
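The session cap turns into a simple sizing rule and a per-host gate. The sketch below assumes stock (unpatched) drivers and one capped card per host. The helper names are hypothetical, and the semaphore only limits encodes started from this one process.

```python
import math
import threading

# Driver-enforced NVENC session cap on consumer GeForce cards, per the text above.
NVENC_SESSION_CAP = 8

# One slot per allowed session on this host.
_nvenc_slots = threading.BoundedSemaphore(NVENC_SESSION_CAP)


def cards_needed(concurrent_encodes: int, cap: int = NVENC_SESSION_CAP) -> int:
    """Minimum number of capped cards needed to run this many encodes at once."""
    if concurrent_encodes <= 0:
        return 0
    return math.ceil(concurrent_encodes / cap)


def run_with_nvenc_slot(encode):
    """Run encode() while holding a slot, so a ninth encode waits instead of failing."""
    with _nvenc_slots:
        return encode()
```

For example, `cards_needed(100)` returns 13, which is where the extra cold starts and orchestration come from.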
FFmpeg speed on an RTX 4090
The “12 seconds for 60 seconds of 1080p” number comes from pure-encode benchmarks — decode, NVENC, write, no filters. NVIDIA’s throughput table puts H.264 P1 at 910 fps for 1080p — 1,440 frames in ~1.6 seconds (NVENC App Note).
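The ~1.6-second figure is frame count divided by encoder throughput. The sketch below reproduces it. The 24 fps source rate is inferred from the numbers (1,440 frames over 60 seconds) rather than stated in the text.

```python
def pure_encode_seconds(duration_s: float, source_fps: float, encoder_fps: float) -> float:
    """Wall-clock time for a pure encode: total frames divided by encoder throughput."""
    frames = duration_s * source_fps
    return frames / encoder_fps


# 60 s of 24 fps video at the quoted 910 fps NVENC throughput.
print(round(pure_encode_seconds(60, 24, 910), 2))  # 1.58
```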
Real pipelines aren’t pure encode. A community hevc_nvenc benchmark on an RTX 4090 clocked ~14.4 seconds for a 30-second clip (scottstuff.net) — closer to ~28 seconds for 60 seconds of source. Add drawtext, xfade, or amix and none of them has a CUDA variant (ffmpeg-filters.html); the pipeline forces hwupload/hwdownload round-trips per frame (HWAccelIntro).
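To make the round-trip concrete, here is a sketch that builds the FFmpeg argument list for a GPU decode, a CPU-only drawtext overlay, and an NVENC encode. The option and filter names are standard FFmpeg. The function name, file names, and overlay text are placeholders, and the sketch assumes an FFmpeg build with CUDA support and a default font available to drawtext.

```python
def nvenc_drawtext_args(src: str, dst: str, text: str) -> list[str]:
    """Build an ffmpeg command that overlays text between GPU decode and GPU encode."""
    # Frames are decoded into GPU memory, copied to system memory for the
    # CPU-only drawtext filter, then copied back for the NVENC encoder.
    # Note: text containing quotes or colons would need escaping first.
    filters = ",".join([
        "hwdownload",               # GPU memory -> system memory
        "format=nv12",              # pixel format of the downloaded frames
        f"drawtext=text='{text}'",  # runs on the CPU
        "hwupload_cuda",            # system memory -> GPU memory
    ])
    return [
        "ffmpeg",
        "-hwaccel", "cuda", "-hwaccel_output_format", "cuda",
        "-i", src,
        "-vf", filters,
        "-c:v", "h264_nvenc",
        dst,
    ]
```

Every frame crosses the PCIe bus twice in this graph, which is why filter-heavy pipelines land far from the pure-encode numbers.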
Encode quality is worse, too. FFmpeg’s HWAccelIntro:
Hardware encoders typically generate output of significantly lower quality than good software encoders like x264… they require a higher bitrate to make output with the same perceptual quality, or they make output with a lower perceptual quality at the same bitrate.
Cold start latency and worker billing modes
RunPod Serverless markets sub-second cold starts via FlashBoot. Users report 40–70 seconds for Docker first-pulls, 1–2 minutes for fresh workers (worker-vllm #111, Fooocus-API #5). FlashBoot is a hot-cache; it doesn’t help a worker that hasn’t pulled your image yet.
RunPod’s own remedy: “Setting active workers > 0 can eliminate cold starts entirely” (RunPod docs). RunPod has two worker modes:
- Flex Workers scale to zero when there’s no traffic. Cheap when idle — but every new spike pays the cold-start tax.
- Active Workers stay running 24/7. No cold start, and the per-second rate is ~40% lower than Flex. But you pay for every idle hour: at the community-cloud RTX 4090 rate of ~$0.34/hr (gpuperhour), one worker running 24/7 is ~$245/month, whether it processes one job or a thousand.
Active is cheaper overall once your workers run more than ~25% of the month (RunPod blog). Below that, Flex is cheaper — but you’re eating 30–90 seconds of latency per spike to get there.
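The ~$245 figure is the hourly rate multiplied out over a month, and spreading it across a job count shows what idle billing does to the per-job cost. A small sketch, assuming a 30-day month and the $0.34/hr rate quoted above:

```python
HOURLY_RATE = 0.34          # community-cloud RTX 4090 rate quoted above, USD
HOURS_PER_MONTH = 24 * 30   # assumes a 30-day month


def always_on_monthly_cost(workers: int = 1, hourly_rate: float = HOURLY_RATE) -> float:
    """Monthly bill for workers that never scale to zero."""
    return workers * hourly_rate * HOURS_PER_MONTH


def cost_per_job(jobs_per_month: int, workers: int = 1) -> float:
    """Always-on cost spread over however many jobs actually ran."""
    return always_on_monthly_cost(workers) / jobs_per_month


print(round(always_on_monthly_cost(), 2))  # 244.8
```

At one job a month that single worker costs $244.80 per job. At a thousand jobs it costs about $0.24 per job.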
How reliable are RunPod and Vast.ai?
Vast.ai’s terms of service: “cannot guarantee … Services will be always available,” with no liability for outages (vast.ai/terms). Not an omission — the contract. Vast is a marketplace of third-party hosts; the default sort weight on their listings is price > internet speed > reliability > DLPerf (Vast docs). Cheapest hosts surface first. Instances expire 48 hours after non-renewal; data is gone. One independent review estimates unverified hosts cost 20–40% more after factoring downtime and restarts (gpunex).
RunPod is more curated but publishes no hard SLA in its standard docs (Secure Cloud reached SOC 2 Type II in October 2025; RunPod compliance). StatusGator has tracked 230+ outages across 41 components in the 9 months since September 2025 (StatusGator). Recent incidents: a US-NC-1 power interruption in April 2026 with a failed UPS-to-generator switchover, a CA-MTL-3 network outage in May, and multi-day DockerHub image-pull degradation through 8 June 2026 (uptime.runpod.io).
Rendi: 99.9% uptime on Pro, SOC 2 and a custom uptime SLA on Enterprise.
Setup and integration work per platform
On RunPod Serverless: write handler.py + Dockerfile, build with --platform linux/amd64, push to Docker Hub, paste the image URL into an endpoint, configure autoscaling, set up GPU-type fallback lists for inventory gaps (quickstart, endpoint configs). Then a polling client. Then retry logic, idempotency, dead-letter queue, observability. Mux on durable video workflows: “Implementing durable workflows correctly requires building custom orchestration with message queues, state machines, retry logic, and observability” (Mux engineering).
On Vast.ai: the same orchestration, plus driver-version probing per host, plus checkpointing every 30–60 minutes for eviction recovery, plus an SDK-driven lifecycle layer to spin hosts up and down.
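The retry and dead-letter pieces of that orchestration can be sketched in a few lines. This is an illustration under stated assumptions, not a production design: every name here is hypothetical, the dead-letter queue is a plain list, and `attempt` is expected to be idempotent for a given `job_id` so that a rerun after an eviction is safe.

```python
import time


def run_with_retries(job_id, attempt, max_attempts=3, base_delay_s=1.0,
                     dead_letter=None, sleep=time.sleep):
    """Call attempt(job_id) until it succeeds or the attempts run out.

    Failed jobs back off exponentially between tries. A job that exhausts
    its attempts is recorded in dead_letter and None is returned.
    """
    for n in range(1, max_attempts + 1):
        try:
            return attempt(job_id)
        except Exception as exc:
            if n == max_attempts:
                if dead_letter is not None:
                    dead_letter.append((job_id, repr(exc)))
                return None
            # Wait 1x, 2x, 4x ... the base delay before the next attempt.
            sleep(base_delay_s * 2 ** (n - 1))
```

A real pipeline would swap the list for a durable queue and add the observability layer. The point is that this logic is yours to write and maintain.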
On Rendi:
curl -X POST https://api.rendi.dev/v1/commands/run-ffmpeg-command \
  -H "X-API-KEY: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_files": {"in_1": "https://storage.rendi.dev/sample/big_buck_bunny_720p.mp4"},
    "output_files": {"out_1": "result.mp4"},
    "ffmpeg_command": "-i {{in_1}} -c:v libx264 -preset slow {{out_1}}"
  }'
Async, webhook callback, output at a stable URL with FFprobe metadata (docs).
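The same request can be built from application code with the standard library alone. In the sketch below, the endpoint, header name, and payload are copied from the curl call above. The function name is illustrative, and the JSON `Content-Type` header is an assumption about what the API expects.

```python
import json
import urllib.request

RENDI_URL = "https://api.rendi.dev/v1/commands/run-ffmpeg-command"


def build_rendi_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one FFmpeg command."""
    payload = {
        "input_files": {"in_1": "https://storage.rendi.dev/sample/big_buck_bunny_720p.mp4"},
        "output_files": {"out_1": "result.mp4"},
        "ffmpeg_command": "-i {{in_1}} -c:v libx264 -preset slow {{out_1}}",
    }
    return urllib.request.Request(
        RENDI_URL,
        data=json.dumps(payload).encode("utf-8"),
        # Content-Type is an assumption: the body is JSON, so it is declared as such.
        headers={"X-API-KEY": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. The response shape is not shown in this article, so the sketch stops at the request.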
Six things you don’t buy on Rendi that you do on the alternatives, before the GPU bill:
- Container image, handler code, registry, endpoint config, autoscaler tuning
- Active-worker idle billing, or the cold-start latency you eat instead
- Queue, retry, idempotency, dead-letter, observability
- Storage and egress on both sides of every job (Vast.ai egress is host-set)
- Codec CVE patching, on-call, NVIDIA driver-version mismatches on rented hosts
- Engineer-hours on the five items above
What Rendi includes that the alternatives don’t
- Security isolation. libavcodec has a history of heap-corruption CVEs exploitable by crafted media files. On Rendi each command runs in a disposable sandbox; malicious inputs never touch your infrastructure. Self-host the same code and the exploit lands in your container.
- Predictable pricing. Flat monthly tier. A retry storm, a traffic spike, or a host-markup change doesn’t move your bill.
- Storage and delivery included. Every output lands at a stable URL with FFprobe metadata, delivered to your webhook when the job completes — ready to use. On RunPod and Vast.ai you stand up your own bucket (S3, R2, GCS) and the delivery layer that puts results in it and notifies your app — extra infrastructure with its own pricing.
- Autoscaling inside the plan. No active-worker insurance, no min/max worker tuning. A quiet week plus a 10x burst costs your monthly processing quota, nothing more.
Pricing
Rendi’s price includes compute, storage, delivery, and ops. RunPod and Vast.ai prices are GPU rental alone — you still pay for storage, egress, the queue/retry layer you build, and the engineer running it.
For 100k × 1-min 1080p videos averaging ~60 MB processing per video (6 TB/month):
- Rendi: $600/month flat (pricing). Pro 96-vCPU tier: 6 TB processing, output storage with stable URLs and webhook delivery, and up to 24 concurrent 4-vCPU commands at peak, above the 13 the spike needs. Same bill regardless of traffic shape.
- RunPod: ~$0.34/hr per RTX 4090 (gpuperhour). Total bill depends on traffic shape and latency tolerance. With the ~28s real-pipeline time per video from the speed section above, 100k videos = ~778 GPU-hours; spiky traffic compresses into 60 peak-hours/month with 13 parallel workers needed. GPU rental only: add storage, egress, queue/retry/observability infrastructure, and engineering time.
- Vast.ai: ~$0.32/hr per RTX 4090 on-demand (vast.ai pricing). Total bill depends on traffic shape AND on you writing the start/stop scheduler (no native autoscaling). GPU rental only: add storage, host-set egress, queue/retry/observability infrastructure, and engineering time. No SLA.
| Configuration | Steady 24/7 | Spiky (2hr daily peak) |
|---|---|---|
| Rendi (all-in) | $600 | $600 |
| RunPod All-Active | ~$245 | ~$3,185 |
| RunPod All-Flex (60s+ cold-start per peak) | ~$265 | ~$265 |
| RunPod Hybrid (1 Active + Flex burst) | ~$245 | ~$490 |
| Vast.ai on-demand pod, always on | ~$230 | ~$2,995 |
| Vast.ai on-demand pod, you build the scheduler | ~$230 | ~$250 |
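The GPU-rental rows can be roughly reproduced from the inputs stated above. The sketch below assumes a 30-day month and charges pay-per-use hours at the same hourly rate as always-on hours, which is what the table's Flex and scheduler rows appear to do. Small differences from the table come from rounding (for example, 13 × ~$245 versus 13 × $244.80).

```python
import math

VIDEOS = 100_000
SECONDS_PER_VIDEO = 28           # real-pipeline time per video from the speed section
PEAK_HOURS_PER_MONTH = 2 * 30    # 2-hour daily peak, 30-day month (assumption)
HOURS_PER_MONTH = 24 * 30
RUNPOD_RATE = 0.34               # USD/hr, RTX 4090
VAST_RATE = 0.32                 # USD/hr, RTX 4090 on-demand

gpu_hours = VIDEOS * SECONDS_PER_VIDEO / 3600
peak_workers = math.ceil(gpu_hours / PEAK_HOURS_PER_MONTH)


def always_on(rate: float, workers: int) -> float:
    """Monthly cost of keeping this many workers running around the clock."""
    return rate * HOURS_PER_MONTH * workers


def pay_per_use(rate: float) -> float:
    """Monthly cost when only the GPU-hours actually worked are billed."""
    return rate * gpu_hours


print(round(gpu_hours), peak_workers)           # 778 13
print(round(always_on(RUNPOD_RATE, 1), 2))      # 244.8
```

With these inputs, `always_on(RUNPOD_RATE, peak_workers)` comes to about $3,182 and `pay_per_use(RUNPOD_RATE)` to about $264. Neither figure includes storage, egress, or engineering time.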
Where raw GPU wins
Three cases where Rendi is the wrong pick:
- Pure-encode pipelines at steady high volume. No filter graph, no spikes.
- ML pipelines that need a GPU for inference anyway. Background removal, super-resolution, depth maps. The GPU is paid for; FFmpeg is incidental.
- GPU encode speedups. Rendi runs CPU-only on AMD silicon. A self-hosted GPU is ~30% faster on heavy encodes, a gain usually dwarfed by ops cost.
When to pick what
| Workload | Pick |
|---|---|
| MVP, validating an idea | Rendi |
| Serverless backend (Vercel / Supabase / Workers) | Rendi |
| Spiky load, quiet weeks then sudden bursts | Rendi |
| Production pipeline, on-call has to be someone else | Rendi |
| 24/7 high-volume pure-encode + platform team | RunPod (data-center cards) |
| Research / experiments, you’ll babysit the box | Vast.ai interruptible |
| GPU already needed for ML inference | Either raw-GPU option |