RunPod vs Vast.ai vs Rendi for FFmpeg at scale

Two ways to run FFmpeg in the cloud at scale:

Managed FFmpeg API. POST a job with an FFmpeg command, get an output URL. The platform owns the runtime, the scaling, the on-call.
Raw GPU compute (RunPod, Vast.ai). Rent a box, build the pipeline. You own the runtime, the scaling, the on-call.

In production they are different things, with different failure modes, latency, ops footprint, reliability, and total cost.

Failure modes on RunPod and Vast.ai

Build a Docker image with FFmpeg against NVIDIA Video Codec SDK 13. Spin up an RTX 4090 on Vast.ai. Your first NVENC job dies:

[h264_nvenc @ 0x...] Driver does not support the required nvenc API version

The host’s driver is older than your SDK’s floor. Vast.ai hosts run whatever driver the operator installed; you inherit it. Options: try another host, rebuild against an older SDK, or modprobe -r nvidia-uvm && modprobe nvidia-uvm on a rented box and hope (NVIDIA forum 1, forum 2).
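One way to fail fast instead of discovering the mismatch on the first encode is to probe the host driver before accepting work. The sketch below assumes `nvidia-smi` is on the PATH inside the container; `MIN_DRIVER` is a placeholder, not the actual floor for any SDK release, so take the real value from the release notes of the SDK you built against.

```python
import subprocess

# Placeholder floor: replace with the minimum driver listed in the release
# notes of the Video Codec SDK your FFmpeg build was compiled against.
MIN_DRIVER = "570.0"


def parse_version(text):
    """Turn '535.104.05' into (535, 104, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_meets_floor(found, required=MIN_DRIVER):
    """True when the host driver is at least the required version."""
    return parse_version(found) >= parse_version(required)


def host_driver_version():
    """Ask nvidia-smi for the driver version of the first GPU on the host."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.splitlines()[0].strip()
```

A worker could call `driver_meets_floor(host_driver_version())` at startup and release the host immediately when it returns False, rather than paying for a box that cannot encode.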

The pattern the hourly rate hides shows up everywhere:

“Why is my job pending for 90 seconds before it starts?” — cold-start tax
“Why are no RTX 4090s available right now?” — marketplace inventory
“Why did my interruptible instance disappear mid-job?” — bid auction eviction
“Why is my image taking 8 minutes to pull?” — each new worker has to download your multi-GB container image before it can run anything
“Where did my data go?” — 48-hour expiration deletion on Vast.ai

None of it appears on a pricing page.

How many videos an RTX 4090 can encode at once

NVIDIA caps consumer GeForce GPUs (RTX 4090 included) at 8 concurrent NVENC sessions per system, driver-enforced (NVENC App Note). An unofficial patch (keylase/nvidia-patch) removes the cap; neither RunPod nor Vast.ai publishes whether its RTX 4090 hosts ship stock or patched drivers.

The RTX 4090 chip has 3 NVENC hardware encoders built in, but NVIDIA only enables 2 on the consumer card. The workstation RTX 6000 Ada and datacenter L40 use the same chip with all 3 enabled (videocardz).

A single RTX 4090 on a stock driver runs 8 parallel encode pipes max. Scaling up means more cards, more cold starts, more orchestration. The escape hatch is datacenter cards with NVENC and no session cap (L4 / L40 / L40S), at 3–10x the hourly rate, which wipes out the consumer-cost advantage. A100 and H100 are not an option here: they ship without NVENC hardware.
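In practice the cap means the worker has to gate its own concurrency, otherwise the ninth encode fails when it tries to open a session. Below is a minimal sketch of such a gate, assuming the 8-session cap quoted above and one worker process per host; `SessionGate`, `run_all`, and `cards_needed` are illustrative names, not part of any library.

```python
import math
import threading
from concurrent.futures import ThreadPoolExecutor

NVENC_SESSION_CAP = 8  # per-system cap quoted above for stock GeForce drivers


def cards_needed(concurrent_encodes, cap=NVENC_SESSION_CAP):
    """How many capped hosts it takes to run this many encodes at once."""
    return math.ceil(concurrent_encodes / cap)


class SessionGate:
    """Lets at most `cap` jobs run at once and records the observed peak."""

    def __init__(self, cap=NVENC_SESSION_CAP):
        self._sem = threading.BoundedSemaphore(cap)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def run(self, job):
        with self._sem:  # blocks while `cap` jobs are already running
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return job()
            finally:
                with self._lock:
                    self._active -= 1


def run_all(jobs, gate, threads=32):
    """Run callables on a thread pool, never exceeding the gate's cap."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(gate.run, jobs))
```

Each job would wrap one FFmpeg subprocess call. The gate only protects a single host; spreading work across several cards still needs a scheduler above it.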

FFmpeg speed on an RTX 4090

The “12 seconds for 60 seconds of 1080p” number comes from pure-encode benchmarks — decode, NVENC, write, no filters. NVIDIA’s throughput table puts H.264 P1 at 910 fps for 1080p — 1,440 frames in ~1.6 seconds (NVENC App Note).
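The pure-encode figure is simple division. The sketch below reproduces it, assuming a 24 fps source (which is what 1,440 frames in 60 seconds implies) and treating the encoder as the only bottleneck.

```python
def pure_encode_seconds(duration_s, source_fps, encoder_fps):
    """Wall time if the encoder throughput were the only limit."""
    frames = duration_s * source_fps
    return frames / encoder_fps


# 60 s of 24 fps source at the quoted 910 fps H.264 P1 throughput
print(round(pure_encode_seconds(60, 24, 910), 2))
```

Decode, disk I/O, and any filter that runs on the CPU all add to this number, which is why it works as a floor rather than an estimate.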

Real pipelines aren’t pure encode. A community hevc_nvenc benchmark on an RTX 4090 clocked ~14.4 seconds for a 30-second clip (scottstuff.net) — closer to ~28 seconds for 60 seconds of source. Add drawtext, xfade, or amix and none of them has a CUDA variant (ffmpeg-filters.html); the pipeline forces hwupload/hwdownload round-trips per frame (HWAccelIntro).
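To make the round-trip concrete, here is a sketch of the argument list such a pipeline might use: frames are decoded on the GPU, copied to system memory for drawtext, then uploaded again for NVENC. The helper name and the drawtext styling are illustrative, and the text is inserted without escaping, so it is only safe for plain strings.

```python
def drawtext_nvenc_args(src, dst, text):
    """Build an ffmpeg argv that decodes on GPU, overlays text on CPU,
    and re-uploads frames to the GPU for h264_nvenc."""
    filters = ",".join([
        "hwdownload",        # GPU memory -> system memory
        "format=nv12",       # pixel format for the downloaded frames
        # CPU-only filter; `text` is not escaped (illustration only)
        f"drawtext=text='{text}':fontsize=36:fontcolor=white",
        "hwupload_cuda",     # system memory -> GPU memory
    ])
    return [
        "ffmpeg",
        "-hwaccel", "cuda",
        "-hwaccel_output_format", "cuda",  # keep decoded frames on the GPU
        "-i", src,
        "-vf", filters,
        "-c:v", "h264_nvenc",
        dst,
    ]
```

Passing the list to `subprocess.run(args, check=True)` would execute it. h264_nvenc also accepts system-memory frames directly, so the final upload can be dropped; either way every frame crosses the PCIe bus at least once in each direction.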

Encode quality is worse, too. FFmpeg’s HWAccelIntro:

Hardware encoders typically generate output of significantly lower quality than good software encoders like x264… they require a higher bitrate to make output with the same perceptual quality, or they make output with a lower perceptual quality at the same bitrate.

Cold start latency and worker billing modes

RunPod Serverless markets sub-second cold starts via FlashBoot. Users report 40–70 seconds for Docker first-pulls, 1–2 minutes for fresh workers (worker-vllm #111, Fooocus-API #5). FlashBoot is a hot-cache; it doesn’t help a worker that hasn’t pulled your image yet.

RunPod’s own remedy: “Setting active workers > 0 can eliminate cold starts entirely” (RunPod docs). RunPod has two worker modes:

Flex Workers scale to zero when there’s no traffic. Cheap when idle — but every new spike pays the cold-start tax.
Active Workers stay running 24/7. No cold start, and the per-second rate is ~40% lower than Flex. But you pay for every idle hour: at the community-cloud RTX 4090 rate of ~$0.34/hr (gpuperhour), one worker running 24/7 is ~$245/month, whether it processes one job or a thousand.
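The idle-billing point is easiest to see as cost per job. The sketch below assumes a 30-day month and the ~$0.34/hr rate quoted above; both function names are illustrative.

```python
HOURS_PER_MONTH = 24 * 30  # assumption: 30-day month


def active_worker_monthly(hourly_rate, hours=HOURS_PER_MONTH):
    """Fixed monthly cost of one always-on worker."""
    return hourly_rate * hours


def cost_per_job(hourly_rate, jobs_per_month):
    """The fixed cost spread across however many jobs actually ran."""
    return active_worker_monthly(hourly_rate) / jobs_per_month


for jobs in (1, 1_000, 100_000):
    print(jobs, round(cost_per_job(0.34, jobs), 4))
```

The monthly total never moves; only the denominator does, so the effective price per video depends on how full the worker stays.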

Active is cheaper overall once your workers run more than ~25% of the month (RunPod blog). Below that, Flex is cheaper — but you’re eating 30–90 seconds of latency per spike to get there.

How reliable are RunPod and Vast.ai?

Vast.ai’s terms of service: “cannot guarantee … Services will be always available,” with no liability for outages (vast.ai/terms). Not an omission — the contract. Vast is a marketplace of third-party hosts; the default sort weight on their listings is price > internet speed > reliability > DLPerf (Vast docs). Cheapest hosts surface first. Instances expire 48 hours after non-renewal; data is gone. One independent review estimates unverified hosts cost 20–40% more after factoring downtime and restarts (gpunex).

RunPod is more curated but publishes no hard SLA in its standard docs (Secure Cloud reached SOC 2 Type II in October 2025; RunPod compliance). StatusGator has tracked 230+ outages across 41 components in the 9 months since September 2025 (StatusGator). Recent incidents: a US-NC-1 power interruption in April 2026 with a failed UPS-to-generator switchover, a CA-MTL-3 network outage in May 2026, and multi-day DockerHub image-pull degradation through 8 June 2026 (uptime.runpod.io).

Rendi: 99.9% uptime on Pro, SOC 2 and a custom uptime SLA on Enterprise.

Setup and integration work per platform

On RunPod Serverless: write handler.py + Dockerfile, build with --platform linux/amd64, push to Docker Hub, paste the image URL into an endpoint, configure autoscaling, set up GPU-type fallback lists for inventory gaps (quickstart, endpoint configs). Then a polling client. Then retry logic, idempotency, dead-letter queue, observability. Mux on durable video workflows: “Implementing durable workflows correctly requires building custom orchestration with message queues, state machines, retry logic, and observability” (Mux engineering).
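As a rough idea of what the polling client involves, here is a minimal sketch of a poll loop with exponential backoff. `fetch_status` is a hypothetical callable you would implement against the endpoint's status API, and the status strings are assumptions for illustration; check them against the platform's documentation.

```python
import time


class JobFailed(Exception):
    """Raised when the remote job reports a terminal failure."""


def poll_until_done(fetch_status, *, max_attempts=8, base_delay=1.0,
                    max_delay=30.0, sleep=time.sleep):
    """Poll until the job completes; return the number of polls it took.

    fetch_status: hypothetical zero-argument callable returning a status string.
    sleep: injectable so the loop can be tested without real waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        status = fetch_status()
        if status == "COMPLETED":   # assumed terminal success value
            return attempt
        if status == "FAILED":      # assumed terminal failure value
            raise JobFailed(f"job failed on poll {attempt}")
        sleep(delay)
        delay = min(delay * 2, max_delay)  # back off: 1s, 2s, 4s, ... capped
    raise TimeoutError(f"job still running after {max_attempts} polls")
```

This covers only the happy path and two failure exits. Idempotency keys, a dead-letter queue, and metrics are separate pieces on top of it.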

On Vast.ai: the same orchestration, plus driver-version probing per host, plus checkpointing every 30–60 minutes for eviction recovery, plus an SDK-driven lifecycle layer to spin hosts up and down.
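Eviction recovery needs some record of what already finished. The sketch below uses a segment-based checkpoint rather than the time-based one described above, purely because it is simpler to show; it assumes the job has been split into numbered segments and that `path` points at storage that survives the instance (a local path on an evicted host would be lost).

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of segment indices already marked complete."""
    try:
        with open(path) as fh:
            return set(json.load(fh))
    except FileNotFoundError:
        return set()


def mark_done(path, segment):
    """Record a finished segment, replacing the file atomically so a
    crash mid-write cannot leave a half-written checkpoint."""
    done = load_done(path)
    done.add(segment)
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as fh:
        json.dump(sorted(done), fh)
    os.replace(tmp, path)


def pending(path, total_segments):
    """Segments a restarted worker still has to encode."""
    done = load_done(path)
    return [i for i in range(total_segments) if i not in done]
```

On restart a worker calls `pending(...)` and resumes from there, so an eviction costs at most the segment that was in flight.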

On Rendi:

curl -X POST https://api.rendi.dev/v1/commands/run-ffmpeg-command \
  -H "X-API-KEY: $KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_files":  {"in_1": "https://storage.rendi.dev/sample/big_buck_bunny_720p.mp4"},
    "output_files": {"out_1": "result.mp4"},
    "ffmpeg_command": "-i {{in_1}} -c:v libx264 -preset slow {{out_1}}"
  }'

Async, webhook callback, output at a stable URL with FFprobe metadata (docs).
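From a Python backend the same call might look like the sketch below, which only builds the request. The URL, header name, and payload fields are copied from the curl example; the JSON Content-Type header is an assumption, and the response shape is deliberately not modeled because it is not shown here.

```python
import json
import urllib.request

# Endpoint taken from the curl example above
RENDI_URL = "https://api.rendi.dev/v1/commands/run-ffmpeg-command"


def build_request(api_key, input_url, output_name, ffmpeg_command):
    """Build (but do not send) the POST request for one FFmpeg command."""
    payload = {
        "input_files": {"in_1": input_url},
        "output_files": {"out_1": output_name},
        "ffmpeg_command": ffmpeg_command,
    }
    return urllib.request.Request(
        RENDI_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-API-KEY": api_key,
            "Content-Type": "application/json",  # assumption: JSON body
        },
        method="POST",
    )
```

Sending it is `urllib.request.urlopen(build_request(...))`. Since the job is asynchronous, the result would arrive through the webhook rather than in this response.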

Six things you don’t buy on Rendi that you do on the alternatives, before the GPU bill:

1. Container image, handler code, registry, endpoint config, autoscaler tuning
2. Active-worker idle billing, or the cold-start latency you eat instead
3. Queue, retry, idempotency, dead-letter, observability
4. Storage and egress on both sides of every job (Vast.ai egress is host-set)
5. Codec CVE patching, on-call, NVIDIA driver-version mismatches on rented hosts
6. Engineer-hours on items 1–5

What Rendi includes that the alternatives don’t

Security isolation. libavcodec has a history of heap-corruption CVEs exploitable by crafted media files. On Rendi each command runs in a disposable sandbox; malicious inputs never touch your infrastructure. Self-host the same code and the exploit lands in your container.
Predictable pricing. Flat monthly tier. A retry storm, a traffic spike, or a host-markup change doesn’t move your bill.
Storage and delivery included. Every output lands at a stable URL with FFprobe metadata, delivered to your webhook when the job completes — ready to use. On RunPod and Vast.ai you stand up your own bucket (S3, R2, GCS) and the delivery layer that puts results in it and notifies your app — extra infrastructure with its own pricing.
Autoscaling inside the plan. No active-worker insurance, no min/max worker tuning. A quiet week plus a 10x burst costs your monthly processing quota, nothing more.

Pricing

Rendi’s price includes compute, storage, delivery, and ops. RunPod and Vast.ai prices are GPU rental alone — you still pay for storage, egress, the queue/retry layer you build, and the engineer running it.

For 100k × 1-min 1080p videos averaging ~60 MB processing per video (6 TB/month):

Rendi: $600/month flat (pricing) on the Pro 96-vCPU tier. That covers 6 TB of processing, output storage with stable URLs and webhook delivery, and up to 24 concurrent 4-vCPU commands at peak, above the 13 the spike needs. Same bill regardless of traffic shape.
RunPod: ~$0.34/hr per RTX 4090 (gpuperhour) — total bill depends on traffic shape and latency tolerance. With the ~28s real-pipeline time per video from the speed section above, 100k videos = ~778 GPU-hours; spiky traffic compresses into 60 peak-hours/month with 13 parallel workers needed. GPU rental only — add storage, egress, queue/retry/observability infrastructure, and engineering time.
Vast.ai: ~$0.32/hr per RTX 4090 on-demand (vast.ai pricing) — total bill depends on traffic shape AND on you writing the start/stop scheduler (no native autoscaling). GPU rental only — add storage, host-set egress, queue/retry/observability infrastructure, and engineering time. No SLA.

Configuration	Steady 24/7 ($/month)	Spiky, 2 hr daily peak ($/month)
Rendi (all-in)	$600	$600
RunPod All-Active	~$245	~$3,185
RunPod All-Flex (60s+ cold-start per peak)	~$265	~$265
RunPod Hybrid (1 Active + Flex burst)	~$245	~$490
Vast.ai on-demand pod, always on	~$230	~$2,995
Vast.ai on-demand pod, you build the scheduler	~$230	~$250
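The raw-GPU rows can be reproduced, within rounding, from figures stated earlier. The sketch below assumes a 30-day month, 28 seconds of GPU time per video, a 2-hour daily peak, and a single flat hourly rate per provider for every billing mode, which is what the table appears to use.

```python
import math

HOURS_PER_MONTH = 24 * 30    # assumption: 30-day month
VIDEOS = 100_000
SECONDS_PER_VIDEO = 28       # real-pipeline estimate from the speed section
PEAK_HOURS = 2 * 30          # 2-hour daily peak over 30 days


def gpu_hours(videos=VIDEOS, seconds_each=SECONDS_PER_VIDEO):
    """Total GPU time the month's workload needs."""
    return videos * seconds_each / 3600


def peak_workers(peak_hours=PEAK_HOURS):
    """Workers needed if all work lands inside the peak window."""
    return math.ceil(gpu_hours() / peak_hours)


def always_on(rate, workers):
    """Workers billed for every hour of the month."""
    return rate * HOURS_PER_MONTH * workers


def pay_per_use(rate):
    """Billed only for GPU time actually used (scale-to-zero or own scheduler)."""
    return rate * gpu_hours()


def hybrid(rate, peak_hours=PEAK_HOURS):
    """One always-on worker plus burst workers billed only during the peak."""
    burst_workers = peak_workers(peak_hours) - 1
    return always_on(rate, 1) + rate * peak_hours * burst_workers


print(round(gpu_hours()), peak_workers())
print(round(always_on(0.34, peak_workers())), round(pay_per_use(0.34)),
      round(hybrid(0.34)))
```

With these inputs the workload is about 778 GPU-hours and needs 13 workers at peak. None of the outputs include storage, egress, or engineering time, which is the gap between these rows and the all-in row.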

Where raw GPU wins

Three cases where Rendi is the wrong pick:

Pure-encode pipelines at steady high volume. No filter graph, no spikes.
ML pipelines that need a GPU for inference anyway. Background removal, super-resolution, depth maps. The GPU is paid for; FFmpeg is incidental.
GPU encode speedups. Rendi runs CPU-only on AMD silicon. A GPU self-host gets roughly a 30% speedup on heavy encodes, usually dwarfed by ops cost.

When to pick what

Workload	Pick
MVP, validating an idea	Rendi
Serverless backend (Vercel / Supabase / Workers)	Rendi
Spiky load, quiet weeks then sudden bursts	Rendi
Production pipeline, on-call has to be someone else	Rendi
24/7 high-volume pure-encode + platform team	RunPod (data-center cards)
Research / experiments, you’ll babysit the box	Vast.ai interruptible
GPU already needed for ML inference	Either raw-GPU option