Alibaba HappyHorse 1.0 — Native talking-video AI with built-in lip sync — 720p and 1080p tiers.

Alibaba · From 34 credits / 3s on Dahab Studio.

Alibaba HappyHorse 1.0 is a talking-video model: it generates a person speaking your script with synchronised lip movement in a single pass, with no separate TTS or lipsync step required. HappyHorse comes in two tiers on Dahab Studio: 720p ($0.14/s) for everyday social posts and 1080p ($0.28/s) for premium ads. Both tiers handle 3–15 second clips and 5 aspect ratios.

Specs

  • Max duration: 15s
  • Resolution: 720p or 1080p (separate tiers)
  • Aspect ratios: 16:9, 9:16, 1:1, 4:3, 3:4
  • Native audio: Yes
  • Multi-reference: No
  • Pricing: 34 cr / 3s, 111 cr / 10s (720p tier); 67 cr / 3s, 222 cr / 10s (1080p tier)
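
To make the pricing concrete, here is a minimal sketch that estimates the dollar cost of a clip from the per-second rates quoted on this page ($0.14/s for 720p, $0.28/s for 1080p). The function name and structure are illustrative assumptions, not part of any Dahab Studio SDK, and the page does not state the credit-to-dollar conversion, so the sketch works in USD only.

```python
from decimal import Decimal

# Per-second USD rates as quoted on this page for each tier.
RATE_USD_PER_SECOND = {
    "720p": Decimal("0.14"),
    "1080p": Decimal("0.28"),
}

# Clip length limits stated on this page (3 to 15 seconds).
MIN_SECONDS, MAX_SECONDS = 3, 15


def estimate_cost_usd(tier: str, seconds: int) -> Decimal:
    """Return the estimated USD cost of one clip (hypothetical helper)."""
    if tier not in RATE_USD_PER_SECOND:
        raise ValueError(f"unknown tier: {tier!r}")
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"seconds must be between {MIN_SECONDS} and {MAX_SECONDS}")
    # Decimal avoids binary floating-point rounding on currency values.
    return RATE_USD_PER_SECOND[tier] * seconds


if __name__ == "__main__":
    for tier in ("720p", "1080p"):
        print(tier, estimate_cost_usd(tier, 10))
```

Because the 1080p rate is exactly double the 720p rate, iterating at 720p and rendering only the final cut at 1080p halves the cost of every draft.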

Use cases

  • Avatar talking videos: Upload a portrait, write a script, and HappyHorse renders the person speaking with realistic lip movement. Single-pass — no external TTS or lipsync.
  • Customer testimonials: Generate documentary-style customer testimonial spots without booking real talent. The 720p tier keeps cost low for high-volume A/B testing.
  • Premium product spokespeople: Switch to the 1080p tier when client deliverables need broadcast-grade resolution. Same model, sharper output.
  • Localized ads in 9 dialects: Pair HappyHorse video with Dahab's Egyptian-Arabic ElevenLabs TTS or Sama (9 other dialects) for fully native voice + lip sync.

HappyHorse vs alternatives

  • vs Synthesia: Pay-per-second instead of per-seat. No need to commit to a $30/month subscription for a one-off video. Synthesia wins on avatar library breadth.
  • vs HeyGen: Direct prompt-to-video without uploading a custom avatar. HeyGen requires avatar setup; HappyHorse renders from one portrait.
  • vs D-ID: Higher resolution ceiling (1080p tier) and more aspect ratios (5 vs 2). D-ID wins on real-time streaming.

Frequently asked questions

How is HappyHorse different from regular image-to-video?
Standard image-to-video (i2v) animates a still image with motion only. HappyHorse renders a person speaking your script with lip sync in the same pass, which removes the separate TTS and lipsync steps that other pipelines need.
What's the difference between the 720p and 1080p tiers?
Same model, different output resolution. 720p costs $0.14/second (34 cr/3s, 111 cr/10s); 1080p costs $0.28/second (67 cr/3s, 222 cr/10s). Use 720p for iteration and 1080p for the final client deliverable.
Can I use Arabic dialogue?
Yes. HappyHorse handles Arabic speech natively in pro mode. For dialect-accurate delivery, Dahab's Talking Head tool layers Egyptian-dialect ElevenLabs TTS or Sama (9 other Arabic dialects) on top for fully native Arabic voice and lip sync.
How long can the video be?
3 to 15 seconds. The curated duration chips are 3, 5, 8, 10, 12, and 15 seconds; the full enum [3..15] is available via the API.
What aspect ratios does HappyHorse support?
16:9, 9:16, 1:1, 4:3, and 3:4 — covering everything from horizontal YouTube to vertical Reels and square posts.
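
The limits described in the answers above (3 to 15 seconds, five aspect ratios, two tiers) can be checked before a request is sent. The sketch below is illustrative only: the payload field names (`script`, `duration_s`, `aspect_ratio`, `tier`) are hypothetical and are not taken from Dahab Studio's API, and it assumes the [3..15] enum means whole seconds.

```python
# Limits as stated on this page; field names below are hypothetical.
ALLOWED_ASPECT_RATIOS = ("16:9", "9:16", "1:1", "4:3", "3:4")
ALLOWED_TIERS = ("720p", "1080p")
MIN_SECONDS, MAX_SECONDS = 3, 15


def build_request(script: str, duration_s: int, aspect_ratio: str,
                  tier: str = "720p") -> dict:
    """Validate inputs against the documented limits and return a payload dict."""
    if not script.strip():
        raise ValueError("script must not be empty")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(duration_s, bool) or not isinstance(duration_s, int):
        raise ValueError("duration_s must be a whole number of seconds")
    if not MIN_SECONDS <= duration_s <= MAX_SECONDS:
        raise ValueError(f"duration_s must be between {MIN_SECONDS} and {MAX_SECONDS}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio!r}")
    if tier not in ALLOWED_TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    return {
        "script": script,
        "duration_s": duration_s,
        "aspect_ratio": aspect_ratio,
        "tier": tier,
    }


if __name__ == "__main__":
    # A vertical 8-second clip at the 1080p tier.
    print(build_request("Welcome to our store!", 8, "9:16", tier="1080p"))
```

Validating on the client side catches an out-of-range duration or an unsupported ratio before any credits are spent on a failed generation.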

Related models

  • Google Veo 3.1 Fast — Google's fast-tier Veo 3.1 — 1080p with native synchronised audio.
  • Kling V2.6 — Kling's latest text-to-video and image-to-video at 1080p with audio.
  • Grok Imagine Video — xAI's native-audio video model — fast, cheap, and 1080p out of the box.
