PixVerse Platform Docs

How to use Speech(Lip sync)?

Overview#

The Speech (Lip sync) endpoint synchronizes the speaker's mouth movements in a video with an audio track.
It analyzes both the audio and the speaker's mouth movements in the video and matches them precisely, which makes your videos more expressive and engaging.
Related API References:
Media Upload Task
Speech(Lipsync) Generation Task

Prerequisites#

Before you begin, make sure you have:
A valid PixVerse API key
A unique Ai-trace-id for each API request
An active subscription with available or purchased API credits
A video, either:
  A video_id generated from PixVerse
  or
  An uploaded video in a supported format (.mp4, .mov)
    Max resolution: 1920
    Max file size: 50MB
    Max duration: 30 seconds
An audio source, either:
  A script for our built-in TTS service
  or
  An audio file in a supported format (.mp3, .wav)
    Max file size: 50MB
    Max duration: 30 seconds
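
Because each request needs its own Ai-trace-id, it can help to build the request headers in one place. The following is a minimal sketch, not an official client: the helper name `build_headers` is hypothetical, and the header name `API-KEY` is an assumption that this guide does not confirm, so check it against the API Parameter Description page.

```python
import uuid


def build_headers(api_key: str) -> dict[str, str]:
    """Return headers for ONE request, with a freshly generated Ai-trace-id."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return {
        "API-KEY": api_key,                # assumed header name for the API key
        "Ai-trace-id": str(uuid.uuid4()),  # unique value for every request
    }
```

Calling `build_headers` again for every request, rather than reusing one dictionary, keeps the trace IDs unique.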

Step-by-Step Guide#

Step 1-1: Prepare Your Video from External video#

External Video (User-Provided)
To ensure optimal results, provide:
A .mp4 or .mov video file
Max resolution: 1920p
Max size: 50MB
Max duration: 30s
Upload the file with the Upload Video&audio API. A successful response returns a "media_id" with media_type "video":
{
    "ErrCode": 0,
    "ErrMsg": "success",
    "Resp": {
        "media_id": 0,
        "media_type": "video",
        "url": "https://media.pixverse.ai/111111.mp4"
    }
}
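
The upload response can be checked before its media_id is used in later steps. This is a small sketch based only on the response shape shown in this guide; the function name `extract_media_id` is hypothetical, and `body` is assumed to be the already-decoded JSON response.

```python
def extract_media_id(body: dict, expected_type: str) -> int:
    """Return Resp.media_id after checking ErrCode and media_type."""
    if body.get("ErrCode") != 0:
        raise RuntimeError(
            f"upload failed: ErrCode={body.get('ErrCode')} ErrMsg={body.get('ErrMsg')}"
        )
    resp = body.get("Resp") or {}
    if resp.get("media_type") != expected_type:
        raise ValueError(
            f"expected media_type {expected_type!r}, got {resp.get('media_type')!r}"
        )
    return resp["media_id"]
```

Use `expected_type="video"` for an uploaded video and `expected_type="audio"` for an uploaded audio file.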

Step 1-2: Prepare Your Video from PixVerse API#

If you previously generated a video using our API, you should already have a video_id.
To lip-sync this video, pass the video_id in the source_video_id field of the Speech (Lip sync) generation request.

Step 2: Prepare Your Audio or Script#

You can either:
Upload a pre-recorded audio file (.mp3 or .wav, ≤ 30s, ≤ 50MB)
Use our TTS service with a provided script
The audio must be clear. Multiple languages and audio types are supported, including speech, singing, and advertisements.
To upload an audio file
Upload the file with the Upload Video&audio API. A successful response returns a "media_id" with media_type "audio":
{
    "ErrCode": 0,
    "ErrMsg": "success",
    "Resp": {
        "media_id": 0,
        "media_type": "audio",
        "url": "https://media.pixverse.ai/111111.mp3"
    }
}
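To use the TTS service
You can get the list of available TTS speakers from the Get Speech(Lipsync) tts list API. It accepts the following parameters:

| Parameter Name | Required | Type | Description |
| --- | --- | --- | --- |
| page_num | optional | int | How many pages you want to get |
| page_size | optional | int | How many entries to return per page |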

Step 3: Send Speech(Lip Sync) API Request#

| Parameter Name | Required | Type | Description |
| --- | --- | --- | --- |
| source_video_id | Choose either source_video_id or video_media_id, not both | int | Video generated with the PixVerse API |
| video_media_id | Choose either source_video_id or video_media_id, not both | int | Uploaded external video |
| audio_media_id | Choose either audio_media_id or lip_sync_tts_speaker_id + lip_sync_tts_content, not both | int | Uploaded external audio |
| lip_sync_tts_speaker_id | Choose either audio_media_id or lip_sync_tts_speaker_id + lip_sync_tts_content, not both | string | TTS speaker from the TTS speaker list |
| lip_sync_tts_content | Choose either audio_media_id or lip_sync_tts_speaker_id + lip_sync_tts_content, not both | string | TTS script, ~200 characters (not UTF-8 encoding) |
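
The "choose either, not both" rules are easy to get wrong, so it may be worth validating the request body before sending it. The sketch below uses only the parameter names from the table; the function name `build_lipsync_payload` and the constant `MAX_TTS_CHARS` are hypothetical, and the 200-character check simply mirrors the limit stated in this guide.

```python
from typing import Optional

MAX_TTS_CHARS = 200  # limit stated in this guide for lip_sync_tts_content


def build_lipsync_payload(
    *,
    source_video_id: Optional[int] = None,
    video_media_id: Optional[int] = None,
    audio_media_id: Optional[int] = None,
    lip_sync_tts_speaker_id: Optional[str] = None,
    lip_sync_tts_content: Optional[str] = None,
) -> dict:
    """Build the request body, enforcing the either/or parameter rules."""
    # Exactly one video source is allowed.
    if (source_video_id is None) == (video_media_id is None):
        raise ValueError("provide exactly one of source_video_id or video_media_id")

    has_audio = audio_media_id is not None
    has_tts = lip_sync_tts_speaker_id is not None or lip_sync_tts_content is not None
    # Exactly one audio source is allowed: uploaded audio OR the TTS pair.
    if has_audio == has_tts:
        raise ValueError(
            "provide either audio_media_id or "
            "lip_sync_tts_speaker_id + lip_sync_tts_content, not both"
        )
    if has_tts:
        if lip_sync_tts_speaker_id is None or lip_sync_tts_content is None:
            raise ValueError("TTS needs both a speaker id and content")
        if len(lip_sync_tts_content) > MAX_TTS_CHARS:
            raise ValueError(f"TTS text must be within {MAX_TTS_CHARS} characters")

    fields = {
        "source_video_id": source_video_id,
        "video_media_id": video_media_id,
        "audio_media_id": audio_media_id,
        "lip_sync_tts_speaker_id": lip_sync_tts_speaker_id,
        "lip_sync_tts_content": lip_sync_tts_content,
    }
    # Send only the fields that were actually chosen.
    return {name: value for name, value in fields.items() if value is not None}
```

For example, `build_lipsync_payload(source_video_id=123, audio_media_id=456)` yields a body with just those two fields, while passing both video parameters raises a `ValueError` before any request is made.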

Step 4: Handle the API Response#

The API returns a JSON response with a video_id:
{
  "ErrCode": 0,
  "ErrMsg": "success",
  "Resp": {
    "video_id": 0
  }
}
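
A response with a non-zero ErrCode carries no usable video_id, so one option is to turn it into an exception that keeps the code for later handling. This is a sketch against the response shape shown above; the names `PixVerseAPIError` and `extract_video_id` are hypothetical.

```python
class PixVerseAPIError(Exception):
    """Raised when a response body reports a non-zero ErrCode."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"ErrCode {code}: {message}")
        self.code = code


def extract_video_id(body: dict) -> int:
    """Return Resp.video_id from a decoded generation response."""
    code = body.get("ErrCode")
    if code != 0:
        raise PixVerseAPIError(code, str(body.get("ErrMsg")))
    return body["Resp"]["video_id"]
```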

Step 5: Check Generation Status#

After creating the task, you will receive a video_id.
Query the Get Video Generation Status API periodically using this video_id.
The status changes from 5 (waiting for generation) to 1 (generation successful) when processing is complete.
{
    "ErrCode": 0,
    "ErrMsg": "string",
    "Resp": {
        "create_time": "string",
        "id": 0,
        "modify_time": "string",
        "negative_prompt": "string",
        "outputHeight": 0,
        "outputWidth": 0,
        "prompt": "string",
        "resolution_ratio": 0,
        "seed": 0,
        "size": 0,
        "status": 5,
        "style": "string",
        "url": "string"
    }
}
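
The periodic status check can be sketched as a polling loop. The status codes (1, 5, 7, 8) come from this guide; everything else is an assumption for illustration: `fetch_status` is a hypothetical callable that you supply, which calls the Get Video Generation Status API and returns the "Resp" object, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

STATUS_SUCCESS = 1            # Generation successful
STATUS_WAITING = 5            # Waiting for generation
STATUS_MODERATION_FAILED = 7  # Content moderation failure
STATUS_FAILED = 8             # Generation failed


def wait_for_video(
    fetch_status: Callable[[], dict],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the task succeeds, fails, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        resp = fetch_status()
        status = resp.get("status")
        if status == STATUS_SUCCESS:
            return resp  # resp["url"] now points at the generated video
        if status in (STATUS_MODERATION_FAILED, STATUS_FAILED):
            raise RuntimeError(f"generation ended with status {status}")
        # Any other value (normally 5) is treated as "still in progress".
        if time.monotonic() >= deadline:
            raise TimeoutError("video was not ready before the timeout")
        sleep(interval_s)
```

Passing `sleep` as a parameter is a design choice that makes the loop testable without real waiting.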

Step 6: Download the Generated Video#

Once the status is 1, you can download the generated video from the "url" field of the response:
{
    "ErrCode": 0,
    "ErrMsg": "string",
    "Resp": {
        "create_time": "string",
        "id": 0,
        "modify_time": "string",
        "negative_prompt": "string",
        "outputHeight": 0,
        "outputWidth": 0,
        "prompt": "string",
        "resolution_ratio": 0,
        "seed": 0,
        "size": 0,
        "status": 1,
        "style": "string",
        "url": "string"
    }
}
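
Saving the file referenced by "url" needs nothing beyond the Python standard library. This is a minimal sketch, assuming the URL can be fetched with a plain GET and no extra headers (this guide does not say otherwise); the function name `download_video` is hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_video(url: str, dest: str) -> Path:
    """Stream the file at `url` to `dest` and return the destination path."""
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return dest_path
```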

Troubleshooting#

Common issues#

1. Your video is stuck in "Generating" status and hasn't completed after a long wait.
Check whether you are using the same Ai-trace-id for every request. Each request needs its own unique Ai-trace-id; reusing one is the most common cause of this issue.
2. Status codes: 1: Generation successful; 5: Waiting for generation; 7: Content moderation failure; 8: Generation failed.
If you encounter status code 7, your generated video was filtered by our content moderation system. Modify your parameters and try again. Any credits used for filtered videos are automatically refunded to your account.

Common error codes#

400/500 status : Incorrect code
400013 : Invalid binding request: incorrect parameter type or value
400017 : Invalid parameter
Either "audio_media_id" or "lip_sync_tts_speaker_id" + "lip_sync_tts_content" must be provided
couldn’t find a matching source_video_id. Please re-upload your video and try again.
couldn’t find a matching video_media_id. Please re-upload your video and try again.
Invalid media_type: not a video resource. Please check the ID and try again.
couldn’t find a matching audio_media_id. Please re-upload your audio and try again.
Invalid media_type: not an audio resource. Please check the ID and try again.
The specified speaker ID is invalid or not supported.
TTS text must be within 200 characters.
TTS content is invalid or does not meet content guidelines
500044 : Reached the limit for concurrent generations.
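
Error 500044 signals a concurrency limit, so waiting and resubmitting is one plausible way to handle it. The sketch below assumes that the limit clears once running generations finish, which this guide does not state explicitly. `submit` is a hypothetical callable that sends one generation request (with a fresh Ai-trace-id) and returns the decoded response body; the attempt count and delays are arbitrary.

```python
import time
from typing import Callable

CONCURRENCY_LIMIT = 500044  # "Reached the limit for concurrent generations."


def submit_with_retry(
    submit: Callable[[], dict],
    max_attempts: int = 5,
    base_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Resubmit with exponential backoff while the API reports 500044."""
    body: dict = {}
    for attempt in range(max_attempts):
        body = submit()
        if body.get("ErrCode") != CONCURRENCY_LIMIT:
            return body  # success, or an error that retrying will not fix
        if attempt < max_attempts - 1:
            sleep(base_delay_s * 2 ** attempt)  # 2s, 4s, 8s, ...
    return body  # still rate-limited after the last attempt
```

Other codes, such as 400013 and 400017, point at the request itself and are returned immediately, since resending the same parameters would fail again.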