Identify the most compelling moments in a video using the Mux Robots API.
This workflow analyzes both audio and visual content to find segments that stand out for their hook strength, clarity, emotional intensity, novelty, or soundbite quality. It's useful for generating highlight reels, social media clips, or preview content. See the Find Key Moments API reference for the full endpoint specification.
Create a find-key-moments job:

    curl https://api.mux.com/robots/v0/jobs/find-key-moments \
      -H "Content-Type: application/json" \
      -X POST \
      -d '{
        "parameters": {
          "asset_id": "YOUR_ASSET_ID",
          "max_moments": 5
        }
      }' \
      -u ${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}

Key moment extraction uses transcript cues from the asset to identify compelling segments. Make sure your asset has captions, either auto-generated or manually added, before creating a find-key-moments job.
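Because the job depends on transcript cues, it can help to confirm that an asset has a usable text track before submitting it. The sketch below assumes you have already fetched the asset object from the Mux Video API, and that caption tracks appear in its `tracks` array with `type` set to `text` and a `status` of `ready`. The helper name `has_ready_captions` and the sample payloads are hypothetical, so check the field names against the asset API reference before relying on them.

```python
def has_ready_captions(asset):
    """Return True if the asset dict lists at least one ready text track.

    Assumption: text tracks look like {"type": "text", "status": "ready", ...}.
    """
    for track in asset.get("tracks", []):
        if track.get("type") == "text" and track.get("status") == "ready":
            return True
    return False


# Hypothetical asset payloads, for illustration only.
with_captions = {"tracks": [{"type": "video"}, {"type": "text", "status": "ready"}]}
without_captions = {"tracks": [{"type": "video"}, {"type": "audio"}]}

print(has_ready_captions(with_captions))
print(has_ready_captions(without_captions))
```

A check like this only avoids submitting jobs that are likely to have nothing to analyze; it does not replace handling errors returned by the job itself.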
| Parameter | Type | Description |
|---|---|---|
| `asset_id` | string | Required. The Mux asset ID of the video to analyze. |
| `max_moments` | integer | Maximum number of key moments to extract (1-10). Defaults to 5. |
| `target_duration_ms` | object | Preferred highlight duration range in milliseconds. Both min and max are required when provided. |
| `target_duration_ms.min` | integer | Required. Preferred minimum highlight duration in milliseconds. |
| `target_duration_ms.max` | integer | Required. Preferred maximum highlight duration in milliseconds. |
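The parameter rules above can be checked client-side before a request is sent. The following is a minimal sketch, not an official client: `build_job_body` and `create_job` are hypothetical helper names, the duration values are made up, and the only details taken from this page are the endpoint URL, the parameter names and limits, and the use of HTTP Basic auth with the token ID and secret (as in the curl example).

```python
import base64
import json
import urllib.request

JOBS_URL = "https://api.mux.com/robots/v0/jobs/find-key-moments"


def build_job_body(asset_id, max_moments=5, target_duration_ms=None):
    """Build the JSON body for a find-key-moments job, checking documented limits."""
    if not asset_id:
        raise ValueError("asset_id is required")
    if not 1 <= max_moments <= 10:
        raise ValueError("max_moments must be between 1 and 10")
    parameters = {"asset_id": asset_id, "max_moments": max_moments}
    if target_duration_ms is not None:
        # Both bounds are required whenever the object is provided.
        if "min" not in target_duration_ms or "max" not in target_duration_ms:
            raise ValueError("target_duration_ms needs both min and max")
        parameters["target_duration_ms"] = {
            "min": int(target_duration_ms["min"]),
            "max": int(target_duration_ms["max"]),
        }
    return {"parameters": parameters}


def create_job(body, token_id, token_secret):
    """POST the body with HTTP Basic auth, mirroring the curl example."""
    credentials = base64.b64encode(f"{token_id}:{token_secret}".encode()).decode()
    request = urllib.request.Request(
        JOBS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build (but do not send) a request body with an illustrative duration range.
body = build_job_body(
    "YOUR_ASSET_ID", max_moments=3, target_duration_ms={"min": 15000, "max": 45000}
)
print(json.dumps(body, indent=2))
```

Keeping body construction separate from the network call makes the validation easy to test without credentials; `create_job(body, token_id, token_secret)` would then be called with real values.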
When the job completes, the outputs object contains:
| Field | Type | Description |
|---|---|---|
| `moments` | array | Extracted key moments, ordered by position in the video. |
| `moments[].start_ms` | number | Moment start time in milliseconds. |
| `moments[].end_ms` | number | Moment end time in milliseconds. |
| `moments[].overall_score` | number | Weighted quality score (0.0-1.0) based on hook strength, clarity, emotional intensity, novelty, and soundbite quality. |
| `moments[].title` | string | Short catchy title for the moment (3-8 words). |
| `moments[].audible_narrative` | string | One-sentence summary of what is being said. |
| `moments[].notable_audible_concepts` | array | Key audible concepts (2-5 word phrases). |
| `moments[].visual_narrative` | string | One-sentence summary of what is visually happening. Present for video assets only. |
| `moments[].notable_visual_concepts` | array | Scored visual concepts extracted from sampled frames (video assets only). Each has `concept`, `score`, and `rationale`. |
| `moments[].cues` | array | Contiguous transcript segments with `start_ms`, `end_ms`, and `text`. |

    {
      "data": {
        "id": "job_mno345",
        "workflow": "find-key-moments",
        "status": "completed",
        "units_consumed": 1,
        "parameters": {
          "asset_id": "YOUR_ASSET_ID",
          "max_moments": 3
        },
        "outputs": {
          "moments": [
            {
              "start_ms": 12400,
              "end_ms": 28900,
              "overall_score": 0.92,
              "title": "The Future of Video Data",
              "audible_narrative": "The speaker explains how AI transforms video from passive content into structured, queryable data.",
              "notable_audible_concepts": ["video as data", "AI transformation", "structured information"],
              "visual_narrative": "The speaker gestures at a diagram showing video processing pipeline stages.",
              "notable_visual_concepts": [
                { "concept": "pipeline diagram", "score": 0.87, "rationale": "Directly illustrates the concept being discussed" }
              ],
              "cues": [
                { "start_ms": 12400, "end_ms": 16200, "text": "What's exciting is that video isn't just content anymore." },
                { "start_ms": 16200, "end_ms": 22100, "text": "Every video you upload is a dataset waiting to be queried." }
              ]
            }
          ]
        }
      }
    }
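
Since moments are returned in video order, a highlight-reel or clip workflow will often want to re-rank them by `overall_score` and turn the millisecond offsets into readable ranges. The sketch below shows one way to do that with a trimmed copy of the example response; `format_ms`, `summarize_moments`, and the `min_score` threshold are hypothetical names chosen for illustration, not part of the API.

```python
def format_ms(ms):
    """Render a millisecond offset as M:SS.mmm."""
    minutes, remainder = divmod(int(ms), 60_000)
    seconds, millis = divmod(remainder, 1_000)
    return f"{minutes}:{seconds:02d}.{millis:03d}"


def summarize_moments(job, min_score=0.0):
    """Return clip summaries for a completed job, highest score first."""
    data = job["data"]
    if data.get("status") != "completed":
        return []  # outputs are only expected once the job has completed
    summaries = []
    for moment in data["outputs"]["moments"]:
        if moment["overall_score"] < min_score:
            continue
        summaries.append({
            "title": moment["title"],
            "score": moment["overall_score"],
            "range": f"{format_ms(moment['start_ms'])}-{format_ms(moment['end_ms'])}",
            "duration_ms": moment["end_ms"] - moment["start_ms"],
            # Join the contiguous cues into a single transcript snippet.
            "transcript": " ".join(cue["text"] for cue in moment.get("cues", [])),
        })
    summaries.sort(key=lambda s: s["score"], reverse=True)
    return summaries


# Trimmed copy of the example response above.
job = {
    "data": {
        "status": "completed",
        "outputs": {
            "moments": [
                {
                    "start_ms": 12400,
                    "end_ms": 28900,
                    "overall_score": 0.92,
                    "title": "The Future of Video Data",
                    "cues": [
                        {"start_ms": 12400, "end_ms": 16200,
                         "text": "What's exciting is that video isn't just content anymore."},
                        {"start_ms": 16200, "end_ms": 22100,
                         "text": "Every video you upload is a dataset waiting to be queried."},
                    ],
                }
            ]
        },
    }
}

for clip in summarize_moments(job, min_score=0.5):
    print(f"{clip['score']:.2f}  {clip['range']}  {clip['title']}")
```

The `start_ms` and `end_ms` values can be passed to whatever clipping tool you use; the score threshold is a judgment call and worth tuning against your own content.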