Find key moments

Identify the most compelling moments in a video using the Mux Robots API.

The `find-key-moments` workflow analyzes both audio and visual content to find segments that stand out for their hook strength, clarity, emotional intensity, novelty, or soundbite quality. It's useful for generating highlight reels, social media clips, or preview content. See the Find Key Moments API reference for the full endpoint specification, and Mux Robots pricing for unit costs.

Create a `find-key-moments` job

curl https://api.mux.com/robots/v0/jobs/find-key-moments \
  -H "Content-Type: application/json" \
  -X POST \
  -d '{
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "max_moments": 5
    }
  }' \
  -u "${MUX_TOKEN_ID}:${MUX_TOKEN_SECRET}"

This request is asynchronous. The POST returns immediately with the job in `pending` status and does not include results. We strongly recommend listening for the `robots.job.find_key_moments.completed` webhook. Its payload contains the full completed job, so no follow-up API call is needed. If webhooks aren't an option, poll `GET /robots/v0/jobs/find-key-moments/{JOB_ID}` with the `id` from the response until the status is `completed`.
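If you do fall back to polling, the loop can be sketched roughly as follows. This is a minimal illustration, not an official client: `fetch_job` and `wait_for_job` are hypothetical helper names, the 5-second interval and 10-minute timeout are arbitrary assumptions, and the sketch only recognizes the `completed` status described here. Any other terminal status a job might report is not covered on this page, so the loop simply times out in that case.

```python
import base64
import json
import os
import time
import urllib.request
from typing import Any, Callable, Dict


def fetch_job(job_id: str) -> Dict[str, Any]:
    """GET the job using HTTP basic auth built from the same env vars as the curl example."""
    url = f"https://api.mux.com/robots/v0/jobs/find-key-moments/{job_id}"
    token = f"{os.environ['MUX_TOKEN_ID']}:{os.environ['MUX_TOKEN_SECRET']}"
    auth = base64.b64encode(token.encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {auth}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_job(
    fetch: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval_s: float = 5.0,    # assumption: arbitrary polling interval
    timeout_s: float = 600.0,   # assumption: arbitrary overall timeout
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job's status is "completed", then return the job object."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(job_id)["data"]
        if job["status"] == "completed":
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']!r} after {timeout_s}s")
        sleep(interval_s)
```

Calling `wait_for_job(fetch_job, "YOUR_JOB_ID")` returns the job object, including `outputs`. Passing the fetch function in as an argument keeps the loop testable without network access.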

Key moment extraction uses transcript cues from the asset to identify compelling segments. Make sure your asset has captions, either auto-generated or manually added, before creating a find-key-moments job.
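One way to guard against submitting an asset without captions is to inspect the asset before creating the job. The sketch below is an assumption-heavy illustration: it presumes you have already fetched the asset object from the Mux Video API (not described on this page) and that the object lists its tracks under `tracks`, each with a `type` and a `status` field. `has_ready_text_track` is a hypothetical helper name.

```python
from typing import Any, Dict


def has_ready_text_track(asset: Dict[str, Any]) -> bool:
    """Return True if the asset lists at least one text track whose status is "ready".

    Assumes the asset dict has a "tracks" list whose entries carry "type" and "status".
    """
    return any(
        track.get("type") == "text" and track.get("status") == "ready"
        for track in asset.get("tracks", [])
    )
```

If the check returns `False`, add or generate captions first and create the `find-key-moments` job once the text track is available.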

Parameters

Parameter	Type	Description
`asset_id`	string	Required. The Mux asset ID of the video to analyze.
`max_moments`	integer	Maximum number of key moments to extract (1-10). Defaults to 5.
`target_duration_ms`	object	Preferred highlight duration range in milliseconds. Both `min` and `max` are required when provided.
`target_duration_ms.min`	integer	Required when `target_duration_ms` is provided. Preferred minimum highlight duration in milliseconds.
`target_duration_ms.max`	integer	Required when `target_duration_ms` is provided. Preferred maximum highlight duration in milliseconds.
`output_steering`	object	Curated controls that guide moment selection, titles, audience, and concepts without changing the output schema. See Output steering.
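The constraints in the table above can be checked client-side before sending the request. The following is a sketch under stated assumptions: `build_request_body` is a hypothetical helper, and the `min <= max` check is an assumption that the table does not state explicitly.

```python
from typing import Any, Dict, Optional


def build_request_body(
    asset_id: str,
    max_moments: int = 5,
    min_ms: Optional[int] = None,
    max_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build the JSON body for a find-key-moments job, validating documented limits."""
    if not asset_id:
        raise ValueError("asset_id is required")
    if not 1 <= max_moments <= 10:
        raise ValueError("max_moments must be between 1 and 10")
    parameters: Dict[str, Any] = {"asset_id": asset_id, "max_moments": max_moments}
    # target_duration_ms is optional, but min and max must be supplied together.
    if (min_ms is None) != (max_ms is None):
        raise ValueError("target_duration_ms needs both min and max")
    if min_ms is not None and max_ms is not None:
        if min_ms > max_ms:  # assumption: the API expects min <= max
            raise ValueError("target_duration_ms.min must not exceed max")
        parameters["target_duration_ms"] = {"min": min_ms, "max": max_ms}
    return {"parameters": parameters}
```

For example, `build_request_body("YOUR_ASSET_ID", 3, 15000, 45000)` asks for up to three moments with a preferred length of 15 to 45 seconds.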

Output steering

Use `output_steering` when you want best-effort control over which moments are selected and how they're described. These fields guide the workflow but do not guarantee exact output.

Field	Type	Description
`selection_strategy`	string	Preferred definition of a strong standalone moment. Supported values: `standalone_hooks`, `educational_takeaways`, `story_beats`, `product_moments`, and `speaker_highlights`.
`title_style`	string	Preferred style for generated moment titles. Supported values: `descriptive`, `punchy`, `educational`, and `social`.
`audience`	string	Intended audience used to guide moment selection and titles.
`brand_terms`	array of strings	Preferred brand or domain terms to use when supported by the source content.
`rubric_priorities`	array of strings	Up to 4 rubric dimensions used as tie-breakers after applying the selection strategy. Supported values: `clarity_in_isolation`, `emotional_intensity`, `novelty`, and `soundbite_quality`.
`topic_taxonomy`	object	Controlled vocabulary used to steer notable audible concepts without changing the response schema.
`topic_taxonomy.name`	string	Optional customer-facing name for the taxonomy.
`topic_taxonomy.values`	array	Controlled vocabulary values. Each value has a required `label` and optional `description` and `aliases`.
`topic_taxonomy.allow_other`	boolean	When `true`, non-taxonomy values may be used when no taxonomy value applies.

{
  "parameters": {
    "asset_id": "YOUR_ASSET_ID",
    "max_moments": 5,
    "output_steering": {
      "selection_strategy": "standalone_hooks",
      "title_style": "social",
      "audience": "developers scrolling a social feed",
      "brand_terms": ["Mux Video", "Mux Data"],
      "rubric_priorities": ["soundbite_quality", "emotional_intensity"],
      "topic_taxonomy": {
        "name": "Themes",
        "values": [
          {
            "label": "Video as data",
            "description": "Treating video content as structured, queryable information",
            "aliases": ["structured video", "queryable video"]
          },
          {
            "label": "Developer experience"
          }
        ],
        "allow_other": true
      }
    }
  }
}
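Because several steering fields only accept a fixed set of values, it can help to validate them locally before creating a job. The sketch below checks an `output_steering` object against the supported values listed in the table above. `validate_output_steering` is a hypothetical helper, and how the API itself responds to unsupported values is not described here.

```python
from typing import Any, Dict, List

SELECTION_STRATEGIES = {
    "standalone_hooks", "educational_takeaways", "story_beats",
    "product_moments", "speaker_highlights",
}
TITLE_STYLES = {"descriptive", "punchy", "educational", "social"}
RUBRIC_DIMENSIONS = {
    "clarity_in_isolation", "emotional_intensity", "novelty", "soundbite_quality",
}


def validate_output_steering(steering: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the object looks valid."""
    problems: List[str] = []
    strategy = steering.get("selection_strategy")
    if strategy is not None and strategy not in SELECTION_STRATEGIES:
        problems.append(f"unsupported selection_strategy: {strategy!r}")
    style = steering.get("title_style")
    if style is not None and style not in TITLE_STYLES:
        problems.append(f"unsupported title_style: {style!r}")
    priorities = steering.get("rubric_priorities", [])
    if len(priorities) > 4:
        problems.append("rubric_priorities accepts at most 4 entries")
    for dimension in priorities:
        if dimension not in RUBRIC_DIMENSIONS:
            problems.append(f"unsupported rubric dimension: {dimension!r}")
    # Every taxonomy value needs a label; description and aliases are optional.
    values = steering.get("topic_taxonomy", {}).get("values", [])
    for index, value in enumerate(values):
        if not value.get("label"):
            problems.append(f"topic_taxonomy.values[{index}] is missing a label")
    return problems
```

Running it on the `output_steering` object from the example request above returns an empty list.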

Output

The `outputs` object is included in the job once its status is `completed`. You'll receive it on the `robots.job.find_key_moments.completed` webhook (recommended), or you can fetch it with `GET /robots/v0/jobs/find-key-moments/{JOB_ID}`. It contains:

Field	Type	Description
`moments`	array	Extracted key moments, ordered by position in the video.
`moments[].start_ms`	number	Moment start time in milliseconds.
`moments[].end_ms`	number	Moment end time in milliseconds.
`moments[].overall_score`	number	Weighted quality score (0.0-1.0) based on hook strength, clarity, emotional intensity, novelty, and soundbite quality.
`moments[].title`	string	Short catchy title for the moment (3-8 words).
`moments[].audible_narrative`	string	One-sentence summary of what is being said.
`moments[].notable_audible_concepts`	array	Key audible concepts (2-5 word phrases).
`moments[].visual_narrative`	string	One-sentence summary of what is visually happening. Present for video assets only.
`moments[].notable_visual_concepts`	array	Scored visual concepts extracted from sampled frames (video assets only). Each has `concept`, `score`, and `rationale`.
`moments[].cues`	array	Contiguous transcript segments with `start_ms`, `end_ms`, and `text`.
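Since `moments` is ordered by position in the video rather than by quality, a common follow-up step is to rank the moments by `overall_score` and turn the millisecond offsets into readable clip ranges. The sketch below shows one way to do that with the fields in the table above. `format_ms` and `top_moments` are hypothetical helper names, and the MM:SS format is just an illustrative choice.

```python
from typing import Any, Dict, List


def format_ms(ms: float) -> str:
    """Format a millisecond offset as MM:SS, truncating fractional seconds."""
    minutes, seconds = divmod(int(ms) // 1000, 60)
    return f"{minutes:02d}:{seconds:02d}"


def top_moments(outputs: Dict[str, Any], limit: int = 3) -> List[Dict[str, Any]]:
    """Return the highest-scoring moments with a readable time range and duration."""
    ranked = sorted(outputs["moments"], key=lambda m: m["overall_score"], reverse=True)
    return [
        {
            "title": m["title"],
            "range": f"{format_ms(m['start_ms'])}-{format_ms(m['end_ms'])}",
            "duration_ms": m["end_ms"] - m["start_ms"],
            "score": m["overall_score"],
        }
        for m in ranked[:limit]
    ]
```

Pass the job's `outputs` object to `top_moments` to get a shortlist you can hand to a clipping or review step.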

Example response

This is the payload delivered to the `robots.job.find_key_moments.completed` webhook, and the same shape you get from `GET /robots/v0/jobs/find-key-moments/{JOB_ID}`:

{
  "data": {
    "id": "rjob_mno345",
    "workflow": "find-key-moments",
    "status": "completed",
    "units_consumed": 1,
    "parameters": {
      "asset_id": "YOUR_ASSET_ID",
      "max_moments": 3
    },
    "outputs": {
      "moments": [
        {
          "start_ms": 12400,
          "end_ms": 28900,
          "overall_score": 0.92,
          "title": "The Future of Video Data",
          "audible_narrative": "The speaker explains how AI transforms video from passive content into structured, queryable data.",
          "notable_audible_concepts": ["video as data", "AI transformation", "structured information"],
          "visual_narrative": "The speaker gestures at a diagram showing video processing pipeline stages.",
          "notable_visual_concepts": [
            { "concept": "pipeline diagram", "score": 0.87, "rationale": "Directly illustrates the concept being discussed" }
          ],
          "cues": [
            { "start_ms": 12400, "end_ms": 16200, "text": "What's exciting is that video isn't just content anymore." },
            { "start_ms": 16200, "end_ms": 22100, "text": "Every video you upload is a dataset waiting to be queried." }
          ]
        }
      ]
    }
  }
}
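A payload with this shape can be unpacked with a few lines of code. The sketch below parses a raw JSON body, returns the moments only when the job is `completed`, and joins each moment's cues into a single transcript string. `completed_moments` and `moment_transcript` are hypothetical helper names. The sketch assumes the body matches the example above and does not cover webhook signature verification or event-type routing, which this page does not describe.

```python
import json
from typing import Any, Dict, List, Union


def completed_moments(body: Union[str, bytes]) -> List[Dict[str, Any]]:
    """Parse a job payload and return its moments, or [] if the job is not completed."""
    job = json.loads(body)["data"]
    if job.get("status") != "completed":
        return []
    return job.get("outputs", {}).get("moments", [])


def moment_transcript(moment: Dict[str, Any]) -> str:
    """Join the text of a moment's transcript cues into one string."""
    return " ".join(cue["text"] for cue in moment.get("cues", []))
```

The same two helpers work for a webhook request body and for the response of the GET endpoint, since both share this shape.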
