Codex Skills for turning long videos into post-ready short-form clips. Hand your coding agent a video URL and ask for a Short — it cuts, reframes to 9:16, captions, and stamps a hook. Built as Codex Skills so the whole faceless-video pipeline runs inside your agent.
Skills are also compatible with Claude Code, Gemini CLI, and other agents that read the
.agents/skills convention, but the metadata is tuned for Codex.
Every "AI shorts generator" is a monolith you have to adopt wholesale. shortsmith is the opposite: small, composable skills your agent already knows how to chain. Ask in plain language —
"Take this YouTube link and make me a 25-second vertical short with captions."
— and Codex picks up clip-to-short, runs it, and hands you an MP4.
The ffmpeg graphs here aren't toy code: the crop, caption, and audio-ducking settings are lifted from a faceless-shorts pipeline that ships clips daily.
| Skill | What it does | Status |
|---|---|---|
| clip-to-short | URL/file → 9:16 highlight cut, burned captions, title card | ✅ ready |
| script-from-source | source/transcript → first-person POV narration script (JSON) | ✅ ready |
| voiceover | script → TTS audio (edge-tts / Kokoro) + aligned timing | ✅ ready |
| burn-captions | word-by-word center captions, locked readable style | ✅ ready |
| ttdy-metadata | finished clip → viral title + tags + description + music | ✅ ready |
| publish-checklist | pre-upload QA: aspect, duration, loudness, faststart | ✅ ready |
Six skills that chain into a full pipeline — URL → cut → script → VO → captions → metadata → QA → post-ready — and each link works on its own.
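The faststart item in publish-checklist refers to MP4 files whose `moov` box sits before `mdat`, so playback can begin before the whole file has downloaded. A minimal sketch of that single check, assuming a plain ISO BMFF file; this is not the skill's actual implementation, and `top_level_box_types`, `moov_before_mdat`, and `is_faststart` are hypothetical helper names:

```python
import struct


def top_level_box_types(f):
    """Return the types of the top-level MP4 boxes in a seekable binary file."""
    f.seek(0, 2)
    end = f.tell()
    pos = 0
    types = []
    while pos + 8 <= end:
        f.seek(pos)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            raw = f.read(8)
            if len(raw) < 8:
                break
            size = struct.unpack(">Q", raw)[0]
            header = 16
        elif size == 0:  # box runs to the end of the file
            size = end - pos
        if size < header:
            break  # malformed box; stop rather than loop forever
        types.append(box_type.decode("latin-1"))
        pos += size
    return types


def moov_before_mdat(f):
    """True when both boxes exist and moov comes first."""
    types = top_level_box_types(f)
    if "moov" not in types or "mdat" not in types:
        return False
    return types.index("moov") < types.index("mdat")


def is_faststart(path):
    """Open an MP4 on disk and run the moov-before-mdat check."""
    with open(path, "rb") as f:
        return moov_before_mdat(f)
```

The sketch seeks from box to box and reads only headers, so it stays cheap even when `mdat` is large.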
Ask your agent in plain language, or run it by hand:
S=.agents/skills
# 1) long video -> a 9:16 highlight cut
python3 $S/clip-to-short/scripts/clip_to_short.py "https://youtu.be/XXXX" \
--length 45 --no-captions -o cut.mp4
# 2) write a first-person POV script from the source (you/the agent author it),
# then sanity-check it
python3 $S/script-from-source/scripts/validate_script.py script.json
# 3) synthesize the voiceover (free edge-tts) + aligned per-line timing
python3 $S/voiceover/scripts/voiceover.py script.json -o narration.wav
# 4) burn word-by-word captions and mux the narration -> finished short
python3 $S/burn-captions/scripts/burn_captions.py cut.mp4 \
--script script.aligned.json --audio narration.wav -o final.mp4
# 5) QA it before posting (aspect / duration / loudness / faststart)
python3 $S/publish-checklist/scripts/publish_checklist.py final.mp4
# 6) write the title, tags, description + a trending-music pick (agent-driven)
# -> run the ttdy-metadata skill, or just say "ttdy" over final.mp4

voiceover writes script.aligned.json with the real timing of every spoken
line; burn-captions reads that so the captions land exactly on the voice. The
shared script.json schema is what makes the skills compose.
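A rough sketch of how per-line timing can drive word-by-word captions. The real script.aligned.json schema isn't shown here, so the `text`, `start`, and `end` keys below are assumptions, and sharing each line's duration across words in proportion to word length is one simple heuristic, not necessarily what burn-captions does:

```python
def word_cues(lines):
    """Split aligned lines into per-word (word, start, end) cues.

    `lines` is a list of dicts with hypothetical keys:
    text (str), start (seconds), end (seconds).
    """
    cues = []
    for line in lines:
        words = line["text"].split()
        if not words:
            continue
        total_chars = sum(len(w) for w in words)
        duration = line["end"] - line["start"]
        cursor = line["start"]
        for word in words:
            # Longer words get a proportionally longer share of the line.
            span = duration * len(word) / total_chars
            cues.append((word, round(cursor, 3), round(cursor + span, 3)))
            cursor += span
    return cues
```

Because every cue is derived from the line's measured start and end, the captions cannot drift past the spoken line even when the per-word split is only approximate.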
Clone into a place Codex scans for skills (repo .agents/skills, or your user
dir ~/.agents/skills), then restart Codex:
git clone https://github.com/bitofacoder/shortsmith
mkdir -p ~/.agents/skills
cp -r shortsmith/.agents/skills/* ~/.agents/skills/
# restart Codex to load

Or, inside Codex, point the skill installer at the GitHub directory:
$skill-installer https://github.com/bitofacoder/shortsmith/tree/main/.agents/skills/clip-to-short
python3 .agents/skills/clip-to-short/scripts/clip_to_short.py \
"https://youtu.be/XXXX" --length 25 --title "He was wrong" -o short.mp4

Output is a 1080×1920 H.264 MP4 with +faststart. Add --json for a
machine-readable result line your agent can parse.
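A sketch of how a wrapper might pick up that result line. The keys of the JSON object aren't documented in this section, so the example assumes only that the result is a single JSON object on its own line of stdout; `parse_result_line` and `run_clip` are hypothetical helper names:

```python
import json
import subprocess


def parse_result_line(stdout: str) -> dict:
    """Return the last stdout line that parses as a JSON object."""
    for line in reversed(stdout.splitlines()):
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip progress output and other log lines
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            return obj
    raise ValueError("no JSON result line found in output")


def run_clip(url: str, out_path: str) -> dict:
    """Run clip_to_short.py with --json and parse its result line."""
    cmd = [
        "python3", ".agents/skills/clip-to-short/scripts/clip_to_short.py",
        url, "--length", "25", "-o", out_path, "--json",
    ]
    proc = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_result_line(proc.stdout)
```

Scanning from the last line backwards tolerates any progress output printed before the result.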
| Tool | Needed for | Install |
|---|---|---|
| ffmpeg + ffprobe | everything | brew install ffmpeg |
| yt-dlp | URL sources | brew install yt-dlp |
| whisper | auto-captions | pip install openai-whisper |
Captions and the title card need an ffmpeg built with libass and libfreetype (most full builds have them; some minimal Homebrew bottles don't). If yours lacks them, shortsmith still renders the cut and tells you what it skipped — it never hard-fails on a missing filter.
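To check a build up front, `ffmpeg -filters` lists every compiled-in filter. A minimal sketch, assuming captions rely on the `subtitles` filter (which needs libass) and the title card on `drawtext` (which needs libfreetype); which filters shortsmith actually calls is an assumption here, and the helper names are hypothetical:

```python
import subprocess


def available_filters(listing: str) -> set:
    """Extract filter names from the text printed by `ffmpeg -filters`."""
    names = set()
    for line in listing.splitlines():
        parts = line.split()
        # Filter rows look like: <flags> <name> <inputs>-><outputs> <description>
        if len(parts) >= 3 and "->" in parts[2]:
            names.add(parts[1])
    return names


def missing_filters(listing: str, required=("subtitles", "drawtext")) -> list:
    """Return the required filters that this ffmpeg build lacks."""
    have = available_filters(listing)
    return [name for name in required if name not in have]


def ffmpeg_filter_listing() -> str:
    """Ask the local ffmpeg for its filter list."""
    proc = subprocess.run(
        ["ffmpeg", "-hide_banner", "-filters"],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout
```

Calling `missing_filters(ffmpeg_filter_listing())` returns an empty list when both filters are present.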
- clip-to-short flagship
- script-from-source + voiceover + burn-captions (the composing chain)
- ttdy-metadata and publish-checklist
- a demo GIF in examples/
- more source platforms, caption styles, and TTS engines
See CONTRIBUTING.md to add a skill. Issues and PRs welcome — especially new source platforms, caption styles, and TTS engines.
MIT. Each skill also ships its own LICENSE.txt.