The official YouTube API won't give you caption text for other people's videos. Here's what actually works, with copy-paste code for both languages.
Every developer building on YouTube transcripts hits the same wall. The YouTube Data API v3 has a captions resource, so it looks like the obvious answer — until you read the fine print on captions.download: it requires OAuth authorization from the video's owner. It exists so creators can manage captions on their own channels, not so third parties can read them.
In practice that means there is no official, API-key-only endpoint that returns the caption text of an arbitrary public video. captions.list will happily tell you that caption tracks exist and in which languages — it just won't give you the words. This surprises everyone the first time, usually after an afternoon of wiring up OAuth scopes that turn out not to help.
So if your app summarizes videos, indexes them for search, feeds them to an LLM, or quotes them, you need another route. There are two.
Community libraries in Python and Node fetch the same caption data the YouTube player uses. They're free, well-documented, and great for prototypes and small personal projects.
The honest trade-offs show up in production:
None of this is a criticism of the libraries — they're doing something YouTube doesn't officially support, impressively well. It's just unpaid infrastructure work that lands on you.
The VidWords API exists so that all of the above is somebody else's problem — we handle the reliability problems for you, and your side stays a single HTTPS call:
POST https://vidwords.com/api/transcripts, get structured transcripts back.
Authentication is an Authorization: Basic header carrying your API token (created from your account dashboard).
# Python (requires the requests package)
import requests

API_TOKEN = "YOUR_API_TOKEN"

resp = requests.post(
    "https://vidwords.com/api/transcripts",
    headers={"Authorization": f"Basic {API_TOKEN}"},
    json={"ids": ["dQw4w9WgXcQ", "9bZkp7q19f0"]},
    timeout=60,
)
resp.raise_for_status()

for video in resp.json()["results"]:
    if "error" in video:
        # Per-video failure: report it and move on to the next result
        print(video["id"], "failed:", video["error"])
        continue
    print(video["title"])
    print(video["text"][:200], "...")  # full plain-text transcript
    first = video["segments"][0]  # timestamped segments
    print(f'[{first["start"]}s] {first["text"]}')
// Node.js 18+ (built-in fetch); run as an ES module so top-level await works
const API_TOKEN = "YOUR_API_TOKEN";

const resp = await fetch("https://vidwords.com/api/transcripts", {
  method: "POST",
  headers: {
    "Authorization": `Basic ${API_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ ids: ["dQw4w9WgXcQ", "9bZkp7q19f0"] }),
});
if (!resp.ok) throw new Error(`HTTP ${resp.status}`);

const { results } = await resp.json();
for (const video of results) {
  if (video.error) {
    // Per-video failure: report it and move on to the next result
    console.warn(`${video.id} failed: ${video.error}`);
    continue;
  }
  console.log(video.title);
  console.log(video.text.slice(0, 200)); // plain text
  const [first] = video.segments; // timestamped
  console.log(`[${first.start}s] ${first.text}`);
}
Each request returns one entry per video, in order, so a partial failure never sinks the whole batch:
{
  "results": [
    {
      "id": "dQw4w9WgXcQ",
      "title": "Video title",
      "segments": [
        { "text": "We're no strangers to love", "start": 18.6, "duration": 3.4 },
        ...
      ],
      "text": "Full transcript as one plain-text string..."
    }
  ]
}
segments gives you each caption line with its start time and duration in seconds — ideal for building search-with-jump-to-timestamp or generating subtitle files. text is the whole transcript joined into clean plain text, which is what you want for LLM input. (If that's your use case, the guide to summarizing YouTube videos with AI includes prompts that work well on transcript text.)
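As an illustration of the subtitle-file use case, here is a minimal sketch that turns a segments list into SRT text. It relies only on the text, start, and duration fields shown in the response above; the function names srt_timestamp and segments_to_srt are hypothetical helpers, not part of the VidWords API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Render segments ({"text", "start", "duration"}) as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]  # SRT needs an end time, not a duration
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{seg['text']}\n"
        )
    # Cues are separated by a blank line
    return "\n".join(cues)
```

Calling segments_to_srt(video["segments"]) on a successful result and writing the string to a .srt file should be enough for most players. Overlapping or zero-length cues are passed through unchanged in this sketch.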
Per-video failures arrive as an error field on that result, with a machine-readable code:
no_transcript: the video exists, but has no caption track in any language (creator disabled captions and YouTube generated none). Common enough that you should treat it as a normal case, not an exception.
video_unavailable: private, deleted, or region-blocked video.
invalid_id: the string you sent isn't a valid YouTube video ID or URL.
You're only charged credits for transcripts actually delivered; a no_transcript result costs nothing.
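One way to treat these codes as normal cases is to sort a batch's results into delivered transcripts and failures grouped by code. The sketch below assumes the error field holds the code as a plain string, which matches how the examples above print it but is not confirmed here; if the API returns an object instead, adjust the lookup. split_results is a hypothetical helper name.

```python
from collections import defaultdict

# Error codes listed in this article; anything else is grouped as "unknown".
EXPECTED_CODES = {"no_transcript", "video_unavailable", "invalid_id"}


def split_results(results: list[dict]) -> tuple[list[dict], dict[str, list[str]]]:
    """Separate delivered transcripts from per-video failures.

    ASSUMPTION: a failed entry looks like {"id": ..., "error": "<code>"},
    with the code as a plain string.
    """
    delivered = []
    failures = defaultdict(list)  # error code -> list of video IDs
    for video in results:
        code = video.get("error")
        if code is None:
            delivered.append(video)
        elif code in EXPECTED_CODES:
            failures[code].append(video["id"])
        else:
            failures["unknown"].append(video["id"])
    return delivered, dict(failures)
```

With this shape, no_transcript IDs can be recorded as "checked, nothing available" so they are not retried, while anything under "unknown" is worth logging for a closer look.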
Usage is metered in credits: one successful transcript costs one credit, and batching 50 videos into one request costs the same as 50 single requests — batch for latency, not for price. Per-minute rate limits and monthly credit allowances vary by plan; current numbers are on the pricing page, and every endpoint, parameter, and response field is documented in the full API reference.
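Since batching only affects latency, a long ID list can be split into fixed-size requests without changing the bill. The sketch below is one way to do that. The batch size of 50 is borrowed from the example above and is an assumption, not a documented limit, and post_batch is a hypothetical callable that sends one request and returns its results list.

```python
from typing import Callable, Iterator


def chunked(ids: list[str], size: int = 50) -> Iterator[list[str]]:
    """Yield consecutive slices of ids, each with at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for i in range(0, len(ids), size):
        yield ids[i:i + size]


def fetch_all(
    ids: list[str],
    post_batch: Callable[[list[str]], list[dict]],
    size: int = 50,  # ASSUMPTION: check the API reference for the real maximum
) -> list[dict]:
    """Send ids in batches and return all per-video results in input order."""
    results: list[dict] = []
    for batch in chunked(ids, size):
        results.extend(post_batch(batch))
    return results
```

In practice post_batch would wrap the requests.post call shown earlier and return resp.json()["results"]. Pausing between batches to respect your plan's per-minute limit is left out of this sketch.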
If you just need a transcript right now without writing code, paste the link on the VidWords homepage. The transcript extraction guide walks through formats and exports, and the bulk tool handles playlists and channels from the browser.
No. You only need a VidWords API token, since the Data API can't return caption text for arbitrary videos anyway.
Any caption track the video actually has, author-uploaded or auto-generated. You can request a specific language; see the API reference for the parameter.
Yes — free accounts include monthly credits that work for API calls too, so you can integrate and test before picking a plan.