YouTube transcript API for developers — get transcripts in Python & JavaScript

Written by the VidWords Team · June 12, 2026 · Updated June 12, 2026

The official YouTube API won't give you caption text for other people's videos. Here's what actually works, with copy-paste code for both languages.

The short answer

The official API won't do it. YouTube Data API v3's captions.download requires OAuth authorization and permission to edit the video, so there is no API-key-only endpoint that returns caption text for an arbitrary public video. captions.list tells you which tracks exist — not what they say.
Two routes remain. Open-source scraper libraries (free, but you maintain the integration, handle YouTube's blocking yourself, and get one video per call), or a hosted transcript API.
The hosted call. POST https://vidwords.com/api/transcripts with an Authorization: Basic token and up to 50 video ids per request returns, per video, a plain-text text field and timestamped segments of text/start/duration.
Failures are per video. A bad video comes back with an error field — no_transcript, video_unavailable, invalid_id — rather than failing the batch, and costs no credits. One delivered transcript costs one credit.

The problem: the official Data API doesn't do this

Every developer building on YouTube transcripts hits the same wall. The YouTube Data API v3 has a captions resource, so it looks like the obvious answer — until you read the official captions.download documentation: the method requires OAuth authorization and permission to edit the video. It exists so authorized creators and content owners can manage caption tracks, not as an API-key-only endpoint for arbitrary public videos.

In practice that means there is no official, API-key-only endpoint that returns the caption text of an arbitrary public video. captions.list will happily tell you that caption tracks exist and in which languages — it just won't give you the words. This surprises everyone the first time, usually after an afternoon of wiring up OAuth scopes that turn out not to help.

So if your app summarizes videos, indexes them for search, feeds them to an LLM, or quotes them, you need another route. There are two.

Option 1: open-source scraper libraries

Community libraries in Python and Node fetch the same caption data the YouTube player uses. They're free, well-documented, and great for prototypes and small personal projects.

The honest trade-offs show up in production:

You maintain the integration. These libraries depend on YouTube's internal, undocumented responses. When YouTube changes something, your pipeline breaks until the library ships a fix — or you patch it yourself.
You handle YouTube's blocking yourself. YouTube rate-limits and blocks automated caption fetching, especially from cloud datacenter IPs. A script that works fine on your laptop often starts failing once deployed to a server, and keeping it reliable at volume becomes its own engineering project.
No batching, no channel discovery. One video per call; if you want "every video on this channel," you build that layer too.

None of this is a criticism of the libraries — they're doing something YouTube doesn't officially support, impressively well. It's just unpaid infrastructure work that lands on you.

Option 2: a hosted transcript API

The VidWords API exists so that all of the above is somebody else's problem. We absorb the upstream changes and the blocking, and your side stays a single HTTPS call:

One POST, JSON out. Send video IDs or URLs to POST https://vidwords.com/api/transcripts, get structured transcripts back.
Batching built in — up to 50 videos per request.
Channel listings — resolve a channel or playlist to its video IDs, then fetch transcripts, without scraping anything yourself.
Stable contract. The response shape below is what you code against; whatever changes upstream is our job to absorb.
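
Because a request carries at most 50 videos, a longer list has to be split on your side. The sketch below shows one way to do that. The names chunked, fetch_all, and send_batch are hypothetical helpers for illustration, not part of the VidWords API; send_batch stands in for whatever function you write to POST one batch and return its "results" list.

```python
from typing import Callable, Iterator

MAX_BATCH = 50  # per-request limit stated in this article


def chunked(ids: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(
    ids: list[str],
    send_batch: Callable[[list[str]], list[dict]],
) -> list[dict]:
    """Call `send_batch` once per chunk and concatenate the per-video results.

    `send_batch` is a hypothetical callable you supply: it should POST one
    batch of ids and return that response's "results" list.
    """
    results: list[dict] = []
    for batch in chunked(ids):
        results.extend(send_batch(batch))
    return results
```

Passing the sender in as an argument keeps the splitting logic testable without network access; in real use it would wrap a call like the ones in the Python and JavaScript examples.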

Authentication is an Authorization: Basic header carrying your API token (created from your account dashboard).

Python example

import requests

API_TOKEN = "YOUR_API_TOKEN"

resp = requests.post(
    "https://vidwords.com/api/transcripts",
    headers={"Authorization": f"Basic {API_TOKEN}"},
    json={"ids": ["dQw4w9WgXcQ", "9bZkp7q19f0"]},
    timeout=60,
)
resp.raise_for_status()

for video in resp.json()["results"]:
    if "error" in video:
        print(video["id"], "failed:", video["error"])
        continue
    print(video["title"])
    print(video["text"][:200], "...")        # full plain-text transcript
    segments = video["segments"]              # timestamped segments
    if segments:                              # skip the preview if the list is empty
        first = segments[0]
        print(f'[{first["start"]}s] {first["text"]}')

JavaScript example

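// Run as an ES module: the snippet uses top-level await and the global fetch (built into Node 18+).
const API_TOKEN = "YOUR_API_TOKEN";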

const resp = await fetch("https://vidwords.com/api/transcripts", {
  method: "POST",
  headers: {
    "Authorization": `Basic ${API_TOKEN}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ ids: ["dQw4w9WgXcQ", "9bZkp7q19f0"] }),
});
if (!resp.ok) throw new Error(`HTTP ${resp.status}`);

const { results } = await resp.json();
for (const video of results) {
  if (video.error) {
    console.warn(`${video.id} failed: ${video.error}`);
    continue;
  }
  console.log(video.title);
  console.log(video.text.slice(0, 200));            // plain text
  const [first] = video.segments;                    // timestamped
  if (first) console.log(`[${first.start}s] ${first.text}`);  // skip if the list is empty
}

The response shape

Each request returns one entry per video, in order, so a partial failure never sinks the whole batch:

{
  "results": [
    {
      "id": "dQw4w9WgXcQ",
      "title": "Video title",
      "segments": [
        { "text": "We're no strangers to love", "start": 18.6, "duration": 3.4 },
        ...
      ],
      "text": "Full transcript as one plain-text string..."
    }
  ]
}

segments gives you each caption line with its start time and duration in seconds — ideal for building search-with-jump-to-timestamp or generating subtitle files. text is the whole transcript joined into clean plain text, which is what you want for LLM input. (If that's your use case, the guide to summarizing YouTube videos with AI includes prompts that work well on transcript text.)
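
As an illustration of the subtitle-file use case, here is a minimal sketch that turns a segments list into SubRip (SRT) text. It assumes only the text, start, and duration fields shown in the response shape; the function names srt_timestamp and segments_to_srt are made up for this example.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text with one numbered cue per segment."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]  # end time is derived, not returned
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{seg['text']}\n"
        )
    # A blank line separates consecutive cues.
    return "\n".join(cues)


sample = [{"text": "We're no strangers to love", "start": 18.6, "duration": 3.4}]
print(segments_to_srt(sample))
```

Caption lines can overlap in time, so a stricter converter might clamp each cue's end to the next cue's start; this sketch leaves the timings as delivered.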

Error codes you should handle

Per-video failures arrive as an error field on that result, with a machine-readable code:

no_transcript — the video exists, but has no caption track in any language (creator disabled captions and YouTube generated none). Common enough that you should treat it as a normal case, not an exception.
video_unavailable — private, deleted, or region-blocked video.
invalid_id — the string you sent isn't a valid YouTube video ID or URL.

You're only charged credits for transcripts actually delivered — a no_transcript result costs nothing.
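
One way to treat these codes as normal cases is to sort each batch into delivered transcripts and failures grouped by code. The sketch below assumes a failed result still carries its id alongside the error field, as the examples above rely on; split_results and the "unknown" bucket are illustrative choices, not part of the API.

```python
# Error codes named in this article; anything else lands in "unknown".
KNOWN_ERRORS = {"no_transcript", "video_unavailable", "invalid_id"}


def split_results(results: list[dict]) -> tuple[list[dict], dict[str, list[str]]]:
    """Separate delivered transcripts from failures, grouped by error code."""
    delivered: list[dict] = []
    failed: dict[str, list[str]] = {}
    for video in results:
        code = video.get("error")
        if code is None:
            delivered.append(video)
            continue
        # Keep unrecognized codes visible instead of dropping them silently.
        bucket = code if code in KNOWN_ERRORS else "unknown"
        failed.setdefault(bucket, []).append(video.get("id", "?"))
    return delivered, failed
```

A caller might log the no_transcript ids and move on, while treating a non-empty invalid_id bucket as a bug in its own input handling.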

Rate limits and credits

Usage is metered in credits: one successful transcript costs one credit, and batching 50 videos into one request costs the same as 50 single requests — batch for latency, not for price. Per-minute rate limits and monthly credit allowances vary by plan; current numbers are on the pricing page, and every endpoint, parameter, and response field is documented in the full API reference.
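
To make the arithmetic concrete, here is a small sketch of the metering rule as described above: requests scale with the batch size, credits scale only with delivered transcripts. The function name estimate_usage and the figures in the usage line are made up for illustration.

```python
import math


def estimate_usage(video_count: int, delivered_count: int, batch_size: int = 50) -> dict:
    """Return the number of requests needed and credits consumed.

    Assumes one credit per delivered transcript and no charge for failures,
    as stated in this article.
    """
    if not 0 <= delivered_count <= video_count:
        raise ValueError("delivered_count must be between 0 and video_count")
    return {
        "requests": math.ceil(video_count / batch_size),
        "credits": delivered_count,
    }


# Hypothetical job: 120 videos, of which 97 have a transcript.
print(estimate_usage(120, 97))  # {'requests': 3, 'credits': 97}
```

Sending the same 120 videos one at a time would take 120 requests and still cost 97 credits, which is why batching helps latency rather than price.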

If you just need a transcript right now without writing code, paste the link on the VidWords homepage; the transcript extraction guide walks through formats and exports. For playlists and channels, the bulk tool does the same job from the browser.

FAQ

Do I need an API key from Google?

No. You only need a VidWords API token, since the Data API can't return caption text for arbitrary videos anyway.

Which languages can I fetch?

Any caption track the video actually has, author-uploaded or auto-generated. You can request a specific language; see the API reference for the parameter.

Can I try it before paying?

Yes — free accounts include monthly credits that work for API calls too, so you can integrate and test before picking a plan.
