How to Debug AI API Responses (JSON Formatting, Testing & Validation)

Javed Iqbal · 9 min read

The first time an LLM took down a feature I built, it wasn’t a crash. The model returned "servings": "four" instead of 4, my code did math on it, and everything downstream quietly turned into NaN. No error, no alert — just wrong numbers on a dashboard until someone noticed.

That’s the thing nobody warns you about when you start building on GPT, Claude, or Gemini: the request is the easy part. The response is where you’ll spend your time. “Give me JSON” and “give me valid, correctly-typed JSON every single time” are very different promises, and the model only sort of keeps the first one.

Here’s the workflow I’ve settled on after shipping a few of these. It’s three moves: see it, test it, validate it.

Why AI JSON breaks (it’s always the same few things)

Once you’ve debugged enough of these, the failure modes start to rhyme:

  • The markdown fence. The model wraps your JSON in ```json ... ``` and now JSON.parse throws on the backticks.
  • The friendly preamble. “Sure! Here’s the JSON you asked for:” — followed by the JSON, followed by “Let me know if you’d like changes!”
  • Type drift. A number comes back as a string, a boolean as "true", a date in three different formats across three calls.
  • Missing vs null. A field you required is just… not there. Or it’s null. Or it’s an empty string. All three break different things.
  • Hallucinated fields & enum drift. You asked for status: "active" | "closed" and got "Active." with a capital A and a period, or an extra field you never defined.
  • Truncation. You hit the token limit mid-object and the JSON just stops. Unexpected end of JSON input.

None of these are exotic. They’re the everyday reality of probabilistic output, which is exactly why you can’t treat an LLM like a normal API that returns a contract.
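To make the list concrete, here is a minimal Python sketch (Python to match the Pydantic example later on) that feeds one made-up sample per failure mode to the standard json parser. The sample strings and the parses helper are my own illustrations, not real model output.

```python
import json

FENCE = "`" * 3  # built this way so the example contains no literal fence

# Hypothetical raw model outputs, one per failure mode from the list above.
SAMPLES = {
    "fenced": f'{FENCE}json\n{{"servings": 4}}\n{FENCE}',
    "preamble": 'Sure! Here is the JSON you asked for: {"servings": 4}',
    "truncated": '{"title": "Soup", "ingredients": ["water", "sa',
    "type_drift": '{"servings": "four"}',
    "enum_drift": '{"status": "Active."}',
}


def parses(raw: str) -> bool:
    """Return True if json.loads accepts the text, False if it raises."""
    try:
        json.loads(raw)
        return True
    except json.JSONDecodeError:
        return False


results = {name: parses(raw) for name, raw in SAMPLES.items()}
for name, ok in results.items():
    print(f"{name:12} parses={ok}")
```

The first three fail loudly at parse time. The last two parse without complaint, which is why they are the ones that reach your dashboard.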

Step 1 — See the response clearly

Before you write a single line of parsing logic, look at the actual bytes the model sent back. Half my “bugs” here turned out to be a stray fence or a trailing sentence I couldn’t see in a console log.

Two things make this fast. First, hit the endpoint directly so you’re looking at the raw payload, not your app’s mangled version of it — I keep the API tester open in a tab for exactly this: set the headers and body, send, and read the real status code and response. Second, take whatever JSON you extracted and drop it into the JSON viewer — it’ll either render a clean tree (so you can eyeball the shape and spot the field that’s the wrong type) or tell you exactly where it stopped being valid JSON. That second case is the giveaway for truncation and stray prose.

It sounds basic. It is. But “actually look at the response” saves more debugging time than any clever code, and it’s the step people skip.
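If you would rather do the same check from a script, a rough sketch like this one makes the invisible parts visible. The describe helper and the sample response are hypothetical; non-ASCII characters are not always a problem, they are just worth a second look.

```python
def describe(raw: str) -> dict:
    """Summarise a raw response so wrappers and odd characters are easy to spot."""
    stripped = raw.strip()
    return {
        "starts_like_json": stripped[:1] in ("{", "["),
        "ends_like_json": stripped[-1:] in ("}", "]"),
        "non_ascii": sorted({ch for ch in raw if ord(ch) > 127}),
    }


# Hypothetical response: a preamble plus smart quotes instead of straight ones.
raw = 'Sure! {"title": \u201cSoup\u201d}\n'

print(ascii(raw))  # ascii() shows newlines and non-ASCII characters as escapes
info = describe(raw)
print(info)
```

A response that does not start with a brace or bracket, or that contains curly quotes, is usually explained before you ever reach the parser.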

Step 2 — Get to the JSON

If you can’t use the provider’s strict JSON mode (more on that below), you’ll need to strip the wrapper the model insists on adding. A small extractor handles the common cases:

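// Pulls JSON out of a fenced or chatty response.
function extractJson(raw: string): unknown {
  const fenced = raw.match(/```(?:json)?\s*([\s\S]*?)```/i);
  let candidate = (fenced ? fenced[1] : raw).trim();
  // No fence? Drop prose before the first brace/bracket and after the last one.
  const start = candidate.search(/[{\[]/);
  const end = Math.max(candidate.lastIndexOf("}"), candidate.lastIndexOf("]"));
  if (start !== -1 && end > start) candidate = candidate.slice(start, end + 1);
  return JSON.parse(candidate); // still throws on truncated or malformed JSON
}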

If JSON.parse still throws, paste the raw text into the JSON viewer to find the bad character — it’s usually a smart quote, an unescaped newline, or the response getting cut off. (And if someone hands you a JavaScript object rather than JSON, the JS-to-JSON converter fixes the unquoted keys and trailing commas.)
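When the parser is the one complaining, it already knows where it gave up. A small Python sketch, assuming you have the raw text in hand: json.JSONDecodeError carries the position, line, and column, and locate_json_error (a name I made up) just prints the text around it.

```python
import json
from typing import Optional


def locate_json_error(text: str, context: int = 20) -> Optional[str]:
    """Return where json.loads gave up, with nearby text, or None if it parsed."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        nearby = text[max(0, err.pos - context): err.pos + context]
        return f"{err.msg} at line {err.lineno}, column {err.colno}, near {nearby!r}"
    return None


# Hypothetical bad responses: smart quotes, then a response cut off mid-string.
print(locate_json_error('{"title": \u201cSoup\u201d}'))
print(locate_json_error('{"title": "Soup", "ingredients": ["wa'))
```

An error position at the very end of the text points to truncation; one in the middle usually points to a bad character.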

Step 3 — Validate, don’t trust

This is the part that actually matters in 2026, and it’s where Zod (TypeScript) and Pydantic (Python) earn their keep. You define the shape you expect, and you refuse to let anything else into your app. Parsing tells you it’s JSON; validation tells you it’s the right JSON.

TypeScript with Zod:

import { z } from "zod";

const Recipe = z.object({
  title: z.string().min(1),
  // coerce handles the classic "4" instead of 4
  servings: z.coerce.number().int().positive(),
  ingredients: z.array(z.string()).min(1),
  status: z.enum(["active", "closed"]),
});

const result = Recipe.safeParse(extractJson(raw));

if (!result.success) {
  // exactly which field failed, and why
  console.error(result.error.issues);
  // ...retry, repair, or fall back — but never ship the bad data
} else {
  doStuff(result.data); // fully typed and trusted from here
}

The z.coerce trick alone kills a whole category of type-drift bugs. And error.issues gives you a precise, machine-readable list of what was wrong — which becomes useful in a second.

Python with Pydantic v2:

from pydantic import BaseModel, Field, ValidationError
from typing import Literal

class Recipe(BaseModel):
    title: str = Field(min_length=1)
    servings: int = Field(gt=0)   # Pydantic coerces "4" -> 4 for you
    ingredients: list[str] = Field(min_length=1)
    status: Literal["active", "closed"]

try:
    recipe = Recipe.model_validate(data)   # data = parsed dict
except ValidationError as e:
    print(e.errors())   # structured: field, message, the bad input
    # retry / repair / fall back

Same idea, different language. The schema is your contract, the validator enforces it, and your app only ever sees data that already passed. If you take one thing from this post: a typed schema between the model and your business logic is non-negotiable now.
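The “missing vs null” cases from the list at the top all parse as valid JSON, so validation is the first place they show up. When I log a failure I like to know which of the three I got. A stdlib-only sketch; field_state and its labels are hypothetical names of mine.

```python
import json

_MISSING = object()  # sentinel so "absent" is distinguishable from null


def field_state(payload: dict, key: str) -> str:
    """Classify a field as 'missing', 'null', 'empty', or 'present'."""
    value = payload.get(key, _MISSING)
    if value is _MISSING:
        return "missing"
    if value is None:
        return "null"
    if value == "":
        return "empty"
    return "present"


for raw in ('{}', '{"title": null}', '{"title": ""}', '{"title": "Soup"}'):
    print(raw, "->", field_state(json.loads(raw), "title"))
```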

Step 4 — When it fails, repair instead of crashing

Validation failing isn’t the end — it’s information. The move that’s saved me the most: feed the validation error back to the model and ask it to fix its own output.

// Repair prompt: hand the model its own mistakes.
const repairPrompt = [
  "Your previous response failed validation with these errors:",
  JSON.stringify(result.error.issues),
  "Return ONLY corrected JSON that matches the schema. No prose, no code fences.",
].join("\n");

In my experience, one repair round fixes most of these failures. Cap it at one or two retries so a stubborn response doesn’t loop forever, and have a fallback for when it just won’t comply.
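Here is roughly what the capped retry looks like as a loop, sketched in Python with the standard library only. Everything named here is an assumption for illustration: call_model stands in for whatever client you use, validate stands in for your Zod or Pydantic layer and returns a list of error strings, and check_servings is a toy validator.

```python
import json
from typing import Any, Callable


def repair_loop(
    call_model: Callable[[str], str],      # hypothetical: prompt in, raw text out
    validate: Callable[[Any], list[str]],  # hypothetical: returns error strings
    prompt: str,
    max_repairs: int = 1,
    fallback: Any = None,
) -> Any:
    """Call the model, and on failure feed the errors back at most max_repairs times."""
    current_prompt = prompt
    for _ in range(max_repairs + 1):
        raw = call_model(current_prompt)
        try:
            data = json.loads(raw)
            errors = validate(data)
        except json.JSONDecodeError as err:
            data, errors = None, [f"invalid JSON: {err.msg} at char {err.pos}"]
        if not errors:
            return data
        current_prompt = (
            f"{prompt}\n\n"
            "Your previous response failed validation with these errors:\n"
            f"{json.dumps(errors)}\n"
            "Return ONLY corrected JSON that matches the schema. "
            "No prose, no code fences."
        )
    return fallback  # the model would not comply within the retry budget


def check_servings(data: Any) -> list[str]:
    """Toy validator: servings must be a positive integer (bool excluded)."""
    value = data.get("servings") if isinstance(data, dict) else None
    if isinstance(value, int) and not isinstance(value, bool) and value > 0:
        return []
    return ["servings: expected a positive integer"]


# Fake model: wrong on the first call, corrected on the second.
replies = iter(['{"servings": "four"}', '{"servings": 4}'])
fixed = repair_loop(lambda _prompt: next(replies), check_servings, "Give me a recipe as JSON.")
print(fixed)
```

The retry budget is a hard number, not a while loop, so the worst case is max_repairs + 1 model calls followed by the fallback.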

A note on “structured outputs” / JSON mode

Most providers now offer a way to constrain output to a schema — structured outputs, JSON mode, response schemas, tool/function calling. Use them. They dramatically cut the fenced-prose and broken-JSON problems at the source, and they’re the right first line of defense.

But — and this is the honest caveat — don’t let that make you skip Step 3. Constrained generation reduces breakage, it doesn’t guarantee semantics (a schema can say “string” while the value is nonsense), behavior varies between providers, and the day you swap models you’ll be glad your Zod/Pydantic layer is still standing there. Constrain at generation, validate at the boundary. Belt and suspenders.
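To show what “right type, nonsense value” looks like, here is a small sketch of a semantic check that runs after the schema has already passed. The semantic_problems function and its thresholds (for example, 1 to 100 servings) are assumptions I picked for illustration, not rules from any library.

```python
def semantic_problems(recipe: dict) -> list[str]:
    """Check a schema-valid recipe for values that are well-typed but implausible."""
    problems = []
    if not recipe["title"].strip():
        problems.append("title is blank")
    if not 1 <= recipe["servings"] <= 100:  # assumed plausible range
        problems.append("servings outside plausible range")
    if any(not item.strip() for item in recipe["ingredients"]):
        problems.append("blank ingredient")
    return problems


# Hypothetical output that satisfies the types but is still nonsense.
suspicious = {
    "title": "   ",
    "servings": 4000,
    "ingredients": ["water", " "],
    "status": "active",
}
print(semantic_problems(suspicious))
```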

The whole loop, in practice

  1. Hit the endpoint in the API tester, read the raw response and status.
  2. Drop the body into the JSON viewer to confirm it’s valid and the right shape.
  3. Extract the JSON, then run it through your Zod/Pydantic schema.
  4. On failure, log issues/errors, retry once with the error fed back, then fall back.

Two minutes of looking at the real response beats an hour of guessing why your app got NaN. Ask me how I know.

FAQ

Why is my AI API returning invalid JSON?

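Usually markdown code fences, a chatty preamble, or a response truncated by the token limit. Paste it into the JSON viewer to see exactly where it breaks. A value with the wrong type is a different problem: it is still valid JSON, so it only shows up when you validate against a schema.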

Do I still need Zod/Pydantic if I use JSON mode?

Yes. Structured outputs reduce malformed JSON, but they don’t guarantee correct values or protect you when you switch providers. Validate at your app’s boundary regardless.

What’s the fastest way to inspect a model’s response?

Send the request in a browser API tester and view the body in a JSON viewer — no scripts, nothing uploaded, runs locally.

Debugging one right now? Open the API tester and JSON viewer side by side — or see the other free browser tools I use daily.