Short version: GPT-5 ignores custom temperature; it only accepts the default (1). Removing temperature from the GPT-5 call was the right fix. To keep this from biting us again, here's a tight hardening plan and a quick retest checklist.

Make it robust (no more “stuck” states)

1) Centralize model capabilities

Keep a tiny map so we never send unsupported params:

GPT-5: allow {temperature: 1 only}, disallow {top_p, freq/presence_penalty}

GPT-4o-mini (and others): allow normal {temperature, top_p, penalties}

If a param is unsupported → strip it before the request.
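A minimal sketch of the strip step. The model keys and the exact param lists are assumptions; adjust to whatever our client actually sends:

```python
# Sampling params that vary by model; everything else passes through untouched.
SAMPLING_PARAMS = {"temperature", "top_p", "frequency_penalty", "presence_penalty"}

# Per-model allow-list (assumed names). GPT-5 gets an empty set: temperature
# must stay at the default, so we send no sampling params at all.
SUPPORTED_PARAMS = {
    "gpt-5": set(),
    "gpt-4o-mini": {"temperature", "top_p", "frequency_penalty", "presence_penalty"},
}

def strip_unsupported(model, params):
    """Drop any sampling param the target model does not accept."""
    allowed = SUPPORTED_PARAMS.get(model, SAMPLING_PARAMS)  # unknown model: pass through
    return {k: v for k, v in params.items()
            if k not in SAMPLING_PARAMS or k in allowed}
```

The point is that the call site never has to know model quirks; it always builds the full param dict and the map decides what survives.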

2) Fast, graceful fallback (no long pause)

Time out GPT-5 generation at ~3s client-visible (8s hard server timeout).

If timeout/error → immediately retry with GPT-4o-mini.

Show status text on spinner: “Drafting… (using GPT-5)” → “Retrying…” if we fall back.
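One way to sketch the timeout-then-fallback flow (the `primary`/`fallback` callables are hypothetical wrappers around our API client; the status-text updates would hang off the returned "which model" tag):

```python
import concurrent.futures

PRIMARY_TIMEOUT_S = 3.0  # client-visible cutoff from the plan

def generate_with_fallback(primary, fallback, prompt, timeout=PRIMARY_TIMEOUT_S):
    """Try the primary model; on timeout or error, retry once with the fallback.

    Returns (text, "primary") or (text, "fallback") so the UI can flip the
    spinner text to "Retrying…" when the fallback path was taken.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(primary, prompt).result(timeout=timeout), "primary"
    except Exception:  # TimeoutError or an API error
        return fallback(prompt), "fallback"
    finally:
        pool.shutdown(wait=False)  # don't block on a hung primary call
```

If the fallback also raises, the exception propagates, which lines up with the "both fail" toast below.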

3) Better UX when errors happen

Add a “Cancel” on the spinner.

On error: toast “AI request failed — retrying with a faster model,” then auto-retry.

If both fail: toast “Couldn’t generate; please try again,” keep user text intact.

4) Stream the response

Turn on streaming for GPT-5 (if supported) / GPT-4o-mini so the user sees text appear, reducing perceived latency.
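The consuming side is the same either way; a sketch assuming a chat-completions-style stream of text deltas (in production `chunks` would wrap the API's stream object):

```python
def stream_to_editor(chunks, on_delta):
    """Feed each text delta to the UI as it arrives, return the full text.

    `chunks` is any iterable of strings; `on_delta` is the hypothetical
    UI callback that appends to the editor / updates the spinner.
    """
    full = []
    for delta in chunks:
        full.append(delta)
        on_delta(delta)
    return "".join(full)
```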

5) Safe writes

Use an idempotency key on “Save to Document” to avoid duplicate saves if the user clicks twice.
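Deriving the key from the content makes a double-click a natural no-op. A sketch with an in-memory store (in production this would be a DB table or cache; the function name and shape are assumptions):

```python
import hashlib

_saved = {}  # idempotency_key -> saved text (stand-in for a DB table)

def save_to_document(doc_id, section, text):
    """Save once per unique (doc, section, text); repeat clicks are no-ops.

    Returns (key, created): created is False when this exact save already landed.
    """
    key = hashlib.sha256(f"{doc_id}:{section}:{text}".encode()).hexdigest()
    created = key not in _saved
    if created:
        _saved[key] = text
    return key, created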

6) Lightweight telemetry

Log: model, latency_ms, prompt_tokens, completion_tokens, fallback_used: bool.

Surface a tiny “i” icon (admin-only) to view the assembled prompt for QA.
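The log record above can be built in one place right after the API call returns. A sketch, assuming an OpenAI-style `usage` dict on the response:

```python
import time

def telemetry_record(model, started_at, usage, fallback_used):
    """Assemble the per-request log line: model, latency, tokens, fallback flag.

    `started_at` is a time.monotonic() timestamp captured before the request;
    `usage` mirrors an OpenAI-style usage object (assumption).
    """
    return {
        "model": model,
        "latency_ms": int((time.monotonic() - started_at) * 1000),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "fallback_used": bool(fallback_used),
    }
```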

Quick retest (now that it’s fixed)

Executive Summary → AI Assist → leave Optional Prompt empty → Generate.

Expect: uses Business Brief; mentions company/model/offerings; no placeholders.

Click Rephrase → expect same meaning, tighter phrasing.

Click Expand → expect ~30% more depth, still on-brief.

Click Summarize → expect ~50–60% of length, key points intact.

Click Clear → confirm the editor wipes and shows “Section cleared” (if implemented).

Save to Document → see “Saved • time”.

Preview Full Document → content appears in correct order.
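For the Expand/Summarize steps, the length targets are easy to eyeball with a tiny helper (rough word-count ratio; the exact thresholds are the ones guessed above, not hard requirements):

```python
def length_ratio(before, after):
    """Word-count ratio of after vs. before: ~1.3 for Expand, ~0.5-0.6 for Summarize."""
    return len(after.split()) / max(len(before.split()), 1)
```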

Optional (tiny config polish)

Default length = Standard. (We can add Short/Long later.)

Keep tone fixed at Professional (no user toggle).

Set server hard timeout ~8s/model; client visible spinner text updates at ~2–3s.

Hidden admin toggle (nice-to-have)

“Force Model: GPT-5 / GPT-4o-mini / Auto” — for QA only.
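Resolving the toggle is a one-liner; the override values mirror the labels above, and the default model name is an assumption:

```python
def resolve_model(forced, default="gpt-5"):
    """Admin QA override: 'Auto' (or no override set) uses the normal default."""
    return default if forced in (None, "Auto") else forced
```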