Long Context Prompting: Tips That Improve Results


Modern AI models can handle enormous inputs — Claude supports up to 200K tokens, and Gemini goes even further. But stuffing a long document into a prompt doesn’t guarantee a good response. How you structure that input matters as much as what you include.

Document First, Question Last

The single most impactful technique for long-context prompts: place your documents at the top and your question at the bottom.

<document>
{{LONG_DOCUMENT_TEXT}}
</document>

Based on the document above, what are the three main risks
identified in the financial analysis?

When your query comes after the context, the model has already “read” the full document before generating — significantly improving response quality.
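This ordering is easy to enforce in code. Here's a minimal sketch in Python — the tag name and function names are illustrative, not a fixed API:

```python
def build_prompt(document: str, question: str) -> str:
    """Assemble a document-first prompt: context on top, query at the bottom."""
    return (
        "<document>\n"
        f"{document}\n"
        "</document>\n\n"
        f"{question}"
    )

# Usage: the document text always precedes the question.
prompt = build_prompt(
    "Revenue fell 8% in Q3 amid rising costs.",
    "Based on the document above, what are the three main risks "
    "identified in the financial analysis?",
)
```

Keeping prompt assembly in a helper like this guarantees the ordering holds even as documents are swapped in programmatically.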

Structure Multiple Documents

When working with several documents, label them clearly so the model can reference and distinguish them:

<documents>
  <document index="1">
    <source>Q3 Earnings Report</source>
    <content>{{EARNINGS_REPORT}}</content>
  </document>
  <document index="2">
    <source>Competitor Analysis</source>
    <content>{{COMPETITOR_ANALYSIS}}</content>
  </document>
</documents>

Compare revenue growth between our Q3 results and the
competitor analysis. Cite which document supports each claim.

Indexed documents let the model cite sources — and let you verify its claims.
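Generating this structure by hand gets tedious past two or three documents. A small helper can build it from a list of (source, content) pairs — again a sketch with illustrative names, not a prescribed format:

```python
def build_documents_block(docs: list[tuple[str, str]]) -> str:
    """Wrap (source_name, content) pairs in indexed <document> tags."""
    parts = ["<documents>"]
    for i, (source, content) in enumerate(docs, start=1):
        parts.append(f'  <document index="{i}">')
        parts.append(f"    <source>{source}</source>")
        parts.append(f"    <content>{content}</content>")
        parts.append("  </document>")
    parts.append("</documents>")
    return "\n".join(parts)

# Usage: indices are assigned in order, so citations stay stable.
block = build_documents_block([
    ("Q3 Earnings Report", "Revenue grew 12% year over year."),
    ("Competitor Analysis", "Competitor revenue grew 9%."),
])
```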

Ground Responses in Quotes

For long documents, ask the model to quote relevant passages before answering. This forces it to locate evidence first, rather than generating from a vague impression of the content:

First, extract the exact quotes from the contract that relate
to termination clauses. Place them in <quotes> tags.

Then, based only on those quotes, summarize the termination
conditions in plain language.

Grounding dramatically reduces hallucination. If the model can’t find a relevant quote, that’s a useful signal — the answer might not be in the document.
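A side benefit of asking for `<quotes>` tags is that the evidence is machine-checkable. A rough sketch of pulling the quoted passages out of a model response with a regex (assuming the model follows the tag convention above):

```python
import re

def extract_quotes(response: str) -> list[str]:
    """Return the text inside each <quotes>...</quotes> block."""
    return re.findall(r"<quotes>(.*?)</quotes>", response, flags=re.DOTALL)

# Usage: an empty list is the "no evidence found" signal mentioned above.
response = (
    "<quotes>Either party may terminate with 30 days written notice."
    "</quotes>\nIn plain language: either side can end the contract "
    "after giving a month's notice."
)
quotes = extract_quotes(response)
```

You can then verify each extracted quote actually appears verbatim in the source document before trusting the summary built on it.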

Tips

  • Query at the end — always place your question or instructions after the document content
  • Label your sources — indexed, tagged documents let the model cite where each claim comes from
  • Ask for quotes first — grounding reduces hallucination and makes verification easy

When Context Isn’t Enough

Long context windows are powerful, but they have limits: knowledge goes stale, token costs add up, and not everything fits. When you hit these limits, you need a different approach — fetching only the relevant pieces at query time. That’s retrieval-augmented generation, up next.

Quick Quiz

Question 1 of 2

Where should you place your question or instructions when working with a long document in a prompt?