
AI models like GPT and Claude generate responses based on parameters—configurable settings that balance creativity, precision, and length, along with instructions and context. Presets in LibreChat are saved setups: a specific combination of model + parameters + instructions that you can reuse instantly.

With presets you can create different “profiles” tailored to your workflows. For example, a marketer might have one profile for writing newsletters (friendly tone, higher creativity, longer outputs) and another for paid ads (direct tone, high precision, strict character limits). A designer could use one profile for generating creative briefs (multiple directions, rich language) and another for documenting a design system (concise, technical wording, clearly structured tables).

This guide explains the key parameters and how to use presets effectively.

What is LibreChat (and who uses it)?

LibreChat is an open-source chat interface for working with large language models (LLMs). You can run it yourself (commonly via Docker) and connect it to one or more LLM providers depending on your configuration, or you can try a LibreChat demo on Hugging Face Spaces.

People use LibreChat to:

  • centralize AI chat in one UI
  • reuse repeatable workflows (via presets)
  • switch quickly between models/configurations
  • standardize output formats (great for teams, but useful solo too)

Who uses it: developers, support teams, marketers, writers, researchers—anyone who wants consistent results without reconfiguring every chat.


What are models (101)?

An AI model is the “AI brain”: a combination of natural language processing and machine learning algorithms trained on large amounts of text. When you interact with LibreChat, you’re interacting with the output of a model.

Examples include: GPT (OpenAI), Claude (Anthropic), Gemini (Google), Grok (xAI), Llama (Meta), and others. Different models can be better suited to different tasks, and benchmarking sites (e.g., community leaderboards) can help you get a feel for relative strengths.

In LibreChat, you typically choose a model from a model selector in the UI.

Model selector in the LibreChat UI

What are model parameters?

Model parameters control how a model generates output—how random or creative it is, how long responses can be, and (for some models) how much “internal reasoning” time it’s allowed before replying. LibreChat usually exposes these in a Parameters panel. Note: different models support different parameters—so you may not see the same set across providers.

Common examples:

Parameter | What it controls | Low value (≈0.1–0.3) | High value (≈0.8–1.0+)
Temperature | Creativity / randomness | Precise, technical | Creative, brainstorming
Top P | Word variety | Highly predictable | More exploratory, diverse
Thinking budget (if supported) | Internal reasoning tokens | Fast / simple | More complex / detailed

Want an example? Ask the model to define “coffee” with different Temperature, Top P, and Top K values and you’ll get very different results.

Different definitions of COFFEE depending on Temperature, Top P, and Top K

Parameters are visible in the right sidebar under the “Parameters” section, where you can adjust these options. Remember that the available parameters depend on the model you’ve selected, as different models expose different settings. The most common ones are:

Context tokens

A token is a chunk of text (a word, part of a word, or a few characters) that models process. A rough rule of thumb: 1,000 tokens ≈ 750 words (about 1.5 single-spaced pages of text).

Context tokens control how much information the model can “keep in mind” from the conversation and provided text.
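The rule of thumb above can be written as a tiny helper. This is purely illustrative—real tokenizers (BPE, SentencePiece) vary by model, so treat it as a ballpark only:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~750 words per 1,000 tokens rule of thumb.
    Real tokenizers vary by model; this is only a ballpark figure."""
    words = len(text.split())
    return round(words * 1000 / 750)

# A 1,500-word document comes out to roughly 2,000 tokens:
print(estimate_tokens("word " * 1500))  # → 2000
```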

Max Output Tokens

Controls the maximum length of each response.

Typical use cases:

  • 1,024: short response (e.g., a brief email)
  • 2,048: medium response (standard document)
  • 4,096: long response (deeper analysis)

Temperature

Temperature ranges from 0 to 1.0 or 2.0 depending on the model. It controls creativity vs predictability.

  • Lower (0.2–0.4): technical analysis, data-heavy answers, precision
  • Mid (≈0.6–0.8): general use (often a good default)
  • High (≈0.9+): brainstorming, creative writing, variety

If your outputs feel too “same-y” or stiff, increase temperature slightly. If you want repeatability and accuracy, lower it.
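Under the hood, temperature rescales the model’s raw scores (logits) before sampling. A minimal sketch of that effect—the logit values here are invented for illustration:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the
    distribution (more deterministic), higher temperature flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]                     # hypothetical scores for three tokens
cold = softmax_with_temperature(logits, 0.2)  # sharp: top token dominates
hot = softmax_with_temperature(logits, 2.0)   # flat: choices are more even
print(round(cold[0], 3), round(hot[0], 3))    # → 0.993 0.481
```

At temperature 0.2 the most likely token gets almost all the probability mass; at 2.0 the alternatives stay in play—which is exactly the precise-vs-creative trade-off described above.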

Top P

Top P limits the variety of tokens the model considers by selecting from the smallest set of tokens whose cumulative probability is ≥ P. It ranges from 0 to 1 (it’s a probability value).

Practical ranges:

  • 0.0–0.7: very deterministic; can sound robotic (useful for strict templates)
  • 0.7–0.85: conservative; safe with a bit of variation
  • 0.85–0.95: balanced mix of coherence and creativity
  • 1.0: most exploratory; can get surprising (or chaotic) outputs

Important: using high Temperature + high Top P at the same time can increase randomness quickly. If things get messy, lower one of them.

Infographic: Top‑P and Temperature settings, from Very Low to High

Top K

Top K considers only the K most probable tokens. Not all models support it.

  • Low (<20): very deterministic (fewer choices)
  • Medium: balanced
  • High: more choices, more creative

Top P vs Top K (what’s the difference?)

They sound similar but operate differently, and they can be used together when the model allows it (not always):

  • Top‑P sampling: probability-based cutoff (dynamic set size)
    “Consider the smallest set of tokens that covers 90% of probability mass.”
  • Top‑K sampling: fixed-size cutoff
    “Consider only the top 50 tokens, ignore everything else.”
Top‑P Sampling | Top‑K Sampling
Probability-based cutoff (dynamic set size) | Fixed-size cutoff (exactly K options)
Consider the smallest set of tokens whose cumulative probability ≥ P. | Consider only the K most probable tokens, ignoring the rest.
Example: P=0.9 → pick from the smallest set summing ≥ 90% probability. | Example: K=50 → pick from the top 50 most likely tokens.

Using both can give you extra control: Top‑P ensures you don’t pick from the very unlikely tail while Top‑K caps the number of candidates considered.
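A minimal sketch of how the two cutoffs combine—the token probabilities are invented for illustration, and real samplers work on thousands of tokens at once:

```python
def filter_candidates(probs, top_k=None, top_p=None):
    """Apply Top-K first (keep the K most likely tokens), then Top-P
    (keep the smallest prefix whose cumulative probability >= P),
    and renormalize whatever survives."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]          # cap the number of candidates
    if top_p is not None:
        kept, cumulative = [], 0.0
        for token, p in ranked:
            kept.append((token, p))
            cumulative += p
            if cumulative >= top_p:      # stop once we cover P of the mass
                break
        ranked = kept
    total = sum(p for _, p in ranked)    # renormalize the survivors
    return {token: p / total for token, p in ranked}

probs = {"coffee": 0.5, "tea": 0.25, "water": 0.15, "lava": 0.10}
# Top-K drops "lava"; Top-P then drops "water" as well:
print(filter_candidates(probs, top_k=3, top_p=0.7))
```

Only “coffee” and “tea” survive: Top‑K trims the long tail, and Top‑P then cuts the candidate list down to the smallest set covering 70% of the probability mass.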

Thinking budget (Tokens) — only for some models

Some models support a thinking budget (or similar) that controls how many tokens the model can use for internal reasoning before it answers.

General guidance:

  • Low / none: faster, simpler responses
  • Mid: general use
  • High: complex problems (slower, potentially more thorough)

If your goal is exploratory, detailed output, use higher thinking + mid/high Top‑P/K.
If your goal is fast, stable output, use low thinking + lower randomness.

You can often leave this blank to let the model decide.

What are presets?

A preset lets you save your preferred combination of:

  • model
  • parameters (temperature, top_p, etc.)
  • custom instructions (system-style guidance)

…so you can quickly switch between configurations for different workflows.

Some ecosystems also support “agents” (more feature-rich assistants with tools and capabilities). Even if you plan to use agents later, presets are still a great way to learn the basics of model behavior and parameter tuning.

How to create and save a preset

The exact UI can vary by version, but the flow is generally as follows:

  1. Open the Parameters panel. This panel is usually in the right sidebar alongside other options like prompts, memories, Agent builder, MCP settings, and bookmarks. The exact set of options depends on your setup.
  2. Create a new Preset
  3. Name it (e.g., “Daily Work”, “Code Assistant”)
  4. Add your custom instructions. Be detailed: it’ll be the context for your requests.
  5. Adjust parameters (temperature, top_p, etc.). The available parameters will vary depending on the model you’ve selected.
  6. Save

Note: Presets are tied to the model/provider endpoint you selected when creating them.

Before you save presets: experiment a little

The best presets come from experimentation:

  • tweak parameters
  • adjust custom instructions
  • observe how behavior changes

Here are some simplified starting points:

Marketer: Newsletter profile

  • Model: a strong general model (GPT‑4-class / Claude-class)
  • Temperature: 0.7–0.9
  • Top P: 0.9
  • Max tokens: high (e.g., 800–1200+)
  • Frequency penalty: 0.3–0.5 (reduce repetition) and Presence penalty: 0.2–0.4 (if your model supports these)

Custom instructions: You are an expert email marketer. Write engaging newsletters in a friendly, human tone that matches a modern SaaS brand. Optimize for clarity and storytelling, not hype. Use short paragraphs and clear headings.
Always include: a concise hook in the first 2–3 sentences, 1–3 key sections with clear value, and a soft call to action at the end. Keep language simple, avoid jargon, and write at a B2 English level.
When I provide context (audience, product, campaign), adapt tone and examples accordingly. Reply only with the email body.

Code Assistant

  • Temperature: 0.3
  • Top P: 0.85
  • Max tokens: 2000

Custom instructions: You are a senior software engineer. Write clean, well-commented code. Explain your reasoning briefly before providing code. Prioritize readability and best practices. If multiple approaches exist, recommend the most maintainable option and mention performance tradeoffs.
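To see where these numbers end up, here’s a hedged sketch of the kind of request body a chat UI sends to an OpenAI-compatible API. The model name is a placeholder and the field names follow the common chat-completions convention—check your provider’s docs for what it actually supports:

```python
# Sketch: turning preset-style settings into an OpenAI-compatible request body.
# "your-model-id" is a placeholder; parameter support varies by provider.
def build_payload(model, instructions, user_message,
                  temperature=0.3, top_p=0.85, max_tokens=2000):
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": instructions},  # the preset's custom instructions
            {"role": "user", "content": user_message},
        ],
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
    }

payload = build_payload(
    model="your-model-id",
    instructions="You are a senior software engineer. Write clean, well-commented code.",
    user_message="Refactor this function for readability.",
)
print(payload["temperature"], payload["top_p"], payload["max_tokens"])  # → 0.3 0.85 2000
```

A preset is essentially this payload with everything but the user message filled in ahead of time.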

Marketer: Paid Ads

  • Model: a strong general model (GPT‑4-class / Claude-class)
  • Temperature: 0.3–0.5
  • Top P: 0.7–0.8
  • Max tokens: low–medium (e.g., 200–400)
  • Frequency penalty: 0–0.2 (allow repeating key terms) and Presence penalty: 0–0.2 (if your model supports these)

Custom instructions: You are a performance marketer specialized in paid ads (Meta, Google, LinkedIn). Write concise, conversion-focused ad copy that stays within strict character limits.
Always return:

  • 3–5 headline variants (max 30–40 characters unless I specify another limit)
  • 3–5 primary text variants (1–2 short sentences each)

Make the benefit extremely clear and concrete. Use simple, direct language and strong verbs. Avoid clickbait and exaggerated claims.
Output as a compact list or table for easy copy/paste.

Daily work (general productivity)

  • Max output tokens: 2,048
  • Temperature: 0.7
  • Top P: 0.9
  • Top K: 35 (if supported)
  • Thinking budget: ~5,000 (if supported)

Brainstorming / creative

  • Max output tokens: 2,048
  • Temperature: 1.0
  • Top P: 0.95
  • Top K: 40 (if supported)
  • Thinking budget: ~3,000 (if supported)

Weekly Standup

  • Temperature: 0.2
  • Top P: 0.9

Custom instructions: You are a concise technical writer. Summarize my updates in bullet points using this format:
Completed, In Progress, Next Steps.
Keep it under 150 words. Focus on outcomes and blockers, not activity. No emojis.

Technical analysis

  • Max output tokens: 4,096
  • Temperature: 0.3
  • Top P: 0.8
  • Top K: 30 (if supported)
  • Thinking budget: ~10,000 (if supported)

Project Recap

  • Temperature: 0.5
  • Top P: 0.9

Custom instructions: You are a strategic communicator. Help me write project summaries that highlight business impact, not just tasks completed. Use metrics when possible. Structure: Context → Actions → Results → Next Steps. Tone: professional but engaging.

How to use saved presets

When starting a new chat, select your preset from the preset dropdown/menu. LibreChat will apply the saved model, parameters, and instructions automatically.

Model selector and presets selector in the LibreChat UI

How to share presets (export/import)

LibreChat typically lets you export and import presets as JSON files.

  • To export presets, first create them in the UI by selecting the model, adjusting parameters, and adding any custom instructions, then save them. To share or reuse a preset later, use the Export to JSON option, which you can access by clicking the pencil icon (shown on hover) next to the preset in the LibreChat Presets menu.
  • To import presets, use the Import button in the LibreChat Presets menu.

Sharing presets is a great way to standardize workflows across a team—just double-check whether a preset contains sensitive internal instructions or data before sharing it.
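For reference, an exported preset is a small JSON file along these lines. The field names here are illustrative, not the exact LibreChat schema—open a real export to see the actual keys for your version:

```json
{
  "title": "Code Assistant",
  "endpoint": "openAI",
  "model": "your-model-id",
  "promptPrefix": "You are a senior software engineer. Write clean, well-commented code.",
  "temperature": 0.3,
  "top_p": 0.85,
  "maxOutputTokens": 2000
}
```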

Troubleshooting common errors

Error: “400 status code (no body)”

The error message would be something along the lines of:

Something went wrong. Here’s the specific error message we encountered: An error occurred while processing the request: 400 status code (no body)

This usually means a request validation or formatting issue (bad parameters, an unsupported combination, etc.). The most probable causes and solutions:

  1. You’re sending a parameter combination that the model doesn’t support
    Action: Reset the model parameters to their defaults, then paste the error into a chat and describe the parameters you used; the model will often identify the culprit.
  2. “Thinking budget” is not universally available across all model IDs / endpoints / accounts, and some UIs implement it in a way that can trip a 400.
    Action: Disable thinking, or switch to a model you know supports it in your environment.
  3. Your context window is too big
    Action: Set it to a conservative value (e.g., 20k–32k).
  4. Or… ask the model 🙂

Error: “overloaded”

The error is something along the lines of:

{"type":"error","error":{"details":null,"type":"overloaded_error","message":"Overloaded"},"request_id":"req ID"}

The “overloaded_error” from the model API means the service is currently at capacity – it’s not a configuration issue on your end.

Action: wait a moment and retry, or temporarily switch to another model.


Final tip: build a small preset library

You don’t need dozens of presets. Start with those workflows you can easily identify in your working day. Once you have 4–6 reliable presets, LibreChat becomes much more “plug and play”—and you’ll spend less time fiddling with settings.

