Checkpoints 101: Base models, fine-tunes, and merges (what to use, when)
A checkpoint is the “main brain” file that generates the image. Everything else (LoRA, ControlNet, etc.) sits on top of a checkpoint.
This guide focuses only on checkpoints:
- Base models (the foundations)
- Trained models (fine-tunes) vs merge models
- The most commonly used base families + their typical use-cases
- Practical pros/cons + notable restrictions/flaws
1) Base checkpoint vs fine-tuned checkpoint
Base checkpoint
A general-purpose foundation released by a lab/vendor (or a large community release). Examples: SD 1.5, SDXL 1.0, Stable Diffusion 3.5, FLUX.1 [dev].
Why you care: the base decides capabilities and ecosystem support (tools, LoRAs, ControlNets, workflows).
Fine-tuned checkpoint (trained model)
A base checkpoint that has been further trained on a dataset to specialize style/content (e.g., “anime portraits”, “product renders”, “cinematic realism”). Example: Pony Diffusion v6 XL is an SDXL finetune.
Why you care: you get “the look” faster, often with shorter prompts.
2) Trained models vs merge models
Trained model (fine-tune)
How it’s made: additional training updates the weights using examples (gradient descent). Typical result: more coherent “personality” and consistent behavior.
Pros
- More reliable style/subject bias
- Often better consistency across prompts
- Usually easier prompting (less wrestling)
Cons
- Can overfit (repeating faces/compositions)
- Strong biases (e.g., “everything becomes glossy portraits”)
- Quality depends heavily on dataset curation
Merge model
How it’s made: combines two+ checkpoints (weight blending) without extra training. Typical result: a “cocktail” of traits (e.g., realism from one + lighting from another).
Pros
- Fast to create, lots of experimentation
- Can mix complementary strengths
Cons
- Can be less stable (odd artifacts, inconsistent prompt response)
- “Best of both” is not guaranteed; sometimes you get “worst of both”
- Lineage/licensing can get messy if merged components have restrictions
3) The most used base families (and what they’re good at)
Below are widely used foundations or foundation-like families you’ll see constantly across tools and community checkpoints.
A) Stable Diffusion 1.5 (SD 1.5)
Best for: lightweight local generation, huge ecosystem, tons of fine-tunes/LoRAs.
Pros
- Runs on modest GPUs
- Massive community library (styles, characters, niches)
Cons / common flaws
- Weaker at text-in-image and complex compositions than newer bases
- Anatomy/hands often need extra help (inpainting, ADetailer-like workflows)
License note: released under CreativeML Open RAIL-M.
B) Stable Diffusion 2.1 (SD 2.1)
Best for: “cleaner” general image generation compared to 1.5, some workflows prefer it.
Pros
- Solid general base; can feel less “muddy” than many old 1.5 blends
Cons
- Smaller community ecosystem than 1.5/SDXL
- Many users find it pickier with prompts than 1.5-style finetunes
License note: commonly distributed under CreativeML Open RAIL++-M.
C) Stable Diffusion XL 1.0 (SDXL)
Best for: higher resolution, better prompt understanding, better typography than SD1.5 in many cases.
Pros
- Strong general-purpose base with modern “look”
- Great for: portraits, environments, design-ish images, better composition
Cons
- Heavier GPU/VRAM demands than SD1.5
- Many SDXL finetunes expect SDXL-specific workflows (refiners, SDXL VAEs, etc.)
License note: CreativeML Open RAIL++-M (official release).
D) SDXL Turbo / SD-Turbo (distilled “fast” variants)
Best for: speed, quick ideation, real-time-ish exploration.
Pros
- Very fast generation (great for sketching ideas)
Cons
- Often lower fidelity than full SDXL at the same resolution
- Less “room” to refine details; can look a bit “compressed”
License note: SD-Turbo points to Stability’s licensing terms; commercial terms depend on the license path.
E) Stable Diffusion 3.5 (SD 3.5 suite)
Best for: modern prompt-following and high quality on consumer hardware (where supported).
Pros
- Designed as a newer generation; positioned as customizable and consumer-hardware friendly
Cons
- Tooling/workflow compatibility can be more variable than SDXL/1.5 depending on your stack
License note: Stability states SD 3.5 models are free for commercial + non-commercial use under the Stability AI Community License.
F) FLUX.1 [dev] (Black Forest Labs)
Best for: high-quality generations and strong modern results (where supported).
Pros
- Strong general image quality and “modern” rendering (popular momentum in 2024–2025+)
Cons / restrictions
- The model is under a “Non-Commercial License” for the weights, but the license explicitly states users can use outputs broadly (with caveats) and forbids using outputs to train/fine-tune a competing model.
- Heavier compute requirements than SD1.5; best experience depends on your tooling
G) “Pony” (as a foundation-like community base: Pony Diffusion XL)
Best for: anime/cartoon/anthro ecosystems, Danbooru-tag-style prompting culture, huge LoRA scene around it.
Pros
- Very strong community “content + LoRA” ecosystem in its niche
- Often excellent for stylized characters and illustration workflows
Cons / quirks
- Prompting style can differ from “plain SDXL”: many users lean into tag-like prompting
- Can bias outputs toward the dataset’s dominant styles
Evidence note: Pony Diffusion v6 XL is explicitly an SDXL finetune in its model card.
H) “Anything” (SD1.5 anime finetune family, e.g., Anything V5)
Best for: classic SD1.5 anime look, fast stylized portraits/characters.
Pros
- Easy to get anime-style results with short prompts
- Lightweight (SD1.5 base)
Cons
- Less flexible outside its core aesthetic vs newer SDXL-based anime finetunes
Evidence note: Anything V5 is described as an anime-focused fine-tuned SD checkpoint.
I) “Illustrious XL” (SDXL illustration-focused checkpoint family, OnomaAIResearch)
Best for: high-resolution illustration/anime-style images (notably native 1536×1536) with a mix of natural-language + Danbooru/tag prompting.
Pros
- Native 1536px support (and wide resolution range without extra hi-res tricks)
- Works well with plain English prompts, tags, or both (hybrid prompting)
- Strong as a foundation for training/using add-ons (the model card highlights compatibility with common extensions like LoRA/ControlNet)
Cons
- Version behavior can vary; the authors publish a v2.0 “STABLE” checkpoint specifically for more stable generation behavior
- Knowledge cutoff noted on v1.0 (trained July 2024; knowledge up to June 2024)
- Licensing differs by release: v1.0 is marked sdxl-license, while v2.0 is marked creativeml-openrail-m—so always check the exact version you download before commercial use/distribution
Evidence note: On Hugging Face, Illustrious XL v1.0 is described as an SDXL-based model with native 1536×1536 plus NLP + tag-based prompting, and v2.0 is presented as a more stabilized checkpoint.
4) Practical selection guide (fast)
- I want maximum compatibility + huge community assets: SD 1.5 or SDXL.
- I want a modern all-rounder at higher quality: SDXL 1.0 (then pick a finetune like a realism/art variant).
- I want speed for brainstorming: SDXL Turbo / SD-Turbo.
- I want high-res illustration/anime with hybrid prompting (English + tags): Illustrious XL.
- I want anime/stylized character pipelines (tag culture + big LoRA ecosystem): Pony (SDXL finetune).
- I want classic lightweight SD1.5 anime aesthetics: Anything-style (SD1.5 finetune).
- I want strong modern results and my tooling supports it (but check license constraints): FLUX.1 [dev].