AI Development · March 10, 2025 · 10 min read

Running Stable Diffusion Locally: The Setup I Actually Use

My complete local AI image generation stack — ComfyUI, SDXL, ControlNet, LoRAs, and the workflow I've refined over months of experimentation.

By Connor Delia · AI · Stable Diffusion · ComfyUI · Python

After months of running local AI image generation for my projects, I've settled on a stack that's both powerful and practical. This is the actual workflow I use — not a beginner tutorial, but the real setup.

Why Local?

Three reasons:

  • **No rate limits** — Generate as many images as you want
  • **Full control** — Your own models, LoRAs, and fine-tuning
  • **Privacy** — Your prompts and images stay on your machine

The trade-off is the upfront investment in hardware. I run on an RTX 3090, which handles SDXL comfortably.

The Stack

  • **ComfyUI** — Node-based workflow editor, far more flexible than AUTOMATIC1111
  • **SDXL 1.0** — Base model for quality at 1024x1024
  • **SDXL Refiner** — Two-pass refinement for detail enhancement
  • **ControlNet XL** — For precise composition control
  • **Custom LoRAs** — Fine-tuned for specific styles and subjects (how these last two slot into the graph is sketched below)
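
A hedged sketch of how the ControlNet and LoRA pieces slot into a ComfyUI node graph, in the API (JSON) format used later in this post. The node IDs refer to the base workflow sketched in the next section, and every filename here is a placeholder, not a real file from my setup:

    # Fragments only: these nodes splice into the base workflow below.
    extra_nodes = {
        # A LoRA wraps the checkpoint's model + CLIP before sampling
        "lora": {"class_type": "LoraLoader", "inputs": {
            "model": ["base", 0], "clip": ["base", 1],
            "lora_name": "my_style.safetensors",   # placeholder filename
            "strength_model": 0.8, "strength_clip": 0.8}},
        # ControlNet conditions the positive prompt on a control image
        "cn": {"class_type": "ControlNetLoader",
               "inputs": {"control_net_name": "controlnet_xl_canny.safetensors"}},
        "cn_apply": {"class_type": "ControlNetApply", "inputs": {
            "conditioning": ["pos_b", 0], "control_net": ["cn", 0],
            "image": ["control_image", 0],          # placeholder image node
            "strength": 0.7}},
        # cn_apply's output then replaces the raw positive conditioning
        # at the KSampler; the LoRA's model output replaces ["base", 0].
    }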

ComfyUI Over AUTOMATIC1111

I switched from A1111 to ComfyUI for one reason: the node graph is pure power. You can build complex pipelines with multiple models, upscalers, ControlNet stacks, and custom logic — all visual.

My base SDXL workflow nodes (sketched in code after the list):

  • `KSampler` → Base model, 30 steps, DPM++ 2M Karras
  • `KSampler` → Refiner, 10 steps, denoise 0.3
  • `LatentUpscale` → 1.5x before refiner pass
  • `VAEDecode` → Convert latent to image
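
Wired together in ComfyUI's API (JSON) format, that graph can be queued against the local HTTP endpoint. What follows is a sketch under assumptions: the checkpoint filenames, prompt text, node IDs, and the cfg value are illustrative placeholders, not an exported workflow file.

    import json
    import urllib.request

    POS = "a red fox in a snowy forest, golden hour, sharp focus"  # placeholder
    NEG = "worst quality, low quality, blurry, watermark"          # placeholder

    graph = {
        # Model loaders: filenames are placeholders for local checkpoints
        "base": {"class_type": "CheckpointLoaderSimple",
                 "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
        "refiner": {"class_type": "CheckpointLoaderSimple",
                    "inputs": {"ckpt_name": "sd_xl_refiner_1.0.safetensors"}},
        # Each checkpoint encodes prompts with its own CLIP (output slot 1)
        "pos_b": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": POS, "clip": ["base", 1]}},
        "neg_b": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": NEG, "clip": ["base", 1]}},
        "pos_r": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": POS, "clip": ["refiner", 1]}},
        "neg_r": {"class_type": "CLIPTextEncode",
                  "inputs": {"text": NEG, "clip": ["refiner", 1]}},
        "latent": {"class_type": "EmptyLatentImage",
                   "inputs": {"width": 1024, "height": 1024, "batch_size": 1}},
        # Pass 1: base model, 30 steps, DPM++ 2M Karras
        "sample_base": {"class_type": "KSampler", "inputs": {
            "model": ["base", 0], "positive": ["pos_b", 0],
            "negative": ["neg_b", 0], "latent_image": ["latent", 0],
            "seed": 42, "steps": 30, "cfg": 7.0,
            "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 1.0}},
        # 1.5x latent upscale before the refiner pass
        "upscale": {"class_type": "LatentUpscaleBy", "inputs": {
            "samples": ["sample_base", 0],
            "upscale_method": "nearest-exact", "scale_by": 1.5}},
        # Pass 2: refiner, 10 steps, denoise 0.3 so it polishes, not repaints
        "sample_ref": {"class_type": "KSampler", "inputs": {
            "model": ["refiner", 0], "positive": ["pos_r", 0],
            "negative": ["neg_r", 0], "latent_image": ["upscale", 0],
            "seed": 42, "steps": 10, "cfg": 7.0,
            "sampler_name": "dpmpp_2m", "scheduler": "karras", "denoise": 0.3}},
        "decode": {"class_type": "VAEDecode",
                   "inputs": {"samples": ["sample_ref", 0], "vae": ["base", 2]}},
        "save": {"class_type": "SaveImage",
                 "inputs": {"images": ["decode", 0], "filename_prefix": "sdxl"}},
    }

    # Queue the job on a locally running ComfyUI instance (default port 8188)
    req = urllib.request.Request("http://127.0.0.1:8188/prompt",
                                 data=json.dumps({"prompt": graph}).encode(),
                                 headers={"Content-Type": "application/json"})
    print(urllib.request.urlopen(req).read().decode())

Any graph built visually can be exported to this exact format (enable dev mode options, then "Save (API Format)"), which makes it easy to move a workflow from the editor into scripts.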

Prompt Engineering at Scale

Raw prompts get mediocre results. Here's my actual prompt template:

    [SUBJECT], [SCENE/CONTEXT], [LIGHTING], [CAMERA/COMPOSITION], 
    [STYLE QUALIFIERS], [QUALITY BOOSTERS]
    
    quality boosters: masterpiece, best quality, ultra detailed, 
    sharp focus, professional photography, 8k resolution
    
    negative: worst quality, low quality, blurry, distorted, 
    deformed, ugly, disfigured, bad anatomy, watermark, text

Breaking it into structured sections makes iteration systematic rather than random.
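
Since the template is just comma-joined slots, it can be filled in code instead of retyped. A minimal sketch (the helper and its example values are illustrative, not from any library):

    QUALITY = ("masterpiece, best quality, ultra detailed, sharp focus, "
               "professional photography, 8k resolution")

    def build_prompt(subject, scene="", lighting="", camera="", style=""):
        """Assemble the structured sections into one comma-separated prompt."""
        sections = [subject, scene, lighting, camera, style, QUALITY]
        # Drop empty slots so an experiment can omit a section cleanly
        return ", ".join(s for s in sections if s)

    prompt = build_prompt(
        subject="portrait of an elderly fisherman",
        scene="foggy harbor at dawn",
        lighting="soft diffused light",
        camera="85mm lens, shallow depth of field",
        style="cinematic color grading",
    )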

Seed Management

The biggest beginner mistake is ignoring seeds. Use a fixed seed while you refine a prompt; randomize only when you're exploring:

    # In my Python API wrapper:
    import random
    import time

    def generate(prompt, seed=None):
        if seed is None:
            seed = random.randint(1, 2**32 - 1)

        # Store the seed (and a timestamp) in the filename for reproducibility
        timestamp = int(time.time())
        output_path = f"output_{seed}_{timestamp}.png"
        # run_inference hands off to the local backend elsewhere in the wrapper
        return run_inference(prompt, seed, output_path)
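
A usage sketch of that discipline, holding the seed fixed while only the wording varies (the prompt text is illustrative):

    # Same seed across variants: any visual change comes from the wording
    for lighting in ["soft diffused light", "hard rim light", "golden hour"]:
        generate(f"portrait of a fisherman, {lighting}", seed=1234567)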

Reproducibility is everything when you're iterating on a specific look.

What's Next

I'm currently experimenting with video generation via AnimateDiff and Wan2.1 for short clip synthesis. The quality isn't there yet for production use, but it's moving fast.