lm-studiolocal-llmtutorialopenclaw

Running Local LLMs with LM Studio - Updated

March 20, 20265 min read

Running Local LLMs with LM Studio

After months of paying OpenAI and Anthropic for API access, I finally decided to bring my AI workload in-house. LM Studio has become my go-to solution for running quantized LLMs locally.

Why Local LLMs?

The benefits are compelling:

  • Cost savings — Once you have the hardware, inference is free
  • Privacy — Your data never leaves your machine
  • Latency — No network round-trips for simple tasks
  • Offline capability — Works without internet

Hardware Requirements

You don't need a beefy GPU. I'm running a Qwen3.5-9B model on a mid-range setup:

  • CPU: AMD Ryzen 7 5800X
  • RAM: 32GB DDR4
  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • Storage: NVMe SSD

The key insight: quantized models (Q4_K_M, Q6_K) fit in GPU VRAM or even system RAM alone.

Setting Up LM Studio

Download LM Studio from their website and install it. The interface is straightforward:

# Model download happens through the UI
# But you can also use the CLI:
./lmstudio download <model-path>

I prefer the Qwen3.5-9B model at Q6_K quantization. It's a good balance:

  • Performance: ⭐⭐⭐⭐⭐
  • Quality: ⭐⭐⭐⭐
  • RAM usage: ~11GB

Configuration for OpenClaw

Once you have LM Studio running, enable the local server:

  1. Click the "Server" icon in the sidebar
  2. Set port (default: 1234)
  3. Enable CORS if needed
  4. Click "Start Server"

Then configure OpenClaw to use your local endpoint:

{
  "lmstudio": {
    "endpoint": "http://localhost:1234/v1/chat/completions",
    "model": "qwen3.5-9b"
  }
}

Performance Tips

A few things I've learned:

  1. Keep context small — Don't ask for 32k context if you only need 4k
  2. Batch requests — If you're processing multiple prompts, batch them
  3. Use GPU acceleration — Enable CUDA in LM Studio settings
  4. Temperature tuning — Lower temp (0.3-0.5) for coding, higher for creative

When to Use Cloud vs Local

Not everything belongs on your local machine. I use cloud for:

  • Complex reasoning tasks
  • Large document analysis
  • When I need the latest models (GPT-4o, Claude 3.5)

Local works great for:

  • Quick code completions
  • Text transformation tasks
  • brainstorming
  • Draft generation

Conclusion

Local LLMs aren't replacing cloud APIs anytime soon, but they're a valuable addition to any AI workflow. My monthly API bill dropped from ~$50 to ~$15 after moving routine tasks to LM Studio.

Give it a try — the barrier to entry is lower than you might think.