Running Local LLMs with LM Studio

After months of paying OpenAI and Anthropic for API access, I finally decided to bring my AI workload in-house. LM Studio has become my go-to solution for running quantized LLMs locally.

Why Local LLMs?

The benefits are compelling:

Cost savings — Once you have the hardware, inference is free
Privacy — Your data never leaves your machine
Latency — No network round-trips for simple tasks
Offline capability — Works without internet

Hardware Requirements

You don't need a beefy GPU. I'm running a Qwen3.5-9B model on a mid-range setup:

CPU: AMD Ryzen 7 5800X
RAM: 32GB DDR4
GPU: NVIDIA RTX 3060 (12GB VRAM)
Storage: NVMe SSD

The key insight: quantized models (Q4_K_M, Q6_K) fit in GPU VRAM or even system RAM alone.

Setting Up LM Studio

Download LM Studio from their website and install it. The interface is straightforward:

# Model download happens through the UI
# But you can also use the CLI:
./lmstudio download <model-path>

I prefer the Qwen3.5-9B model at Q6_K quantization. It's a good balance:

Performance: ⭐⭐⭐⭐⭐
Quality: ⭐⭐⭐⭐
RAM usage: ~11GB

Configuration for OpenClaw

Once you have LM Studio running, enable the local server:

Click the "Server" icon in the sidebar
Set port (default: 1234)
Enable CORS if needed
Click "Start Server"

Then configure OpenClaw to use your local endpoint:

{
  "lmstudio": {
    "endpoint": "http://localhost:1234/v1/chat/completions",
    "model": "qwen3.5-9b"
  }
}

Performance Tips

A few things I've learned:

Keep context small — Don't ask for 32k context if you only need 4k
Batch requests — If you're processing multiple prompts, batch them
Use GPU acceleration — Enable CUDA in LM Studio settings
Temperature tuning — Lower temp (0.3-0.5) for coding, higher for creative

When to Use Cloud vs Local

Not everything belongs on your local machine. I use cloud for:

Complex reasoning tasks
Large document analysis
When I need the latest models (GPT-4o, Claude 3.5)

Local works great for:

Quick code completions
Text transformation tasks
brainstorming
Draft generation

Conclusion

Local LLMs aren't replacing cloud APIs anytime soon, but they're a valuable addition to any AI workflow. My monthly API bill dropped from ~$50 to ~$15 after moving routine tasks to LM Studio.

Give it a try — the barrier to entry is lower than you might think.