Running Local LLMs with LM Studio - Updated
Running Local LLMs with LM Studio
After months of paying OpenAI and Anthropic for API access, I finally decided to bring my AI workload in-house. LM Studio has become my go-to solution for running quantized LLMs locally.
Why Local LLMs?
The benefits are compelling:
- Cost savings — Once you have the hardware, inference is free
- Privacy — Your data never leaves your machine
- Latency — No network round-trips for simple tasks
- Offline capability — Works without internet
Hardware Requirements
You don't need a beefy GPU. I'm running a Qwen3.5-9B model on a mid-range setup:
- CPU: AMD Ryzen 7 5800X
- RAM: 32GB DDR4
- GPU: NVIDIA RTX 3060 (12GB VRAM)
- Storage: NVMe SSD
The key insight: quantized models (Q4_K_M, Q6_K) fit in GPU VRAM or even system RAM alone.
Setting Up LM Studio
Download LM Studio from their website and install it. The interface is straightforward:
# Model download happens through the UI
# But you can also use the CLI:
./lmstudio download <model-path>
I prefer the Qwen3.5-9B model at Q6_K quantization. It's a good balance:
- Performance: ⭐⭐⭐⭐⭐
- Quality: ⭐⭐⭐⭐
- RAM usage: ~11GB
Configuration for OpenClaw
Once you have LM Studio running, enable the local server:
- Click the "Server" icon in the sidebar
- Set port (default: 1234)
- Enable CORS if needed
- Click "Start Server"
Then configure OpenClaw to use your local endpoint:
{
"lmstudio": {
"endpoint": "http://localhost:1234/v1/chat/completions",
"model": "qwen3.5-9b"
}
}
Performance Tips
A few things I've learned:
- Keep context small — Don't ask for 32k context if you only need 4k
- Batch requests — If you're processing multiple prompts, batch them
- Use GPU acceleration — Enable CUDA in LM Studio settings
- Temperature tuning — Lower temp (0.3-0.5) for coding, higher for creative
When to Use Cloud vs Local
Not everything belongs on your local machine. I use cloud for:
- Complex reasoning tasks
- Large document analysis
- When I need the latest models (GPT-4o, Claude 3.5)
Local works great for:
- Quick code completions
- Text transformation tasks
- brainstorming
- Draft generation
Conclusion
Local LLMs aren't replacing cloud APIs anytime soon, but they're a valuable addition to any AI workflow. My monthly API bill dropped from ~$50 to ~$15 after moving routine tasks to LM Studio.
Give it a try — the barrier to entry is lower than you might think.