News

Ollama Adds Qwen 3.5 with Native Tool Calling and Multimodal Support

Ollama now supports the full Qwen 3.5 small model series locally, including tool calling, reasoning, and multimodal capabilities. For developers building AI-powered applications, this means sophisticated features without cloud API dependencies.

What It Is

Ollama, the popular local LLM runner, added support for Alibaba's Qwen 3.5 models in four sizes (0.8B, 2B, 4B, and 9B parameters). These aren't stripped-down versions: they ship the full feature set, including native tool calling (letting models invoke functions), thinking/reasoning capabilities, and multimodal support (processing both text and images). Each is available via a simple command such as 'ollama run qwen3.5:9b'.
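Once a model is pulled, it is served through Ollama's local REST API (by default on port 11434), and a chat request is just a small JSON payload. The sketch below only builds that payload; actually sending it assumes an Ollama server is running with the model already pulled, and the prompt text is illustrative.

```python
import json

def build_chat_request(model: str, prompt: str, stream: bool = False) -> dict:
    """Build the JSON body for a POST to Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,  # False asks for one complete JSON response
    }

payload = build_chat_request("qwen3.5:9b", "Summarize this function for me.")
body = json.dumps(payload).encode("utf-8")
# To send: POST body to http://localhost:11434/api/chat
```

The same payload shape works for every model size, so switching from the 9B model to the 0.8B one is a one-string change.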

How This Helps Today

For developers building AI features, this enables rapid local prototyping without API keys or usage limits. Tool calling support means you can build agents that actually do things (query databases, call APIs, trigger actions) running entirely on your laptop. The size range lets you trade capability for speed based on your hardware: deploy the 0.8B model for simple tasks on older machines, and reach for the 9B model when accuracy matters more. For teams concerned about data privacy, local execution means sensitive code and data never leave your infrastructure.
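The tool-calling loop behind such an agent is simple: the model's reply carries a tool_calls entry, local code runs the matching function, and the result goes back to the model as a "tool" message. A minimal sketch, with a hypothetical query_inventory tool and a simulated model turn standing in for a real response from Ollama's /api/chat endpoint:

```python
def query_inventory(item: str) -> str:
    # Hypothetical local tool the agent can invoke.
    stock = {"widget": 42, "gadget": 7}
    return f"{item}: {stock.get(item, 0)} in stock"

# Registry mapping tool names the model may emit to local functions.
TOOLS = {"query_inventory": query_inventory}

def dispatch(tool_call: dict) -> str:
    """Execute one tool call emitted by the model."""
    fn = TOOLS[tool_call["function"]["name"]]
    return fn(**tool_call["function"]["arguments"])

# Simulated model turn requesting a tool invocation.
model_message = {
    "role": "assistant",
    "tool_calls": [
        {"function": {"name": "query_inventory",
                      "arguments": {"item": "widget"}}}
    ],
}

results = [dispatch(tc) for tc in model_message["tool_calls"]]
# Each result would be appended to the conversation as
# {"role": "tool", "content": result} and sent back to the model.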

The Context

Ollama has become the de facto standard for running models locally, with over a million downloads. By adding Qwen 3.5 with its full capabilities, the project challenges the assumption that advanced AI features require cloud APIs. This aligns with a broader trend toward edge AI: running models on-device rather than in data centers. The combination of open-weight models (Qwen) and easy-to-use local runners (Ollama) democratizes access to AI development tools.

What to Watch

Performance varies significantly by hardware, so test on your target deployment environment, not just your development machine. Tool calling with local models can be less reliable than with cloud APIs; implement error handling and validate arguments before executing anything. Model updates may lag behind official releases, so check Ollama's update cadence for critical security or capability patches. Finally, weigh the operational overhead of managing local models against that of cloud APIs: you gain privacy but give up convenience in updates and scaling.
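Since smaller local models occasionally emit malformed tool calls, it pays to check each call against an expected schema before executing it. A minimal sketch of that validation step; the tool name and argument fields here are illustrative, not part of any Ollama API:

```python
# Expected argument names and types per tool (illustrative schema).
EXPECTED_ARGS = {"query_inventory": {"item": str}}

def validate_tool_call(tool_call: dict):
    """Return (ok, error) for a tool_call dict emitted by the model."""
    fn = tool_call.get("function", {})
    name, args = fn.get("name"), fn.get("arguments")
    if name not in EXPECTED_ARGS:
        return False, f"unknown tool: {name!r}"
    if not isinstance(args, dict):
        return False, "arguments is not a JSON object"
    for field, typ in EXPECTED_ARGS[name].items():
        if field not in args:
            return False, f"missing argument: {field}"
        if not isinstance(args[field], typ):
            return False, f"wrong type for argument: {field}"
    return True, None

ok, err = validate_tool_call(
    {"function": {"name": "query_inventory",
                  "arguments": {"item": "widget"}}}
)
bad, err2 = validate_tool_call({"function": {"name": "drop_tables"}})
```

Rejected calls can be fed back to the model as an error message in a "tool" role reply, which often prompts it to retry with corrected arguments.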

Stay ahead with the latest news in AI

You will not get replaced by AI, but by someone using AI - Sam Altman