Futuristic illustration of a mid-size AI model architecture with layered neural blocks and efficient attention pathways

Qwen3.6-27B Deep Dive: Why This Mid-Size Dense Model Works So Well

Qwen3.6-27B is one of the most interesting open models released this year—not because it is the biggest, but because it makes a strong case that mid-size dense models are now good enough to challenge much larger systems when the architecture, post-training, and inference strategy are designed well. That matters. The industry has spent years obsessing over parameter count, but developers do not deploy parameter counts. They deploy systems that need to be accurate, fast, stable, affordable, and easy to serve. Qwen3.6-27B lands right in that sweet spot. ...

April 23, 2026 · 67 AI Lab
vLLM with 4 T4 GPUs for distributed LLM inference

Running vLLM with Qwen3.5-35B GPTQ on 4× Nvidia T4 GPUs

Executive Summary: Running Qwen3.5-35B GPTQ Int4 on 4× Nvidia T4 16GB GPUs is feasible with vLLM through tensor parallelism, which distributes model computation across all GPUs. The Qwen3.5-35B model (35B total parameters with 3B activated via MoE) has an estimated GPTQ Int4 footprint of approximately 8-10 GB; sharding it across all 4 GPUs (64 GB of VRAM in total) leaves generous headroom for KV cache and improves throughput. vLLM's architecture, built on PagedAttention for efficient memory management and with native GPTQ quantization support, enables this configuration to deliver reasonable throughput for inference workloads while staying within T4 memory constraints. However, performance will be substantially lower than on higher-end GPUs because the T4s communicate over PCIe Gen3 x16 with no NVLink, and the T4's compute throughput is modest by current standards. ...
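As a minimal sketch of the setup described above, the following launches vLLM's OpenAI-compatible server with GPTQ quantization and 4-way tensor parallelism. The model path is a hypothetical placeholder (the exact repository name is not given in the excerpt), and the context length and memory-utilization values are illustrative assumptions, not tuned settings:

```shell
# Hypothetical model path; substitute the actual GPTQ Int4 checkpoint.
# --tensor-parallel-size 4 shards the model across all 4 T4 GPUs.
# --dtype float16 is used because T4 (Turing) has no bfloat16 support.
vllm serve Qwen/Qwen3.5-35B-GPTQ-Int4 \
  --quantization gptq \
  --tensor-parallel-size 4 \
  --dtype float16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90
```

Using the article's own estimate, an 8-10 GB weight footprint works out to roughly 2-2.5 GB of weights per GPU after sharding, leaving most of each T4's 16 GB for KV cache and activations.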

March 21, 2026 · 67 AI Lab