vLLM with 4 T4 GPUs for distributed LLM inference

Running vLLM with Qwen3.5-35B GPTQ on 4× Nvidia T4 GPUs

Executive Summary

Running Qwen3.5-35B GPTQ Int4 on 4× Nvidia T4 16 GB GPUs is feasible with vLLM through tensor parallelism, which distributes the model's computation across all four GPUs. The Qwen3.5-35B model (35B total parameters, with roughly 3B activated per token via MoE) has an estimated GPTQ Int4 footprint of approximately 8-10 GB; while that could fit on a single 16 GB T4, tensor parallelism across all 4 GPUs (64 GB total) frees memory for the KV cache and yields better throughput. vLLM's architecture, built on PagedAttention for efficient memory management and with GPTQ quantization support, enables this configuration to deliver reasonable inference throughput while staying within the T4s' memory constraints. However, performance will be substantially lower than on higher-end GPUs due to the T4's limited interconnect bandwidth (PCIe Gen3 ×16) and lower FP32 compute capability. ...
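The core idea behind tensor parallelism can be shown with a toy sketch: a linear layer's weight matrix is split column-wise across four "GPUs", each shard computes its slice of the output, and the slices are gathered back together. This is purely illustrative (plain Python lists stand in for tensors, and a list concatenation stands in for the all-gather); vLLM implements the real thing with device-resident shards and collective communication over PCIe/NVLink.

```python
# Toy column-parallel linear layer: each of 4 "GPUs" holds a column
# block of the weight matrix and computes part of the output vector.

def matmul(x, w):
    """x: list of d_in floats, w: d_in x d_out nested list -> d_out floats."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]

def split_columns(w, shards):
    """Split weight matrix w into `shards` equal column blocks."""
    step = len(w[0]) // shards
    return [[row[s * step:(s + 1) * step] for row in w] for s in range(shards)]

x = [1.0, 2.0, 3.0]
w = [[float(r * 8 + c) for c in range(8)] for r in range(3)]  # 3 x 8 weights

full = matmul(x, w)                                           # single-device reference
parts = [matmul(x, shard) for shard in split_columns(w, 4)]   # 4-way tensor parallel
gathered = [v for part in parts for v in part]                # "all-gather" the shards

assert gathered == full  # sharded result matches the unsharded one
```

Because each shard is one quarter of the columns, each "GPU" stores and computes a quarter of the layer, which is exactly why the quantized weights plus KV cache spread comfortably across four 16 GB cards.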

March 21, 2026 · 67 AI Lab
[Image: Agentic AI workflow diagram showing an LLM orchestrating biological tools]

What Is Agentic AI? From Chatbots to Autonomous Scientific Agents

Introduction: Beyond the Chatbot

When you ask ChatGPT a question, it answers. When you ask an agentic AI system a question, it acts. This distinction, between passive assistance and autonomous execution, marks one of the most significant shifts in artificial intelligence since the transformer architecture itself. Agentic AI systems are not merely more sophisticated chatbots. They are autonomous entities capable of perception, reasoning, planning, tool use, action, and memory. They can independently execute multi-step workflows, make decisions when faced with uncertainty, and adapt their approach based on feedback from the environment. In scientific contexts, this means agents that can read literature, formulate hypotheses, design experiments, execute computational analyses, interpret results, and iterate, all with varying degrees of human oversight. ...
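The perceive-plan-act-remember cycle described above can be sketched as a minimal agent loop. Everything here is a stand-in: the tool functions and the fixed two-step plan are hypothetical placeholders for real literature APIs and an LLM-driven planner; only the loop structure illustrates the concept.

```python
# Minimal agent-loop sketch: plan -> act (tool call) -> remember feedback.
# Tool bodies are stubs standing in for real scientific tooling.

def search_literature(query):
    # Stand-in tool: a real agent would query a literature database here.
    return f"found 3 papers on '{query}'"

def run_analysis(topic):
    # Stand-in tool: a real agent would launch a computational analysis.
    return f"analysis of '{topic}' complete"

TOOLS = {"search": search_literature, "analyze": run_analysis}

def agent(task, max_steps=5):
    memory = []                                   # the agent's working memory
    plan = [("search", task), ("analyze", task)]  # toy fixed plan; an LLM would
                                                  # normally generate this
    for tool_name, arg in plan[:max_steps]:
        observation = TOOLS[tool_name](arg)       # act: invoke the chosen tool
        memory.append((tool_name, observation))   # remember the observation
    return memory

history = agent("CRISPR off-target effects")
```

The key contrast with a chatbot is that the loop produces a trace of actions and observations, not a single reply, and each step can condition on what the previous steps returned.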

March 7, 2026 · 67 AI Lab
Giving it a Brain: Connecting Gemini & OpenAI

Yesterday, we installed OpenClaw on the Raspberry Pi. It was alive, but silent. Today, we give it a voice—and a brain. A true agent isn’t just a script; it needs a Large Language Model (LLM) to reason, understand intent, and generate human-like responses. OpenClaw makes this incredibly easy by supporting multiple providers right out of the box. In this guide, we’ll connect Google Gemini (for speed and reasoning) and OpenAI (as a backup or for specific tasks). ...
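The primary/backup arrangement (Gemini first, OpenAI as fallback) boils down to a simple provider chain: try each provider in order and return the first success. The sketch below uses stub callables in place of real API clients, and the function and variable names are illustrative, not OpenClaw's actual configuration interface.

```python
# Sketch of the primary/backup provider pattern: try Gemini first,
# fall back to OpenAI if it fails. Providers here are stubs.

def ask_with_fallback(prompt, providers):
    """Try each (name, callable) provider in order; return the first answer."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:
            errors.append((name, str(exc)))  # record the failure, try the next
    raise RuntimeError(f"all providers failed: {errors}")

def gemini_stub(prompt):
    raise TimeoutError("simulated outage")   # pretend Gemini is unreachable

def openai_stub(prompt):
    return f"echo: {prompt}"                 # pretend OpenAI answered

provider_chain = [("gemini", gemini_stub), ("openai", openai_stub)]
name, answer = ask_with_fallback("hello", provider_chain)
```

With real clients, each stub would wrap the corresponding SDK call; the fallback logic itself stays unchanged.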

February 3, 2026 · 67 AI Lab