What GPU do you need to run Qwen 3.6 27B?

Qwen 3.6 27B at Q4 is about 16 GB of weights. Allow roughly 20 GB to start and 24 GB to be comfortable with context, so any 24 GB or 32 GB card runs it — including a used 24 GB RTX 3090. The decision is software maturity, decode speed, and price, not whether it fits.

Is the RTX 5090 or a used RTX 3090 better for Qwen 3.6 27B?

A used RTX 3090 (about $1,050 for the card, roughly $1,750 as a full build) is the value baseline: 24 GB on the mature CUDA stack, every tool works on day one, and community reports put it at roughly 70 to 85 tokens per second with speculative decoding. The RTX 5090 is the fastest single card under $5,000 and the only one that stays fast past 100K context, but a full build lands around $4,500. Pick the 3090 for the least money with zero friction; pick the 5090 for prompt speed and long context.

Why is the Intel Arc Pro B70 slower than its bandwidth suggests?

Memory bandwidth is only the ceiling — you reach it only if the inference kernels keep the memory bus busy. The CUDA cards run close to their rated bandwidth, so an RTX 5090 or 3090 decodes roughly where its bandwidth predicts. The Arc Pro B70's younger SYCL/Vulkan stack saturates only about half of its 608 GB/s on dense decode today, which is why its tokens per second trail a 3090 by more than the raw bandwidth gap suggests.

Best GPU for Qwen 3.6 27B: RTX 5090 vs R9700 vs Arc Pro B70 vs MI50

Qwen 3.6 27B is the model that made a lot of people start pricing out a local box. It is a dense 27B that scores 77.2% on SWE-bench Verified, which is higher than Qwen's own 397B mixture-of-experts model from the previous generation, and it ships under Apache 2.0 with a 256K context window. In plain terms: a coding model you can actually fit on one card now matches things that used to need a server.

So the question in everyone's search bar is simple. Which card should run it?

Short answer: they all fit, so pick on software and speed

Qwen 3.6 27B at Q4 is about 16 GB of weights. Add room for context and you want roughly 20 GB to start and 24 GB to be comfortable. That means every card below runs it, including a used 24 GB RTX 3090. The decision is not "will it fit." The decision is how mature the software stack is, how fast it decodes, and what you pay to get there.
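To see where the 16 GB and "roughly 20 GB to start" figures come from, here is a rough sizing sketch. It assumes about 4.8 bits per weight for a Q4-style quant (typical of llama.cpp's Q4_K_M, but an assumption here) and uses the standard KV-cache formula for ordinary grouped-query attention. The layer count, KV-head count and head dimension are hypothetical placeholders, not Qwen 3.6 27B's published architecture; substitute the real values from the model card, and note that hybrid-attention designs need a different formula.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in decimal gigabytes."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


def kv_cache_gb(context_tokens: int, n_layers: int, n_kv_heads: int,
                head_dim: int, bytes_per_elem: float) -> float:
    """Standard KV-cache size: one K and one V vector per layer, per KV head, per token."""
    per_token_bytes = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_tokens * per_token_bytes / 1e9


# Assumption: ~4.8 bits/weight, typical of a Q4_K_M-style quant.
w = weights_gb(27, 4.8)

# HYPOTHETICAL architecture values for illustration only; use the model card's real ones.
kv = kv_cache_gb(context_tokens=32_768, n_layers=64, n_kv_heads=4,
                 head_dim=128, bytes_per_elem=2)  # FP16 cache

print(f"weights ~{w:.1f} GB, KV cache ~{kv:.1f} GB, total ~{w + kv:.1f} GB")
```

With these placeholder values the total comes to about 20.5 GB before runtime buffers. Dropping the cache to 8 bits (`bytes_per_elem=1`) halves the KV term, which is the lever the long-context community builds pull.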

Here is the lineup, with build cost meaning the card plus a roughly $700 host PC. All hardware figures come from our own dataset, last price-checked on June 3, 2026.

Build (single card)	Build cost	VRAM (usable)	Memory bandwidth	Software stack	Maturity	vLLM
RTX 5090	~$4,500	32 GB (31)	1,792 GB/s	CUDA (Blackwell)	5 / 5	Production
RTX 3090 (used)	~$1,750	24 GB (23)	936 GB/s	CUDA	5 / 5	Production
AMD R9700	~$2,050	32 GB (31)	640 GB/s	ROCm (RDNA 4)	3 / 5	Community
Intel Arc Pro B70	~$1,800	32 GB (31)	608 GB/s	SYCL / Vulkan	2 / 5	Experimental
AMD MI50 32 GB (used)	~$700	32 GB (31)	1,024 GB/s	ROCm (gfx906) / Vulkan	2 / 5	None

Two things jump out. The cheapest new 32 GB cards (R9700 and B70) actually have less memory bandwidth than a five-year-old 3090, and the cheapest card on the list, a used MI50, has more bandwidth than either of them. Bandwidth is what sets the ceiling on how fast a dense model like the 27B spits out tokens, so keep that in mind as you read the speed section.
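To put the lineup on a per-dollar footing, this sketch divides usable VRAM and rated bandwidth by the build costs in the table. It uses rated rather than effective bandwidth, so it flatters the younger software stacks; treat it as a first pass, not a ranking.

```python
# (build cost in USD, usable VRAM in GB, rated bandwidth in GB/s), from the lineup table.
BUILDS = {
    "RTX 5090": (4500, 31, 1792),
    "RTX 3090 (used)": (1750, 23, 936),
    "AMD R9700": (2050, 31, 640),
    "Intel Arc Pro B70": (1800, 31, 608),
    "AMD MI50 32 GB (used)": (700, 31, 1024),
}

for card, (cost_usd, vram_gb, bw_gbps) in BUILDS.items():
    per_thousand = cost_usd / 1000  # express cost in units of $1,000
    print(f"{card:22s} {vram_gb / per_thousand:5.1f} GB per $1k  "
          f"{bw_gbps / per_thousand:6.0f} GB/s per $1k")
```

On these numbers the used MI50 leads on both metrics, the B70 gives the most VRAM per dollar among new cards, and the used 3090 delivers more bandwidth per dollar than either new 32 GB card.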

How fast is Qwen 3.6 27B on each card?

A dense 27B is bandwidth-bound when it generates text, because every one of its 27 billion parameters has to be read from memory for each token. Prompt processing (chewing through a long prompt before the model answers) is the opposite: it is compute-bound, and that is where NVIDIA's Blackwell parts pull far ahead. Our data has the RTX 5090 ingesting prompts several times faster than the AMD and Intel cards.
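The bandwidth-bound argument reduces to one division: if every generated token requires streaming the full weights from VRAM once, tokens per second cannot exceed bandwidth divided by weight size. This minimal sketch uses the table's rated bandwidth and the ~16 GB Q4 weight figure. It ignores KV-cache reads (which grow with context) and assumes kernels hit 100% of rated bandwidth, so the results are upper bounds for one forward pass per token.

```python
RATED_BANDWIDTH_GBPS = {
    "RTX 5090": 1792,
    "RTX 3090": 936,
    "AMD R9700": 640,
    "Intel Arc Pro B70": 608,
    "AMD MI50 32 GB": 1024,
}

WEIGHTS_GB = 16.0  # Qwen 3.6 27B at Q4, approximate


def decode_ceiling_tok_s(bandwidth_gbps: float, bytes_per_token_gb: float) -> float:
    """Upper bound on single-stream decode: one full weight read per generated token."""
    return bandwidth_gbps / bytes_per_token_gb


for card, bw in RATED_BANDWIDTH_GBPS.items():
    print(f"{card:18s} <= {decode_ceiling_tok_s(bw, WEIGHTS_GB):6.1f} tok/s")
```

This puts the 3090 at about 58 tok/s and the 5090 at 112 tok/s, both below the community-reported peaks. That is not a contradiction: those runs use speculative decoding, which can accept more than one token per pass over the weights.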

For single-stream decode speed on the dense 27B, the most useful numbers come from people running it at home, so treat these as community-reported and config-dependent rather than lab-controlled:

RTX 5090: roughly 90 to 160 tokens per second. A one-click Windows build reports 158 tok/s, and a heavily optimized NVFP4 run measured about 92 tok/s while holding a 200K-token context at 575 watts. The wide range is the tell: you only hit the top end with speculative decoding (multi-token prediction) and aggressive KV-cache quantization.
RTX 3090 (used): roughly 70 to 85 tok/s with the same tricks. One published "overnight" stack reports 85 tok/s sustained at 125K context on a single 3090, and a one-click native build reports 72 tok/s. For a card you can buy used for under $1,200, that is the value story of the year.
R9700, B70 and MI50: slower at dense decode. The R9700 and B70 have similar bandwidth and land close together, with the R9700's more mature ROCm path giving it the edge. The MI50's 1 TB/s HBM2 should put it ahead of both on paper, but its software situation (more on that below) is what stops most people from reaching it.
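Why speculative decoding (or multi-token prediction) lets the 5090 and 3090 figures above exceed the one-token-per-weight-read ceiling: the big model verifies a draft of gamma tokens in a single forward pass, so one pass over the weights can yield several accepted tokens. Under the simplifying assumption from the speculative-decoding literature that each draft token is accepted independently with probability alpha, the expected tokens per pass is (1 - alpha^(gamma+1)) / (1 - alpha). The alpha and gamma values below are hypothetical, and the sketch ignores the cost of producing the draft and of verification.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens per target-model pass with gamma drafted tokens,
    assuming i.i.d. acceptance probability alpha (0 <= alpha < 1)."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return (1 - alpha ** (gamma + 1)) / (1 - alpha)


def speculative_tok_s(ceiling_tok_s: float, alpha: float, gamma: int) -> float:
    """Idealized throughput: per-pass ceiling times tokens gained per weight pass
    (ignores draft cost and verification overhead)."""
    return ceiling_tok_s * expected_tokens_per_pass(alpha, gamma)


# Hypothetical: 3 drafted tokens, 70% acceptance, on a 3090-class ceiling of 936 / 16.
ceiling = 936 / 16
print(f"{expected_tokens_per_pass(0.7, 3):.2f} tokens per pass")
print(f"~{speculative_tok_s(ceiling, 0.7, 3):.0f} tok/s idealized")
```

The idealized figure overshoots what people actually measure; draft cost, verification overhead and sub-100% bandwidth utilization all eat into it. What it does show is how the top of the reported range can sit above the plain bandwidth ceiling.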

If decode speed on the cheaper cards matters more to you than peak quality, there is a shortcut worth knowing. Qwen shipped a sibling model, Qwen 3.6 35B-A3B, a mixture-of-experts that only activates 3 billion parameters per token. It fits the same 32 GB at Q4 and feels noticeably snappier on bandwidth-limited cards, because the card only has to read 3B of weights per token instead of 27B. If you want the 27B's coding quality, run the dense model. If you want speed on a budget card, the A3B MoE is the play. Our model picker shows both side by side with per-card numbers.
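The MoE speedup is the same division with less data per token. This rough sketch assumes the same ~4.8 bits per weight for both models and treats the active parameters as the only weights read per token. In practice the KV cache, routing and less efficient small-expert kernels keep the real speedup below this ratio.

```python
BITS_PER_WEIGHT = 4.8  # assumption: Q4-style quant for both models


def gb_read_per_token(active_params_billion: float) -> float:
    """Weight bytes streamed per generated token, in decimal GB."""
    return active_params_billion * 1e9 * BITS_PER_WEIGHT / 8 / 1e9


dense = gb_read_per_token(27)  # Qwen 3.6 27B: every parameter is active
moe = gb_read_per_token(3)     # Qwen 3.6 35B-A3B: ~3B active per token

bandwidth = 608  # GB/s, Arc Pro B70 rated figure from the lineup table
print(f"dense ceiling ~{bandwidth / dense:.0f} tok/s, "
      f"MoE ceiling ~{bandwidth / moe:.0f} tok/s")
```

The ninefold gap in bytes per token is why the A3B model feels snappier on bandwidth-limited cards even though its total weights are larger.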

One more thing worth saying out loud, because it saves money: spending $2,700 to $4,700 on a big unified-memory box like a Strix Halo or DGX Spark does not unlock a meaningfully better coding model than this 27B on a 24 GB 3090. The dense 27B beats every open-weight MoE that fits in 128 GB or less. Capacity buys you the ability to run frontier MoE models that need 600 GB or more, not a better experience at the 27B tier.

Rated bandwidth vs what the software actually uses

Memory bandwidth is the ceiling, but you only reach it if the inference kernels are good enough to keep the memory bus busy every cycle. This is where the maturity column quietly turns into real tokens per second, and it is the single biggest reason two cards with similar bandwidth can decode at very different speeds.

On the CUDA cards the kernels are mature enough to run the bus close to flat out. A well-tuned llama.cpp or vLLM decode on an RTX 5090 or 3090 sustains something near the card's rated bandwidth, so the 27B's decode speed lands roughly where the 1,792 GB/s and 936 GB/s figures predict. Years of CUDA-first kernel work is exactly what you are paying for.

The younger stacks leave a big chunk on the table. The Intel Arc Pro B70's SYCL and Vulkan paths are improving fast but are still immature, and in practice they saturate only around half of its 608 GB/s on dense decode today. That is the real reason its tokens per second trail a 3090 by more than the raw bandwidth gap suggests: the 3090 turns most of its 936 GB/s into tokens, while the B70 turns roughly half of its 608 GB/s into tokens. The AMD R9700's ROCm path sits in between, and the used MI50 is the most under-saturated of all relative to its 1 TB/s HBM2, which is exactly why its on-paper bandwidth advantage so rarely shows up in real runs.

Card	Rated bandwidth	Bandwidth the software reaches today	Why
RTX 5090	1,792 GB/s	Near full	Mature CUDA / Blackwell kernels
RTX 3090	936 GB/s	Near full	Mature CUDA kernels
AMD R9700	640 GB/s	Partial	ROCm 7 improving, not yet CUDA-level
Intel Arc Pro B70	608 GB/s	~Half	Young SYCL / Vulkan backend
AMD MI50 32 GB	1,024 GB/s	Low fraction	Community Vulkan build, no vLLM path

These utilization figures are rough and config-dependent; they move with driver version, quantization, batch size and context length. Treat them as the shape of the gap, not a benchmark. The point holds regardless: on the immature stacks, raw bandwidth overstates real decode speed.
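The B70-versus-3090 argument can be spelled out: the decode gap between two cards tracks the ratio of the bandwidth each stack actually sustains, not the rated ratio. This sketch takes the B70's ~50% from the table; the 3090's 90% is a hypothetical stand-in for "near full", not a measured figure.

```python
def effective_gbps(rated_gbps: float, utilization: float) -> float:
    """Bandwidth the inference kernels actually sustain during decode."""
    return rated_gbps * utilization


rtx3090 = effective_gbps(936, 0.90)  # hypothetical stand-in for "near full"
b70 = effective_gbps(608, 0.50)      # "~half", per the utilization table

rated_gap = 936 / 608
real_gap = rtx3090 / b70
print(f"rated gap {rated_gap:.2f}x, effective gap {real_gap:.2f}x")
```

Under these assumptions a 1.5x rated gap becomes nearly a 2.8x effective gap, which is the shape of the difference the table describes.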

The cards, one by one

RTX 5090: fastest, if you can stomach the price

The 5090 is the fastest single GPU under $5,000 and the only card here that stays fast as your context grows past 100K tokens. Its 32 GB fits the 27B with huge context headroom, the CUDA 13 stack is rock solid, and vLLM plus TensorRT-LLM both supported it on day one. The catch is price. The $1,999 MSRP is effectively a paper number; real cards run $3,500 to $3,999 on Amazon and Newegg, with premium SKUs past $4,500, so a full build lands around $4,500. Buy it if prompt speed and long-context decode are worth the premium. See the full RTX 5090 build page.

RTX 3090: the value baseline that quietly wins

The used 3090 is the build our data recommends for anyone running a 27B-class dense model at or under $1,500 of card cost. 24 GB of CUDA Ampere fits Qwen 3.6 27B at Q4 with 32K context, and because it is CUDA, every tool works on day one with no workarounds: vLLM, TensorRT-LLM, llama.cpp, ComfyUI. Used cards run $800 to $1,130 (about $1,050 average in late May), pushed up by a memory-chip price spike and 5090 prices doubling. If you want zero software friction for the least money, this is the answer, and you can drop in a second 3090 for 48 GB later.

AMD R9700: the cheapest new 32 GB AMD card

The Radeon AI Pro R9700 is the cheapest brand-new 32 GB workstation card from AMD by a wide margin, at a $1,299 launch price (partner boards such as the ASRock Creator and PowerColor run roughly $1,350 to $1,520 as memory contract pricing pushes them up). It is RDNA 4, runs both ROCm 7 and Vulkan, and is Windows-friendly, which the data-center MI50 on this list is not. The trade-offs are real: 640 GB/s of bandwidth means slower dense decode, prompt processing is about three times slower than on the 5090, and vLLM support is still community-grade rather than production. It is a good pick if you want a new, warrantied AMD card and value capacity over raw speed. See the R9700 build page.

Intel Arc Pro B70: the cheapest new 32 GB card, if you tolerate setup

At a $949 launch price (street cards now sell for around $1,099, and it has been Newegg's number-one workstation GPU), the Arc Pro B70 is the cheapest way to get 32 GB of discrete VRAM on a new card. It is quiet, has ECC memory, and the software is improving fast: Intel's OpenVINO 2026.1 added a first-class llama.cpp backend, and the Vulkan path already runs Qwen 3.6 27B today. But the SYCL and IPEX-LLM stack is still two to three years behind CUDA, which is the honest reason NVIDIA keeps winning here and why the B70 reaches only about half its rated bandwidth on dense decode. Expect to spend more time on setup. Buy it for the best VRAM per dollar among new cards if you enjoy tinkering. See the Arc Pro B70 build page.

AMD MI50 32 GB: the homelab bargain with a software tax

The used MI50 is the best deal on Earth right now for VRAM and bandwidth per dollar: about $150 to $250 for a 32 GB card with 1 TB/s HBM2, plus a blower kit and a host. That bandwidth is why homelabbers love it. The tax is software. AMD dropped gfx906 silicon from ROCm 7, so the community keeps it alive with a llama.cpp Vulkan build and a maintained ROCm 6.4.4 fork, and there is no real vLLM path. Independent benchmarks land well below the most optimistic community claims, so set your expectations accordingly, and verify the 32 GB VBIOS before you buy, because some sellers reflash 16 GB cards. Buy it if the build is the hobby. See the MI50 build page.

Which should you pick?

You want it to just work, for the least money: used RTX 3090. Mature CUDA, fits the 27B at Q4, under $1,200 for the card.
You want the fastest answers and long context: RTX 5090. Pay for the bandwidth and the prefill speed.
You want a new 32 GB AMD card with a warranty: AMD R9700. Cheapest new 32 GB card from AMD, Windows-friendly; accept slower decode.
You want maximum VRAM per dollar on a new card and don't mind setup: Intel Arc Pro B70, $949 at launch (about $1,099 street).
You want a homelab project and love a deal: used MI50, eyes open about the software.

Every one of these runs Qwen 3.6 27B. The right answer depends on your budget, how much you care about speed versus capacity, and how much software friction you will accept.

The fastest way to settle it for your exact situation is to run the numbers yourself. Our build picker ranks every card here for the 27B at your budget, and the compare view puts any two of them head to head on speed, VRAM and software maturity.

Pick your Qwen 3.6 27B build at llmrequirements.com.

Sources

Qwen 3.6-27B model card and launch (Qwen): qwen.ai/blog and huggingface.co/Qwen/Qwen3.6-27B
Qwen3.6-27B beats larger predecessor on coding (The Decoder): the-decoder.com
AMD Radeon AI PRO R9700 official $1,299 launch (WCCFTech): wccftech.com
R9700 availability and 32 GB Navi 48 (VideoCardz): videocardz.com
Intel Arc Pro B70 32 GB for $949, software caveats (XDA Developers): xda-developers.com
Intel Arc Pro B70 Linux performance review (Phoronix): phoronix.com
Running Qwen 3.6 27B on Intel Arc Pro B70 (community guide, Medium): bibek-poudel.medium.com
RTX 5090 price tracker, June 2026 (Best Value GPU): bestvaluegpu.com
Qwen3.6-27B one-click server, 158 tok/s on 5090 / 72 tok/s on 3090 (devnen, GitHub): github.com/devnen
Qwen3.6-27B at 85 TPS on a single RTX 3090 (community write-up, Medium): medium.com
vLLM docker-compose recipe for Qwen 3.6 27B on dual RTX 3090s (dzombak): dzombak.com
Vulkan vs ROCm 7 on the AMD MI50 (MegaOne AI): megaoneai.com
LLMRequirements hardware dataset and State of Local AI essay (internal).