Qwen 3.6 35B A3B Pricing, Benchmarks, Latency & Providers


DeepInfra	$0.15	$0.95	--	--	--	--
Venice	$0.2125	$1.375	$0.07	--	--	--
Alibaba Cloud	$0.248	$1.485	--	--	--	--
NovitaAI	$0.248	$1.485	--	--	--	--
SiliconFlow	$0.2	$1.6	--	--	--	--
AkashML	$0.23	$1.8	--	--	--	--
Ambient	--	--	--	--	--	--
Thinking Machines	$0.36	$0.89	--	--	--	--

Providers

API providers, route pricing, availability, and recent reliability signals.

Performance

Latency, throughput, and reliability signals from recent traffic.

Pricing

Effective prices over the last 30 days, with current provider list prices for context.

Benchmarks

Activity

Daily gateway activity over the last 30 days, with current UTC-day pace projection.

Apps Using This Model

Public apps observed in gateway usage for this model.

Model Uptime

Uptime trend for this model over the last 24 hours.

Quickstart

Start calling this model with endpoint-specific examples.

About

Key dates, capabilities, and model metadata.

Subscriptions

Commercial plans and bundled access that currently include this model.


AkashML	$0.23	$1.8	--
Alibaba Cloud	$0.248	$1.485	--
DeepInfra	$0.15	$0.95	--
NovitaAI	$0.248	$1.485	--
SiliconFlow	$0.2	$1.6	--
Venice	$0.2125	$1.375	--
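
To compare these list prices against a concrete workload, a rough cost estimate can help. The sketch below assumes the first two price columns are USD per 1 million input tokens and USD per 1 million output tokens; the table does not label its columns, so that reading is an assumption. The `ProviderPrice` type and the `estimateCostUsd` and `cheapestFor` helpers are hypothetical names for illustration, not part of any gateway API.

```typescript
// Assumption: prices are USD per 1,000,000 tokens (input first, then output).
interface ProviderPrice {
  provider: string;
  inputPerMillion: number;
  outputPerMillion: number;
}

// List prices copied from the table above.
const prices: ProviderPrice[] = [
  { provider: "AkashML", inputPerMillion: 0.23, outputPerMillion: 1.8 },
  { provider: "Alibaba Cloud", inputPerMillion: 0.248, outputPerMillion: 1.485 },
  { provider: "DeepInfra", inputPerMillion: 0.15, outputPerMillion: 0.95 },
  { provider: "NovitaAI", inputPerMillion: 0.248, outputPerMillion: 1.485 },
  { provider: "SiliconFlow", inputPerMillion: 0.2, outputPerMillion: 1.6 },
  { provider: "Venice", inputPerMillion: 0.2125, outputPerMillion: 1.375 },
];

// Estimated cost in USD for one workload at one provider's list price.
function estimateCostUsd(
  price: ProviderPrice,
  inputTokens: number,
  outputTokens: number
): number {
  return (
    (inputTokens * price.inputPerMillion +
      outputTokens * price.outputPerMillion) /
    1_000_000
  );
}

// Provider with the lowest estimated cost for the given token counts.
function cheapestFor(inputTokens: number, outputTokens: number): ProviderPrice {
  return prices.reduce((best, candidate) =>
    estimateCostUsd(candidate, inputTokens, outputTokens) <
    estimateCostUsd(best, inputTokens, outputTokens)
      ? candidate
      : best
  );
}

// Hypothetical workload: 2M input tokens and 500k output tokens.
for (const price of prices) {
  const cost = estimateCostUsd(price, 2_000_000, 500_000);
  console.log(price.provider, "$" + cost.toFixed(4));
}
```

Effective prices can differ from list prices (for example through cached-input discounts or routing), so an estimate like this is best treated as an upper-level comparison rather than a billing figure.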

import Phaseo from '@phaseo/sdk';

// Create a client using the API key from the environment.
const client = new Phaseo({
  apiKey: process.env.PHASEO_API_KEY,
});

const response = await client.generateResponse({
  "model": "qwen/qwen3.6-35b-a3b",
  "input": "Give me one fun fact about cURL."
});

// Find the first output_text item across all output entries.
const outputText = response.output
  ?.flatMap((item) => item.content ?? [])
  .find((item) => item.type === "output_text")
  ?.text;

// Fall back to the raw response if no text item was found.
console.log(outputText ?? response);

Parameter	Description
`temperature`	Controls how random token selection can be.
`top_p`	Applies nucleus sampling by limiting candidates to a probability mass threshold.
`max_tokens`	Caps output length on endpoints and providers that use the max_tokens field name.
`presence_penalty`	Encourages the model to explore new wording or topics after they first appear.
`seed`	Requests deterministic sampling when the upstream provider supports seeded generation.
`tools`	Defines callable tools or functions the model can invoke.
`response_format`	Requests plain text, JSON, or schema-constrained output formats.
`structured_outputs`	Capability signal for reliable schema-constrained output workflows.
`frequency_penalty`	Discourages repeated tokens in proportion to how often they already appeared.
`stop`	Defines one or more sequences that terminate generation early.
`logprobs`	Requests token-level probability data in the response.
`tool_choice`	Controls which tool, if any, the model should call.
`parallel_tool_calls`	Allows or restricts concurrent tool execution where supported.
`reasoning`	Provider-specific reasoning configuration for reasoning-capable APIs.
`reasoning_effort`	Requests a lower or higher reasoning budget when the endpoint exposes that control.
`include_reasoning`	Requests reasoning content or reasoning summaries in responses where supported.
`audio`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`metadata`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`modalities`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`prediction`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`service_tier`	Chooses a supported request tier such as priority or flex when the route supports it.
`store`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`stream_options_include_usage`	See the full parameter reference for endpoint-specific semantics and provider caveats.
`top_logprobs`	Limits how many alternative token probabilities are returned per position.
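
As a rough illustration of how several of these parameters might be combined, the sketch below builds a plain request body and checks a few value ranges before anything is sent. The field names come from the table above and the `model` and `input` fields from the quickstart call. The `SamplingOptions` type, the `buildRequestBody` helper, and the ranges it enforces (temperature between 0 and 2, `top_p` in (0, 1], positive integer `max_tokens`) are assumptions based on common OpenAI-compatible conventions, not documented limits for this model or gateway. Some endpoints may also use a different field name than `max_tokens`.

```typescript
// Hypothetical subset of the parameters listed above.
interface SamplingOptions {
  temperature?: number;
  top_p?: number;
  max_tokens?: number;
  seed?: number;
  stop?: string[];
}

// Build a request body, rejecting values outside the assumed ranges.
function buildRequestBody(
  model: string,
  input: string,
  options: SamplingOptions = {}
): Record<string, unknown> {
  const { temperature, top_p, max_tokens } = options;

  if (temperature !== undefined && (temperature < 0 || temperature > 2)) {
    throw new RangeError("temperature is assumed to lie in [0, 2]");
  }
  if (top_p !== undefined && (top_p <= 0 || top_p > 1)) {
    throw new RangeError("top_p is assumed to lie in (0, 1]");
  }
  if (
    max_tokens !== undefined &&
    (Math.floor(max_tokens) !== max_tokens || max_tokens < 1)
  ) {
    throw new RangeError("max_tokens must be a positive integer");
  }

  // Copy only the fields that were set, so provider defaults apply otherwise.
  const body: Record<string, unknown> = { model, input };
  const keys = ["temperature", "top_p", "max_tokens", "seed", "stop"] as const;
  for (const key of keys) {
    if (options[key] !== undefined) {
      body[key] = options[key];
    }
  }
  return body;
}

const body = buildRequestBody(
  "qwen/qwen3.6-35b-a3b",
  "Give me one fun fact about cURL.",
  { temperature: 0.2, top_p: 0.9, max_tokens: 256, seed: 42, stop: ["\n\n"] }
);
console.log(JSON.stringify(body, null, 2));
```

Omitting unset fields rather than sending explicit defaults keeps the request compatible with providers that do not support a given parameter, such as `seed`.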
