GPT OSS 20b Pricing, Benchmarks, Latency & Providers


NovitaAI	$0.04	$0.15	--	--	--	--
SiliconFlow	$0.04	$0.18	--	--	--	--
Venice (E2EE)	$0.05	$0.19	--	--	--	--
Together	$0.05	$0.2	--	--	--	--
Weights & Biases	$0.05	$0.2	--	--	--	--
Fireworks	$0.07	$0.3	$0.04	--	--	--
Groq	$0.075	$0.3	$0.0375	--	--	--
Amazon Bedrock	$0.07	$0.3	--	--	--	--
Nvidia	--	--	--	--	--	--
Ambient	--	--	--	--	--	--
Cloudflare	$0.2	$0.3	--	--	--	--
DeepInfra	--	--	--	--	--	--
DigitalOcean	$0.05	$0.45	--	--	--	--
Nebius Token Factory	$0.05	$0.2	--	--	--	--
NextBit	--	--	--	--	--	--
Thinking Machines	$0.12	$0.3	--	--	--	--
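To make the table easier to compare, here is a minimal sketch of estimating the cost of a workload from per-token prices. It assumes that the first price column is the input price and the second is the output price, both in USD per 1M tokens; the table itself does not label its columns, and the function and constant names are illustrative only.

```typescript
// Prices copied from the provider table.
// Assumption: first price column = input, second = output, USD per 1M tokens.
interface TokenPrice {
  inputPerMillion: number;
  outputPerMillion: number;
}

const PRICES = {
  NovitaAI: { inputPerMillion: 0.04, outputPerMillion: 0.15 },
  Groq: { inputPerMillion: 0.075, outputPerMillion: 0.3 },
  DigitalOcean: { inputPerMillion: 0.05, outputPerMillion: 0.45 },
};

// Cost in USD for a given number of input and output tokens.
function estimateCostUsd(
  price: TokenPrice,
  inputTokens: number,
  outputTokens: number,
): number {
  return (
    (inputTokens / 1_000_000) * price.inputPerMillion +
    (outputTokens / 1_000_000) * price.outputPerMillion
  );
}

// Compare the same workload (2M input tokens, 500k output tokens) across providers.
for (const [name, price] of Object.entries(PRICES)) {
  console.log(name, estimateCostUsd(price, 2_000_000, 500_000).toFixed(4));
}
```

Because output tokens are priced several times higher than input tokens in every row, the output-heavy share of a workload tends to dominate the comparison.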


Pricing

Weighted provider pricing over the last 30 days, with recent route pricing history below.

Effective pricing

Weighted by routed usage over the last 30 days; external and non-routable providers are excluded.

Weighted input price

Per 1M tokens

Weighted output price

Per 1M tokens
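A usage-weighted price can be sketched as an ordinary weighted average. The page does not say whether "routed usage" is measured in tokens or requests, so the example below assumes tokens; the `routedTokens` volumes are made-up illustration values, not real traffic data.

```typescript
// One row per routable provider.
// Assumption: usage is measured in routed tokens over the window.
interface RoutedPrice {
  provider: string;
  pricePerMillion: number; // USD per 1M tokens
  routedTokens: number; // hypothetical usage volume
}

// Weighted average price; returns null when there was no routed usage at all.
function usageWeightedPrice(rows: RoutedPrice[]): number | null {
  const totalTokens = rows.reduce((sum, row) => sum + row.routedTokens, 0);
  if (totalTokens === 0) return null;
  const weightedSum = rows.reduce(
    (sum, row) => sum + row.pricePerMillion * row.routedTokens,
    0,
  );
  return weightedSum / totalTokens;
}

// Input prices taken from the provider table, volumes invented for the example.
const inputRows: RoutedPrice[] = [
  { provider: "Fireworks", pricePerMillion: 0.07, routedTokens: 1_000_000 },
  { provider: "Groq", pricePerMillion: 0.075, routedTokens: 1_000_000 },
  { provider: "NovitaAI", pricePerMillion: 0.04, routedTokens: 2_000_000 },
];

console.log(usageWeightedPrice(inputRows));
```

Providers with no routed traffic contribute nothing to the result, which is one way to read the exclusion of external and non-routable providers.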


Fireworks	$0.07	$0.3	--
Groq	$0.075	$0.3	--
NovitaAI	$0.04	$0.15	--
SiliconFlow	$0.04	$0.18	--
Together	$0.05	$0.2	--
Venice (E2EE)	$0.05	$0.19	--
Weights & Biases	$0.05	$0.2	--

No 7-day effective pricing is available for the selected service tier.

Quickstart

Start calling this model with endpoint-specific examples.

Get an API key

Create an API key in Settings > Keys and store it as PHASEO_API_KEY.

Keep it server-side, never commit it, and rotate it immediately if exposed.
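One way to follow this advice is to read the key from the environment once, at startup, and fail early if it is missing. The helper below is a sketch; `requireApiKey` is an illustrative name, not part of any SDK.

```typescript
// Reads PHASEO_API_KEY from an environment-like object and fails fast if absent.
// Taking the environment as a parameter keeps the helper easy to test.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = env["PHASEO_API_KEY"]?.trim();
  if (!key) {
    throw new Error("PHASEO_API_KEY is not set");
  }
  return key;
}

// Example with an in-memory environment; never hard-code a real key like this.
const apiKey = requireApiKey({ PHASEO_API_KEY: "example-key" });
console.log(apiKey.length > 0);
```

In a Node.js server this would typically be called as `requireApiKey(process.env)`, so the key never appears in source code or in client-side bundles.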

Send the request

Choose a supported endpoint, pick a main language, then select the example style you want to copy.

Supported endpoints

Supported API reference routes for this model.

POST /v1/responses (Responses API reference)

POST /v1/chat/completions (Chat Completions API reference)

POST /v1/messages (Messages API reference)
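For callers who prefer plain HTTP over the SDK, the sketch below prepares a request for the `/v1/chat/completions` route without sending it. Several details are assumptions not stated on this page: the base URL (a placeholder here), the `Authorization: Bearer` header scheme, and the `messages` body shape, which follows the common Chat Completions convention. Check the API reference before relying on any of them.

```typescript
interface PreparedRequest {
  url: string;
  init: {
    method: string;
    headers: Record<string, string>;
    body: string;
  };
}

// Builds (but does not send) a chat completion request.
// Assumptions: Bearer auth and an OpenAI-style `messages` array.
function buildChatCompletionRequest(
  baseUrl: string,
  apiKey: string,
  prompt: string,
): PreparedRequest {
  return {
    url: new URL("/v1/chat/completions", baseUrl).toString(),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify({
        model: "openai/gpt-oss-20b",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Placeholder host: replace with the real API base URL.
const prepared = buildChatCompletionRequest(
  "https://api.example.invalid",
  "example-key",
  "Give me one fun fact about cURL.",
);
console.log(prepared.url);
```

The prepared object can be passed to `fetch(prepared.url, prepared.init)` once the real base URL and key are in place.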


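import Phaseo from '@phaseo/sdk';

// Read the key from the environment; keep it out of source control.
const client = new Phaseo({
  apiKey: process.env.PHASEO_API_KEY,
});

const response = await client.generateResponse({
  model: "openai/gpt-oss-20b",
  input: "Give me one fun fact about cURL.",
});

// Take the text of the first output_text part, if the response contains one.
const outputText = response.output
  ?.flatMap((item) => item.content ?? [])
  .find((item) => item.type === "output_text")
  ?.text;

// Fall back to printing the whole response when no text part was found.
console.log(outputText ?? response);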
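The text-extraction step in the example can be pulled into a small typed helper and exercised without a network call. The interfaces below are inferred only from the fields the example reads (`output`, `content`, `type`, `text`); the SDK's real types may be richer, and the `reasoning_text` part in the mock is a hypothetical placeholder for any non-text part.

```typescript
// Minimal shapes inferred from the fields the quickstart example accesses.
interface ContentPart {
  type: string;
  text?: string;
}
interface OutputItem {
  content?: ContentPart[];
}
interface ResponseLike {
  output?: OutputItem[];
}

// Returns the text of the first output_text part, or undefined if there is none.
function firstOutputText(response: ResponseLike): string | undefined {
  return response.output
    ?.flatMap((item) => item.content ?? [])
    .find((part) => part.type === "output_text")
    ?.text;
}

// Mock response: one non-text part, one item without content, one text part.
const mockResponse: ResponseLike = {
  output: [
    { content: [{ type: "reasoning_text", text: "thinking" }] },
    {},
    { content: [{ type: "output_text", text: "Hello" }] },
  ],
};

console.log(firstOutputText(mockResponse));
```

Returning `undefined` rather than throwing lets the caller decide on a fallback, as the original example does by logging the whole response.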


Parameters

Aggregated across active providers for the responses route.

Routing will select a compatible provider when a parameter narrows availability, so this list stays model-facing instead of provider-facing.
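The routing behavior described here can be pictured as a filter over provider capabilities. The sketch below is an illustration only, not the router's actual logic: the provider names and their supported-parameter sets are invented placeholders.

```typescript
// Hypothetical capability map: provider name -> parameters it accepts.
type ParamSupport = Record<string, ReadonlySet<string>>;

const support: ParamSupport = {
  "provider-a": new Set(["temperature", "top_p", "seed"]),
  "provider-b": new Set(["temperature", "top_p", "top_k", "min_p"]),
  "provider-c": new Set(["temperature"]),
};

// Keeps only the providers that accept every requested parameter.
function compatibleProviders(
  capabilities: ParamSupport,
  requested: string[],
): string[] {
  return Object.entries(capabilities)
    .filter(([, params]) => requested.every((name) => params.has(name)))
    .map(([provider]) => provider);
}

console.log(compatibleProviders(support, ["temperature", "top_k"]));
```

Under this model, each extra parameter in a request can only shrink the candidate set, which is why a model-level parameter list may include options that some providers do not support.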


Parameter	Description
`temperature`	Controls how random token selection can be.
`top_p`	Applies nucleus sampling by limiting candidates to a probability mass threshold.
`top_k`	Restricts sampling to the top-k candidate tokens on providers that expose it.
`max_tokens`	Caps output length on endpoints and providers that use the max_tokens field name.
`frequency_penalty`	Discourages repeated tokens in proportion to how often they already appeared.
`presence_penalty`	Penalizes tokens that have already appeared at least once, nudging the model toward new wording or topics.
`repetition_penalty`	Applies provider-specific anti-repetition behavior outside the classic penalty fields.
`seed`	Requests deterministic sampling when the upstream provider supports seeded generation.
`stop`	Defines one or more sequences that terminate generation early.
`logprobs`	Requests token-level probability data in the response.
`structured_outputs`	Capability signal for reliable schema-constrained output workflows.
`include_reasoning`	Requests reasoning content or reasoning summaries in responses where supported.
`logit_bias`	Adjusts token selection bias directly when a provider exposes logit control.
`min_p`	Narrows sampling by discarding tokens below a minimum probability threshold.
`top_logprobs`	Limits how many alternative token probabilities are returned per position.
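To make the sampling parameters more concrete, here is a toy sketch of how `top_k` and `top_p` narrow a candidate distribution before a token is drawn. It follows the usual textbook definitions (keep the k most likely tokens; keep the smallest set whose cumulative probability reaches p) and is not a description of any provider's implementation. The token probabilities are invented.

```typescript
interface Candidate {
  token: string;
  prob: number;
}

// Sorts candidates from most to least likely without mutating the input.
function byProbDesc(cands: Candidate[]): Candidate[] {
  return [...cands].sort((a, b) => b.prob - a.prob);
}

// top_k: keep only the k most likely tokens.
function topK(cands: Candidate[], k: number): Candidate[] {
  return byProbDesc(cands).slice(0, k);
}

// top_p: keep the smallest prefix whose cumulative probability reaches p.
function topP(cands: Candidate[], p: number): Candidate[] {
  const kept: Candidate[] = [];
  let cumulative = 0;
  for (const cand of byProbDesc(cands)) {
    kept.push(cand);
    cumulative += cand.prob;
    if (cumulative >= p) break;
  }
  return kept;
}

// Rescales the surviving probabilities so they sum to 1 again.
function renormalize(cands: Candidate[]): Candidate[] {
  const total = cands.reduce((sum, c) => sum + c.prob, 0);
  return cands.map((c) => ({ token: c.token, prob: c.prob / total }));
}

const dist: Candidate[] = [
  { token: "c", prob: 0.15 },
  { token: "a", prob: 0.5 },
  { token: "d", prob: 0.05 },
  { token: "b", prob: 0.3 },
];

console.log(renormalize(topP(dist, 0.75)).map((c) => c.token));
console.log(topK(dist, 3).map((c) => c.token));
```

Sampling then draws from the renormalized survivors, so lower `top_p` or `top_k` values trade diversity for predictability.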

Docs: TypeScript SDK, Responses API
