OpenAI-compatible chat completions endpoint. Accepts the standard OpenAI request schema and returns the standard OpenAI response schema. Supports both streaming (SSE) and non-streaming modes. Works as a drop-in replacement with the OpenAI Python and TypeScript SDKs.
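In streaming mode, events arrive as SSE lines of the form `data: {...}` terminated by `data: [DONE]`, per the OpenAI convention. A minimal stdlib-only sketch of extracting the assistant text from such a stream (the sample event lines below are illustrative, not captured output):

```python
import json

def parse_sse_chunks(lines):
    """Concatenate delta text from OpenAI-style SSE event lines."""
    out = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # stream terminator sentinel
        event = json.loads(payload)
        delta = event["choices"][0]["delta"]
        # the first delta usually carries only the role; content follows
        if "content" in delta:
            out.append(delta["content"])
    return "".join(out)

# hypothetical event lines as they would arrive over SSE
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(sample))  # Hello
```

When using the official SDKs with `stream=True`, this parsing is handled for you; the sketch only shows what crosses the wire.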
Model to use for completion
A list of messages comprising the conversation
Sampling temperature, between 0 and 2; higher values make output more random, lower values more deterministic
Nucleus sampling parameter; the model considers only tokens in the top_p probability mass (alter this or temperature, not both)
Number of completion choices to generate for each input
Whether to stream partial results as server-sent events (SSE)
Maximum number of tokens to generate (deprecated in favor of max_completion_tokens)
Maximum completion tokens (takes precedence over max_tokens)
Presence penalty, between -2.0 and 2.0; positive values penalize tokens that have already appeared, encouraging new topics
Frequency penalty, between -2.0 and 2.0; positive values penalize tokens in proportion to their frequency so far, reducing repetition
Unique identifier for the end user, useful for request tracking and abuse detection
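A request body combining the parameters above might look like the following sketch; the model name, user id, and values are placeholders, and `/v1/chat/completions` is assumed to be the endpoint path per the OpenAI convention:

```python
import json

# Hypothetical non-streaming request payload for the chat completions endpoint
payload = {
    "model": "gpt-4o",            # placeholder model name
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello."},
    ],
    "temperature": 0.7,           # 0 to 2; higher = more random
    "top_p": 1.0,                 # nucleus sampling probability mass
    "n": 1,                       # number of completion choices
    "stream": False,              # set True for SSE streaming
    "max_completion_tokens": 64,  # takes precedence over max_tokens
    "presence_penalty": 0.0,      # -2.0 to 2.0
    "frequency_penalty": 0.0,     # -2.0 to 2.0
    "user": "user-1234",          # opaque end-user identifier
}
body = json.dumps(payload)  # send as the JSON request body
```

Because the schema matches OpenAI's, the same payload works when sent through the OpenAI Python or TypeScript SDK with the client's base URL pointed at this endpoint.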