ZStack AIOS

Inference Template Compatibility List

User-facing feature guide for AIOS, covering GPUs, the model repository, inference services, and use-case practices.

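Different inference templates support different models, so check template compatibility before deploying a model. This chapter lists the models that each system inference template supports. You can upload any model on a compatibility list to the AI model platform and deploy it with the corresponding system inference template.
Note: If you use a custom inference template, refer to that template's official compatibility documentation.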
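Before uploading a model, it can help to compare the architecture declared in the model's config.json with the architecture column of the compatibility tables. The following is a minimal sketch, not part of AIOS: the `SUPPORTED_ARCHITECTURES` catalog and the function names are hypothetical, and the catalog holds only a small excerpt of the tables in this chapter. It relies on the `architectures` field that HuggingFace model repositories normally carry in config.json.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical catalog: a small excerpt of the architecture columns in this chapter.
# It is deliberately incomplete, so an empty result only means "not in this excerpt".
SUPPORTED_ARCHITECTURES = {
    "vLLM 0.20.2": {"Qwen3ForCausalLM", "LlamaForCausalLM", "MistralForCausalLM"},
    "vLLM 0.17.1": {"Qwen3ForCausalLM", "LlamaForCausalLM", "MistralForCausalLM"},
}


def read_architectures(model_dir):
    """Return the `architectures` list from config.json in a local model directory."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return config.get("architectures", [])


def matching_templates(model_dir, catalog=SUPPORTED_ARCHITECTURES):
    """Return the names of templates whose excerpt lists one of the model's architectures."""
    declared = set(read_architectures(model_dir))
    return sorted(name for name, supported in catalog.items() if declared & supported)


if __name__ == "__main__":
    # Demonstrate with a throwaway directory that contains only a config.json.
    with tempfile.TemporaryDirectory() as tmp:
        config = {"architectures": ["Qwen3ForCausalLM"]}
        (Path(tmp) / "config.json").write_text(json.dumps(config), encoding="utf-8")
        print(matching_templates(tmp))  # ['vLLM 0.17.1', 'vLLM 0.20.2']
```

A real check would load the full architecture lists for each template rather than the excerpt used here.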

vLLM 0.20.2

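The tables below list the model architectures, model names, and example models compatible with this template. For usage details and caveats for each model type in the lists, see the official vLLM documentation.
Table 1. Text-only language models | Generative models | Text generation
Architecture Model HuggingFace model examples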
AfmoeForCausalLM Afmoe TBA
ApertusForCausalLM Apertus swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc.
AquilaForCausalLM Aquila, Aquila2 BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.
ArceeForCausalLM Arcee (AFM) arcee-ai/AFM-4.5B-Base, etc.
ArcticForCausalLM Arctic Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc.
AXK1ForCausalLM A.X-K1 skt/A.X-K1, etc.
BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.
BailingMoeForCausalLM Ling inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc.
BailingMoeV2ForCausalLM Ling inclusionAI/Ling-mini-2.0, etc.
BailingMoeV2_5ForCausalLM Ling inclusionAI/Ling-2.5-1T, inclusionAI/Ring-2.5-1T
BambaForCausalLM Bamba ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B
BloomForCausalLM BLOOM, BLOOMZ, BLOOMChat bigscience/bloom, bigscience/bloomz, etc.
ChatGLMModel, ChatGLMForConditionalGeneration ChatGLM zai-org/chatglm2-6b, zai-org/chatglm3-6b, thu-coai/ShieldLM-6B-chatglm3, etc.
CohereForCausalLM, Cohere2ForCausalLM Command-R, Command-A CohereLabs/c4ai-command-r-v01, CohereLabs/c4ai-command-r7b-12-2024, CohereLabs/c4ai-command-a-03-2025, CohereLabs/command-a-reasoning-08-2025, etc.
CwmForCausalLM CWM facebook/cwm, etc.
DbrxForCausalLM DBRX databricks/dbrx-base, databricks/dbrx-instruct, etc.
DeciLMForCausalLM DeciLM nvidia/Llama-3_3-Nemotron-Super-49B-v1, etc.
DeepseekForCausalLM DeepSeek deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc.
DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc.
DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3.1, etc.
DeepseekV4ForCausalLM DeepSeek-V4 deepseek-ai/DeepSeek-V4-Flash, deepseek-ai/DeepSeek-V4-Pro, etc.
Dots1ForCausalLM dots.llm1 rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc.
DotsOCRForCausalLM dots_ocr rednote-hilab/dots.ocr
Ernie4_5ForCausalLM Ernie4.5 baidu/ERNIE-4.5-0.3B-PT, etc.
Ernie4_5_MoeForCausalLM Ernie4.5MoE baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc.
ExaoneForCausalLM EXAONE-3 LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct, etc.
ExaoneMoEForCausalLM K-EXAONE LGAI-EXAONE/K-EXAONE-236B-A23B, etc.
Exaone4ForCausalLM EXAONE-4 LGAI-EXAONE/EXAONE-4.0-32B, etc.
Fairseq2LlamaForCausalLM Llama (fairseq2 format) mgleize/fairseq2-dummy-Llama-3.2-1B, etc.
FalconForCausalLM Falcon tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.
FalconMambaForCausalLM FalconMamba tiiuae/falcon-mamba-7b, tiiuae/falcon-mamba-7b-instruct, etc.
FalconH1ForCausalLM Falcon-H1 tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc.
FlexOlmoForCausalLM FlexOlmo allenai/FlexOlmo-7x7B-1T, allenai/FlexOlmo-7x7B-1T-RT, etc.
GemmaForCausalLM Gemma google/gemma-2b, google/gemma-1.1-2b-it, etc.
Gemma2ForCausalLM Gemma 2 google/gemma-2-9b, google/gemma-2-27b, etc.
Gemma3ForCausalLM Gemma 3 google/gemma-3-1b-it, etc.
Gemma3nForCausalLM Gemma 3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
Gemma4ForCausalLM Gemma 4 google/gemma-4-E2B-it, etc.
GlmForCausalLM GLM-4 zai-org/glm-4-9b-chat-hf, etc.
Glm4ForCausalLM GLM-4-0414 zai-org/GLM-4-32B-0414, etc.
Glm4MoeForCausalLM GLM-4.5, GLM-4.6, GLM-4.7 zai-org/GLM-4.5, etc.
Glm4MoeLiteForCausalLM GLM-4.7-Flash zai-org/GLM-4.7-Flash, etc.
GPT2LMHeadModel GPT-2 openai-community/gpt2, openai-community/gpt2-xl, etc.
GPTBigCodeForCausalLM StarCoder, SantaCoder, WizardCoder bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc.
GPTJForCausalLM GPT-J EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.
GPTNeoXForCausalLM GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.
GptOssForCausalLM GPT-OSS openai/gpt-oss-120b, openai/gpt-oss-20b
GraniteForCausalLM Granite 3.0, Granite 3.1, PowerLM ibm-granite/granite-3.0-2b-base, ibm-granite/granite-3.1-8b-instruct, ibm/PowerLM-3b, etc.
GraniteMoeForCausalLM Granite 3.0 MoE, PowerMoE ibm-granite/granite-3.0-1b-a400m-base, ibm-granite/granite-3.0-3b-a800m-instruct, ibm/PowerMoE-3b, etc.
GraniteMoeHybridForCausalLM Granite 4.0 MoE Hybrid ibm-granite/granite-4.0-tiny-preview, etc.
GraniteMoeSharedForCausalLM Granite MoE Shared ibm-research/moe-7b-1b-active-shared-experts (test model)
GritLM GritLM parasail-ai/GritLM-7B-vllm
Grok1ModelForCausalLM Grok1 hpcai-tech/grok-1
Grok1ForCausalLM Grok2 xai-org/grok-2
HunYuanDenseV1ForCausalLM Hunyuan Dense tencent/Hunyuan-7B-Instruct
HunYuanMoEV1ForCausalLM Hunyuan-A13B tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc.
HYV3ForCausalLM HY3 tencent/Hy3-preview-Base, tencent/Hy3-preview
HyperCLOVAXForCausalLM HyperCLOVAX-SEED-Think-14B naver-hyperclovax/HyperCLOVAX-SEED-Think-14B
InternLMForCausalLM InternLM internlm/internlm-7b, internlm/internlm-chat-7b, etc.
InternLM2ForCausalLM InternLM2 internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.
InternLM3ForCausalLM InternLM3 internlm/internlm3-8b-instruct, etc.
IQuestCoderForCausalLM IQuestCoderV1 IQuestLab/IQuest-Coder-V1-40B-Instruct, etc.
IQuestLoopCoderForCausalLM IQuestLoopCoderV1 IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct, etc.
JAISLMHeadModel Jais inceptionai/jais-13b, inceptionai/jais-13b-chat, inceptionai/jais-30b-v3, inceptionai/jais-30b-chat-v3, etc.
Jais2ForCausalLM Jais2 inceptionai/Jais-2-8B-Chat, inceptionai/Jais-2-70B-Chat, etc.
JambaForCausalLM Jamba ai21labs/AI21-Jamba-1.5-Large, ai21labs/AI21-Jamba-1.5-Mini, ai21labs/Jamba-v0.1, etc.
KimiLinearForCausalLM Kimi-Linear-48B-A3B-Base, Kimi-Linear-48B-A3B-Instruct moonshotai/Kimi-Linear-48B-A3B-Base, moonshotai/Kimi-Linear-48B-A3B-Instruct
Lfm2ForCausalLM LFM2 LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc.
Lfm2MoeForCausalLM LFM2MoE LiquidAI/LFM2-8B-A1B-preview, etc.
LlamaForCausalLM Llama 3.1, Llama 3, Llama 2, LLaMA, Yi meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.
LongcatFlashForCausalLM LongCat-Flash meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8
MambaForCausalLM Mamba state-spaces/mamba-130m-hf, state-spaces/mamba-790m-hf, state-spaces/mamba-2.8b-hf, etc.
Mamba2ForCausalLM Mamba2 mistralai/Mamba-Codestral-7B-v0.1, etc.
MiMoForCausalLM MiMo XiaomiMiMo/MiMo-7B-RL, etc.
MiMoV2FlashForCausalLM MiMoV2Flash XiaomiMiMo/MiMo-V2-Flash, etc.
MiniCPMForCausalLM MiniCPM openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc.
MiniCPM3ForCausalLM MiniCPM3 openbmb/MiniCPM3-4B, etc.
MiniMaxForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01-hf, etc.
MiniMaxM2ForCausalLM MiniMax-M2, MiniMax-M2.1 MiniMaxAI/MiniMax-M2, etc.
MistralForCausalLM Ministral-3, Mistral, Mistral-Instruct mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.
MistralLarge3ForCausalLM Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 mistralai/Mistral-Large-3-675B-Base-2512, mistralai/Mistral-Large-3-675B-Instruct-2512, etc.
MixtralForCausalLM Mixtral-8x7B, Mixtral-8x7B-Instruct mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc.
MPTForCausalLM MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc.
NemotronForCausalLM Nemotron-3, Nemotron-4, Minitron nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc.
NemotronHForCausalLM Nemotron-H nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc.
OlmoForCausalLM OLMo allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.
Olmo2ForCausalLM OLMo2 allenai/OLMo-2-0425-1B, etc.
Olmo3ForCausalLM OLMo3 allenai/Olmo-3-7B-Instruct, allenai/Olmo-3-32B-Think, etc.
OlmoHybridForCausalLM OLMo Hybrid allenai/Olmo-Hybrid-7B
OlmoeForCausalLM OLMoE allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc.
OPTForCausalLM OPT, OPT-IML facebook/opt-66b, facebook/opt-iml-max-30b, etc.
OrionForCausalLM Orion OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.
OuroForCausalLM ouro ByteDance/Ouro-1.4B, ByteDance/Ouro-2.6B, etc.
PanguEmbeddedForCausalLM openPangu-Embedded-7B FreedomIntelligence/openPangu-Embedded-7B-V1.1
PanguProMoEV2ForCausalLM openpangu-pro-moe-v2 N/A
PanguUltraMoEForCausalLM openpangu-ultra-moe-718b-model FreedomIntelligence/openPangu-Ultra-MoE-718B-V1.1
Param2MoEForCausalLM param2moe bharatgenai/Param2-17B-A2.4B-Thinking, etc.
PhiForCausalLM Phi microsoft/phi-1_5, microsoft/phi-2, etc.
Phi3ForCausalLM Phi-4, Phi-3 microsoft/Phi-4-mini-instruct, microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc.
PhiMoEForCausalLM Phi-3.5-MoE microsoft/Phi-3.5-MoE-instruct, etc.
PersimmonForCausalLM Persimmon adept/persimmon-8b-base, adept/persimmon-8b-chat, etc.
Plamo2ForCausalLM PLaMo2 pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc.
Plamo3ForCausalLM PLaMo3 pfnet/plamo-3-nict-2b-base, pfnet/plamo-3-nict-8b-base, etc.
QWenLMHeadModel Qwen Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.
Qwen2ForCausalLM QwQ, Qwen2 Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc.
Qwen2MoeForCausalLM Qwen2MoE Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.
Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B, etc.
Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-30B-A3B, etc.
Qwen3NextForCausalLM Qwen3NextMoE Qwen/Qwen3-Next-80B-A3B-Instruct, etc.
RWForCausalLM Falcon RW tiiuae/falcon-40b, etc.
Rnj1ForCausalLM Rnj1 EssentialAI/rnj-1-instruct, etc.
SarvamMoEForCausalLM Sarvam 2 sarvamai/sarvam2-30b-a3b, etc.
SarvamMLAForCausalLM Sarvam 2 sarvamai/sarvam2-105b-a9b, etc.
SeedOssForCausalLM SeedOss ByteDance-Seed/Seed-OSS-36B-Instruct, etc.
SolarForCausalLM Solar Pro upstage/solar-pro-preview-instruct, etc.
StableLmForCausalLM StableLM stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.
StableLMEpochForCausalLM StableLM Epoch stabilityai/stablelm-zephyr-3b, etc.
Starcoder2ForCausalLM Starcoder2 bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.
Step1ForCausalLM Step-Audio stepfun-ai/Step-Audio-EditX, etc.
Step3p5ForCausalLM Step-3.5-flash stepfun-ai/Step-3.5-Flash, etc.
TeleChatForCausalLM TeleChat chuhac/TeleChat2-35B, etc.
TeleChat2ForCausalLM TeleChat2 Tele-AI/TeleChat2-3B, Tele-AI/TeleChat2-7B, Tele-AI/TeleChat2-35B, etc.
TeleChat3ForCausalLM TeleChat3 Tele-AI/TeleChat3-36B-Thinking, Tele-AI/TeleChat3-Coder-36B-Thinking, etc.
TeleFLMForCausalLM TeleFLM CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc.
XverseForCausalLM XVERSE xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.
MiniMaxM1ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc.
MiniMaxText01ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01, etc.
Zamba2ForCausalLM Zamba2 Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc.
SmolLM3ForCausalLM SmolLM3 HuggingFaceTB/SmolLM3-3B
Table 2. Text-only language models | Pooling models | Embedding
Architecture Model HuggingFace model examples
BertModel BERT-based BAAI/bge-base-en-v1.5, Snowflake/snowflake-arctic-embed-xs, etc.
BertSpladeSparseEmbeddingModel SPLADE naver/splade-v3
ErnieModel BERT-like Chinese ERNIE shibing624/text2vec-base-chinese-sentence
Gemma2ModelC Gemma 2-based BAAI/bge-multilingual-gemma2, etc.
Gemma3TextModelC Gemma 3-based google/embeddinggemma-300m, etc.
GritLM GritLM parasail-ai/GritLM-7B-vllm
GteModel Arctic-Embed-2.0-M Snowflake/snowflake-arctic-embed-m-v2.0
GteNewModel mGTE-TRM (see note) Alibaba-NLP/gte-multilingual-base, etc.
JinaEmbeddingsV5ModelC Qwen3-based with task-specific LoRA adapters jinaai/jina-embeddings-v5-text-small (see note)
LlamaBidirectionalModelC Llama-based with bidirectional attention nvidia/llama-nemotron-embed-1b-v2, etc.
LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. Llama-based intfloat/e5-mistral-7b-instruct, etc.
ModernBertModel ModernBERT-based Alibaba-NLP/gte-modernbert-base, etc.
NomicBertModel Nomic BERT nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc.
Qwen2ModelC, Qwen2ForCausalLMC Qwen2-based ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc.
Qwen3ModelC, Qwen3ForCausalLMC Qwen3-based Qwen/Qwen3-Embedding-0.6B, etc.
RobertaModel, RobertaForMaskedLM RoBERTa-based sentence-transformers/all-roberta-large-v1, etc.
VoyageQwen3BidirectionalEmbedModelC Voyage Qwen3-based with bidirectional attention voyageai/voyage-4-nano, etc.
XLMRobertaModel XLMRobertaModel-based BAAI/bge-m3 (see note), intfloat/multilingual-e5-base, jinaai/jina-embeddings-v3 (see note), etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Table 3. Text-only language models | Pooling models | Reward
Architecture Model HuggingFace model examples
JambaForSequenceClassification Jamba ai21labs/Jamba-tiny-reward-dev, etc.
Qwen3ForSequenceClassificationC Qwen3-based Skywork/Skywork-Reward-V2-Qwen3-0.6B, etc.
LlamaForSequenceClassificationC Llama-based Skywork/Skywork-Reward-V2-Llama-3.2-1B, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B, etc.
LlamaForCausalLM Llama-based peiyi9979/math-shepherd-mistral-7b-prm, etc.
Qwen2ForProcessRewardModel Qwen2-based Qwen/Qwen2.5-Math-PRM-7B, etc.
Table 4. Text-only language models | Pooling models | Classification
Architecture Model HuggingFace model examples
ErnieForSequenceClassification BERT-like Chinese ERNIE Forrest20231206/ernie-3.0-base-zh-cls
GPT2ForSequenceClassification GPT2 nie3e/sentiment-polish-gpt2-small
Qwen2ForSequenceClassificationC Qwen2-based jason9693/Qwen2.5-1.5B-apeach
*ModelC, *ForCausalLMC, etc. Generative models N/A
Table 5. Text-only language models | Pooling models | Cross-encoder/Reranking
Architecture Model HuggingFace model examples
BertForSequenceClassification BERT-based cross-encoder/ms-marco-MiniLM-L-6-v2, etc.
GemmaForSequenceClassification Gemma-based BAAI/bge-reranker-v2-gemma (see note), etc.
GteNewForSequenceClassification mGTE-TRM (see note) Alibaba-NLP/gte-multilingual-reranker-base, etc.
LlamaBidirectionalForSequenceClassificationC Llama-based with bidirectional attention nvidia/llama-nemotron-rerank-1b-v2, etc.
Qwen2ForSequenceClassificationC Qwen2-based mixedbread-ai/mxbai-rerank-base-v2 (see note), etc.
Qwen3ForSequenceClassificationC Qwen3-based tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B (see note), etc.
RobertaForSequenceClassification RoBERTa-based cross-encoder/quora-roberta-base, etc.
XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Table 6. Text-only language models | Pooling models | Token classification
Architecture Model HuggingFace model examples
BertForTokenClassification bert-based boltuix/NeuroBERT-NER (see note), etc.
ErnieForTokenClassification BERT-like Chinese ERNIE gyr66/Ernie-3.0-base-chinese-finetuned-ner
ModernBertForTokenClassification ModernBERT-based disham993/electrical-ner-ModernBERT-base
Qwen3ForTokenClassificationC Qwen3-based bd2lcco/Qwen3-0.6B-finetuned
*ModelC, *ForCausalLMC, etc. Generative models N/A
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B, etc.
Table 7. Multimodal models | Generative models | Text generation
Architecture Model Inputs HuggingFace model examples
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria
AudioFlamingo3ForConditionalGeneration AudioFlamingo3 T + A nvidia/audio-flamingo-3-hf, nvidia/music-flamingo-hf
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereLabs/aya-vision-8b, CohereLabs/aya-vision-32b, etc.
BagelForConditionalGeneration BAGEL T + I+ ByteDance-Seed/BAGEL-7B-MoT
BeeForConditionalGeneration Bee-8B T + IE+ Open-Bee/Bee-8B-RL, Open-Bee/Bee-8B-SFT
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
CheersForConditionalGeneration Cheers T + I ai9stars/Cheers
Cohere2VisionForConditionalGeneration Command A Vision T + I+ CohereLabs/command-a-vision-07-2025, etc.
DeepseekVLV2ForCausalLM DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
DeepseekOCRForCausalLM DeepSeek-OCR T + I+ deepseek-ai/DeepSeek-OCR, etc.
DeepseekOCR2ForCausalLM DeepSeek-OCR-2 T + I+ deepseek-ai/DeepSeek-OCR-2, etc.
Eagle2_5_VLForConditionalGeneration Eagle2.5-VL T + IE+ nvidia/Eagle2.5-8B, etc.
Ernie4_5_VLMoeForConditionalGeneration Ernie4.5-VL T + I+/ V+ baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT
Exaone4_5_ForConditionalGeneration EXAONE-4.5 T + IE+ LGAI-EXAONE/EXAONE-4.5-33B, etc.
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + IE+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
Gemma3nForConditionalGeneration Gemma 3n T + I + A google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
Gemma4ForConditionalGeneration Gemma 4 T + I+ + V + A* google/gemma-4-E2B-it, etc.
GLM4VForCausalLM^ GLM-4V T + I zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc.
Glm4vForConditionalGeneration GLM-4.1V-Thinking T + IE+ + VE+ zai-org/GLM-4.1V-9B-Thinking, etc.
Glm4vMoeForConditionalGeneration GLM-4.5V T + IE+ + VE+ zai-org/GLM-4.5V, etc.
GlmOcrForConditionalGeneration GLM-OCR T + IE+ zai-org/GLM-OCR, etc.
Granite4VisionForConditionalGeneration Granite 4 Vision T + IE+ ibm-granite/granite-4.1-3b-vision, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
HCXVisionForCausalLM HyperCLOVAX-SEED-Vision-Instruct-3B T + I+ + V+ naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B
HCXVisionV2ForCausalLM HyperCLOVAX-SEED-Think-32B T + I+ + V+ naver-hyperclovax/HyperCLOVAX-SEED-Think-32B
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
HunYuanVLForConditionalGeneration HunyuanOCR T + IE+ tencent/HunyuanOCR, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
IsaacForConditionalGeneration Isaac T + I+ PerceptronAI/Isaac-0.1
InternS1ForConditionalGeneration Intern-S1 T + IE+ + VE+ internlm/Intern-S1, internlm/Intern-S1-mini, etc.
InternS1ProForConditionalGeneration Intern-S1-Pro T + IE+ + VE+ internlm/Intern-S1-Pro, etc.
InternVLChatModel InternVL 3.5, InternVL 3.0, InternVideo 2.5, InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE+ + (VE+) OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
InternVLForConditionalGeneration InternVL 3.0 (HF format) T + IE+ + VE+ OpenGVLab/InternVL3-1B-hf, etc.
KananaVForConditionalGeneration Kanana-V T + I+ kakaocorp/kanana-1.5-v-3b-instruct, etc.
KeyeForConditionalGeneration Keye-VL-8B-Preview T + IE+ + VE+ Kwai-Keye/Keye-VL-8B-Preview
KeyeVL1_5ForConditionalGeneration Keye-VL-1_5-8B T + IE+ + VE+ Kwai-Keye/Keye-VL-1_5-8B
KimiAudioForConditionalGeneration Kimi-Audio T + A+ moonshotai/Kimi-Audio-7B-Instruct
KimiK25ForConditionalGeneration Kimi-K2.5 T + I+ moonshotai/Kimi-K2.5
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
LightOnOCRForConditionalGeneration LightOnOCR-1B T + I+ lightonai/LightOnOCR-1B, etc.
Lfm2VlForConditionalGeneration LFM2-VL T + I+ LiquidAI/LFM2-VL-450M, LiquidAI/LFM2-VL-3B, LiquidAI/LFM2-VL-8B-A1B, etc.
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
Llama_Nemotron_Nano_VL Llama Nemotron Nano VL T + IE+ nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
LlavaForConditionalGeneration LLaVA-1.5, Pixtral (HF Transformers) T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), mistral-community/pixtral-12b, etc.
LlavaNextForConditionalGeneration LLaVA-NeXT, Granite Vision T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, ibm-granite/granite-vision-3.3-2b, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiDashengLMModel MiDashengLM T + A+ mispeech/midashenglm-7b
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, openbmb/MiniCPM-V-4, openbmb/MiniCPM-V-4_5, etc.
MiniMaxVL01ForConditionalGeneration MiniMax-VL T + IE+ MiniMaxAI/MiniMax-VL-01, etc.
Mistral3ForConditionalGeneration Mistral3 (HF Transformers) T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MolmoForCausalLM Molmo T + I+ allenai/Molmo-7B-D-0924, allenai/Molmo-7B-O-0924, etc.
Molmo2ForConditionalGeneration Molmo2 T + I+ / V allenai/Molmo2-4B, allenai/Molmo2-8B, allenai/Molmo2-O-7B
MusicFlamingoForConditionalGeneration MusicFlamingo T + A nvidia/music-flamingo-2601-hf, nvidia/music-flamingo-think-2601-hf
NVLM_D_Model NVLM-D 1.0 T + I+ nvidia/NVLM-D-72B, etc.
OpenCUAForConditionalGeneration OpenCUA-7B T + IE+ xlangai/OpenCUA-7B
OpenPanguVLForConditionalGeneration openpangu-VL T + IE+ + VE+ FreedomIntelligence/openPangu-VL-7B
Ovis Ovis2, Ovis1.6 T + I+ AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc.
Ovis2_5 Ovis2.5 T + I+ + V AIDC-AI/Ovis2.5-9B, etc.
Ovis2_6ForCausalLM Ovis2.6 T + I+ + V AIDC-AI/Ovis2.6-2B, etc.
Ovis2_6_MoeForCausalLM Ovis2.6 T + I+ + V AIDC-AI/Ovis2.6-30B-A3B, etc.
PaddleOCRVLForConditionalGeneration Paddle-OCR T + I+ PaddlePaddle/PaddleOCR-VL, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+ / I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
Phi4ForCausalLMV Phi-4-reasoning-vision T + I+ microsoft/Phi-4-reasoning-vision-15B, etc.
PixtralForConditionalGeneration Ministral 3 (Mistral format), Mistral 3 (Mistral format), Mistral Large 3 (Mistral format), Pixtral (Mistral format) T + I+ mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-Small-3.1-24B-Instruct-2503, mistralai/Mistral-Large-3-675B-Instruct-2512, mistralai/Pixtral-12B-2409, etc.
QwenVLForConditionalGeneration^ Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-3B, Qwen/Qwen2.5-Omni-7B
Qwen3_5ForConditionalGeneration Qwen3.5 T + IE+ + VE+ Qwen/Qwen3.5-9B-Instruct, etc.
Qwen3_5MoeForConditionalGeneration Qwen3.5-MOE T + IE+ + VE+ Qwen/Qwen3.5-35B-A3B-Instruct, etc.
Qwen3VLForConditionalGeneration Qwen3-VL T + IE+ + VE+ Qwen/Qwen3-VL-4B-Instruct, etc.
Qwen3VLMoeForConditionalGeneration Qwen3-VL-MOE T + IE+ + VE+ Qwen/Qwen3-VL-30B-A3B-Instruct, etc.
Qwen3OmniMoeThinkerForConditionalGeneration Qwen3-Omni T + IE+ + VE+ + A+ Qwen/Qwen3-Omni-30B-A3B-Instruct, Qwen/Qwen3-Omni-30B-A3B-Thinking
Qwen3ASRForConditionalGeneration Qwen3-ASR T + A+ Qwen/Qwen3-ASR-1.7B
RForConditionalGeneration R-VL-4B T + IE+ YannQi/R-4B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I HuggingFaceTB/SmolVLM2-2.2B-Instruct
Step3VLForConditionalGeneration Step3-VL T + I+ stepfun-ai/step3
StepVLForConditionalGeneration Step3-VL-10B T + I+ stepfun-ai/Step3-VL-10B
TarsierForConditionalGeneration Tarsier T + IE+ omni-research/Tarsier-7b, omni-research/Tarsier-34b
Tarsier2ForConditionalGeneration^ Tarsier2 T + IE+ + VE+ omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115
UltravoxModel Ultravox T + AE+ fixie-ai/ultravox-v0_5-llama-3_2-1b
Emu3ForConditionalGeneration Emu3 T + I BAAI/Emu3-Chat-hf
Table 8. Multimodal models | Generative models | Transcription
Architecture Model HuggingFace model examples
CohereAsrForConditionalGeneration Cohere-Transcribe CohereLabs/cohere-transcribe-03-2026
FireRedASR2ForConditionalGeneration FireRedASR2 allendou/FireRedASR2-LLM-vllm, etc.
FireRedLIDForConditionalGeneration FireRedLID PatchyTisa/FireRedLID-vllm, etc.
FunASRForConditionalGeneration FunASR allendou/Fun-ASR-Nano-2512-vllm, etc.
Gemma3nForConditionalGeneration Gemma3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
GlmAsrForConditionalGeneration GLM-ASR zai-org/GLM-ASR-Nano-2512
GraniteSpeechForConditionalGeneration Granite Speech ibm-granite/granite-4.0-1b-speech, ibm-granite/granite-speech-3.3-2b, etc.
Qwen3ASRForConditionalGeneration Qwen3-ASR Qwen/Qwen3-ASR-1.7B, etc.
Qwen3OmniMoeThinkerForConditionalGeneration Qwen3-Omni Qwen/Qwen3-Omni-30B-A3B-Instruct, etc.
VoxtralForConditionalGeneration Voxtral (Mistral format) mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc.
WhisperForConditionalGeneration Whisper openai/whisper-small, openai/whisper-large-v3-turbo, etc.
Table 9. Multimodal models | Generative models | Real-time transcription
Architecture Model HuggingFace model examples
VoxtralRealtimeGeneration Voxtral Realtime mistralai/Voxtral-Mini-4B-Realtime-2602
Qwen3ASRRealtimeGeneration Qwen3-ASR Realtime Qwen/Qwen3-ASR-0.6B
Table 10. Multimodal models | Pooling models | Embedding
Architecture Model Inputs HuggingFace model examples
CLIPModel CLIP T / I openai/clip-vit-base-patch32, openai/clip-vit-large-patch14, etc.
LlamaNemotronVLModel Llama Nemotron Embedding + SigLIP T + I nvidia/llama-nemotron-embed-vl-1b-v2
LlavaNextForConditionalGenerationC LLaVA-NeXT-based T / I royokong/e5-v
Phi3VForCausalLMC Phi-3-Vision-based T + I TIGER-Lab/VLM2Vec-Full
Qwen3VLForConditionalGenerationC Qwen3-VL T + I + V Qwen/Qwen3-VL-Embedding-2B, etc.
SiglipModel SigLIP, SigLIP2 T / I google/siglip-base-patch16-224, google/siglip2-base-patch16-224
*ForConditionalGenerationC, *ForCausalLMC, etc. Generative models * N/A
Table 11. Multimodal models | Pooling models | Classification
Architecture Model Inputs HuggingFace model examples
Qwen2_5_VLForSequenceClassificationC Qwen2_5_VL-based T + IE+ + VE+ muziyongshixin/Qwen2.5-VL-7B-for-VideoCls
*ForConditionalGenerationC, *ForCausalLMC, etc. Generative models * N/A
Table 12. Multimodal models | Pooling models | Cross-encoder/Reranking
Architecture Model Inputs HuggingFace model examples
JinaVLForSequenceClassification JinaVL-based T + IE+ jinaai/jina-reranker-m0, etc.
LlamaNemotronVLForSequenceClassification Llama Nemotron Reranker + SigLIP T + IE+ nvidia/llama-nemotron-rerank-vl-1b-v2
Qwen3VLForSequenceClassification Qwen3-VL-Reranker T + IE+ + VE+ Qwen/Qwen3-VL-Reranker-2B (see note), etc.
Table 13. Multimodal models | Pooling models | Token classification
Architecture Model Inputs HuggingFace model examples
Qwen3ASRForcedAlignerForTokenClassification Qwen3-ForcedAligner T + A+ Qwen/Qwen3-ForcedAligner-0.6B (see note)
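Notes:
  • C indicates that the model can be converted for the corresponding pooling task with --convert.
  • * indicates that feature support is the same as that of the original model.
  • Modalities: T is text, I is image, V is video, and A is audio.
  • + indicates that the joined modalities can be provided together; / indicates that several modalities are supported but cannot be used at the same time.
  • E indicates that precomputed embeddings can be supplied for that modality.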
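To make the input notation concrete, here is a rough sketch of how an entry from the input column could be decoded. The `parse_inputs` helper is hypothetical and covers only the simple cases described in the notes above: letters for modalities, " + " joining modalities that may be combined, "/" separating alternatives that may not be combined, and an E marker for precomputed embeddings. Any other marker attached to a letter is ignored, and parenthesised entries such as "(VE+)" are not handled.

```python
import re

# Modality letters as used in the input column of the multimodal tables.
MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}


def parse_inputs(spec):
    """Decode an input spec into groups of modalities.

    Each returned group lists modalities that may be supplied together;
    separate groups are mutually exclusive alternatives.
    """
    groups = []
    # "/" separates alternatives that cannot be used at the same time.
    for alternative in re.split(r"\s*/\s*", spec.strip()):
        group = []
        # A "+" surrounded by whitespace joins modalities; a "+" glued to a
        # letter is a marker and stays part of the token.
        for token in re.split(r"\s+\+\s+", alternative):
            letter, markers = token[0], token[1:]
            group.append({
                "modality": MODALITIES[letter],
                "precomputed_embeddings": "E" in markers,
            })
        groups.append(group)
    return groups


if __name__ == "__main__":
    print(parse_inputs("T / I"))
    print(parse_inputs("T + IE+ + VE+"))
```

For "T / I" the sketch yields two single-modality groups (text or image), while "T + IE+ + VE+" yields one group of text, image, and video, with the embedding marker set on image and video.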

vLLM 0.17.1

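The tables below list the model architectures, model names, and example models compatible with this template. For usage details and caveats for each model type in the lists, see the official vLLM documentation.
Table 14. Text-only language models | Generative models | Text generation
Architecture Model HuggingFace model examples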
LongcatFlashForCausalLM LongCat-Flash meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8
Zamba2ForCausalLM Zamba2 Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc.
MiniMaxText01ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01, etc.
MiniMaxM1ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc.
XverseForCausalLM XVERSE xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.
TeleFLMForCausalLM TeleFLM CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc.
TeleChat2ForCausalLM TeleChat2 Tele-AI/TeleChat2-3B, Tele-AI/TeleChat2-7B, Tele-AI/TeleChat2-35B, etc.
TeleChatForCausalLM TeleChat chuhac/TeleChat2-35B, etc.
Step1ForCausalLM Step-Audio stepfun-ai/Step-Audio-EditX, etc.
Step3p5ForCausalLM Step-3.5-flash stepfun-ai/step-3.5-flash, etc.
SolarForCausalLM Solar Pro upstage/solar-pro-preview-instruct, etc.
Starcoder2ForCausalLM Starcoder2 bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.
StableLMEpochForCausalLM StableLM Epoch stabilityai/stablelm-zephyr-3b, etc.
StableLmForCausalLM StableLM stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.
SeedOssForCausalLM SeedOss ByteDance-Seed/Seed-OSS-36B-Instruct, etc.
RWForCausalLM Falcon RW tiiuae/falcon-40b, etc.
QWenLMHeadModel Qwen Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.
Qwen2MoeForCausalLM Qwen2MoE Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.
Qwen2ForCausalLM QwQ, Qwen2 Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc.
Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B, etc.
Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-MoE-15B-A2B, etc.
Qwen3NextForCausalLM Qwen3NextMoE Qwen/Qwen3-Next-80B-A3B-Instruct, etc.
Plamo3ForCausalLM PLaMo3 pfnet/plamo-3-nict-2b-base, pfnet/plamo-3-nict-8b-base, etc.
Plamo2ForCausalLM PLaMo2 pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc.
PersimmonForCausalLM Persimmon adept/persimmon-8b-base, adept/persimmon-8b-chat, etc.
PhiMoEForCausalLM Phi-3.5-MoE microsoft/Phi-3.5-MoE-instruct, etc.
PhiForCausalLM Phi microsoft/phi-1_5, microsoft/phi-2, etc.
Phi3ForCausalLM Phi-4, Phi-3 microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc.
PanguUltraMoEForCausalLM openpangu-ultra-moe-718b-model FreedomIntelligence/openPangu-Ultra-MoE-718B-V1.1
PanguProMoEV2ForCausalLM openpangu-pro-moe-v2 N/A
PanguEmbeddedForCausalLM openPangu-Embedded-7B FreedomIntelligence/openPangu-Embedded-7B-V1.1
OuroForCausalLM ouro ByteDance/Ouro-1.4B, ByteDance/Ouro-2.6B, etc.
OrionForCausalLM Orion OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.
OPTForCausalLM OPT, OPT-IML facebook/opt-66b, facebook/opt-iml-max-30b, etc.
OlmoForCausalLM OLMo allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.
OlmoeForCausalLM OLMoE allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc.
Olmo2ForCausalLM OLMo2 allenai/OLMo2-7B-1124, etc.
Olmo3ForCausalLM OLMo3 TBA
NemotronHForCausalLM Nemotron-H nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc.
NemotronForCausalLM Nemotron-3, Nemotron-4, Minitron nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc.
MPTForCausalLM MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc.
MixtralForCausalLM Mixtral-8x7B, Mixtral-8x7B-Instruct mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc.
MistralLarge3ForCausalLM Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 mistralai/Mistral-Large-3-675B-Base-2512, mistralai/Mistral-Large-3-675B-Instruct-2512, etc.
MistralForCausalLM Ministral-3, Mistral, Mistral-Instruct mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.
MiniMaxForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01-hf, etc.
MiniMaxM2ForCausalLM MiniMax-M2, MiniMax-M2.1 MiniMaxAI/MiniMax-M2, etc.
MiniCPM3ForCausalLM MiniCPM3 openbmb/MiniCPM3-4B, etc.
MiniCPMForCausalLM MiniCPM openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc.
MiMoV2FlashForCausalLM MiMoV2Flash XiaomiMiMo/MiMo-V2-Flash, etc.
MiMoForCausalLM MiMo XiaomiMiMo/MiMo-7B-RL, etc.
MambaForCausalLM Mamba state-spaces/mamba-130m-hf , state-spaces/mamba-790m-hf , state-spaces/mamba-2.8b-hf , etc.
Mamba2ForCausalLM Mamba2 mistralai/Mamba-Codestral-7B-v0.1, etc.
LlamaForCausalLM Llama 3.1, Llama 3, Llama 2, LLaMA, Yi meta-llama/Meta-Llama-3.1-405B-Instruct , meta-llama/Meta-Llama-3.1-70B , meta-llama/Meta-Llama-3-70B-Instruct , meta-llama/Llama-2-70b-hf , 01-ai/Yi-34B , etc.
Lfm2MoeForCausalLM LFM2MoE LiquidAI/LFM2-8B-A1B-preview, etc.
Lfm2ForCausalLM LFM2 LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc.
KimiLinearForCausalLM Kimi-Linear-48B-A3B-Base, Kimi-Linear-48B-A3B-Instruct moonshotai/Kimi-Linear-48B-A3B-Base, moonshotai/Kimi-Linear-48B-A3B-Instruct
JambaForCausalLM Jamba ai21labs/AI21-Jamba-1.5-Large , ai21labs/AI21-Jamba-1.5-Mini , ai21labs/Jamba-v0.1 , etc.
Jais2ForCausalLM Jais2 inceptionai/Jais-2-8B-Chat, inceptionai/Jais-2-70B-Chat, etc.
JAISLMHeadModel Jais inceptionai/jais-13b , inceptionai/jais-13b-chat , inceptionai/jais-30b-v3 , inceptionai/jais-30b-chat-v3 , etc.
IQuestCoderForCausalLM IQuestCoderV1 IQuestLab/IQuest-Coder-V1-40B-Instruct, etc.
IQuestLoopCoderForCausalLM IQuestLoopCoderV1 IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct, etc.
InternLM3ForCausalLM InternLM3 internlm/internlm3-8b-instruct , etc.
InternLM2ForCausalLM InternLM2 internlm/internlm2-7b , internlm/internlm2-chat-7b , etc.
InternLMForCausalLM InternLM internlm/internlm-7b, internlm/internlm-chat-7b, etc.
HunYuanMoEV1ForCausalLM Hunyuan-A13B tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc.
HunYuanDenseV1ForCausalLM Hunyuan Dense tencent/Hunyuan-7B-Instruct-0124
Grok1ForCausalLM Grok2 xai-org/grok-2
Grok1ModelForCausalLM Grok1 hpcai-tech/grok-1
GritLM GritLM parasail-ai/GritLM-7B-vllm
GraniteMoeSharedForCausalLM Granite MoE Shared ibm-research/moe-7b-1b-active-shared-experts (test model)
GraniteMoeHybridForCausalLM Granite 4.0 MoE Hybrid ibm-granite/granite-4.0-tiny-preview, etc.
GraniteMoeForCausalLM Granite 3.0 MoE, PowerMoE ibm-granite/granite-3.0-1b-a400m-base , ibm-granite/granite-3.0-3b-a800m-instruct , ibm/PowerMoE-3b , etc.
GraniteForCausalLM Granite 3.0, Granite 3.1, PowerLM ibm-granite/granite-3.0-2b-base , ibm-granite/granite-3.1-8b-instruct , ibm/PowerLM-3b , etc.
GptOssForCausalLM GPT-OSS openai/gpt-oss-120b, openai/gpt-oss-20b
GPTNeoXForCausalLM GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM EleutherAI/gpt-neox-20b , EleutherAI/pythia-12b , OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 , databricks/dolly-v2-12b , stabilityai/stablelm-tuned-alpha-7b , etc.
GPTJForCausalLM GPT-J EleutherAI/gpt-j-6b , nomic-ai/gpt4all-j , etc.
GPTBigCodeForCausalLM StarCoder, SantaCoder, WizardCoder bigcode/starcoder , bigcode/gpt_bigcode-santacoder , WizardLM/WizardCoder-15B-V1.0 , etc.
GPT2LMHeadModel GPT-2 gpt2 , gpt2-xl , etc.
Glm4MoeLiteForCausalLM GLM-4.7-Flash zai-org/GLM-4.7-Flash, etc.
Glm4MoeForCausalLM GLM-4.5, GLM-4.6, GLM-4.7 zai-org/GLM-4.5, etc.
Glm4ForCausalLM GLM-4-0414 THUDM/GLM-4-32B-0414, etc.
GlmForCausalLM GLM-4 THUDM/glm-4-9b-chat-hf , etc.
Gemma3nForCausalLM Gemma 3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
Gemma3ForCausalLM Gemma 3 google/gemma-3-1b-it, etc.
Gemma2ForCausalLM Gemma 2 google/gemma-2-9b, google/gemma-2-27b, etc.
GemmaForCausalLM Gemma google/gemma-2b , google/gemma-7b , etc.
FlexOlmoForCausalLM FlexOlmo allenai/FlexOlmo-7x7B-1T, allenai/FlexOlmo-7x7B-1T-RT, etc.
FalconH1ForCausalLM Falcon-H1 tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc.
FalconMambaForCausalLM FalconMamba tiiuae/falcon-mamba-7b , tiiuae/falcon-mamba-7b-instruct , etc.
FalconForCausalLM Falcon tiiuae/falcon-7b , tiiuae/falcon-40b , tiiuae/falcon-rw-7b , etc.
Fairseq2LlamaForCausalLM Llama (fairseq2 format) mgleize/fairseq2-dummy-Llama-3.2-1B, etc.
Exaone4ForCausalLM EXAONE-4 LGAI-EXAONE/EXAONE-4.0-32B, etc.
ExaoneMoEForCausalLM K-EXAONE LGAI-EXAONE/K-EXAONE-236B-A23B, etc.
ExaoneForCausalLM EXAONE-3 LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct , etc.
Ernie4_5_MoeForCausalLM Ernie4.5MoE baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc.
Ernie4_5ForCausalLM Ernie4.5 baidu/ERNIE-4.5-0.3B-PT, etc.
DotsOCRForCausalLM dots_ocr rednote-hilab/dots.ocr
Dots1ForCausalLM dots.llm1 rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc.
DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc.
DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc.
DeepseekForCausalLM DeepSeek deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc.
DeciLMForCausalLM DeciLM Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.
DbrxForCausalLM DBRX databricks/dbrx-base, databricks/dbrx-instruct, etc.
CwmForCausalLM CWM facebook/cwm, etc.
CohereForCausalLM, Cohere2ForCausalLM Command-R, Command-A CohereForAI/c4ai-command-r-v01, CohereForAI/c4ai-command-r7b-12-2024, etc.
ChatGLMModel, ChatGLMForConditionalGeneration ChatGLM THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.
BloomForCausalLM BLOOM, BLOOMZ, BLOOMChat bigscience/bloom, bigscience/bloomz, etc.
BambaForCausalLM Bamba ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B
BailingMoeV2ForCausalLM Ling inclusionAI/Ling-mini-2.0, etc.
BailingMoeForCausalLM Ling inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc.
BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat , baichuan-inc/Baichuan-7B , etc.
AXK1ForCausalLM A.X-K1 skt/A.X-K1, etc.
ArcticForCausalLM Arctic Snowflake/snowflake-arctic-base , Snowflake/snowflake-arctic-instruct , etc.
ArceeForCausalLM Arcee (AFM) arcee-ai/AFM-4.5B-Base, etc.
AquilaForCausalLM Aquila, Aquila2 BAAI/Aquila-7B , BAAI/AquilaChat-7B , etc.
ApertusForCausalLM Apertus swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc.
AfmoeForCausalLM Afmoe TBA
BailingMoeV2_5ForCausalLM Ling-V2.5 / Ring-V2.5 inclusionAI/Ling-mini-2.5, inclusionAI/Ring-mini-2.5, etc.
SmolLM3ForCausalLM SmolLM3 HuggingFaceTB/SmolLM3-3B, etc.
Table 15 Text-only language models | Pooling models | Embedding
Architecture Models Example HuggingFace models
BertModelC BERT-based BAAI/bge-base-en-v1.5, etc.
BertSpladeSparseEmbeddingModel SPLADE naver/splade-v3
Gemma2ModelC Gemma 2-based BAAI/bge-multilingual-gemma2, etc.
Gemma3TextModelC Gemma 3-based google/embeddinggemma-300m, etc.
GritLM GritLM parasail-ai/GritLM-7B-vllm
GteModelC Arctic-Embed-2.0-M Snowflake/snowflake-arctic-embed-m-v2.0
GteNewModelC mGTE-TRM Alibaba-NLP/gte-multilingual-base, etc.
ModernBertModelC ModernBERT-based Alibaba-NLP/gte-modernbert-base, etc.
NomicBertModelC Nomic BERT nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc.
LlamaBidirectionalModelC Llama-based with bidirectional attention nvidia/llama-nemotron-embed-1b-v2, etc.
LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. Llama-based intfloat/e5-mistral-7b-instruct, etc.
Qwen2ModelC, Qwen2ForCausalLMC Qwen2-based ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc.
Qwen3ModelC, Qwen3ForCausalLMC Qwen3-based Qwen/Qwen3-Embedding-0.6B, etc.
RobertaModel, RobertaForMaskedLM RoBERTa-based sentence-transformers/all-roberta-large-v1, etc.
VoyageQwen3BidirectionalEmbedModelC Voyage Qwen3-based with bidirectional attention voyageai/voyage-4-nano, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into an embedding model with --convert embed.
  • * means feature support is the same as that of the original model.
Table 16 Text-only language models | Pooling models | Reward
Architecture Models Example HuggingFace models
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc.
LlamaForCausalLMC Llama-based peiyi9979/math-shepherd-mistral-7b-prm, etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B, etc.
Qwen2ForProcessRewardModel Qwen2-based Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B, etc.
Note:
  • A trailing C on an architecture name means the model can be converted into a reward model with --convert reward.
  • * means feature support is the same as that of the original model.
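Table 17 Text-only language models | Pooling models | Classification (--task classify)
Architecture Models Example HuggingFace models
JambaForSequenceClassification Jamba ai21labs/Jamba-tiny-reward-dev, etc.
GPT2ForSequenceClassification GPT2 nie3e/sentiment-polish-gpt2-small
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into a classification model with --convert classify.
  • * means feature support is the same as that of the original model.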
Table 18 Text-only language models | Pooling models | Cross-encoder / Reranker
Architecture Models Example HuggingFace models
BertForSequenceClassification BERT-based cross-encoder/ms-marco-MiniLM-L-6-v2, etc.
GemmaForSequenceClassification Gemma-based BAAI/bge-reranker-v2-gemma, etc.
GteNewForSequenceClassification mGTE-TRM Alibaba-NLP/gte-multilingual-reranker-base, etc.
LlamaBidirectionalForSequenceClassificationC Llama-based with bidirectional attention nvidia/llama-nemotron-rerank-1b-v2, etc.
Qwen2ForSequenceClassificationC Qwen2-based mixedbread-ai/mxbai-rerank-base-v2, etc.
Qwen3ForSequenceClassificationC Qwen3-based tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc.
RobertaForSequenceClassification RoBERTa-based cross-encoder/quora-roberta-base, etc.
XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into a classification model with --convert classify.
  • * means feature support is the same as that of the original model.
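The notes for the pooling-model tables name three conversion targets: embed, reward, and classify. As a minimal sketch, assuming the flag is passed straight to the `vllm serve` command line, the helper below assembles such a command as a string. The function name `build_serve_command` and the constant `CONVERT_TARGETS` are hypothetical, and how an AIOS inference template exposes startup arguments is not covered here.

```python
import shlex
from typing import Optional

# Conversion targets named in the notes of the pooling-model tables.
CONVERT_TARGETS = ("embed", "reward", "classify")


def build_serve_command(model: str, convert: Optional[str] = None) -> str:
    """Return a shell-quoted `vllm serve` command line for `model`.

    Hypothetical helper: it only builds a string and does not start vLLM.
    """
    args = ["vllm", "serve", model]
    if convert is not None:
        if convert not in CONVERT_TARGETS:
            raise ValueError(f"unknown conversion target: {convert!r}")
        args += ["--convert", convert]
    return shlex.join(args)


# Prints: vllm serve Qwen/Qwen3-Embedding-0.6B --convert embed
print(build_serve_command("Qwen/Qwen3-Embedding-0.6B", convert="embed"))
```

Rows without the trailing C already load as pooling models, so the sketch leaves `--convert` out unless a target is given.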
Table 19 Text-only language models | Pooling models | Token classification
Architecture Models Example HuggingFace models
BertForTokenClassification BERT-based boltuix/NeuroBERT-NER, etc.
ModernBertForTokenClassification ModernBERT-based disham993/electrical-ner-ModernBERT-base
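Table 20 Multimodal models | Generative models | Text generation
Input notation:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + between modalities: any combination of the joined modalities is supported. For example, T + I means text-only, image-only, and text-with-image inputs are all supported.
    • / between modalities: each modality is supported, but they cannot be combined in one input. For example, T / I means text-only or image-only input is supported, and text-with-image input is not.
    • E after a modality letter: pre-computed embeddings can be input for this modality.
    • + directly after a modality letter: multiple items of this modality can be input per text prompt.
Architecture Models Inputs Example HuggingFace models
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria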
AudioFlamingo3ForConditionalGeneration AudioFlamingo3 T + A+ nvidia/audio-flamingo-3-hf, nvidia/music-flamingo-hf
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc.
BagelForConditionalGeneration BAGEL T + I+ ByteDance-Seed/BAGEL-7B-MoT
BeeForConditionalGeneration Bee-8B T + IE+
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
Cohere2VisionForConditionalGeneration Command A Vision T + I+ CohereLabs/command-a-vision-07-2025, etc.
DeepseekVLV2ForCausalLM DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
DeepseekOCRForCausalLM DeepSeek-OCR T + I+ deepseek-ai/DeepSeek-OCR, etc.
DeepseekOCR2ForCausalLM DeepSeek-OCR-2 T + I+ deepseek-ai/DeepSeek-OCR-2, etc.
Eagle2_5_VLForConditionalGeneration Eagle2.5-VL T + IE+ nvidia/Eagle2.5-8B, etc.
Ernie4_5_VLMoeForConditionalGeneration Ernie4.5-VL T + I+/ V+ baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + I+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
Gemma3nForConditionalGeneration Gemma 3n T + I + A google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
GLM4VForCausalLM^ GLM-4V T + I zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc.
Glm4vForConditionalGeneration GLM-4.1V-Thinking T + IE+ + VE+ zai-org/GLM-4.1V-9B-Thinking, etc.
Glm4vMoeForConditionalGeneration GLM-4.5V T + IE+ + VE+ zai-org/GLM-4.5V, etc.
GlmOcrForConditionalGeneration GLM-OCR T + IE+ zai-org/GLM-OCR, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
HCXVisionForCausalLM HyperCLOVAX-SEED-Vision-Instruct-3B T + I+ + V+ naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B
HunYuanVLForConditionalGeneration HunyuanOCR T + IE+ tencent/HunyuanOCR, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
InternS1ForConditionalGeneration Intern-S1 T + IE+ + VE+ internlm/Intern-S1, etc.
InternS1ProForConditionalGeneration Intern-S1-Pro T + IE+ + VE+ internlm/Intern-S1-Pro, etc.
InternVLChatModel InternVL 3.5, InternVL 3.0, InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE++ (VE+) OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
InternVLForConditionalGeneration InternVL 3.0 (HF format) T + IE+ + VE+ OpenGVLab/InternVL3-1B-hf, etc.
IsaacForConditionalGeneration Isaac T + I+ PerceptronAI/Isaac-0.1
KananaVForConditionalGeneration Kanana-V T + I+ kakaocorp/kanana-1.5-v-3b-instruct, etc.
KeyeForConditionalGeneration Keye-VL-8B-Preview T + IE+ + VE+ Kwai-Keye/Keye-VL-8B-Preview
KeyeVL1_5ForConditionalGeneration Keye-VL-1_5-8B T + IE+ + VE+ Kwai-Keye/Keye-VL-1_5-8B
KimiK25ForConditionalGeneration Kimi-K2.5 T + I+ moonshotai/Kimi-K2.5
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
LightOnOCRForConditionalGeneration LightOnOCR-1B T + I+ lightonai/LightOnOCR-1B, etc.
Lfm2VlForConditionalGeneration LFM2-VL T + I+ LiquidAI/LFM2-VL-450M, LiquidAI/LFM2-VL-3B, LiquidAI/LFM2-VL-8B-A1B, etc.
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
Llama_Nemotron_Nano_VL Llama Nemotron Nano VL T + IE+ nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
LlavaForConditionalGeneration LLaVA-1.5, Pixtral (HF Transformers) T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc.
LlavaNextForConditionalGeneration LLaVA-NeXT T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiDashengLMModel MiDashengLM T + A+ mispeech/midashenglm-7b
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc.
MiniMaxVL01ForConditionalGeneration MiniMax-VL T + IE+ MiniMaxAI/MiniMax-VL-01, etc.
Mistral3ForConditionalGeneration Mistral3 (HF Transformers) T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MolmoForCausalLM Molmo T + I+ allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc.
Molmo2ForConditionalGeneration Molmo2 T + I+/ V allenai/Molmo2-4B, allenai/Molmo2-8B, allenai/Molmo2-O-7B
NVLM_D_Model NVLM-D 1.0 T + IE+ nvidia/NVLM-D-72B, etc.
OpenCUAForConditionalGeneration OpenCUA-7B T + IE+ xlangai/OpenCUA-7B
OpenPanguVLForConditionalGeneration openpangu-VL T + IE+ + VE+ FreedomIntelligence/openPangu-VL-7B
Ovis Ovis2, Ovis1.6 T + I+ AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc.
Ovis2_5 Ovis2.5 T + I+ + V AIDC-AI/Ovis2.5-9B, etc.
Ovis2_6ForCausalLM Ovis2.6 T + I+ + V AIDC-AI/Ovis2.6-2B, etc.
Ovis2_6_MoeForCausalLM Ovis2.6 T + I+ + V AIDC-AI/Ovis2.6-30B-A3B, etc.
PaddleOCRVLForConditionalGeneration Paddle-OCR T + I+ PaddlePaddle/PaddleOCR-VL, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+/ I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
PixtralForConditionalGeneration Ministral 3 (Mistral format), Mistral 3 (Mistral format), Mistral Large 3 (Mistral format), Pixtral (Mistral format) T + I+ mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc.
QwenVLForConditionalGeneration Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-7B
Qwen3VLForConditionalGeneration Qwen3-VL T + IE+ + VE+ Qwen/Qwen3-VL-4B-Instruct, etc.
Qwen3VLMoeForConditionalGeneration Qwen3-VL-MOE T + IE+ + VE+ Qwen/Qwen3-VL-30B-A3B-Instruct, etc.
Qwen3OmniMoeThinkerForConditionalGeneration Qwen3-Omni T + IE+ + VE+ + A+ Qwen/Qwen3-Omni-30B-A3B-Instruct, Qwen/Qwen3-Omni-30B-A3B-Thinking
Qwen3_5ForConditionalGeneration Qwen3.5 T + IE+ + VE+ Qwen/Qwen3.5-9B-Instruct, etc.
Qwen3_5MoeForConditionalGeneration Qwen3.5-MOE T + IE+ + VE+ Qwen/Qwen3.5-35B-A3B-Instruct, etc.
RForConditionalGeneration R-VL-4B T + IE+ YannQi/R-4B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I HuggingFaceTB/SmolVLM2-2.2B-Instruct
Step3VLForConditionalGeneration Step3-VL T + I+ stepfun-ai/step3
StepVLForConditionalGeneration Step3-VL-10B T + I+ stepfun-ai/Step3-VL-10B
TarsierForConditionalGeneration Tarsier T + IE+ omni-research/Tarsier-7b, omni-research/Tarsier-34b
Tarsier2ForConditionalGeneration^ Tarsier2 T + IE+ + VE+ omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115
UltravoxModel Ultravox T + AE+ fixie-ai/ultravox-v0_5-llama-3_2-1b
Emu3ForConditionalGeneration Emu3 T + I+ BAAI/Emu3-Chat
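The input column of the multimodal table above packs several facts into a short string such as `T + IE+ + VE+`. As a rough sketch, assuming the flattened form used on this page (a joining + is surrounded by spaces, while the E and + markers are attached directly to a modality letter), the following hypothetical parser extracts per-modality capabilities. The names `Modality` and `parse_inputs` are illustrative only, and the sketch deliberately ignores whether modalities are joined by + or /.

```python
import re
from dataclasses import dataclass

NAMES = {"T": "text", "I": "image", "V": "video", "A": "audio"}

# One modality letter, an optional E marker, then any attached + markers.
TOKEN = re.compile(r"([TIVA])(E?)(\+*)")


@dataclass(frozen=True)
class Modality:
    name: str
    embeddings: bool  # pre-computed embeddings accepted (E marker)
    multiple: bool    # several items per text prompt accepted (attached +)


def parse_inputs(spec: str) -> dict:
    """Map each modality letter found in `spec` to its capabilities."""
    result = {}
    # Splitting on whitespace, slashes, and parentheses leaves joining "+"
    # signs as stand-alone pieces, which the token pattern rejects.
    for piece in re.split(r"[\s/()]+", spec):
        match = TOKEN.fullmatch(piece)
        if match:
            letter, emb, multi = match.groups()
            result[letter] = Modality(NAMES[letter], emb == "E", bool(multi))
    return result


caps = parse_inputs("T + IE+ + VE+")
print(sorted(caps))             # ['I', 'T', 'V']
print(caps["I"].embeddings)     # True
print(caps["T"].multiple)       # False
```

A stricter parser would also record which modalities are mutually exclusive, which matters for rows such as `T + I+/ V+`.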
Table 21 Multimodal models | Generative models | Transcription
Architecture Models Example HuggingFace models
FireRedASR2ForConditionalGeneration FireRedASR2 allendou/FireRedASR2-LLM-vllm, etc.
FunASRForConditionalGeneration FunASR allendou/Fun-ASR-Nano-2512-vllm, etc.
Gemma3nForConditionalGeneration Gemma3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
GlmAsrForConditionalGeneration GLM-ASR zai-org/GLM-ASR-Nano-2512
GraniteSpeechForConditionalGeneration Granite Speech ibm-granite/granite-speech-3.3-2b, ibm-granite/granite-speech-3.3-8b, etc.
Qwen3ASRForConditionalGeneration Qwen3-ASR Qwen/Qwen3-ASR-1.7B, etc.
Qwen3OmniMoeThinkerForConditionalGeneration Qwen3-Omni Qwen/Qwen3-Omni-30B-A3B-Instruct, etc.
VoxtralForConditionalGeneration Voxtral (Mistral format) mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc.
WhisperForConditionalGeneration Whisper openai/whisper-small, openai/whisper-large-v3-turbo, etc.
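Table 22 Multimodal models | Pooling models | Embedding
Input notation:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + between modalities: any combination of the joined modalities is supported. For example, T + I means text-only, image-only, and text-with-image inputs are all supported.
    • / between modalities: each modality is supported, but they cannot be combined in one input. For example, T / I means text-only or image-only input is supported, and text-with-image input is not.
Architecture Models Inputs Example HuggingFace models
CLIPModel CLIP T / I openai/clip-vit-base-patch32, openai/clip-vit-large-patch14, etc.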
ColModernVBertForRetrieval ColModernVBERT T / I ModernVBERT/colmodernvbert-merged
LlamaNemotronVLModel Llama Nemotron Embedding + SigLIP T + I nvidia/llama-nemotron-embed-vl-1b-v2
LlavaNextForConditionalGenerationC LLaVA-NeXT-based T / I royokong/e5-v
Phi3VForCausalLMC Phi-3-Vision-based T + I TIGER-Lab/VLM2Vec-Full
Qwen3VLForConditionalGenerationC Qwen3-VL T + I + V Qwen/Qwen3-VL-Embedding-2B, etc.
SiglipModel SigLIP, SigLIP2 T / I google/siglip-base-patch16-224, google/siglip2-base-patch16-224
*ForConditionalGenerationC, *ForCausalLMC, etc. Generative models / N/A
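Note:
  • A trailing C on an architecture name means the model can be converted into an embedding model with --convert embed.
  • * means feature support is the same as that of the original model.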
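Table 23 Multimodal models | Pooling models | Cross-encoder / Reranker
Input notation:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + between modalities: any combination of the joined modalities is supported. For example, T + I means text-only, image-only, and text-with-image inputs are all supported.
    • / between modalities: each modality is supported, but they cannot be combined in one input. For example, T / I means text-only or image-only input is supported, and text-with-image input is not.
    • E after a modality letter: pre-computed embeddings can be input for this modality.
    • + directly after a modality letter: multiple items of this modality can be input per text prompt.
Architecture Models Inputs Example HuggingFace models
JinaVLForSequenceClassification JinaVL-based T + IE+ jinaai/jina-reranker-m0, etc.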
LlamaNemotronVLForSequenceClassification Llama Nemotron Reranker + SigLIP T + IE+ nvidia/llama-nemotron-rerank-vl-1b-v2
Qwen3VLForSequenceClassification Qwen3-VL-Reranker T + IE+ + VE+ Qwen/Qwen3-VL-Reranker-2B, etc.

vLLM 0.11.0

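The following tables list the model architectures, model names, and examples compatible with this template. For usage details and caveats for each type of model in the compatibility list, see the official vLLM documentation.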
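A Hugging Face model directory usually declares its architecture in the `architectures` field of `config.json`, which can be compared against the architecture column of the tables in this chapter before uploading a model. The sketch below is a minimal, hypothetical pre-check: `SUPPORTED_ARCHITECTURES` is a small hand-copied subset of that column, and the function names are illustrative rather than part of AIOS or vLLM.

```python
import json
from pathlib import Path

# Hypothetical subset of the "Architecture" column; extend it from the tables
# that belong to the inference template you plan to use.
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
    "DeepseekV3ForCausalLM",
}


def declared_architectures(model_dir):
    """Return the architectures listed in <model_dir>/config.json."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return list(config.get("architectures") or [])


def is_listed(model_dir, supported=SUPPORTED_ARCHITECTURES):
    """True if at least one declared architecture appears in `supported`."""
    return any(arch in supported for arch in declared_architectures(model_dir))
```

A call such as `is_listed("/path/to/model")` returns False both when the architecture is missing from the set and when `config.json` declares no architecture at all, so a False result is a prompt to check the tables by hand, not proof of incompatibility.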
Table 24 Text-only language models | Generative models | Text generation
Architecture Models Example HuggingFace models
Zamba2ForCausalLM Zamba2 Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc.
LongcatFlashForCausalLM LongCat-Flash meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8
MiniMaxText01ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01, etc.
MiniMaxM1ForCausalLM MiniMax-M1 MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc.
XverseForCausalLM XVERSE xverse/XVERSE-7B-Chat , xverse/XVERSE-13B-Chat , xverse/XVERSE-65B-Chat , etc.
TeleFLMForCausalLM TeleFLM CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc.
TeleChat2ForCausalLM TeleChat2 TeleAI/TeleChat2-3B , TeleAI/TeleChat2-7B , TeleAI/TeleChat2-35B , etc.
Starcoder2ForCausalLM Starcoder2 bigcode/starcoder2-3b , bigcode/starcoder2-7b , bigcode/starcoder2-15b , etc.
StableLmForCausalLM StableLM stabilityai/stablelm-3b-4e1t , stabilityai/stablelm-base-alpha-7b-v2 , etc.
SolarForCausalLM Solar Pro upstage/solar-pro-preview-instruct , etc.
SeedOssForCausalLM SeedOss ByteDance-Seed/Seed-OSS-36B-Instruct, etc.
QWenLMHeadModel Qwen Qwen/Qwen-7B , Qwen/Qwen-7B-Chat , etc.
Qwen2MoeForCausalLM Qwen2MoE Qwen/Qwen1.5-MoE-A2.7B , Qwen/Qwen1.5-MoE-A2.7B-Chat , etc.
Qwen2ForCausalLM QwQ, Qwen2 Qwen/QwQ-32B-Preview , Qwen/Qwen2-7B-Instruct , Qwen/Qwen2-7B , etc.
Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B, etc.
Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-MoE-15B-A2B, etc.
Qwen3NextForCausalLM Qwen3NextMoE Qwen/Qwen3-Next-80B-A3B-Instruct, etc.
Plamo2ForCausalLM PLaMo2 pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc.
PersimmonForCausalLM Persimmon adept/persimmon-8b-base, adept/persimmon-8b-chat, etc.
Phi4FlashForCausalLM Phi-4-mini-flash-reasoning microsoft/Phi-4-mini-instruct, etc.
PhiMoEForCausalLM Phi-3.5-MoE microsoft/Phi-3.5-MoE-instruct , etc.
PhiForCausalLM Phi microsoft/phi-1_5 , microsoft/phi-2 , etc.
Phi3SmallForCausalLM Phi-3-Small microsoft/Phi-3-small-8k-instruct , microsoft/Phi-3-small-128k-instruct , etc.
Phi3ForCausalLM Phi-4, Phi-3 microsoft/Phi-4 , microsoft/Phi-3-mini-4k-instruct , microsoft/Phi-3-mini-128k-instruct , microsoft/Phi-3-medium-128k-instruct , etc.
OrionForCausalLM Orion OrionStarAI/Orion-14B-Base , OrionStarAI/Orion-14B-Chat , etc.
OPTForCausalLM OPT, OPT-IML facebook/opt-66b , facebook/opt-iml-max-30b , etc.
OlmoForCausalLM OLMo allenai/OLMo-1B-hf , allenai/OLMo-7B-hf , etc.
OlmoeForCausalLM OLMoE allenai/OLMoE-1B-7B-0924 , allenai/OLMoE-1B-7B-0924-Instruct , etc.
Olmo2ForCausalLM OLMo2 allenai/OLMo2-7B-1124 , etc.
Olmo3ForCausalLM OLMo3 TBA
NemotronHForCausalLM Nemotron-H nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc.
NemotronForCausalLM Nemotron-3, Nemotron-4, Minitron nvidia/Minitron-8B-Base , mgoin/Nemotron-4-340B-Base-hf-FP8 , etc.
MPTForCausalLM MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter mosaicml/mpt-7b , mosaicml/mpt-7b-storywriter , mosaicml/mpt-30b , etc.
MotifForCausalLM Motif-1-Tiny Motif-Technologies/Motif-2.6B, Motif-Technologies/Motif-2.6b-v1.1-LC, etc.
MixtralForCausalLM Mixtral-8x7B, Mixtral-8x7B-Instruct mistralai/Mixtral-8x7B-v0.1 , mistralai/Mixtral-8x7B-Instruct-v0.1 , mistral-community/Mixtral-8x22B-v0.1 , etc.
MistralForCausalLM Mistral, Mistral-Instruct mistralai/Mistral-7B-v0.1 , mistralai/Mistral-7B-Instruct-v0.1 , etc.
MiniCPM3ForCausalLM MiniCPM3 openbmb/MiniCPM3-4B , etc.
MiniCPMForCausalLM MiniCPM openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc.
MambaForCausalLM Mamba state-spaces/mamba-130m-hf , state-spaces/mamba-790m-hf , state-spaces/mamba-2.8b-hf , etc.
Mamba2ForCausalLM Mamba2 mistralai/Mamba-Codestral-7B-v0.1, etc.
MiMoForCausalLM MiMo XiaomiMiMo/MiMo-7B-RL, etc.
LlamaForCausalLM Llama 3.1, Llama 3, Llama 2, LLaMA, Yi meta-llama/Meta-Llama-3.1-405B-Instruct , meta-llama/Meta-Llama-3.1-70B , meta-llama/Meta-Llama-3-70B-Instruct , meta-llama/Llama-2-70b-hf , 01-ai/Yi-34B , etc.
Lfm2ForCausalLM LFM2 LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc.
JambaForCausalLM Jamba ai21labs/AI21-Jamba-1.5-Large , ai21labs/AI21-Jamba-1.5-Mini , ai21labs/Jamba-v0.1 , etc.
JAISLMHeadModel Jais inceptionai/jais-13b , inceptionai/jais-13b-chat , inceptionai/jais-30b-v3 , inceptionai/jais-30b-chat-v3 , etc.
InternLM3ForCausalLM InternLM3 internlm/internlm3-8b-instruct , etc.
InternLM2ForCausalLM InternLM2 internlm/internlm2-7b , internlm/internlm2-chat-7b , etc.
InternLMForCausalLM InternLM internlm/internlm-7b, internlm/internlm-chat-7b, etc.
HCXVisionForCausalLM HyperCLOVAX-SEED-Vision-Instruct-3B naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B
HunYuanMoEV1ForCausalLM Hunyuan-80B-A13B tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc.
HunYuanDenseV1ForCausalLM Hunyuan-7B-Instruct-0124 tencent/Hunyuan-7B-Instruct-0124
Grok1ModelForCausalLM Grok1 hpcai-tech/grok-1
GritLM GritLM parasail-ai/GritLM-7B-vllm
GraniteMoeSharedForCausalLM Granite MoE Shared ibm-research/moe-7b-1b-active-shared-experts (test model)
GraniteMoeHybridForCausalLM Granite 4.0 MoE Hybrid ibm-granite/granite-4.0-tiny-preview, etc.
GraniteMoeForCausalLM Granite 3.0 MoE, PowerMoE ibm-granite/granite-3.0-1b-a400m-base , ibm-granite/granite-3.0-3b-a800m-instruct , ibm/PowerMoE-3b , etc.
GraniteForCausalLM Granite 3.0, Granite 3.1, PowerLM ibm-granite/granite-3.0-2b-base , ibm-granite/granite-3.1-8b-instruct , ibm/PowerLM-3b , etc.
GptOssForCausalLM GPT-OSS openai/gpt-oss-120b, openai/gpt-oss-20b
GPTNeoXForCausalLM GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM EleutherAI/gpt-neox-20b , EleutherAI/pythia-12b , OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 , databricks/dolly-v2-12b , stabilityai/stablelm-tuned-alpha-7b , etc.
GPTJForCausalLM GPT-J EleutherAI/gpt-j-6b , nomic-ai/gpt4all-j , etc.
GPTBigCodeForCausalLM StarCoder, SantaCoder, WizardCoder bigcode/starcoder , bigcode/gpt_bigcode-santacoder , WizardLM/WizardCoder-15B-V1.0 , etc.
GPT2LMHeadModel GPT-2 gpt2 , gpt2-xl , etc.
Glm4MoeForCausalLM GLM-4.5 zai-org/GLM-4.5, etc.
Glm4ForCausalLM GLM-4-0414 THUDM/GLM-4-32B-0414, etc.
GlmForCausalLM GLM-4 THUDM/glm-4-9b-chat-hf , etc.
Gemma3nForCausalLM Gemma 3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
Gemma3ForCausalLM Gemma 3 google/gemma-3-1b-it, etc.
Gemma2ForCausalLM Gemma 2 google/gemma-2-9b, google/gemma-2-27b, etc.
GemmaForCausalLM Gemma google/gemma-2b , google/gemma-7b , etc.
FalconH1ForCausalLM Falcon-H1 tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc.
FalconMambaForCausalLM FalconMamba tiiuae/falcon-mamba-7b , tiiuae/falcon-mamba-7b-instruct , etc.
FalconForCausalLM Falcon tiiuae/falcon-7b , tiiuae/falcon-40b , tiiuae/falcon-rw-7b , etc.
Fairseq2LlamaForCausalLM Llama (fairseq2 format) mgleize/fairseq2-dummy-Llama-3.2-1B, etc.
Exaone4ForCausalLM EXAONE-4 LGAI-EXAONE/EXAONE-4.0-32B, etc.
ExaoneForCausalLM EXAONE-3 LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct , etc.
Ernie4_5_MoeForCausalLM Ernie4.5MoE baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc.
Ernie4_5ForCausalLM Ernie4.5 baidu/ERNIE-4.5-0.3B-PT, etc.
DotsOCRForCausalLM dots_ocr rednote-hilab/dots.ocr
Dots1ForCausalLM dots.llm1 rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc.
DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc.
DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc.
DeepseekForCausalLM DeepSeek deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc.
DeciLMForCausalLM DeciLM Deci/DeciLM-7B , Deci/DeciLM-7B-instruct , etc.
DbrxForCausalLM DBRX databricks/dbrx-base , databricks/dbrx-instruct , etc.
CohereForCausalLM , Cohere2ForCausalLM Command-R CohereForAI/c4ai-command-r-v01 , CohereForAI/c4ai-command-r7b-12-2024 , etc.
ChatGLMModel, ChatGLMForConditionalGeneration ChatGLM THUDM/chatglm2-6b , THUDM/chatglm3-6b , etc.
BloomForCausalLM BLOOM, BLOOMZ, BLOOMChat bigscience/bloom , bigscience/bloomz , etc.
BambaForCausalLM Bamba ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B
BailingMoeV2ForCausalLM Ling inclusionAI/Ling-mini-2.0, etc.
BailingMoeForCausalLM Ling inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc.
BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat , baichuan-inc/Baichuan-7B , etc.
ArcticForCausalLM Arctic Snowflake/snowflake-arctic-base , Snowflake/snowflake-arctic-instruct , etc.
ArceeForCausalLM Arcee (AFM) arcee-ai/AFM-4.5B-Base, etc.
AquilaForCausalLM Aquila, Aquila2 BAAI/Aquila-7B , BAAI/AquilaChat-7B , etc.
ApertusForCausalLM Apertus swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc.
Table 25 Text-only language models | Pooling models | Embedding
Architecture Models Example HuggingFace models
BertModelC BERT-based BAAI/bge-base-en-v1.5 , etc.
Gemma2ModelC Gemma2-based BAAI/bge-multilingual-gemma2 , etc.
Gemma3TextModelC Gemma 3-based google/embeddinggemma-300m, etc.
GritLM GritLM parasail-ai/GritLM-7B-vllm
GteModelC Arctic-Embed-2.0-M Snowflake/snowflake-arctic-embed-m-v2.0
GteNewModelC mGTE-TRM Alibaba-NLP/gte-multilingual-base, etc.
ModernBertModelC ModernBERT-based Alibaba-NLP/gte-modernbert-base, etc.
NomicBertModelC Nomic BERT nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc.
LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. Llama-based intfloat/e5-mistral-7b-instruct , etc.
Qwen2ModelC, Qwen2ForCausalLMC Qwen2-based ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc.
Qwen3ModelC, Qwen3ForCausalLMC Qwen3-based Qwen/Qwen3-Embedding-0.6B, etc.
RobertaModel, RobertaForMaskedLM RoBERTa-based sentence-transformers/all-roberta-large-v1, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into an embedding model with --convert embed.
  • * means feature support is the same as that of the original model.
Table 26 Text-only language models | Pooling models | Reward
Architecture Models Example HuggingFace models
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward , internlm/internlm2-7b-reward , etc.
LlamaForCausalLM Llama-based peiyi9979/math-shepherd-mistral-7b-prm , etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B , etc.
Qwen2ForProcessRewardModel Qwen2-based Qwen/Qwen2.5-Math-PRM-7B , Qwen/Qwen2.5-Math-PRM-72B , etc.
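*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into a reward model with --convert reward.
  • * means feature support is the same as that of the original model.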
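Table 27 Text-only language models | Pooling models | Classification (--task classify)
Architecture Models Example HuggingFace models
JambaForSequenceClassification Jamba ai21labs/Jamba-tiny-reward-dev, etc.
GPT2ForSequenceClassification GPT2 nie3e/sentiment-polish-gpt2-small
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into a classification model with --convert classify.
  • * means feature support is the same as that of the original model.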
Table 28 Text-only language models | Pooling models | Cross-encoder / Reranker
Architecture Models Example HuggingFace models
BertForSequenceClassification BERT-based cross-encoder/ms-marco-MiniLM-L-6-v2, etc.
GemmaForSequenceClassification Gemma-based BAAI/bge-reranker-v2-gemma, etc.
GteNewForSequenceClassification mGTE-TRM Alibaba-NLP/gte-multilingual-reranker-base, etc.
Qwen2ForSequenceClassification Qwen2-based mixedbread-ai/mxbai-rerank-base-v2, etc.
Qwen3ForSequenceClassification Qwen3-based tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc.
RobertaForSequenceClassification RoBERTa-based cross-encoder/quora-roberta-base, etc.
XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3, etc.
*ModelC, *ForCausalLMC, etc. Generative models N/A
Note:
  • A trailing C on an architecture name means the model can be converted into a classification model with --convert classify.
  • * means feature support is the same as that of the original model.
Table 29 Text-only language models | Pooling models | Token classification
Architecture Models Example HuggingFace models
BertForTokenClassification BERT-based boltuix/NeuroBERT-NER, etc.
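Table 30 Multimodal models | Generative models | Text generation
Input notation:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + between modalities: any combination of the joined modalities is supported. For example, T + I means text-only, image-only, and text-with-image inputs are all supported.
    • / between modalities: each modality is supported, but they cannot be combined in one input. For example, T / I means text-only or image-only input is supported, and text-with-image input is not.
    • E after a modality letter: pre-computed embeddings can be input for this modality.
    • + directly after a modality letter: multiple items of this modality can be input per text prompt.
Architecture Models Inputs Example HuggingFace models
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria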
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc.
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
Cohere2VisionForConditionalGeneration Command A Vision T + I+ CohereLabs/command-a-vision-07-2025, etc.
DeepseekVLV2ForCausalLM DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
Ernie4_5_VLMoeForConditionalGeneration Ernie4.5-VL T + I+ / V+ baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + I+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
Gemma3nForConditionalGeneration Gemma 3n T + I + A google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
GLM4VForCausalLM^ GLM-4V T + I zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc.
Glm4vForConditionalGeneration GLM-4.1V-Thinking T + IE+ + VE+ zai-org/GLM-4.1V-9B-Thinking, etc.
Glm4vMoeForConditionalGeneration GLM-4.5V T + IE+ + VE+ zai-org/GLM-4.5V, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
InternS1ForConditionalGeneration Intern-S1 T + IE+ + VE+ internlm/Intern-S1, etc.
InternVLChatModel InternVL 3.5, InternVL 3.0, InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE+ + (VE+) OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
InternVLForConditionalGeneration InternVL 3.0 (HF format) T + IE+ + VE+ OpenGVLab/InternVL3-1B-hf, etc.
KeyeForConditionalGeneration Keye-VL-8B-Preview T + IE+ + VE+ Kwai-Keye/Keye-VL-8B-Preview
KeyeVL1_5ForConditionalGeneration Keye-VL-1_5-8B T + IE+ + VE+ Kwai-Keye/Keye-VL-1_5-8B
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
Llama_Nemotron_Nano_VL Llama Nemotron Nano VL T + IE+ nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1
LlavaForConditionalGeneration LLaVA-1.5 T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc.
LlavaNextForConditionalGeneration LLaVA-NeXT T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiDashengLMModel MiDashengLM T + A+ mispeech/midashenglm-7b
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc.
MiniMaxVL01ForConditionalGeneration MiniMax-VL T + IE+ MiniMaxAI/MiniMax-VL-01, etc.
Mistral3ForConditionalGeneration Mistral3 (HF Transformers) T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MolmoForCausalLM Molmo T + I allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc.
NVLM_D_Model NVLM-D 1.0 T + IE+ nvidia/NVLM-D-72B, etc.
Ovis Ovis2, Ovis1.6 T + I+ AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc.
Ovis2_5 Ovis2.5 T + I+ + V AIDC-AI/Ovis2.5-9B, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+ / I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
PixtralForConditionalGeneration Pixtral T + I+ mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc.
QwenVLForConditionalGeneration Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-7B
Qwen3VLForConditionalGeneration Qwen3-VL T + IE+ + VE+ Qwen/Qwen3-VL-4B-Instruct, etc.
Qwen3VLMoeForConditionalGeneration Qwen3-VL-MOE T + IE+ + VE+ Qwen/Qwen3-VL-30B-A3B-Instruct, etc.
RForConditionalGeneration R-VL-4B T + IE+ YannQi/R-4B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I SmolVLM2-2.2B-Instruct
Step3VLForConditionalGeneration Step3-VL T + I+ stepfun-ai/step3
TarsierForConditionalGeneration Tarsier T + IE+ omni-search/Tarsier-7b, omni-search/Tarsier-34b
Tarsier2ForConditionalGeneration^ Tarsier2 T + IE+ + VE+ omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115
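The input notation in the table above is compact, so a small decoder can help when filtering the list programmatically. The sketch below assumes the flattened plain-text form used here (for example, `IE+` for an image input that accepts pre-computed embeddings and multiple items per prompt). `ModalitySpec`, `parse_combination`, and `parse_inputs` are hypothetical names introduced for illustration, and treating `/` as a separator between whole combinations is a simplification.

```python
import re
from dataclasses import dataclass

MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}

# One modality token: a letter, an optional "E" suffix, an optional "+" suffix,
# optionally wrapped in parentheses as in "(VE+)".
_TOKEN = re.compile(r"^\(?([TIVA])(E?)(\+?)\)?$")


@dataclass(frozen=True)
class ModalitySpec:
    name: str          # "text", "image", "video" or "audio"
    embeddings: bool   # True if pre-computed embeddings are accepted ("E")
    multiple: bool     # True if several items per prompt are accepted ("+")


def parse_combination(expr):
    """Parse one combination such as 'T + IE+ + VE+'."""
    specs = []
    # Split only on a "+" surrounded by spaces, so a suffix "+" stays attached.
    for token in re.split(r"\s+\+\s+", expr.strip()):
        match = _TOKEN.match(token)
        if match is None:
            raise ValueError(f"unrecognized modality token: {token!r}")
        letter, emb, multi = match.groups()
        specs.append(ModalitySpec(MODALITIES[letter], emb == "E", multi == "+"))
    return specs


def parse_inputs(cell):
    """Parse a whole Inputs cell; '/' separates alternative combinations."""
    return [parse_combination(part) for part in cell.split("/")]


if __name__ == "__main__":
    for combo in parse_inputs("T + I+ / T + A+ / I+ + A+"):
        print([spec.name for spec in combo])
```

Cells that do not follow this pattern raise a ValueError instead of being guessed at, which makes extraction damage in a row easy to spot.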
Table 31 Multimodal models | Generative models | Transcription
Architecture Model HuggingFace model examples
WhisperForConditionalGeneration Whisper openai/whisper-small, openai/whisper-large-v3-turbo, etc.
VoxtralForConditionalGeneration Voxtral (Mistral format) mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc.
Gemma3nForConditionalGeneration Gemma3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
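Table 32 Multimodal models | Pooling models | Embedding
Notes:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + (between modalities): the modalities can be supplied together. For example, T + I means that text-only, image-only, and text-plus-image inputs are all supported.
    • /: several modalities are supported, but they cannot be combined. For example, T / I means that text-only or image-only input is supported, but text plus image is not.
Architecture Model Inputs HuggingFace model examples
LlavaNextForConditionalGenerationC LLaVA-NeXT-based T / I royokong/e5-v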
Phi3VForCausalLMC Phi-3-Vision-based T + I TIGER-Lab/VLM2Vec-Full
*ForConditionalGenerationC, *ForCausalLMC, etc. Generative models / N/A
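Notes:
  • C indicates that the model can be converted into an embedding model via --convert embed.
  • * indicates that the model's features are the same as those of the original model.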
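Table 33 Multimodal models | Pooling models | Cross-encoder / Reranker
Notes:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + (between modalities): the modalities can be supplied together. For example, T + I means that text-only, image-only, and text-plus-image inputs are all supported.
    • /: several modalities are supported, but they cannot be combined. For example, T / I means that text-only or image-only input is supported, but text plus image is not.
Architecture Model Inputs HuggingFace model examples
JinaVLForSequenceClassification JinaVL-based T + IE+ jinaai/jina-reranker-m0, etc.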

vLLM 0.9.2

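The following tables list the model architectures, model names, and examples compatible with this template. For usage details and caveats for each type of model in the compatibility list, refer to the official vLLM documentation.
Table 34 Text-only language models | Generative models | Text generation (--task generate)
Architecture Model HuggingFace model examples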
Zamba2ForCausalLM Zamba2 Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc.
MiniMaxText01ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01, etc.
MiniMaxM1ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc.
XverseForCausalLM XVERSE xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.
TeleFLMForCausalLM TeleFLM CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc.
TeleChat2ForCausalLM TeleChat2 TeleAI/TeleChat2-3B, TeleAI/TeleChat2-7B, TeleAI/TeleChat2-35B, etc.
Starcoder2ForCausalLM Starcoder2 bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.
StableLmForCausalLM StableLM stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.
SolarForCausalLM Solar Pro upstage/solar-pro-preview-instruct, etc.
QWenLMHeadModel Qwen Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.
Qwen2MoeForCausalLM Qwen2MoE Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.
Qwen2ForCausalLM QwQ, Qwen2 Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc.
Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B, etc.
Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-MoE-15B-A2B, etc.
Plamo2ForCausalLM PLaMo2 pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc.
PersimmonForCausalLM Persimmon adept/persimmon-8b-base, adept/persimmon-8b-chat, etc.
PhiMoEForCausalLM Phi-3.5-MoE microsoft/Phi-3.5-MoE-instruct, etc.
PhiForCausalLM Phi microsoft/phi-1_5, microsoft/phi-2, etc.
Phi3SmallForCausalLM Phi-3-Small microsoft/Phi-3-small-8k-instruct, microsoft/Phi-3-small-128k-instruct, etc.
Phi3ForCausalLM Phi-4, Phi-3 microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc.
OrionForCausalLM Orion OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.
OPTForCausalLM OPT, OPT-IML facebook/opt-66b, facebook/opt-iml-max-30b, etc.
OlmoForCausalLM OLMo allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.
OlmoeForCausalLM OLMoE allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc.
Olmo2ForCausalLM OLMo2 allenai/OLMo2-7B-1124, etc.
NemotronHForCausalLM Nemotron-H nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc.
NemotronForCausalLM Nemotron-3, Nemotron-4, Minitron nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc.
MPTForCausalLM MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc.
MixtralForCausalLM Mixtral-8x7B, Mixtral-8x7B-Instruct mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc.
MistralForCausalLM Mistral, Mistral-Instruct mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.
MiniCPM3ForCausalLM MiniCPM3 openbmb/MiniCPM3-4B, etc.
MiniCPMForCausalLM MiniCPM openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc.
Mamba2ForCausalLM Mamba2 mistralai/Mamba-Codestral-7B-v0.1, etc.
MambaForCausalLM Mamba state-spaces/mamba-130m-hf, state-spaces/mamba-790m-hf, state-spaces/mamba-2.8b-hf, etc.
LlamaForCausalLM Llama 3.1, Llama 3, Llama 2, LLaMA, Yi meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.
JambaForCausalLM Jamba ai21labs/AI21-Jamba-1.5-Large, ai21labs/AI21-Jamba-1.5-Mini, ai21labs/Jamba-v0.1, etc.
JAISLMHeadModel Jais inceptionai/jais-13b, inceptionai/jais-13b-chat, inceptionai/jais-30b-v3, inceptionai/jais-30b-chat-v3, etc.
InternLMForCausalLM InternLM internlm/internlm-7b, internlm/internlm-chat-7b, etc.
InternLM3ForCausalLM InternLM3 internlm/internlm3-8b-instruct, etc.
InternLM2ForCausalLM InternLM2 internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.
HunYuanMoEV1ForCausalLM Hunyuan-80B-A13B tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc.
Grok1ModelForCausalLM Grok1 hpcai-tech/grok-1
GritLM GritLM parasail-ai/GritLM-7B-vllm
GraniteMoeSharedForCausalLM Granite MoE Shared ibm-research/moe-7b-1b-active-shared-experts (test model)
GraniteMoeHybridForCausalLM Granite 4.0 MoE Hybrid ibm-granite/granite-4.0-tiny-preview, etc.
GraniteMoeForCausalLM Granite 3.0 MoE, PowerMoE ibm-granite/granite-3.0-1b-a400m-base, ibm-granite/granite-3.0-3b-a800m-instruct, ibm/PowerMoE-3b, etc.
GraniteForCausalLM Granite 3.0, Granite 3.1, PowerLM ibm-granite/granite-3.0-2b-base, ibm-granite/granite-3.1-8b-instruct, ibm/PowerLM-3b, etc.
GPTNeoXForCausalLM GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.
GPTJForCausalLM GPT-J EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.
GPTBigCodeForCausalLM StarCoder, SantaCoder, WizardCoder bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc.
GPT2LMHeadModel GPT-2 gpt2, gpt2-xl, etc.
Glm4ForCausalLM GLM-4-0414 THUDM/GLM-4-32B-0414, etc.
GlmForCausalLM GLM-4 THUDM/glm-4-9b-chat-hf, etc.
Gemma3nForConditionalGeneration Gemma 3n google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc.
Gemma3ForCausalLM Gemma 3 google/gemma-3-1b-it, etc.
Gemma2ForCausalLM Gemma 2 google/gemma-2-9b, google/gemma-2-27b, etc.
GemmaForCausalLM Gemma google/gemma-2b, google/gemma-7b, etc.
FalconH1ForCausalLM Falcon-H1 tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc.
FalconMambaForCausalLM FalconMamba tiiuae/falcon-mamba-7b, tiiuae/falcon-mamba-7b-instruct, etc.
FalconForCausalLM Falcon tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.
ExaoneForCausalLM EXAONE-3 LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct, etc.
Ernie4_5_MoeForCausalLM Ernie4.5MoE baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc.
Ernie4_5_ForCausalLM Ernie4.5 baidu/ERNIE-4.5-0.3B-PT, etc.
Dots1ForCausalLM dots.llm1 rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc.
DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc.
DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc.
DeepseekForCausalLM DeepSeek deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc.
DeciLMForCausalLM DeciLM Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.
DbrxForCausalLM DBRX databricks/dbrx-base, databricks/dbrx-instruct, etc.
CohereForCausalLM, Cohere2ForCausalLM Command-R CohereForAI/c4ai-command-r-v01, CohereForAI/c4ai-command-r7b-12-2024, etc.
ChatGLMModel, ChatGLMForConditionalGeneration ChatGLM THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.
BloomForCausalLM BLOOM, BLOOMZ, BLOOMChat bigscience/bloom, bigscience/bloomz, etc.
BartForConditionalGeneration BART facebook/bart-base, facebook/bart-large-cnn, etc.
BambaForCausalLM Bamba ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B
BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.
ArcticForCausalLM Arctic Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc.
AquilaForCausalLM Aquila, Aquila2 BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.
Table 35 Text-only language models | Pooling models | Text embedding (--task embed)
Architecture Model HuggingFace model examples
BertModel BERT-based BAAI/bge-base-en-v1.5, etc.
Gemma2Model Gemma2-based BAAI/bge-multilingual-gemma2, etc.
GritLM GritLM parasail-ai/GritLM-7B-vllm
GteModel Arctic-Embed-2.0-M Snowflake/snowflake-arctic-embed-m-v2.0
GteNewModel mGTE-TRM Alibaba-NLP/gte-multilingual-base, etc.
ModernBertModel ModernBERT-based Alibaba-NLP/gte-modernbert-base, etc.
NomicBertModel Nomic BERT nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc.
LlamaModel, LlamaForCausalLM, MistralModel, etc. Llama-based intfloat/e5-mistral-7b-instruct, etc.
Qwen2Model, Qwen2ForCausalLM Qwen2-based ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc.
RobertaModel, RobertaForMaskedLM RoBERTa-based sentence-transformers/all-roberta-large-v1, etc.
Qwen3Model, Qwen3ForCausalLM Qwen3-based Qwen/Qwen3-Embedding-0.6B, etc.
Table 36 Text-only language models | Pooling models | Reward modeling (--task reward)
Architecture Model HuggingFace model examples
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc.
LlamaForCausalLM Llama-based peiyi9979/math-shepherd-mistral-7b-prm, etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B, etc.
Qwen2ForProcessRewardModel Qwen2-based Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B, etc.
Table 37 Text-only language models | Pooling models | Classification (--task classify)
Architecture Model HuggingFace model examples
JambaForSequenceClassification Jamba ai21labs/Jamba-tiny-reward-dev, etc.
Qwen2ForSequenceClassification Qwen2-based jason9693/Qwen2.5-1.5B-apeach, etc.
Table 38 Text-only language models | Pooling models | Sentence pair scoring (--task score)
Architecture Model HuggingFace model examples
BertForSequenceClassification BERT-based cross-encoder/ms-marco-MiniLM-L-6-v2, etc.
Qwen2ForSequenceClassification Qwen2-based mixedbread-ai/mxbai-rerank-base-v2, etc.
Qwen3ForSequenceClassification Qwen3-based tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc.
RobertaForSequenceClassification RoBERTa-based cross-encoder/quora-roberta-base, etc.
XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3, etc.
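Table 39 Multimodal models | Generative models | Text generation
Notes:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + (between modalities): the modalities can be supplied together. For example, T + I means that text-only, image-only, and text-plus-image inputs are all supported.
    • /: several modalities are supported, but they cannot be combined. For example, T / I means that text-only or image-only input is supported, but text plus image is not.
    • E (after a modality letter, as in IE): pre-computed embeddings can be supplied for this modality.
    • + (after a modality letter, as in I+): multiple items of this modality can be supplied per text prompt.
Architecture Model Inputs HuggingFace model examples
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria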
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc.
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
DeepseekVLV2ForCausalLM^ DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
Florence2ForConditionalGeneration Florence-2 T + I microsoft/Florence-2-base, microsoft/Florence-2-large, etc.
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + I+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
GLM4VForCausalLM^ GLM-4V T + I THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220, etc.
Glm4vForConditionalGeneration GLM-4.1V-Thinking T + IE+ + VE+ THUDM/GLM-4.1V-9B-Thinking, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
InternVLChatModel InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE+ OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
KeyeForConditionalGeneration Keye-VL-8B-Preview T + IE+ + VE+ Kwai-Keye/Keye-VL-8B-Preview
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
LlavaForConditionalGeneration LLaVA-1.5 T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc.
LlavaNextForConditionalGeneration LLaVA-NeXT T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc.
MiniMaxVL01ForConditionalGeneration MiniMax-VL T + IE+ MiniMaxAI/MiniMax-VL-01, etc.
Mistral3ForConditionalGeneration Mistral3 T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MllamaForConditionalGeneration Llama 3.2 T + I+ meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc.
MolmoForCausalLM Molmo T + I allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc.
NVLM_D_Model NVLM-D 1.0 T + IE+ nvidia/NVLM-D-72B, etc.
Ovis Ovis2, Ovis1.6 T + I+ AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+ / I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
PixtralForConditionalGeneration Pixtral T + I+ mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc.
QwenVLForConditionalGeneration^ Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-7B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I SmolVLM2-2.2B-Instruct
TarsierForConditionalGeneration Tarsier T + IE+ omni-search/Tarsier-7b, omni-search/Tarsier-34b
Tarsier2ForConditionalGeneration^ Tarsier2 T + IE+ + VE+ omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115
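Table 40 Multimodal models | Generative models | Text generation (--task generate)
Notes:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + (between modalities): the modalities can be supplied together. For example, T + I means that text-only, image-only, and text-plus-image inputs are all supported.
    • /: several modalities are supported, but they cannot be combined. For example, T / I means that text-only or image-only input is supported, but text plus image is not.
    • E (after a modality letter, as in IE): pre-computed embeddings can be supplied for this modality.
    • + (after a modality letter, as in I+): multiple items of this modality can be supplied per text prompt.
Architecture Model Inputs HuggingFace model examples
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria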
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc.
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
DeepseekVLV2ForCausalLM^ DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
Florence2ForConditionalGeneration Florence-2 T + I microsoft/Florence-2-base, microsoft/Florence-2-large, etc.
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + I+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
GLM4VForCausalLM^ GLM-4V T + I THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220, etc.
Glm4vForConditionalGeneration GLM-4.1V-Thinking T + IE+ + VE+ THUDM/GLM-4.1V-9B-Thinking, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
InternVLChatModel InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE+ OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
KeyeForConditionalGeneration Keye-VL-8B-Preview T + IE+ + VE+ Kwai-Keye/Keye-VL-8B-Preview
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
LlavaForConditionalGeneration LLaVA-1.5 T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc.
LlavaNextForConditionalGeneration LLaVA-NeXT T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc.
MiniMaxVL01ForConditionalGeneration MiniMax-VL T + IE+ MiniMaxAI/MiniMax-VL-01, etc.
Mistral3ForConditionalGeneration Mistral3 T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MllamaForConditionalGeneration Llama 3.2 T + I+ meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc.
MolmoForCausalLM Molmo T + I allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc.
NVLM_D_Model NVLM-D 1.0 T + IE+ nvidia/NVLM-D-72B, etc.
Ovis Ovis2, Ovis1.6 T + I+ AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+ / I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
PixtralForConditionalGeneration Pixtral T + I+ mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc.
QwenVLForConditionalGeneration^ Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-7B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I SmolVLM2-2.2B-Instruct
TarsierForConditionalGeneration Tarsier T + IE+ omni-search/Tarsier-7b, omni-search/Tarsier-34b
Tarsier2ForConditionalGeneration^ Tarsier2 T + IE+ + VE+ omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115
Table 41 Multimodal models | Generative models | Transcription (--task transcription)
Architecture Model HuggingFace model examples
WhisperForConditionalGeneration Whisper openai/whisper-small, openai/whisper-large-v3-turbo, etc.
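Table 42 Multimodal models | Pooling models | Text embedding (--task embed)
Notes:
  • Modalities:
    • T: text
    • I: image
    • V: video
    • A: audio
  • Special characters:
    • + (between modalities): the modalities can be supplied together. For example, T + I means that text-only, image-only, and text-plus-image inputs are all supported.
    • /: several modalities are supported, but they cannot be combined. For example, T / I means that text-only or image-only input is supported, but text plus image is not.
Architecture Model Inputs HuggingFace model examples
LlavaNextForConditionalGeneration LLaVA-NeXT-based T / I royokong/e5-v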
Phi3VForCausalLM Phi-3-Vision-based T + I TIGER-Lab/VLM2Vec-Full

vLLM 0.8.5

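The following tables list the model architectures, model names, and examples compatible with this template. For usage details and caveats for each type of model in the compatibility list, refer to the official vLLM documentation.
Table 43 Text-only language models | Generative models | Text generation (--task generate)
Architecture Model HuggingFace model examples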
Zamba2ForCausalLM Zamba2 Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc.
MiniMaxText01ForCausalLM MiniMax-Text MiniMaxAI/MiniMax-Text-01, etc.
XverseForCausalLM XVERSE xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc.
TeleFLMForCausalLM TeleFLM CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc.
TeleChat2ForCausalLM TeleChat2 TeleAI/TeleChat2-3B, TeleAI/TeleChat2-7B, TeleAI/TeleChat2-35B, etc.
Starcoder2ForCausalLM Starcoder2 bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc.
StableLmForCausalLM StableLM stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc.
SolarForCausalLM Solar Pro upstage/solar-pro-preview-instruct, etc.
QWenLMHeadModel Qwen Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc.
Qwen2MoeForCausalLM Qwen2MoE Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc.
Qwen2ForCausalLM QwQ, Qwen2 Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc.
Qwen3ForCausalLM Qwen3 Qwen/Qwen3-8B, etc.
Qwen3MoeForCausalLM Qwen3MoE Qwen/Qwen3-MoE-15B-A2B, etc.
Plamo2ForCausalLM PLaMo2 pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc.
PersimmonForCausalLM Persimmon adept/persimmon-8b-base, adept/persimmon-8b-chat, etc.
PhiMoEForCausalLM Phi-3.5-MoE microsoft/Phi-3.5-MoE-instruct, etc.
PhiForCausalLM Phi microsoft/phi-1_5, microsoft/phi-2, etc.
Phi3SmallForCausalLM Phi-3-Small microsoft/Phi-3-small-8k-instruct, microsoft/Phi-3-small-128k-instruct, etc.
Phi3ForCausalLM Phi-4, Phi-3 microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc.
OrionForCausalLM Orion OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc.
OPTForCausalLM OPT, OPT-IML facebook/opt-66b, facebook/opt-iml-max-30b, etc.
OlmoForCausalLM OLMo allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc.
OlmoeForCausalLM OLMoE allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc.
Olmo2ForCausalLM OLMo2 allenai/OLMo2-7B-1124, etc.
NemotronForCausalLM Nemotron-3, Nemotron-4, Minitron nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc.
MPTForCausalLM MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc.
MixtralForCausalLM Mixtral-8x7B, Mixtral-8x7B-Instruct mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc.
MistralForCausalLM Mistral, Mistral-Instruct mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc.
MiniCPM3ForCausalLM MiniCPM3 openbmb/MiniCPM3-4B, etc.
MiniCPMForCausalLM MiniCPM openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc.
MambaForCausalLM Mamba state-spaces/mamba-130m-hf, state-spaces/mamba-790m-hf, state-spaces/mamba-2.8b-hf, etc.
LlamaForCausalLM Llama 3.1, Llama 3, Llama 2, LLaMA, Yi meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc.
JambaForCausalLM Jamba ai21labs/AI21-Jamba-1.5-Large, ai21labs/AI21-Jamba-1.5-Mini, ai21labs/Jamba-v0.1, etc.
JAISLMHeadModel Jais inceptionai/jais-13b, inceptionai/jais-13b-chat, inceptionai/jais-30b-v3, inceptionai/jais-30b-chat-v3, etc.
InternLMForCausalLM InternLM internlm/internlm-7b, internlm/internlm-chat-7b, etc.
InternLM3ForCausalLM InternLM3 internlm/internlm3-8b-instruct, etc.
InternLM2ForCausalLM InternLM2 internlm/internlm2-7b, internlm/internlm2-chat-7b, etc.
Grok1ModelForCausalLM Grok1 hpcai-tech/grok-1
GritLM GritLM parasail-ai/GritLM-7B-vllm
GraniteMoeSharedForCausalLM Granite MoE Shared ibm-research/moe-7b-1b-active-shared-experts (test model)
GraniteMoeForCausalLM Granite 3.0 MoE, PowerMoE ibm-granite/granite-3.0-1b-a400m-base, ibm-granite/granite-3.0-3b-a800m-instruct, ibm/PowerMoE-3b, etc.
GraniteForCausalLM Granite 3.0, Granite 3.1, PowerLM ibm-granite/granite-3.0-2b-base, ibm-granite/granite-3.1-8b-instruct, ibm/PowerLM-3b, etc.
GPTNeoXForCausalLM GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc.
GPTJForCausalLM GPT-J EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc.
GPTBigCodeForCausalLM StarCoder, SantaCoder, WizardCoder bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc.
GPT2LMHeadModel GPT-2 gpt2, gpt2-xl, etc.
Glm4ForCausalLM GLM-4-0414 THUDM/GLM-4-32B-0414, etc.
GlmForCausalLM GLM-4 THUDM/glm-4-9b-chat-hf, etc.
Gemma3ForCausalLM Gemma 3 google/gemma-3-1b-it, etc.
Gemma2ForCausalLM Gemma 2 google/gemma-2-9b, google/gemma-2-27b, etc.
GemmaForCausalLM Gemma google/gemma-2b, google/gemma-7b, etc.
FalconMambaForCausalLM FalconMamba tiiuae/falcon-mamba-7b, tiiuae/falcon-mamba-7b-instruct, etc.
FalconForCausalLM Falcon tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc.
ExaoneForCausalLM EXAONE-3 LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct, etc.
DeepseekV3ForCausalLM DeepSeek-V3 deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc.
DeepseekV2ForCausalLM DeepSeek-V2 deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc.
DeepseekForCausalLM DeepSeek deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc.
DeciLMForCausalLM DeciLM Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc.
DbrxForCausalLM DBRX databricks/dbrx-base, databricks/dbrx-instruct, etc.
CohereForCausalLM, Cohere2ForCausalLM Command-R CohereForAI/c4ai-command-r-v01, CohereForAI/c4ai-command-r7b-12-2024, etc.
ChatGLMModel, ChatGLMForConditionalGeneration ChatGLM THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc.
BloomForCausalLM BLOOM, BLOOMZ, BLOOMChat bigscience/bloom, bigscience/bloomz, etc.
BartForConditionalGeneration BART facebook/bart-base, facebook/bart-large-cnn, etc.
BambaForCausalLM Bamba ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B
BaiChuanForCausalLM Baichuan2, Baichuan baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc.
ArcticForCausalLM Arctic Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc.
AquilaForCausalLM Aquila, Aquila2 BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc.
Table 44 Text-only language models | Pooling models | Text embedding (--task embed)
Architecture Model HuggingFace model examples
BertModel BERT-based BAAI/bge-base-en-v1.5, etc.
Gemma2Model Gemma2-based BAAI/bge-multilingual-gemma2, etc.
GritLM GritLM parasail-ai/GritLM-7B-vllm.
LlamaModel, LlamaForCausalLM, MistralModel, etc. Llama-based intfloat/e5-mistral-7b-instruct, etc.
Qwen2Model, Qwen2ForCausalLM Qwen2-based ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc.
RobertaModel, RobertaForMaskedLM RoBERTa-based sentence-transformers/all-roberta-large-v1, etc.
XLMRobertaModel XLM-RoBERTa-based intfloat/multilingual-e5-large, etc.
Table 45 Text-only language models | Pooling models | Reward modeling (--task reward)
Architecture Model HuggingFace model examples
InternLM2ForRewardModel InternLM2-based internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc.
LlamaForCausalLM Llama-based peiyi9979/math-shepherd-mistral-7b-prm, etc.
Qwen2ForRewardModel Qwen2-based Qwen/Qwen2.5-Math-RM-72B, etc.
Qwen2ForProcessRewardModel Qwen2-based Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B, etc.
Table 46 Text-only language models | Pooling models | Classification (--task classify)
Architecture Model HuggingFace model examples
JambaForSequenceClassification Jamba ai21labs/Jamba-tiny-reward-dev, etc.
Qwen2ForSequenceClassification Qwen2-based jason9693/Qwen2.5-1.5B-apeach, etc.
Table 47 Text-only language models | Pooling models | Sentence pair scoring (--task score)
Architecture Model HuggingFace model examples
BertForSequenceClassification BERT-based cross-encoder/ms-marco-MiniLM-L-6-v2, etc.
RobertaForSequenceClassification RoBERTa-based cross-encoder/quora-roberta-base, etc.
XLMRobertaForSequenceClassification XLM-RoBERTa-based BAAI/bge-reranker-v2-m3, etc.
ModernBertForSequenceClassification ModernBert-based Alibaba-NLP/gte-reranker-modernbert-base, etc.
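Table 48 Multimodal models | Generative models | Text generation
Architecture Model Inputs HuggingFace model examples Notes
AriaForConditionalGeneration Aria T + I+ rhymes-ai/Aria
  • Modalities:
    • T: Text
    • I: Image
    • V: Video
    • A: Audio
  • Special characters:
    • +: The two modalities can be input together. For example, T + I means that text-only input, image-only input, and text plus image input are all supported.
    • /: Several modalities are supported, but they cannot be used at the same time. For example, T / I means that text-only input or image-only input is supported, but text plus image input is not.
    • E (suffix): Pre-computed embeddings can be input for this modality.
    • + (suffix, as in I+): Multiple items of this modality can be input per text prompt.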
AyaVisionForConditionalGeneration Aya Vision T + I+ CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc.
Blip2ForConditionalGeneration BLIP-2 T + IE Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc.
ChameleonForConditionalGeneration Chameleon T + I facebook/chameleon-7b, etc.
DeepseekVLV2ForCausalLM DeepSeek-VL2 T + I+ deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc.
Florence2ForConditionalGeneration Florence-2 T + I microsoft/Florence-2-base, microsoft/Florence-2-large, etc.
FuyuForCausalLM Fuyu T + I adept/fuyu-8b, etc.
Gemma3ForConditionalGeneration Gemma 3 T + I+ google/gemma-3-4b-it, google/gemma-3-27b-it, etc.
GLM4VForCausalLM^ GLM-4V T + I THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220, etc.
GraniteSpeechForConditionalGeneration Granite Speech T + A ibm-granite/granite-speech-3.3-8b
H2OVLChatModel H2OVL T + IE+ h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc.
Idefics3ForConditionalGeneration Idefics3 T + I HuggingFaceM4/Idefics3-8B-Llama3, etc.
InternVLChatModel InternVL 2.5, Mono-InternVL, InternVL 2.0 T + IE+ OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc.
KimiVLForConditionalGeneration Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking T + I+ moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking
Llama4ForConditionalGeneration Llama 4 T + I+ meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc.
LlavaForConditionalGeneration LLaVA-1.5 T + IE+ llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc.
LlavaNextForConditionalGeneration LLaVA-NeXT T + IE+ llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc.
LlavaNextVideoForConditionalGeneration LLaVA-NeXT-Video T + V llava-hf/LLaVA-NeXT-Video-7B-hf, etc.
LlavaOnevisionForConditionalGeneration LLaVA-Onevision T + I+ + V+ llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc.
MiniCPMO MiniCPM-O T + IE+ + VE+ + AE+ openbmb/MiniCPM-o-2_6, etc.
MiniCPMV MiniCPM-V T + IE+ + VE+ openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc.
Mistral3ForConditionalGeneration Mistral3 T + I+ mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc.
MllamaForConditionalGeneration Llama 3.2 T + I+ meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc.
MolmoForCausalLM Molmo T + I allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc.
NVLM_D_Model NVLM-D 1.0 T + IE+ nvidia/NVLM-D-72B, etc.
PaliGemmaForConditionalGeneration PaliGemma, PaliGemma 2 T + IE google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc.
Phi3VForCausalLM Phi-3-Vision, Phi-3.5-Vision T + IE+ microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc.
Phi4MMForCausalLM Phi-4-multimodal T + I+ / T + A+/ I+ + A+ microsoft/Phi-4-multimodal-instruct, etc.
PixtralForConditionalGeneration Pixtral T + I+ mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc.
QwenVLForConditionalGeneration Qwen-VL T + IE+ Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc.
Qwen2AudioForConditionalGeneration Qwen2-Audio T + A+ Qwen/Qwen2-Audio-7B-Instruct
Qwen2VLForConditionalGeneration QVQ, Qwen2-VL T + IE+ + VE+ Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc.
Qwen2_5_VLForConditionalGeneration Qwen2.5-VL T + IE+ + VE+ Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc.
Qwen2_5OmniThinkerForConditionalGeneration Qwen2.5-Omni T + IE+ + VE+ + A+ Qwen/Qwen2.5-Omni-7B
SkyworkR1VChatModel Skywork-R1V-38B T + I Skywork/Skywork-R1V-38B
SmolVLMForConditionalGeneration SmolVLM2 T + I SmolVLM2-2.2B-Instruct
UltravoxModel Ultravox T + AE+ fixie-ai/ultravox-v0_3
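
The input notation in the table above (for example `T + IE+ + VE+`) can be read mechanically. The following is a minimal sketch, not part of vLLM or AIOS: `parse_inputs` and `ModalitySpec` are hypothetical names introduced only for illustration, and the sketch assumes that the separator `+` is always surrounded by spaces while the suffixes `E` and `+` are attached directly to the modality letter, as they are written in the table rows.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Letter codes from the legend: T = text, I = image, V = video, A = audio.
MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}

# One token: a modality letter, an optional "E" suffix, an optional "+" suffix.
_TOKEN = re.compile(r"([TIVA])(E?)(\+?)")


@dataclass(frozen=True)
class ModalitySpec:
    """Hypothetical record describing one modality in an input notation."""
    name: str          # "text", "image", "video" or "audio"
    embeddings: bool   # True if the "E" suffix is present (pre-computed embeddings)
    multiple: bool     # True if the "+" suffix is present (several items per prompt)


def parse_inputs(notation: str) -> list[list[ModalitySpec]]:
    """Split a notation into alternatives ("/"), each a list of combinable modalities."""
    alternatives = []
    for alt in notation.split("/"):
        specs = []
        # Assumption: a separator "+" has whitespace on both sides.
        for token in re.split(r"\s+\+\s+", alt.strip()):
            match = _TOKEN.fullmatch(token)
            if match is None:
                raise ValueError(f"unrecognised token {token!r} in {notation!r}")
            letter, emb, multi = match.groups()
            specs.append(ModalitySpec(MODALITIES[letter], emb == "E", multi == "+"))
        alternatives.append(specs)
    return alternatives


if __name__ == "__main__":
    # The Qwen2-VL row: text, plus images and videos that accept embeddings
    # and allow several items per prompt.
    for spec in parse_inputs("T + IE+ + VE+")[0]:
        print(spec)
```

Each inner list holds modalities that can be sent in the same request. A notation containing `/`, such as the Phi-4-multimodal row, yields several inner lists, one per mutually exclusive combination.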
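Table 49 Multimodal models | Pooling models | Text embedding
Architecture Model Inputs HuggingFace model examples Notes
LlavaNextForConditionalGeneration LLaVA-NeXT-based T / I royokong/e5-v
  • Modalities:
    • T: Text
    • I: Image
    • V: Video
    • A: Audio
  • Special characters:
    • +: The two modalities can be input together. For example, T + I means that text-only input, image-only input, and text plus image input are all supported.
    • /: Several modalities are supported, but they cannot be used at the same time. For example, T / I means that text-only input or image-only input is supported, but text plus image input is not.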
Phi3VForCausalLM Phi-3-Vision-based T + I TIGER-Lab/VLM2Vec-Full
Qwen2VLForConditionalGeneration Qwen2-VL-based T + I MrLight/dse-qwen2-2b-mrl-v1

vllm-ascend-v0.17.0rc1

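The models supported by this template are listed below. For usage details and caveats for each model in the compatibility list, see the official vllm-ascend documentation.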
Models
Aria
Baichuan
Baichuan2
Bert
DeepSeek Distill (Qwen/Llama)
DeepSeek R1
DeepSeek V3.2
DeepSeek V3/3.1
Ernie4.5
Ernie4.5-Moe
Gemma-2
Gemma-3
Gemma3
GLM-4.x
GLM-5
Internlm
Kimi-K2-Thinking
DeepseekOCR2
MiniMax-M2.5
Llama2/3/3.1/3.2
Llama3.2
LLaVA-Next
LLaVA-Next-Video
MiniCPM
MiniCPM-V
MiniCPM3
Mistral/Mistral-Instruct
DeepSeek V2.5
Mllama
MiniMax-Text
Mistral3
Molmo
PaddleOCR-VL
Llama4
Keye-VL-8B-Preview
Florence-2
GLM-4V
InternVL2.0/2.5/3.0 InternVideo2.5/Mono-InternVL
Whisper
Ultravox
Phi-3-Vision/Phi-3.5-Vision
Phi-3/4
Phi-4-mini
QVQ
Qwen2
Qwen2-Audio
Qwen2-based
Qwen2-VL
Qwen2.5
Qwen2.5-Omni
Qwen2.5-VL
Qwen3
Qwen3-based
Qwen3-Coder
Qwen3-Embedding
Qwen3-Moe
Qwen3-Next
Qwen3-Omni
Qwen3-Omni-30B-A3B-Thinking
Qwen3-Reranker
Qwen3-VL
Qwen3-VL-Embedding
Qwen3-VL-MOE
Qwen3.5-397B-A17B
Qwen3.5-27B
Qwen3-VL-Reranker
QwQ-32B
XLM-RoBERTa-based

vllm-ascend-v0.18.0

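The models supported by this template are listed below. For usage details and caveats for each model in the compatibility list, see the official vllm-ascend documentation.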
Models
Aria
Baichuan
Baichuan2
Bert
DeepSeek Distill (Qwen/Llama)
DeepSeek R1
DeepSeek V3.2
DeepSeek V3/3.1
Ernie4.5
Ernie4.5-Moe
Gemma-2
Gemma-3
Gemma3
GLM-4.x
GLM-5
Internlm
Kimi-K2-Thinking
DeepseekOCR2
MiniMax-M2.5
MiniMax-M2.7
Llama2/3/3.1/3.2
Llama3.2
LLaVA-Next
LLaVA-Next-Video
MiniCPM
MiniCPM-V
MiniCPM3
Mistral/Mistral-Instruct
DeepSeek V2.5
Mllama
MiniMax-Text
Mistral3
Molmo
PaddleOCR-VL
Llama4
Keye-VL-8B-Preview
Florence-2
GLM-4V
InternVL2.0/2.5/3.0 InternVideo2.5/Mono-InternVL
Whisper
Ultravox
Phi-3-Vision/Phi-3.5-Vision
Phi-3/4
Phi-4-mini
QVQ
Qwen2
Qwen2-Audio
Qwen2-based
Qwen2-VL
Qwen2.5
Qwen2.5-Omni
Qwen2.5-VL
Qwen3
Qwen3-based
Qwen3-Coder
Qwen3-Embedding
Qwen3-Moe
Qwen3-Next
Qwen3-Omni
Qwen3-Omni-30B-A3B-Thinking
Qwen3-Reranker
Qwen3-VL
Qwen3-VL-Embedding
Qwen3-VL-MOE
Qwen3.5-397B-A17B
Qwen3.5-27B
Qwen3.5-35B-A3B
Qwen3.6-27B
Qwen3.6-35B-A3B
Qwen3-VL-Reranker
QwQ-32B
XLM-RoBERTa-based

Diffusers 0.37.0

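The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official Diffusers documentation.
Models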
AutoModel
ControlNetModel
ControlNetUnionModel
FluxControlNetModel
HunyuanDiT2DControlNetModel
SanaControlNetModel
SD3ControlNetModel
SparseControlNetModel
AllegroTransformer3DModel
AuraFlowTransformer2DModel
BriaFiboTransformer2DModel
BriaTransformer2DModel
ChromaTransformer2DModel
ChronoEditTransformer3DModel
CogVideoXTransformer3DModel
CogView3PlusTransformer2DModel
CogView4Transformer2DModel
ConsisIDTransformer3DModel
CosmosTransformer3DModel
DiTTransformer2DModel
EasyAnimateTransformer3DModel
FluxTransformer2DModel
Flux2Transformer2DModel
GlmImageTransformer2DModel
HeliosTransformer3DModel
HiDreamImageTransformer2DModel
HunyuanDiT2DModel
HunyuanImageTransformer2DModel
HunyuanVideo15Transformer3DModel
HunyuanVideoTransformer3DModel
LatteTransformer3DModel
LTX2VideoTransformer3DModel
LTXVideoTransformer3DModel
LongCatImageTransformer2DModel
Lumina2Transformer2DModel
LuminaNextDiT2DModel
MochiTransformer3DModel
OmniGenTransformer2DModel
OvisImageTransformer2DModel
PixArtTransformer2DModel
PriorTransformer
QwenImageTransformer2DModel
SanaTransformer2DModel
SanaVideoTransformer3DModel
SD3Transformer2DModel
SkyReelsV2Transformer3DModel
StableAudioDiTModel
Transformer2DModel
TransformerTemporalModel
WanAnimateTransformer3DModel
WanTransformer3DModel
ZImageTransformer2DModel
StableCascadeUNet
UNet1DModel
UNet2DConditionModel
UNet2DModel
UNet3DConditionModel
UNetMotionModel
UVit2DModel
AsymmetricAutoencoderKL
AutoencoderDC
AutoencoderKL
AutoencoderKLAllegro
AutoencoderKLCogVideoX
AutoencoderKLCosmos
AutoencoderKLHunyuanImage
AutoencoderKLHunyuanImageRefiner
AutoencoderKLHunyuanVideo
AutoencoderKLHunyuanVideo15
AutoencoderKLLTX2Audio
AutoencoderKLLTX2Video
AutoencoderKLLTXVideo
AutoencoderKLMagvit
AutoencoderKLMochi
AutoencoderKLQwenImage
AutoencoderKLWan
ConsistencyDecoderVAE
AutoencoderOobleck
AutoencoderRAE
AutoencoderTiny
VQModel

Transformers 5.3.0

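The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official Transformers documentation.
Models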
AFMoE
Aimv2
ALBERT
ALIGN
AltCLIP
Apertus
Arcee
Aria
Audio Spectrogram Transformer
AudioFlamingo3
Autoformer
AyaVision
Bamba
Bark
BART
BARThez
BARTpho
BEiT
BERT
Bert Generation
BertJapanese
BERTweet
BigBird
BigBird-Pegasus
BioGpt
BiT
BitNet
Blenderbot
BlenderbotSmall
BLIP
BLIP-2
BLOOM
BLT
BridgeTower
BROS
ByT5
CamemBERT
CANINE
Chameleon
Chinese-CLIP
CLAP
CLIP
CLIPSeg
CLVP
Code World Model (CWM)
CodeGen
CodeLlama
Cohere
Cohere2
Cohere2Vision
ColModernVBert
ColPali
ColQwen2
Conditional DETR
ConvBERT
ConvNeXT
ConvNeXTV2
CPM
CPM-Ant
CSM
CTRL
CvT
D-FINE
DAB-DETR
dac
Data2VecAudio
Data2VecText
Data2VecVision
DBRX
DeBERTa
DeBERTa-v2
Decision Transformer
DeepSeek-V2
DeepSeek-V3
DeepseekVL
DeepseekVLHybrid
Deformable DETR
DeiT
DePlot
Depth Anything
Depth Anything V2
DepthPro
DETR
Dia
DialoGPT
DiffLlama
DiNAT
DINOv2
DINOv2 with Registers
DINOv3
DistilBERT
DiT
Doge
DonutSwin
dots1
DPR
DPT
EdgeTAM
EdgeTamVideo
EfficientLoFTR
EfficientNet
ELECTRA
Emu3
EnCodec
Encoder decoder
EoMT
EoMT-DINOv3
ERNIE
Ernie4_5
Ernie4_5_MoE
ernie4_5_vl_moe
ESM
EuroBERT
Evolla
EXAONE-4.0
EXAONE-MoE
FairSeq Machine-Translation
Falcon
Falcon3
FalconH1
FalconMamba
FastSpeech2Conformer
FastVLM
FLAN-T5
FLAN-UL2
FlauBERT
FLAVA
FlexOlmo
Florence2
FNet
FocalNet
Funnel Transformer
Fuyu
Gemma
Gemma2
Gemma3
Gemma3n
GIT
GLM-4
GLM-4-0414
GLM-4.5, GLM-4.6, GLM-4.7
GLM-4.7-Flash
GLM-ASR
GLM-Image
Glm46V
glm4v
glm4v_moe
GlmMoeDsa
GlmOcr
GLPN
GOT-OCR2
GPT Neo
GPT NeoX
GPT NeoX Japanese
GPT-J
GPT-Sw3
GPTBigCode
GptOss
Granite
GraniteMoe
GraniteMoeHybrid
GraniteMoeShared
GraniteSpeech
GraniteVision
Grounding DINO
GroupViT
Helium
HerBERT
HGNet-V2
Hiera
Higgs Audio V2
Hubert
HunYuanDenseV1
HunYuanMoEV1
I-BERT
I-JEPA
IDEFICS
Idefics2
Idefics3
ImageGPT
Informer
InstructBLIP
InstructBlipVideo
InternVL
Jais2
Jamba
Janus
JetMoe
KOSMOS-2
KOSMOS-2.5
Kyutai Speech-To-Text
LASR
LayoutLM
LayoutLMv2
LayoutLMv3
LayoutXLM
LED
LeViT
LFM2
LFM2-VL
LFM2Moe
LightGlue
LightOnOcr
LiLT
LLaMA
Llama2
Llama3
Llama4
LLaVa
LLaVA-NeXT
LLaVa-NeXT-Video
LLaVA-Onevision
LongCatFlash
Longformer
LongT5
LUKE
LW-DETR
LXMERT
M2M100
MADLAD-400
Mamba
Mamba2
Marian
MarkupLM
Mask2Former
MaskFormer
MatCha
mBART
mBART-50
Megatron-BERT
Megatron-GPT2
MetaCLIP 2
MGP-STR
Mimi
MiniMax
MiniMax-M2
Ministral
Ministral3
Mistral
Mistral3
Mixtral
MLCD
mllama
mLUKE
MM Grounding DINO
MMS
MobileBERT
MobileNetV1
MobileNetV2
MobileViT
MobileViTV2
ModernBert
ModernBERTDecoder
ModernVBert
Moonshine
Moonshine Streaming
Moshi
MPNet
MPT
MRA
MT5
MusicGen
MusicGen Melody
MVP
myt5
NanoChat
Nemotron
nemotron_h
NLLB
NLLB-MOE
Nougat
Nyströmformer
OLMo
OLMo Hybrid
OLMo2
Olmo3
OLMoE
OmDet-Turbo
OneFormer
OpenAI GPT
OpenAI GPT-2
OPT
Ovis2
OWL-ViT
OWLv2
PaddleOCRVL
PaliGemma
Parakeet
PatchTSMixer
PatchTST
PE Audio
PE Audio Video
PE Video
Pegasus
PEGASUS-X
Perceiver
PerceptionLM
Persimmon
Phi
Phi-3
Phi4 Multimodal
PhiMoE
PhoBERT
Pix2Struct
Pixio
Pixtral
PLBart
PoolFormer
Pop2Piano
PP-DocLayoutV2
PP-DocLayoutV3
Prompt Depth Anything
ProphetNet
PVT
Pyramid Vision Transformer v2 (PVTv2)
Qwen2
Qwen2.5-Omni
Qwen2.5-VL
Qwen2Audio
Qwen2MoE
Qwen2VL
Qwen3
Qwen3-Omni-MoE
Qwen3.5
Qwen3.5 Moe
Qwen3MoE
Qwen3Next
Qwen3VL
Qwen3VLMoe
RAG
RecurrentGemma
Reformer
RegNet
RemBERT
ResNet
RoBERTa
RoBERTa-PreLayerNorm
RoCBert
RoFormer
RT-DETR
RT-DETRv2
RWKV
SAM
SAM2
SAM2 Video
SAM3
SAM3 Video
Sam3Tracker
Sam3TrackerVideo
SeamlessM4T
SeamlessM4Tv2
Seed-Oss
SegFormer
SegGPT
Segment Anything High Quality
SEW
SEW-D
ShieldGemma2
SigLIP
SigLIP2
SmolLM3
SmolVLM
SolarOpen
Speech Encoder decoder
Speech2Text
SpeechT5
Splinter
SqueezeBERT
StableLm
Starcoder2
SuperGlue
SuperPoint
SwiftFormer
Swin Transformer
Swin Transformer V2
Swin2SR
SwitchTransformers
T5
T5Gemma
T5Gemma2
T5v1.1
Table Transformer
TAPAS
TextNet
Time Series Transformer
TimesFM
TimesFM 2.5
TimeSformer
Timm Wrapper
TrOCR
TVP
UDOP
UL2
UMT5
UniSpeech
UniSpeechSat
UnivNet
UPerNet
V-JEPA 2
VaultGemma
VibeVoice ASR
VideoLlama3
VideoLlava
VideoMAE
ViLT
VipLlava
Vision Encoder decoder
VisionTextDualEncoder
VisualBERT
ViT
VitDet
ViTMAE
ViTMatte
ViTMSN
ViTPose
VITS
ViViT
Voxtral
VoxtralRealtime
Wav2Vec2
Wav2Vec2-BERT
Wav2Vec2-Conformer
Wav2Vec2Phoneme
WavLM
Whisper
X-CLIP
X-Codec
X-MOD
XGLM
XLM
XLM-RoBERTa
XLM-RoBERTa-XL
XLM-V
XLNet
XLS-R
XLSR-Wav2Vec2
xLSTM
YOLOS
YOSO
Youtu-LLM
Zamba
Zamba2
ZoeDepth

Sentence Transformers 5.3.0

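The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official Sentence Transformers documentation.
Models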
all-MiniLM-L12-v1
all-MiniLM-L12-v2
all-MiniLM-L6-v1
all-MiniLM-L6-v2
all-distilroberta-v1
all-mpnet-base-v1
all-mpnet-base-v2
all-roberta-large-v1
average_word_embeddings_glove.6B.300d
average_word_embeddings_komninos
distiluse-base-multilingual-cased-v1
distiluse-base-multilingual-cased-v2
gtr-t5-base
gtr-t5-large
gtr-t5-xxl
gtr-t5-xl
msmarco-bert-base-dot-v5
msmarco-distilbert-dot-v5
msmarco-distilbert-base-tas-b
msmarco-distilbert-cos-v5
msmarco-MiniLM-L12-cos-v5
msmarco-MiniLM-L6-cos-v5
multi-qa-MiniLM-L6-cos-v1
multi-qa-MiniLM-L6-dot-v1
multi-qa-distilbert-cos-v1
multi-qa-distilbert-dot-v1
multi-qa-mpnet-base-cos-v1
multi-qa-mpnet-base-dot-v1
paraphrase-MiniLM-L12-v2
paraphrase-MiniLM-L3-v2
paraphrase-MiniLM-L6-v2
paraphrase-TinyBERT-L6-v2
paraphrase-albert-small-v2
paraphrase-distilroberta-base-v2
paraphrase-mpnet-base-v2
paraphrase-multilingual-MiniLM-L12-v2
paraphrase-multilingual-mpnet-base-v2
LaBSE
sentence-t5-base
sentence-t5-large
sentence-t5-xl
sentence-t5-xxl
clip-ViT-L-14
clip-ViT-B-16
clip-ViT-B-32
clip-ViT-B-32-multilingual-v1
Qwen/Qwen3-VL-Embedding-2B
hkunlp/instructor-base
hkunlp/instructor-large
hkunlp/instructor-xl
allenai-specter
Note: The models above are the official models provided by Sentence Transformers. For additional supported community models, see the Sentence Transformers community models.

llama.cpp-b6152

SGLang-0.5.11

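The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official SGLang documentation.

Model type Model
Large language models DeepSeek (v1, v2, v3/R1)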
Kimi K2 (Thinking, Instruct)
Kimi Linear (48B-A3B)
GPT-OSS
Qwen (3.5, 3, 3MoE, 3Next, 2.5, 2 series)
Llama (2, 3.x, 4 series)
Mistral (Mixtral, NeMo, Small3)
Gemma (v1, v2, v3)
Phi (Phi-1.5, Phi-2, Phi-3, Phi-4, Phi-MoE series)
MiniCPM (v3, 4B)
OLMo (2, 3)
OLMoE (Open MoE)
MiniMax-M2 (M2, M2.1, M2.5)
StableLM (3B, 7B)
Command-(R,A) (Cohere)
DBRX (Databricks)
Grok (xAI)
ChatGLM (GLM-130B family)
InternLM 2 (7B, 20B)
ExaONE 3 (Korean-English)
Baichuan 2 (7B, 13B)
XVERSE (MoE)
SmolLM (135M–1.7B)
GLM-4 (Multilingual 9B)
MiMo (7B series)
ERNIE-4.5 (4.5, 4.5MoE series)
Arcee AFM-4.5B
Persimmon (8B)
Solar (10.7B)
Tele FLM (52B-1T)
Ling (16.8B–290B)
Granite 3.0, 3.1 (IBM)
Granite 3.0 MoE (IBM)
GPT-J (6B)
Orion (14B)
Llama Nemotron Super (v1, v1.5, NVIDIA)
Llama Nemotron Ultra (v1, NVIDIA)
NVIDIA Nemotron Nano 2.0
NVIDIA Nemotron 3 Super (NVIDIA)
NVIDIA Nemotron 3 Nano (NVIDIA)
StarCoder2 (3B-15B)
Jet-Nemotron
Trinity (Nano, Mini)
LFM2 (350M, 1.2B)
LFM2-MoE (8B-A1B, 24B-A2B)
Falcon-H1 (0.5B–34B)
Hunyuan-Large (389B, MoE)
IBM Granite 4.0 (Hybrid, Dense)
Sarvam 2 (30B-A2B, 105B-A10B)
Laguna XS.2 (poolside)
Multimodal models Qwen-VL (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3-Omni)
DeepSeek-VL2
DeepSeek-OCR / OCR-2
Janus-Pro (1B, 7B)
MiniCPM-V / MiniCPM-o
Llama 3.2 Vision (11B)
LLaVA (v1.5 & v1.6)
LLaVA-NeXT (8B, 72B)
LLaVA-OneVision
Gemma 3 (Multimodal)
Kimi-VL (A3B)
Mistral-Small-3.1-24B
Phi-4-multimodal-instruct
MiMo-VL (7B)
GLM-4.5V (106B) / GLM-4.1V(9B)
GLM-OCR
DotsVLM (General/OCR)
DotsVLM-OCR
NVILA (8B, 15B, Lite-2B, Lite-8B, Lite-15B)
NVIDIA Nemotron Nano 2.0 VL
Ernie4.5-VL
JetVLM
Step3-VL (10B)
Qwen3-ASR (0.6B, 1.7B)
Qwen3-Omni
LFM2-VL
Audio transcription models Whisper
Qwen3-ASR (0.6B, 1.7B)
Diffusion language models LLaDA2.0 (mini, flash)
SDAR (JetLM, dense/MoE)
Embedding models E5 (Llama/Mistral based)
GTE-Qwen2
Qwen3-Embedding
BGE
GME (Multimodal)
CLIP
Reward models Llama (3.1 Reward / LlamaForSequenceClassification)
Gemma 2 (27B Reward / Gemma2ForSequenceClassification)
InternLM 2 (Reward / InternLM2ForRewardModel)
Qwen2.5 (Reward - Math / Qwen2ForRewardModel)
Qwen2.5 (Reward - Sequence / Qwen2ForSequenceClassification)
Reranking models BGE-Reranker (BgeRerankModel)
Qwen3-Reranker (decoder-only yes/no)
Qwen3-VL-Reranker (multimodal yes/no)
Classification models LlamaForSequenceClassification
Qwen2ForSequenceClassification
Qwen3ForSequenceClassification
BertForSequenceClassification
Gemma2ForSequenceClassification

MindIE 2.3.0

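The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official MindIE documentation.

Model type Model
Large language models Qwen3-235B-A22B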
Qwen3-30B-A3B
DeepSeek-R1-0528
DeepSeek-V2-236B
DeepSeek-V3-0324
DeepSeek-V3.1
Mixtral-8x7B-Instruct-V0.1
Mixtral-8x22B-Instruct-V0.1
Kimi K2
GLM4.5
Ernie 4.5
DeepSeek-R1-Distill-Llama-8B
DeepSeek-R1-Distill-Llama-70B
DeepSeek-R1-Distill-Qwen-1.5B
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Qwen-14B
Qwen2-7B-Instruct
Qwen2-72B-Instruct
Qwen2.5-7B-Instruct
Qwen2.5-14B-Instruct
Qwen2.5-32B-Instruct
Qwen2.5-72B-Instruct
Qwen3-4B
Qwen3-8B
Qwen3-14B
Qwen3-32B
LLaMA3-8B
LLaMA3-70B
LLaMA3.1-8B
LLaMA3.1-70B
LLaMA3.1-405B
ChatGLM3-6B
GLM4-9B
Baichuan2-7B
Baichuan2-13B
Bloom-7B
Multimodal understanding models GLM-4V-9B
MiniCPM-V2.6-8B
InternVL2-8B
InternVL2-40B
InternVL2.5-8B
InternVL2.5-78B
Qwen2-Audio-7B-Instruct
Qwen2-VL-7B-Instruct
Qwen2-VL-72B-Instruct
Qwen2.5-VL-7B-Instruct
Qwen2.5-VL-32B-Instruct
Qwen2.5-VL-72B-Instruct
VITA1.5-8B
Multimodal generation models Stable Diffusion 1.5
Stable Diffusion 2.1
Stable Diffusion XL
Stable Diffusion XL_lighting
Stable Diffusion 3
Stable Video Diffusion
Stable Audio Open v1.0
OpenSora v1.2
OpenSoraPlan v1.2
OpenSoraPlan v1.3
DiT
sd-webui
CogView3-Plus-3B
CogVideoX-2B
CogVideoX-5B
FLUX.1-dev
HunyuanDiT
HunyuanVideo
Wan2.1-T2V-14B
Wan2.1-I2V-14B
Wan2.2-T2V-A14B
Wan2.2-I2V-A14B
Wan2.2-TI2V-5B

MindIE 1.0.0

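The models compatible with this template are listed below. For usage details and caveats for each model in the compatibility list, see the official MindIE documentation.

Model type Model
Large language models DeepSeek-V2-Lite-16B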
DeepSeek-V2-236B
Qwen2.5-72B
Qwen2.5-32B
Qwen2.5-14B
Qwen2.5-7B
Qwen2-57B-A14B
Qwen2-72B
Qwen2-7B
Qwen1.5-0.5B
Qwen1.5-1.8B
Qwen1.5-4B
Qwen-7B
Qwen-14B
Qwen-72B
LLaMA3-8B
LLaMA3-70B
LLaMA3.1-8B
LLaMA3.1-70B
LLaMA3.1-405B
LLaMA-7B
LLaMA-13B
LLaMA-33B
LLaMA-65B
LLaMA2-7B
LLaMA2-13B
LLaMA2-70B
ChatGLM2-6B
ChatGLM3-6B
ChatGLM3-6B-32K
GLM4-9B-Chat
Baichuan2-7B
Baichuan2-13B
Bloom-7B
Bloom-176B
CodeLLaMA-34B
StarCoder-15.5B
StarCoder2-15B
Yi-6B-200K
Yi-34B-200K
CodeGeeX2-6B
CodeShell-7B
Gemma-7B
GPT-NEOX-20B
Ziya-Coding-34B
InternLM2-20B
InternLM-20B
InternLM2-7B
Mixtral-8x7B-Instruct-V0.1
Mixtral-8x22B-Instruct-V0.1
Vicuna-13B
Embedding models bge-large-zh-v1.5
bge-reranker-large
bge-m3
Multimodal understanding models InternVL-Chat-V1-2
InternVL-Chat-V1-5
InternVL2-8B
InternVL2-40B
Qwen-VL-9.6B
Qwen2-Audio-7B-Instruct
Qwen2-VL-7B-Instruct
internLM-xcomposer2-vl-7B
internLM-XComposer2-4KHD-7B
LLava-1.6-mistral-7B
LLava-1.6-vicuna-7B
LLava-1.6-vicuna-13B
LLava-v1.6-34b-hf
LLava-next-video-34b
LLava-next-video-7b
LLava-v1.5-13B
LLava-v1.5-7B
MiniCPM-Llama3-V-2_5
MiniCPM-V-2
Multimodal generation models Stable Diffusion 1.5
Stable Diffusion 2.1
Stable Diffusion XL
Stable Diffusion XL_controlnet
Stable Diffusion XL_inpainting
Stable Diffusion XL_prompt_weight
Stable Diffusion 3
Stable Video Diffusion
Stable Audio Open v1.0
OpenSora v1.2
DiT
sd-webui
CogView3-Plus-3B
HunyuanDiT