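Each inference template supports a different set of models, so check template compatibility before you deploy a model. This chapter lists the models that the system inference templates support. You can upload any model on the compatibility list to the AI model platform and deploy it with the corresponding system inference template.
Note: If you use a custom inference template, refer to that template's official compatibility documentation.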
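One way to check compatibility before uploading a model is to compare the `architectures` field of its `config.json` (the standard Hugging Face Transformers configuration field) with the first column of the tables in this chapter. The following is a minimal sketch, not a platform feature: the names `COMPATIBLE_ARCHITECTURES`, `is_compatible`, and `check_model_dir` are hypothetical, and the set holds only a few entries copied from the vLLM 0.20.2 table.

```python
import json
from pathlib import Path

# Hypothetical excerpt: a few architectures copied from the vLLM 0.20.2 table.
# Extend it with the rows of the template version you plan to deploy with.
COMPATIBLE_ARCHITECTURES = frozenset({
    "LlamaForCausalLM",
    "Qwen3ForCausalLM",
    "DeepseekV3ForCausalLM",
})


def is_compatible(config, compatible=COMPATIBLE_ARCHITECTURES):
    """Return True if any architecture declared in the config dict is listed."""
    declared = config.get("architectures") or []
    return any(arch in compatible for arch in declared)


def check_model_dir(model_dir, compatible=COMPATIBLE_ARCHITECTURES):
    """Load <model_dir>/config.json and check its declared architectures."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return is_compatible(json.load(f), compatible)
```

For example, `check_model_dir("/path/to/model")` could be run on a locally downloaded model before uploading it. A match only indicates that the architecture is listed; it does not cover other requirements such as hardware or quantization support.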
vLLM 0.20.2
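The following tables list the model architectures, model names, and example models that this template supports. For details on how to use the listed models and what to watch out for, see the official vLLM documentation.
| Architecture | Models | Example HuggingFace Models |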
|---|---|---|
| AfmoeForCausalLM | Afmoe | TBA |
| ApertusForCausalLM | Apertus | swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc. |
| AquilaForCausalLM | Aquila, Aquila2 | BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc. |
| ArceeForCausalLM | Arcee (AFM) | arcee-ai/AFM-4.5B-Base, etc. |
| ArcticForCausalLM | Arctic | Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc. |
| AXK1ForCausalLM | A.X-K1 | skt/A.X-K1, etc. |
| BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc. |
| BailingMoeForCausalLM | Ling | inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc. |
| BailingMoeV2ForCausalLM | Ling | inclusionAI/Ling-mini-2.0, etc. |
| BailingMoeV2_5ForCausalLM | Ling | inclusionAI/Ling-2.5-1T, inclusionAI/Ring-2.5-1T |
| BambaForCausalLM | Bamba | ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom, bigscience/bloomz, etc. |
| ChatGLMModel, ChatGLMForConditionalGeneration | ChatGLM | zai-org/chatglm2-6b, zai-org/chatglm3-6b, thu-coai/ShieldLM-6B-chatglm3, etc. |
| CohereForCausalLM, Cohere2ForCausalLM | Command-R, Command-A | CohereLabs/c4ai-command-r-v01, CohereLabs/c4ai-command-r7b-12-2024, CohereLabs/c4ai-command-a-03-2025, CohereLabs/command-a-reasoning-08-2025, etc. |
| CwmForCausalLM | CWM | facebook/cwm, etc. |
| DbrxForCausalLM | DBRX | databricks/dbrx-base, databricks/dbrx-instruct, etc. |
| DeciLMForCausalLM | DeciLM | nvidia/Llama-3_3-Nemotron-Super-49B-v1, etc. |
| DeepseekForCausalLM | DeepSeek | deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc. |
| DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc. |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3, deepseek-ai/DeepSeek-R1, deepseek-ai/DeepSeek-V3.1, etc. |
| DeepseekV4ForCausalLM | DeepSeek-V4 | deepseek-ai/DeepSeek-V4-Flash, deepseek-ai/DeepSeek-V4-Pro, etc. |
| Dots1ForCausalLM | dots.llm1 | rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc. |
| DotsOCRForCausalLM | dots_ocr | rednote-hilab/dots.ocr |
| Ernie4_5ForCausalLM | Ernie4.5 | baidu/ERNIE-4.5-0.3B-PT, etc. |
| Ernie4_5_MoeForCausalLM | Ernie4.5MoE | baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc. |
| ExaoneForCausalLM | EXAONE-3 | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct, etc. |
| ExaoneMoEForCausalLM | K-EXAONE | LGAI-EXAONE/K-EXAONE-236B-A23B, etc. |
| Exaone4ForCausalLM | EXAONE-4 | LGAI-EXAONE/EXAONE-4.0-32B, etc. |
| Fairseq2LlamaForCausalLM | Llama (fairseq2 format) | mgleize/fairseq2-dummy-Llama-3.2-1B, etc. |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc. |
| FalconMambaForCausalLM | FalconMamba | tiiuae/falcon-mamba-7b, tiiuae/falcon-mamba-7b-instruct, etc. |
| FalconH1ForCausalLM | Falcon-H1 | tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc. |
| FlexOlmoForCausalLM | FlexOlmo | allenai/FlexOlmo-7x7B-1T, allenai/FlexOlmo-7x7B-1T-RT, etc. |
| GemmaForCausalLM | Gemma | google/gemma-2b, google/gemma-1.1-2b-it, etc. |
| Gemma2ForCausalLM | Gemma 2 | google/gemma-2-9b, google/gemma-2-27b, etc. |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it, etc. |
| Gemma3nForCausalLM | Gemma 3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Gemma4ForCausalLM | Gemma 4 | google/gemma-4-E2B-it, etc. |
| GlmForCausalLM | GLM-4 | zai-org/glm-4-9b-chat-hf, etc. |
| Glm4ForCausalLM | GLM-4-0414 | zai-org/GLM-4-32B-0414, etc. |
| Glm4MoeForCausalLM | GLM-4.5, GLM-4.6, GLM-4.7 | zai-org/GLM-4.5, etc. |
| Glm4MoeLiteForCausalLM | GLM-4.7-Flash | zai-org/GLM-4.7-Flash, etc. |
| GPT2LMHeadModel | GPT-2 | openai-community/gpt2, openai-community/gpt2-xl, etc. |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc. |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc. |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc. |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-120b, openai/gpt-oss-20b |
| GraniteForCausalLM | Granite 3.0, Granite 3.1, PowerLM | ibm-granite/granite-3.0-2b-base, ibm-granite/granite-3.1-8b-instruct, ibm/PowerLM-3b, etc. |
| GraniteMoeForCausalLM | Granite 3.0 MoE, PowerMoE | ibm-granite/granite-3.0-1b-a400m-base, ibm-granite/granite-3.0-3b-a800m-instruct, ibm/PowerMoE-3b, etc. |
| GraniteMoeHybridForCausalLM | Granite 4.0 MoE Hybrid | ibm-granite/granite-4.0-tiny-preview, etc. |
| GraniteMoeSharedForCausalLM | Granite MoE Shared | ibm-research/moe-7b-1b-active-shared-experts (test model) |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| Grok1ModelForCausalLM | Grok1 | hpcai-tech/grok-1 |
| Grok1ForCausalLM | Grok2 | xai-org/grok-2 |
| HunYuanDenseV1ForCausalLM | Hunyuan Dense | tencent/Hunyuan-7B-Instruct |
| HunYuanMoEV1ForCausalLM | Hunyuan-A13B | tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc. |
| HYV3ForCausalLM | HY3 | tencent/Hy3-preview-Base, tencent/Hy3-preview |
| HyperCLOVAXForCausalLM | HyperCLOVAX-SEED-Think-14B | naver-hyperclovax/HyperCLOVAX-SEED-Think-14B |
| InternLMForCausalLM | InternLM | internlm/internlm-7b, internlm/internlm-chat-7b, etc. |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b, internlm/internlm2-chat-7b, etc. |
| InternLM3ForCausalLM | InternLM3 | internlm/internlm3-8b-instruct, etc. |
| IQuestCoderForCausalLM | IQuestCoderV1 | IQuestLab/IQuest-Coder-V1-40B-Instruct, etc. |
| IQuestLoopCoderForCausalLM | IQuestLoopCoderV1 | IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct, etc. |
| JAISLMHeadModel | Jais | inceptionai/jais-13b, inceptionai/jais-13b-chat, inceptionai/jais-30b-v3, inceptionai/jais-30b-chat-v3, etc. |
| Jais2ForCausalLM | Jais2 | inceptionai/Jais-2-8B-Chat, inceptionai/Jais-2-70B-Chat, etc. |
| JambaForCausalLM | Jamba | ai21labs/AI21-Jamba-1.5-Large, ai21labs/AI21-Jamba-1.5-Mini, ai21labs/Jamba-v0.1, etc. |
| KimiLinearForCausalLM | Kimi-Linear-48B-A3B-Base, Kimi-Linear-48B-A3B-Instruct | moonshotai/Kimi-Linear-48B-A3B-Base, moonshotai/Kimi-Linear-48B-A3B-Instruct |
| Lfm2ForCausalLM | LFM2 | LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc. |
| Lfm2MoeForCausalLM | LFM2MoE | LiquidAI/LFM2-8B-A1B-preview, etc. |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc. |
| LongcatFlashForCausalLM | LongCat-Flash | meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8 |
| MambaForCausalLM | Mamba | state-spaces/mamba-130m-hf, state-spaces/mamba-790m-hf, state-spaces/mamba-2.8b-hf, etc. |
| Mamba2ForCausalLM | Mamba2 | mistralai/Mamba-Codestral-7B-v0.1, etc. |
| MiMoForCausalLM | MiMo | XiaomiMiMo/MiMo-7B-RL, etc. |
| MiMoV2FlashForCausalLM | MiMoV2Flash | XiaomiMiMo/MiMo-V2-Flash, etc. |
| MiniCPMForCausalLM | MiniCPM | openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc. |
| MiniCPM3ForCausalLM | MiniCPM3 | openbmb/MiniCPM3-4B, etc. |
| MiniMaxForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01-hf, etc. |
| MiniMaxM2ForCausalLM | MiniMax-M2, MiniMax-M2.1 | MiniMaxAI/MiniMax-M2, etc. |
| MistralForCausalLM | Ministral-3, Mistral, Mistral-Instruct | mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc. |
| MistralLarge3ForCausalLM | Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 | mistralai/Mistral-Large-3-675B-Base-2512, mistralai/Mistral-Large-3-675B-Instruct-2512, etc. |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc. |
| MPTForCausalLM | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc. |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc. |
| NemotronHForCausalLM | Nemotron-H | nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc. |
| OlmoForCausalLM | OLMo | allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc. |
| Olmo2ForCausalLM | OLMo2 | allenai/OLMo-2-0425-1B, etc. |
| Olmo3ForCausalLM | OLMo3 | allenai/Olmo-3-7B-Instruct, allenai/Olmo-3-32B-Think, etc. |
| OlmoHybridForCausalLM | OLMo Hybrid | allenai/Olmo-Hybrid-7B |
| OlmoeForCausalLM | OLMoE | allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc. |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b, facebook/opt-iml-max-30b, etc. |
| OrionForCausalLM | Orion | OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc. |
| OuroForCausalLM | ouro | ByteDance/Ouro-1.4B, ByteDance/Ouro-2.6B, etc. |
| PanguEmbeddedForCausalLM | openPangu-Embedded-7B | FreedomIntelligence/openPangu-Embedded-7B-V1.1 |
| PanguProMoEV2ForCausalLM | openpangu-pro-moe-v2 | N/A |
| PanguUltraMoEForCausalLM | openpangu-ultra-moe-718b-model | FreedomIntelligence/openPangu-Ultra-MoE-718B-V1.1 |
| Param2MoEForCausalLM | param2moe | bharatgenai/Param2-17B-A2.4B-Thinking, etc. |
| PhiForCausalLM | Phi | microsoft/phi-1_5, microsoft/phi-2, etc. |
| Phi3ForCausalLM | Phi-4, Phi-3 | microsoft/Phi-4-mini-instruct, microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc. |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct, etc. |
| PersimmonForCausalLM | Persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat, etc. |
| Plamo2ForCausalLM | PLaMo2 | pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc. |
| Plamo3ForCausalLM | PLaMo3 | pfnet/plamo-3-nict-2b-base, pfnet/plamo-3-nict-8b-base, etc. |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc. |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc. |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc. |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B, etc. |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-30B-A3B, etc. |
| Qwen3NextForCausalLM | Qwen3NextMoE | Qwen/Qwen3-Next-80B-A3B-Instruct, etc. |
| RWForCausalLM | Falcon RW | tiiuae/falcon-40b, etc. |
| Rnj1ForCausalLM | Rnj1 | EssentialAI/rnj-1-instruct, etc. |
| SarvamMoEForCausalLM | Sarvam 2 | sarvamai/sarvam2-30b-a3b, etc. |
| SarvamMLAForCausalLM | Sarvam 2 | sarvamai/sarvam2-105b-a9b, etc. |
| SeedOssForCausalLM | SeedOss | ByteDance-Seed/Seed-OSS-36B-Instruct, etc. |
| SolarForCausalLM | Solar Pro | upstage/solar-pro-preview-instruct, etc. |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc. |
| StableLMEpochForCausalLM | StableLM Epoch | stabilityai/stablelm-zephyr-3b, etc. |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc. |
| Step1ForCausalLM | Step-Audio | stepfun-ai/Step-Audio-EditX, etc. |
| Step3p5ForCausalLM | Step-3.5-flash | stepfun-ai/Step-3.5-Flash, etc. |
| TeleChatForCausalLM | TeleChat | chuhac/TeleChat2-35B, etc. |
| TeleChat2ForCausalLM | TeleChat2 | Tele-AI/TeleChat2-3B, Tele-AI/TeleChat2-7B, Tele-AI/TeleChat2-35B, etc. |
| TeleChat3ForCausalLM | TeleChat3 | Tele-AI/TeleChat3-36B-Thinking, Tele-AI/TeleChat3-Coder-36B-Thinking, etc. |
| TeleFLMForCausalLM | TeleFLM | CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc. |
| XverseForCausalLM | XVERSE | xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc. |
| MiniMaxM1ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc. |
| MiniMaxText01ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01, etc. |
| Zamba2ForCausalLM | Zamba2 | Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc. |
| SmolLM3ForCausalLM | SmolLM3 | HuggingFaceTB/SmolLM3-3B |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5, Snowflake/snowflake-arctic-embed-xs, etc. |
| BertSpladeSparseEmbeddingModel | SPLADE | naver/splade-v3 |
| ErnieModel | BERT-like Chinese ERNIE | shibing624/text2vec-base-chinese-sentence |
| Gemma2ModelC | Gemma 2-based | BAAI/bge-multilingual-gemma2, etc. |
| Gemma3TextModelC | Gemma 3-based | google/embeddinggemma-300m, etc. |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GteModel | Arctic-Embed-2.0-M | Snowflake/snowflake-arctic-embed-m-v2.0 |
| GteNewModel | mGTE-TRM (see note) | Alibaba-NLP/gte-multilingual-base, etc. |
| JinaEmbeddingsV5ModelC | Qwen3-based with task-specific LoRA adapters | jinaai/jina-embeddings-v5-text-small (see note) |
| LlamaBidirectionalModelC | Llama-based with bidirectional attention | nvidia/llama-nemotron-embed-1b-v2, etc. |
| LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. | Llama-based | intfloat/e5-mistral-7b-instruct, etc. |
| ModernBertModel | ModernBERT-based | Alibaba-NLP/gte-modernbert-base, etc. |
| NomicBertModel | Nomic BERT | nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc. |
| Qwen2ModelC, Qwen2ForCausalLMC | Qwen2-based | ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc. |
| Qwen3ModelC, Qwen3ForCausalLMC | Qwen3-based | Qwen/Qwen3-Embedding-0.6B, etc. |
| RobertaModel, RobertaForMaskedLM | RoBERTa-based | sentence-transformers/all-roberta-large-v1, etc. |
| VoyageQwen3BidirectionalEmbedModelC | Voyage Qwen3-based with bidirectional attention | voyageai/voyage-4-nano, etc. |
| XLMRobertaModel | XLMRobertaModel-based | BAAI/bge-m3 (see note), intfloat/multilingual-e5-base, jinaai/jina-embeddings-v3 (see note), etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev, etc. |
| Qwen3ForSequenceClassificationC | Qwen3-based | Skywork/Skywork-Reward-V2-Qwen3-0.6B, etc. |
| LlamaForSequenceClassificationC | Llama-based | Skywork/Skywork-Reward-V2-Llama-3.2-1B, etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B, etc. |
| LlamaForCausalLM | Llama-based | peiyi9979/math-shepherd-mistral-7b-prm, etc. |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B, etc. |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| ErnieForSequenceClassification | BERT-like Chinese ERNIE | Forrest20231206/ernie-3.0-base-zh-cls |
| GPT2ForSequenceClassification | GPT2 | nie3e/sentiment-polish-gpt2-small |
| Qwen2ForSequenceClassificationC | Qwen2-based | jason9693/Qwen2.5-1.5B-apeach |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2, etc. |
| GemmaForSequenceClassification | Gemma-based | BAAI/bge-reranker-v2-gemma (see note), etc. |
| GteNewForSequenceClassification | mGTE-TRM (see note) | Alibaba-NLP/gte-multilingual-reranker-base, etc. |
| LlamaBidirectionalForSequenceClassificationC | Llama-based with bidirectional attention | nvidia/llama-nemotron-rerank-1b-v2, etc. |
| Qwen2ForSequenceClassificationC | Qwen2-based | mixedbread-ai/mxbai-rerank-base-v2 (see note), etc. |
| Qwen3ForSequenceClassificationC | Qwen3-based | tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B (see note), etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base, etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3, etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| BertForTokenClassification | bert-based | boltuix/NeuroBERT-NER (see note), etc. |
| ErnieForTokenClassification | BERT-like Chinese ERNIE | gyr66/Ernie-3.0-base-chinese-finetuned-ner |
| ModernBertForTokenClassification | ModernBERT-based | disham993/electrical-ner-ModernBERT-base |
| Qwen3ForTokenClassificationC | Qwen3-based | bd2lcco/Qwen3-0.6B-finetuned |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B, etc. |
| Architecture | Models | Inputs | Example HuggingFace Models |
|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria |
| AudioFlamingo3ForConditionalGeneration | AudioFlamingo3 | T + A | nvidia/audio-flamingo-3-hf, nvidia/music-flamingo-hf |
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereLabs/aya-vision-8b, CohereLabs/aya-vision-32b, etc. |
| BagelForConditionalGeneration | BAGEL | T + I+ | ByteDance-Seed/BAGEL-7B-MoT |
| BeeForConditionalGeneration | Bee-8B | T + IE+ | Open-Bee/Bee-8B-RL, Open-Bee/Bee-8B-SFT |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b, etc. |
| CheersForConditionalGeneration | Cheers | T + I | ai9stars/Cheers |
| Cohere2VisionForConditionalGeneration | Command A Vision | T + I+ | CohereLabs/command-a-vision-07-2025, etc. |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc. |
| DeepseekOCRForCausalLM | DeepSeek-OCR | T + I+ | deepseek-ai/DeepSeek-OCR, etc. |
| DeepseekOCR2ForCausalLM | DeepSeek-OCR-2 | T + I+ | deepseek-ai/DeepSeek-OCR-2, etc. |
| Eagle2_5_VLForConditionalGeneration | Eagle2.5-VL | T + IE+ | nvidia/Eagle2.5-8B, etc. |
| Ernie4_5_VLMoeForConditionalGeneration | Ernie4.5-VL | T + I+/ V+ | baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT |
| Exaone4_5_ForConditionalGeneration | EXAONE-4.5 | T + IE+ | LGAI-EXAONE/EXAONE-4.5-33B, etc. |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b, etc. |
| Gemma3ForConditionalGeneration | Gemma 3 | T + IE+ | google/gemma-3-4b-it, google/gemma-3-27b-it, etc. |
| Gemma3nForConditionalGeneration | Gemma 3n | T + I + A | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Gemma4ForConditionalGeneration | Gemma 4 | T + I+ + V + A* | google/gemma-4-E2B-it, etc. |
| GLM4VForCausalLM^ | GLM-4V | T + I | zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc. |
| Glm4vForConditionalGeneration | GLM-4.1V-Thinking | T + IE+ + VE+ | zai-org/GLM-4.1V-9B-Thinking, etc. |
| Glm4vMoeForConditionalGeneration | GLM-4.5V | T + IE+ + VE+ | zai-org/GLM-4.5V, etc. |
| GlmOcrForConditionalGeneration | GLM-OCR | T + IE+ | zai-org/GLM-OCR, etc. |
| Granite4VisionForConditionalGeneration | Granite 4 Vision | T + IE+ | ibm-granite/granite-4.1-3b-vision, etc. |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b |
| HCXVisionForCausalLM | HyperCLOVAX-SEED-Vision-Instruct-3B | T + I+ + V+ | naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B |
| HCXVisionV2ForCausalLM | HyperCLOVAX-SEED-Think-32B | T + I+ + V+ | naver-hyperclovax/HyperCLOVAX-SEED-Think-32B |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. |
| HunYuanVLForConditionalGeneration | HunyuanOCR | T + IE+ | tencent/HunyuanOCR, etc. |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3, etc. |
| IsaacForConditionalGeneration | Isaac | T + I+ | PerceptronAI/Isaac-0.1 |
| InternS1ForConditionalGeneration | Intern-S1 | T + IE+ + VE+ | internlm/Intern-S1, internlm/Intern-S1-mini, etc. |
| InternS1ProForConditionalGeneration | Intern-S1-Pro | T + IE+ + VE+ | internlm/Intern-S1-Pro, etc. |
| InternVLChatModel | InternVL 3.5, InternVL 3.0, InternVideo 2.5, InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ + (VE+) | OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. |
| InternVLForConditionalGeneration | InternVL 3.0 (HF format) | T + IE+ + VE+ | OpenGVLab/InternVL3-1B-hf, etc. |
| KananaVForConditionalGeneration | Kanana-V | T + I+ | kakaocorp/kanana-1.5-v-3b-instruct, etc. |
| KeyeForConditionalGeneration | Keye-VL-8B-Preview | T + IE+ + VE+ | Kwai-Keye/Keye-VL-8B-Preview |
| KeyeVL1_5ForConditionalGeneration | Keye-VL-1_5-8B | T + IE+ + VE+ | Kwai-Keye/Keye-VL-1_5-8B |
| KimiAudioForConditionalGeneration | Kimi-Audio | T + A+ | moonshotai/Kimi-Audio-7B-Instruct |
| KimiK25ForConditionalGeneration | Kimi-K2.5 | T + I+ | moonshotai/Kimi-K2.5 |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking |
| LightOnOCRForConditionalGeneration | LightOnOCR-1B | T + I+ | lightonai/LightOnOCR-1B, etc. |
| Lfm2VlForConditionalGeneration | LFM2-VL | T + I+ | LiquidAI/LFM2-VL-450M, LiquidAI/LFM2-VL-3B, LiquidAI/LFM2-VL-8B-A1B, etc. |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. |
| Llama_Nemotron_Nano_VL | Llama Nemotron Nano VL | T + IE+ | nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 |
| LlavaForConditionalGeneration | LLaVA-1.5, Pixtral (HF Transformers) | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), mistral-community/pixtral-12b, etc. |
| LlavaNextForConditionalGeneration | LLaVA-NeXT, Granite Vision | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, ibm-granite/granite-vision-3.3-2b, etc. |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. |
| MiDashengLMModel | MiDashengLM | T + A+ | mispeech/midashenglm-7b |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, openbmb/MiniCPM-V-4, openbmb/MiniCPM-V-4_5, etc. |
| MiniMaxVL01ForConditionalGeneration | MiniMax-VL | T + IE+ | MiniMaxAI/MiniMax-VL-01, etc. |
| Mistral3ForConditionalGeneration | Mistral3 (HF Transformers) | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. |
| MolmoForCausalLM | Molmo | T + I+ | allenai/Molmo-7B-D-0924, allenai/Molmo-7B-O-0924, etc. |
| Molmo2ForConditionalGeneration | Molmo2 | T + I+ / V | allenai/Molmo2-4B, allenai/Molmo2-8B, allenai/Molmo2-O-7B |
| MusicFlamingoForConditionalGeneration | MusicFlamingo | T + A | nvidia/music-flamingo-2601-hf, nvidia/music-flamingo-think-2601-hf |
| NVLM_D_Model | NVLM-D 1.0 | T + I+ | nvidia/NVLM-D-72B, etc. |
| OpenCUAForConditionalGeneration | OpenCUA-7B | T + IE+ | xlangai/OpenCUA-7B |
| OpenPanguVLForConditionalGeneration | openpangu-VL | T + IE+ + VE+ | FreedomIntelligence/openPangu-VL-7B |
| Ovis | Ovis2, Ovis1.6 | T + I+ | AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc. |
| Ovis2_5 | Ovis2.5 | T + I+ + V | AIDC-AI/Ovis2.5-9B, etc. |
| Ovis2_6ForCausalLM | Ovis2.6 | T + I+ + V | AIDC-AI/Ovis2.6-2B, etc. |
| Ovis2_6_MoeForCausalLM | Ovis2.6 | T + I+ + V | AIDC-AI/Ovis2.6-30B-A3B, etc. |
| PaddleOCRVLForConditionalGeneration | Paddle-OCR | T + I+ | PaddlePaddle/PaddleOCR-VL, etc. |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+ / I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. |
| Phi4ForCausalLMV | Phi-4-reasoning-vision | T + I+ | microsoft/Phi-4-reasoning-vision-15B, etc. |
| PixtralForConditionalGeneration | Ministral 3 (Mistral format), Mistral 3 (Mistral format), Mistral Large 3 (Mistral format), Pixtral (Mistral format) | T + I+ | mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-Small-3.1-24B-Instruct-2503, mistralai/Mistral-Large-3-675B-Instruct-2512, mistralai/Pixtral-12B-2409, etc. |
| QwenVLForConditionalGeneration^ | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-3B, Qwen/Qwen2.5-Omni-7B |
| Qwen3_5ForConditionalGeneration | Qwen3.5 | T + IE+ + VE+ | Qwen/Qwen3.5-9B-Instruct, etc. |
| Qwen3_5MoeForConditionalGeneration | Qwen3.5-MOE | T + IE+ + VE+ | Qwen/Qwen3.5-35B-A3B-Instruct, etc. |
| Qwen3VLForConditionalGeneration | Qwen3-VL | T + IE+ + VE+ | Qwen/Qwen3-VL-4B-Instruct, etc. |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE | T + IE+ + VE+ | Qwen/Qwen3-VL-30B-A3B-Instruct, etc. |
| Qwen3OmniMoeThinkerForConditionalGeneration | Qwen3-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen3-Omni-30B-A3B-Instruct, Qwen/Qwen3-Omni-30B-A3B-Thinking |
| Qwen3ASRForConditionalGeneration | Qwen3-ASR | T + A+ | Qwen/Qwen3-ASR-1.7B |
| RForConditionalGeneration | R-VL-4B | T + IE+ | YannQi/R-4B |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct |
| Step3VLForConditionalGeneration | Step3-VL | T + I+ | stepfun-ai/step3 |
| StepVLForConditionalGeneration | Step3-VL-10B | T + I+ | stepfun-ai/Step3-VL-10B |
| TarsierForConditionalGeneration | Tarsier | T + IE+ | omni-research/Tarsier-7b, omni-research/Tarsier-34b |
| Tarsier2ForConditionalGeneration^ | Tarsier2 | T + IE+ + VE+ | omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115 |
| UltravoxModel | Ultravox | T + AE+ | fixie-ai/ultravox-v0_5-llama-3_2-1b |
| Emu3ForConditionalGeneration | Emu3 | T + I | BAAI/Emu3-Chat-hf |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| CohereAsrForConditionalGeneration | Cohere-Transcribe | CohereLabs/cohere-transcribe-03-2026 |
| FireRedASR2ForConditionalGeneration | FireRedASR2 | allendou/FireRedASR2-LLM-vllm, etc. |
| FireRedLIDForConditionalGeneration | FireRedLID | PatchyTisa/FireRedLID-vllm, etc. |
| FunASRForConditionalGeneration | FunASR | allendou/Fun-ASR-Nano-2512-vllm, etc. |
| Gemma3nForConditionalGeneration | Gemma3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| GlmAsrForConditionalGeneration | GLM-ASR | zai-org/GLM-ASR-Nano-2512 |
| GraniteSpeechForConditionalGeneration | Granite Speech | ibm-granite/granite-4.0-1b-speech, ibm-granite/granite-speech-3.3-2b, etc. |
| Qwen3ASRForConditionalGeneration | Qwen3-ASR | Qwen/Qwen3-ASR-1.7B, etc. |
| Qwen3OmniMoeThinkerForConditionalGeneration | Qwen3-Omni | Qwen/Qwen3-Omni-30B-A3B-Instruct, etc. |
| VoxtralForConditionalGeneration | Voxtral (Mistral format) | mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc. |
| WhisperForConditionalGeneration | Whisper | openai/whisper-small, openai/whisper-large-v3-turbo, etc. |
| Architecture | Models | Example HuggingFace Models |
|---|---|---|
| VoxtralRealtimeGeneration | Voxtral Realtime | mistralai/Voxtral-Mini-4B-Realtime-2602 |
| Qwen3ASRRealtimeGeneration | Qwen3-ASR Realtime | Qwen/Qwen3-ASR-0.6B |
| Architecture | Models | Inputs | Example HuggingFace Models |
|---|---|---|---|
| CLIPModel | CLIP | T / I | openai/clip-vit-base-patch32, openai/clip-vit-large-patch14, etc. |
| LlamaNemotronVLModel | Llama Nemotron Embedding + SigLIP | T + I | nvidia/llama-nemotron-embed-vl-1b-v2 |
| LlavaNextForConditionalGenerationC | LLaVA-NeXT-based | T / I | royokong/e5-v |
| Phi3VForCausalLMC | Phi-3-Vision-based | T + I | TIGER-Lab/VLM2Vec-Full |
| Qwen3VLForConditionalGenerationC | Qwen3-VL | T + I + V | Qwen/Qwen3-VL-Embedding-2B, etc. |
| SiglipModel | SigLIP, SigLIP2 | T / I | google/siglip-base-patch16-224, google/siglip2-base-patch16-224 |
| *ForConditionalGenerationC, *ForCausalLMC, etc. | Generative models | * | N/A |
| Architecture | Models | Inputs | Example HuggingFace Models |
|---|---|---|---|
| Qwen2_5_VLForSequenceClassificationC | Qwen2_5_VL-based | T + IE+ + VE+ | muziyongshixin/Qwen2.5-VL-7B-for-VideoCls |
| *ForConditionalGenerationC, *ForCausalLMC, etc. | Generative models | * | N/A |
| Architecture | Models | Inputs | Example HuggingFace Models |
|---|---|---|---|
| JinaVLForSequenceClassification | JinaVL-based | T + IE+ | jinaai/jina-reranker-m0, etc. |
| LlamaNemotronVLForSequenceClassification | Llama Nemotron Reranker + SigLIP | T + IE+ | nvidia/llama-nemotron-rerank-vl-1b-v2 |
| Qwen3VLForSequenceClassification | Qwen3-VL-Reranker | T + IE+ + VE+ | Qwen/Qwen3-VL-Reranker-2B (see note), etc. |
| Architecture | Models | Inputs | Example HuggingFace Models |
|---|---|---|---|
| Qwen3ASRForcedAlignerForTokenClassification | Qwen3-ForcedAligner | T + A+ | Qwen/Qwen3-ForcedAligner-0.6B (see note) |
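Notes:
- `C` indicates that the model can be converted to the corresponding pooling task with `--convert`.
- `*` indicates that the model's capabilities are the same as those of the original model.
- Modalities: T = Text, I = Image, V = Video, A = Audio.
- `+` between modalities (as in `T + I`) indicates that the modalities can be supplied together in one input; `/` indicates that each modality is supported but the modalities cannot be combined in one input.
- A `+` placed directly after a modality (as in `I+`) follows the vLLM documentation's convention that multiple items of that modality can be supplied per prompt.
- `E` indicates that precomputed embeddings can be supplied as input for that modality.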
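The notation in the input column can also be read programmatically, for example to filter the tables for models that accept a given modality. The sketch below is illustrative only: `ModalityInput` and `parse_inputs` are hypothetical names, `/` is assumed to separate whole alternatives, and parentheses and the `*` and `^` markers are ignored.

```python
import re
from dataclasses import dataclass

MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}

# A modality letter, an optional E marker, and an optional trailing + marker.
TOKEN = re.compile(r"([TIVA])(E?)(\+?)")


@dataclass(frozen=True)
class ModalityInput:
    modality: str      # "text", "image", "video", or "audio"
    embeddings: bool   # E marker: precomputed embeddings can be supplied
    plus_suffix: bool  # + written directly after the modality letter


def parse_inputs(spec):
    """Parse an input-column cell such as "T + IE+ + VE+".

    Returns a list of alternatives (split on "/"); each alternative is a
    list of ModalityInput entries that can be combined in one input.
    """
    alternatives = []
    for part in spec.split("/"):
        group = [
            ModalityInput(MODALITIES[letter], bool(emb), bool(plus))
            for letter, emb, plus in TOKEN.findall(part)
        ]
        if group:  # skip cells or fragments without modality letters
            alternatives.append(group)
    return alternatives
```

With this sketch, `parse_inputs("T / I")` yields two alternatives (text only, image only), while `parse_inputs("T + IE+ + VE+")` yields a single alternative that combines text, image, and video.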
vLLM 0.17.1
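The following tables list the model architectures, model names, and example models that this template supports. For details on how to use the listed models and what to watch out for, see the official vLLM documentation.
| Architecture | Models | Example HuggingFace Models |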
|---|---|---|
| LongcatFlashForCausalLM | LongCat-Flash | meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8 |
| Zamba2ForCausalLM | Zamba2 | Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc. |
| MiniMaxText01ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01, etc. |
| MiniMaxM1ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc. |
| XverseForCausalLM | XVERSE | xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc. |
| TeleFLMForCausalLM | TeleFLM | CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc. |
| TeleChat2ForCausalLM | TeleChat2 | Tele-AI/TeleChat2-3B, Tele-AI/TeleChat2-7B, Tele-AI/TeleChat2-35B, etc. |
| TeleChatForCausalLM | TeleChat | chuhac/TeleChat2-35B, etc. |
| Step1ForCausalLM | Step-Audio | stepfun-ai/Step-Audio-EditX, etc. |
| Step3p5ForCausalLM | Step-3.5-flash | stepfun-ai/step-3.5-flash, etc. |
| SolarForCausalLM | Solar Pro | upstage/solar-pro-preview-instruct, etc. |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc. |
| StableLMEpochForCausalLM | StableLM Epoch | stabilityai/stablelm-zephyr-3b, etc. |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc. |
| SeedOssForCausalLM | SeedOss | ByteDance-Seed/Seed-OSS-36B-Instruct, etc. |
| RWForCausalLM | Falcon RW | tiiuae/falcon-40b, etc. |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc. |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc. |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc. |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B, etc. |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-MoE-15B-A2B, etc. |
| Qwen3NextForCausalLM | Qwen3NextMoE | Qwen/Qwen3-Next-80B-A3B-Instruct, etc. |
| Plamo3ForCausalLM | PLaMo3 | pfnet/plamo-3-nict-2b-base, pfnet/plamo-3-nict-8b-base, etc. |
| Plamo2ForCausalLM | PLaMo2 | pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc. |
| PersimmonForCausalLM | Persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat, etc. |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct , etc. |
| PhiForCausalLM | Phi | microsoft/phi-1_5 , microsoft/phi-2 , etc. |
| Phi3ForCausalLM | Phi-4, Phi-3 | microsoft/Phi-4 , microsoft/Phi-3-mini-4k-instruct , microsoft/Phi-3-mini-128k-instruct , microsoft/Phi-3-medium-128k-instruct , etc. |
| PanguUltraMoEForCausalLM | openpangu-ultra-moe-718b-model | FreedomIntelligence/openPangu-Ultra-MoE-718B-V1.1 |
| PanguProMoEV2ForCausalLM | openpangu-pro-moe-v2 | - |
| PanguEmbeddedForCausalLM | openPangu-Embedded-7B | FreedomIntelligence/openPangu-Embedded-7B-V1.1 |
| OuroForCausalLM | ouro | ByteDance/Ouro-1.4B, ByteDance/Ouro-2.6B, etc. |
| OrionForCausalLM | Orion | OrionStarAI/Orion-14B-Base , OrionStarAI/Orion-14B-Chat , etc. |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b , facebook/opt-iml-max-30b , etc. |
| OlmoForCausalLM | OLMo | allenai/OLMo-1B-hf , allenai/OLMo-7B-hf , etc. |
| OlmoeForCausalLM | OLMoE | allenai/OLMoE-1B-7B-0924 , allenai/OLMoE-1B-7B-0924-Instruct , etc. |
| Olmo2ForCausalLM | OLMo2 | allenai/OLMo2-7B-1124 , etc. |
| Olmo3ForCausalLM | OLMo3 | TBA |
| NemotronHForCausalLM | Nemotron-H | nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc. |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base , mgoin/Nemotron-4-340B-Base-hf-FP8 , etc. |
| MPTForCausalLM | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | mosaicml/mpt-7b , mosaicml/mpt-7b-storywriter , mosaicml/mpt-30b , etc. |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1 , mistralai/Mixtral-8x7B-Instruct-v0.1 , mistral-community/Mixtral-8x22B-v0.1 , etc. |
| MistralLarge3ForCausalLM | Mistral-Large-3-675B-Base-2512, Mistral-Large-3-675B-Instruct-2512 | mistralai/Mistral-Large-3-675B-Base-2512, mistralai/Mistral-Large-3-675B-Instruct-2512, etc. |
| MistralForCausalLM | Ministral-3, Mistral, Mistral-Instruct | mistralai/Ministral-3-3B-Instruct-2512, mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc. |
| MiniMaxForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01-hf, etc. |
| MiniMaxM2ForCausalLM | MiniMax-M2, MiniMax-M2.1 | MiniMaxAI/MiniMax-M2, etc. |
| MiniCPM3ForCausalLM | MiniCPM3 | openbmb/MiniCPM3-4B , etc. |
| MiniCPMForCausalLM | MiniCPM | openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc. |
| MiMoV2FlashForCausalLM | MiMoV2Flash | XiaomiMiMo/MiMo-V2-Flash, etc. |
| MiMoForCausalLM | MiMo | XiaomiMiMo/MiMo-7B-RL, etc. |
| MambaForCausalLM | Mamba | state-spaces/mamba-130m-hf , state-spaces/mamba-790m-hf , state-spaces/mamba-2.8b-hf , etc. |
| Mamba2ForCausalLM | Mamba2 | mistralai/Mamba-Codestral-7B-v0.1, etc. |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct , meta-llama/Meta-Llama-3.1-70B , meta-llama/Meta-Llama-3-70B-Instruct , meta-llama/Llama-2-70b-hf , 01-ai/Yi-34B , etc. |
| Lfm2MoeForCausalLM | LFM2MoE | LiquidAI/LFM2-8B-A1B-preview, etc. |
| Lfm2ForCausalLM | LFM2 | LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc. |
| KimiLinearForCausalLM | Kimi-Linear-48B-A3B-Base, Kimi-Linear-48B-A3B-Instruct | moonshotai/Kimi-Linear-48B-A3B-Base, moonshotai/Kimi-Linear-48B-A3B-Instruct |
| JambaForCausalLM | Jamba | ai21labs/AI21-Jamba-1.5-Large , ai21labs/AI21-Jamba-1.5-Mini , ai21labs/Jamba-v0.1 , etc. |
| Jais2ForCausalLM | Jais2 | inceptionai/Jais-2-8B-Chat, inceptionai/Jais-2-70B-Chat, etc. |
| JAISLMHeadModel | Jais | inceptionai/jais-13b , inceptionai/jais-13b-chat , inceptionai/jais-30b-v3 , inceptionai/jais-30b-chat-v3 , etc. |
| IQuestCoderForCausalLM | IQuestCoderV1 | IQuestLab/IQuest-Coder-V1-40B-Instruct, etc. |
| IQuestLoopCoderForCausalLM | IQuestLoopCoderV1 | IQuestLab/IQuest-Coder-V1-40B-Loop-Instruct, etc. |
| InternLM3ForCausalLM | InternLM3 | internlm/internlm3-8b-instruct , etc. |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b , internlm/internlm2-chat-7b , etc. |
| InternLMForCausalLM | InternLM | internlm/internlm-7b, internlm/internlm-chat-7b, etc. |
| HunYuanMoEV1ForCausalLM | Hunyuan-A13B | tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc. |
| HunYuanDenseV1ForCausalLM | Hunyuan Dense | tencent/Hunyuan-7B-Instruct-0124 |
| Grok1ForCausalLM | Grok2 | xai-org/grok-2 |
| Grok1ModelForCausalLM | Grok1 | hpcai-tech/grok-1 |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GraniteMoeSharedForCausalLM | Granite MoE Shared | ibm-research/moe-7b-1b-active-shared-experts (test model) |
| GraniteMoeHybridForCausalLM | Granite 4.0 MoE Hybrid | ibm-granite/granite-4.0-tiny-preview, etc. |
| GraniteMoeForCausalLM | Granite 3.0 MoE, PowerMoE | ibm-granite/granite-3.0-1b-a400m-base , ibm-granite/granite-3.0-3b-a800m-instruct , ibm/PowerMoE-3b , etc. |
| GraniteForCausalLM | Granite 3.0, Granite 3.1, PowerLM | ibm-granite/granite-3.0-2b-base , ibm-granite/granite-3.1-8b-instruct , ibm/PowerLM-3b , etc. |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-120b, openai/gpt-oss-20b |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b , EleutherAI/pythia-12b , OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 , databricks/dolly-v2-12b , stabilityai/stablelm-tuned-alpha-7b , etc. |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b , nomic-ai/gpt4all-j , etc. |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder , bigcode/gpt_bigcode-santacoder , WizardLM/WizardCoder-15B-V1.0 , etc. |
| GPT2LMHeadModel | GPT-2 | gpt2 , gpt2-xl , etc. |
| Glm4MoeLiteForCausalLM | GLM-4.7-Flash | zai-org/GLM-4.7-Flash, etc. |
| Glm4MoeForCausalLM | GLM-4.5, GLM-4.6, GLM-4.7 | zai-org/GLM-4.5, etc. |
| Glm4ForCausalLM | GLM-4-0414 | THUDM/GLM-4-32B-0414, etc. |
| GlmForCausalLM | GLM-4 | THUDM/glm-4-9b-chat-hf , etc. |
| Gemma3nForCausalLM | Gemma 3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it, etc. |
| Gemma2ForCausalLM | Gemma 2 | google/gemma-2-9b, google/gemma-2-27b, etc. |
| GemmaForCausalLM | Gemma | google/gemma-2b , google/gemma-7b , etc. |
| FlexOlmoForCausalLM | FlexOlmo | allenai/FlexOlmo-7x7B-1T, allenai/FlexOlmo-7x7B-1T-RT, etc. |
| FalconH1ForCausalLM | Falcon-H1 | tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc. |
| FalconMambaForCausalLM | FalconMamba | tiiuae/falcon-mamba-7b , tiiuae/falcon-mamba-7b-instruct , etc. |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b , tiiuae/falcon-40b , tiiuae/falcon-rw-7b , etc. |
| Fairseq2LlamaForCausalLM | Llama (fairseq2 format) | mgleize/fairseq2-dummy-Llama-3.2-1B, etc. |
| Exaone4ForCausalLM | EXAONE-4 | LGAI-EXAONE/EXAONE-4.0-32B, etc. |
| ExaoneMoEForCausalLM | K-EXAONE | LGAI-EXAONE/K-EXAONE-236B-A23B, etc. |
| ExaoneForCausalLM | EXAONE-3 | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct , etc. |
| Ernie4_5_MoeForCausalLM | Ernie4.5MoE | baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc. |
| Ernie4_5ForCausalLM | Ernie4.5 | baidu/ERNIE-4.5-0.3B-PT, etc. |
| DotsOCRForCausalLM | dots_ocr | rednote-hilab/dots.ocr |
| Dots1ForCausalLM | dots.llm1 | rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc. |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc. |
| DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc. |
| DeepseekForCausalLM | DeepSeek | deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc. |
| DeciLMForCausalLM | DeciLM | Deci/DeciLM-7B , Deci/DeciLM-7B-instruct , etc. |
| DbrxForCausalLM | DBRX | databricks/dbrx-base , databricks/dbrx-instruct , etc. |
| CwmForCausalLM | CWM | facebook/cwm, etc. |
| CohereForCausalLM , Cohere2ForCausalLM | Command-R, Command-A | CohereForAI/c4ai-command-r-v01 , CohereForAI/c4ai-command-r7b-12-2024 , etc. |
| ChatGLMModel, ChatGLMForConditionalGeneration | ChatGLM | THUDM/chatglm2-6b , THUDM/chatglm3-6b , etc. |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom , bigscience/bloomz , etc. |
| BambaForCausalLM | Bamba | ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B |
| BailingMoeV2ForCausalLM | Ling | inclusionAI/Ling-mini-2.0, etc. |
| BailingMoeForCausalLM | Ling | inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc. |
| BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat , baichuan-inc/Baichuan-7B , etc. |
| AXK1ForCausalLM | A.X-K1 | skt/A.X-K1, etc. |
| ArcticForCausalLM | Arctic | Snowflake/snowflake-arctic-base , Snowflake/snowflake-arctic-instruct , etc. |
| ArceeForCausalLM | Arcee (AFM) | arcee-ai/AFM-4.5B-Base, etc. |
| AquilaForCausalLM | Aquila, Aquila2 | BAAI/Aquila-7B , BAAI/AquilaChat-7B , etc. |
| ApertusForCausalLM | Apertus | swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc. |
| AfmoeForCausalLM | Afmoe | TBA |
| BailingMoeV2_5ForCausalLM | Ling-V2.5 / Ring-V2.5 | inclusionAI/Ling-mini-2.5, inclusionAI/Ring-mini-2.5, etc. |
| SmolLM3ForCausalLM | SmolLM3 | HuggingFaceTB/SmolLM3-3B, etc. |
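The architecture names in the first column of the table above are the values that a Hugging Face model normally declares in the `architectures` field of its `config.json`. The following minimal Python sketch reads that field from a local model directory and compares it with a compatibility set before upload. The set `SUPPORTED_ARCHITECTURES` contains only a few rows copied from the table for illustration, and the function names are hypothetical; this is not a platform API.

```python
import json
from pathlib import Path

# Illustrative subset of the architecture column; extend it with the rows you need.
SUPPORTED_ARCHITECTURES = {
    "Qwen3ForCausalLM",
    "LlamaForCausalLM",
    "DeepseekV3ForCausalLM",
}


def read_architectures(model_dir):
    """Return the `architectures` list declared in <model_dir>/config.json."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # The field may be absent or null in some checkpoints.
    return list(config.get("architectures") or [])


def is_compatible(model_dir, supported=SUPPORTED_ARCHITECTURES):
    """True if any declared architecture appears in the supported set."""
    return any(arch in supported for arch in read_architectures(model_dir))
```

Calling `is_compatible("/path/to/model")` on a downloaded checkpoint gives a quick pre-upload check; a negative result only means that the architecture is missing from the set you supplied.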

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| BertModelC | BERT-based | BAAI/bge-base-en-v1.5 , etc. |
| BertSpladeSparseEmbeddingModel | SPLADE | naver/splade-v3 |
| Gemma2ModelC | Gemma 2-based | BAAI/bge-multilingual-gemma2 , etc. |
| Gemma3TextModelC | Gemma 3-based | google/embeddinggemma-300m, etc. |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GteModelC | Arctic-Embed-2.0-M | Snowflake/snowflake-arctic-embed-m-v2.0 |
| GteNewModelC | mGTE-TRM | Alibaba-NLP/gte-multilingual-base, etc. |
| ModernBertModelC | ModernBERT-based | Alibaba-NLP/gte-modernbert-base, etc. |
| NomicBertModelC | Nomic BERT | nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc. |
| LlamaBidirectionalModelC | Llama-based with bidirectional attention | nvidia/llama-nemotron-embed-1b-v2, etc. |
| LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. | Llama-based | intfloat/e5-mistral-7b-instruct , etc. |
| Qwen2ModelC, Qwen2ForCausalLMC | Qwen2-based | ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc. |
| Qwen3ModelC, Qwen3ForCausalLMC | Qwen3-based | Qwen/Qwen3-Embedding-0.6B, etc. |
| RobertaModel, RobertaForMaskedLM | RoBERTa-based | sentence-transformers/all-roberta-large-v1, etc. |
| VoyageQwen3BidirectionalEmbedModelC | Voyage Qwen3-based with bidirectional attention | voyageai/voyage-4-nano, etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
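Note:
- C indicates that the model can be converted into an embedding model via --convert embed.
- * indicates that the model's functionality is the same as that of the original model.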
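The notes in this section name three conversion values: `--convert embed`, `--convert reward`, and `--convert classify`. The following minimal Python sketch assembles a `vllm serve` command line with one of these values. It assumes that you launch vLLM directly from the command line; how a system inference template on the platform passes startup arguments is not described here, and the helper name `build_serve_command` is hypothetical.

```python
# Conversion values mentioned in the notes of this section.
CONVERT_MODES = {"embed", "reward", "classify"}


def build_serve_command(model, convert=None):
    """Build an argument list for `vllm serve`, optionally adding --convert."""
    command = ["vllm", "serve", model]
    if convert is not None:
        if convert not in CONVERT_MODES:
            raise ValueError(f"unsupported convert mode: {convert!r}")
        command += ["--convert", convert]
    return command


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without vLLM installed.
    print(" ".join(build_serve_command("Qwen/Qwen3-8B", convert="embed")))
```

The resulting list can be passed to `subprocess.run` on a machine where vLLM is installed; whether a given checkpoint yields useful embeddings after conversion depends on the model.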

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward , internlm/internlm2-7b-reward , etc. |
| LlamaForCausalLMC | Llama-based | peiyi9979/math-shepherd-mistral-7b-prm , etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B , etc. |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B , Qwen/Qwen2.5-Math-PRM-72B , etc. |
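Note:
- C indicates that the model can be converted into a reward model via --convert reward.
- * indicates that the model's functionality is the same as that of the original model.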

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev , etc. |
| GPT2ForSequenceClassification | GPT2 | nie3e/sentiment-polish-gpt2-small |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
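Note:
- C indicates that the model can be converted into a classification model via --convert classify.
- * indicates that the model's functionality is the same as that of the original model.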

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2 , etc. |
| GemmaForSequenceClassification | Gemma-based | BAAI/bge-reranker-v2-gemma, etc. |
| GteNewForSequenceClassification | mGTE-TRM | Alibaba-NLP/gte-multilingual-reranker-base, etc. |
| LlamaBidirectionalForSequenceClassificationC | Llama-based with bidirectional attention | nvidia/llama-nemotron-rerank-1b-v2, etc. |
| Qwen2ForSequenceClassificationC | Qwen2-based | mixedbread-ai/mxbai-rerank-base-v2, etc. |
| Qwen3ForSequenceClassificationC | Qwen3-based | tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base , etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3 , etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
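Note:
- C indicates that the model can be converted into a classification model via --convert classify.
- * indicates that the model's functionality is the same as that of the original model.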

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| BertForTokenClassification | bert-based | boltuix/NeuroBERT-NER, etc. |
| ModernBertForTokenClassification | ModernBERT-based | disham993/electrical-ner-ModernBERT-base |

| Architecture | Model | Inputs | HuggingFace model examples | Notes |
|---|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria | |
| AudioFlamingo3ForConditionalGeneration | AudioFlamingo3 | T + A+ | nvidia/audio-flamingo-3-hf, nvidia/music-flamingo-hf | |
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc. | |
| BagelForConditionalGeneration | BAGEL | T + I+ | ByteDance-Seed/BAGEL-7B-MoT | |
| BeeForConditionalGeneration | Bee-8B | T + IE+ | | |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. | |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b etc. | |
| Cohere2VisionForConditionalGeneration | Command A Vision | T + I+ | CohereLabs/command-a-vision-07-2025, etc. | |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2 etc. | |
| DeepseekOCRForCausalLM | DeepSeek-OCR | T + I+ | deepseek-ai/DeepSeek-OCR, etc. | |
| DeepseekOCR2ForCausalLM | DeepSeek-OCR-2 | T + I+ | deepseek-ai/DeepSeek-OCR-2, etc. | |
| Eagle2_5_VLForConditionalGeneration | Eagle2.5-VL | T + IE+ | nvidia/Eagle2.5-8B, etc. | |
| Ernie4_5_VLMoeForConditionalGeneration | Ernie4.5-VL | T + I+/ V+ | baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT | |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b etc. | |
| Gemma3ForConditionalGeneration | Gemma 3 | T + I+ | google/gemma-3-4b-it, google/gemma-3-27b-it, etc. | |
| Gemma3nForConditionalGeneration | Gemma 3n | T + I + A | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. | |
| GLM4VForCausalLM^ | GLM-4V | T + I | zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc. | |
| Glm4vForConditionalGeneration | GLM-4.1V-Thinking | T + IE+ + VE+ | zai-org/GLM-4.1V-9B-Thinking, etc. | |
| Glm4vMoeForConditionalGeneration | GLM-4.5V | T + IE+ + VE+ | zai-org/GLM-4.5V, etc. | |
| GlmOcrForConditionalGeneration | GLM-OCR | T + IE+ | zai-org/GLM-OCR, etc. | |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b | |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. | |
| HCXVisionForCausalLM | HyperCLOVAX-SEED-Vision-Instruct-3B | T + I+ + V+ | naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B | |
| HunYuanVLForConditionalGeneration | HunyuanOCR | T + IE+ | tencent/HunyuanOCR, etc. | |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3 etc. | |
| InternS1ForConditionalGeneration | Intern-S1 | T + IE+ + VE+ | internlm/Intern-S1, etc. | |
| InternS1ProForConditionalGeneration | Intern-S1-Pro | T + IE+ + VE+ | internlm/Intern-S1-Pro, etc. | |
| InternVLChatModel | InternVL 3.5, InternVL 3.0, InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ + (VE+) | OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. | |
| InternVLForConditionalGeneration | InternVL 3.0 (HF format) | T + IE+ + VE+ | OpenGVLab/InternVL3-1B-hf, etc. | |
| IsaacForConditionalGeneration | Isaac | T + I+ | PerceptronAI/Isaac-0.1 | |
| KananaVForConditionalGeneration | Kanana-V | T + I+ | kakaocorp/kanana-1.5-v-3b-instruct, etc. | |
| KeyeForConditionalGeneration | Keye-VL-8B-Preview | T + IE+ + VE+ | Kwai-Keye/Keye-VL-8B-Preview | |
| KeyeVL1_5ForConditionalGeneration | Keye-VL-1_5-8B | T + IE+ + VE+ | Kwai-Keye/Keye-VL-1_5-8B | |
| KimiK25ForConditionalGeneration | Kimi-K2.5 | T + I+ | moonshotai/Kimi-K2.5 | |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking | |
| LightOnOCRForConditionalGeneration | LightOnOCR-1B | T + I+ | lightonai/LightOnOCR-1B, etc. | |
| Lfm2VlForConditionalGeneration | LFM2-VL | T + I+ | LiquidAI/LFM2-VL-450M, LiquidAI/LFM2-VL-3B, LiquidAI/LFM2-VL-8B-A1B, etc. | |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. | |
| Llama_Nemotron_Nano_VL | Llama Nemotron Nano VL | T + IE+ | nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 | |
| LlavaForConditionalGeneration | LLaVA-1.5, Pixtral (HF Transformers) | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc. | |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. | |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. | |
| MiDashengLMModel | MiDashengLM | T + A+ | mispeech/midashenglm-7b | |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. | |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc. | |
| MiniMaxVL01ForConditionalGeneration | MiniMax-VL | T + IE+ | MiniMaxAI/MiniMax-VL-01, etc. | |
| Mistral3ForConditionalGeneration | Mistral3 (HF Transformers) | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. | |
| MolmoForCausalLM | Molmo | T + I+ | allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc. | |
| Molmo2ForConditionalGeneration | Molmo2 | T + I+/ V | allenai/Molmo2-4B, allenai/Molmo2-8B, allenai/Molmo2-O-7B | |
| NVLM_D_Model | NVLM-D 1.0 | T + IE+ | nvidia/NVLM-D-72B, etc. | |
| OpenCUAForConditionalGeneration | OpenCUA-7B | T + IE+ | xlangai/OpenCUA-7B | |
| OpenPanguVLForConditionalGeneration | openpangu-VL | T + IE+ + VE+ | FreedomIntelligence/openPangu-VL-7B | |
| Ovis | Ovis2, Ovis1.6 | T + I+ | AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc. | |
| Ovis2_5 | Ovis2.5 | T + I+ + V | AIDC-AI/Ovis2.5-9B, etc. | |
| Ovis2_6ForCausalLM | Ovis2.6 | T + I+ + V | AIDC-AI/Ovis2.6-2B, etc. | |
| Ovis2_6_MoeForCausalLM | Ovis2.6 | T + I+ + V | AIDC-AI/Ovis2.6-30B-A3B, etc. | |
| PaddleOCRVLForConditionalGeneration | Paddle-OCR | T + I+ | PaddlePaddle/PaddleOCR-VL, etc. | |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. | |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+/ I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. | |
| PixtralForConditionalGeneration | Ministral 3 (Mistral format), Mistral 3 (Mistral format), Mistral Large 3 (Mistral format), Pixtral (Mistral format) | T + I+ | mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc. | |
| QwenVLForConditionalGeneration | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct | |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. | |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-7B | |
| Qwen3VLForConditionalGeneration | Qwen3-VL | T + IE+ + VE+ | Qwen/Qwen3-VL-4B-Instruct, etc. | |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE | T + IE+ + VE+ | Qwen/Qwen3-VL-30B-A3B-Instruct, etc. | |
| Qwen3OmniMoeThinkerForConditionalGeneration | Qwen3-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen3-Omni-30B-A3B-Instruct, Qwen/Qwen3-Omni-30B-A3B-Thinking | |
| Qwen3_5ForConditionalGeneration | Qwen3.5 | T + IE+ + VE+ | Qwen/Qwen3.5-9B-Instruct, etc. | |
| Qwen3_5MoeForConditionalGeneration | Qwen3.5-MOE | T + IE+ + VE+ | Qwen/Qwen3.5-35B-A3B-Instruct, etc. | |
| RForConditionalGeneration | R-VL-4B | T + IE+ | YannQi/R-4B | |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B | |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct | |
| Step3VLForConditionalGeneration | Step3-VL | T + I+ | stepfun-ai/step3 | |
| StepVLForConditionalGeneration | Step3-VL-10B | T + I+ | stepfun-ai/Step3-VL-10B | |
| TarsierForConditionalGeneration | Tarsier | T + IE+ | omni-search/Tarsier-7b, omni-search/Tarsier-34b | |
| Tarsier2ForConditionalGeneration^ | Tarsier2 | T + IE+ + VE+ | omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115 | |
| UltravoxModel | Ultravox | T + AE+ | fixie-ai/ultravox-v0_5-llama-3_2-1b | |
| Emu3ForConditionalGeneration | Emu3 | T + I+ | BAAI/Emu3-Chat | |

| Architecture | Model | HuggingFace model examples |
|---|---|---|
| FireRedASR2ForConditionalGeneration | FireRedASR2 | allendou/FireRedASR2-LLM-vllm, etc. |
| FunASRForConditionalGeneration | FunASR | allendou/Fun-ASR-Nano-2512-vllm, etc. |
| Gemma3nForConditionalGeneration | Gemma3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| GlmAsrForConditionalGeneration | GLM-ASR | zai-org/GLM-ASR-Nano-2512 |
| GraniteSpeechForConditionalGeneration | Granite Speech | ibm-granite/granite-speech-3.3-2b, ibm-granite/granite-speech-3.3-8b, etc. |
| Qwen3ASRForConditionalGeneration | Qwen3-ASR | Qwen/Qwen3-ASR-1.7B, etc. |
| Qwen3OmniMoeThinkerForConditionalGeneration | Qwen3-Omni | Qwen/Qwen3-Omni-30B-A3B-Instruct, etc. |
| VoxtralForConditionalGeneration | Voxtral (Mistral format) | mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc. |
| WhisperForConditionalGeneration | Whisper | openai/whisper-small, openai/whisper-large-v3-turbo, etc. |

| Architecture | Model | Inputs | HuggingFace model examples | Notes |
|---|---|---|---|---|
| CLIPModel | CLIP | T / I | openai/clip-vit-base-patch32, openai/clip-vit-large-patch14, etc. | |
| ColModernVBertForRetrieval | ColModernVBERT | T / I | ModernVBERT/colmodernvbert-merged | |
| LlamaNemotronVLModel | Llama Nemotron Embedding + SigLIP | T + I | nvidia/llama-nemotron-embed-vl-1b-v2 | |
| LlavaNextForConditionalGenerationC | LLaVA-NeXT-based | T / I | royokong/e5-v | |
| Phi3VForCausalLMC | Phi-3-Vision-based | T + I | TIGER-Lab/VLM2Vec-Full | |
| Qwen3VLForConditionalGenerationC | Qwen3-VL | T + I + V | Qwen/Qwen3-VL-Embedding-2B, etc. | |
| SiglipModel | SigLIP, SigLIP2 | T / I | google/siglip-base-patch16-224, google/siglip2-base-patch16-224 | |
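| *ForConditionalGenerationC, *ForCausalLMC, etc. | Generative models | / | N/A | |
Note:
- C indicates that the model can be converted into an embedding model via --convert embed.
- * indicates that the model's functionality is the same as that of the original model.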

| Architecture | Model | Inputs | HuggingFace model examples | Notes |
|---|---|---|---|---|
| JinaVLForSequenceClassification | JinaVL-based | T + IE+ | jinaai/jina-reranker-m0, etc. | |
| LlamaNemotronVLForSequenceClassification | Llama Nemotron Reranker + SigLIP | T + IE+ | nvidia/llama-nemotron-rerank-vl-1b-v2 | |
| Qwen3VLForSequenceClassification | Qwen3-VL-Reranker | T + IE+ + VE+ | Qwen/Qwen3-VL-Reranker-2B, etc. | |
vLLM 0.11.0
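The following lists the model architectures, names, and examples compatible with this template. For more details on usage and caveats for each type of model in the list, see the official vLLM documentation.
| Architecture | Model | HuggingFace model examples |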
|---|---|---|
| Zamba2ForCausalLM | Zamba2 | Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc. |
| LongcatFlashForCausalLM | LongCat-Flash | meituan-longcat/LongCat-Flash-Chat, meituan-longcat/LongCat-Flash-Chat-FP8 |
| MiniMaxText01ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01, etc. |
| MiniMaxM1ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc. |
| XverseForCausalLM | XVERSE | xverse/XVERSE-7B-Chat , xverse/XVERSE-13B-Chat , xverse/XVERSE-65B-Chat , etc. |
| TeleFLMForCausalLM | TeleFLM | CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc. |
| TeleChat2ForCausalLM | TeleChat2 | TeleAI/TeleChat2-3B , TeleAI/TeleChat2-7B , TeleAI/TeleChat2-35B , etc. |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b , bigcode/starcoder2-7b , bigcode/starcoder2-15b , etc. |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t , stabilityai/stablelm-base-alpha-7b-v2 , etc. |
| SolarForCausalLM | Solar Pro | upstage/solar-pro-preview-instruct , etc. |
| SeedOssForCausalLM | SeedOss | ByteDance-Seed/Seed-OSS-36B-Instruct, etc. |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B , Qwen/Qwen-7B-Chat , etc. |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B , Qwen/Qwen1.5-MoE-A2.7B-Chat , etc. |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/QwQ-32B-Preview , Qwen/Qwen2-7B-Instruct , Qwen/Qwen2-7B , etc. |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B, etc. |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-MoE-15B-A2B, etc. |
| Qwen3NextForCausalLM | Qwen3NextMoE | Qwen/Qwen3-Next-80B-A3B-Instruct, etc. |
| Plamo2ForCausalLM | PLaMo2 | pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc. |
| PersimmonForCausalLM | Persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat, etc. |
| Phi4FlashForCausalLM | Phi-4-mini-flash-reasoning | microsoft/Phi-4-mini-instruct, etc. |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct , etc. |
| PhiForCausalLM | Phi | microsoft/phi-1_5 , microsoft/phi-2 , etc. |
| Phi3SmallForCausalLM | Phi-3-Small | microsoft/Phi-3-small-8k-instruct , microsoft/Phi-3-small-128k-instruct , etc. |
| Phi3ForCausalLM | Phi-4, Phi-3 | microsoft/Phi-4 , microsoft/Phi-3-mini-4k-instruct , microsoft/Phi-3-mini-128k-instruct , microsoft/Phi-3-medium-128k-instruct , etc. |
| OrionForCausalLM | Orion | OrionStarAI/Orion-14B-Base , OrionStarAI/Orion-14B-Chat , etc. |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b , facebook/opt-iml-max-30b , etc. |
| OlmoForCausalLM | OLMo | allenai/OLMo-1B-hf , allenai/OLMo-7B-hf , etc. |
| OlmoeForCausalLM | OLMoE | allenai/OLMoE-1B-7B-0924 , allenai/OLMoE-1B-7B-0924-Instruct , etc. |
| Olmo2ForCausalLM | OLMo2 | allenai/OLMo2-7B-1124 , etc. |
| Olmo3ForCausalLM | OLMo3 | TBA |
| NemotronHForCausalLM | Nemotron-H | nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc. |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base , mgoin/Nemotron-4-340B-Base-hf-FP8 , etc. |
| MPTForCausalLM | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | mosaicml/mpt-7b , mosaicml/mpt-7b-storywriter , mosaicml/mpt-30b , etc. |
| MotifForCausalLM | Motif-1-Tiny | Motif-Technologies/Motif-2.6B, Motif-Technologies/Motif-2.6b-v1.1-LC, etc. |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1 , mistralai/Mixtral-8x7B-Instruct-v0.1 , mistral-community/Mixtral-8x22B-v0.1 , etc. |
| MistralForCausalLM | Mistral, Mistral-Instruct | mistralai/Mistral-7B-v0.1 , mistralai/Mistral-7B-Instruct-v0.1 , etc. |
| MiniCPM3ForCausalLM | MiniCPM3 | openbmb/MiniCPM3-4B , etc. |
| MiniCPMForCausalLM | MiniCPM | openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc. |
| MambaForCausalLM | Mamba | state-spaces/mamba-130m-hf , state-spaces/mamba-790m-hf , state-spaces/mamba-2.8b-hf , etc. |
| Mamba2ForCausalLM | Mamba2 | mistralai/Mamba-Codestral-7B-v0.1, etc. |
| MiMoForCausalLM | MiMo | XiaomiMiMo/MiMo-7B-RL, etc. |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct , meta-llama/Meta-Llama-3.1-70B , meta-llama/Meta-Llama-3-70B-Instruct , meta-llama/Llama-2-70b-hf , 01-ai/Yi-34B , etc. |
| Lfm2ForCausalLM | LFM2 | LiquidAI/LFM2-1.2B, LiquidAI/LFM2-700M, LiquidAI/LFM2-350M, etc. |
| JambaForCausalLM | Jamba | ai21labs/AI21-Jamba-1.5-Large , ai21labs/AI21-Jamba-1.5-Mini , ai21labs/Jamba-v0.1 , etc. |
| JAISLMHeadModel | Jais | inceptionai/jais-13b , inceptionai/jais-13b-chat , inceptionai/jais-30b-v3 , inceptionai/jais-30b-chat-v3 , etc. |
| InternLM3ForCausalLM | InternLM3 | internlm/internlm3-8b-instruct , etc. |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b , internlm/internlm2-chat-7b , etc. |
| InternLMForCausalLM | InternLM | internlm/internlm-7b, internlm/internlm-chat-7b, etc. |
| HCXVisionForCausalLM | HyperCLOVAX-SEED-Vision-Instruct-3B | naver-hyperclovax/HyperCLOVAX-SEED-Vision-Instruct-3B |
| HunYuanMoEV1ForCausalLM | Hunyuan-80B-A13B | tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc. |
| HunYuanDenseV1ForCausalLM | Hunyuan-7B-Instruct-0124 | tencent/Hunyuan-7B-Instruct-0124 |
| Grok1ModelForCausalLM | Grok1 | hpcai-tech/grok-1 |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GraniteMoeSharedForCausalLM | Granite MoE Shared | ibm-research/moe-7b-1b-active-shared-experts (test model) |
| GraniteMoeHybridForCausalLM | Granite 4.0 MoE Hybrid | ibm-granite/granite-4.0-tiny-preview, etc. |
| GraniteMoeForCausalLM | Granite 3.0 MoE, PowerMoE | ibm-granite/granite-3.0-1b-a400m-base , ibm-granite/granite-3.0-3b-a800m-instruct , ibm/PowerMoE-3b , etc. |
| GraniteForCausalLM | Granite 3.0, Granite 3.1, PowerLM | ibm-granite/granite-3.0-2b-base , ibm-granite/granite-3.1-8b-instruct , ibm/PowerLM-3b , etc. |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-120b, openai/gpt-oss-20b |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b , EleutherAI/pythia-12b , OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 , databricks/dolly-v2-12b , stabilityai/stablelm-tuned-alpha-7b , etc. |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b , nomic-ai/gpt4all-j , etc. |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder , bigcode/gpt_bigcode-santacoder , WizardLM/WizardCoder-15B-V1.0 , etc. |
| GPT2LMHeadModel | GPT-2 | gpt2 , gpt2-xl , etc. |
| Glm4MoeForCausalLM | GLM-4.5 | zai-org/GLM-4.5, etc. |
| Glm4ForCausalLM | GLM-4-0414 | THUDM/GLM-4-32B-0414, etc. |
| GlmForCausalLM | GLM-4 | THUDM/glm-4-9b-chat-hf , etc. |
| Gemma3nForCausalLM | Gemma 3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it, etc. |
| Gemma2ForCausalLM | Gemma 2 | google/gemma-2-9b, google/gemma-2-27b, etc. |
| GemmaForCausalLM | Gemma | google/gemma-2b , google/gemma-7b , etc. |
| FalconH1ForCausalLM | Falcon-H1 | tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc. |
| FalconMambaForCausalLM | FalconMamba | tiiuae/falcon-mamba-7b , tiiuae/falcon-mamba-7b-instruct , etc. |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b , tiiuae/falcon-40b , tiiuae/falcon-rw-7b , etc. |
| Fairseq2LlamaForCausalLM | Llama (fairseq2 format) | mgleize/fairseq2-dummy-Llama-3.2-1B, etc. |
| Exaone4ForCausalLM | EXAONE-4 | LGAI-EXAONE/EXAONE-4.0-32B, etc. |
| ExaoneForCausalLM | EXAONE-3 | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct , etc. |
| Ernie4_5_MoeForCausalLM | Ernie4.5MoE | baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc. |
| Ernie4_5ForCausalLM | Ernie4.5 | baidu/ERNIE-4.5-0.3B-PT, etc. |
| DotsOCRForCausalLM | dots_ocr | rednote-hilab/dots.ocr |
| Dots1ForCausalLM | dots.llm1 | rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc. |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc. |
| DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc. |
| DeepseekForCausalLM | DeepSeek | deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc. |
| DeciLMForCausalLM | DeciLM | Deci/DeciLM-7B , Deci/DeciLM-7B-instruct , etc. |
| DbrxForCausalLM | DBRX | databricks/dbrx-base , databricks/dbrx-instruct , etc. |
| CohereForCausalLM , Cohere2ForCausalLM | Command-R | CohereForAI/c4ai-command-r-v01 , CohereForAI/c4ai-command-r7b-12-2024 , etc. |
| ChatGLMModel, ChatGLMForConditionalGeneration | ChatGLM | THUDM/chatglm2-6b , THUDM/chatglm3-6b , etc. |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom , bigscience/bloomz , etc. |
| BambaForCausalLM | Bamba | ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B |
| BailingMoeV2ForCausalLM | Ling | inclusionAI/Ling-mini-2.0, etc. |
| BailingMoeForCausalLM | Ling | inclusionAI/Ling-lite-1.5, inclusionAI/Ling-plus, etc. |
| BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat , baichuan-inc/Baichuan-7B , etc. |
| ArcticForCausalLM | Arctic | Snowflake/snowflake-arctic-base , Snowflake/snowflake-arctic-instruct , etc. |
| ArceeForCausalLM | Arcee (AFM) | arcee-ai/AFM-4.5B-Base, etc. |
| AquilaForCausalLM | Aquila, Aquila2 | BAAI/Aquila-7B , BAAI/AquilaChat-7B , etc. |
| ApertusForCausalLM | Apertus | swiss-ai/Apertus-8B-2509, swiss-ai/Apertus-70B-Instruct-2509, etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertModelC | BERT-based | BAAI/bge-base-en-v1.5 , etc. |
| Gemma2ModelC | Gemma2-based | BAAI/bge-multilingual-gemma2 , etc. |
| Gemma3TextModelC | Gemma 3-based | google/embeddinggemma-300m, etc. |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GteModelC | Arctic-Embed-2.0-M | Snowflake/snowflake-arctic-embed-m-v2.0 |
| GteNewModelC | mGTE-TRM | Alibaba-NLP/gte-multilingual-base, etc. |
| ModernBertModelC | ModernBERT-based | Alibaba-NLP/gte-modernbert-base, etc. |
| NomicBertModelC | Nomic BERT | nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc. |
| LlamaModelC, LlamaForCausalLMC, MistralModelC, etc. | Llama-based | intfloat/e5-mistral-7b-instruct , etc. |
| Qwen2ModelC, Qwen2ForCausalLMC | Qwen2-based | ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc. |
| Qwen3ModelC, Qwen3ForCausalLMC | Qwen3-based | Qwen/Qwen3-Embedding-0.6B, etc. |
| RobertaModel, RobertaForMaskedLM | RoBERTa-based | sentence-transformers/all-roberta-large-v1, etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
Note:
- C indicates that the model can be converted into an embedding model via `--convert embed`.
- * indicates that feature support is the same as for the original model.
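
As a rough illustration of the `--convert` option mentioned in the notes, the sketch below assembles a `vllm serve` command line for one of the three task names used in this section (`embed`, `reward`, `classify`). The helper `build_serve_command` is hypothetical, and how a system inference template actually passes extra arguments to vLLM is an assumption here; only the flag name and its values come from the notes.

```python
import shlex
from typing import List

# Task names taken from the notes in this section.
CONVERT_TASKS = ("embed", "reward", "classify")


def build_serve_command(model: str, convert: str) -> List[str]:
    """Return an argv list that serves `model` with the given --convert task.

    Hypothetical helper: it only builds the command, it does not run vLLM.
    """
    if convert not in CONVERT_TASKS:
        raise ValueError(f"unknown convert task: {convert!r}")
    return ["vllm", "serve", model, "--convert", convert]


# Example: a Llama-based model from the embedding table above.
cmd = build_serve_command("intfloat/e5-mistral-7b-instruct", "embed")
print(shlex.join(cmd))
# prints: vllm serve intfloat/e5-mistral-7b-instruct --convert embed
```

Rejecting unknown task names early keeps a typo from reaching the server start-up step.
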
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward , internlm/internlm2-7b-reward , etc. |
| LlamaForCausalLM | Llama-based | peiyi9979/math-shepherd-mistral-7b-prm , etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B , etc. |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B , Qwen/Qwen2.5-Math-PRM-72B , etc. |
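| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
Note:
- C indicates that the model can be converted into a reward model via `--convert reward`.
- * indicates that feature support is the same as for the original model.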
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev , etc. |
| GPT2ForSequenceClassification | GPT2 | nie3e/sentiment-polish-gpt2-small |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
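Note:
- C indicates that the model can be converted into a classification model via `--convert classify`.
- * indicates that feature support is the same as for the original model.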
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2 , etc. |
| GemmaForSequenceClassification | Gemma-based | BAAI/bge-reranker-v2-gemma, etc. |
| GteNewForSequenceClassification | mGTE-TRM | Alibaba-NLP/gte-multilingual-reranker-base, etc. |
| Qwen2ForSequenceClassification | Qwen2-based | mixedbread-ai/mxbai-rerank-base-v2, etc. |
| Qwen3ForSequenceClassification | Qwen3-based | tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base , etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3 , etc. |
| *ModelC, *ForCausalLMC, etc. | Generative models | N/A |
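Note:
- C indicates that the model can be converted into a classification model via `--convert classify`.
- * indicates that feature support is the same as for the original model.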
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertForTokenClassification | bert-based | boltuix/NeuroBERT-NER, etc. |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria | |
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc. | |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. | |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b etc. | |
| Cohere2VisionForConditionalGeneration | Command A Vision | T + I+ | CohereLabs/command-a-vision-07-2025, etc. | |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2 etc. | |
| Ernie4_5_VLMoeForConditionalGeneration | Ernie4.5-VL | T + I+/ V+ | baidu/ERNIE-4.5-VL-28B-A3B-PT, baidu/ERNIE-4.5-VL-424B-A47B-PT | |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b etc. | |
| Gemma3ForConditionalGeneration | Gemma 3 | T + I+ | google/gemma-3-4b-it, google/gemma-3-27b-it, etc. | |
| Gemma3nForConditionalGeneration | Gemma 3n | T + I + A | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. | |
| GLM4VForCausalLM^ | GLM-4V | T + I | zai-org/glm-4v-9b, zai-org/cogagent-9b-20241220, etc. | |
| Glm4vForConditionalGeneration | GLM-4.1V-Thinking | T + IE+ + VE+ | zai-org/GLM-4.1V-9B-Thinking, etc. | |
| Glm4vMoeForConditionalGeneration | GLM-4.5V | T + IE+ + VE+ | zai-org/GLM-4.5V, etc. | |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b | |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. | |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3 etc. | |
| InternS1ForConditionalGeneration | Intern-S1 | T + IE+ + VE+ | internlm/Intern-S1, etc. | |
| InternVLChatModel | InternVL 3.5, InternVL 3.0, InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE++ (VE+) | OpenGVLab/InternVL3_5-14B, OpenGVLab/InternVL3-9B, OpenGVLab/InternVideo2_5_Chat_8B, OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. | |
| InternVLForConditionalGeneration | InternVL 3.0 (HF format) | T + IE+ + VE+ | OpenGVLab/InternVL3-1B-hf, etc. | |
| KeyeForConditionalGeneration | Keye-VL-8B-Preview | T + IE+ + VE+ | Kwai-Keye/Keye-VL-8B-Preview | |
| KeyeVL1_5ForConditionalGeneration | Keye-VL-1_5-8B | T + IE+ + VE+ | Kwai-Keye/Keye-VL-1_5-8B | |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking | |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. | |
| Llama_Nemotron_Nano_VL | Llama Nemotron Nano VL | T + IE+ | nvidia/Llama-3.1-Nemotron-Nano-VL-8B-V1 | |
| LlavaForConditionalGeneration | LLaVA-1.5 | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc. | |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. | |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. | |
| MiDashengLMModel | MiDashengLM | T + A+ | mispeech/midashenglm-7b | |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. | |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc. | |
| MiniMaxVL01ForConditionalGeneration | MiniMax-VL | T + IE+ | MiniMaxAI/MiniMax-VL-01, etc. | |
| Mistral3ForConditionalGeneration | Mistral3 (HF Transformers) | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. | |
| MolmoForCausalLM | Molmo | T + I | allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc. | |
| NVLM_D_Model | NVLM-D 1.0 | T + IE+ | nvidia/NVLM-D-72B, etc. | |
| Ovis | Ovis2, Ovis1.6 | T + I+ | AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc. | |
| Ovis2_5 | Ovis2.5 | T + I+ + V | AIDC-AI/Ovis2.5-9B, etc. | |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. | |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+/ I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. | |
| PixtralForConditionalGeneration | Pixtral | T + I+ | mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc. | |
| QwenVLForConditionalGeneration | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct | |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. | |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-7B | |
| Qwen3VLForConditionalGeneration | Qwen3-VL | T + IE+ + VE+ | Qwen/Qwen3-VL-4B-Instruct, etc. | |
| Qwen3VLMoeForConditionalGeneration | Qwen3-VL-MOE | T + IE+ + VE+ | Qwen/Qwen3-VL-30B-A3B-Instruct, etc. | |
| RForConditionalGeneration | R-VL-4B | T + IE+ | YannQi/R-4B | |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B | |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct | |
| Step3VLForConditionalGeneration | Step3-VL | T + I+ | stepfun-ai/step3 | |
| TarsierForConditionalGeneration | Tarsier | T + IE+ | omni-research/Tarsier-7b, omni-research/Tarsier-34b | |
| Tarsier2ForConditionalGeneration^ | Tarsier2 | T + IE+ + VE+ | omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115 | |
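
The input column above uses a compact notation. The sketch below shows one way to read it, assuming the legend used in the upstream vLLM documentation: T, I, V and A stand for text, image, video and audio, a trailing `E` means pre-computed embeddings are accepted for that modality, and a trailing `+` means multiple items per prompt are accepted. That legend and the helper `parse_inputs` are assumptions for illustration, not something this page defines.

```python
import re
from typing import Dict

# Assumed legend (not defined on this page): T/I/V/A modalities,
# "E" = pre-computed embeddings accepted, "+" = multiple items accepted.
MODALITIES = {"T": "text", "I": "image", "V": "video", "A": "audio"}

# A modality letter, optionally followed by "E" and by one or more "+".
# Assumes the joining "+" between modalities is surrounded by spaces,
# as it is in the table (e.g. "T + IE+ + VE+").
_TOKEN = re.compile(r"([TIVA])(E?)(\+*)")


def parse_inputs(spec: str) -> Dict[str, Dict[str, bool]]:
    """Map each modality named in `spec` to its embeddings/multiple flags."""
    result: Dict[str, Dict[str, bool]] = {}
    for letter, emb, plus in _TOKEN.findall(spec):
        entry = result.setdefault(
            MODALITIES[letter], {"embeddings": False, "multiple": False}
        )
        entry["embeddings"] = entry["embeddings"] or bool(emb)
        entry["multiple"] = entry["multiple"] or bool(plus)
    return result


flags = parse_inputs("T + IE+ + VE+")
print(flags["image"])  # {'embeddings': True, 'multiple': True}
```

Alternatives separated by `/` (for example `T + I+ / T + A+`) are merged into one dictionary by this sketch, so it reports which modalities appear at all rather than which combinations are valid.
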
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| WhisperForConditionalGeneration | Whisper | openai/whisper-small, openai/whisper-large-v3-turbo, etc. |
| VoxtralForConditionalGeneration | Voxtral (Mistral format) | mistralai/Voxtral-Mini-3B-2507, mistralai/Voxtral-Small-24B-2507, etc. |
| Gemma3nForConditionalGeneration | Gemma3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| LlavaNextForConditionalGenerationC | LLaVA-NeXT-based | T / I | royokong/e5-v | |
| Phi3VForCausalLMC | Phi-3-Vision-based | T + I | TIGER-Lab/VLM2Vec-Full | |
| *ForConditionalGenerationC, *ForCausalLMC, etc. | Generative models | / | N/A | |
Note:
- C indicates that the model can be converted into an embedding model via `--convert embed`.
- * indicates that feature support is the same as for the original model.
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| JinaVLForSequenceClassification | JinaVL-based | T + IE+ | jinaai/jina-reranker-m0, etc. | |
vLLM 0.9.2
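The following tables list the model architectures, names, and examples compatible with this template. For details on how to use each kind of model in the compatibility list, and what to watch out for, see the official vLLM documentation.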
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| Zamba2ForCausalLM | Zamba2 | Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc. |
| MiniMaxText01ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01, etc. |
| MiniMaxM1ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-M1-40k, MiniMaxAI/MiniMax-M1-80k, etc. |
| XverseForCausalLM | XVERSE | xverse/XVERSE-7B-Chat , xverse/XVERSE-13B-Chat , xverse/XVERSE-65B-Chat , etc. |
| TeleFLMForCausalLM | TeleFLM | CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc. |
| TeleChat2ForCausalLM | TeleChat2 | TeleAI/TeleChat2-3B , TeleAI/TeleChat2-7B , TeleAI/TeleChat2-35B , etc. |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b , bigcode/starcoder2-7b , bigcode/starcoder2-15b , etc. |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t , stabilityai/stablelm-base-alpha-7b-v2 , etc. |
| SolarForCausalLM | Solar Pro | upstage/solar-pro-preview-instruct , etc. |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B , Qwen/Qwen-7B-Chat , etc. |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B , Qwen/Qwen1.5-MoE-A2.7B-Chat , etc. |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/QwQ-32B-Preview , Qwen/Qwen2-7B-Instruct , Qwen/Qwen2-7B , etc. |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B, etc. |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-MoE-15B-A2B, etc. |
| Plamo2ForCausalLM | PLaMo2 | pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc. |
| PersimmonForCausalLM | Persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat, etc. |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct , etc. |
| PhiForCausalLM | Phi | microsoft/phi-1_5 , microsoft/phi-2 , etc. |
| Phi3SmallForCausalLM | Phi-3-Small | microsoft/Phi-3-small-8k-instruct , microsoft/Phi-3-small-128k-instruct , etc. |
| Phi3ForCausalLM | Phi-4, Phi-3 | microsoft/Phi-4 , microsoft/Phi-3-mini-4k-instruct , microsoft/Phi-3-mini-128k-instruct , microsoft/Phi-3-medium-128k-instruct , etc. |
| OrionForCausalLM | Orion | OrionStarAI/Orion-14B-Base , OrionStarAI/Orion-14B-Chat , etc. |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b , facebook/opt-iml-max-30b , etc. |
| OlmoForCausalLM | OLMo | allenai/OLMo-1B-hf , allenai/OLMo-7B-hf , etc. |
| OlmoeForCausalLM | OLMoE | allenai/OLMoE-1B-7B-0924 , allenai/OLMoE-1B-7B-0924-Instruct , etc. |
| Olmo2ForCausalLM | OLMo2 | allenai/OLMo2-7B-1124 , etc. |
| NemotronHForCausalLM | Nemotron-H | nvidia/Nemotron-H-8B-Base-8K, nvidia/Nemotron-H-47B-Base-8K, nvidia/Nemotron-H-56B-Base-8K, etc. |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base , mgoin/Nemotron-4-340B-Base-hf-FP8 , etc. |
| MPTForCausalLM | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | mosaicml/mpt-7b , mosaicml/mpt-7b-storywriter , mosaicml/mpt-30b , etc. |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1 , mistralai/Mixtral-8x7B-Instruct-v0.1 , mistral-community/Mixtral-8x22B-v0.1 , etc. |
| MistralForCausalLM | Mistral, Mistral-Instruct | mistralai/Mistral-7B-v0.1 , mistralai/Mistral-7B-Instruct-v0.1 , etc. |
| MiniCPM3ForCausalLM | MiniCPM3 | openbmb/MiniCPM3-4B , etc. |
| MiniCPMForCausalLM | MiniCPM | openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc. |
| Mamba2ForCausalLM | Mamba2 | mistralai/Mamba-Codestral-7B-v0.1, etc. |
| MambaForCausalLM | Mamba | state-spaces/mamba-130m-hf , state-spaces/mamba-790m-hf , state-spaces/mamba-2.8b-hf , etc. |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct , meta-llama/Meta-Llama-3.1-70B , meta-llama/Meta-Llama-3-70B-Instruct , meta-llama/Llama-2-70b-hf , 01-ai/Yi-34B , etc. |
| JambaForCausalLM | Jamba | ai21labs/AI21-Jamba-1.5-Large , ai21labs/AI21-Jamba-1.5-Mini , ai21labs/Jamba-v0.1 , etc. |
| JAISLMHeadModel | Jais | inceptionai/jais-13b , inceptionai/jais-13b-chat , inceptionai/jais-30b-v3 , inceptionai/jais-30b-chat-v3 , etc. |
| InternLMForCausalLM | InternLM | internlm/internlm-7b , internlm/internlm-chat-7b , etc. |
| InternLM3ForCausalLM | InternLM3 | internlm/internlm3-8b-instruct , etc. |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b , internlm/internlm2-chat-7b , etc. |
| HunYuanMoEV1ForCausalLM | Hunyuan-80B-A13B | tencent/Hunyuan-A13B-Instruct, tencent/Hunyuan-A13B-Pretrain, tencent/Hunyuan-A13B-Instruct-FP8, etc. |
| Grok1ModelForCausalLM | Grok1 | hpcai-tech/grok-1 |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GraniteMoeSharedForCausalLM | Granite MoE Shared | ibm-research/moe-7b-1b-active-shared-experts (test model) |
| GraniteMoeHybridForCausalLM | Granite 4.0 MoE Hybrid | ibm-granite/granite-4.0-tiny-preview, etc. |
| GraniteMoeForCausalLM | Granite 3.0 MoE, PowerMoE | ibm-granite/granite-3.0-1b-a400m-base , ibm-granite/granite-3.0-3b-a800m-instruct , ibm/PowerMoE-3b , etc. |
| GraniteForCausalLM | Granite 3.0, Granite 3.1, PowerLM | ibm-granite/granite-3.0-2b-base , ibm-granite/granite-3.1-8b-instruct , ibm/PowerLM-3b , etc. |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b , EleutherAI/pythia-12b , OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5 , databricks/dolly-v2-12b , stabilityai/stablelm-tuned-alpha-7b , etc. |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b , nomic-ai/gpt4all-j , etc. |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder , bigcode/gpt_bigcode-santacoder , WizardLM/WizardCoder-15B-V1.0 , etc. |
| GPT2LMHeadModel | GPT-2 | gpt2 , gpt2-xl , etc. |
| Glm4ForCausalLM | GLM-4-0414 | THUDM/GLM-4-32B-0414, etc. |
| GlmForCausalLM | GLM-4 | THUDM/glm-4-9b-chat-hf , etc. |
| Gemma3nForConditionalGeneration | Gemma 3n | google/gemma-3n-E2B-it, google/gemma-3n-E4B-it, etc. |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it, etc. |
| Gemma2ForCausalLM | Gemma 2 | google/gemma-2-9b, google/gemma-2-27b, etc. |
| GemmaForCausalLM | Gemma | google/gemma-2b , google/gemma-7b , etc. |
| FalconH1ForCausalLM | Falcon-H1 | tiiuae/Falcon-H1-34B-Base, tiiuae/Falcon-H1-34B-Instruct, etc. |
| FalconMambaForCausalLM | FalconMamba | tiiuae/falcon-mamba-7b , tiiuae/falcon-mamba-7b-instruct , etc. |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b , tiiuae/falcon-40b , tiiuae/falcon-rw-7b , etc. |
| ExaoneForCausalLM | EXAONE-3 | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct , etc. |
| Ernie4_5_MoeForCausalLM | Ernie4.5MoE | baidu/ERNIE-4.5-21B-A3B-PT, baidu/ERNIE-4.5-300B-A47B-PT, etc. |
| Ernie4_5_ForCausalLM | Ernie4.5 | baidu/ERNIE-4.5-0.3B-PT, etc. |
| Dots1ForCausalLM | dots.llm1 | rednote-hilab/dots.llm1.base, rednote-hilab/dots.llm1.inst, etc. |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc. |
| DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc. |
| DeepseekForCausalLM | DeepSeek | deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc. |
| DeciLMForCausalLM | DeciLM | Deci/DeciLM-7B , Deci/DeciLM-7B-instruct , etc. |
| DbrxForCausalLM | DBRX | databricks/dbrx-base , databricks/dbrx-instruct , etc. |
| CohereForCausalLM , Cohere2ForCausalLM | Command-R | CohereForAI/c4ai-command-r-v01 , CohereForAI/c4ai-command-r7b-12-2024 , etc. |
| ChatGLMModel, ChatGLMForConditionalGeneration | ChatGLM | THUDM/chatglm2-6b , THUDM/chatglm3-6b , etc. |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom , bigscience/bloomz , etc. |
| BartForConditionalGeneration | BART | facebook/bart-base , facebook/bart-large-cnn , etc. |
| BambaForCausalLM | Bamba | ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B |
| BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat , baichuan-inc/Baichuan-7B , etc. |
| ArcticForCausalLM | Arctic | Snowflake/snowflake-arctic-base , Snowflake/snowflake-arctic-instruct , etc. |
| AquilaForCausalLM | Aquila, Aquila2 | BAAI/Aquila-7B , BAAI/AquilaChat-7B , etc. |
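
Before uploading a model, it can help to compare its declared architecture with the first column of the table above. The sketch below reads the `architectures` field from a HuggingFace-style `config.json` and reports any entry that is not in a supported set. `SUPPORTED_ARCHITECTURES` is only a small excerpt of the table, and the helper names are hypothetical; the platform itself may perform this check differently.

```python
import json
from pathlib import Path
from typing import Iterable, List

# Small excerpt of the architecture column above; extend it as needed.
SUPPORTED_ARCHITECTURES = {
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "Qwen2ForCausalLM",
    "Qwen3ForCausalLM",
    "DeepseekV3ForCausalLM",
}


def unsupported_architectures(
    config: dict, supported: Iterable[str] = SUPPORTED_ARCHITECTURES
) -> List[str]:
    """Return the architectures declared in `config` that are not supported."""
    supported_set = set(supported)
    return [a for a in config.get("architectures", []) if a not in supported_set]


def check_model_dir(model_dir: str) -> List[str]:
    """Load <model_dir>/config.json and list its unsupported architectures."""
    config_path = Path(model_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return unsupported_architectures(config)
```

An empty list means every declared architecture appears in the chosen set; a config with no `architectures` field also yields an empty list, so that case may deserve a separate check.
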
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5 , etc. |
| Gemma2Model | Gemma2-based | BAAI/bge-multilingual-gemma2 , etc. |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GteModel | Arctic-Embed-2.0-M | Snowflake/snowflake-arctic-embed-m-v2.0 |
| GteNewModel | mGTE-TRM | Alibaba-NLP/gte-multilingual-base, etc. |
| ModernBertModel | ModernBERT-based | Alibaba-NLP/gte-modernbert-base, etc. |
| NomicBertModel | Nomic BERT | nomic-ai/nomic-embed-text-v1, nomic-ai/nomic-embed-text-v2-moe, Snowflake/snowflake-arctic-embed-m-long, etc. |
| LlamaModel , LlamaForCausalLM , MistralModel , etc. | Llama-based | intfloat/e5-mistral-7b-instruct , etc. |
| Qwen2Model , Qwen2ForCausalLM | Qwen2-based | ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc. |
| RobertaModel, RobertaForMaskedLM | RoBERTa-based | sentence-transformers/all-roberta-large-v1, etc. |
| Qwen3Model, Qwen3ForCausalLM | Qwen3-based | Qwen/Qwen3-Embedding-0.6B, etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward , internlm/internlm2-7b-reward , etc. |
| LlamaForCausalLM | Llama-based | peiyi9979/math-shepherd-mistral-7b-prm , etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B , etc. |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B , Qwen/Qwen2.5-Math-PRM-72B , etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev , etc. |
| Qwen2ForSequenceClassification | Qwen2-based | jason9693/Qwen2.5-1.5B-apeach , etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2 , etc. |
| Qwen2ForSequenceClassification | Qwen2-based | mixedbread-ai/mxbai-rerank-base-v2, etc. |
| Qwen3ForSequenceClassification | Qwen3-based | tomaarsen/Qwen3-Reranker-0.6B-seq-cls, Qwen/Qwen3-Reranker-0.6B, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base , etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3 , etc. |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria | |
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc. | |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. | |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b etc. | |
| DeepseekVLV2ForCausalLM^ | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2 etc. | |
| Florence2ForConditionalGeneration | Florence-2 | T + I | microsoft/Florence-2-base, microsoft/Florence-2-large etc. | |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b etc. | |
| Gemma3ForConditionalGeneration | Gemma 3 | T + I+ | google/gemma-3-4b-it, google/gemma-3-27b-it, etc. | |
| GLM4VForCausalLM^ | GLM-4V | T + I | THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220, etc. | |
| Glm4vForConditionalGeneration | GLM-4.1V-Thinking | T + IE+ + VE+ | THUDM/GLM-4.1V-9B-Thinking, etc. | |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b | |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. | |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3 etc. | |
| InternVLChatModel | InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ | OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. | |
| KeyeForConditionalGeneration | Keye-VL-8B-Preview | T + IE+ + VE+ | Kwai-Keye/Keye-VL-8B-Preview | |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking | |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. | |
| LlavaForConditionalGeneration | LLaVA-1.5 | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc. | |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. | |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. | |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. | |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc. | |
| MiniMaxVL01ForConditionalGeneration | MiniMax-VL | T + IE+ | MiniMaxAI/MiniMax-VL-01, etc. | |
| Mistral3ForConditionalGeneration | Mistral3 | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. | |
| MllamaForConditionalGeneration | Llama 3.2 | T + I+ | meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc. | |
| MolmoForCausalLM | Molmo | T + I | allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc. | |
| NVLM_D_Model | NVLM-D 1.0 | T + IE+ | nvidia/NVLM-D-72B, etc. | |
| Ovis | Ovis2, Ovis1.6 | T + I+ | AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc. | |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. | |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+/ I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. | |
| PixtralForConditionalGeneration | Pixtral | T + I+ | mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc. | |
| QwenVLForConditionalGeneration^ | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct | |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. | |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-7B | |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B | |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct | |
| TarsierForConditionalGeneration | Tarsier | T + IE+ | omni-research/Tarsier-7b, omni-research/Tarsier-34b | |
| Tarsier2ForConditionalGeneration^ | Tarsier2 | T + IE+ + VE+ | omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115 | |
| 架构 | 模型 | 输入 | HuggingFace模型示例 | 说明 |
|---|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria |
|
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc. | |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. | |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b etc. | |
| DeepseekVLV2ForCausalLM^ | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2 etc. | |
| Florence2ForConditionalGeneration | Florence-2 | T + I | microsoft/Florence-2-base, microsoft/Florence-2-large etc. | |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b etc. | |
| Gemma3ForConditionalGeneration | Gemma 3 | T + I+ | google/gemma-3-4b-it,
google/gemma-3-27b-it, etc. |
|
| GLM4VForCausalLM^ | GLM-4V | T + I | THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220 etc. | |
| Glm4vForConditionalGeneration | GLM-4.1V-Thinking | T + IE+ + VE+ | THUDM/GLM-4.1V-9B-Thinkg, etc. | |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b | |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. | |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3 etc. | |
| InternVLChatModel | InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ | OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. | |
| KeyeForConditionalGeneration | Keye-VL-8B-Preview | T + IE+ + VE+ | Kwai-Keye/Keye-VL-8B-Preview | |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking | |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. | |
| LlavaForConditionalGeneration | LLaVA-1.5 | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc. | |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. | |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. | |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. | |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc. | |
| MiniMaxVL01ForConditionalGeneration | MiniMax-VL | T + IE+ | MiniMaxAI/MiniMax-VL-01, etc. | |
| Mistral3ForConditionalGeneration | Mistral3 | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. | |
| MllamaForConditionalGeneration | Llama 3.2 | T + I+ | meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc. | |
| MolmoForCausalLM | Molmo | T + I | allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc. | |
| NVLM_D_Model | NVLM-D 1.0 | T + IE+ | nvidia/NVLM-D-72B, etc. | |
| Ovis | Ovis2, Ovis1.6 | T + I+ | AIDC-AI/Ovis2-1B, AIDC-AI/Ovis1.6-Llama3.2-3B, etc. | |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. | |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+/ I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. | |
| PixtralForConditionalGeneration | Pixtral | T + I+ | mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc. | |
| QwenVLForConditionalGeneration^ | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct | |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. | |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-7B | |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B | |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct | |
| TarsierForConditionalGeneration | Tarsier | T + IE+ | omni-search/Tarsier-7b, omni-search/Tarsier-34b | |
| Tarsier2ForConditionalGeneration^ | Tarsier2 | T + IE+ + VE+ | omni-research/Tarsier2-Recap-7b, omni-research/Tarsier2-7b-0115 | |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| WhisperForConditionalGeneration | Whisper | openai/whisper-small, openai/whisper-large-v3-turbo, etc. |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| LlavaNextForConditionalGeneration | LLaVA-NeXT-based | T / I | royokong/e5-v | |
| Phi3VForCausalLM | Phi-3-Vision-based | T + I | TIGER-Lab/VLM2Vec-Full | |
vLLM 0.8.5
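The following lists the model architectures, names, and examples compatible with this template. For usage details and caveats for the models in the compatibility list, see the official vLLM documentation.
| Architecture | Model | Example HuggingFace models |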
|---|---|---|
| Zamba2ForCausalLM | Zamba2 | Zyphra/Zamba2-7B-instruct, Zyphra/Zamba2-2.7B-instruct, Zyphra/Zamba2-1.2B-instruct, etc. |
| MiniMaxText01ForCausalLM | MiniMax-Text | MiniMaxAI/MiniMax-Text-01, etc. |
| XverseForCausalLM | XVERSE | xverse/XVERSE-7B-Chat, xverse/XVERSE-13B-Chat, xverse/XVERSE-65B-Chat, etc. |
| TeleFLMForCausalLM | TeleFLM | CofeAI/FLM-2-52B-Instruct-2407, CofeAI/Tele-FLM, etc. |
| TeleChat2ForCausalLM | TeleChat2 | TeleAI/TeleChat2-3B, TeleAI/TeleChat2-7B, TeleAI/TeleChat2-35B, etc. |
| Starcoder2ForCausalLM | Starcoder2 | bigcode/starcoder2-3b, bigcode/starcoder2-7b, bigcode/starcoder2-15b, etc. |
| StableLmForCausalLM | StableLM | stabilityai/stablelm-3b-4e1t, stabilityai/stablelm-base-alpha-7b-v2, etc. |
| SolarForCausalLM | Solar Pro | upstage/solar-pro-preview-instruct, etc. |
| QWenLMHeadModel | Qwen | Qwen/Qwen-7B, Qwen/Qwen-7B-Chat, etc. |
| Qwen2MoeForCausalLM | Qwen2MoE | Qwen/Qwen1.5-MoE-A2.7B, Qwen/Qwen1.5-MoE-A2.7B-Chat, etc. |
| Qwen2ForCausalLM | QwQ, Qwen2 | Qwen/QwQ-32B-Preview, Qwen/Qwen2-7B-Instruct, Qwen/Qwen2-7B, etc. |
| Qwen3ForCausalLM | Qwen3 | Qwen/Qwen3-8B, etc. |
| Qwen3MoeForCausalLM | Qwen3MoE | Qwen/Qwen3-MoE-15B-A2B, etc. |
| Plamo2ForCausalLM | PLaMo2 | pfnet/plamo-2-1b, pfnet/plamo-2-8b, etc. |
| PersimmonForCausalLM | Persimmon | adept/persimmon-8b-base, adept/persimmon-8b-chat, etc. |
| PhiMoEForCausalLM | Phi-3.5-MoE | microsoft/Phi-3.5-MoE-instruct, etc. |
| PhiForCausalLM | Phi | microsoft/phi-1_5, microsoft/phi-2, etc. |
| Phi3SmallForCausalLM | Phi-3-Small | microsoft/Phi-3-small-8k-instruct, microsoft/Phi-3-small-128k-instruct, etc. |
| Phi3ForCausalLM | Phi-4, Phi-3 | microsoft/Phi-4, microsoft/Phi-3-mini-4k-instruct, microsoft/Phi-3-mini-128k-instruct, microsoft/Phi-3-medium-128k-instruct, etc. |
| OrionForCausalLM | Orion | OrionStarAI/Orion-14B-Base, OrionStarAI/Orion-14B-Chat, etc. |
| OPTForCausalLM | OPT, OPT-IML | facebook/opt-66b, facebook/opt-iml-max-30b, etc. |
| OlmoForCausalLM | OLMo | allenai/OLMo-1B-hf, allenai/OLMo-7B-hf, etc. |
| OlmoeForCausalLM | OLMoE | allenai/OLMoE-1B-7B-0924, allenai/OLMoE-1B-7B-0924-Instruct, etc. |
| Olmo2ForCausalLM | OLMo2 | allenai/OLMo2-7B-1124, etc. |
| NemotronForCausalLM | Nemotron-3, Nemotron-4, Minitron | nvidia/Minitron-8B-Base, mgoin/Nemotron-4-340B-Base-hf-FP8, etc. |
| MPTForCausalLM | MPT, MPT-Instruct, MPT-Chat, MPT-StoryWriter | mosaicml/mpt-7b, mosaicml/mpt-7b-storywriter, mosaicml/mpt-30b, etc. |
| MixtralForCausalLM | Mixtral-8x7B, Mixtral-8x7B-Instruct | mistralai/Mixtral-8x7B-v0.1, mistralai/Mixtral-8x7B-Instruct-v0.1, mistral-community/Mixtral-8x22B-v0.1, etc. |
| MistralForCausalLM | Mistral, Mistral-Instruct | mistralai/Mistral-7B-v0.1, mistralai/Mistral-7B-Instruct-v0.1, etc. |
| MiniCPM3ForCausalLM | MiniCPM3 | openbmb/MiniCPM3-4B, etc. |
| MiniCPMForCausalLM | MiniCPM | openbmb/MiniCPM-2B-sft-bf16, openbmb/MiniCPM-2B-dpo-bf16, openbmb/MiniCPM-S-1B-sft, etc. |
| MambaForCausalLM | Mamba | state-spaces/mamba-130m-hf, state-spaces/mamba-790m-hf, state-spaces/mamba-2.8b-hf, etc. |
| LlamaForCausalLM | Llama 3.1, Llama 3, Llama 2, LLaMA, Yi | meta-llama/Meta-Llama-3.1-405B-Instruct, meta-llama/Meta-Llama-3.1-70B, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Llama-2-70b-hf, 01-ai/Yi-34B, etc. |
| JambaForCausalLM | Jamba | ai21labs/AI21-Jamba-1.5-Large, ai21labs/AI21-Jamba-1.5-Mini, ai21labs/Jamba-v0.1, etc. |
| JAISLMHeadModel | Jais | inceptionai/jais-13b, inceptionai/jais-13b-chat, inceptionai/jais-30b-v3, inceptionai/jais-30b-chat-v3, etc. |
| InternLMForCausalLM | InternLM | internlm/internlm-7b, internlm/internlm-chat-7b, etc. |
| InternLM3ForCausalLM | InternLM3 | internlm/internlm3-8b-instruct, etc. |
| InternLM2ForCausalLM | InternLM2 | internlm/internlm2-7b, internlm/internlm2-chat-7b, etc. |
| Grok1ModelForCausalLM | Grok1 | hpcai-tech/grok-1 |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| GraniteMoeSharedForCausalLM | Granite MoE Shared | ibm-research/moe-7b-1b-active-shared-experts (test model) |
| GraniteMoeForCausalLM | Granite 3.0 MoE, PowerMoE | ibm-granite/granite-3.0-1b-a400m-base, ibm-granite/granite-3.0-3b-a800m-instruct, ibm/PowerMoE-3b, etc. |
| GraniteForCausalLM | Granite 3.0, Granite 3.1, PowerLM | ibm-granite/granite-3.0-2b-base, ibm-granite/granite-3.1-8b-instruct, ibm/PowerLM-3b, etc. |
| GPTNeoXForCausalLM | GPT-NeoX, Pythia, OpenAssistant, Dolly V2, StableLM | EleutherAI/gpt-neox-20b, EleutherAI/pythia-12b, OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5, databricks/dolly-v2-12b, stabilityai/stablelm-tuned-alpha-7b, etc. |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b, nomic-ai/gpt4all-j, etc. |
| GPTBigCodeForCausalLM | StarCoder, SantaCoder, WizardCoder | bigcode/starcoder, bigcode/gpt_bigcode-santacoder, WizardLM/WizardCoder-15B-V1.0, etc. |
| GPT2LMHeadModel | GPT-2 | gpt2, gpt2-xl, etc. |
| Glm4ForCausalLM | GLM-4-0414 | THUDM/GLM-4-32B-0414, etc. |
| GlmForCausalLM | GLM-4 | THUDM/glm-4-9b-chat-hf, etc. |
| Gemma3ForCausalLM | Gemma 3 | google/gemma-3-1b-it, etc. |
| Gemma2ForCausalLM | Gemma 2 | google/gemma-2-9b, google/gemma-2-27b, etc. |
| GemmaForCausalLM | Gemma | google/gemma-2b, google/gemma-7b, etc. |
| FalconMambaForCausalLM | FalconMamba | tiiuae/falcon-mamba-7b, tiiuae/falcon-mamba-7b-instruct, etc. |
| FalconForCausalLM | Falcon | tiiuae/falcon-7b, tiiuae/falcon-40b, tiiuae/falcon-rw-7b, etc. |
| ExaoneForCausalLM | EXAONE-3 | LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct, etc. |
| DeepseekV3ForCausalLM | DeepSeek-V3 | deepseek-ai/DeepSeek-V3-Base, deepseek-ai/DeepSeek-V3, etc. |
| DeepseekV2ForCausalLM | DeepSeek-V2 | deepseek-ai/DeepSeek-V2, deepseek-ai/DeepSeek-V2-Chat, etc. |
| DeepseekForCausalLM | DeepSeek | deepseek-ai/deepseek-llm-67b-base, deepseek-ai/deepseek-llm-7b-chat, etc. |
| DeciLMForCausalLM | DeciLM | Deci/DeciLM-7B, Deci/DeciLM-7B-instruct, etc. |
| DbrxForCausalLM | DBRX | databricks/dbrx-base, databricks/dbrx-instruct, etc. |
| CohereForCausalLM, Cohere2ForCausalLM | Command-R | CohereForAI/c4ai-command-r-v01, CohereForAI/c4ai-command-r7b-12-2024, etc. |
| ChatGLMModel, ChatGLMForConditionalGeneration | ChatGLM | THUDM/chatglm2-6b, THUDM/chatglm3-6b, etc. |
| BloomForCausalLM | BLOOM, BLOOMZ, BLOOMChat | bigscience/bloom, bigscience/bloomz, etc. |
| BartForConditionalGeneration | BART | facebook/bart-base, facebook/bart-large-cnn, etc. |
| BambaForCausalLM | Bamba | ibm-ai-platform/Bamba-9B-fp8, ibm-ai-platform/Bamba-9B |
| BaiChuanForCausalLM | Baichuan2, Baichuan | baichuan-inc/Baichuan2-13B-Chat, baichuan-inc/Baichuan-7B, etc. |
| ArcticForCausalLM | Arctic | Snowflake/snowflake-arctic-base, Snowflake/snowflake-arctic-instruct, etc. |
| AquilaForCausalLM | Aquila, Aquila2 | BAAI/Aquila-7B, BAAI/AquilaChat-7B, etc. |
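
The architecture names in the first column of the table above correspond to the `architectures` field in a Hugging Face model's `config.json`. Before uploading a model, it can help to compare that field against the compatibility list. The snippet below is a minimal sketch of such a check, not part of the platform: the function name `supported_architectures` and the `SUPPORTED_ARCHITECTURES` set are hypothetical, and the set holds only a small hand-copied subset of the table.

```python
import json

# Hypothetical, hand-copied subset of the architecture column above.
# Extend it with the rows relevant to your deployment.
SUPPORTED_ARCHITECTURES = {
    "Qwen2ForCausalLM",
    "Qwen3ForCausalLM",
    "LlamaForCausalLM",
    "MistralForCausalLM",
    "DeepseekV3ForCausalLM",
}


def supported_architectures(config_text, supported=SUPPORTED_ARCHITECTURES):
    """Return the architectures declared in a config.json that are in `supported`."""
    config = json.loads(config_text)
    # A config.json normally carries a list under "architectures";
    # treat a missing or null field as "nothing declared".
    declared = config.get("architectures") or []
    return [name for name in declared if name in supported]


# Example with an inline config; in practice, read the model's config.json file.
example_config = '{"architectures": ["Qwen3ForCausalLM"], "model_type": "qwen3"}'
matches = supported_architectures(example_config)
print(matches if matches else "no compatible architecture found")
```

An empty result only means the architecture is not in the subset you copied. Check the full table for the template version you plan to use before concluding that a model is incompatible.
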
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5, etc. |
| Gemma2Model | Gemma2-based | BAAI/bge-multilingual-gemma2, etc. |
| GritLM | GritLM | parasail-ai/GritLM-7B-vllm |
| LlamaModel, LlamaForCausalLM, MistralModel, etc. | Llama-based | intfloat/e5-mistral-7b-instruct, etc. |
| Qwen2Model, Qwen2ForCausalLM | Qwen2-based | ssmits/Qwen2-7B-Instruct-embed-base (see note), Alibaba-NLP/gte-Qwen2-7B-instruct (see note), etc. |
| RobertaModel, RobertaForMaskedLM | RoBERTa-based | sentence-transformers/all-roberta-large-v1, etc. |
| XLMRobertaModel | XLM-RoBERTa-based | intfloat/multilingual-e5-large, etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| InternLM2ForRewardModel | InternLM2-based | internlm/internlm2-1_8b-reward, internlm/internlm2-7b-reward, etc. |
| LlamaForCausalLM | Llama-based | peiyi9979/math-shepherd-mistral-7b-prm, etc. |
| Qwen2ForRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-RM-72B, etc. |
| Qwen2ForProcessRewardModel | Qwen2-based | Qwen/Qwen2.5-Math-PRM-7B, Qwen/Qwen2.5-Math-PRM-72B, etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| JambaForSequenceClassification | Jamba | ai21labs/Jamba-tiny-reward-dev, etc. |
| Qwen2ForSequenceClassification | Qwen2-based | jason9693/Qwen2.5-1.5B-apeach, etc. |
| Architecture | Model | Example HuggingFace models |
|---|---|---|
| BertForSequenceClassification | BERT-based | cross-encoder/ms-marco-MiniLM-L-6-v2, etc. |
| RobertaForSequenceClassification | RoBERTa-based | cross-encoder/quora-roberta-base, etc. |
| XLMRobertaForSequenceClassification | XLM-RoBERTa-based | BAAI/bge-reranker-v2-m3, etc. |
| ModernBertForSequenceClassification | ModernBert-based | Alibaba-NLP/gte-reranker-modernbert-base, etc. |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| AriaForConditionalGeneration | Aria | T + I+ | rhymes-ai/Aria | |
| AyaVisionForConditionalGeneration | Aya Vision | T + I+ | CohereForAI/aya-vision-8b, CohereForAI/aya-vision-32b, etc. | |
| Blip2ForConditionalGeneration | BLIP-2 | T + IE | Salesforce/blip2-opt-2.7b, Salesforce/blip2-opt-6.7b, etc. | |
| ChameleonForConditionalGeneration | Chameleon | T + I | facebook/chameleon-7b, etc. | |
| DeepseekVLV2ForCausalLM | DeepSeek-VL2 | T + I+ | deepseek-ai/deepseek-vl2-tiny, deepseek-ai/deepseek-vl2-small, deepseek-ai/deepseek-vl2, etc. | |
| Florence2ForConditionalGeneration | Florence-2 | T + I | microsoft/Florence-2-base, microsoft/Florence-2-large, etc. | |
| FuyuForCausalLM | Fuyu | T + I | adept/fuyu-8b, etc. | |
| Gemma3ForConditionalGeneration | Gemma 3 | T + I+ | google/gemma-3-4b-it, google/gemma-3-27b-it, etc. | |
| GLM4VForCausalLM^ | GLM-4V | T + I | THUDM/glm-4v-9b, THUDM/cogagent-9b-20241220, etc. | |
| GraniteSpeechForConditionalGeneration | Granite Speech | T + A | ibm-granite/granite-speech-3.3-8b | |
| H2OVLChatModel | H2OVL | T + IE+ | h2oai/h2ovl-mississippi-800m, h2oai/h2ovl-mississippi-2b, etc. | |
| Idefics3ForConditionalGeneration | Idefics3 | T + I | HuggingFaceM4/Idefics3-8B-Llama3, etc. | |
| InternVLChatModel | InternVL 2.5, Mono-InternVL, InternVL 2.0 | T + IE+ | OpenGVLab/InternVL2_5-4B, OpenGVLab/Mono-InternVL-2B, OpenGVLab/InternVL2-4B, etc. | |
| KimiVLForConditionalGeneration | Kimi-VL-A3B-Instruct, Kimi-VL-A3B-Thinking | T + I+ | moonshotai/Kimi-VL-A3B-Instruct, moonshotai/Kimi-VL-A3B-Thinking | |
| Llama4ForConditionalGeneration | Llama 4 | T + I+ | meta-llama/Llama-4-Scout-17B-16E-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, meta-llama/Llama-4-Maverick-17B-128E-Instruct, etc. | |
| LlavaForConditionalGeneration | LLaVA-1.5 | T + IE+ | llava-hf/llava-1.5-7b-hf, TIGER-Lab/Mantis-8B-siglip-llama3 (see note), etc. | |
| LlavaNextForConditionalGeneration | LLaVA-NeXT | T + IE+ | llava-hf/llava-v1.6-mistral-7b-hf, llava-hf/llava-v1.6-vicuna-7b-hf, etc. | |
| LlavaNextVideoForConditionalGeneration | LLaVA-NeXT-Video | T + V | llava-hf/LLaVA-NeXT-Video-7B-hf, etc. | |
| LlavaOnevisionForConditionalGeneration | LLaVA-Onevision | T + I+ + V+ | llava-hf/llava-onevision-qwen2-7b-ov-hf, llava-hf/llava-onevision-qwen2-0.5b-ov-hf, etc. | |
| MiniCPMO | MiniCPM-O | T + IE+ + VE+ + AE+ | openbmb/MiniCPM-o-2_6, etc. | |
| MiniCPMV | MiniCPM-V | T + IE+ + VE+ | openbmb/MiniCPM-V-2 (see note), openbmb/MiniCPM-Llama3-V-2_5, openbmb/MiniCPM-V-2_6, etc. | |
| Mistral3ForConditionalGeneration | Mistral3 | T + I+ | mistralai/Mistral-Small-3.1-24B-Instruct-2503, etc. | |
| MllamaForConditionalGeneration | Llama 3.2 | T + I+ | meta-llama/Llama-3.2-90B-Vision-Instruct, meta-llama/Llama-3.2-11B-Vision, etc. | |
| MolmoForCausalLM | Molmo | T + I | allenai/Molmo-7B-D-0924, allenai/Molmo-72B-0924, etc. | |
| NVLM_D_Model | NVLM-D 1.0 | T + IE+ | nvidia/NVLM-D-72B, etc. | |
| PaliGemmaForConditionalGeneration | PaliGemma, PaliGemma 2 | T + IE | google/paligemma-3b-pt-224, google/paligemma-3b-mix-224, google/paligemma2-3b-ft-docci-448, etc. | |
| Phi3VForCausalLM | Phi-3-Vision, Phi-3.5-Vision | T + IE+ | microsoft/Phi-3-vision-128k-instruct, microsoft/Phi-3.5-vision-instruct, etc. | |
| Phi4MMForCausalLM | Phi-4-multimodal | T + I+ / T + A+/ I+ + A+ | microsoft/Phi-4-multimodal-instruct, etc. | |
| PixtralForConditionalGeneration | Pixtral | T + I+ | mistralai/Pixtral-12B-2409, mistral-community/pixtral-12b (see note), etc. | |
| QwenVLForConditionalGeneration | Qwen-VL | T + IE+ | Qwen/Qwen-VL, Qwen/Qwen-VL-Chat, etc. | |
| Qwen2AudioForConditionalGeneration | Qwen2-Audio | T + A+ | Qwen/Qwen2-Audio-7B-Instruct | |
| Qwen2VLForConditionalGeneration | QVQ, Qwen2-VL | T + IE+ + VE+ | Qwen/QVQ-72B-Preview, Qwen/Qwen2-VL-7B-Instruct, Qwen/Qwen2-VL-72B-Instruct, etc. | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | T + IE+ + VE+ | Qwen/Qwen2.5-VL-3B-Instruct, Qwen/Qwen2.5-VL-72B-Instruct, etc. | |
| Qwen2_5OmniThinkerForConditionalGeneration | Qwen2.5-Omni | T + IE+ + VE+ + A+ | Qwen/Qwen2.5-Omni-7B | |
| SkyworkR1VChatModel | Skywork-R1V-38B | T + I | Skywork/Skywork-R1V-38B | |
| SmolVLMForConditionalGeneration | SmolVLM2 | T + I | SmolVLM2-2.2B-Instruct | |
| UltravoxModel | Ultravox | T + AE+ | fixie-ai/ultravox-v0_3 | |
| Architecture | Model | Inputs | Example HuggingFace models | Notes |
|---|---|---|---|---|
| LlavaNextForConditionalGeneration | LLaVA-NeXT-based | T / I | royokong/e5-v | |
| Phi3VForCausalLM | Phi-3-Vision-based | T + I | TIGER-Lab/VLM2Vec-Full | |
| Qwen2VLForConditionalGeneration | Qwen2-VL-based | T + I | MrLight/dse-qwen2-2b-mrl-v1 | |
vllm-ascend-v0.17.0rc1
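The following lists the model names supported by this template. For usage details and caveats for each model in the compatibility list, see the official vllm-ascend documentation.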
| Models |
|---|
| Aria |
| Baichuan |
| Baichuan2 |
| Bert |
| DeepSeek Distill (Qwen/Llama) |
| DeepSeek R1 |
| DeepSeek V3.2 |
| DeepSeek V3/3.1 |
| Ernie4.5 |
| Ernie4.5-Moe |
| Gemma-2 |
| Gemma-3 |
| Gemma3 |
| GLM-4.x |
| GLM-5 |
| Internlm |
| Kimi-K2-Thinking |
| DeepseekOCR2 |
| MiniMax-M2.5 |
| Llama2/3/3.1/3.2 |
| Llama3.2 |
| LLaVA-Next |
| LLaVA-Next-Video |
| MiniCPM |
| MiniCPM-V |
| MiniCPM3 |
| Mistral/Mistral-Instruct |
| DeepSeek V2.5 |
| Mllama |
| MiniMax-Text |
| Mistral3 |
| Molmo |
| PaddleOCR-VL |
| Llama4 |
| Keye-VL-8B-Preview |
| Florence-2 |
| GLM-4V |
| InternVL2.0/2.5/3.0 InternVideo2.5/Mono-InternVL |
| Whisper |
| Ultravox |
| Phi-3-Vision/Phi-3.5-Vision |
| Phi-3/4 |
| Phi-4-mini |
| QVQ |
| Qwen2 |
| Qwen2-Audio |
| Qwen2-based |
| Qwen2-VL |
| Qwen2.5 |
| Qwen2.5-Omni |
| Qwen2.5-VL |
| Qwen3 |
| Qwen3-based |
| Qwen3-Coder |
| Qwen3-Embedding |
| Qwen3-Moe |
| Qwen3-Next |
| Qwen3-Omni |
| Qwen3-Omni-30B-A3B-Thinking |
| Qwen3-Reranker |
| Qwen3-VL |
| Qwen3-VL-Embedding |
| Qwen3-VL-MOE |
| Qwen3.5-397B-A17B |
| Qwen3.5-27B |
| Qwen3-VL-Reranker |
| QwQ-32B |
| XLM-RoBERTa-based |
vllm-ascend-v0.18.0
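The following lists the model names supported by this template. For usage details and caveats for each model in the compatibility list, see the official vllm-ascend documentation.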
| Models |
|---|
| Aria |
| Baichuan |
| Baichuan2 |
| Bert |
| DeepSeek Distill (Qwen/Llama) |
| DeepSeek R1 |
| DeepSeek V3.2 |
| DeepSeek V3/3.1 |
| Ernie4.5 |
| Ernie4.5-Moe |
| Gemma-2 |
| Gemma-3 |
| Gemma3 |
| GLM-4.x |
| GLM-5 |
| Internlm |
| Kimi-K2-Thinking |
| DeepseekOCR2 |
| MiniMax-M2.5 |
| MiniMax-M2.7 |
| Llama2/3/3.1/3.2 |
| Llama3.2 |
| LLaVA-Next |
| LLaVA-Next-Video |
| MiniCPM |
| MiniCPM-V |
| MiniCPM3 |
| Mistral/Mistral-Instruct |
| DeepSeek V2.5 |
| Mllama |
| MiniMax-Text |
| Mistral3 |
| Molmo |
| PaddleOCR-VL |
| Llama4 |
| Keye-VL-8B-Preview |
| Florence-2 |
| GLM-4V |
| InternVL2.0/2.5/3.0 InternVideo2.5/Mono-InternVL |
| Whisper |
| Ultravox |
| Phi-3-Vision/Phi-3.5-Vision |
| Phi-3/4 |
| Phi-4-mini |
| QVQ |
| Qwen2 |
| Qwen2-Audio |
| Qwen2-based |
| Qwen2-VL |
| Qwen2.5 |
| Qwen2.5-Omni |
| Qwen2.5-VL |
| Qwen3 |
| Qwen3-based |
| Qwen3-Coder |
| Qwen3-Embedding |
| Qwen3-Moe |
| Qwen3-Next |
| Qwen3-Omni |
| Qwen3-Omni-30B-A3B-Thinking |
| Qwen3-Reranker |
| Qwen3-VL |
| Qwen3-VL-Embedding |
| Qwen3-VL-MOE |
| Qwen3.5-397B-A17B |
| Qwen3.5-27B |
| Qwen3.5-35B-A3B |
| Qwen3.6-27B |
| Qwen3.6-35B-A3B |
| Qwen3-VL-Reranker |
| QwQ-32B |
| XLM-RoBERTa-based |
Diffusers 0.37.0
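The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official Diffusers documentation.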
Transformers 5.3.0
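The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official Transformers documentation.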
Sentence Transformers 5.3.0
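The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official Sentence Transformers documentation.
| Model |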
|---|
| all-MiniLM-L12-v1 |
| all-MiniLM-L12-v2 |
| all-MiniLM-L6-v1 |
| all-MiniLM-L6-v2 |
| all-distilroberta-v1 |
| all-mpnet-base-v1 |
| all-mpnet-base-v2 |
| all-roberta-large-v1 |
| average_word_embeddings_glove.6B.300d |
| average_word_embeddings_komninos |
| distiluse-base-multilingual-cased-v1 |
| distiluse-base-multilingual-cased-v2 |
| gtr-t5-base |
| gtr-t5-large |
| gtr-t5-xxl |
| gtr-t5-xl |
| msmarco-bert-base-dot-v5 |
| msmarco-distilbert-dot-v5 |
| msmarco-distilbert-base-tas-b |
| msmarco-distilbert-cos-v5 |
| msmarco-MiniLM-L12-cos-v5 |
| msmarco-MiniLM-L6-cos-v5 |
| multi-qa-MiniLM-L6-cos-v1 |
| multi-qa-MiniLM-L6-dot-v1 |
| multi-qa-distilbert-cos-v1 |
| multi-qa-distilbert-dot-v1 |
| multi-qa-mpnet-base-cos-v1 |
| multi-qa-mpnet-base-dot-v1 |
| paraphrase-MiniLM-L12-v2 |
| paraphrase-MiniLM-L3-v2 |
| paraphrase-MiniLM-L6-v2 |
| paraphrase-TinyBERT-L6-v2 |
| paraphrase-albert-small-v2 |
| paraphrase-distilroberta-base-v2 |
| paraphrase-mpnet-base-v2 |
| paraphrase-multilingual-MiniLM-L12-v2 |
| paraphrase-multilingual-mpnet-base-v2 |
| LaBSE |
| sentence-t5-base |
| sentence-t5-large |
| sentence-t5-xl |
| sentence-t5-xxl |
| clip-ViT-L-14 |
| clip-ViT-B-16 |
| clip-ViT-B-32 |
| clip-ViT-B-32-multilingual-v1 |
| Qwen/Qwen3-VL-Embedding-2B |
| hkunlp/instructor-base |
| hkunlp/instructor-large |
| hkunlp/instructor-xl |
| allenai-specter |
Note: The models above are the official models provided by Sentence Transformers. For other supported community models, see the Sentence Transformers community models.
llama.cpp-b6152
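The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official llama.cpp documentation.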
SGLang-0.5.11
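The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official SGLang documentation.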
| Model type | Model |
|---|---|
| Large language models | DeepSeek (v1, v2, v3/R1) |
| | Kimi K2 (Thinking, Instruct) |
| | Kimi Linear (48B-A3B) |
| | GPT-OSS |
| | Qwen (3.5, 3, 3MoE, 3Next, 2.5, 2 series) |
| | Llama (2, 3.x, 4 series) |
| | Mistral (Mixtral, NeMo, Small3) |
| | Gemma (v1, v2, v3) |
| | Phi (Phi-1.5, Phi-2, Phi-3, Phi-4, Phi-MoE series) |
| | MiniCPM (v3, 4B) |
| | OLMo (2, 3) |
| | OLMoE (Open MoE) |
| | MiniMax-M2 (M2, M2.1, M2.5) |
| | StableLM (3B, 7B) |
| | Command-(R,A) (Cohere) |
| | DBRX (Databricks) |
| | Grok (xAI) |
| | ChatGLM (GLM-130B family) |
| | InternLM 2 (7B, 20B) |
| | ExaONE 3 (Korean-English) |
| | Baichuan 2 (7B, 13B) |
| | XVERSE (MoE) |
| | SmolLM (135M–1.7B) |
| | GLM-4 (Multilingual 9B) |
| | MiMo (7B series) |
| | ERNIE-4.5 (4.5, 4.5MoE series) |
| | Arcee AFM-4.5B |
| | Persimmon (8B) |
| | Solar (10.7B) |
| | Tele FLM (52B-1T) |
| | Ling (16.8B–290B) |
| | Granite 3.0, 3.1 (IBM) |
| | Granite 3.0 MoE (IBM) |
| | GPT-J (6B) |
| | Orion (14B) |
| | Llama Nemotron Super (v1, v1.5, NVIDIA) |
| | Llama Nemotron Ultra (v1, NVIDIA) |
| | NVIDIA Nemotron Nano 2.0 |
| | NVIDIA Nemotron 3 Super (NVIDIA) |
| | NVIDIA Nemotron 3 Nano (NVIDIA) |
| | StarCoder2 (3B-15B) |
| | Jet-Nemotron |
| | Trinity (Nano, Mini) |
| | LFM2 (350M, 1.2B) |
| | LFM2-MoE (8B-A1B, 24B-A2B) |
| | Falcon-H1 (0.5B–34B) |
| | Hunyuan-Large (389B, MoE) |
| | IBM Granite 4.0 (Hybrid, Dense) |
| | Sarvam 2 (30B-A2B, 105B-A10B) |
| | Laguna XS.2 (poolside) |
| Multimodal models | Qwen-VL (Qwen2-VL, Qwen2.5-VL, Qwen3-VL, Qwen3-Omni) |
| | DeepSeek-VL2 |
| | DeepSeek-OCR / OCR-2 |
| | Janus-Pro (1B, 7B) |
| | MiniCPM-V / MiniCPM-o |
| | Llama 3.2 Vision (11B) |
| | LLaVA (v1.5 & v1.6) |
| | LLaVA-NeXT (8B, 72B) |
| | LLaVA-OneVision |
| | Gemma 3 (Multimodal) |
| | Kimi-VL (A3B) |
| | Mistral-Small-3.1-24B |
| | Phi-4-multimodal-instruct |
| | MiMo-VL (7B) |
| | GLM-4.5V (106B) / GLM-4.1V (9B) |
| | GLM-OCR |
| | DotsVLM (General/OCR) |
| | DotsVLM-OCR |
| | NVILA (8B, 15B, Lite-2B, Lite-8B, Lite-15B) |
| | NVIDIA Nemotron Nano 2.0 VL |
| | Ernie4.5-VL |
| | JetVLM |
| | Step3-VL (10B) |
| | Qwen3-ASR (0.6B, 1.7B) |
| | Qwen3-Omni |
| | LFM2-VL |
| Audio transcription models | Whisper |
| | Qwen3-ASR (0.6B, 1.7B) |
| Diffusion language models | LLaDA2.0 (mini, flash) |
| | SDAR (JetLM, dense/MoE) |
| Embedding models | E5 (Llama/Mistral based) |
| | GTE-Qwen2 |
| | Qwen3-Embedding |
| | BGE |
| | GME (Multimodal) |
| | CLIP |
| Reward models | Llama (3.1 Reward / LlamaForSequenceClassification) |
| | Gemma 2 (27B Reward / Gemma2ForSequenceClassification) |
| | InternLM 2 (Reward / InternLM2ForRewardModel) |
| | Qwen2.5 (Reward - Math / Qwen2ForRewardModel) |
| | Qwen2.5 (Reward - Sequence / Qwen2ForSequenceClassification) |
| Reranking models | BGE-Reranker (BgeRerankModel) |
| | Qwen3-Reranker (decoder-only yes/no) |
| | Qwen3-VL-Reranker (multimodal yes/no) |
| Classification models | LlamaForSequenceClassification |
| | Qwen2ForSequenceClassification |
| | Qwen3ForSequenceClassification |
| | BertForSequenceClassification |
| | Gemma2ForSequenceClassification |
MindIE 2.3.0
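The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official MindIE documentation.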
| Model type | Model |
|---|---|
| Large language models | Qwen3-235B-A22B |
| | Qwen3-30B-A3B |
| | DeepSeek-R1-0528 |
| | DeepSeek-V2-236B |
| | DeepSeek-V3-0324 |
| | DeepSeek-V3.1 |
| | Mixtral-8x7B-Instruct-V0.1 |
| | Mixtral-8x22B-Instruct-V0.1 |
| | Kimi K2 |
| | GLM4.5 |
| | Ernie 4.5 |
| | DeepSeek-R1-Distill-Llama-8B |
| | DeepSeek-R1-Distill-Llama-70B |
| | DeepSeek-R1-Distill-Qwen-1.5B |
| | DeepSeek-R1-Distill-Qwen-7B |
| | DeepSeek-R1-Distill-Qwen-14B |
| | Qwen2-7B-Instruct |
| | Qwen2-72B-Instruct |
| | Qwen2.5-7B-Instruct |
| | Qwen2.5-14B-Instruct |
| | Qwen2.5-32B-Instruct |
| | Qwen2.5-72B-Instruct |
| | Qwen3-4B |
| | Qwen3-8B |
| | Qwen3-14B |
| | Qwen3-32B |
| | LLaMA3-8B |
| | LLaMA3-70B |
| | LLaMA3.1-8B |
| | LLaMA3.1-70B |
| | LLaMA3.1-405B |
| | ChatGLM3-6B |
| | GLM4-9B |
| | Baichuan2-7B |
| | Baichuan2-13B |
| | Bloom-7B |
| Multimodal understanding models | GLM-4V-9B |
| | MiniCPM-V2.6-8B |
| | InternVL2-8B |
| | InternVL2-40B |
| | InternVL2.5-8B |
| | InternVL2.5-78B |
| | Qwen2-Audio-7B-Instruct |
| | Qwen2-VL-7B-Instruct |
| | Qwen2-VL-72B-Instruct |
| | Qwen2.5-VL-7B-Instruct |
| | Qwen2.5-VL-32B-Instruct |
| | Qwen2.5-VL-72B-Instruct |
| | VITA1.5-8B |
| Multimodal generation models | Stable Diffusion 1.5 |
| | Stable Diffusion 2.1 |
| | Stable Diffusion XL |
| | Stable Diffusion XL_lighting |
| | Stable Diffusion 3 |
| | Stable Video Diffusion |
| | Stable Audio Open v1.0 |
| | OpenSora v1.2 |
| | OpenSoraPlan v1.2 |
| | OpenSoraPlan v1.3 |
| | DiT |
| | sd-webui |
| | CogView3-Plus-3B |
| | CogVideoX-2B |
| | CogVideoX-5B |
| | FLUX.1-dev |
| | HunyuanDiT |
| | HunyuanVideo |
| | Wan2.1-T2V-14B |
| | Wan2.1-I2V-14B |
| | Wan2.2-T2V-A14B |
| | Wan2.2-I2V-A14B |
| | Wan2.2-TI2V-5B |
MindIE 1.0.0
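The following lists the model names compatible with this template. For usage details and caveats for each model in the compatibility list, see the official MindIE documentation.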
| Model type | Model |
|---|---|
| Large language models | DeepSeek-V2-Lite-16B |
| | DeepSeek-V2-236B |
| | Qwen2.5-72B |
| | Qwen2.5-32B |
| | Qwen2.5-14B |
| | Qwen2.5-7B |
| | Qwen2-57B-A14B |
| | Qwen2-72B |
| | Qwen2-7B |
| | Qwen1.5-0.5B |
| | Qwen1.5-1.8B |
| | Qwen1.5-4B |
| | Qwen-7B |
| | Qwen-14B |
| | Qwen-72B |
| | LLaMA3-8B |
| | LLaMA3-70B |
| | LLaMA3.1-8B |
| | LLaMA3.1-70B |
| | LLaMA3.1-405B |
| | LLaMA-7B |
| | LLaMA-13B |
| | LLaMA-33B |
| | LLaMA-65B |
| | LLaMA2-7B |
| | LLaMA2-13B |
| | LLaMA2-70B |
| | ChatGLM2-6B |
| | ChatGLM3-6B |
| | ChatGLM3-6B-32K |
| | GLM4-9B-Chat |
| | Baichuan2-7B |
| | Baichuan2-13B |
| | Bloom-7B |
| | Bloom-176B |
| | CodeLLaMA-34B |
| | StarCoder-15.5B |
| | StarCoder2-15B |
| | Yi-6B-200K |
| | Yi-34B-200K |
| | CodeGeeX2-6B |
| | CodeShell-7B |
| | Gemma-7B |
| | GPT-NEOX-20B |
| | Ziya-Coding-34B |
| | InternLM2-20B |
| | InternLM-20B |
| | InternLM2-7B |
| | Mixtral-8x7B-Instruct-V0.1 |
| | Mixtral-8x22B-Instruct-V0.1 |
| | Vicuna-13B |
| Embedding models | bge-large-zh-v1.5 |
| | bge-reranker-large |
| | bge-m3 |
| Multimodal understanding models | InternVL-Chat-V1-2 |
| | InternVL-Chat-V1-5 |
| | InternVL2-8B |
| | InternVL2-40B |
| | Qwen-VL-9.6B |
| | Qwen2-Audio-7B-Instruct |
| | Qwen2-VL-7B-Instruct |
| | internLM-xcomposer2-vl-7B |
| | internLM-XComposer2-4KHD-7B |
| | LLava-1.6-mistral-7B |
| | LLava-1.6-vicuna-7B |
| | LLava-1.6-vicuna-13B |
| | LLava-v1.6-34b-hf |
| | LLava-next-video-34b |
| | LLava-next-video-7b |
| | LLava-v1.5-13B |
| | LLava-v1.5-7B |
| | MiniCPM-Llama3-V-2_5 |
| | MiniCPM-V-2 |
| Multimodal generation models | Stable Diffusion 1.5 |
| | Stable Diffusion 2.1 |
| | Stable Diffusion XL |
| | Stable Diffusion XL_controlnet |
| | Stable Diffusion XL_inpainting |
| | Stable Diffusion XL_prompt_weight |
| | Stable Diffusion 3 |
| | Stable Video Diffusion |
| | Stable Audio Open v1.0 |
| | OpenSora v1.2 |
| | DiT |
| | sd-webui |
| | CogView3-Plus-3B |
| | HunyuanDiT |