ai/deepseek-r1-distill-llama

By Docker · Verified Publisher · Updated 9 months ago

Distilled LLaMA by DeepSeek, fast and optimized for real-world tasks

ai/deepseek-r1-distill-llama repository overview

[DeepSeek-R1-Distill-Llama logo]

DeepSeek introduced its first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1, which use reinforcement learning to improve reasoning performance. DeepSeek-R1 achieves state-of-the-art results, and DeepSeek has open-sourced multiple distilled models derived from it.

The models provided here are the distill-llama variants: Llama-based models fine-tuned on the responses and reasoning output of the full DeepSeek-R1 model.

Intended uses

DeepSeek-R1-Distill-Llama can help with:

  • Software development: Generates code, debugs it, and explains complex concepts.
  • Mathematics: Solves and explains complex problems for research and education.
  • Content creation and editing: Writes, edits, and summarizes content for various industries.
  • Customer service: Powers chatbots to engage users and answer queries.
  • Data analysis: Extracts insights and generates reports from large datasets.
  • Education: Acts as a digital tutor, providing clear explanations and personalized lessons.

Characteristics

| Attribute         | Details          |
|-------------------|------------------|
| Provider          | DeepSeek         |
| Architecture      | llama            |
| Cutoff date       | May 2024ⁱ        |
| Languages         | English, Chinese |
| Tool calling      | —                |
| Input modalities  | Text             |
| Output modalities | Text             |
| License           | MIT              |

ⁱ: Estimated

Available model variants

| Model variant                            | Parameters | Quantization    | Context window | VRAM¹     | Size     |
|------------------------------------------|------------|-----------------|----------------|-----------|----------|
| `ai/deepseek-r1-distill-llama:latest`    | 8B         | IQ2_XXS/Q4_K_M  | 131K tokens    | 5.33 GiB  | 4.58 GB  |
| `ai/deepseek-r1-distill-llama:8B-Q4_0`   | 8B         | Q4_0            | 131K tokens    | 5.09 GiB  | 4.33 GB  |
| `ai/deepseek-r1-distill-llama:8B-Q4_K_M` | 8B         | IQ2_XXS/Q4_K_M  | 131K tokens    | 5.33 GiB  | 4.58 GB  |
| `ai/deepseek-r1-distill-llama:8B-F16`    | 8B         | F16             | 131K tokens    | 15.01 GiB | 14.96 GB |
| `ai/deepseek-r1-distill-llama:70B-Q4_0`  | 70B        | Q4_0            | 131K tokens    | 38.73 GiB | 37.22 GB |
| `ai/deepseek-r1-distill-llama:70B-Q4_K_M`| 70B        | IQ2_XXS/Q4_K_M  | 131K tokens    | 41.11 GiB | 39.59 GB |

¹: VRAM estimated based on model characteristics.

The `latest` tag points to `8B-Q4_K_M`.

Use this AI model with Docker Model Runner

First, pull the model:

docker model pull ai/deepseek-r1-distill-llama
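
To pull a specific variant from the table above instead of the default tag, append it to the model name:

```
# Example: pull the full-precision 8B build
docker model pull ai/deepseek-r1-distill-llama:8B-F16
```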

Then run the model:

docker model run ai/deepseek-r1-distill-llama

For more information on Docker Model Runner, explore the documentation.
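
Docker Model Runner also exposes an OpenAI-compatible API, so the model can be called programmatically. Below is a minimal sketch, assuming host TCP access has been enabled on the default port (`docker desktop enable model-runner --tcp 12434`); check the Model Runner documentation for the exact endpoint in your setup:

```
# Assumes: docker desktop enable model-runner --tcp 12434
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/deepseek-r1-distill-llama",
    "messages": [
      {"role": "user", "content": "Explain model distillation in two sentences."}
    ]
  }'
```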

Usage tips

  • Set the temperature between 0.5 and 0.7 (recommended: 0.6) to avoid repetition or incoherence.
  • Do not use a system prompt. Include all instructions within the user prompt.
  • For math problems, add a directive like: "Please reason step by step and enclose the final answer in \boxed{}."

This model is sensitive to prompts: few-shot prompting consistently degrades its performance. For best results, describe the problem directly and specify the output format in a zero-shot setting, as in the sketch below.
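
Putting these tips together, here is a sketch of a math request against the same OpenAI-compatible endpoint assumed above: no system message, all instructions in a single zero-shot user turn, the recommended temperature of 0.6, and the \boxed{} directive for the final answer.

```
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ai/deepseek-r1-distill-llama",
    "temperature": 0.6,
    "messages": [
      {"role": "user",
       "content": "Find the sum of the first 50 positive integers. Please reason step by step and enclose the final answer in \\boxed{}."}
    ]
  }'
```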

Benchmark performance

| Category | Benchmark                  | DeepSeek R1 |
|----------|----------------------------|-------------|
| English  | MMLU (Pass@1)              | 90.8        |
| English  | MMLU-Redux (EM)            | 92.9        |
| English  | MMLU-Pro (EM)              | —           |
| English  | DROP (3-shot F1)           | —           |
| English  | IF-Eval (Prompt Strict)    | —           |
| English  | GPQA-Diamond (Pass@1)      | —           |
| English  | SimpleQA (Correct)         | —           |
| English  | FRAMES (Acc.)              | —           |
| English  | AlpacaEval2.0 (LC-winrate) | 87.6        |
| English  | ArenaHard (GPT-4-1106)     | 92.3        |
| Code     | LiveCodeBench (Pass@1-COT) | 65.9        |
| Code     | Codeforces (Percentile)    | 96.3        |
| Code     | Codeforces (Rating)        | 2029        |
| Code     | SWE Verified (Resolved)    | 49.2        |
| Code     | Aider-Polyglot (Acc.)      | 53.3        |
| Math     | AIME 2024 (Pass@1)         | 79.8        |
| Math     | MATH-500 (Pass@1)          | 97.3        |
| Math     | CNMO 2024 (Pass@1)         | 78.8        |
| Chinese  | CLUEWSC (EM)               | 92.8        |
| Chinese  | C-Eval (EM)                | 91.8        |
| Chinese  | C-SimpleQA (Correct)       | 63.7        |

Tag summary

Content type: Model
Digest: sha256:828b7874c
Size: 37.2 GB
Last updated: 9 months ago

Pull this tag with:

docker model pull ai/deepseek-r1-distill-llama:70B-Q4_0
