Offline Text LLMs

02 Large Offline Text Models (Unimodal)

This chapter introduces practical offline text-only LLM options for local deployment. It compares representative open models, explains their strengths and tradeoffs, and helps you choose a model that matches Jetson resource constraints and your target application.

11.02-01 Meta AI: Llama 3.2

Introduction

Llama 3.2 is a major update in Meta's open model family. For this chapter, we focus on its text variants for offline inference workflows. Llama 3.2 is widely used because it provides a good balance between capability and deployment cost.

Model size

VariantModalityTypical fit
Llama 3.2 1BTextSmallest local text-only option
Llama 3.2 3BTextPractical local deployment on Jetson
Llama 3.2 11B VisionVision-languageImage understanding with higher compute demand
Llama 3.2 90B VisionVision-languageServer-class deployment rather than single-device use

Performance

Run Llama 3.2

Run the model with ollama run. If the model is not available locally, Ollama will download it first and then start inference.

bash
ollama run llama3.2:3b

Dialogue test

bash
who are you?

Exit the dialogue

Use Ctrl + d to exit the conversation.

11.02-02 Aliyun: Qwen3

Introduction

Qwen3 is a new generation open model family from Alibaba Cloud. It covers a wide range of sizes, supports long context windows, and includes both dense and MoE variants. This flexibility makes Qwen3 suitable for edge testing, workstation use, and larger server deployment.

Model size

VariantFamily typeNotes
Qwen3 0.6BDenseSmallest local deployment option
Qwen3 1.7B / 4B / 8BDenseCommon edge and workstation sizes
Qwen3 14B / 32BDenseLarger local or server deployment
Qwen3 30B-A3BMoEMixture-of-experts model with lighter active parameters
Qwen3 235B-A22BMoELargest flagship model for server-scale deployment

Performance

Run Qwen3

Run the model with ollama run. If the model is missing locally, Ollama downloads it automatically.

bash
ollama run qwen3:8b

Dialogue test

bash
please tell me a story.

Exit the dialogue

Use Ctrl + d to exit the conversation.

11.02-03 Microsoft: Phi-4-mini

Introduction

Phi-4-mini is a compact language model in Microsoft's Phi family. It is designed for efficient reasoning with relatively low resource requirements, which makes it a practical option for constrained edge environments.

Model size

VariantParametersNotes
Phi-4-mini~3.8BCompact reasoning-focused model with long-context support

Model performance

Run Phi-4-mini

Run the model with ollama run. If the model is not installed, Ollama pulls it first.

bash
ollama run phi4-mini:3.8b

Dialogue test

bash
who are you?

Exit the dialogue

Use Ctrl + d to exit the conversation.

11.02-04 DeepSeek: DeepSeek-R1

Introduction

DeepSeek-R1 is an open reasoning-focused model family. Compared with models optimized mainly for fluent text generation, it emphasizes structured reasoning ability for tasks such as logic, mathematics, and coding.

Model size

VariantTypical scaleNotes
DeepSeek-R1 1.5B / 7B / 8BSmall distilled variantsEasiest to test locally
DeepSeek-R1 14B / 32BMedium distilled variantsBetter reasoning quality with higher memory demand
DeepSeek-R1 70B and aboveLarge variantsBetter suited to server-class hardware than a single Jetson

Model performance

Run DeepSeek-R1

Run the model with ollama run. If needed, Ollama will download the model automatically before execution.

bash
ollama run deepseek-r1

Dialogue test

bash
who are you?

Exit the dialogue

Use Ctrl + d to exit the conversation.

References

Ollama

Llama 3.2

Qwen3

Phi-4-mini

DeepSeek-R1