Offline Text LLMs

02 Large Offline Text Models (Unimodal)

This chapter introduces practical offline text-only LLM options for local deployment. It compares representative open models, explains their strengths and tradeoffs, and helps you choose a model that matches Jetson resource constraints and your target application.

11.02-01 Meta AI: Llama 3.2

Introduction

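Llama 3.2 is Meta's September 2024 update to its open Llama model family. It adds lightweight 1B and 3B text-only models alongside 11B and 90B vision models. This chapter focuses on the text variants for offline inference workflows. They are widely used because their small size keeps memory and compute demands low while retaining useful general-purpose capability.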

Model size

Variant	Modality	Typical fit
Llama 3.2 1B	Text	Smallest local text-only option
Llama 3.2 3B	Text	Practical local deployment on Jetson
Llama 3.2 11B Vision	Vision-language	Image understanding with higher compute demand
Llama 3.2 90B Vision	Vision-language	Server-class deployment rather than single-device use
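When matching a variant to a Jetson's memory, a common rule of thumb is that the weights alone take roughly parameters × bits per weight / 8 bytes. The sketch below applies that arithmetic. The helper name estimate_weights_mib is hypothetical, the parameter counts are the nominal sizes from the table, and the bit widths are assumed quantization levels. The result ignores the KV cache and runtime overhead, so it is best read as a lower bound rather than a measured footprint.

```bash
#!/usr/bin/env bash
# Rough lower bound on weight memory: params * bits / 8 bytes, shown in MiB.
# estimate_weights_mib is a hypothetical helper, not part of Ollama.
# Args: $1 = parameters in billions (nominal), $2 = bits per weight (assumed).
estimate_weights_mib() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%d\n", p * 1e9 * b / 8 / 1048576 }'
}

estimate_weights_mib 3 4    # nominal 3B model at an assumed 4 bits per weight
estimate_weights_mib 3 16   # same model at 16 bits per weight
```

Comparing the two results shows why quantized builds are the usual choice on memory-constrained devices: the 16-bit estimate is four times the 4-bit one.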

Performance

Run Llama 3.2

Run the model with ollama run. If the model is not available locally, Ollama will download it first and then start inference.

bash

ollama run llama3.2:3b

Dialogue test

After the model loads, type a question at the >>> prompt:

text

who are you?

Exit the dialogue

Press Ctrl + D, or type /bye, to exit the conversation.
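For scripted use, the dialogue test can also run without the interactive session, because ollama run accepts the prompt as an argument and exits after printing the reply. The following is a minimal sketch; the wrapper name ask_once is hypothetical, while ollama pull and ollama run are standard Ollama CLI commands.

```bash
#!/usr/bin/env bash
# ask_once is a hypothetical wrapper around the standard `ollama run MODEL PROMPT` form.
# Args: $1 = model tag, remaining args = prompt words (joined with spaces).
ask_once() {
  local model="$1"
  shift
  ollama run "$model" "$*"
}

# Example calls, commented out so that sourcing this file starts no download:
# ollama pull llama3.2:3b              # fetch the model ahead of time
# ask_once llama3.2:3b "who are you?"  # one-shot question, no interactive session
```

Pulling the model ahead of time is optional, but it keeps the download out of the first timed run.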

11.02-02 Alibaba Cloud: Qwen3

Introduction

Qwen3 is an open model family from Alibaba Cloud. It spans sizes from 0.6B to 235B parameters, supports long context windows, and includes both dense and mixture-of-experts (MoE) variants. This range makes Qwen3 suitable for edge testing, workstation use, and larger server deployments.

Model size

Variant	Family type	Notes
Qwen3 0.6B	Dense	Smallest local deployment option
Qwen3 1.7B / 4B / 8B	Dense	Common edge and workstation sizes
Qwen3 14B / 32B	Dense	Larger local or server deployment
Qwen3 30B-A3B	MoE	Mixture-of-experts model; about 3B of its 30B parameters are active per token
Qwen3 235B-A22B	MoE	Largest flagship model (about 22B active parameters per token) for server-scale deployment
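In the MoE names, the A3B and A22B suffixes give the parameters activated per token. The small sketch below works out the active share for both variants. The helper name active_percent is hypothetical. Note that the share describes per-token compute only: in a typical setup all weights still have to be loaded, so memory demand follows the total parameter count.

```bash
#!/usr/bin/env bash
# active_percent is a hypothetical helper.
# Args: $1 = total parameters (billions), $2 = active parameters per token (billions).
active_percent() {
  awk -v t="$1" -v a="$2" 'BEGIN { printf "%.1f\n", 100 * a / t }'
}

active_percent 30 3     # Qwen3 30B-A3B
active_percent 235 22   # Qwen3 235B-A22B
```

Both variants activate roughly a tenth of their weights per token.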

Performance

Run Qwen3

Run the model with ollama run. If the model is missing locally, Ollama downloads it automatically.

bash

ollama run qwen3:8b

Dialogue test

After the model loads, type a request at the >>> prompt:

text

please tell me a story.

Exit the dialogue

Press Ctrl + D, or type /bye, to exit the conversation.
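Besides the interactive session, a running Ollama server also accepts requests over its local HTTP API. The sketch below builds the JSON body for the /api/generate endpoint and shows, commented out, how it might be sent with curl. The helper name build_generate_payload is hypothetical, the escaping covers only backslashes and double quotes, and the URL assumes the default local port 11434.

```bash
#!/usr/bin/env bash
# build_generate_payload is a hypothetical helper that prints a JSON request body.
# Args: $1 = model tag, $2 = prompt text.
# Escaping is deliberately naive: only backslashes and double quotes are handled,
# so prompts containing newlines or control characters need a real JSON encoder.
build_generate_payload() {
  local model="$1" prompt="$2"
  prompt=${prompt//\\/\\\\}   # escape backslashes first
  prompt=${prompt//\"/\\\"}   # then escape double quotes
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$model" "$prompt"
}

# Example call, assuming an Ollama server on the default local port:
# curl -s http://localhost:11434/api/generate \
#   -d "$(build_generate_payload qwen3:8b 'please tell me a story.')"
```

With "stream" set to false, the server returns one JSON object rather than a sequence of partial responses, which is easier to handle in a script.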

11.02-03 Microsoft: Phi-4-mini

Introduction

Phi-4-mini is a compact language model in Microsoft's Phi family. It is designed for efficient reasoning with relatively low resource requirements, which makes it a practical option for constrained edge environments.

Model size

Variant	Parameters	Notes
Phi-4-mini	~3.8B	Compact reasoning-focused model with long-context support

Model performance

Run Phi-4-mini

Run the model with ollama run. If the model is not installed, Ollama pulls it first.

bash

ollama run phi4-mini:3.8b

Dialogue test

After the model loads, type a question at the >>> prompt:

text

who are you?

Exit the dialogue

Press Ctrl + D, or type /bye, to exit the conversation.
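Because ollama run pulls a missing model before starting, it can be useful on a device with limited storage or bandwidth to check first whether the model is already present. The following sketch parses the output of ollama list. The function name is_installed is hypothetical, and the code assumes the listing starts with one header row and has the model name in the first column.

```bash
#!/usr/bin/env bash
# is_installed is a hypothetical helper built on the standard `ollama list` command.
# Arg: $1 = full model tag, for example phi4-mini:3.8b.
# Returns 0 if the tag appears in the first column of the listing, 1 otherwise.
is_installed() {
  # Skip the header row, then compare the first column to the requested tag.
  ollama list | awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Example use, commented out so that sourcing this file runs nothing:
# if is_installed phi4-mini:3.8b; then echo "already downloaded"; fi
```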

11.02-04 DeepSeek: DeepSeek-R1

Introduction

DeepSeek-R1 is an open reasoning-focused model family. Compared with models optimized mainly for fluent text generation, it emphasizes structured reasoning ability for tasks such as logic, mathematics, and coding.

Model size

Variant	Typical scale	Notes
DeepSeek-R1 1.5B / 7B / 8B	Small distilled variants	Easiest to test locally
DeepSeek-R1 14B / 32B	Medium distilled variants	Better reasoning quality with higher memory demand
DeepSeek-R1 70B and above	Large variants	Better suited to server-class hardware than a single Jetson

Model performance

Run DeepSeek-R1

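Run the model with ollama run. If needed, Ollama downloads the model automatically before execution. Because the command below gives no size tag, Ollama uses the model's default tag; append a tag such as :1.5b to select a specific variant from the table above.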

bash

ollama run deepseek-r1

Dialogue test

After the model loads, type a question at the >>> prompt:

text

who are you?

Exit the dialogue

Press Ctrl + D, or type /bye, to exit the conversation.
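DeepSeek-R1 variants typically write their reasoning between <think> and </think> markers before the final answer. Whether these markers appear in the output depends on the Ollama version and settings, so the filter below is only a sketch under that assumption. The function name strip_think is hypothetical, and the filter assumes each marker sits on its own line.

```bash
#!/usr/bin/env bash
# strip_think is a hypothetical filter that reads model output on stdin.
# It deletes every line from one containing <think> through the next line
# containing </think>, and passes all other lines through unchanged.
# Assumption: the opening and closing markers are on separate lines.
strip_think() {
  sed '/<think>/,/<\/think>/d'
}

# Example use, commented out so that sourcing this file starts no inference:
# ollama run deepseek-r1 "who are you?" | strip_think
```

Keeping the reasoning visible is often helpful while testing; a filter like this is mainly useful when only the final answer should reach a downstream program.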

References