v0.0.6
Ollama
parent 434172755d
commit 65b43da03b
11
.env
@@ -1,4 +1,13 @@
 # Qwen/Qwen3.5-4B
 # deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
 SILICONFLOW_API_KEY = "<redacted>"
 SILICONFLOW_BASE_URL = "https://api.siliconflow.cn/v1"
+
+OLLAMA_API_KEY = "ollama"
+OLLAMA_BASE_URL = "http://localhost:11434/v1"
+
+MINIMAX_API_KEY = "<redacted>"
+MINIMAX_BASE_URL = "https://api.minimaxi.com/v1"
+
+BAILIAN_API_KEY = "<redacted>"
+BAILIAN_BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
|
||||||
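Every provider in this `.env` follows the same naming pattern, `<PROVIDER>_API_KEY` plus `<PROVIDER>_BASE_URL`. The sketch below shows one way a script could select a provider at runtime from that pattern. The helper name `provider_settings` and the option to pass the environment as a plain mapping are illustrative assumptions, not code from this commit.

```python
import os
from typing import Mapping, Optional


def provider_settings(provider: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Return the base_url/api_key pair for a provider such as "OLLAMA".

    Hypothetical helper: it relies only on the <PROVIDER>_API_KEY and
    <PROVIDER>_BASE_URL naming pattern used in the .env file.
    """
    env = os.environ if env is None else env
    prefix = provider.upper()
    try:
        return {
            "base_url": env[f"{prefix}_BASE_URL"],
            "api_key": env[f"{prefix}_API_KEY"],
        }
    except KeyError as missing:
        # Fail loudly instead of passing None on to the client
        raise RuntimeError(f"missing environment variable {missing}") from None


if __name__ == "__main__":
    demo_env = {
        "OLLAMA_BASE_URL": "http://localhost:11434/v1",
        "OLLAMA_API_KEY": "ollama",
    }
    print(provider_settings("ollama", demo_env))
```

The returned dictionary uses the same keyword names that the scripts in this commit pass to `OpenAI(...)`, so it could be unpacked directly into the client constructor.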
54
README.md
@@ -1,19 +1,21 @@
 # LangChain Learning
 
 [](https://github.com/your-repo/langchain-learning)
 [](https://www.python.org/)
 [](https://www.langchain.com/)
 
-> A LangChain learning project with SiliconFlow API integration
+> A LangChain learning project with SiliconFlow and Ollama API integration
 
 ## Features
 
-- **Multi-LLM integration**: supports the OpenAI API, Silicon Flow, and the LangChain abstraction layer
+- **Multi-LLM integration**: supports the OpenAI API, SiliconFlow, Ollama, and the LangChain abstraction layer
 - **Streaming responses**: real-time streamed output for a better user experience
 - **Prompt engineering**: several ways to build prompt templates
 - **Output parsing**: parses JSON and other formats
 - **Token usage tracking**: easily monitor API call consumption
 - **Memory management**: persists conversation history (ConversationBufferMemory, SummaryMemory)
+- **Rich terminal UI**: Markdown rendering, multi-line input, and other advanced interactions
+- **Model speed-test tool**: measures a model's time to first token (TTFT) and tokens per second (TPS)
 - **Practical examples**: usage patterns from basic to advanced
 
 ## Quick Start
@@ -21,18 +23,21 @@
 ### 1. Install dependencies
 
 ```bash
-pip install langchain>=1.2.15 langchain-community>=0.4.1 langchain-siliconflow>=1.0.0 requests>=2.33.1
+pip install langchain>=1.2.15 langchain-community>=0.4.1 langchain-siliconflow>=1.0.0 requests>=2.33.1 rich openai
 ```
 
-***Note:*** *If you need the full memory features and more advanced models, you may need to install additional libraries.*
-
 ### 2. Configure environment variables
 
 Create a `.env` file in the project root:
 
 ```env
+# SiliconFlow
 SILICONFLOW_API_KEY=your_api_key_here
 SILICONFLOW_BASE_URL=https://api.siliconflow.cn/v1
+
+# Ollama / local models
+OLLAMA_BASE_URL=http://localhost:11434/v1
+OLLAMA_API_KEY=ollama
 ```
 
 ### 3. Run the examples
@@ -70,8 +75,17 @@ SILICONFLOW_BASE_URL=https://api.siliconflow.cn/v1
 
 | Example | Command | Description |
 |------|------|------|
-| Basic memory | `python memory/memory_desc.py` | Demonstrates the different types of Memory objects.|
-| Chat with memory | `python memory/with_memory_demo.py` | Manages and uses chat history within a conversation chain. |
+| Basic memory | `python memory/memory_desc.py` | Demonstrates the different types of Memory objects |
+| Chat with memory | `python memory/memory_demo.py` | Multi-turn conversation using ConversationBufferMemory |
+| Chat without memory | `python memory/without_memory_demo.py` | Basic LLM chat with no history context |
+| Rich UI chat | `python memory/without_memory_demo_rich.py` | Memory-less chat interface styled with Rich |
+
+**Ollama examples**
+
+| Example | Command | Description |
+|------|------|------|
+| Rich streaming chat | `python ollama/ollama_rich_chat.py` | Streaming chat with Markdown rendering and multi-line input |
+| Model speed-test tool | `python ollama/tps_monitor.py` | Measures a model's TTFT and TPS |
 
 ## Project structure
 
@@ -86,14 +100,21 @@ langchain-learning/
 │   ├── prompt_demo.py              # PromptTemplate example
 │   ├── fewshot_demo.py             # Few-shot learning example
 │   ├── promt_from_file.py          # Load a prompt from a file
-│   └── prompt_from_file.yaml       # Prompt template file
+│   ├── prompt_from_file.yaml       # Prompt YAML template file
+│   └── prompt_from_file.json       # Prompt JSON template file
 ├── parser/
 │   └── json_parser_demo.py         # JSON output parsing example
 ├── token/
 │   └── token_demo.py               # Token usage tracking example
-├── memory/                         # Memory management module
+├── memory/
 │   ├── memory_desc.py              # Demonstrates Memory object types
-│   └── with_memory_do.py           # Demonstrates a chat loop with memory
+│   ├── memory_demo.py              # Conversation chain with memory
+│   ├── with_memory_demo.py         # Chat example with manually managed memory
+│   ├── without_memory_demo.py      # Basic chat without memory
+│   └── without_memory_demo_rich.py # Memory-less chat with a Rich UI
+├── ollama/
+│   ├── ollama_rich_chat.py         # Ollama streaming chat (Rich UI)
+│   └── tps_monitor.py              # Model performance speed-test tool
 ├── main.py                         # Entry point
 ├── pyproject.toml                  # Project configuration
 └── README.md
@@ -101,17 +122,24 @@ langchain-learning/
 
 ## Available models
 
+**SiliconFlow**
 - `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B`
 - `Qwen/Qwen3.5-4B`
+- `Qwen/Qwen3-8B`
+
+**Ollama (local)**
+- `gemma4:26b`
+- `deepseek-v3.1:671b-cloud`
 
 ## Tech stack
 
 | Category | Technology |
 |------|------|
 | Framework | LangChain |
-| LLM provider | SiliconFlow |
+| LLM provider | SiliconFlow, Ollama |
+| Terminal styling | Rich |
 | Language | Python 3.11+ |
 
 ## License
 
 MIT License
178
ollama/llm_benchmark_dashboard_v2.html
Normal file
@ -0,0 +1,178 @@
|
|||||||
|
|
||||||
|
<!DOCTYPE html>
|
||||||
|
<html lang="zh-TW">
|
||||||
|
<head>
|
||||||
|
<meta charset="UTF-8">
|
||||||
|
<title>LLM Benchmark Dashboard V2</title>
|
||||||
|
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
|
||||||
|
<style>
|
||||||
|
body { background-color: #f1f4f9; font-family: 'Segoe UI', system-ui, -apple-system, sans-serif; padding: 40px; }
|
||||||
|
.card { border: none; border-radius: 20px; box-shadow: 0 15px 35px rgba(0,0,0,0.1); overflow: hidden; }
|
||||||
|
.header-box { background: linear-gradient(135deg, #00d2ff 0%, #3a7bd5 100%); color: white; padding: 40px; text-align: center; }
|
||||||
|
.table thead th { background: #ffffff; color: #495057; border-bottom: 2px solid #dee2e6; text-transform: uppercase; font-size: 0.85rem; }
|
||||||
|
.progress { height: 10px; border-radius: 5px; background-color: #e9ecef; }
|
||||||
|
.progress-bar { background: linear-gradient(90deg, #3a7bd5, #00d2ff); }
|
||||||
|
.text-success { color: #28a745 !important; }
|
||||||
|
.text-danger { color: #dc3545 !important; }
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
<div class="container-fluid">
|
||||||
|
<div class="card">
|
||||||
|
<div class="header-box">
|
||||||
|
<h1 class="display-4 font-weight-bold">LLM Inference Performance Dashboard V2</h1>
<p class="lead">Multi-dimensional comparison: local deployment vs. cloud API | Data-driven decisions</p>
<span class="badge badge-light">Updated: 2026-04-14</span>
|
||||||
|
</div>
|
||||||
|
<div class="card-body p-0">
|
||||||
|
<div class="table-responsive">
|
||||||
|
<table class="table table-hover mb-0">
|
||||||
|
<thead class="text-center">
|
||||||
|
<tr>
|
||||||
|
<th class="text-left">Model</th>
<th>Time to first token (TTFT) ↓</th>
<th>Generation speed (TPS)</th>
<th>Total time</th>
<th>Total characters</th>
<th width="20%">Speed visualization</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody class="text-center">
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>local-gemma4:26b (128K)</strong><br><small class="text-muted">Local</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">87.99s</td>
|
||||||
|
<td class="text-primary">10.79</td>
|
||||||
|
<td>131.7s</td>
|
||||||
|
<td>717</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 10.79%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>local-gemma4:26b (32K)</strong><br><small class="text-muted">Local</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">78.67s</td>
|
||||||
|
<td class="text-primary">10.16</td>
|
||||||
|
<td>127.7s</td>
|
||||||
|
<td>722</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 10.16%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>local-gemma4:e4b</strong><br><small class="text-muted">Local</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">39.93s</td>
|
||||||
|
<td class="text-primary">12.34</td>
|
||||||
|
<td>110.4s</td>
|
||||||
|
<td>1338</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 12.34%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>ollama-deepseek-v3.1:671b-cloud</strong><br><small class="text-muted">Cloud (Ollama)</small></td>
|
||||||
|
<td class="text-success font-weight-bold">1.04s</td>
|
||||||
|
<td class="text-success">51.74</td>
|
||||||
|
<td>7.3s</td>
|
||||||
|
<td>479</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 51.74%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>ollama-gemma4:31b-cloud</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-success font-weight-bold">0.85s</td>
|
||||||
|
<td class="text-primary">31.79</td>
|
||||||
|
<td>14.2s</td>
|
||||||
|
<td>613</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 31.79%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>ollama-glm-5:cloud</strong><br><small class="text-muted">Cloud (Ollama)</small></td>
|
||||||
|
<td class="text-warning font-weight-bold">13.58s</td>
|
||||||
|
<td class="text-success">102.25</td>
|
||||||
|
<td>19.5s</td>
|
||||||
|
<td>779</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 100%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>ollama-kimi-k2.5:cloud</strong><br><small class="text-muted">Cloud (Ollama)</small></td>
|
||||||
|
<td class="text-warning font-weight-bold">15.91s</td>
|
||||||
|
<td class="text-primary">29.67</td>
|
||||||
|
<td>23.3s</td>
|
||||||
|
<td>505</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 29.67%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>ollama-minimax-m2.7:cloud</strong><br><small class="text-muted">Cloud (Ollama)</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">40.40s</td>
|
||||||
|
<td class="text-primary">3.75</td>
|
||||||
|
<td>40.9s</td>
|
||||||
|
<td>508</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 3.75%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>百煉-qwen3-max</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-success font-weight-bold">0.86s</td>
|
||||||
|
<td class="text-primary">6.18</td>
|
||||||
|
<td>14.1s</td>
|
||||||
|
<td>595</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 6.18%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>百煉-qwen3.5-35b-a3b</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">37.64s</td>
|
||||||
|
<td class="text-success">69.15</td>
|
||||||
|
<td>39.2s</td>
|
||||||
|
<td>543</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 69.15%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>百煉-qwen3.6-plus</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">77.35s</td>
|
||||||
|
<td class="text-primary">15.25</td>
|
||||||
|
<td>83.3s</td>
|
||||||
|
<td>507</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 15.25%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>百煉-qwen3.6-plus-v2</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-danger font-weight-bold">47.14s</td>
|
||||||
|
<td class="text-primary">15.58</td>
|
||||||
|
<td>53.1s</td>
|
||||||
|
<td>503</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 15.58%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>直連-MiniMax-M2.7</strong><br><small class="text-muted">Cloud (Direct)</small></td>
|
||||||
|
<td class="text-success font-weight-bold">1.19s</td>
|
||||||
|
<td class="text-primary">1.97</td>
|
||||||
|
<td>13.9s</td>
|
||||||
|
<td>842</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 1.97%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr>
|
||||||
|
<td><strong>硅基流動-DeepSeek-R1-Qwen-8B</strong><br><small class="text-muted">Cloud</small></td>
|
||||||
|
<td class="text-warning font-weight-bold">10.15s</td>
|
||||||
|
<td class="text-success">75.57</td>
|
||||||
|
<td>13.4s</td>
|
||||||
|
<td>398</td>
|
||||||
|
<td><div class="progress"><div class="progress-bar" style="width: 75.57%"></div></div></td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
<footer class="mt-4 text-center text-muted">
|
||||||
|
<small>Generated by the AI Engineering Director | Data source: manual benchmark runs</small>
|
||||||
|
</footer>
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
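In the rows above, each progress bar's `width` is the row's TPS value written as a percentage, and the one value above 100 (`ollama-glm-5:cloud`, 102.25) is drawn at `100%`. A small sketch of that mapping follows, assuming a cap of 100. The helper name `bar_width` is hypothetical and does not appear in the commit.

```python
def bar_width(tps: float, cap: float = 100.0) -> str:
    """Map a TPS value to a CSS width string, clamped to the range 0..cap."""
    clamped = max(0.0, min(tps, cap))
    # The 'g' format drops trailing zeros, so 100.0 renders as "100"
    return f"{clamped:g}%"


if __name__ == "__main__":
    for value in (10.79, 51.74, 102.25):
        print(value, "->", bar_width(value))
```

Because the bar is capped, any model faster than 100 chunks per second looks identical in the chart. Scaling by the fastest row instead of a fixed cap would be an alternative design.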
27
ollama/llm_benchmark_report_v2.md
Normal file
@ -0,0 +1,27 @@
# LLM Inference Performance Benchmark Report V2 (2026-04)

## 1. Performance comparison across all models
| Model | Deployment | TTFT (s) | TPS (tokens/s) | Total time (s) | Total characters |
| :--- | :--- | :--- | :--- | :--- | :--- |
| local-gemma4:26b (128K) | Local | 87.99 | 10.79 | 131.74 | 717 |
| local-gemma4:26b (32K) | Local | 78.67 | 10.16 | 127.70 | 722 |
| local-gemma4:e4b | Local | 39.93 | 12.34 | 110.43 | 1338 |
| ollama-deepseek-v3.1:671b-cloud | Cloud (Ollama) | 1.04 | 51.74 | 7.30 | 479 |
| ollama-gemma4:31b-cloud | Cloud | 0.85 | 31.79 | 14.22 | 613 |
| ollama-glm-5:cloud | Cloud (Ollama) | 13.58 | 102.25 | 19.53 | 779 |
| ollama-kimi-k2.5:cloud | Cloud (Ollama) | 15.91 | 29.67 | 23.29 | 505 |
| ollama-minimax-m2.7:cloud | Cloud (Ollama) | 40.40 | 3.75 | 40.94 | 508 |
| 百煉-qwen3-max | Cloud | 0.86 | 6.18 | 14.13 | 595 |
| 百煉-qwen3.5-35b-a3b | Cloud | 37.64 | 69.15 | 39.22 | 543 |
| 百煉-qwen3.6-plus | Cloud | 77.35 | 15.25 | 83.32 | 507 |
| 百煉-qwen3.6-plus-v2 | Cloud | 47.14 | 15.58 | 53.11 | 503 |
| 直連-MiniMax-M2.7 | Cloud (Direct) | 1.19 | 1.97 | 13.90 | 842 |
| 硅基流動-DeepSeek-R1-Qwen-8B | Cloud | 10.15 | 75.57 | 13.40 | 398 |

## 2. Conclusions from the data
- **Cloud throughput**: `GLM-5` (102.25 t/s) and `DeepSeek v3.1` (51.74 t/s) show the highest cloud throughput in this run.
- **Local e4b observation**: even a 4B-scale model still needs about 40 seconds for a local cold start (TTFT 39.93 s), which suggests that the startup bottleneck (disk and Ollama service initialization) depends less on parameter count and more on low-level system I/O.
- **Stability**: TTFT for direct API connections generally stays around 1 second, whereas relay or proxy layers (such as some Bailian (百煉) endpoints) fluctuate much more.

---
*Report generated: 2026-04-14*
|
||||||
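The TPS column appears to follow the formula used in `tps_monitor.py` from this same commit: streamed chunks divided by the time left after the first token arrived (total time minus TTFT). The sketch below restates that arithmetic. The function name is hypothetical, and the numbers in the demo are made up for illustration rather than taken from the report.

```python
def tokens_per_second(chunk_count: int, total_time: float, ttft: float) -> float:
    """Chunks per second of generation time, i.e. the time after the first token."""
    generation_time = total_time - ttft
    if generation_time <= 0:
        return 0.0
    return chunk_count / generation_time


if __name__ == "__main__":
    # Illustrative numbers only: 300 chunks, 7.0 s total, 1.0 s to first token,
    # which leaves 6.0 s of generation time.
    print(tokens_per_second(300, 7.0, 1.0))
```

Under this definition a row can combine a slow TTFT with a high TPS. For `百煉-qwen3.5-35b-a3b`, the generation window is only 39.22 - 37.64 = 1.58 s, so its TPS describes that short window and not the whole request.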
76
ollama/ollama_rich_chat.py
Normal file
@ -0,0 +1,76 @@
import os
import sys

from dotenv import load_dotenv
from openai import OpenAI
from rich.console import Console
from rich.live import Live
from rich.markdown import Markdown
from rich.panel import Panel

load_dotenv()

# 1. Initialize the Rich console
console = Console()

# 2. Initialize the OpenAI client (pointing at local Ollama or SiliconFlow)
client = OpenAI(
    base_url=os.getenv("OLLAMA_BASE_URL"),
    api_key=os.getenv("OLLAMA_API_KEY"),
)

# [Core architecture 1]: keep a conversation history list (the AI's working memory).
# The persona (system prompt) goes in first.
chat_history = [
    {"role": "system", "content": "You are a senior engineer who is proficient in Python. Keep a professional and friendly tone."}
]

# Print the welcome banner
console.print(Panel("✨ Welcome to the streaming AI assistant! Type 'quit' to exit.", border_style="green"))

# Main question-and-answer loop
while True:
    # Multi-line input replaces the original single-line input()
    console.print("\n👤 [bold green]You (multi-line input supported; type '/send' and press Enter to send, 'quit' to exit):[/bold green]")
    lines = []
    while True:
        line = input()
        if line.strip().lower() == "quit":
            console.print("[dim]👋 Goodbye![/dim]")
            sys.exit()  # exit the program immediately
        if line.strip() == "/send":
            break  # end input and leave the collection loop
        lines.append(line)

    # Join the collected lines into one string with real newline characters
    user_input = "\n".join(lines)
    if not user_input.strip():
        continue  # nothing was typed before '/send', so ask again

    # Append the user's new question to the conversation history
    chat_history.append({"role": "user", "content": user_input})

    # Call the model with streaming enabled (stream=True).
    # Note that messages receives the complete chat_history.
    response_stream = client.chat.completions.create(
        model="gemma4:26b",  # replace with the model you are actually running
        messages=chat_history,
        stream=True,
    )

    full_response = ""

    # [Core architecture 3]: render the UI live with a Live block
    with Live(Panel("Thinking...", title="🤖 AI", border_style="cyan"), refresh_per_second=15) as live:
        for chunk in response_stream:
            if not chunk.choices:
                continue  # guard against chunks that carry no choices
            content = chunk.choices[0].delta.content
            if content is not None:
                full_response += content
                # Update the cyan panel in place
                live.update(Panel(Markdown(full_response), title="🤖 AI", border_style="cyan"))

    # [Core architecture 4]: store the AI's complete answer back into the history
    # so that the next turn "remembers" what it just said
    chat_history.append({"role": "assistant", "content": full_response})
|
||||||
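The multi-line input protocol in this script (collect lines until `/send`, stop on `quit`) can be pulled out into a function so it can be exercised without a terminal. The following is a minimal sketch under that assumption. The name `read_multiline` and the injectable `read_line` callable are illustrative and are not part of the commit.

```python
from typing import Callable, Optional


def read_multiline(read_line: Callable[[], str] = input) -> Optional[str]:
    """Collect lines until '/send'; return None if the user types 'quit'."""
    lines = []
    while True:
        line = read_line()
        command = line.strip()
        if command.lower() == "quit":
            return None  # the caller decides how to exit
        if command == "/send":
            return "\n".join(lines)
        lines.append(line)  # keep the original line, including indentation


if __name__ == "__main__":
    scripted = iter(["def f():", "    return 1", "/send"])
    print(read_multiline(lambda: next(scripted)))
```

Returning `None` instead of exiting inside the helper keeps the decision to quit in the main loop, which is one possible design choice among several.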
58
ollama/tps_monitor.py
Normal file
@ -0,0 +1,58 @@
import os
import time

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()

# Configure your environment
client = OpenAI(
    base_url=os.getenv("OLLAMA_BASE_URL", "YOUR_URL"),
    api_key=os.getenv("OLLAMA_API_KEY", "YOUR_API_KEY"),
)


# The default prompt asks, in Chinese, for a 500-character article about the
# future development of AI; it is kept as-is so runs stay comparable.
def test_model_speed(model_name, prompt="請寫一篇關於未來AI發展的500字文章。"):
    print(f"🚀 Testing model: {model_name} ...")

    start_time = time.perf_counter()
    first_token_time = None
    tokens_count = 0
    full_response = ""

    try:
        stream = client.chat.completions.create(
            model=model_name,
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )

        for chunk in stream:
            if not chunk.choices:
                continue  # guard against chunks that carry no choices
            content = chunk.choices[0].delta.content
            if content:
                if first_token_time is None:
                    # Record the time to first token (TTFT)
                    first_token_time = time.perf_counter() - start_time

                full_response += content
                # Rough estimate: each streamed chunk is counted as one token.
                # As a rule of thumb, 1 Chinese character is about 0.6-1 token and
                # 1 English word is about 1.3 tokens, so this is an approximation.
                tokens_count += 1

        total_time = time.perf_counter() - start_time
        if first_token_time is None:
            # Without this check the subtraction below would fail on None
            print("❌ No content was received from the model.")
            return

        generation_time = total_time - first_token_time
        tps = tokens_count / generation_time if generation_time > 0 else 0

        print("-" * 30)
        print("📊 Results:")
        print(f"⏱️ Time to first token (TTFT): {first_token_time:.2f} s")
        print(f"⚡ Generation speed (TPS): {tps:.2f} tokens/s")
        print(f"🕒 Total time: {total_time:.2f} s")
        print(f"📝 Total characters: {len(full_response)}")
        print("-" * 30)

    except Exception as e:
        print(f"❌ Test failed: {e}")


if __name__ == "__main__":
    # Replace with the model you actually want to test
    test_model_speed("deepseek-v3.1:671b-cloud")
|
||||||
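The timing logic in this script can be separated from the network call so that it can be checked offline. The sketch below takes any iterable of streamed chunks plus an injectable clock and returns the same TTFT and TPS figures. `measure_stream` and `fake_chunk` are hypothetical names, and `fake_chunk` is only a stand-in shaped like a streamed chunk, not the real SDK type.

```python
import time
from types import SimpleNamespace
from typing import Callable, Iterable, Optional


def measure_stream(chunks: Iterable, clock: Callable[[], float] = time.perf_counter) -> Optional[dict]:
    """Consume a chat-completion stream and return TTFT/TPS figures.

    Returns None when the stream yields no text at all.
    """
    start = clock()
    ttft = None
    pieces = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # skip chunks that carry no choices
        text = chunk.choices[0].delta.content
        if not text:
            continue  # skip empty or None deltas
        if ttft is None:
            ttft = clock() - start  # time to first token
        pieces.append(text)
    total = clock() - start
    if ttft is None:
        return None
    generation = total - ttft
    return {
        "ttft": ttft,
        "total": total,
        "chunks": len(pieces),
        # Chunks per second of generation time, as in the script above
        "tps": len(pieces) / generation if generation > 0 else 0.0,
        "text": "".join(pieces),
    }


def fake_chunk(text):
    """Build an object shaped like a streamed chunk (test double only)."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])


if __name__ == "__main__":
    result = measure_stream([fake_chunk("Hello"), fake_chunk(", world")])
    print(result["chunks"], repr(result["text"]))
```

With a real client, the stream returned by `client.chat.completions.create(..., stream=True)` could be passed in directly, since the function only reads `choices[0].delta.content` from each chunk.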