Mirror of https://github.com/vndangkhoa/Sys-Arc-Visl.git (synced 2026-04-05 01:17:57 +07:00)

feat: optimize browser AI - enhanced vision descriptions, faster text generation

Parent: 96670cb678 · Commit: 8df9b4873e
5 changed files with 167 additions and 93 deletions

README.md (76 lines changed)
@@ -1,13 +1,14 @@
-# 🔮 KV-Graph
+# 🔮 SysVis.AI - System Design Visualizer

 **AI-Powered Diagram Editor** — Transform ideas into beautiful, interactive flowcharts using natural language, images, or Mermaid code.

-
+[](https://hub.docker.com/r/vndangkhoa/sys-arc-visl)
+[](https://github.com/vndangkhoa/kv-graph)

 ## ✨ Features

-- **🤖 AI-Powered Generation** — Generates complex diagrams from text prompts using **Llama 3** (local browser) or Cloud AI.
+- **🤖 AI-Powered Generation** — Generates complex diagrams from text prompts using **Qwen3-0.6B** (local browser) or Cloud AI.
-- **👁️ Vision-to-Diagram** — **Florence-2** powered analysis converts screenshots and sketches into editable layouts entirely in the browser.
+- **👁️ Vision-to-Diagram** — **ViT-GPT2** powered analysis converts screenshots and sketches into editable layouts entirely in the browser.
 - **🖌️ Unified Toolkit** — A clean, consolidated toolbar for critical actions (Zoom, Layout, Pan/Select) keeps the canvas "void-like".
 - **🗺️ MiniMap Overlay** — Navigational aid for large diagrams, unobtrusively positioned in the bottom-right.
 - **💡 Smart Guidance** — Context-aware tips and rotation suggestions when looking at empty space.
@@ -18,17 +19,19 @@

 ## 🚀 Quick Start

-### Prerequisites
+### 🐳 Docker (Recommended)

-- Node.js 18+
-- npm or pnpm
-- WebGPU-compatible browser (Chrome 113+, Edge) for In-Browser AI
+```bash
+docker run -d -p 8338:80 vndangkhoa/sys-arc-visl:latest
+```

-### Installation
+Open [http://localhost:8338](http://localhost:8338) in your browser.
+
+### 💻 Local Development

 ```bash
 # Clone the repository
-git clone https://github.com/your-username/kv-graph.git
+git clone https://github.com/vndangkhoa/kv-graph.git
 cd kv-graph

 # Install dependencies
@@ -40,6 +43,12 @@ npm run dev

 Open [http://localhost:5173](http://localhost:5173) in your browser.

+### Prerequisites
+
+- Node.js 18+
+- npm or pnpm
+- WebGPU-compatible browser (Chrome 113+, Edge) for In-Browser AI
+
 ## 🧠 AI Configuration

 KV-Graph supports a **Local-First** AI architecture, running powerful models directly in your browser via WebGPU.
@@ -47,34 +56,24 @@ KV-Graph supports a **Local-First** AI architecture, running powerful models dir
 ### 🌐 In-Browser Mode (Privacy First)
 Runs entirely on your device. No data leaves your machine.

-| Capability | Model | Technology |
-|------------|-------|------------|
-| **Text Generation** | Llama-3-8B-Instruct | WebLLM (WebGPU) |
-| **Vision Analysis** | Florence-2-base | Transformers.js (ONNX) |
+| Capability | Model | Size | Speed |
+|------------|-------|------|-------|
+| **Text Generation** | Qwen3-0.6B | ~500MB | ~30-60s |
+| **Vision Analysis** | ViT-GPT2 | ~300MB | ~8-10s |

-*Note: First-time load requires downloading model weights (~4GB total).*
+*Note: First-time load requires downloading model weights.*

-### ☁️ Cloud Mode (Optional)
+### ☁️ Cloud Mode (Fast & Powerful)
 Connect to external providers for enhanced capabilities.

 | Provider | Model | API Key Required |
 |----------|-------|------------------|
+| **Google Gemini** | Gemini 2.0 Flash | ✅ (Free tier available) |
 | OpenAI | GPT-4 Vision | ✅ |
-| Google Gemini | Gemini Pro Vision | ✅ |
 | Ollama | Custom | Local URL |

 Configure your AI provider in **Settings** (⚙️ icon).

-## 🐳 Docker Support
-
-Run KV-Graph locally using Docker:
-
-```bash
-docker-compose up -d
-```
-
-Open [http://localhost:8338](http://localhost:8338) in your browser.

 ## 📁 Project Structure

 ```
@@ -88,17 +87,15 @@ kv-graph/
 │   ├── hooks/               # Custom React hooks
 │   ├── lib/                 # Core Logic
 │   │   ├── aiService.ts     # AI Orchestrator
-│   │   ├── webLlmService.ts # Local LLM Engine
+│   │   ├── webLlmService.ts # Local LLM Engine (Qwen3)
-│   │   ├── visionService.ts # Local Vision Engine
+│   │   ├── visionService.ts # Local Vision Engine (ViT-GPT2)
 │   │   └── layoutEngine.ts  # Dagre Auto-Layout
 │   ├── pages/               # Route pages
 │   ├── store/               # Zustand Global State
-│   │   ├── flowStore.ts     # Combined Flow State
-│   │   └── settingsStore.ts # AI & Theme Config
-│   ├── styles/              # Tailwind Global Styles
 │   └── types/               # TypeScript interfaces
-├── public/                  # Static assets & Models
+├── public/                  # Static assets
-└── Configuration files
+├── Dockerfile               # Docker build
+└── docker-compose.yml       # Docker Compose
 ```

 ## 🛠️ Tech Stack
@@ -139,9 +136,11 @@
 ## 🗺️ Roadmap

 - [x] Undo/Redo history
-- [x] API for programmatic generation
+- [x] Browser-based AI (WebLLM + Transformers.js)
-- [x] Plugin system (Foundation)
+- [x] Vision-to-Diagram (ViT-GPT2)
+- [x] Cloud AI integration (Gemini, OpenAI)
+- [ ] Collaborative editing
+- [ ] Plugin system

 ## 📄 License
@@ -151,7 +150,8 @@ MIT License — see [LICENSE](./LICENSE) for details.

 - [React Flow](https://reactflow.dev/) — Powerful diagram library
 - [Mermaid.js](https://mermaid.js.org/) — Diagram syntax inspiration
-- [Ollama](https://ollama.ai/) — Local AI inference
+- [WebLLM](https://webllm.mlc.ai/) — Browser-based LLM inference
+- [Transformers.js](https://huggingface.co/docs/transformers.js/) — Browser ML models
 - [Tailwind CSS](https://tailwindcss.com/) — Utility-first styling

 ---
@@ -277,7 +277,7 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
           </div>
           <div>
             <h4 className="text-[11px] font-bold text-violet-200">Neural Engine (Text)</h4>
-            <p className="text-[9px] text-violet-400/60">Llama-3.2-1B-Instruct-q4f32_1</p>
+            <p className="text-[9px] text-violet-400/60">Qwen3-0.6B (Fast!)</p>
           </div>
           {isBrowserReady && <div className="ml-auto w-2 h-2 rounded-full bg-green-500 shadow-[0_0_8px_rgba(34,197,94,0.6)]" />}
         </div>
@@ -321,7 +321,7 @@ export function SettingsModal({ isOpen, onClose }: SettingsModalProps) {
           </div>
           <div>
             <h4 className="text-[11px] font-bold text-violet-200">Vision Engine (Image)</h4>
-            <p className="text-[9px] text-violet-400/60">Florence-2-base (~200MB)</p>
+            <p className="text-[9px] text-violet-400/60">ViT-GPT2 (~300MB, Fast!)</p>
           </div>
           {isVisionReady && <div className="ml-auto w-2 h-2 rounded-full bg-green-500 shadow-[0_0_8px_rgba(34,197,94,0.6)]" />}
         </div>
@@ -241,7 +241,31 @@ async function callBrowserAI(
   messages: any[],
   customSystemPrompt?: string
 ): Promise<AIResponse> {
-  const activePrompt = customSystemPrompt || SYSTEM_PROMPT;
+  // Simplified prompt for browser AI - just ask for Mermaid code directly
+  const BROWSER_AI_PROMPT = `You are a system design diagram generator. Generate ONLY Mermaid flowchart code.
+
+RULES:
+- Start with "graph TD" or "graph LR"
+- Use simple node IDs like A, B, C
+- Use subgraph for grouping
+- NO explanations, NO markdown, NO JSON - ONLY the mermaid code
+
+Example output:
+graph TD
+  subgraph Frontend
+    A[Web App]
+    B[Mobile App]
+  end
+  subgraph Backend
+    C[API Server]
+    D[(Database)]
+  end
+  A --> C
+  B --> C
+  C --> D
+
+Now generate mermaid code for the user's request. Output ONLY the mermaid code, nothing else.`;

   try {
     if (!webLlmService.getStatus().isReady) {
       throw new Error('Browser model is not loaded. Please initialize it in Settings.');
@@ -261,6 +285,7 @@ async function callBrowserAI(
         // Analyze the first image
         // Assuming msg.images[0] is base64 string
         const imageDescription = await visionService.analyzeImage(msg.images[0]);
+        console.log('Vision description:', imageDescription);

         // Augment the prompt with the description
         content = `${content}\n\n[VISUAL CONTEXT FROM IMAGE]:\n${imageDescription}\n\n(Use this visual description to generate the Mermaid code.)`;
@@ -273,32 +298,67 @@ async function callBrowserAI(
     }

     const fullMessages = [
-      { role: 'system' as const, content: activePrompt },
+      { role: 'system' as const, content: BROWSER_AI_PROMPT },
       ...processedMessages
     ];

+    console.log('Starting WebLLM text generation...');
     const generator = await webLlmService.chat(fullMessages);
     let fullContent = "";

     for await (const chunk of generator) {
       fullContent += chunk;
     }
+    console.log('WebLLM raw output:', fullContent.substring(0, 500)); // First 500 chars

-    // Parse JSON
+    // Clean up the output - Browser AI outputs Mermaid code directly
     let cleanContent = fullContent.trim();

+    // Strip Qwen3's <think> reasoning tags if present
+    cleanContent = cleanContent.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
+    // Also remove incomplete <think> tags (if model was cut off)
+    cleanContent = cleanContent.replace(/<think>[\s\S]*$/g, '').trim();
+
+    // Remove markdown code blocks if present
     if (cleanContent.startsWith('```')) {
-      cleanContent = cleanContent.replace(/^```(?:json)?\s*\n?/, '').replace(/\n?```\s*$/, '');
+      cleanContent = cleanContent.replace(/^```(?:mermaid|json)?\s*\n?/, '').replace(/\n?```\s*$/, '');
     }

+    // Try to extract mermaid code - look for "graph" pattern
+    const mermaidMatch = cleanContent.match(/graph\s+(?:TB|TD|LR|RL|BT)[\s\S]*/);
+    if (mermaidMatch) {
+      console.log('Extracted mermaid code successfully');
+      return {
+        success: true,
+        mermaidCode: mermaidMatch[0].trim()
+      };
+    }
+
+    // Fallback: try to parse as JSON if it looks like JSON
+    if (cleanContent.startsWith('{')) {
+      try {
         const parsed = JSON.parse(cleanContent);
+        console.log('Parsed as JSON:', Object.keys(parsed));
         return {
           success: true,
           mermaidCode: parsed.mermaidCode,
           metadata: parsed.metadata,
-          analysis: parsed.analysis // Forward analysis field if present
+          analysis: parsed.analysis
+        };
+      } catch (e) {
+        // Not valid JSON, continue
+      }
+    }
+
+    // If we get here, we couldn't extract mermaid code
+    console.error('Could not extract mermaid code from:', cleanContent.substring(0, 500));
+    return {
+      success: false,
+      error: 'Could not generate valid Mermaid diagram code'
     };

   } catch (error) {
+    console.error('Browser AI error:', error);
     return {
       success: false,
       error: error instanceof Error ? error.message : 'Browser model logic failed'
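The cleanup path in the hunk above (strip `<think>` blocks, drop an optional code fence, then anchor on a `graph` header) can be sketched as one standalone function. `extractMermaid` and its inputs are illustrative names, not part of the repository; only the regexes mirror the diff.

```typescript
// Illustrative sketch of the output-cleanup steps in callBrowserAI above.
// The function name and shape are assumptions; the regexes mirror the diff.
function extractMermaid(raw: string): string | null {
  let clean = raw.trim();

  // Strip complete <think>...</think> reasoning blocks emitted by Qwen3
  clean = clean.replace(/<think>[\s\S]*?<\/think>/g, '').trim();
  // Strip an unterminated <think> block (model cut off mid-reasoning)
  clean = clean.replace(/<think>[\s\S]*$/g, '').trim();

  // Strip a surrounding markdown code fence, if present
  if (/^`{3}/.test(clean)) {
    clean = clean
      .replace(/^`{3}(?:mermaid|json)?\s*\n?/, '')
      .replace(/\n?`{3}\s*$/, '');
  }

  // Anchor on the flowchart header and keep everything after it
  const match = clean.match(/graph\s+(?:TB|TD|LR|RL|BT)[\s\S]*/);
  return match ? match[0].trim() : null;
}
```

Anchoring on the `graph` header makes the parser tolerant of chatty preambles a small model may emit despite the "ONLY the mermaid code" instruction.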
@@ -1,5 +1,5 @@
-import { env, AutoProcessor, AutoModel, RawImage } from '@huggingface/transformers';
+import { env, pipeline, RawImage } from '@huggingface/transformers';

 // Configure transformers.js
 env.allowLocalModels = false;
@@ -11,13 +11,12 @@ export type VisionProgress = {
   file?: string;
 };

-// We use Florence-2-base for a good balance of speed and accuracy (~200MB - 400MB)
-// 'onnx-community/Florence-2-base-ft' is the modern standard for Transformers.js v3.
-const MODEL_ID = 'onnx-community/Florence-2-base-ft';
+// ViT-GPT2 is the ONLY working model for browser-based image captioning
+// Other models (BLIP, Florence-2, LLaVA) are not supported by transformers.js
+const MODEL_ID = 'Xenova/vit-gpt2-image-captioning';

 export class VisionService {
-  private model: any = null;
+  private captioner: any = null;
-  private processor: any = null;
   private isLoading = false;
   private isReady = false;
@@ -46,13 +45,10 @@ export class VisionService {

     try {
       console.log('Loading Vision Model...');
-      if (onProgress) onProgress({ status: 'Loading Processor...' });
+      if (onProgress) onProgress({ status: 'Loading Vision Model...' });

-      this.processor = await AutoProcessor.from_pretrained(MODEL_ID);
-      if (onProgress) onProgress({ status: 'Loading Model (this may take a while)...' });
-
-      this.model = await AutoModel.from_pretrained(MODEL_ID, {
+      // Use the pipeline API - much simpler and faster
+      this.captioner = await pipeline('image-to-text', MODEL_ID, {
         progress_callback: (progress: any) => {
           if (onProgress && progress.status === 'progress') {
             onProgress({
@@ -75,8 +71,8 @@ export class VisionService {
   }

   /**
-   * Analyzes an image (Base64 or URL) and returns a detailed description.
+   * Analyzes an image (Base64 or URL) and returns a description.
-   * We use the '<MORE_DETAILED_CAPTION>' task for Florence-2.
+   * Uses vit-gpt2 for fast captioning.
    */
   async analyzeImage(imageBase64: string): Promise<string> {
     if (!this.isReady) {
@@ -87,39 +83,47 @@ export class VisionService {
       // Handle data URL prefix if present
       const cleanBase64 = imageBase64.includes(',') ? imageBase64 : `data:image/png;base64,${imageBase64}`;

-      const image = await RawImage.fromURL(cleanBase64);
+      let image = await RawImage.fromURL(cleanBase64);

-      // Task: Detailed Captioning is best for understanding diagrams
-      const task = '<MORE_DETAILED_CAPTION>';
+      // Downscale large images to 512px to keep analysis fast
+      if (image.width > 512 || image.height > 512) {
+        image = await image.resize(512, 512);
+      }

-      // Construct prompts using the processor's method (required for Florence-2)
-      const prompts = this.processor.construct_prompts(task);
+      console.log('Starting enhanced image analysis...');
+      const startTime = performance.now();

-      // Pre-process the image and text inputs
-      // Processor expects batch input, so wrap single image in array
-      if (!this.processor) throw new Error('Processor is undefined');
-      const inputs = await this.processor([image], prompts);
-
-      const generatedIds = await this.model.generate({
-        ...inputs,
-        max_new_tokens: 512, // Sufficient for a description
-      });
-
-      const generatedText = this.processor.batch_decode(generatedIds, {
-        skip_special_tokens: false,
-      })[0];
-
-      // Post-process to extract the caption
-      // Florence-2 output format usually includes the task token
-      const parsedAnswer = this.processor.post_process_generation(
-        generatedText,
-        task,
-        image.size
-      );
-
-      // Access the dictionary result. For CAPTION tasks, it's usually under '<MORE_DETAILED_CAPTION>' or similar key
-      // Ideally post_process_generation returns { '<MORE_DETAILED_CAPTION>': "Description..." }
-      return parsedAnswer['<MORE_DETAILED_CAPTION>'] || typeof parsedAnswer === 'string' ? parsedAnswer : JSON.stringify(parsedAnswer);
+      // Run multiple passes for more comprehensive description
+      const results = await Promise.all([
+        // Pass 1: Detailed description
+        this.captioner(image, {
+          max_new_tokens: 150,
+          num_beams: 4, // Beam search for better quality
+        }),
+        // Pass 2: Alternative perspective
+        this.captioner(image, {
+          max_new_tokens: 100,
+          do_sample: true,
+          temperature: 0.7,
+        }),
+      ]);
+
+      const endTime = performance.now();
+      console.log(`Vision analysis completed in ${((endTime - startTime) / 1000).toFixed(1)}s`);
+
+      // Combine descriptions for richer output
+      const caption1 = results[0]?.[0]?.generated_text || '';
+      const caption2 = results[1]?.[0]?.generated_text || '';
+
+      // If both are similar, use just one; otherwise combine
+      if (caption1.toLowerCase().includes(caption2.toLowerCase().substring(0, 20)) ||
+          caption2.toLowerCase().includes(caption1.toLowerCase().substring(0, 20))) {
+        return caption1.length > caption2.length ? caption1 : caption2;
+      }
+
+      const combined = `${caption1}. Additionally: ${caption2}`;
+      console.log('Enhanced description:', combined);
+      return combined;

     } catch (error) {
       console.error('Vision analysis failed:', error);
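The similarity heuristic in the hunk above (test each caption against the first 20 characters of the other, keep the longer one when they overlap) can be isolated into a small helper. `combineCaptions` is a hypothetical name for illustration; the comparison logic mirrors the diff.

```typescript
// Hypothetical standalone version of the caption-merging heuristic above.
function combineCaptions(a: string, b: string): string {
  const headLen = 20; // compare against the first 20 characters, as in the diff
  const similar =
    a.toLowerCase().includes(b.toLowerCase().substring(0, headLen)) ||
    b.toLowerCase().includes(a.toLowerCase().substring(0, headLen));
  // Near-duplicate captions: keep only the longer one
  if (similar) return a.length > b.length ? a : b;
  // Otherwise concatenate both perspectives
  return `${a}. Additionally: ${b}`;
}
```

Running one beam-search pass and one sampled pass, then merging like this, trades a little latency for descriptions that cover more of the image than a single greedy caption.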
@@ -7,8 +7,8 @@ export type WebLlmProgress = {
   timeElapsed: number;
 };

-// Latest "Tiny" model with high instruction adherence
-const DEFAULT_MODEL = "Llama-3.2-1B-Instruct-q4f32_1-MLC";
+// Qwen3-0.6B is fast and works well with simple Mermaid generation prompts
+const DEFAULT_MODEL = "Qwen3-0.6B-q4f32_1-MLC";

 export class WebLlmService {
   private engine: MLCEngine | null = null;
@@ -73,21 +73,31 @@ export class WebLlmService {
       throw new Error("WebLLM Engine not initialized. Please load the model first.");
     }

+    console.log('WebLLM: Creating completion...');
+    const startTime = performance.now();
     const completion = await this.engine.chat.completions.create({
       messages,
       stream: true,
-      temperature: 0.1, // Low temp for code/logic generation
+      temperature: 0, // Deterministic output for code
-      max_tokens: 4096, // Sufficient for diagrams
+      max_tokens: 512, // Mermaid code is compact
+      top_p: 0.9, // Faster sampling
+      repetition_penalty: 1.1, // Avoid repetitive output
     });
+    console.log('WebLLM: Completion created, streaming...');

     // Create a generator to stream chunks easily
     async function* streamGenerator() {
+      let tokenCount = 0;
       for await (const chunk of completion) {
         const content = chunk.choices[0]?.delta?.content || "";
         if (content) {
+          tokenCount++;
+          if (tokenCount === 1) console.log('WebLLM: First token received');
           yield content;
         }
       }
+      const endTime = performance.now();
+      console.log(`WebLLM: Generation complete (${tokenCount} tokens, ${((endTime - startTime) / 1000).toFixed(1)}s)`);
     }

     return streamGenerator();
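On the consumer side, `callBrowserAI` accumulates the chunks yielded by this generator into one string. A minimal sketch of that pattern, using a stubbed stream in place of a live MLCEngine completion (an assumption for the example):

```typescript
// Minimal sketch of consuming the async generator returned by chat().
// fakeStream stands in for a live WebLLM completion stream (assumed data).
async function* fakeStream(): AsyncGenerator<string> {
  for (const chunk of ['graph TD\n', '  A --> B\n']) {
    yield chunk; // the real generator yields delta content strings the same way
  }
}

async function collect(stream: AsyncGenerator<string>): Promise<string> {
  let full = '';
  for await (const chunk of stream) {
    full += chunk; // mirrors the fullContent accumulation in aiService
  }
  return full;
}
```

Wrapping the stream in an async generator keeps the service API engine-agnostic: callers only see `for await` over strings, regardless of the underlying model.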