Multimodal-Edge-Comparator

Input Image ID: 01

Upload Image

Click or drop image here

Model Selector ID: 02

Active Model

QWEN3-VL · 2B

Qwen3-VL-2B-Instruct: a 2B-parameter vision-language model from Alibaba Cloud, with strong spatial grounding, OCR, and instruction following.

Task Config ID: 03

Task Category

Prompt Directive

Output Stream ID: 04

Streamed Result

Results will stream here...

View Grounding ID: 05

Point / Detect Overlay

Active for Point / Detect tasks.
Run inference to visualise.