Browser LLM Evaluation

This project explores how in-browser LLM inference compares to cloud-based inference in terms of latency. The goal is to model different incoming request patterns and routing strategies between a cloud model and an on-device model. Prompts and accuracy evaluation are based on the BoolQ dataset. The project is under development and does not aim to produce high-quality LLM responses; it measures performance differences. To run cloud-based inference, you need to bring your own OpenRouter API key.
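
A minimal sketch of what a cloud request can look like, assuming OpenRouter's OpenAI-compatible chat completions endpoint; the function name and model id are placeholders, not the project's actual code:

```ts
// Sketch of a cloud inference call through OpenRouter.
// The model id and prompt template are placeholders; the request code
// in this repository may differ.
async function cloudInfer(question: string, apiKey: string): Promise<string> {
  const started = performance.now();
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-4o-mini", // any OpenRouter model id works here
      messages: [
        { role: "user", content: `Answer yes or no: ${question}` },
      ],
    }),
  });
  const data = await res.json();
  console.log(`cloud inference took ${performance.now() - started} ms`);
  return data.choices[0].message.content;
}
```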

Cloud (OpenRouter)

Requests routed to the cloud are sent to a hosted model through the OpenRouter API.

On-Device

Requests routed on-device are answered by a model running directly in the browser. The panel shows "Not loaded" until a model has been loaded.

Request Pattern & Routing

Live Log & Results

Each request is logged with the columns ID, Time, Route, Total Latency (ms), Queue (ms), Inference (ms), Question, Answer, and Correct.
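
A sketch of the shape one log row could take; field names are illustrative and may not match the code base:

```ts
// One row of the Live Log & Results table (illustrative field names).
interface LogEntry {
  id: number;
  time: string;            // wall-clock time the request was issued
  route: "cloud" | "on-device";
  totalLatencyMs: number;  // queueMs + inferenceMs
  queueMs: number;         // time spent waiting before inference started
  inferenceMs: number;     // time spent in the model itself
  question: string;        // BoolQ question used as the prompt
  answer: string;          // model's yes/no answer
  correct: boolean;        // whether the answer matches the BoolQ label
}
```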