Browser LLM Evaluation

This project explores how in-browser LLM inference compares to cloud-based inference in terms of latency. The goal is to model different incoming request patterns and routing strategies between a cloud model and an on-device model. Prompts and accuracy evaluation are based on the BoolQ dataset. The project is under development and does not aim to produce high-quality LLM responses; it measures performance differences. To run cloud-based inference, you need to bring your own OpenRouter API key.
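
A minimal sketch of what a cloud request can look like, assuming OpenRouter's OpenAI-compatible chat completions endpoint; the function name and model id are placeholders, not the project's actual code:

```ts
// Sketch of a cloud inference call through OpenRouter.
// The model id and prompt template are placeholders; the request code
// in this repository may differ.
async function cloudInfer(question: string, apiKey: string): Promise<string> {
  const started = performance.now();
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-4o-mini", // any OpenRouter model id works here
      messages: [
        { role: "user", content: `Answer yes or no: ${question}` },
      ],
    }),
  });
  const data = await res.json();
  console.log(`cloud inference took ${performance.now() - started} ms`);
  return data.choices[0].message.content;
}
```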

Cloud (OpenRouter)

Requests routed to the cloud are sent to a hosted model through the OpenRouter API.

On-Device

Requests routed on-device are answered by a model running directly in the browser. The panel shows "Not loaded" until a model has been loaded.

Request Pattern & Routing

Live Log & Results

Each request is logged with the columns ID, Time, Route, Total Latency (ms), Queue (ms), Inference (ms), Question, Answer, and Correct.
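
A sketch of the shape one log row could take; field names are illustrative and may not match the code base:

```ts
// One row of the Live Log & Results table (illustrative field names).
interface LogEntry {
  id: number;
  time: string;            // wall-clock time the request was issued
  route: "cloud" | "on-device";
  totalLatencyMs: number;  // queueMs + inferenceMs
  queueMs: number;         // time spent waiting before inference started
  inferenceMs: number;     // time spent in the model itself
  question: string;        // BoolQ question used as the prompt
  answer: string;          // model's yes/no answer
  correct: boolean;        // whether the answer matches the BoolQ label
}
```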