LLaVA Benchmark evaluates the image recognition capabilities of AI models with Ollama.



llava-benchmark 🌋




AI Benchmarking Tool

llava-benchmark is a general-purpose benchmarking tool designed to evaluate the image recognition capabilities of LLaVA models with Ollama.

Included Benchmarks

  • EvalRateBenchmark: Measure model image processing speed 📈
  • LicensePlateBenchmark: Extract license plate numbers from processed images 🚗

By running these benchmarks, you can quickly assess how well different models read license plates from images and how fast they process each image.


  • Python 3 🐍
  • Ollama 🦙
  • Packages: asciichartpy, pytest, yaml (PyYAML), plus the standard-library modules os, shutil, and subprocess

Cloning the Repository

Before running llava-benchmark, clone the repository to your local machine:

  1. Open a Terminal: On Windows, you can use Command Prompt or PowerShell. On macOS or Linux, you can use Terminal.

  2. Navigate to the Desired Directory: Use the cd command to navigate to the directory where you want to clone the repository.

  3. Clone the Repository: Run the following command to clone the repository:

git clone

Setting Up the Project

After cloning, change into the local llava-benchmark/ directory and follow these steps:

  1. Create a Virtual Environment:

    python -m venv .venv
  2. Activate the Virtual Environment:

    • On Windows:
    • On Linux or macOS:
      source .venv/bin/activate
  3. Install the Dependencies:

    pip install -r requirements.txt


The tool uses the YAML configuration file data/config.yml to specify which models, prompts, and images the benchmark uses. Here's a brief explanation of each section:

  • models: This lists the models to be benchmarked
  • prompts: This lists the prompts to be used for each model
  • images: This lists the image files to be used in the benchmark

# data/config.yml
models:
  - llava:latest
  - llava-llama3:8b
prompts:
  - Read and return the license plate number and letters as text on a new line
images:
  - 1.jpg
  - 2.jpg
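
As a rough illustration of how a file with the three sections described above could be loaded and sanity-checked, here is a minimal sketch. It assumes PyYAML's yaml.safe_load is used for parsing; the helper names load_config and validate_config are hypothetical and are not taken from the project's source.

```python
from pathlib import Path

REQUIRED_KEYS = ("models", "prompts", "images")


def validate_config(config):
    """Return the config if every required section is a non-empty list."""
    if not isinstance(config, dict):
        raise ValueError("config must be a mapping")
    for key in REQUIRED_KEYS:
        value = config.get(key)
        if not isinstance(value, list) or not value:
            raise ValueError(f"'{key}' must be a non-empty list")
    return config


def load_config(path="data/config.yml"):
    """Parse the YAML file and validate it (hypothetical helper)."""
    import yaml  # PyYAML; imported lazily so validate_config has no dependency

    with Path(path).open(encoding="utf-8") as handle:
        return validate_config(yaml.safe_load(handle))


# In-memory equivalent of the example configuration shown above.
example = {
    "models": ["llava:latest", "llava-llama3:8b"],
    "prompts": [
        "Read and return the license plate number and letters as text on a new line"
    ],
    "images": ["1.jpg", "2.jpg"],
}
print(validate_config(example)["models"])
```

Validating up front means a missing or empty section fails immediately with a clear message instead of partway through a benchmark run.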


When you execute the script, it performs the following operations:

  1. Checks if Ollama is Installed: The script checks if the ollama binary is present on your system. If not, it will print an error message and exit.

  2. Checks if the Model is Installed: For each model specified in the YAML configuration file, the script checks if the model is installed. If a model is not found, it will print a message and skip that model.

  3. Runs the Benchmark: For each model, prompt, and image specified in the YAML configuration file, the script runs the ollama benchmark and stores the evaluation rate and license plate number (if found).

  4. Prints the Average Evaluation Rate: After running the benchmark for all models, prompts, and images, the script prints the average evaluation rate for each model.

  5. Plots the Evaluation Rate Chart: The script plots an ASCII line chart of the evaluation rates for visual analysis.
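
The first three steps above could be sketched roughly as follows. This is not the project's actual code: the function names are hypothetical, the parser assumes that `ollama list` prints a header row followed by one model per line with the name in the first column, and run_once assumes the image is referenced by its path inside the prompt and that `--verbose` prints timing statistics.

```python
import shutil
import subprocess


def ollama_installed():
    """Step 1: True if an `ollama` executable is on PATH."""
    return shutil.which("ollama") is not None


def parse_installed_models(list_output):
    """Extract model names from `ollama list` output.

    Assumes a header row followed by one model per line, with the
    model name in the first whitespace-separated column.
    """
    lines = list_output.strip().splitlines()[1:]  # skip the header row
    return {line.split()[0] for line in lines if line.strip()}


def installed_models():
    """Step 2: ask Ollama which models are available locally."""
    result = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return parse_installed_models(result.stdout)


def run_once(model, prompt, image_path):
    """Step 3: run one prompt against one image and return all output.

    Assumes the image path is appended to the prompt text; stdout and
    stderr are concatenated so timing statistics are not lost.
    """
    result = subprocess.run(
        ["ollama", "run", model, f"{prompt} {image_path}", "--verbose"],
        capture_output=True,
        text=True,
    )
    return result.stdout + result.stderr


# The parser can be exercised without Ollama by feeding it canned text.
sample = (
    "NAME            ID            SIZE    MODIFIED\n"
    "llava:latest    8dd30f6b0cb1  4.7 GB  2 weeks ago\n"
)
print(parse_installed_models(sample))
```

Keeping the text parsing separate from the subprocess calls makes the skip-if-missing logic testable on machines where Ollama is not installed.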

Command Line

To run the script, navigate to the directory containing the script and type the following command:

$ python
🦙  MODEL: llava:latest 🦙
◽ Tokens/s: 57.84 📈
◽ Plate: PAX 44 🚗

◽ Tokens/s: 52.79 📈
◽ Plate: OPEC LOL 🚗

◽ Tokens/s: 59.87 📈
◽ Plate: F1 🚗

Average eval rate: 56.833 📊

                Y-axis: Evaluation Rates
                X-axis: Images
   76.07 ┤
   71.52 ┤             ╭─╮ ╭─
   66.96 ┤             │ │ │
   62.41 ┤ ╭─╮         │ │ │
   57.85 ┼─╯ ╰─╮ ╭─╮ ╭─╯ ╰─╯
   53.30 ┤     ╰─╯ ╰─╯
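
For reference, the average line in the sample output is consistent with a plain arithmetic mean of the per-image rates. The sketch below shows that calculation and one way the chart could be drawn with asciichartpy; the function names and the chart height are assumptions, not the project's own settings.

```python
from statistics import mean


def average_eval_rate(eval_rates):
    """Arithmetic mean of the per-image evaluation rates, in tokens/s."""
    return mean(eval_rates)


def plot_eval_rates(eval_rates, height=5):
    """Render an ASCII line chart (requires the asciichartpy package)."""
    import asciichartpy  # imported lazily; only needed for plotting

    return asciichartpy.plot(eval_rates, {"height": height})


rates = [57.84, 52.79, 59.87]  # the three rates from the sample output above
print(f"Average eval rate: {average_eval_rate(rates):.3f}")
```

With asciichartpy installed, `print(plot_eval_rates(rates))` draws a small chart of the same three values.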


The source code for the project includes comprehensive documentation comments and docstrings. HTML documentation can be viewed on GitHub Pages: 📄

Please see file comments for additional information on how this project loads local module code.


Benchmark class

A general-purpose class for processing benchmark results. It's used as a base class for EvalRateBenchmark and LicensePlateBenchmark through inheritance.

EvalRateBenchmark class

The EvalRateBenchmark class is initialized with a method to process the evaluation rate from a benchmark result. It also initializes an empty list eval_rates to store evaluation rates.

LicensePlateBenchmark class

The LicensePlateBenchmark class is initialized with a method to process the license plate number from a benchmark result. It also initializes the list license_plate_numbers to store plate data associated with each model.
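
The relationship between the three classes might look something like the sketch below. Only the class names and the eval_rates and license_plate_numbers attributes come from the descriptions above; the method names, the regular expressions, and the assumption that a benchmark result is a plain string of model output are illustrative guesses.

```python
import re


class Benchmark:
    """Base class: applies a processing function to raw benchmark output."""

    def __init__(self, process_result):
        self.process_result = process_result

    def process(self, result):
        return self.process_result(result)


class EvalRateBenchmark(Benchmark):
    def __init__(self):
        super().__init__(self._parse_eval_rate)
        self.eval_rates = []

    def _parse_eval_rate(self, result):
        # Anchored at line start so "prompt eval rate:" is not matched.
        match = re.search(r"^[ \t]*eval rate:\s*([\d.]+)", result, re.MULTILINE)
        if match:
            self.eval_rates.append(float(match.group(1)))
        return match is not None


class LicensePlateBenchmark(Benchmark):
    def __init__(self):
        super().__init__(self._parse_plate)
        self.license_plate_numbers = []

    def _parse_plate(self, result):
        # Assumed heuristic: the first line made up only of capital
        # letters, digits, spaces, and hyphens is treated as the plate.
        match = re.search(
            r"^[ \t]*([A-Z0-9][A-Z0-9 \-]*[A-Z0-9])[ \t]*$", result, re.MULTILINE
        )
        if match:
            self.license_plate_numbers.append(match.group(1))
        return match is not None


sample = (
    " PAX 44\n\n"
    "prompt eval rate:     120.50 tokens/s\n"
    "eval rate:            57.84 tokens/s\n"
)
rate_benchmark = EvalRateBenchmark()
plate_benchmark = LicensePlateBenchmark()
for benchmark in (rate_benchmark, plate_benchmark):
    benchmark.process(sample)
print(rate_benchmark.eval_rates, plate_benchmark.license_plate_numbers)
```

Passing the processing method to the base class keeps the calling code identical for every task: it only ever calls process on each benchmark object.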


The project is designed to be easily extendable for other LLaVA image recognition tasks. This is done through the use of benchmark objects, which are instances of classes that define specific tasks.

In the script's main function, instances of EvalRateBenchmark and LicensePlateBenchmark are created and passed to llava_benchmark together with the path to the YAML configuration file:

benchmarks = [EvalRateBenchmark(), LicensePlateBenchmark()]
llava_benchmark("data/config.yml", benchmarks)

To extend the script for other LLaVA tasks, you can define new benchmark classes that implement the code needed for those tasks. Then, you can create instances of those classes and add them to the benchmarks list.
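
As an illustration of what such an extension could look like, the sketch below defines a made-up VehicleColorBenchmark. The Benchmark class here is only a minimal stand-in so the example runs on its own, and the process_result attribute is an assumed interface; check the project's real base class before copying this pattern.

```python
class Benchmark:
    """Minimal stand-in for the project's base class (assumed interface)."""

    def __init__(self, process_result):
        self.process_result = process_result


class VehicleColorBenchmark(Benchmark):
    """Hypothetical task: record the first known color word in a response."""

    COLORS = ("black", "white", "red", "blue", "silver", "green")

    def __init__(self):
        super().__init__(self.process_color)
        self.colors = []

    def process_color(self, result):
        # Lowercase the response and strip trailing punctuation per word.
        words = (word.strip(".,!") for word in result.lower().split())
        found = next((word for word in words if word in self.COLORS), None)
        self.colors.append(found)
        return found


benchmarks = [VehicleColorBenchmark()]
for benchmark in benchmarks:
    benchmark.process_result("The car in the image is Red.")
print(benchmarks[0].colors)
```

A new task of this kind would also need a matching prompt in data/config.yml, since the prompt determines what the model is asked to report.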


The llava-benchmark module includes a suite of tests to ensure its functionality. These tests are written using the pytest framework and make use of fixtures and parameterization to test various aspects of the benchmarking process.

Running the Tests

To run the tests, navigate to the llava-benchmark/tests/ directory and execute the following command:

$ pytest
==================== test session starts =====================
collected 3 items

.                                                       [ 33%]
.                                                       [ 66%]
.                                                       [100%]

===================== 3 passed in 0.06s =====================
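
For readers unfamiliar with pytest fixtures and parameterization, a test in this style might look like the sketch below. The parse_eval_rate function and the canned output are hypothetical stand-ins, not the project's actual tests or code.

```python
import re

import pytest


def parse_eval_rate(output):
    """Hypothetical function under test: pull tokens/s from verbose output."""
    match = re.search(r"^[ \t]*eval rate:\s*([\d.]+)", output, re.MULTILINE)
    return float(match.group(1)) if match else None


@pytest.fixture
def verbose_output():
    """Canned output so the test never needs a running Ollama instance."""
    return (
        "prompt eval rate:     120.50 tokens/s\n"
        "eval rate:            57.84 tokens/s\n"
    )


def test_ignores_prompt_eval_rate(verbose_output):
    # The fixture value is injected by pytest through the argument name.
    assert parse_eval_rate(verbose_output) == 57.84


@pytest.mark.parametrize(
    ("output", "expected"),
    [
        ("eval rate: 52.79 tokens/s", 52.79),
        ("no statistics here", None),
    ],
)
def test_parse_eval_rate(output, expected):
    # pytest runs this function once per (output, expected) pair.
    assert parse_eval_rate(output) == expected
```

Saved in a file whose name starts with test_, this would be collected by pytest as three test cases: one fixture-based test plus two parameterized cases.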


Contributions to the llava-benchmark project are welcome! If you're interested in contributing, here's how you can do it:

  1. Open an Issue: If you have a suggestion for an improvement, or you've found a bug, start by opening an issue in the project repository. Describe your suggestion or bug report in detail.

  2. Discussion: Once the issue is opened, maintainers of the project or other contributors will review the issue and discuss it.

  3. Implementation: If your suggestion is accepted, you or someone else can start working on implementing it.

We appreciate your help in making the LLaVA Benchmark project better!


Jordan Cassady is a Canadian Network Engineer with a decade of startup experience automating test systems aligned to company KPIs. If you’ve got a puzzle to solve, a codebase to conquer, or a moonshot idea, count me in. Let’s connect! ✌️



This project is licensed under the terms of the MIT license.

