.. Local Inference Calculator documentation master file

Local Inference Calculator
===========================

Welcome to the **Local Inference Calculator** documentation, a capacity planning tool for local Large Language Model (LLM) inference.

This tool lets you quickly estimate which language models can run on specific GPUs, taking context size and model precision/quantization into account.

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   installation
   usage
   glossary
   api
   examples

Overview
========

The *Local Inference Calculator* was developed to answer a simple question:

**"With this GPU and this context size, which LLMs can I run?"**

The tool considers:

* **Model parameters**: base memory required to store the weights
* **Overhead**: additional memory for the runtime, activations, etc.
* **KV cache**: memory for the attention cache during inference
* **Precision/Quantization**: FP32, FP16, INT8, or INT4

Features
--------

* Support for models from 7B to 180B parameters
* Database with 38 GPUs (consumer and datacenter)
* Conservative calculations to ensure real-world viability
* Export of results to JSON and CSV
* Command-line interface (CLI)

Other Languages
===============

* **Português (Brazil)**: Run ``make html LANG=pt_BR`` and open ``docs/_build/pt_BR/html/index.html``

Indices and Tables
==================

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
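The kind of estimate the tool performs can be sketched in Python. This is a minimal sketch, not the tool's actual API: the function name, parameters, and the flat overhead factor are assumptions for illustration only.

.. code-block:: python

   def estimate_vram_gb(params_b, bytes_per_param, context_len,
                        n_layers, n_kv_heads, head_dim,
                        overhead_factor=1.2):
       """Rough VRAM estimate in GB for local LLM inference.

       Hypothetical helper, not the tool's real interface. Assumes an
       FP16 KV cache (2 bytes per value) and a flat overhead multiplier
       for runtime, activations, and fragmentation.
       """
       # Weights: parameter count (in billions) times bytes per parameter.
       weights_gb = params_b * 1e9 * bytes_per_param / 1e9

       # KV cache: 2 tensors (K and V) per layer, each of shape
       # (kv heads x head dim x context length), at 2 bytes per value.
       kv_cache_gb = (2 * n_layers * n_kv_heads * head_dim
                      * context_len * 2) / 1e9

       return (weights_gb + kv_cache_gb) * overhead_factor


   # Example: a 7B model in INT4 (0.5 bytes/param) with a 4096-token
   # context and a Llama-2-7B-like shape (32 layers, 32 KV heads,
   # head dim 128) -> roughly 6.8 GB.
   print(round(estimate_vram_gb(7, 0.5, 4096, 32, 32, 128), 2))

Plugging a GPU's VRAM into the left-hand side of this estimate is what lets the tool answer whether a given model/context/precision combination fits.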