Installation
============

This guide will help you install LOOM-Eval and its dependencies.

System Requirements
-------------------

* **Python**: 3.10 or higher
* **CUDA**: 11.8 or higher (for GPU acceleration)
* **GPU**: NVIDIA GPU with 24GB+ VRAM (3090, A100, H20 recommended)
* **Conda**: Recommended for environment management

Basic Installation
------------------

Step 1: Create Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    conda create -n loom python=3.10 -y
    conda activate loom

Step 2: Install LOOM-Eval
^^^^^^^^^^^^^^^^^^^^^^^^^^

From source::

    git clone https://github.com/loomeval/loom-eval.git
    cd loom-eval
    pip install -e .

Or via pip::

    pip install loom-eval

Step 3: Install Flash Attention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Flash Attention is optional but recommended for better performance:

Visit `Flash Attention Releases <https://github.com/Dao-AILab/flash-attention/releases>`_ 
and install the appropriate version for your system.

Acceleration Methods
--------------------

LOOM-Eval supports 11 acceleration methods. Each requires specific installation steps.

General Acceleration Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For most acceleration methods (H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, CaM)::

    conda create -n acceleration python=3.10.16
    conda activate acceleration
    pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
    pip install transformers==4.44.2
    pip install ninja
    pip install flash-attn==2.5.6 --no-build-isolation
    pip install -r requirements.txt

KIVI Installation
^^^^^^^^^^^^^^^^^

::

    cd models/acceleration/KIVI/quant
    pip install -e .

ThinK Installation
^^^^^^^^^^^^^^^^^^

::

    cd models/acceleration/ThinK/load_tiny_cuda_resource
    make install

FlexPrefill Installation
^^^^^^^^^^^^^^^^^^^^^^^^

FlexPrefill requires a special environment::

    conda create -n FlexPrefill python=3.10.16
    conda activate FlexPrefill
    pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
    pip install triton==3.0.0 transformers==4.44.0 vllm==0.5.4
    pip install ninja
    pip install flash_attn==2.6.3 --no-build-isolation
    pip install nemo-toolkit[all]==1.21 --no-deps -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install Cython==3.0.11 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install -r ./models/acceleration/flexprefill_requirements.txt
    
    # Install flashinfer
    pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
    
    # Install FlexPrefill
    cd models/acceleration/FlexPrefill
    pip install -e .

XAttention Installation
^^^^^^^^^^^^^^^^^^^^^^^

XAttention requires a special environment::

    conda create -yn xattn python=3.10
    conda activate xattn
    
    conda install -y git
    conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit
    conda install -y nvidia::cuda-cudart-dev
    conda install -y pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
    
    pip install transformers==4.46 accelerate sentencepiece minference datasets wandb zstandard matplotlib
    pip install huggingface_hub==0.23.2 torch torchaudio torchvision xformers
    pip install vllm==0.6.3.post1 vllm-flash-attn==2.6.1
    pip install tensor_parallel==2.0.0
    
    pip install ninja packaging
    pip install flash-attn==2.6.3 --no-build-isolation
    pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/
    
    # LongBench evaluation
    pip install seaborn rouge_score einops pandas
    
    # Install xAttention
    pip install -e .
    
    # Install Block Sparse Streaming Attention
    git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
    cd Block-Sparse-Attention
    python setup.py install
    cd ..
    
    export PYTHONPATH="$PYTHONPATH:$(pwd)"

Other Acceleration Methods
^^^^^^^^^^^^^^^^^^^^^^^^^^

For H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, and CaM, no additional installation 
is required beyond the general acceleration environment.

RAG Installation
----------------

Install RAG dependencies::

    pip install -r ./models/rag/requirements.txt

The following RAG methods are supported:

* BM25
* LlamaIndex  
* OpenAI
* Contriever
* LlamaRAG
* StreamingRAG


Next Steps
----------

* Follow the :doc:`quickstart` guide
* Explore available :doc:`benchmarks/index`
* Learn about :doc:`acceleration` methods
* Configure :doc:`rag` if needed