Installation

This guide covers installing LOOM-Eval and its dependencies.

System Requirements

Python: 3.10 or higher
CUDA: 11.8 or higher (for GPU acceleration)
GPU: NVIDIA GPU with at least 24 GB of VRAM (RTX 3090, A100, or H20 recommended)
Conda: Recommended for environment management
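
Before creating an environment, it can help to confirm that the machine meets the requirements above. The following is a minimal sketch and not part of LOOM-Eval: the helper name `version_ge` is hypothetical, the comparison relies on `sort -V`, and the GPU and CUDA checks assume that `nvidia-smi` and `nvcc` are on the PATH when present.

```shell
#!/usr/bin/env bash
# Sketch: compare a machine against the requirements listed above.
# version_ge is a hypothetical helper, not part of LOOM-Eval.

# Succeeds when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Python: 3.10 or higher
if command -v python3 >/dev/null 2>&1; then
    py_version="$(python3 -c 'import platform; print(platform.python_version())')"
    if version_ge "$py_version" "3.10"; then
        echo "Python $py_version: OK"
    else
        echo "Python $py_version: older than 3.10"
    fi
else
    echo "python3 not found"
fi

# GPU model, total memory, and driver version
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,memory.total,driver_version --format=csv \
        || echo "nvidia-smi is installed but could not query the GPU"
else
    echo "nvidia-smi not found; GPU acceleration will not be available"
fi

# CUDA toolkit release (only present when the toolkit is installed)
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep release || true
else
    echo "nvcc not found; no system CUDA toolkit detected"
fi
```

A missing `nvcc` is not necessarily a problem, since the PyTorch wheels used later bundle their own CUDA runtime; a toolkit mainly matters when packages are compiled from source.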

Basic Installation

Step 1: Create Environment

conda create -n loom python=3.10 -y
conda activate loom

Step 2: Install LOOM-Eval

From source:

git clone https://github.com/loomeval/loom-eval.git
cd loom-eval
pip install -e .

Or via pip:

pip install loom-eval

Step 3: Install Flash Attention

Flash Attention is optional but recommended for better performance.

Visit the Flash Attention Releases page and install the build that matches your Python, PyTorch, and CUDA versions.
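
Choosing the appropriate build usually comes down to four values: the Python tag, the PyTorch version, the CUDA version PyTorch was built with, and the C++11 ABI flag. The sketch below prints them for the active environment. It assumes `python3` resolves to the interpreter of the `loom` environment and that PyTorch is already installed there; the helper name `python_tag` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the values that typically identify a matching Flash Attention build.
# Assumes python3 is the interpreter of the active environment.

# Python tag, for example cp310 for Python 3.10
python_tag() {
    python3 -c 'import sys; print("cp%d%d" % sys.version_info[:2])'
}

echo "Python tag: $(python_tag)"

# PyTorch version, the CUDA version it was built with, and the C++11 ABI flag
if python3 -c 'import torch' >/dev/null 2>&1; then
    python3 -c 'import torch; print("torch:", torch.__version__); print("CUDA:", torch.version.cuda); print("cxx11 ABI:", torch.compiled_with_cxx11_abi())'
else
    echo "PyTorch is not installed in this environment yet"
fi
```

If no prebuilt package matches these values, building from source with `pip install flash-attn --no-build-isolation` is the usual fallback, at the cost of a long compile.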

Acceleration Methods

LOOM-Eval supports 11 acceleration methods. Seven of them share a general environment; KIVI, ThinK, FlexPrefill, and XAttention need additional installation steps.

General Acceleration Environment

The following environment covers most acceleration methods (H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, CaM):

conda create -n acceleration python=3.10.16
conda activate acceleration
pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
pip install transformers==4.44.2
pip install ninja
pip install flash-attn==2.5.6 --no-build-isolation
pip install -r requirements.txt
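
Because the final `pip install -r requirements.txt` can change packages that were pinned earlier, it may be worth confirming the pins afterwards. The sketch below is one way to do that, assuming `python3` is the interpreter of the `acceleration` environment; `installed_version` and `check_pin` are hypothetical helpers, not LOOM-Eval commands.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the pinned versions survived the full install.
# installed_version and check_pin are hypothetical helpers.

# Print the installed version of a distribution, or "missing".
installed_version() {
    python3 -c 'import sys
from importlib.metadata import version, PackageNotFoundError
try:
    print(version(sys.argv[1]))
except PackageNotFoundError:
    print("missing")' "$1"
}

# Compare the installed version of $1 with the expected pin $2.
check_pin() {
    local name="$1" expected="$2" found
    found="$(installed_version "$name")"
    # torch wheels may carry a local suffix such as 2.2.0+cu121; ignore it
    if [ "${found%%+*}" = "$expected" ]; then
        echo "$name $found: OK"
    else
        echo "$name: expected $expected, found $found"
    fi
}

check_pin torch 2.2.0
check_pin transformers 4.44.2
check_pin flash-attn 2.5.6
```

The same helper can be pointed at the pins of the other environments in this guide, for example `check_pin torch 2.4.0` inside the FlexPrefill environment.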

KIVI Installation

cd models/acceleration/KIVI/quant
pip install -e .

ThinK Installation

cd models/acceleration/ThinK/load_tiny_cuda_resource
make install

FlexPrefill Installation

FlexPrefill requires a dedicated environment:

conda create -n FlexPrefill python=3.10.16
conda activate FlexPrefill
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
pip install triton==3.0.0 transformers==4.44.0 vllm==0.5.4
pip install ninja
pip install flash_attn==2.6.3 --no-build-isolation
pip install "nemo-toolkit[all]==1.21" --no-deps -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install Cython==3.0.11 -i https://pypi.tuna.tsinghua.edu.cn/simple
pip install -r ./models/acceleration/flexprefill_requirements.txt

# Install flashinfer
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

# Install FlexPrefill
cd models/acceleration/FlexPrefill
pip install -e .

XAttention Installation

XAttention requires a dedicated environment:

conda create -yn xattn python=3.10
conda activate xattn

conda install -y git
conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit
conda install -y nvidia::cuda-cudart-dev
conda install -y pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia

pip install transformers==4.46 accelerate sentencepiece minference datasets wandb zstandard matplotlib
pip install huggingface_hub==0.23.2 torch torchaudio torchvision xformers
pip install vllm==0.6.3.post1 vllm-flash-attn==2.6.1
pip install tensor_parallel==2.0.0

pip install ninja packaging
pip install flash-attn==2.6.3 --no-build-isolation
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

# LongBench evaluation
pip install seaborn rouge_score einops pandas

# Install xAttention
pip install -e .

# Install Block Sparse Streaming Attention
git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
cd Block-Sparse-Attention
python setup.py install
cd ..

export PYTHONPATH="$PYTHONPATH:$(pwd)"

Other Acceleration Methods

For H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, and CaM, no additional installation is required beyond the general acceleration environment.

RAG Installation

Install RAG dependencies:

pip install -r ./models/rag/requirements.txt

The following RAG methods are supported:

BM25
LlamaIndex
OpenAI
Contriever
LlamaRAG
StreamingRAG

Next Steps

Follow the Quick Start guide
Explore available Benchmarks
Learn about Acceleration Methods
Configure RAG (Retrieval-Augmented Generation) if needed