Installation
============

This guide will help you install LOOM-Eval and its dependencies.

System Requirements
-------------------

* **Python**: 3.10 or higher
* **CUDA**: 11.8 or higher (for GPU acceleration)
* **GPU**: NVIDIA GPU with 24GB+ VRAM (3090, A100, H20 recommended)
* **Conda**: Recommended for environment management

Basic Installation
------------------

Step 1: Create Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^

::

    conda create -n loom python=3.10 -y
    conda activate loom

Step 2: Install LOOM-Eval
^^^^^^^^^^^^^^^^^^^^^^^^^^

From source::

    git clone https://github.com/loomeval/loom-eval.git
    cd loom-eval
    pip install -e .

Or via pip::

    pip install loom-eval

Step 3: Install Flash Attention
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Flash Attention is optional but recommended for better performance:

Visit the Flash Attention Releases page and install the appropriate version for your system.

Acceleration Methods
--------------------

LOOM-Eval supports 11 acceleration methods. Each requires specific installation steps.

General Acceleration Environment
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For most acceleration methods (H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, CaM)::

    conda create -n acceleration python=3.10.16
    conda activate acceleration
    pip install torch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
    pip install transformers==4.44.2
    pip install ninja
    pip install flash-attn==2.5.6 --no-build-isolation
    pip install -r requirements.txt

KIVI Installation
^^^^^^^^^^^^^^^^^

::

    cd models/acceleration/KIVI/quant
    pip install -e .

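
Before moving on to the remaining methods, it can be worth confirming that the active environment meets the Python requirement and that PyTorch can see a GPU. The following is a minimal sketch, not part of LOOM-Eval: the helper names ``version_ge``, ``check_python``, and ``report_torch`` are hypothetical, and the comparison assumes a ``sort`` that supports the ``-V`` (version sort) option, as GNU coreutils does.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a quick environment check (not shipped with LOOM-Eval).

# Succeeds when version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Reports whether the active interpreter is Python 3.10 or higher.
check_python() {
    local found
    found="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null)" || found=""
    if [ -n "$found" ] && version_ge "$found" "3.10"; then
        echo "python $found: ok"
    else
        echo "python ${found:-not found}: need 3.10 or higher"
        return 1
    fi
}

# Prints the installed torch version and whether CUDA is usable.
report_torch() {
    python -c 'import torch; print("torch", torch.__version__, "cuda available:", torch.cuda.is_available())'
}
```

Running ``check_python`` and then ``report_torch`` inside an activated environment gives a quick signal of whether the steps above took effect before you build any of the method-specific extensions.
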

ThinK Installation
^^^^^^^^^^^^^^^^^^

::

    cd models/acceleration/ThinK/load_tiny_cuda_resource
    make install

FlexPrefill Installation
^^^^^^^^^^^^^^^^^^^^^^^^

FlexPrefill requires a special environment::

    conda create -n FlexPrefill python=3.10.16
    conda activate FlexPrefill
    pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 -f https://mirrors.aliyun.com/pytorch-wheels/cu121
    pip install triton==3.0.0 transformers==4.44.0 vllm==0.5.4
    pip install ninja
    pip install flash_attn==2.6.3 --no-build-isolation
    pip install nemo-toolkit[all]==1.21 --no-deps -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install Cython==3.0.11 -i https://pypi.tuna.tsinghua.edu.cn/simple
    pip install -r ./models/acceleration/flexprefill_requirements.txt

    # Install flashinfer
    pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

    # Install FlexPrefill
    cd models/acceleration/FlexPrefill
    pip install -e .

XAttention Installation
^^^^^^^^^^^^^^^^^^^^^^^

XAttention requires a special environment::

    conda create -yn xattn python=3.10
    conda activate xattn
    conda install -y git
    conda install -y nvidia/label/cuda-12.4.0::cuda-toolkit
    conda install -y nvidia::cuda-cudart-dev
    conda install -y pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
    pip install transformers==4.46 accelerate sentencepiece minference datasets wandb zstandard matplotlib
    pip install huggingface_hub==0.23.2 torch torchaudio torchvision xformers
    pip install vllm==0.6.3.post1 vllm-flash-attn==2.6.1
    pip install tensor_parallel==2.0.0
    pip install ninja packaging
    pip install flash-attn==2.6.3 --no-build-isolation
    pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.4/

    # LongBench evaluation
    pip install seaborn rouge_score einops pandas

    # Install xAttention
    pip install -e .

    # Install Block Sparse Streaming Attention
    git clone https://github.com/mit-han-lab/Block-Sparse-Attention.git
    cd Block-Sparse-Attention
    python setup.py install
    cd ..
    export PYTHONPATH="$PYTHONPATH:$(pwd)"

Other Acceleration Methods
^^^^^^^^^^^^^^^^^^^^^^^^^^

For H2O, StreamingLLM, SnapKV, L2Compress, PyramidKV, CakeKV, and CaM, no additional installation is required beyond the general acceleration environment.

RAG Installation
----------------

Install RAG dependencies::

    pip install -r ./models/rag/requirements.txt

The following RAG methods are supported:

* BM25
* LlamaIndex
* OpenAI
* Contriever
* LlamaRAG
* StreamingRAG

Next Steps
----------

* Follow the :doc:`quickstart` guide
* Explore available :doc:`benchmarks/index`
* Learn about :doc:`acceleration` methods
* Configure :doc:`rag` if needed