API Reference

Command Line Interface

Main Commands

loom-eval.run

Run model evaluation with automatic pipeline:

loom-eval.run [OPTIONS]

loom-eval.eval

Evaluate existing model predictions:

loom-eval.eval --folder_name <artifact_directory> --model_name <model_id> --cfg_path <config_path>

Core Parameters

--model_path PATH

Model path or Hugging Face model ID (required)

Example: Meta-Llama/Meta-Llama-3.1-8B-Instruct

--cfg_path PATH

Benchmark configuration file path (required)

Example: ./benchmarks/Faithfulness/L_CiteEval/test.yaml

--device DEVICES

GPU devices to use. Space-separated list of device indices.

Example: --device 0 1 2 3

Default: None

--gp_num STR

Number of GPUs per task

Default: "1"

--eval

Enable evaluation after generation

Default: False

--save_tag TAG

Custom tag for saving results

Default: None

Inference Options

--server {transformers,vllm,rwkv,sglang}

Inference backend selection

Default: transformers

--torch_dtype TYPE

PyTorch data type for model weights

Default: torch.bfloat16

--max_length INT

Maximum context length per task. Data exceeding this will be split.

Default: 1500000

--max_model_len INT

Maximum model length for vLLM backend

Default: 128000

--gpu_memory_utilization FLOAT

GPU memory allocation ratio for vLLM engine

Default: 0.95

Acceleration Options

--acceleration {"",SnapKV,KIVI,H2O,StreamingLLM,L2Compress,CaM,CakeKV,PyramidKV,FlexPrefill,XAttention,ThinK}

Memory-efficient acceleration method

Available methods: * SnapKV: Attention pooling before selection * KIVI: Asymmetric 2bit quantization * H2O: Attention-based selection * StreamingLLM: Retain first few tokens * L2Compress: L2 norm-based compression * CaM: Cache merging * CakeKV: Layer-specific preference score * PyramidKV: Layer-wise budget allocation * FlexPrefill: Dynamic sparse attention * XAttention: Strided antidiagonal scoring * ThinK: Query-driven key cache pruning

Default: "" (no acceleration)

RAG Options

--rag_method {LlamaRAG,openai,BM25,llamaindex,contriever,StreamingRAG}

Retrieval-Augmented Generation method

Default: None

--rag_config_path PATH

RAG configuration file path

Default: ./models/rag/rag_config.yaml

--rag_data_path PATH

Path to save/load RAG data

Default: None

--skip_rag_processing

Skip RAG processing and use existing RAG data

Default: False

Extension Options

--limit INT

Limit evaluation to first N documents per task

Default: None

--adapter_path PATH

Path to adapter weights (e.g., LoRA)

Default: ""

--enable_thinking

Enable step-by-step reasoning prompts for supported models

Default: False

--lightweight

Run LOOMBench lightweight evaluation with reduced dataset

Default: False

--template NAME

Custom chat template function name. Overrides benchmark configs.

Default: False

--enable_length_split

Split evaluation results into 5 length-based categories

Default: False

--provider NAME

API service provider name (e.g., “openai”)

Default: None

Storage Strategy Options

--save_strategy {immediate,step,buffer,time}

Prediction saving strategy

immediate: Save after each prediction
step: Save every N steps
buffer: Save when buffer is full
time: Save at time intervals

Default: immediate

--save_interval INT

Step interval for ‘step’ strategy

Default: 10

--buffer_size INT

Buffer size for ‘buffer’ strategy

Default: 100

--time_interval INT

Time interval in seconds for ‘time’ strategy

Default: 60

--auto_flush

Automatically flush all buffers when program ends

Default: True

--web_index STR

WebUI progress tracking index

Default: "0"

Custom Templates

Add custom templates to models/utils/build_chat.py:

def my_template(tokenizer, input, **kwargs):
    """Custom chat template"""
    return tokenizer.build_prompt(input)

Usage in benchmark config:

template: my_template

Examples

Basic Evaluation

loom-eval.run \
    --model_path Meta-Llama/Meta-Llama-3.1-8B-Instruct \
    --cfg_path ./benchmarks/Faithfulness/L_CiteEval/test.yaml \
    --device 0 1 \
    --gp_num 2 \
    --eval \
    --save_tag my_experiment

With Acceleration

loom-eval.run \
    --model_path Meta-Llama/Meta-Llama-3.1-8B-Instruct \
    --cfg_path ./benchmarks/General/RULER.yaml \
    --acceleration SnapKV \
    --device 0 1 \
    --eval

With RAG

loom-eval.run \
    --model_path Meta-Llama/Meta-Llama-3.1-8B-Instruct \
    --cfg_path ./benchmarks/Faithfulness/L_CiteEval/test.yaml \
    --rag_method BM25 \
    --device 0 \
    --eval \
    --save_tag rag_experiment

vLLM Backend

loom-eval.run \
    --model_path Meta-Llama/Meta-Llama-3.1-8B-Instruct \
    --cfg_path ./benchmarks/General/LongBench.yaml \
    --server vllm \
    --max_model_len 128000 \
    --gpu_memory_utilization 0.95 \
    --device 0 1 2 3 \
    --eval

API Model

loom-eval.run \
    --model_path gpt-4o-2024-05-13 \
    --cfg_path ./benchmarks/Faithfulness/L_CiteEval/test.yaml \
    --provider openai \
    --eval