Install this package:
emerge -a sci-ml/lm-eval
<pkgmetadata>
  <maintainer type="person">
    <email>iohann.s.titov@gmail.com</email>
    <name>Ivan S. Titov</name>
  </maintainer>
  <use>
    <flag name="api">Wire dependencies for OpenAI/HF Inference / generic HTTP API model backends</flag>
    <flag name="ifeval">Enable instruction-following evaluation tasks (leaderboard_ifeval and friends): checks generated text against structural constraints like length limits, language hints, and required keywords. Auto-downloads the NLTK punkt_tab tokenizer at task-load time (not deferred until eval); seed ~/nltk_data ahead of time on offline hosts.</flag>
    <flag name="math">Enable math-grading tasks (minerva_math, leaderboard math, hendrycks_math, etc.): parses LaTeX answers and verifies symbolic equality between predicted and ground-truth solutions</flag>
    <flag name="sentencepiece">Pull sci-ml/sentencepiece for tasks that tokenise via SentencePiece</flag>
    <flag name="statsmodels">Pull dev-python/statsmodels for the discrim_eval task family</flag>
    <flag name="vllm">Wire dev-python/vllm for the vLLM model backend</flag>
  </use>
  <upstream>
    <remote-id type="pypi">lm-eval</remote-id>
    <remote-id type="github">EleutherAI/lm-evaluation-harness</remote-id>
  </upstream>
</pkgmetadata>
Manage flags for this package:
euse -i <flag> -p sci-ml/lm-eval
euse -E <flag> -p sci-ml/lm-eval
euse -D <flag> -p sci-ml/lm-eval
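As an alternative to euse, per-package flags can be written by hand to a file under /etc/portage/package.use. The lines below are a minimal sketch, assuming package.use is a directory; PKG_USE_DIR is an illustrative variable, not a Portage setting, and it defaults to a scratch directory so the sketch can be tried without root. The flag selection is only an example.

```shell
# Hypothetical target directory. On a real system this would be
# /etc/portage/package.use (writing there requires root).
PKG_USE_DIR="${PKG_USE_DIR:-$(mktemp -d)}"

# Example selection: enable ifeval and math, disable vllm, for this package only.
printf '%s\n' 'sci-ml/lm-eval ifeval math -vllm' > "${PKG_USE_DIR}/lm-eval"

# Show what was written.
cat "${PKG_USE_DIR}/lm-eval"
```

After changing flags on a real system, re-running the emerge command above rebuilds the package with the new selection.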
| Flag | Description | 0.4.12 | 0.4.11 |
|---|---|---|---|
| api | Wire dependencies for OpenAI/HF Inference / generic HTTP API model backends | ⊕ | ⊕ |
| ifeval | Enable instruction-following evaluation tasks (leaderboard_ifeval and friends) — checks generated text against structural constraints like length limits, language hints, and required keywords. Auto-downloads the NLTK punkt_tab tokenizer at task-load time (not deferred until eval); seed ~/nltk_data ahead of time on offline hosts. | ✓ | ✓ |
| math | Enable math-grading tasks (minerva_math, leaderboard math, hendrycks_math, etc.) — parses LaTeX answers and verifies symbolic equality between predicted and ground-truth solutions | ✓ | ✓ |
| sentencepiece | Pull sci-ml/sentencepiece for tasks that tokenise via SentencePiece | ✓ | ✓ |
| statsmodels | Pull dev-python/statsmodels for the discrim_eval task family | ✓ | ✓ |
| vllm | Wire dev-python/vllm for the vLLM model backend | ✓ | ✓ |
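The ifeval description asks for ~/nltk_data to be seeded ahead of time on offline hosts. One possible way to do that is sketched below, assuming a second machine with network access and NLTK installed; the helper name seed_punkt_tab and the NLTK_DIR variable are illustrative, and the sketch only defines the helper without running the download.

```shell
# Illustrative variable; ~/nltk_data is the path named in the flag description.
NLTK_DIR="${NLTK_DIR:-$HOME/nltk_data}"

# Hypothetical helper: fetch the punkt_tab tokenizer data into NLTK_DIR
# using NLTK's command-line downloader. Needs network access when called.
seed_punkt_tab() {
    python -m nltk.downloader -d "$NLTK_DIR" punkt_tab
}

# On the connected machine, call seed_punkt_tab, then copy "$NLTK_DIR"
# to the same path on the offline host before loading any ifeval task.
```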
| Type | File | Size |
|---|---|---|
| DIST | lm_eval-0.4.11.tar.gz | 3246509 bytes |
| DIST | lm_eval-0.4.12.tar.gz | 3360517 bytes |