Basic usage
Installation
To use LM-Polygraph, first clone the repo and install it with pip. Using a virtual environment is recommended. An example is shown below:
$ git clone https://github.com/IINemo/lm-polygraph.git
$ python3 -m venv env
$ source env/bin/activate
(env) $ cd lm-polygraph
(env) $ pip install .
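After the installation steps above, a quick sanity check can confirm that the package is visible to the active interpreter. The snippet below is a minimal sketch using only the Python standard library; the helper names `in_virtualenv` and `is_installed` are hypothetical and not part of LM-Polygraph.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    # Inside a venv, sys.prefix points at the environment and differs from sys.base_prefix.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def is_installed(package: str = "lm_polygraph") -> bool:
    """Return True when a top-level package can be found by the import system."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("lm_polygraph importable:", is_installed())
```

If the second line reports False, the most likely cause is that pip was run outside the activated environment.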
Quick start
Initialize the model (encoder-decoder or decoder-only) from HuggingFace or a local file. For example, bigscience/bloomz-3b:
from lm_polygraph.utils.model import WhiteboxModel

model = WhiteboxModel.from_pretrained(
    "bigscience/bloomz-3b",
    device="cuda:0",
)
Specify the uncertainty estimation (UE) method:
from lm_polygraph.estimators import *

ue_method = MeanPointwiseMutualInformation()
Get predictions and their uncertainty scores:
from lm_polygraph.utils.manager import estimate_uncertainty

input_text = "Who is George Bush?"
estimate_uncertainty(model, ue_method, input_text=input_text)
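The quick start scores a single prompt. To compare several prompts, one option is to wrap the scoring call in a small helper and sort the results. The sketch below is illustrative only: `rank_by_uncertainty` is a hypothetical helper, it assumes that a higher score means higher uncertainty, and it takes any callable that maps a prompt to a float so it can be tried without loading a model. The commented-out adapter additionally assumes that the object returned by `estimate_uncertainty` exposes an `uncertainty` attribute.

```python
from typing import Callable, Iterable, List, Tuple


def rank_by_uncertainty(
    prompts: Iterable[str],
    score_fn: Callable[[str], float],
) -> List[Tuple[str, float]]:
    """Score every prompt and return (prompt, score) pairs, highest score first."""
    scored = [(prompt, float(score_fn(prompt))) for prompt in prompts]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Possible adapter for the quick-start objects (assumption: the result has `.uncertainty`):
# score_fn = lambda text: estimate_uncertainty(model, ue_method, input_text=text).uncertainty
# ranked = rank_by_uncertainty(["Who is George Bush?", "What is 2 + 2?"], score_fn)
```

Keeping the scorer as a parameter makes it easy to swap in a different UE method without touching the ranking logic.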
Other examples:
examples of library usage: https://github.com/IINemo/lm-polygraph/blob/main/notebooks/example.ipynb
examples of library usage for the QA task with bigscience/bloomz-3b on the TriviaQA dataset: https://github.com/IINemo/lm-polygraph/blob/main/notebooks/qa_example.ipynb
examples of library usage for the NMT task with facebook/wmt19-en-de on the WMT14 En-De dataset: https://github.com/IINemo/lm-polygraph/blob/main/notebooks/mt_example.ipynb
examples of library usage for the ATS task with facebook/bart-large-cnn model on the XSUM dataset: https://github.com/IINemo/lm-polygraph/blob/main/notebooks/ats_example.ipynb
example of running the interface from a notebook (careful: only bloomz-560m, gpt-3.5-turbo and gpt-4 fit the default memory limit; other models can only be run with a Colab Pro subscription): https://colab.research.google.com/drive/1JS-NG0oqAVQhnpYY-DsoYWhz35reGRVJ?usp=sharing
Benchmarks
To evaluate the performance of uncertainty estimation methods, run:
polygraph_eval --dataset triviaqa.csv --model databricks/dolly-v2-3b --save_path test.man --cache_path . --seed 1 2 3 4 5
Parameters:
dataset: path to .csv dataset
model: HuggingFace model name or path
batch_size: batch size for generation (default: 2)
seed: seed for generation (default: 1; can specify several seeds for multiple tests)
device: cpu or cuda:N (default: cuda:0 if available, cpu otherwise)
save_path: file path to save test results (the directory should already exist)
cache_path: directory path to cache intermediate calculations (the directory should already exist)
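Because the directories for save_path and cache_path are expected to exist before the run, it can be convenient to create them from a small launcher script. The following is a sketch under stated assumptions: `build_eval_command` is a hypothetical helper, and it only uses the flags that appear in the example command above (`--dataset`, `--model`, `--save_path`, `--cache_path`, `--seed`).

```python
import shlex
from pathlib import Path
from typing import List, Sequence


def build_eval_command(
    dataset: str,
    model: str,
    save_path: str,
    cache_path: str = ".",
    seeds: Sequence[int] = (1,),
) -> List[str]:
    """Create the required directories and return the polygraph_eval argument list."""
    # save_path is a file, so it is its parent directory that must exist.
    Path(save_path).parent.mkdir(parents=True, exist_ok=True)
    # cache_path is itself a directory.
    Path(cache_path).mkdir(parents=True, exist_ok=True)
    return [
        "polygraph_eval",
        "--dataset", dataset,
        "--model", model,
        "--save_path", save_path,
        "--cache_path", cache_path,
        "--seed", *[str(seed) for seed in seeds],
    ]


if __name__ == "__main__":
    cmd = build_eval_command(
        "triviaqa.csv", "databricks/dolly-v2-3b", "test.man", ".", seeds=[1, 2, 3, 4, 5]
    )
    # Print the shell-quoted command; pass `cmd` to subprocess.run(cmd, check=True) to launch it.
    print(shlex.join(cmd))
```

Returning the arguments as a list rather than a single string avoids shell-quoting issues when the command is handed to `subprocess.run`.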
Use visualization_tables.ipynb to generate the summarizing tables for an experiment.
The XSUM, TriviaQA, and WMT16ru-en datasets, downsampled to 300 samples, can be found here.