LUNA - A Python Library for Language Understanding and Naturalness Assessment of Language Models

What this paper adds in value

LUNA is a first-of-its-kind Python library for unified benchmarking of language generation models.

It supports multiple evaluation metrics (as shown in the picture below). They are classified along two axes:

  1. Reference-based or reference-free

  2. By representation approach: string-based, embedding-based, model-based, etc.
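To make the two axes concrete, here is a minimal sketch of two string-based metrics, one reference-based and one reference-free. The function names `unigram_f1` and `distinct_1` are my own illustrations and are not taken from LUNA's API.

```python
from collections import Counter


def unigram_f1(candidate: str, reference: str) -> float:
    """Reference-based, string-based: token-overlap F1 against a reference."""
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0
    # Multiset intersection: each shared token counts min(count) times.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def distinct_1(candidate: str) -> float:
    """Reference-free, string-based: share of unique tokens in the output."""
    tokens = candidate.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0
```

The first function needs a gold reference to score an output, while the second looks only at the output itself. Embedding-based and model-based metrics follow the same split but compare vectors or model scores instead of raw tokens.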

It supports parallel evaluation, and you can add new evaluation metrics by extending the Python library.
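A rough sketch of what extending a metric library and evaluating in parallel might look like. The `Metric` base class, the `LengthRatio` metric, and `evaluate_parallel` are all hypothetical names of my own, assumed for illustration; LUNA's real interface may look different.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


class Metric(ABC):
    """Hypothetical base class; not LUNA's real interface."""

    @abstractmethod
    def evaluate_example(self, candidate: str, reference: Optional[str] = None) -> float:
        """Score a single generated text."""


class LengthRatio(Metric):
    """A toy new metric: candidate length relative to reference length."""

    def evaluate_example(self, candidate: str, reference: Optional[str] = None) -> float:
        ref_len = len(reference.split()) if reference else 0
        if ref_len == 0:
            return 0.0
        return len(candidate.split()) / ref_len


def evaluate_parallel(metrics, candidates, references, max_workers=4):
    """Run every metric over the whole batch, one metric per worker thread."""

    def run(metric):
        return [metric.evaluate_example(c, r) for c, r in zip(candidates, references)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(run, metrics)  # map preserves the order of `metrics`
        return {type(m).__name__: scores for m, scores in zip(metrics, results)}
```

Adding a metric here means writing one subclass with one method. Threads are used only to keep the sketch short; for CPU-bound metrics written in pure Python, a process pool may be the better fit.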

What can be built on top of this

This Python library can be integrated as a testing suite in machine learning projects that generate text, so evaluations run automatically.
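As a sketch of the testing-suite idea, a metric can be wrapped in a pytest-style test that fails when quality drops below a threshold. Everything here is an assumption for illustration: `exact_match_rate` stands in for a real metric, `fake_model` stands in for the model under test, and the 0.9 threshold is arbitrary.

```python
def exact_match_rate(candidates, references):
    """Fraction of outputs equal to their reference after case/whitespace normalization."""

    def norm(text):
        return " ".join(text.lower().split())

    if not candidates:
        return 0.0
    matches = sum(norm(c) == norm(r) for c, r in zip(candidates, references))
    return matches / len(candidates)


def fake_model(prompt):
    """Hypothetical stand-in for the model under test."""
    canned = {"2+2=": "4", "Capital of France?": "Paris"}
    return canned.get(prompt, "")


def test_generation_quality():
    prompts = ["2+2=", "Capital of France?"]
    references = ["4", "paris"]
    outputs = [fake_model(p) for p in prompts]
    # The threshold is an arbitrary illustration, not a recommended value.
    assert exact_match_rate(outputs, references) >= 0.9
```

A test runner such as pytest would pick up `test_generation_quality` from a test file, so a regression in output quality shows up as a failing test in CI.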

More applications for this approach

The same unified approach could also be applied to generation. Imagine sending one prompt to 10-12 models, collecting their outputs, and getting a qualitative ranking of them on a single page.

Code implementation

You can install the library by cloning its repository and installing from the repository root.

git clone
pip install .

You can use either pip (as above) or Poetry to install from the repository root.

I prefer Poetry over pip.

