Installation and Quick Start

Install

pip install compare-datasets

You can also download the wheels from here and install them manually. Using the latest version is recommended.

Quick Start

compare-datasets currently supports Polars and Pandas dataframes. To get started, import the Compare class from compare_datasets (from compare_datasets import Compare) and pass it the tested and expected dataframes (compared = Compare(tested=tested, expected=expected, key=key)). It is highly recommended that you also pass a key: a common column, or a list of common columns, that uniquely identifies a row in both dataframes. If you don't pass a key, the comparison defaults to an unordered comparison by the index of the dataframes.

import polars as pl
expected = pl.DataFrame(
    {
    'id': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'another_id':[1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010],
    'name': ['John', 'Alice', 'Bob', 'Eva', 'Charlie', 'Linda', 'David', 'Sophie', 'Michael', 'Emma'],
    'age': list(range(25, 35)),
    'height': [170, 165, 180, 160, 175, 160, 185, 175, 172, 168],
    'weight': [70, 55, 80, 50, 68, 52, 95, 73, 78, 60]
    }
)
 
tested = pl.DataFrame(
    {
    'id': [102, 103, 104, 105, 106, 107, 108, 109, 110],
    'another_id':[1002, 1003, 1004, 1005, 1006, 1007, 2008, 2009, 2010],
    'name': ['Albert', 'Bobby', 'Evan', 'Charlie', 'Linda', 'David', 'Sophie', 'Michael', 'Emma'],
    'age': list(range(25, 34)),
    'height': [165, 180, 160, 175, 160, 185, 175, 172, 168],
    'weight': [55, 80, 50, 68, 52, 95, 73, 78, 60]
    }
)
from compare_datasets import Compare

# Columns that together uniquely identify a row in both dataframes
key = ['id', 'another_id']
compared = Compare(tested=tested, expected=expected, key=key)  # creates a Compare object
print(compared)  # prints the tabulated result
compared.get_report("<PATH_TO_SAVE_REPORT>", format='txt')  # saves the report to a file

You can print the well-formatted result by calling print(<NAME OF COMPARE OBJECT>). You can also save the report to a file by calling get_report(filename="provide_a_name.txt"). The report is saved in txt format.
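
Because the key is expected to uniquely identify a row in both dataframes, it can help to check that before running a comparison. The following is a minimal sketch of such a check, shown with Pandas. The helper key_is_unique is a hypothetical name introduced here for illustration and is not part of compare_datasets. The same idea should carry over to Polars dataframes.

```python
import pandas as pd


def key_is_unique(df: pd.DataFrame, key: list[str]) -> bool:
    """Hypothetical helper (not part of compare_datasets).

    Returns True if every key column exists in df and no two rows
    share the same combination of key values.
    """
    missing = [col for col in key if col not in df.columns]
    if missing:
        raise KeyError(f"key columns not found: {missing}")
    # duplicated() flags each row whose key combination already appeared earlier
    return not df.duplicated(subset=key).any()


# Small illustrative dataframes (made up for this sketch)
expected = pd.DataFrame({"id": [101, 102, 103], "another_id": [1001, 1002, 1003]})
tested = pd.DataFrame({"id": [102, 103, 103], "another_id": [1002, 1003, 1003]})

key = ["id", "another_id"]
expected_ok = key_is_unique(expected, key)  # all key combinations are distinct
tested_ok = key_is_unique(tested, key)      # (103, 1003) appears twice
print(expected_ok, tested_ok)
```

If either check fails, you may want to deduplicate the data or pick a different set of key columns before passing them to Compare.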