API Reference

Nirvan (Sanskrit निर्वाण) lit. 'awakening / enlightenment'

Compare

The Compare class compares two datasets and generates a comparison report. Once instantiated, a Compare object encapsulates all the information about the comparison and its result.

Usage

import polars as pl
from compare_datasets import Compare

tested = pl.DataFrame({"id": [1, 2], "value": [1.0, 2.0]})
expected = pl.DataFrame({"id": [1, 2], "value": [1.0, 2.5]})
compared = Compare(tested=tested, expected=expected, key=["id"], verbose=False, low_memory=False)  # creates a Compare object
print(compared)  # prints the comparison report
compared.get_report(format='txt', save_at_path=None)  # generates the comparison report in TXT (optionally saved via save_at_path)
compared.get_report(format='html', save_at_path=None)  # generates the comparison report in HTML (optionally saved via save_at_path)

Arguments

tested (polars.DataFrame or pandas.DataFrame) The tested dataset. A pandas DataFrame is converted to a Polars DataFrame.
expected (polars.DataFrame or pandas.DataFrame) The expected dataset. A pandas DataFrame is converted to a Polars DataFrame.
key (List, optional) The key column(s) used to match rows between the datasets. Defaults to None. The key must uniquely identify each row in both datasets. See the documentation on the key column for details.
verbose (bool, optional) Whether to display verbose output. Defaults to False.
low_memory (bool, optional) Whether to use low-memory mode for the comparison. Low-memory mode disables the inner join of the tested and expected frames. Defaults to False.
case_sensitive_column_names (bool, optional) Whether column names are compared case-sensitively. Defaults to False. If True, column names are compared as is; if False, they are converted to uppercase before comparison.
numeric_tolerance (int, optional) The number of decimal places to which numeric values are rounded before comparison. Defaults to 4. If None, numeric values are not rounded.
test_name (str, optional) The name of the test, used when printing the report.
tested_frame_name (str, optional) The name of the tested dataset, used when printing the report.
expected_frame_name (str, optional) The name of the expected dataset, used when printing the report.
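The arguments above imply a preparation step before values are compared: column names are upper-cased unless case_sensitive_column_names is True, numeric values are rounded to numeric_tolerance decimal places, and rows are matched on a key that must be unique. The pure-Python sketch below illustrates those three ideas under stated assumptions. It is not the library's implementation: the helper names (normalize_columns, round_numeric, index_by_key, match_rows) are hypothetical, only float values are rounded, and plain lists of dicts stand in for Polars frames so the sketch runs without dependencies.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Dict[str, Any]


def normalize_columns(rows: List[Row], case_sensitive: bool) -> List[Row]:
    """Hypothetical helper: upper-case column names unless case-sensitive."""
    if case_sensitive:
        return rows
    return [{name.upper(): value for name, value in row.items()} for row in rows]


def round_numeric(rows: List[Row], tolerance: Optional[int]) -> List[Row]:
    """Hypothetical helper: round floats to `tolerance` places; None disables rounding."""
    if tolerance is None:
        return rows
    return [
        {k: round(v, tolerance) if isinstance(v, float) else v for k, v in row.items()}
        for row in rows
    ]


def index_by_key(rows: List[Row], key: List[str]) -> Dict[Tuple[Any, ...], Row]:
    """Hypothetical helper: map each key tuple to its row, rejecting duplicate keys."""
    index: Dict[Tuple[Any, ...], Row] = {}
    for row in rows:
        k = tuple(row[col] for col in key)
        if k in index:
            raise ValueError(f"key {k} does not uniquely identify rows")
        index[k] = row
    return index


def match_rows(tested: List[Row], expected: List[Row], key: List[str]) -> List[Tuple[Row, Row]]:
    """Hypothetical helper: pair rows whose key occurs in both datasets (an inner join on key)."""
    t = index_by_key(tested, key)
    e = index_by_key(expected, key)
    return [(t[k], e[k]) for k in t if k in e]


tested = [{"id": 1, "amount": 10.00001}, {"id": 2, "amount": 5.5}]
expected = [{"ID": 1, "AMOUNT": 10.0}, {"ID": 3, "AMOUNT": 7.0}]

# Column names are upper-cased, so the key is given in its normalized form.
tested_p = round_numeric(normalize_columns(tested, case_sensitive=False), 4)
expected_p = round_numeric(normalize_columns(expected, case_sensitive=False), 4)
pairs = match_rows(tested_p, expected_p, key=["ID"])
print(pairs)  # [({'ID': 1, 'AMOUNT': 10.0}, {'ID': 1, 'AMOUNT': 10.0})]
```

Rounding 10.00001 to 4 decimal places makes it equal to 10.0, while the row with ID 2 (tested only) and the row with ID 3 (expected only) drop out of the matched pairs, as they would in an inner join on the key.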

Attributes

result (list) List of comparison results.
jaccard_similarity (JaccardSimilarity) Object for calculating Jaccard similarity.
tested (polars.DataFrame) The prepared tested dataset.
expected (polars.DataFrame) The prepared expected dataset.
string_comparisons (StringComparisons) Object for comparing string columns.
numeric_comparisons (NumericComparisons) Object for comparing numeric columns.
date_comparisons (DateTimeComparisons) Object for comparing datetime columns.
boolean_comparisons (BooleanComparisons) Object for comparing boolean columns.
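The jaccard_similarity attribute is documented only as an object for calculating Jaccard similarity. For two sets A and B, the Jaccard similarity is |A ∩ B| / |A ∪ B|, ranging from 0 (no overlap) to 1 (identical sets). The sketch below computes it for two hypothetical sets of key values; which sets the library actually compares (key values, rows, or column names) is an assumption here, the function name is hypothetical, and treating two empty sets as identical is a convention chosen for this sketch.

```python
from typing import Hashable, Iterable


def jaccard_similarity(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Return |A intersect B| / |A union B|; two empty sets count as identical (1.0)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 1.0
    return len(set_a & set_b) / len(union)


tested_keys = [1, 2, 3, 4]
expected_keys = [2, 3, 4, 5]
print(jaccard_similarity(tested_keys, expected_keys))  # 0.6 (3 shared keys out of 5 distinct)
```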

Methods

report() Generates a comparison report.
get_report(format='txt', save_at_path=None) Generates and optionally saves the comparison report.
