Releases: NannyML/nannyml
v0.10.7
Changed
- Optimized summary stats and overall performance by avoiding unnecessary copy operations and index resets during chunking. (#390)
- Optimized performance of `nannyml.base.PerMetricPerColumnResult` filter operations by adding a short-circuit path when only filtering on period. (#391)
- Optimized performance of all data quality calculators by avoiding unnecessary evaluations and avoiding copy and index reset operations. (#392)
Fixed
v0.10.6
Changed
- Make predictions optional for performance calculation. When not provided, only AUROC and average precision will be calculated. (#380)
- Small DLE docs updates
- Combed through and optimized the reconstruction error calculation with PCA resulting in a nice speedup. Cheers @nikml! (#385)
- Updated summary stats value limits to be in line with the rest of the library. Changed from `np.nan` to `None`. (#387)
Fixed
- Fixed a breaking issue in the sampling error calculation for the median summary statistic when there is only a single value for a column. (#377)
- Drop `identifier` column from the documentation example for reconstruction error calculation with PCA. (#382)
- Fix an issue where default threshold configurations would get changed upon setting custom thresholds, bad mutables! (#386)
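
The "bad mutables" fix above refers to a common Python pitfall: when a shared default object is handed out by reference, customising it for one caller silently changes the defaults for everyone. A minimal sketch of the pitfall and a copy-based remedy follows. `DEFAULT_THRESHOLDS`, `configure_unsafe` and `configure_safe` are hypothetical names used for illustration only, not NannyML's actual code.

```python
import copy

# Hypothetical module-level defaults; metric names and values are illustrative only.
DEFAULT_THRESHOLDS = {"roc_auc": {"lower": 0.5, "upper": 1.0}}


def configure_unsafe(custom):
    # Pitfall: `thresholds` is the very same dict as the defaults,
    # so updating it changes the defaults for all later callers.
    thresholds = DEFAULT_THRESHOLDS
    thresholds.update(custom)
    return thresholds


def configure_safe(custom):
    # Remedy: work on a deep copy so the shared defaults stay untouched.
    thresholds = copy.deepcopy(DEFAULT_THRESHOLDS)
    thresholds.update(custom)
    return thresholds
```

Calling `configure_safe` with custom thresholds leaves `DEFAULT_THRESHOLDS` as it was, whereas `configure_unsafe` adds the custom entries to the shared defaults.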
v0.10.5
v0.10.4
Changed
- We've changed the defaults for the `incomplete` parameter in the `SizeBasedChunker` and `CountBasedChunker` to `keep` from the previous `append`. This means that from now on, by default, you might have an additional "incomplete" final chunk. Previously these records would have been appended to the last "complete" chunk. This change was required for some internal developments, and we also felt it made more sense when looking at continuous monitoring (as the incomplete chunk will be filled up later as more data is appended). (#367)
- We've renamed the Classifier for Drift Detection (CDD) to the more appropriate Domain Classifier. (#368)
- Bumped the version of the `pyarrow` dependency to `^14.0.0` if you're running on Python 3.8 or up. Congrats on your first contribution here @amrit110, much appreciated!
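
To make the difference between `keep` and `append` concrete, here is a small pure-Python sketch of size-based chunking. It does not use NannyML's chunker classes. `split_by_size` is a hypothetical helper that only mimics the behaviour described in the first item above.

```python
def split_by_size(records, chunk_size, incomplete="keep"):
    """Split `records` into chunks of `chunk_size` rows.

    Leftover rows that do not fill a whole chunk are handled according to
    `incomplete`: "keep" puts them in their own final chunk, "append" adds
    them to the last complete chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if incomplete not in ("keep", "append"):
        raise ValueError(f"unsupported value for incomplete: {incomplete!r}")

    n_full = len(records) // chunk_size
    chunks = [records[i * chunk_size:(i + 1) * chunk_size] for i in range(n_full)]
    leftover = records[n_full * chunk_size:]

    if not leftover:
        return chunks
    if incomplete == "keep" or not chunks:
        # New default: the leftover rows form an extra, incomplete chunk.
        chunks.append(leftover)
    else:
        # Previous default: the leftover rows join the last complete chunk.
        chunks[-1] = chunks[-1] + leftover
    return chunks
```

With seven rows and a chunk size of three, `keep` yields three chunks (the last holding a single row), while `append` yields two chunks (the last holding four rows).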
Fixed
- Continuous distribution plots will now be scaled per chunk, as opposed to globally. (#369)
v0.10.3
Fixed
- Handle median summary stat calculation failing due to NaN values
- Fix standard deviation summary stat sampling error calculation occasionally returning infinity (#363)
- Fix plotting confidence bands when value gaps occur (#364)
Added
- New multivariate drift detection method using a classifier and density ratio estimation.
v0.10.2
Changed
- Removed p-value based thresholds for Chi2 univariate drift detection (#349)
- Change default thresholds for univariate drift methods to standard deviation based thresholds.
- Add summary stats support to the Runner and CLI (#353)
- Add unique identifier columns to included datasets for better joining (#348)
- Remove unused `confidence_deviation` properties in CBPE metrics (#357)
- Improved error handling: failing metric calculation for a single chunk will no longer stop an entire calculator.
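
A standard deviation based threshold, as mentioned in the list above, is typically derived from a metric's values on reference data: the lower and upper limits sit a fixed number of standard deviations around the mean. The sketch below assumes a multiplier of 3 and the population standard deviation. Both choices, and the function name `std_thresholds`, are assumptions made for illustration and not necessarily NannyML's exact defaults.

```python
from statistics import mean, pstdev


def std_thresholds(reference_values, lower_multiplier=3.0, upper_multiplier=3.0):
    """Return (lower, upper) limits placed around the mean of the reference values."""
    center = mean(reference_values)
    spread = pstdev(reference_values)  # population standard deviation (assumption)
    lower = center - lower_multiplier * spread
    upper = center + upper_multiplier * spread
    return lower, upper
```

A drift value computed on a new chunk would then raise an alert when it falls outside the returned interval.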
Added
- Add feature distribution calculators (#352)
Fixed
- Fix join column settings for CLI (#356)
- Fix crashes in `UnseenValuesCalculator`
v0.10.1
v0.10.0
Changed
- Telemetry now detects AKS and EKS and NannyML Cloud runtimes. (#325)
- Runner was refactored, so it can be extended with premium NannyML calculators and estimators. (#325)
- Sped up telemetry reporting to ensure it doesn't hinder performance.
- Some love for the docs as @santiviquez tediously standardized variable names. (#338)
- Optimize calculations for L-infinity method. (#340)
- Refactored the `CalibratorFactory` to align with our other factory implementations. (#341)
- Updated the `Calibrator` interface with `*args` and `**kwargs` for easier extension.
- Small refactor to the `ResultComparisonMixin` to allow easier extension.
Added
- Added support for directly estimating the confusion matrix of multiclass classification models using CBPE. Big thanks to our appreciated alumnus @cartgr for the effort (and sorry it took soooo long). (#287)
- Added `DatabaseWriter` support for results from `MissingValuesCalculator` and `UnseenValuesCalculator`. Some excellent work by @bgalvao, thanks for being a long-time user and supporter!
Fixed
- Fix issues with calculation and filtering in performance calculation and estimation. (#321)
- Fix multivariate reconstruction error plot labels. (#323)
- Log a warning when performance metrics for a chunk will return a `NaN` value. (#326)
- Fix issues with ReadTheDocs build failing
- Fix erroneous `specificity` calculation, both realized and estimated. Well spotted @nikml! (#334)
- Fix threshold computation when dealing with `NaN` values. Major thanks to the eagle-eyed @giodavoli. (#333)
- Fix exports for confusion matrix metrics using the `DatabaseWriter`. An inspiring commit that led to some other changes. Great job @shezadkhan137! (#335)
- Fix incorrect normalization for the business value metric in realized and estimated performance. (#337)
- Fix handling `NaN` values when fitting univariate drift. (#340)
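
For reference on the `specificity` fix listed above: specificity is the true negative rate, TN / (TN + FP). Below is a minimal sketch of the realized calculation for binary labels. The helper name is hypothetical, and it assumes the negative class is encoded as 0 and that an undefined result (no negative samples) is reported as NaN.

```python
import math


def specificity(y_true, y_pred):
    """True negative rate for binary labels where 0 is the negative class."""
    pairs = list(zip(y_true, y_pred))
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    if tn + fp == 0:
        # No negative samples in this chunk: the metric is undefined.
        return math.nan
    return tn / (tn + fp)
```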
v0.9.1
Changed
- Updated Mendable client library version to deal with styling overrides in the RTD documentation theme
- Removed superfluous limits for confidence bands in the CBPE class (these are present in the metric classes instead)
- Threshold value limiting behaviour (e.g. overriding a value and emitting a warning) will be triggered not only when
the value crosses the threshold but also when it is equal to the threshold value. This is because we interpret the
threshold as a theoretical maximum.
Added
- Added a new example notebook walking through a full use case using the NYC Green Taxi dataset, based on the blog of @santiviquez
Fixed
- Fixed broken Docker container build due to changes in public Poetry installation procedure
- Fixed broken image source link in the README, thanks @NeoKish!
v0.9.0
Changed
- Updated API docs for the `nannyml.io` package, thanks @maciejbalawejder (#286)
- Restricted versions of `numpy` to be `<1.25`, since there seems to be a change in the `roc_auc` calculation somehow (#301)
Added
- Support for Data Quality calculators in the CLI runner
- Support for Data Quality results in `Ranker` implementations (#297)
- Support `mendable` in the docs (#295)
- Documentation landing page (#303)
- Support for calculations with delayed targets (#306)
Fixed
- Small changes to quickstart, thanks @NeoKish (#291)
- Fix an issue passing `*args` and `**kwargs` in `Result.filter()` and subclasses (#298)
- Double listing of the binary dataset documentation page
- Add missing thresholds to `roc_auc` in `CBPE` (#294)
- Fix plotting issue due to introduction of additional values in the 'display names tuple' (#305)
- Fix broken exception handling due to inheriting from `BaseException` and not `Exception` (#307)
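
The last fix reflects how Python's exception hierarchy works: an `except Exception` clause does not catch classes that derive directly from `BaseException`, so errors defined that way slip past ordinary handlers. A short sketch with hypothetical exception names (not NannyML's own classes):

```python
class BadLibraryError(BaseException):
    """Hypothetical error deriving directly from BaseException (problematic)."""


class GoodLibraryError(Exception):
    """Hypothetical error deriving from Exception (recommended)."""


def is_caught_by_generic_handler(exc_type):
    """Return True if `except Exception` catches an instance of exc_type."""
    try:
        raise exc_type("boom")
    except Exception:
        return True
    except BaseException:
        # Only reached for errors outside the Exception hierarchy.
        return False
```

Here `is_caught_by_generic_handler(GoodLibraryError)` returns `True`, while `is_caught_by_generic_handler(BadLibraryError)` returns `False`.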