Data Profiling Collection
Repositories tagged with "data-profiling"
OpenMetadata
open-metadata
“OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.”
fg-data-profiling
Data-Centric-AI-Community
“1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.”
great_expectations
great-expectations
“Always know what to expect from your data.”
cleanlab
cleanlab
“Cleanlab's open-source library is the standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.”
sweetviz
fbdesignpro
“Visualize and compare datasets, target values and associations, with one line of code.”
soda-core
sodadata
“Data Contracts engine for the modern data stack. https://www.soda.io”
optimus
hi-primus
“:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark”
odd-platform
opendatadiscovery
“First open-source data discovery and observability platform. We make a life for data practitioners easy so you can focus on your business.”
cleanvision
cleanlab
“Automatically find issues in image datasets and practice data-centric computer vision.”
datavines
datavane
“Know your data better! Datavines is Next-gen Data Observability Platform, support metadata manage and data quality.”
traceml
polyaxon
“Engine for ML/Data tracking, visualization, explainability, drift detection, and dashboards for Polyaxon.”
popmon
ing-bank
“Monitor the stability of a Pandas or Spark dataframe”
piperider
InfuseAI
“Code review for data in dbt”
desbordante-core
Desbordante
“Desbordante is a high-performance data profiler that is capable of discovering many different patterns in data using various algorithms. It also allows to run data cleaning scenarios using these algorithms. Desbordante has a console version and an easy-to-use web application.”
haupt
polyaxon
“Lineage metadata API, artifacts streams, sandbox, API, and spaces for Polyaxon”
dqx
databrickslabs
“Databricks framework to validate Data Quality of pySpark DataFrames and Tables”
dqo
dqops
“Data Quality and Observability platform for the whole data lifecycle, from profiling new data sources to full automation with Data Observability. Configure data quality checks from the UI or in YAML files, let DQOps run the data quality checks daily to detect data quality issues.”
bumblebee
hi-primus
“A spreadsheet-like data preparation web app that works over Optimus (Pandas, Dask, cuDF, Dask-cuDF, Spark and Vaex)”