Key Facts
- Category: Data & Tables
- Input Types: textarea, file, text, number
- Output Type: html
- Sample Coverage: 4
- API Ready: Yes
Overview
The Dataset Quality Profiler is a fast, browser-based tool that inspects CSV and JSON files to generate an instant data quality report. It automatically detects missing values, duplicate records, numeric outliers, and format drift across your columns. Use it to get a quick operational snapshot of your dataset's health before moving data into BI dashboards, ETL pipelines, or machine learning models.
When to Use
- Before loading raw data into a database or BI tool, to catch structural errors and missing values.
- When auditing a new dataset from a third-party vendor or client, to quickly assess data completeness and anomalies.
- During data preparation for machine learning, to identify numeric outliers and inconsistent data types.
How It Works
- Paste your CSV data directly into the input field, or upload a CSV or JSON file.
- Optionally specify duplicate key columns (like 'id,email') to check for business-logic duplicates instead of exact row matches.
- Adjust the sample rows setting to control how many example records appear in the final preview.
- View the generated HTML report, which includes a 0-100 quality score, missing value counts, outlier detection, and format drift alerts.
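The core checks described above can be sketched in a few lines. This is a hypothetical illustration, not the tool's actual implementation: it parses CSV text, then counts empty cells and exact-duplicate rows.

```python
import csv
import io

def profile_csv(text):
    """Toy profiler: count rows, missing cells, and exact-duplicate rows."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = sum(
        1 for r in rows for v in r.values() if v is None or v.strip() == ""
    )
    seen, duplicates = set(), 0
    for r in rows:
        key = tuple(sorted(r.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"rows": len(rows), "missing_cells": missing, "duplicate_rows": duplicates}

sample = "id,amount\n1,10\n2,\n1,10\n"
print(profile_csv(sample))  # {'rows': 3, 'missing_cells': 1, 'duplicate_rows': 1}
```

The real tool layers the quality score, outlier detection, and drift alerts on top of counts like these.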
Use Cases
Examples
1. Profile a transactional CSV before loading it into BI
Data Analyst
- Background: An analyst receives a weekly CSV export of customer transactions that needs to be visualized in a BI dashboard.
- Problem: The raw export often contains duplicate transaction IDs, missing amounts, and mixed date formats that break the dashboard.
- How to Use: Paste the CSV into the Dataset Input, set 'Duplicate Key Columns' to 'id', and generate the report.
- Example Config: Duplicate Key Columns: id; Sample Rows: 8
- Outcome: The report flags duplicate 'id' rows, highlights missing values in the 'amount' column, and detects format drift in the 'created_at' dates.
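The 'Duplicate Key Columns: id' setting in this example means rows sharing the same 'id' count as duplicates even when other fields differ. A minimal sketch of that check (assumed logic, not the tool's code):

```python
from collections import Counter

def duplicate_keys(rows, key_cols):
    """Return key tuples that appear more than once across the given columns."""
    counts = Counter(tuple(r[c] for c in key_cols) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}

transactions = [
    {"id": "t1", "amount": "10.00"},
    {"id": "t2", "amount": "25.50"},
    {"id": "t1", "amount": "9.99"},  # same id, different amount: still a duplicate
]
print(duplicate_keys(transactions, ["id"]))  # {('t1',): 2}
```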
2. Auditing a JSON user dataset for anomalies
Data Engineer
- Background: A data engineer is integrating a new JSON feed of user profiles from a third-party API.
- Problem: The engineer needs to quickly verify that the API is sending complete records without extreme outliers in the 'age' or 'score' fields.
- How to Use: Upload the JSON file via the 'Dataset File' input and review the Anomalies and Missing metrics in the generated report.
- Outcome: The profiler assigns a quality score, identifies numeric outliers in the 'score' column using IQR, and confirms no missing values in critical fields.
Try with Samples
json, csv, text
FAQ
What file formats does the profiler support?
The tool supports CSV and JSON files. For JSON, the data should be formatted as an array of objects or an object containing a 'rows' array.
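The two accepted JSON shapes can be normalized to one row list. This sketch shows the idea; the normalization logic is an assumption, not the tool's actual parser:

```python
import json

def load_rows(text):
    """Accept an array of objects, or an object containing a 'rows' array."""
    data = json.loads(text)
    if isinstance(data, list):
        return data
    if isinstance(data, dict) and isinstance(data.get("rows"), list):
        return data["rows"]
    raise ValueError("expected an array of objects or an object with a 'rows' array")

array_form = '[{"id": 1}, {"id": 2}]'
wrapped_form = '{"rows": [{"id": 1}, {"id": 2}]}'
assert load_rows(array_form) == load_rows(wrapped_form)
```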
How is the overall quality score calculated?
The score is a 0-100 operational summary. It decreases based on the frequency of missing cells, duplicate rows, format drift, and numeric anomalies found in the dataset.
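The exact weighting is internal to the tool, but the general shape of such a score can be illustrated: start at 100 and subtract in proportion to the overall issue rate. The formula below is purely illustrative.

```python
def quality_score(total_cells, missing, duplicates, drift, anomalies):
    """Illustrative 0-100 score: decreases with the combined issue rate.

    Equal weighting of issue types is an assumption for this sketch.
    """
    issues = missing + duplicates + drift + anomalies
    rate = issues / max(total_cells, 1)
    return round(max(0.0, 100.0 * (1.0 - rate)), 1)

# 10 issues across 200 cells -> 5% issue rate -> score 95.0
print(quality_score(total_cells=200, missing=4, duplicates=2, drift=1, anomalies=3))
```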
Can I check for duplicates using specific columns?
Yes. By entering comma-separated column names in the 'Duplicate Key Columns' field (e.g., 'id,email'), the tool will flag duplicate combinations based only on those business keys.
What does 'format drift' mean in the report?
Format drift highlights columns where the data structure is inconsistent, such as mixing different date formats or combining numeric codes with free-text strings.
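One common way to detect this kind of drift is to classify each value's "shape" and flag columns that mix more than one. The heuristics below are hypothetical; the tool's real rules are not shown here:

```python
import re

def value_shape(v):
    """Classify a string value into a coarse format category."""
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", v):
        return "iso-date"
    if re.fullmatch(r"\d{2}/\d{2}/\d{4}", v):
        return "slash-date"
    if re.fullmatch(r"-?\d+(\.\d+)?", v):
        return "number"
    return "text"

def has_drift(values):
    """A column drifts when its values span more than one shape."""
    return len({value_shape(v) for v in values}) > 1

print(has_drift(["2024-01-05", "06/01/2024", "2024-01-07"]))  # True
print(has_drift(["2024-01-05", "2024-01-06"]))                # False
```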
How does the tool detect numeric outliers?
The profiler uses an Interquartile Range (IQR) style rule to identify and flag numeric values that fall significantly outside the normal distribution of a column.
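The classic IQR rule flags values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. A minimal version using Python's standard library (the tool's exact quantile method is an assumption):

```python
import statistics

def iqr_outliers(values):
    """Flag values outside the 1.5*IQR fences around the quartiles."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)  # quartiles
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < lo or v > hi]

scores = [48, 50, 51, 52, 53, 55, 500]
print(iqr_outliers(scores))  # [500]
```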