Dataset Imbalance Detector & Resampler

Detect class imbalance in CSV or JSON datasets, compare resampling strategies, and preview a balanced output dataset

Paste a CSV dataset or upload a CSV/JSON file, then specify the label column used for classification. The tool counts each class, measures the imbalance ratio, suggests whether oversampling or undersampling is safer, and generates a balanced dataset preview.

How to use it:

  • Dataset Input: paste CSV text when exploring quickly
  • Dataset File: upload CSV or JSON when working from a saved dataset
  • Label Column: choose the target class column to profile
  • Resampling Strategy: pick none, oversample, or undersample for the exported balanced dataset
  • Export Format: preview the balanced result as JSON or CSV
  • Preview Rows: limit how many balanced rows appear in the table

Notes:

  • Oversample duplicates minority rows to match the majority count
  • Undersample trims majority rows down to the minority count
  • The report also compares both strategies so you can decide before exporting
  • For production ML pipelines, the tool can help decide whether a more advanced method such as SMOTE is worth introducing later
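
The two strategies described in these notes can be sketched in a few lines of Python. This is only an illustration, not the tool's implementation: it assumes rows are dictionaries, that extra copies and kept rows are chosen at random with a fixed seed, and the function names `group_by_label`, `oversample`, and `undersample` are hypothetical.

```python
import random
from collections import defaultdict


def group_by_label(rows, label_column):
    """Group rows (dicts) by the value found in label_column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_column]].append(row)
    return groups


def oversample(rows, label_column, seed=0):
    """Duplicate rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_label(rows, label_column)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Assumption: extra copies are drawn at random, with replacement.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def undersample(rows, label_column, seed=0):
    """Randomly keep only as many rows per class as the smallest class has."""
    rng = random.Random(seed)
    groups = group_by_label(rows, label_column)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(rng.sample(group, target))
    return balanced
```

With four `normal` rows and one `fraud` row, `oversample` returns eight rows (four per class) and `undersample` returns two (one per class).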

Example Results

1 example

Inspect a fraud dataset with a 95:5 label split (the abbreviated sample input below uses a 4:1 split)

Measure the class skew, compare oversampling vs undersampling, and export a balanced training preview.

Input parameters:

{
  "datasetInput": "id,label,amount\n1,normal,20\n2,normal,21\n3,normal,19\n4,normal,22\n5,fraud,300",
  "labelColumn": "label",
  "strategy": "oversample",
  "exportFormat": "json",
  "previewRows": 10
}
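
As a sanity check on the sample input above, the class counts can be recomputed locally. The sketch below parses the same `datasetInput` string with Python's standard `csv` module. The ratio definition (largest class count divided by smallest) is an assumption, since the page does not state how the tool computes its imbalance ratio, and the helper names are hypothetical. Note that this abbreviated sample contains four `normal` rows and one `fraud` row.

```python
import csv
import io
from collections import Counter

# The datasetInput string from the example parameters above.
dataset_input = "id,label,amount\n1,normal,20\n2,normal,21\n3,normal,19\n4,normal,22\n5,fraud,300"


def class_counts(csv_text, label_column):
    """Count how many rows carry each value of the label column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[label_column] for row in reader)


def imbalance_ratio(counts):
    """Assumed definition: largest class count divided by smallest."""
    return max(counts.values()) / min(counts.values())


counts = class_counts(dataset_input, "label")
print(dict(counts))             # {'normal': 4, 'fraud': 1}
print(imbalance_ratio(counts))  # 4.0
```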

Maximum file size: 20MB
Supported formats: text/csv, application/json, text/plain, .csv, .json

Key Facts

Category: Data & Tables
Input Types: textarea, file, text, select, number
Output Type: html
Sample Coverage: 4
API Ready: Yes

Overview

The Dataset Imbalance Detector & Resampler is a specialized utility for machine learning practitioners and data analysts to identify and correct class skew in CSV or JSON datasets. By specifying a target label column, you can instantly measure imbalance ratios, compare the effects of oversampling versus undersampling, and generate a balanced dataset preview ready for export.

When to Use

  • When preparing training data for classification models, to reduce model bias toward the majority class.
  • When evaluating whether a dataset requires simple resampling techniques or more advanced methods like SMOTE.
  • When you need a quick, code-free way to duplicate minority rows or trim majority rows in a CSV or JSON file.

How It Works

  • Paste your raw CSV data or upload a saved CSV or JSON dataset file.
  • Enter the exact name of your target classification column in the Label Column field.
  • Select a resampling strategy (none, oversample, or undersample) and choose your preferred export format (JSON or CSV).
  • The tool calculates the class distribution, applies the chosen strategy, and outputs a balanced dataset preview.

Use Cases

Balancing fraud detection datasets where fraudulent transactions make up less than 5% of the total data.
Equalizing medical diagnosis records so a predictive model doesn't heavily favor negative test results.
Balancing customer churn data so that machine learning algorithms see retained and churned users in equal numbers.

Examples

1. Balancing a highly skewed fraud dataset

Data Scientist
Background
A financial dataset contains 10,000 normal transactions but only 500 fraudulent ones, causing the initial model to predict 'normal' every time.
Problem
The minority class (fraud) needs to be amplified to match the majority class without writing custom Python scripts.
How to Use
Upload the transaction CSV, set the Label Column to 'is_fraud', and select the 'oversample' strategy.
Example Config
Label Column: is_fraud, Strategy: oversample, Export Format: csv
Outcome
The tool duplicates the 500 fraud rows until the fraud class reaches 10,000 rows, matching the normal class, and outputs a balanced 20,000-row dataset in CSV format.

2. Downsizing majority class for faster model training

Machine Learning Engineer
Background
A large user database has 500,000 active users and 50,000 churned users. Training on the full dataset is slow, and the resulting model is biased toward active users.
Problem
Reduce the majority class to the size of the minority class to speed up training and equalize the class counts.
How to Use
Upload the JSON dataset, set the Label Column to 'status', and choose the 'undersample' strategy.
Example Config
Label Column: status, Strategy: undersample, Export Format: json
Outcome
The tool randomly trims the active users down to 50,000, resulting in a balanced, lightweight dataset of 100,000 total rows formatted as JSON.

FAQ

What is the difference between oversampling and undersampling?

Oversampling duplicates rows from the minority class to match the majority count, while undersampling randomly removes rows from the majority class to match the minority count.

What file formats are supported for the dataset?

You can paste raw CSV text directly into the input field, or upload dataset files in CSV or JSON format.

How do I know which resampling strategy to choose?

Undersampling is generally safer for very large datasets where dropping data won't cause severe information loss, while oversampling is better for small datasets where every data point is critical.
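
To make the trade-off concrete, the sketch below computes how many rows each strategy would add or discard for a given set of class counts. It uses the counts from the two worked examples in the Examples section. The function name `compare_strategies` and the shape of its output are hypothetical and do not reflect the tool's own report format.

```python
def compare_strategies(counts):
    """Summarise the row totals each strategy would produce."""
    largest = max(counts.values())
    smallest = min(counts.values())
    total = sum(counts.values())
    n_classes = len(counts)
    return {
        "oversample": {
            # Every class is raised to the size of the largest class.
            "total_rows": largest * n_classes,
            "rows_added": largest * n_classes - total,
        },
        "undersample": {
            # Every class is cut to the size of the smallest class.
            "total_rows": smallest * n_classes,
            "rows_discarded": total - smallest * n_classes,
        },
    }


fraud = compare_strategies({"normal": 10_000, "fraud": 500})
churn = compare_strategies({"active": 500_000, "churned": 50_000})
print(fraud["oversample"])   # {'total_rows': 20000, 'rows_added': 9500}
print(churn["undersample"])  # {'total_rows': 100000, 'rows_discarded': 450000}
```

In the fraud example, oversampling adds 9,500 duplicated rows, so each original fraud row appears about 20 times on average. In the churn example, undersampling discards 450,000 of the 500,000 active-user rows.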

Can I export the fully balanced dataset?

Yes, the tool generates a balanced dataset based on your chosen strategy, which you can preview and export in either JSON or CSV format.

Does this tool apply SMOTE or synthetic data generation?

No. This tool uses exact row duplication for oversampling and random trimming for undersampling. It gives you a baseline before you decide whether more complex synthetic methods are necessary.

API Documentation

Request Endpoint

POST /en/api/tools/dataset-imbalance-detector-resampler

Request Parameters

Parameter Name   Type                     Required   Description
datasetInput     textarea                 No         -
datasetFile      file (Upload required)   No         -
labelColumn      text                     Yes        -
strategy         select                   No         -
exportFormat     select                   No         -
previewRows      number                   No         -

File parameters must be uploaded first via POST /upload/dataset-imbalance-detector-resampler, which returns a filePath. Pass that filePath in the corresponding file field of the request.
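
A request using pasted CSV text (no file upload) might be assembled as in the sketch below. Several details are assumptions not confirmed by this page: that the API is served from the same host as the MCP endpoint (`https://elysiatools.com`), that the endpoint accepts a JSON body, and that no authentication header is needed. The helper name `build_request` is hypothetical. The function only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: the API shares its host with the MCP endpoint listed on this page.
BASE_URL = "https://elysiatools.com"
ENDPOINT = "/en/api/tools/dataset-imbalance-detector-resampler"


def build_request(dataset_input, label_column, strategy="oversample",
                  export_format="json", preview_rows=10):
    """Build (but do not send) a POST request for the tool endpoint."""
    payload = {
        "datasetInput": dataset_input,
        "labelColumn": label_column,  # the only required parameter
        "strategy": strategy,
        "exportFormat": export_format,
        "previewRows": preview_rows,
    }
    return urllib.request.Request(
        BASE_URL + ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the endpoint accepts a JSON-encoded body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("id,label\n1,normal\n2,fraud", "label")
# To send it: urllib.request.urlopen(request), then read and decode the body.
```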

Response Format

{
  "result": "Processed HTML content",
  "error": "Error message (optional)",
  "message": "Notification message (optional)",
  "metadata": { "key": "value" }
}

Result type: HTML
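
A client could unpack this response as sketched below. The sketch assumes the body arrives as a JSON string with the fields shown above, and it treats a missing or empty `error` field as success, because the field is marked optional. The helper name `extract_result` is hypothetical.

```python
import json


def extract_result(response_text):
    """Return the HTML result, or raise if the response reports an error."""
    body = json.loads(response_text)
    # "error" is optional, so a missing or empty value is treated as success.
    if body.get("error"):
        raise RuntimeError(body["error"])
    return body.get("result", "")


sample = '{"result": "<table></table>", "metadata": {"key": "value"}}'
print(extract_result(sample))  # <table></table>
```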

AI MCP Documentation

Add this tool to your MCP server configuration:

{
  "mcpServers": {
    "elysiatools-dataset-imbalance-detector-resampler": {
      "name": "dataset-imbalance-detector-resampler",
      "description": "Detect class imbalance in CSV or JSON datasets, compare resampling strategies, and preview a balanced output dataset",
      "baseUrl": "https://elysiatools.com/mcp/sse?toolId=dataset-imbalance-detector-resampler",
      "command": "",
      "args": [],
      "env": {},
      "isActive": true,
      "type": "sse"
    }
  }
}

You can chain up to 20 tools by listing their IDs, comma-separated, in the `toolId` parameter, e.g.: `https://elysiatools.com/mcp/sse?toolId=png-to-webp,jpg-to-webp,gif-to-webp`.
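
If you generate the `baseUrl` programmatically, a small helper can enforce the 20-tool limit before the configuration is written. This is a sketch: the helper name `build_mcp_url` is hypothetical, and it assumes tool IDs need no URL escaping, as in the example URL above.

```python
MCP_SSE_URL = "https://elysiatools.com/mcp/sse"
MAX_TOOLS = 20  # limit stated in the documentation above


def build_mcp_url(tool_ids):
    """Join tool IDs into the toolId query parameter, enforcing the limit."""
    if not tool_ids:
        raise ValueError("at least one tool ID is required")
    if len(tool_ids) > MAX_TOOLS:
        raise ValueError(f"at most {MAX_TOOLS} tools can be chained")
    return f"{MCP_SSE_URL}?toolId={','.join(tool_ids)}"


print(build_mcp_url(["dataset-imbalance-detector-resampler", "png-to-webp"]))
# https://elysiatools.com/mcp/sse?toolId=dataset-imbalance-detector-resampler,png-to-webp
```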

Supports URL file links or Base64 encoding for file parameters.

If you encounter any issues, please contact us at [email protected]