Key Facts
- Category
- Data Processing
- Input Types
- textarea, select, text, checkbox, number
- Output Type
- text
- Sample Coverage
- 4
- API Ready
- Yes
Overview
The Z-Score Standardizer transforms your numerical datasets into a standardized format with a mean of 0 and a standard deviation of 1. This essential data preprocessing tool helps you compare variables across different scales, prepare features for machine learning models, and identify statistical outliers with precision.
When to Use
- •Preparing numerical features for machine learning algorithms that are sensitive to data scale.
- •Comparing datasets measured in different units or ranges to identify relative performance.
- •Detecting and flagging statistical outliers that deviate significantly from the mean or median.
How It Works
- •Paste your CSV data into the input field and select the specific columns you wish to standardize.
- •Choose between standard Z-Score or Robust Z-Score (using median and MAD) based on your data's sensitivity to outliers.
- •Configure optional settings like missing value handling, decimal precision, and outlier detection thresholds.
- •Generate the standardized dataset along with a comprehensive statistical summary of your input.
Use Cases
Examples
1. Preparing ML Features
Data Scientist- Background
- A dataset contains 'Age' and 'Salary' columns with vastly different ranges, which is causing bias in a K-Nearest Neighbors model.
- Problem
- The model is dominated by the 'Salary' feature due to its larger numerical scale.
- How to Use
- Upload the CSV, select 'Age' and 'Salary' as target columns, and apply standard Z-Score normalization.
- Example Config
-
standardizationType: zscore, targetColumns: age, salary - Outcome
- Both features are transformed to a mean of 0 and std of 1, allowing the model to weigh both variables equally.
2. Outlier Detection in Sensor Data
Quality Assurance Engineer- Background
- A production line sensor records temperature readings every second, but occasionally reports erroneous spikes.
- Problem
- Manually identifying temperature spikes across thousands of rows is inefficient.
- How to Use
- Input the sensor logs, enable 'Detect Outliers', and set the threshold to 3 standard deviations.
- Example Config
-
detectOutliers: true, outlierThreshold: 3 - Outcome
- The tool outputs the standardized data and flags all readings exceeding 3 standard deviations as potential outliers for review.
Try with Samples
csv, video, barcodeRelated Hubs
FAQ
What is the difference between Z-Score and Robust Z-Score?
Standard Z-Score uses mean and standard deviation, which are sensitive to outliers. Robust Z-Score uses median and Median Absolute Deviation (MAD), making it more reliable when your data contains extreme values.
Can I process non-numeric columns?
Yes, the tool is designed to preserve non-numeric columns, ensuring your original data structure remains intact while only transforming the selected numerical fields.
How does the tool handle missing values?
You can choose to skip rows containing missing values or fill them using the mean, median, mode, or zero to maintain dataset consistency.
What is the outlier threshold?
The threshold defines how many standard deviations a value must be from the mean to be flagged as an outlier. The default is 2, but you can adjust this based on your specific requirements.
Can I scale the output to a specific range?
Yes, you can provide a custom output range to map your standardized data to a specific interval, such as 0 to 1, if required for your analysis.