Key Facts
- Category: Data Analysis
- Input Types: textarea, select, checkbox
- Output Type: text
- Sample Coverage: 4
- API Ready: Yes
Overview
The Data Distribution Analyzer provides a comprehensive statistical toolkit to evaluate your datasets, offering automated normality testing, outlier detection, and goodness-of-fit assessments to help you understand the underlying structure of your data.
When to Use
- When you need to verify if your dataset follows a normal distribution before performing parametric statistical tests.
- When identifying anomalies or extreme values that may skew your analysis results.
- When exploring the frequency and spread of data points to determine the best model for further predictive analysis.
How It Works
- Input your numerical data as a comma-separated list or column, selecting whether to analyze a single set or flatten multiple columns.
- Choose your desired significance level (0.01, 0.05, or 0.10) to calibrate the sensitivity of your statistical tests.
- Enable specific modules such as normality tests (Shapiro-Wilk, Anderson-Darling), outlier detection (IQR, Z-score), and histogram generation.
- Review the generated report to identify distribution patterns, statistical significance, and flagged data points.
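The first step above, turning pasted text into one flat list of numbers, can be sketched as follows. This is a minimal illustration, not the tool's actual parser: the function name `parse_values`, the accepted delimiters, and the choice to set aside non-numeric tokens (such as column headers) are all assumptions.

```python
import math
import re


def parse_values(text):
    """Split pasted text into numbers (hypothetical helper, not the tool's API).

    Commas, semicolons, and any whitespace (including newlines and tabs) are
    treated as delimiters, so a comma-separated list, a single column, and
    several columns all end up flattened into one list, row by row.
    Returns (values, rejected), where rejected holds tokens that are not
    finite numbers.
    """
    values, rejected = [], []
    for token in re.split(r"[,;\s]+", text.strip()):
        if not token:
            continue
        try:
            number = float(token)
        except ValueError:
            rejected.append(token)
            continue
        if math.isfinite(number):
            values.append(number)
        else:
            rejected.append(token)  # "nan" and "inf" parse as floats but are unusable
    return values, rejected


single_set = "4.1, 3.9, 4.4, 4.0"
two_columns = "temp\thumidity\n21.5\t40\n22.1\t42\n"
print(parse_values(single_set))
print(parse_values(two_columns))
```

Returning the rejected tokens rather than dropping them silently makes it easier to spot a stray header or a typo before running any tests.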
Use Cases
Examples
1. Validating Experimental Data
Data Scientist
- Background: A researcher collected 50 samples from a chemical reaction and needs to confirm if the yield follows a normal distribution.
- Problem: Unsure if the data is normally distributed, which is a prerequisite for the planned ANOVA test.
- How to Use: Paste the 50 yield values into the Data Input field and enable 'Test Normality'.
- Example Config: significanceLevel: 0.05, testNormality: true
- Outcome: The tool returns p-values for the Shapiro-Wilk and Anderson-Darling tests. If both exceed 0.05, the data show no significant departure from normality and the researcher can proceed with ANOVA.
2. Identifying Sensor Anomalies
IoT Engineer
- Background: An IoT sensor is reporting temperature readings that occasionally spike, potentially indicating hardware malfunction.
- Problem: Need to distinguish between natural environmental variance and actual sensor errors.
- How to Use: Input the daily temperature logs and enable 'Detect Outliers' to flag values outside the expected statistical range.
- Example Config: detectOutliers: true
- Outcome: The tool flags the readings that exceed the Z-score threshold, highlighting potential sensor faults for maintenance.
FAQ
What normality tests are supported?
The tool performs Anderson-Darling, Shapiro-Wilk, and Jarque-Bera tests to assess if your data follows a normal distribution.
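Of the three tests, Jarque-Bera is simple enough to sketch with the standard library alone. The example below uses the textbook statistic, JB = n/6 · (S² + (K − 3)²/4), with sample skewness S and kurtosis K, and the asymptotic chi-square approximation with 2 degrees of freedom, whose upper tail is exp(−JB/2). The function name and return shape are hypothetical, and nothing here is claimed to match the tool's internal implementation.

```python
import math


def jarque_bera(values, alpha=0.05):
    """Return (statistic, p_value, reject_normality) for a list of numbers.

    Illustrative sketch: uses biased central moments and the asymptotic
    chi-square(2) p-value, which is only a rough guide for small samples.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 values")
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    statistic = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-statistic / 2)  # survival function of chi-square, 2 df
    return statistic, p_value, p_value < alpha


skewed = [1.0] * 20 + [100.0]
stat, p, reject = jarque_bera(skewed, alpha=0.05)
print(f"JB={stat:.2f}, p={p:.3g}, reject normality: {reject}")
```

A p-value below the chosen significance level means normality is rejected; a p-value above it only means the test found no evidence against normality.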
How does the tool detect outliers?
It combines Interquartile Range (IQR) and Z-score analysis. The IQR method flags values that fall far outside the middle 50% of the data, while the Z-score method flags values that lie many standard deviations from the mean.
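Both methods are short enough to sketch in a few lines. The cutoffs used here (1.5 × IQR and a Z-score of 3.0) are common conventions and are assumptions; the tool's actual thresholds and quartile method are not documented on this page, and the function names are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values below Q1 - k*IQR or above Q3 + k*IQR (k=1.5 is an assumed default)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # constant data has no outliers
    return [x for x in values if abs(x - mean) / sd > threshold]


readings = [21.0, 21.4, 20.9, 21.2, 21.1, 21.3, 20.8, 21.0, 21.2, 35.0]
print("IQR:", iqr_outliers(readings))
print("Z-score (3.0):", zscore_outliers(readings))
print("Z-score (2.5):", zscore_outliers(readings, threshold=2.5))
```

On this small sample the IQR rule flags the 35.0 spike, while the Z-score rule at 3.0 does not: a single extreme value inflates the standard deviation, and with only 10 points no Z-score can exceed about 2.85. This is one reason to compare both methods rather than rely on either alone.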
What does the significance level setting do?
It sets the p-value threshold below which a test result is treated as statistically significant. A 0.05 level corresponds to a 95% confidence level, which is the conventional choice in most scientific research; 0.01 is stricter and 0.10 is more lenient.
Can I analyze multiple columns of data at once?
Yes, by selecting the 'Multiple columns' format, the tool will flatten all provided values into a single dataset for unified analysis.
Does this tool provide visual charts?
While it does not generate image files, it provides comprehensive frequency distribution data and percentile information that can be used to construct histograms.
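As a rough illustration of how frequency distribution data can be turned into a histogram, the sketch below bins values into equal-width intervals and renders the counts as text bars. The bin count, the equal-width binning rule, and both function names are assumptions; the tool's own report format may differ.

```python
def frequency_table(values, bins=5):
    """Return a list of (bin_start, bin_end, count) using equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal
    counts = [0] * bins
    for x in values:
        # The maximum value would land one past the last bin, so clamp it.
        index = min(int((x - lo) / width), bins - 1)
        counts[index] += 1
    return [(lo + i * width, lo + (i + 1) * width, c) for i, c in enumerate(counts)]


def text_histogram(table, max_width=40):
    """Render a frequency table as one line of '#' characters per bin."""
    peak = max(count for _, _, count in table)
    lines = []
    for start, end, count in table:
        bar = "#" * round(count / peak * max_width) if peak else ""
        lines.append(f"{start:8.2f} - {end:8.2f} | {bar} {count}")
    return "\n".join(lines)


values = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(text_histogram(frequency_table(values, bins=4)))
```

The same `(start, end, count)` rows can be passed to any charting library if a graphical histogram is needed.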