Show HN: go-stats-calculator, CLI for computing stats:mean,median,variance,etc.

A robust command-line tool written in Go that computes a comprehensive set of descriptive statistics from a list of numbers. It reads data from a file or from standard input, so it fits into data-analysis pipelines in any shell environment.

This program takes a simple, newline-delimited list of numbers (integers or floats) and calculates key statistical properties, including measures of central tendency (mean, median, mode), measures of spread (standard deviation, variance, IQR), and the shape of the distribution (skewness). It also identifies outliers based on the interquartile range method.

This program was vibe-coded by Gemini 2.5 Pro and Opus 4.5. As such, the author can't be held responsible for incorrect calculations. Please verify the results for any critical applications. That said, it has been validated through unit tests and independent verification. See Testing and Correctness for details.

The calculator computes the following statistics:

Count: Total number of valid data points.
Min / Max: The minimum and maximum values in the dataset.
Mean: The arithmetic average.
Median (p50): The 50th percentile, or the middle value of the dataset.
Mode: The value(s) that appear most frequently.
Standard Deviation: A measure of the amount of variation or dispersion.
Variance: The square of the standard deviation.
Quartiles (Q1, Q3): The 25th (p25) and 75th (p75) percentiles.
Percentiles (p95, p99): The 95th and 99th percentiles, useful for understanding tail distributions.
Custom Percentiles: Compute any percentile(s) between 0 and 100 using the -p flag.
Interquartile Range (IQR): The range between the first and third quartiles (Q3 - Q1).
Skewness: A formal measure of the asymmetry of the data distribution.
Outliers: Data points identified as abnormally distant from other values.
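Of the statistics listed, the mode has two special cases: several values can tie for most frequent, and a dataset with no repeated value has no mode. The following is a minimal sketch of one way to handle both, assuming exact floating-point equality is acceptable for parsed input. The function name `modes` is hypothetical and not taken from the project's source.

```go
package main

import (
	"fmt"
	"sort"
)

// modes returns every value that occurs most often, in ascending order.
// It returns nil when no value repeats (reported as "None" by the tool).
func modes(data []float64) []float64 {
	counts := make(map[float64]int)
	best := 0
	for _, v := range data {
		counts[v]++
		if counts[v] > best {
			best = counts[v]
		}
	}
	if best < 2 {
		return nil // nothing repeats, so there is no mode
	}
	var out []float64
	for v, c := range counts {
		if c == best {
			out = append(out, v)
		}
	}
	sort.Float64s(out) // map iteration order is random; sort for stable output
	return out
}

func main() {
	fmt.Println(modes([]float64{1, 2, 2, 3, 3})) // two values tie
	fmt.Println(modes([]float64{1, 2, 3}))       // no repeats
}
```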

All numeric output uses full decimal notation (no scientific notation) with trailing zeros trimmed for readability.
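As a rough illustration of this formatting rule, the sketch below renders a float in plain decimal notation and trims trailing zeros. The 4-decimal rounding is an assumption inferred from values in the sample output such as 7.4605 and 38.5202, and `formatNumber` is a hypothetical name. The tool's real formatting code may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatNumber prints v without an exponent, rounded to 4 decimal places
// (assumed precision), then removes trailing zeros and a dangling ".".
func formatNumber(v float64) string {
	s := strconv.FormatFloat(v, 'f', 4, 64) // 'f' never uses scientific notation
	s = strings.TrimRight(s, "0")           // stops at the decimal point
	return strings.TrimSuffix(s, ".")
}

func main() {
	for _, v := range []float64{20.73, 15, 7.4605, 1e21} {
		fmt.Println(formatNumber(v))
	}
}
```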

brew tap jftuga/homebrew-tap; brew update; brew install jftuga/tap/stats

Clone the repository and build:

git clone https://github.com/jftuga/go-stats-calculator.git
cd go-stats-calculator
go build -ldflags="-s -w" -o stats stats.go

The program can be run in two ways: by providing a filename as a command-line argument or by piping data into it. The program automatically detects piped input, so the - argument is optional.

Provide the path to a file containing numbers, one per line.

Syntax:

./stats <filename>

Example:

./stats sample_data.txt

Pipe data from other commands directly into the program. The program automatically detects piped input, so no special argument is needed. You can optionally use the - argument for explicit stdin reading.

Syntax:

<command> | ./stats

Examples:

# Pipe the contents of a file
cat data.txt | ./stats

# Pipe output from another command (e.g., extracting a column from a CSV)
awk -F',' '{print $3}' metrics.csv | ./stats

# Explicit stdin reading (also works)
cat data.txt | ./stats -

# Manually enter numbers (press Ctrl+D when finished)
./stats -
10
20
30
^D

Use the -p flag to compute additional percentiles. Provide a comma-separated list of values between 0 and 100.

Syntax:

./stats -p <percentiles> <filename>

Examples:

# Compute 10th and 90th percentiles
./stats -p "10,90" data.txt

# Compute multiple percentiles including decimals
./stats -p "5,10,90,99.9" data.txt

# Combined with stdin
cat data.txt | ./stats -p "10,50,90"
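The value passed to `-p` has to be split on commas and range-checked before any percentile is computed. A minimal sketch of that validation follows. `parsePercentiles` is a hypothetical helper, and the error wording is illustrative rather than the tool's actual output.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parsePercentiles splits a comma-separated string such as "5,10,90,99.9"
// and rejects anything that is not a number in the range 0 to 100.
func parsePercentiles(arg string) ([]float64, error) {
	var out []float64
	for _, field := range strings.Split(arg, ",") {
		field = strings.TrimSpace(field)
		p, err := strconv.ParseFloat(field, 64)
		if err != nil {
			return nil, fmt.Errorf("invalid percentile %q: %w", field, err)
		}
		// Written as a negated range check so that NaN is rejected too.
		if !(p >= 0 && p <= 100) {
			return nil, fmt.Errorf("percentile %q is outside 0-100", field)
		}
		out = append(out, p)
	}
	return out, nil
}

func main() {
	ps, err := parsePercentiles("5,10,90,99.9")
	fmt.Println(ps, err)

	_, err = parsePercentiles("10,150")
	fmt.Println(err)
}
```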

Given a file named sample_data.txt with the following content:

sample_data.txt

Running the command ./stats sample_data.txt will produce the following output:

--- Descriptive Statistics ---
Count:          15
Sum:            310.95
Min:            13.99
Max:            38.95

--- Measures of Central Tendency ---
Mean:           20.73
Median (p50):   18.92
Mode:           15.05

--- Measures of Spread & Distribution ---
Std Deviation:  7.4605
Variance:       55.6597
Quartile 1 (p25): 15.735
Quartile 3 (p75): 21.765
Percentile (p95): 36.801
Percentile (p99): 38.5202
IQR:            6.03
Skewness:       1.6862 (Highly Right Skewed)
Outliers:       [35.88 38.95]
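The p95 and p99 values above are consistent with linear interpolation between the two closest ranks, using rank = p/100 * (n - 1). With 15 values, p95 sits at rank 13.3, and 35.88 + 0.3 * (38.95 - 35.88) = 36.801. Likewise p99 sits at rank 13.86, giving 35.88 + 0.86 * 3.07 = 38.5202. The sketch below implements that scheme. The function name and the small dataset are hypothetical, and the tool's own code may be organized differently.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (p in 0..100) of data, linearly
// interpolating between the two closest ranks: rank = p/100 * (n-1).
func percentile(data []float64, p float64) float64 {
	if len(data) == 0 {
		return math.NaN()
	}
	sorted := append([]float64(nil), data...) // copy; leave the caller's slice alone
	sort.Float64s(sorted)

	rank := p / 100 * float64(len(sorted)-1)
	lo := int(math.Floor(rank))
	if lo < 0 {
		return sorted[0]
	}
	if lo >= len(sorted)-1 {
		return sorted[len(sorted)-1]
	}
	frac := rank - float64(lo)
	return sorted[lo] + frac*(sorted[lo+1]-sorted[lo])
}

func main() {
	data := []float64{40, 10, 30, 20} // hypothetical values, not sample_data.txt
	for _, p := range []float64{0, 25, 50, 75, 100} {
		fmt.Printf("p%v = %v\n", p, percentile(data, p))
	}
}
```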

Statistic	Description
Count	The total number of valid numeric entries processed.
Min	The smallest number in the dataset.
Max	The largest number in the dataset.
Mean	The "average" value. Highly sensitive to outliers.
Median (p50)	The middle value of the sorted dataset. Represents the "typical" value and is robust against outliers.
Mode	The number(s) that occur most frequently. If no number repeats, the mode is "None".
Std Deviation	Measures how spread out the numbers are from the mean. A low value indicates data is clustered tightly; a high value indicates data is spread out.
Variance	The square of the standard deviation.
Quartile 1 (p25)	The value below which 25% of the data falls.
Quartile 3 (p75)	The value below which 75% of the data falls.
Percentile (p95)	The value below which 95% of the data falls. Useful for understanding the upper tail of the distribution.
Percentile (p99)	The value below which 99% of the data falls. Useful for identifying extreme values and tail behavior.
Percentile (pN)	Custom percentiles requested via the `-p` flag. The value below which N% of the data falls.
IQR	The Interquartile Range (`Q3 - Q1`). It represents the middle 50% of the data and is a robust measure of spread.
Skewness	A measure of asymmetry. A value near 0 is symmetrical. A positive value indicates a "right skew" (a long tail of high values). A negative value indicates a "left skew".
Outliers	Values that fall below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`. These are statistically unusual data points.
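The outlier rule in the last row can be checked against the sample output: with Q1 = 15.735 and Q3 = 21.765, the IQR is 6.03, so the cutoffs are 15.735 - 9.045 = 6.69 and 21.765 + 9.045 = 30.81. The sketch below applies the rule in Go. The function names are hypothetical, and treating values exactly on a cutoff as non-outliers is an assumption.

```go
package main

import "fmt"

// iqrFences returns the lower and upper cutoffs of the 1.5*IQR rule.
func iqrFences(q1, q3 float64) (lower, upper float64) {
	iqr := q3 - q1
	return q1 - 1.5*iqr, q3 + 1.5*iqr
}

// outliers returns the values lying strictly outside the fences.
func outliers(data []float64, q1, q3 float64) []float64 {
	lower, upper := iqrFences(q1, q3)
	var out []float64
	for _, v := range data {
		if v < lower || v > upper {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	// Quartiles come from the sample output. The four values are its
	// min, median, and the two reported outliers, not the full dataset.
	lower, upper := iqrFences(15.735, 21.765)
	fmt.Printf("fences: %.3f .. %.3f\n", lower, upper)
	fmt.Println(outliers([]float64{13.99, 18.92, 35.88, 38.95}, 15.735, 21.765))
}
```

Only 35.88 and 38.95 exceed the upper cutoff, which agrees with the Outliers line in the sample output.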

The program includes two layers of verification:

Standard Go unit tests cover the core statistical functions:

computeStats - verifies all computed statistics against a 31-number dataset
calculatePercentile - tests percentile interpolation at various points (p0, p25, p50, p75, p100)
calculateSkewness - validates skewness calculations for symmetric and skewed distributions
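To illustrate what a symmetric-versus-skewed check exercises, the sketch below computes the population (moment-based) skewness, m3 / m2^1.5. Which skewness variant the tool uses is not stated here, so this formula is an assumption. The real `calculateSkewness` may apply a sample-size correction and return slightly different values.

```go
package main

import (
	"fmt"
	"math"
)

// skewness computes the population skewness m3 / m2^1.5, where m2 and m3
// are the second and third central moments. Assumed variant; see lead-in.
func skewness(data []float64) float64 {
	n := float64(len(data))
	if n == 0 {
		return math.NaN()
	}
	var mean float64
	for _, v := range data {
		mean += v
	}
	mean /= n

	var m2, m3 float64
	for _, v := range data {
		d := v - mean
		m2 += d * d
		m3 += d * d * d
	}
	m2 /= n
	m3 /= n
	if m2 == 0 {
		return 0 // all values identical: no asymmetry to measure
	}
	return m3 / math.Pow(m2, 1.5)
}

func main() {
	fmt.Println(skewness([]float64{1, 2, 3, 4, 5}))  // symmetric data
	fmt.Println(skewness([]float64{1, 1, 1, 2, 10})) // long right tail
	fmt.Println(skewness([]float64{1, 9, 10, 10}))   // long left tail
}
```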

Run the tests with:

A shell script independently calculates statistics using bc (arbitrary precision calculator) and compares the results against the program's output. This provides external validation that the Go implementation produces correct results.

The script was developed and tested on macOS Sequoia 15.7.3 using:

bc - arbitrary precision calculator for sum, mean, variance, standard deviation, and percentile calculations
sort - for ordering the dataset to verify percentile indices

Run the verification with:

The script exits with code 0 if all values match, or code 1 if any discrepancies are found.

The test dataset consists of 31 numbers designed to exercise common scenarios:

A mix of integers, decimals, and numbers with trailing zeros (e.g., 25.00, 35.0)
Repeated values to produce a defined mode
An outlier value to verify outlier detection

The tests focus on typical usage patterns and do not cover exotic edge cases, extreme values, or adversarial inputs. Users requiring high-assurance results for critical applications should perform additional validation appropriate to their use case.

This program is my own original idea, conceived and developed entirely:

On my own personal time, outside of work hours
For my own personal benefit and use
On my personally owned equipment
Without using any employer resources, proprietary information, or trade secrets
Without any connection to my employer's business, products, or services
Independent of any duties or responsibilities of my employment

This project does not relate to my employer's actual or demonstrably anticipated research, development, or business activities. No confidential or proprietary information from any employer was used in its creation.


jftuga
