Finding and removing outliers in Excel can significantly improve the quality and accuracy of your data analysis. A cleaner dataset means that your conclusions, charts, and statistical models rest on reliable, representative information. By identifying and addressing anomalies, you can avoid skewed results and make better-informed decisions. Build outlier detection into your Excel workflows to maintain data integrity in your projects and analyses.
In this tutorial, I’ll demonstrate how to locate outliers in Excel as well as a few handling strategies I’ve employed in the past.
This Tutorial Covers:
- What are Outliers and Why is it Important to Find Them
- Find Outliers by Sorting the Data
- Finding Outliers Using the QUARTILE Functions
- Finding the Outliers Using the LARGE/SMALL functions
- How to Handle Outliers the Right Way
- Delete the Outliers
- Normalize the Outliers (Adjust the Value)
1. What are Outliers and Why is it Important to Find Them?
An outlier is a data point that differs greatly from the other data points in the dataset, whether it is far higher or far lower. An outlier in your data can skew your results and lead you to draw false conclusions.
I’ll use a straightforward example.
Let’s imagine that 25 individuals are on a bus from point X to point Y. Everyone belongs to the same income and weight groups. Let’s assume for the purposes of this tutorial that the average weight is 170 pounds and the average annual income is $30,000 per person.
Now suppose the bus stops midway through the ride and Elon Musk boards.
What impact do you believe this would have on the bus passengers’ average weight and income?
While the average weight of those riding the bus is unlikely to change significantly, their average income will.
This is because Elon Musk’s income is an anomaly in our group, and it distorts how we read the data. The average annual income per passenger would come out at a few billion dollars, far above what a typical passenger on the bus actually earns.
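To make the arithmetic concrete, here is a minimal Python sketch of the bus example. The newcomer's income of $100 billion is a made-up placeholder, chosen only so that the average lands in the "few billion" range described above; it is not a real figure.

```python
from statistics import mean, median

# 25 passengers, each earning $30,000 a year (from the example above).
incomes = [30_000] * 25

# Hypothetical income for the newcomer: an illustrative placeholder only.
incomes_with_outlier = incomes + [100_000_000_000]

average_before = mean(incomes)
average_after = mean(incomes_with_outlier)
median_after = median(incomes_with_outlier)

print(f"Average before: {average_before:,.0f}")
print(f"Average after:  {average_after:,.0f}")
print(f"Median after:   {median_after:,.0f}")
```

A single extreme value pulls the average into the billions, while the median (the middle value) stays at $30,000.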
When you work with real datasets in Excel, you may encounter outliers in either direction (i.e., unusually high or unusually low values).
You need to locate these outliers, decide how to handle them, and then make sure your analysis is accurate.
Let’s now look at a few techniques for finding outliers in Excel.
2. Find Outliers by Sorting the Data
In small datasets, sorting the data and manually scanning the values at the top of the sorted list can quickly reveal outliers.
Because there may be outliers in both directions, sort the data in ascending order and then in descending order, scanning the top values each time.
I’ll give you an illustration.
I’ve got a dataset with sales reports below.
The steps to sort this data so that we can spot any outliers are listed below:
Step 1: Select the header of the column you want to sort (cell B1 in this example).
Step 2: Click the “Home” tab. In the “Editing” group, click the “Sort & Filter” icon, then click “Custom Sort…”.
Step 3: Choose “Sales” from the Sort by drop-down and “Largest to Smallest” from the Order drop-down in the “Sort” dialog box. Hit “OK.”
Step 4: The Sales column is now sorted with the highest numbers at the top. You can scan the data manually to check for outliers.
I can see from our example that the first value is significantly higher than the other values (and the bottom one is way lower).
Note: This strategy only works with small datasets that can be scanned manually. It is not scientific, but it is effective.
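If you ever need to do the same check outside Excel, the sort-and-scan idea can be sketched in a few lines of Python. The names and sales figures below are made up for illustration and are not the dataset from this tutorial.

```python
# Hypothetical (name, sales) records, for illustration only.
records = [
    ("Ann", 155), ("Bob", 450), ("Cy", 148),
    ("Dee", 100), ("Eli", 162), ("Fay", 170),
]

# Equivalent of sorting the Sales column "Largest to Smallest".
largest_first = sorted(records, key=lambda r: r[1], reverse=True)
# Equivalent of sorting the Sales column "Smallest to Largest".
smallest_first = sorted(records, key=lambda r: r[1])

print("Top of descending sort:", largest_first[:3])
print("Top of ascending sort: ", smallest_first[:3])
```

As in Excel, you still have to judge by eye whether the values at the top of each sort look out of place.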
3. Finding Outliers Using the QUARTILE Functions
Let’s now discuss a more scientific method for determining whether there are any outliers in Excel.
A quartile in statistics refers to one quarter of the dataset once it is sorted. If there are 12 data points, for instance, the first quartile would consist of the lowest three, the second quartile of the next three, and so on.
The dataset in which I’m looking for outliers is shown below. To do this, I first determine the first and third quartiles, then use those results to determine the lower and upper bounds.
How to find outliers in Excel using the QUARTILE function:
Step 1: For cell F1, use the following formula to determine the first quartile:
=QUARTILE.INC($B$2:$B$13,1)
Step 2: For cell F2, use the following calculation to determine the third quartile:
=QUARTILE.INC($B$2:$B$13,3)
Step 3: The interquartile range (the spread of the middle 50% of our data, between the first and third quartiles) can now be calculated in cell F3 from the two results above:
=F2-F1
We will now use the interquartile range to determine the lower and upper limits that would encompass the majority of our data.
Anything that falls outside these lower and upper limits is regarded as an outlier.
Step 4: To determine the lower limit in cell F4, use the following formula:
=F1-1.5*F3
Step 5: For cell F5, use the following formula to determine the upper limit:
=F2+1.5*F3
With the upper and lower limits in our data set now, we can immediately spot numbers that do not fall inside this range by going back to the initial data.
Checking every item and returning a TRUE or FALSE in a new column would be a quick method to accomplish this.
Step 6: To return TRUE for any value that is an outlier, enter the OR formula shown below in a new column and copy it down.
=OR(B2<$F$4,B2>$F$5)
You may now filter the Outlier column to only display the entries with a TRUE value.
As an alternative, conditional formatting can be used to highlight all of the cells with a TRUE value.
Note: Although this is a commonly used technique for identifying statistical outliers, it can be impractical in real-world situations. In the example above, the formula’s lower limit is -103, yet the values in our dataset can only be positive. So while this method helps locate outliers in one direction (high numbers), it is ineffective in the opposite direction for data like this.
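The quartile method above can also be sketched in Python as a cross-check. The 12 sales values are hypothetical (they are not the tutorial's dataset), and I am assuming that `statistics.quantiles` with `method="inclusive"` follows the same interpolation as QUARTILE.INC. The lower limit is Q1 minus 1.5 times the interquartile range, and the upper limit is Q3 plus 1.5 times the interquartile range.

```python
from statistics import quantiles

# Hypothetical sales values standing in for B2:B13.
sales = [100, 140, 145, 150, 152, 155, 158, 160, 165, 170, 180, 450]

# n=4 returns the three quartile cut points (Q1, median, Q3).
q1, _, q3 = quantiles(sales, n=4, method="inclusive")

iqr = q3 - q1                 # interquartile range (Q3 - Q1)
lower = q1 - 1.5 * iqr        # lower limit
upper = q3 + 1.5 * iqr        # upper limit

# Same test as =OR(value < lower, value > upper)
outliers = [x for x in sales if x < lower or x > upper]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("Limits:", lower, upper)
print("Outliers:", outliers)
```

With this sample, both 100 and 450 fall outside the limits. As the note above points out, a dataset with a wider spread can push the lower limit below zero, in which case low outliers go undetected.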
4. Finding the Outliers Using the LARGE/SMALL functions
If your data set has several values over several columns, you can extract the largest and smallest three values to check for any outliers.
Without having to look through all the data in both directions, you will be able to spot any outliers.
Let’s say we have the dataset below and wish to find any outliers.
The formula that will provide you with the dataset’s highest value is listed below:
=LARGE($B$2:$B$13,1)
The second-largest value will similarly be provided by
=LARGE($B$2:$B$13,2)
If you are using Microsoft 365, which supports dynamic arrays, the following formula will give you the three largest values from the dataset:
=LARGE($B$2:$B$13,ROW($1:3))
Similarly, use the following formula to get the three smallest values:
=SMALL($B$2:$B$13,ROW($1:3))
If you don’t have dynamic arrays, retrieve each value with a separate formula, changing the second argument to 1, 2, and 3 in turn, starting with:
=SMALL($B$2:$B$13,1)
When you know these numbers, it’s very simple to identify any outliers in the sample.
Depending on the size of your dataset, you may opt to retrieve 7 or 10 values instead of the largest and smallest 3, as I have done.
I’m not sure if this is the best way to detect outliers in Excel, but it is what I did a few years ago when I had to deal with a lot of financial data for work. For me, this approach worked the best out of all the ones discussed in this tutorial.
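For comparison, Python's standard `heapq` module has direct counterparts to LARGE and SMALL. This is only a sketch, and the sales values are hypothetical.

```python
import heapq

# Hypothetical sales values standing in for B2:B13.
sales = [100, 140, 145, 150, 152, 155, 158, 160, 165, 170, 180, 450]

k = 3  # how many values to pull from each end; raise to 7 or 10 for big datasets

top_k = heapq.nlargest(k, sales)      # like LARGE(range, 1..k)
bottom_k = heapq.nsmallest(k, sales)  # like SMALL(range, 1..k)

print("Largest: ", top_k)
print("Smallest:", bottom_k)
```

Seeing 450 next to 180 and 170, or 100 next to 140 and 145, makes the gap between the extreme value and its neighbours easy to spot.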
5. How to Handle Outliers the Right Way?
We have already seen some of the techniques that can be used to locate outliers in a dataset. But what should you do after discovering that there are outliers?
Here are a few approaches you may take to deal with outliers and ensure that your data analysis is accurate.
- Delete the Outliers:
Simply deleting outliers from your dataset is the simplest way to handle them. That way, they won’t taint your analysis.
This is a more practical option when you have a huge dataset and removing a few outliers won’t affect the analysis as a whole. Of course, you should make a copy of the data first and investigate the reasons for the outliers before deleting anything.
To calculate the average of a dataset while excluding outliers, use the Excel TRIMMEAN function. The share of data points to exclude is given as a fraction of the total, which you can enter either as a decimal (0.2) or as a percentage (20%).
How to remove outliers in Excel
Let’s look at how to use the TRIMMEAN function to compute results with outliers removed.
Consider the dataset shown above. The data is in cells B2 to B13. Most of the numbers in this example range from 140 to 180, but two values, 100 and 450, fall well outside that range. These are the dataset’s outliers.
For our dataset, the formula that uses the Excel TRIMMEAN function to obtain the mean (average) without the outlier values is:
=TRIMMEAN(B2:B13,0.2)
Here,
- B2:B13 = range of information to trim and determine the average outcome
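- 0.2 (or 20%) = the fraction of data points to exclude
TRIMMEAN does not test whether a value is an outlier. It simply drops the given fraction of data points, split evenly between the highest and lowest values, and rounds the number of dropped points down to the nearest multiple of 2. With 12 values and 0.2, that is 2.4 points, rounded down to 2, so the single highest and single lowest values are excluded.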
Enter the formula for your dataset and press Enter to obtain the mean without the outliers. In our example, cell E3 contains the trimmed mean, which is 156.
To check the result, cell E2 uses the AVERAGE function to return the average (176) of all the values in the range B2 to B13. In cell E4, a second AVERAGE function references only the cells that do not contain outlier values, selected manually, and it also returns 156.
So the TRIMMEAN function can effectively exclude outliers when you calculate an average in Excel.
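To show what TRIMMEAN is doing, here is a small Python sketch. `trimmed_mean` is a helper I wrote for illustration, not a built-in, and the sales values are hypothetical, so the results differ from the 176 and 156 in the worksheet above.

```python
from statistics import mean


def trimmed_mean(values, percent):
    """Average after dropping `percent` of the points, split across both ends.

    Follows the TRIMMEAN convention of rounding the number of dropped
    points down to the nearest multiple of 2.
    """
    if not 0 <= percent < 1:
        raise ValueError("percent must be at least 0 and less than 1")
    ordered = sorted(values)
    per_end = int(len(ordered) * percent) // 2  # points dropped from each end
    return mean(ordered[per_end:len(ordered) - per_end])


# Hypothetical sales values standing in for B2:B13.
sales = [100, 140, 145, 150, 152, 155, 158, 160, 165, 170, 180, 450]

plain_average = mean(sales)                # like =AVERAGE(B2:B13)
trimmed_average = trimmed_mean(sales, 0.2) # like =TRIMMEAN(B2:B13,0.2)

print(f"Plain average:   {plain_average:.2f}")
print(f"Trimmed average: {trimmed_average:.2f}")
```

With 12 values and 0.2, one value is dropped from each end (100 and 450 here), so the trimmed average matches a manual average of the remaining ten values.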
- Normalize the Outliers (Adjust the Value):
When I had a full-time job, I used to normalize the outliers instead. I would change every outlier value to a value marginally higher than the highest non-outlier value in the dataset.
This ensured that I wasn’t deleting data while also making sure that my analysis wasn’t being skewed.
To give a concrete example: if I were evaluating the net profit margins of businesses, and most of them fell between -5% and 20% but a few were above 100%, I would simply adjust those outlier figures to 25% or 30%.
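This kind of capping can be sketched in Python as follows. The margin figures and the 25% cap are illustrative assumptions based on the example above, not real data.

```python
# Hypothetical net profit margins as fractions (0.20 means 20%).
margins = [-0.05, 0.02, 0.08, 0.12, 0.20, 1.40, 2.10]

# Cap chosen slightly above the highest "normal" margin of 20%.
CAP = 0.25

# Replace anything above the cap with the cap itself; other values are untouched.
capped = [min(m, CAP) for m in margins]

print("Original:", margins)
print("Capped:  ", capped)
```

The two extreme margins become 25%, so they still count as high values without dominating the average. The same idea works in the other direction with a lower cap and `max`.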
These are a few of the techniques you can employ in Excel to identify outliers.
Once the outliers have been located, you can investigate the data to see what’s generating them while also selecting a method to deal with these outliers.
Applications of Finding and Removing Outliers in Excel
- Statistical Analysis Improvement: Detect and remove outliers to improve the accuracy and reliability of statistical measures like the mean, median, and standard deviation, ensuring more representative data analysis.
- Data Visualization Clarity: Identify and eliminate outliers in datasets to enhance the clarity and interpretability of charts and graphs, preventing skewed views and misinterpretations.
- Quality Control: Use outlier detection to identify anomalies in manufacturing or production data, helping to pinpoint and rectify defects, thus maintaining product quality standards.
- Financial Fraud Detection: Detect outliers in financial transactions to identify unusual patterns that could indicate fraudulent activity, enhancing financial security and compliance.
- Healthcare Data Accuracy: In medical data analysis, find and remove outliers to prevent erroneous conclusions and ensure accurate clinical decision-making and patient care.
- Market Research Refinement: Identify and exclude outliers from consumer behavior datasets to ensure market analysis is based on typical user responses, leading to more effective marketing strategies.