If you have a range of data and need to cut just a portion of it (based on one or more criteria), the =AGGREGATE() function in Excel can help.
In this post, we will use a simple database of expenses with category and subcategory attached to every amount. Then we will see how to fetch only the records that match our interest in a particular dimension of data.
Get the file
The file below will be addressed in the paragraphs that follow:
The problem we try to solve
Let’s assume you are a fairly responsible adult who keeps track of his/her expenses. Also, you are addicted to Excel so you use a spreadsheet for the purpose.
You record your charges on sheet All Spendings. Data is entered into an Excel table named Expenses. The Category and Subcategory provide granularity to the amounts:
Before I start — In the first column on sheet Lists the unique categories are populated and will be used for the drop-down lists in the spreadsheet. As we will see later, the user will be able to select a particular category and then the subcategories for it will appear in the second column. Furthermore, there is a post that describes how this is done (Extracting the unique items from a list of entries).
Getting the rows for a single category
On sheet Extract Cat Only we will choose a category and then get all rows of data regardless of the subcategory.
The formula in each column is slightly different but they follow the same mask. I will be explaining the one in the Subcategory column.
From the inside out there are:
- The =AGGREGATE() function that will be examined in detail.
- In a nutshell, this is an =INDEX() function that looks into the Subcategory column and returns the value on the row that is supplied by the =AGGREGATE(). It sounds confusing but later we will see it is actually simple. (This is like an INDEX/MATCH combo but with =AGGREGATE() instead of =MATCH())
- An =IFERROR() error handler.
The =AGGREGATE() itself
The =AGGREGATE() function is going through the rows in the Expenses table and returns only the ones that match our criteria. The four arguments employed are:
- A — 15 is the code of the =AGGREGATE() function to be used. The =AGGREGATE is like a Swiss knife. It is capable of behaving like many other functions but builds on top of them by adding the parameter in the next bullet. In our case, the code (15) means SMALL, therefore, it will behave like the =SMALL() function.
- B — The option to ignore values lays with this argument. Using 6 means all error values will be ignored. Other options are available as well but in a moment we will see that we need this precise behaviour.
- C — This expression constructs the array from which the n‑th smallest row number will be produced.
- D — The position from the array (that we just built in the previous point) that should be returned (1st, 2nd, 3rd, …). In this case, I have decided to use the value in the dedicated column but you can come up with a more sophisticated approach.
Teaching the function to only match our category criteria
Focusing on the formula in the C section we have:
- First, we take the row numbers of the Date column. This can be any column from the Expenses table and to be honest, I picked this one at random. Nevertheless, it will produce precisely the same result.
- Then the row number of the table header is subtracted. The result is the position (in the Expenses table) for every row of it (1,2,3,…).
This seems like being over-engineered way to get the sequence but it is the best way to do so without a separate column. The reason is that it works perfectly regardless of the location of the table on the sheet — you can move it 1000 rows down and the resulting numbers will start from 1 and go up nonetheless.
- Then the values in the Category column are tested against the one for which we want to extract rows (indicated by the named range CatOnly_Category_Choice).
The comparison happens for every row of the table. It produces an array of TRUE/FALSE values.
- The row number is divided by the TRUE/FALSE. In Excel division by FALSE is the same as division by zero. Needless to say, this would result in an error.
At this point:
- We have a total of 20 rows of expenses in the table.
- The formula would first generate the numbers from 1–20 to represent them.
- Then it will perform 20 tests of the values in the Category column. If the value in the cells is the one we have chosen on sheet Extract Cat Only, the result will be TRUE. Otherwise, the evaluation will be FALSE. 20 boolean outcomes will be available.
- Dividing the numbers from the 2nd point with the booleans from the 3rd will return:
- The row number generated in (2) — as division by TRUE is like dividing by 1;
- Or DIV/0 error because division by FALSE is attempted.
- A total of 20 number or error figures will be produced and they will be an array used by the =AGGREGATE() function.
Confusing? See it in action!
Open the section below to see what values are generated for each row if the Groceries category is selected.Click to see how it all unfolds
| Date|| Category|| Subcategory|| Amount|| Row # of the cell in Date|| Row # of the Category column header|| Position array = -|| Category is the one chosen|| Result array = /|
The =AGGREGATE() handles the errors
The =AGGREGATE() now has everything needed to do its magic. It has been instructed to:
- Take the newly constructed array of numbers and errors;
- Ignore the #DIV/0 because there is no useful information (matching the desired category) available on the respective row in the Expenses table.
- Then look through the leftover number values and find the smallest n‑th among them. If n=1 then this would be the lowest numerical — the very first row from the Expenses table that has our category of choice. N=2 would be the second and so on.
Combining with the =INDEX() function
Knowing which row from the Expenses table we need means that the =INDEX() function can be used to get its value. In the INDEX/MATCH post, there is detail how this happens.
When we are looking for the Groceries category the first row of data that will be fetched will be the first one from the Expenses table. However, switching to the Car category means that the first row for which the =AGGREGATE() will pass value to the =INDEX() function is the 7th.
Adding a second criteria
On sheet Extract Cat & Subcat we take it all one step further by introducing second condition — the subcategory of the spendings. The formula remains the same with one small change:
When testing every line of the Expenses table for the appropriate category (resulting in TRUE/FALSE) we multiply the result by another boolean test — this time for the subcategory of choice. If any test fails and produces FALSE as a result, the multiplication will return FALSE overall. This will cause the =AGGREGATE() function to drop the row as demonstrated above.
Why not use another solution? Pivots? Power Query? Plain filters?
Solving the task as described has one advantage — it employs the real-time calculations made by Excel.
- Pivots need to be refreshed to get the new data entered.
- Power Query calls for refresh due to the same reason.
- Filters will visualize the result at least equally fast but then the data won’t be so well prepared for further manipulation.
However, using =AGGREGATE() is only one way to extract the rows needed and it is not necessarily always the best. For once, it is taxing on the CPU and once your database grows, this might be a problem. Furthermore, building it would take more time and it will be wasted if your project is not intended for an extended period of use. Also, if you happen to enjoy pivot slicers, your visual preferences might not be satisfied.