Getting all records that meet a criteria with the =AGGREGATE() function

If you have a range of data and need to cut just a por­tion of it (based on one or more cri­te­ria), the =AGGREGATE() func­tion in Excel can help.

In this post, we will use a sim­ple data­base of expens­es with cat­e­go­ry and sub­cat­e­go­ry attached to every amount. Then we will see how to fetch only the records that match our inter­est in a par­tic­u­lar dimen­sion of data.

Get the file

The file below will be addressed in the para­graphs that follow:


The problem we try to solve

Let’s assume you are a fair­ly respon­si­ble adult who keeps track of his/her expens­es. Also, you are addict­ed to Excel so you use a spread­sheet for the purpose.

You record your charges on sheet All Spend­ings. Data is entered into an Excel table named Expens­es. The Cat­e­go­ry and Sub­cat­e­go­ry pro­vide gran­u­lar­i­ty to the amounts:

Table with all expenses.

Before I start — In the first col­umn on sheet Lists the unique cat­e­gories are pop­u­lat­ed and will be used for the drop-down lists in the spread­sheet. As we will see lat­er, the user will be able to select a par­tic­u­lar cat­e­go­ry and then the sub­cat­e­gories for it will appear in the sec­ond col­umn. Fur­ther­more, there is a post that describes how this is done (Extract­ing the unique items from a list of entries).

Getting the rows for a single category

On sheet Extract Cat Only we will choose a cat­e­go­ry and then get all rows of data regard­less of the subcategory.

All rows of data for the selected category are listed (regardless of the subcategory).

The for­mu­la in each col­umn is slight­ly dif­fer­ent but they fol­low the same mask. I will be explain­ing the one in the Sub­cat­e­go­ry column.

Extracting rows based on one criteria.
=IFERROR(INDEX(Expenses[Subcategory];AGGREGATE(15;6;(ROW(Expenses[Date])-ROW(Expenses[[#Headers];[Category]]))/(Expenses[Category]=CatOnly_Category_Choice);D2);1);“”)

From the inside out there are:

  1. The =AGGREGATE() func­tion that will be exam­ined in detail.
  2. In a nut­shell, this is an =INDEX() func­tion that looks into the Sub­cat­e­go­ry col­umn and returns the val­ue on the row that is sup­plied by the =AGGREGATE(). It sounds con­fus­ing but lat­er we will see it is actu­al­ly sim­ple. (This is like an INDEX/MATCH com­bo but with =AGGREGATE() instead of =MATCH())
  3. An =IFERROR() error handler.
The =AGGREGATE() itself

The =AGGREGATE() func­tion is going through the rows in the Expens­es table and returns only the ones that match our cri­te­ria. The four argu­ments employed are:

  • A — 15 is the code of the =AGGREGATE() func­tion to be used. The =AGGREGATE is like a Swiss knife. It is capa­ble of behav­ing like many oth­er func­tions but builds on top of them by adding the para­me­ter in the next bul­let. In our case, the code (15) means SMALL, there­fore, it will behave like the =SMALL() function.
  • B — The option to ignore val­ues lays with this argu­ment. Using 6 means all error val­ues will be ignored. Oth­er options are avail­able as well but in a moment we will see that we need this pre­cise behaviour.
  • C — This expres­sion con­structs the array from which the n‑th small­est row num­ber will be produced.
  • D — The posi­tion from the array (that we just built in the pre­vi­ous point) that should be returned (1st, 2nd, 3rd, …). In this case, I have decid­ed to use the val­ue in the ded­i­cat­ed col­umn but you can come up with a more sophis­ti­cat­ed approach.
Teaching the function to only match our category criteria

Focus­ing on the for­mu­la in the C sec­tion we have:

Constructing the array for the =AGGREGATE() function.
(ROW(Expenses[Date])-ROW(Expenses[[#Headers];[Category]]))/(Expenses[Category]=CatOnly_Category_Choice);D2);1)
  • First, we take the row num­bers of the Date col­umn. This can be any col­umn from the Expens­es table and to be hon­est, I picked this one at ran­dom. Nev­er­the­less, it will pro­duce pre­cise­ly the same result.
  • Then the row num­ber of the table head­er is sub­tract­ed. The result is the posi­tion (in the Expens­es table) for every row of it (1,2,3,…).
    This seems like being over-engi­neered way to get the sequence but it is the best way to do so with­out a sep­a­rate col­umn. The rea­son is that it works per­fect­ly regard­less of the loca­tion of the table on the sheet — you can move it 1000 rows down and the result­ing num­bers will start from 1 and go up nonethe­less.
  • Then the val­ues in the Cat­e­go­ry col­umn are test­ed against the one for which we want to extract rows (indi­cat­ed by the named range CatOnly_Category_Choice).
    CatOnly_Category_Choice
    Expenses[Category]=CatOnly_Category_Choice
    The com­par­i­son hap­pens for every row of the table. It pro­duces an array of TRUE/FALSE values.
  • The row num­ber is divid­ed by the TRUE/FALSE. In Excel divi­sion by FALSE is the same as divi­sion by zero. Need­less to say, this would result in an error.

At this point:

  1. We have a total of 20 rows of expens­es in the table.
  2. The for­mu­la would first gen­er­ate the num­bers from 1–20 to rep­re­sent them.
  3. Then it will per­form 20 tests of the val­ues in the Cat­e­go­ry col­umn. If the val­ue in the cells is the one we have cho­sen on sheet Extract Cat Only, the result will be TRUE. Oth­er­wise, the eval­u­a­tion will be FALSE. 20 boolean out­comes will be available.
  4. Divid­ing the num­bers from the 2nd point with the booleans from the 3rd will return:
    • The row num­ber gen­er­at­ed in (2) — as divi­sion by TRUE is like divid­ing by 1;
    • Or DIV/0 error because divi­sion by FALSE is attempted.
  5. A total of 20 num­ber or error fig­ures will be pro­duced and they will be an array used by the =AGGREGATE() function.
Confusing? See it in action!

Open the sec­tion below to see what val­ues are gen­er­at­ed for each row if the Gro­ceries cat­e­go­ry is selected.

Click to see how it all unfolds
[1] Date[2] Cat­e­go­ry[3] Sub­cat­e­go­ry[4] Amount[5] Row # of the cell in Date[6] Row # of the Cat­e­go­ry col­umn head­er[7] Posi­tion array = [5]-[6][8] Cat­e­go­ry is the one cho­sen[9] Result array = [7]/[8]
16.6.2020Gro­ceriesBak­ery1,89321TRUE1
16.6.2020Gro­ceriesDairy3,59422TRUE2
16.6.2020Gro­ceriesMeat6,99523TRUE3
16.6.2020Gro­ceriesCondi­ments1,99624TRUE4
16.6.2020Gro­ceriesSpir­its10,99725TRUE5
16.6.2020Restau­rantsLunch7,40826FALSE#DIV/0!
16.6.2020CarFuel83,00927FALSE#DIV/0!
17.6.2020Util­i­tiesElec­tric­i­ty26,781028FALSE#DIV/0!
17.6.2020Restau­rantsLunch9,001129FALSE#DIV/0!
17.6.2020Cloth­ingAll49,9912210FALSE#DIV/0!
17.6.2020CarAcces­sories10,9913211FALSE#DIV/0!
17.6.2020GiftsAll31,7014212FALSE#DIV/0!
17.6.2020Restau­rantsDin­ner26,0015213FALSE#DIV/0!
18.6.2020Med­icalDen­tist140,0016214FALSE#DIV/0!
18.6.2020Gro­ceriesFish14,9917215TRUE15
18.6.2020Gro­ceriesBak­ery1,8918216TRUE16
18.6.2020Gro­ceriesSpir­its12,9919217TRUE17
18.6.2020Gro­ceriesDairy3,5920218TRUE18
18.6.2020Restau­rantsLunch6,8021219FALSE#DIV/0!
The =AGGREGATE() handles the errors

The =AGGREGATE() now has every­thing need­ed to do its mag­ic. It has been instruct­ed to:

  1. Take the new­ly con­struct­ed array of num­bers and errors;
  2. Ignore the #DIV/0 because there is no use­ful infor­ma­tion (match­ing the desired cat­e­go­ry) avail­able on the respec­tive row in the Expens­es table.
  3. Then look through the left­over num­ber val­ues and find the small­est n‑th among them. If n=1 then this would be the low­est numer­i­cal — the very first row from the Expens­es table that has our cat­e­go­ry of choice. N=2 would be the sec­ond and so on.
Combining with the =INDEX() function

Know­ing which row from the Expens­es table we need means that the =INDEX() func­tion can be used to get its val­ue. In the INDEX/MATCH post, there is detail how this happens.

When we are look­ing for the Gro­ceries cat­e­go­ry the first row of data that will be fetched will be the first one from the Expens­es table. How­ev­er, switch­ing to the Car cat­e­go­ry means that the first row for which the =AGGREGATE() will pass val­ue to the =INDEX() func­tion is the 7th.

Depending on the category selected, different data rows will be passed to the results first.
Adding a second criteria

On sheet Extract Cat & Sub­cat we take it all one step fur­ther by intro­duc­ing sec­ond con­di­tion — the sub­cat­e­go­ry of the spend­ings. The for­mu­la remains the same with one small change:

Highlighting the change in the formula

When test­ing every line of the Expens­es table for the appro­pri­ate cat­e­go­ry (result­ing in TRUE/FALSE) we mul­ti­ply the result by anoth­er boolean test — this time for the sub­cat­e­go­ry of choice. If any test fails and pro­duces FALSE as a result, the mul­ti­pli­ca­tion will return FALSE over­all. This will cause the =AGGREGATE() func­tion to drop the row as demon­strat­ed above.

Why not use another solution? Pivots? Power Query? Plain filters?

Solv­ing the task as described has one advan­tage — it employs the real-time cal­cu­la­tions made by Excel.

  • Piv­ots need to be refreshed to get the new data entered.
  • Pow­er Query calls for refresh due to the same reason.
  • Fil­ters will visu­al­ize the result at least equal­ly fast but then the data won’t be so well pre­pared for fur­ther manipulation.

How­ev­er, using =AGGREGATE() is only one way to extract the rows need­ed and it is not nec­es­sar­i­ly always the best. For once, it is tax­ing on the CPU and once your data­base grows, this might be a prob­lem. Fur­ther­more, build­ing it would take more time and it will be wast­ed if your project is not intend­ed for an extend­ed peri­od of use. Also, if you hap­pen to enjoy piv­ot slicers, your visu­al pref­er­ences might not be satisfied.