Tips for when a large number of SUMIFS slows down the spreadsheet

Regardless of how well you live, eventually you will be summarizing large amounts of data with a countless number of =SUMIFS(). Let’s assume you have tens of thousands of records that you aggregate based on criteria in 6 columns. If you do it in a single cell, Excel is going to be ready in a heartbeat. However, say you have to do it in 24 columns on 500 rows across 25 sheets. As a result, your spreadsheet is painfully slow and takes forever to calculate. So what can you do if =SUMIFS() slows down the spreadsheet you work on?


In a hurry? Don’t have the time to read long posts?

Here is an outline of the steps to follow:

  1. Change your source data structure to have a (much) lower volume — an export of a million rows can be pivoted or delivered already aggregated into a much more compact set. The best way to tackle the issue is to reduce the size of the database.
  2. Be obsessed with not having to calculate the =SUMIFS(). This includes:
    • Setting proper rules for when to go into the calculation;
    • Getting as many figures as possible from cells & sheets where figures have already been crunched;
    • Getting rid of anything non-essential in terms of aggregations that puts extra strain on your CPU.
  3. Last but not least — if this is purely a reporting exercise, try to move it to a BI system. Excel is great, but that doesn’t mean other platforms don’t exist — ones often significantly better at handling tons of data.

A few words before I start

I guess I have been avoiding the situation described in the beginning simply because (largely unconsciously) I have always been practising some of the advice I am giving next. OK, not always — only after the first time I created a dreadful report — and then concluded that one’s job is tremendously boring when stuck waiting for a calculation to finish. Sadly, I recently inherited one such monstrosity of a file and now I have to deal with it. On the bright side, I get to write a new blog post (long-delayed because of my new job).

Before I start, a handful of notes on =SUMIFS():

  • It is not only fast but smart as well — I don’t know precisely how it is coded, but my experience (and that of many others) shows that empty elements in the ranges won’t have much impact on the speed of execution. Often people would advise you not to include the entire row/column in the function arguments and to stick to the actual range size you are summing. This is good advice, but I would argue that something like

    =SUMIFS(C:C, A:A, $F$1)

    and

    =SUMIFS(C$2:C$700001, A$2:A$700001, $F$1)

    have similar enough performance.
    This doesn’t mean I encourage the first type of addressing (quite the opposite) but when focusing on speed only, I wouldn’t mind it. 
  • In its most classic syntax, shown in the previous point, the function behaves as volatile. Essentially, this means that when Excel recalculates something, anywhere in your open workbooks (plural!), it will cause the =SUMIFS() to recalculate as well. This is important, as it is an obvious drag on performance even for small tasks. It probably happens to be one of the main reasons for your suffering.
  • As of 2020, using this formula in most offices is still considered advanced. I will debate this another time, but I am making the point now so I can come back to it later.

So… what do we do?

The first question to ask yourself is whether you really need to =SUMIFS() at all

This is a legit question. Frequently, people would take the entire data export and put it in a sheet. I get it — it is convenient to have it there when you need to investigate figures; it is quick and easy; and you have been working that way since dial-up was the way to connect to the Internet. However, that giant DB is static, and constantly looping through it is inefficient.

Compress your data

Data can be structured in a way that is good enough to feed your spreadsheet while not being too taxing on performance. In my case, I used Power Query to manipulate the data first. The same can be achieved with pivots or (most probably) with the software you use to make your extracts. As a rule of thumb, the more sophisticated your tool, the better. I used PQ because (apart from it being the tool I have) I can easily make different tables and append them to one another — something hardly achievable with just pivots. However, years ago, when I had this option, I would happily do the same in SAS while getting figures from the DWH.
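If Power Query is available to you, a minimal sketch of such an aggregation step could look like this (the table and column names below are made up for illustration):

    let
        // Load the raw export; "RawSales" is a hypothetical table name
        Source = Excel.CurrentWorkbook(){[Name="RawSales"]}[Content],
        // Collapse the detail: one row per combination of the criteria columns
        Grouped = Table.Group(
            Source,
            {"Region", "Channel", "Product"},
            {{"Sales", each List.Sum([Amount]), type number}}
        )
    in
        Grouped

Because every combination of the criteria columns survives as its own row, you lose volume without losing the granularity you may need later.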

Cut down on the number of rows

You want to get from 700,000 rows of this (available in this archive)

DB regular view.

to as few lines of sales aggregations as possible:

This pivoting can dramatically cut the amount of data. To get a feel for the scale: if, say, your 700,000 rows cover 20 products in 15 regions over 36 months, the aggregated set cannot exceed 20 × 15 × 36 = 10,800 rows. Further, if you are not certain what kind of column combinations you will have to use when summing — granularity is not lost.

Reduce the number of conditions

Having 6 criteria columns means that Excel will make the top-to-bottom trip half a dozen times before it arrives at the figure you need. This can be avoided by combining the criteria columns into one. Such folding is possible only when granularity is not a problem (and even then there are ways around it). You can convert the pivot above into this form:

DB pivoted with indices.

From there you can still use =SUMIFS() or the good old INDEX/MATCH combination.
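As an illustration, assuming a hypothetical layout where the combined key sits in column A and the figures in column B, both of these return the same result:

    =SUMIFS(B:B, A:A, "North|Retail|Widget")
    =INDEX(B:B, MATCH("North|Retail|Widget", A:A, 0))

The key itself can be produced in the source with a simple concatenation such as =C2&"|"&D2&"|"&E2. And since each key is unique after the pivoting, the exact-match MATCH stops at the first hit instead of testing every criteria column along the whole trip.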

When I used the last approach on the report I was fighting, calculation time dropped from several minutes to a few seconds.


But…

There is a serious drawback to this solution: you are probably not working alone, so you have to consider your peers when designing your spreadsheet. Most Excel users don’t know about the existence of Power Query, and they are even less aware of the other software you might involve. As a result, you might find yourself fighting not only Excel’s calculation engine but human inertia as well. This is a very different ball game.

Due to various factors, you might have to stick to =SUMIFS() and a large database. This doesn’t mean there is nothing you can do.

SUMIFS only when needed

I might sound like Captain Obvious, but people would often just drag their formula and wait for the result. That is perfectly fine as long as the large amount of underlying data is not calling for a more sensible approach:

  • First, check if the formula cell really needs to be calculated. Putting an =IF() before a CPU-heavy expression has the potential to prevent it from going off (see the sketch after this list). As long as your logical tests are not as intensive as the calculations they prevent, their presence is justified. As a general rule, I would not shy away from adding an extra row/column if the flags in it help me offload some work from the processor.
  • Clean up your worksheet, removing all calculations that are not really needed. Sometimes we would add rows/columns simply because we consider them good to have. If they contribute to the slowdown, it is better to trim them.
  • Check if data can’t be obtained from figures already calculated. A subtotal, for example, can be the very light sum of its elements or an entirely new =SUMIFS(). Take the time to get rid of such cases and make your file more readable at the same time.
  • Keep in mind that somewhere else in your workbook, the data might be readily available. Your division-level report can scan your source data or simply sum the sheets/figures for the subdivisions.
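To make the first point concrete, here is a minimal sketch. It assumes a hypothetical layout where column B of the report holds a flag marking the rows that actually have data, and the source sits on a Data sheet with criteria in column A and amounts in column C:

    =IF($B2=0, 0, SUMIFS(Data!$C:$C, Data!$A:$A, $A2))

When the flag is zero, Excel returns the constant right away and never touches the large ranges; the heavy lifting happens only where it is needed.
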
Move it all to BI

Unless your design is spectacularly bad, chances are that waiting for =SUMIFS() to finish means you have pushed Excel beyond its limits. Those are easily reached once data starts to grow from ‘ordinary’ towards ‘big’. At some point, changing the platform might be a much better choice than struggling with a spreadsheet. This is true both in reporting and in modelling. You might have reached the point where a BI solution, albeit not as flexible as Excel, would fare better.

Most BI tools:

  • Run on servers way more powerful than your workstation/laptop;
  • Go hand in hand with data compression or at least a better data architecture;
  • Can have the entire job scheduled or run at a single click;
  • Can easily scale up to deal with whatever volume you throw at them;
  • And even though they are not as easily maintained as a spreadsheet, this might actually discourage too-frequent changes — something much appreciated if the management acts erratically.

Some people argue that they actually need to make changes and massage the data once they see the result. I can hardly believe it is humanly possible to deal manually with more data than Excel can stomach. Such a situation would suggest a more fundamental problem, and taking a step back to reconsider your way of doing the job might be the appropriate move.


Somewhere in the Microsoft documentation, I read that people tend to switch off automatic calculation if it takes more than 3–10 seconds for Excel to crunch the numbers. That threshold is easy to reach. Try hard not to be in that place.