Info

You are currently browsing the archives for the array functions category.

May 2012
M T W T F S S
« Apr    
 123456
78910111213
14151617181920
21222324252627
28293031  

Archive for the array functions Category

Pivoting with medians and filtering frequency distributions in Excel

I was recently doing some analytic work using  spreadsheets; I was using pivot tables because they’re quick to make, but also felt that the median average best represented my data sets. The Excel pivot table does not support the median as a calculated value.

To solve this problem I used boolean functions and named arrays.  This technique can also be used to filter frequency distributions.

Once again here is my data.

The Sample Data

Figure 1: Data Set

I name the columns in the data array, in this case, I have named the names ‘names3′ and the values ‘values3′ because they are on the third tab called ‘Booleans + FDs’ in the sample spreadsheet.

Functions such as MAX & MEDIAN will take boolean functions instead of an array value and evaluate the boolean function as a filter. These functions must be saved as array functions i.e. by using the [Ctrl][SHIFT] & [ENTER] keys.

{=MAX(IF(names3=J9,values3))}

where J9 contains a string which exists within the names column of the data set.

{=MEDIAN(IF(names3=J9,values3))}

does the same for using the Median function and the picture below shows the effect of using these functions. This is common Excel *IF syntax. The first argument specifies a condition where a 1-dimensional array i.e. a column is specified by name and evaluated, in this case, against a value. In the example above the value is specified as the contents of a cell, an explicit string declaration also works. The second IF function argument is an array of values, and the corresponding value by position in the array is included in the results set. (I have not tested the effect of having different sized arrays.) This IF function replaces the array in the outer function, in the case of our examples above, MAX & MEDIAN.

Technically, this is not a pivot table; the names, Alfie, Bill … etc, would need to be on the horizontal axis for this to be true, but the Excel techniques will work in that case as well. The sample spreadsheet now includes a pivot table example. (DFL 7 Feb 2011.)

Figure 2: Filtered Median and Max

This technique also works for frequency distributions. In order to build a frequency distribution, one needs to define the class buckets. I am going to build f(Dave) which has the set values (4,8,8) .  My FD maximum is going to be 10, and I am going to have an interval of 1. My buckets are thus  (0,1,2,…,10). There are eleven elements. In the example file, I name the bucket array , “buckets”. In order to create an FD using the FREQUENCY function one must select an empty array of the same size as the bucket array and then insert the following function,

{=FREQUENCY(IF(names3=F1,values3),buckets)}

The {} means that this is an array function, and must be saved using  [Ctrl][SHIFT] & [ENTER] keys. The frequency distribution appears below.

a filtered frequency distribution

Figure 3: Filtered Frequency Distribution

In case you missed the referene above, I have updated the the sample spreadsheet which is available over http to include the examples in this blog article. The example sheet now includes a pivot table example. (DFL 7 Feb 2011.)

I discovered the boolean values with the help of my Citihub colleague, Tim Reczek. Thanks Tim. He also pointed me at these three links, which I found useful while solving this problem, these sites also have answers to a bunch of other problems.

Edited: Dave 7th Feb 2011

|