Info

You are currently browsing the archives for the excel category.

May 2012
M T W T F S S
« Apr    
 123456
78910111213
14151617181920
21222324252627
28293031  

Archive for the excel Category

Box Charts in Excel 2007

In my last piece of work, I needed to create some box charts using excel 2007. I have always planned to document this for the world, (or my little readership at least) but when my colleague Jim Barrett showed me the button in the Sparklines menu that does them in Office 2010, I decided to get this done in a hurry. As I said in my last blog article on Excell, I am losing track off Open Office, but this technique also works in Open Office 3.2.1. Most of this article is about making it work in Excel, but I have described how to set error bars in Open Office.

I have amended my trusty example file to include the sample data and some intermediate steps and to generate the pictures for this article. The example is on the tab named ‘Box Charts’. I also uploaded an example Open Office spreadsheet. I have generated the sample data set and created a frequency distribution and associated graph. The final piece of pre-preparation is to create an item/values table to capture the important facts about the distribution.

Example Data Summary for Box Plots

I  use a ’stacked bar chart’ chart type with error bars and have created a table that has as its first three elements the first quartile value, the median value minus the first quartile, and the third quartile value minus the median.  Column 4 is the -ive error bar and is first quartile minus the minimum value , and the final column is the +ive error bar which is the maximum value minus the third quartile. This transformation is required because we are using stacked bar charts, therefore each charted element needs to be the difference between the quartiles. The table illustrated below describes the calaculation as well as performs it. Mu represents the Median.

table driving a box chart, detailing differences

I then created a stacked horizontal bar chart using the wizard, having selected the first three columns only.

Insert Stacked Bar Charts

I have used the Design Chart Styles Widget to change the design to make all three series green and to ensure they have a border; we will need a border to illustrate the median. Now it is necessary to insert the error bars and make the bar chart element representing zero to the first quartile invisible. If you do it in this order, it’s probably marginally easier. The insert error bars dialog is on the Chart Tools - Layout menu.

Chart Tools, Layout Buttons

I insert the error bars one at a time. In order to insert the -ive error bar, select series 1, i.e. the graph bar representing column 1,  and use the “Error Bar”, “More Error Bars” options. Select “Custom Values” and assign the value of ‘Min to Q1′, Column 4 in my example to the -ive entry box. The fourth quartile is represented by assigning a +ive error bar to Series 3, i.e column 3 using the Max to Q3 value, which is Column 5 in the example. It is the assignment of the +ive error bars that is illustrated immediatly below. Since we are inserting the error bars one at a time and assiging them to different series in the chart, the radio button widget must be set to ‘plus’, and the ‘Cap’ option set.

Box Plots, setting an error bar

For Open Office V3, when you selet the data series, it offers you the context menu option of Insert Y error bars. Invoke this dialogue and set the ‘Error Category’ radio button to ‘Cell Range’ and then set the appropriate cell range values to Column 4 for the -ive bar, assigned to Series 1 and Column 5, for the +ive bar assigned to Series 3.

Open Office Error Bar Dialogue Box

I then use Format Series to set Series 1’s display options to No Fill & No Line. The final graph appears below.

Box Plot, the finished article

The box chart now shows the 1st and 4th quartiles as horizontal lines, the second and third quartiles as boxes, and the median as a vertical line seperating the two boxes.

Selecting from an excel array using two or more criteria

In some of the more complex spreadsheets I have built recently I have wanted to select data from a table based on two conditions. This can be done using the functions COUNTIFS() and SUMIFS(). Once again using range names helps. These functions offer one the the equivilent of boolean AND, and they can be made to implement OR as well.

I have built an example which is in the tab named IFS in my example spreadsheet, which also contains the examples from my previous articles about excel.

My sample data set consists of a table, with our friends who have a key of their name. Each person has a nationality, a gender and a value column.

The Sample Data

I have then built a summary table using the two functions. For COUNTIFS(), the syntax consists of pairs of a ranges and a value. The ranges can be names of ranges or explicit rnage declarations and the value can be a cell or a value. (I have used the UNIX shell || as an “or” in the example below.

=COUNTIFS(nationality4, ${cellcontaining a value}||Value)

the range, value pairs can be repeated as they are in the example file.

Going back to our example data table, see above, in order to add the value column according to two criteria, for example in order to know the (nationality,  gender, sum(value)) matrix, you need SUMIFS(). This is similar to COUNTIFS() but preprends the summed column to the range value pairs, so

=SUMIFS(values4,  nationality4, ${cellcontaining a value}, gender4, ${cellcontaining a value})

The data results and example syntax are shown below.

Results Table

To explain the syntax strings, the values in the range/value pairs are explict cell references. Column G is the list of Countries on the left of the table and row 10 contains the Male/Female values.

These functions implement logical AND, if you want OR functions you’ll need to add two expressions together, but see below and if you want to implement NOT, then you’ll need to subtract. If you want an exclusive OR, i.e the sum of those cases which meet condition 1 or condition2 then you will need to add the two sums together and subtract those cases which meet both criteria.

Sum(Value, OR) = sumifs(V, R1,C1) + sumifs(V,R2,C2)

- sumifs(V, R1,C1,R2,C2)

This is because those cases which meet both criteria will be in both the first two sums. If you wanted a true exclusive OR, then you should subtract two times the value of the final function so as to subtract the cases where both C1 & C2 are true from both the first and second function. Some with more recent mathematics training could probably make something of that.

These functions are not available in my version of Open Office V3.2.1 I obviously need to move forward to Libre Office, the check for updates on my OO install failed. Sigh! The Open Office file has been updated and is available here….

There is a Microsoft Help page on SUMIFS,  my bookmarks tagged excell are on delicious.com.

Pivoting with medians and filtering frequency distributions in Excel

I was recently doing some analytic work using  spreadsheets; I was using pivot tables because they’re quick to make, but also felt that the median average best represented my data sets. The Excel pivot table does not support the median as a calculated value.

To solve this problem I used boolean functions and named arrays.  This technique can also be used to filter frequency distributions.

Once again here is my data.

The Sample Data

Figure 1: Data Set

I name the columns in the data array, in this case, I have named the names ‘names3′ and the values ‘values3′ because they are on the third tab called ‘Booleans + FDs’ in the sample spreadsheet.

Functions such as MAX & MEDIAN will take boolean functions instead of an array value and evaluate the boolean function as a filter. These functions must be saved as array functions i.e. by using the [Ctrl][SHIFT] & [ENTER] keys.

{=MAX(IF(names3=J9,values3))}

where J9 contains a string which exists within the names column of the data set.

{=MEDIAN(IF(names3=J9,values3))}

does the same for using the Median function and the picture below shows the effect of using these functions. This is common Excel *IF syntax. The first argument specifies a condition where a 1-dimensional array i.e. a column is specified by name and evaluated, in this case, against a value. In the example above the value is specified as the contents of a cell, an explicit string declaration also works. The second IF function argument is an array of values, and the corresponding value by position in the array is included in the results set. (I have not tested the effect of having different sized arrays.) This IF function replaces the array in the outer function, in the case of our examples above, MAX & MEDIAN.

Technically, this is not a pivot table; the names, Alfie, Bill … etc, would need to be on the horizontal axis for this to be true, but the Excel techniques will work in that case as well. The sample spreadsheet now includes a pivot table example. (DFL 7 Feb 2011.)

Figure 2: Filtered Median and Max

This technique also works for frequency distributions. In order to build a frequency distribution, one needs to define the class buckets. I am going to build f(Dave) which has the set values (4,8,8) .  My FD maximum is going to be 10, and I am going to have an interval of 1. My buckets are thus  (0,1,2,…,10). There are eleven elements. In the example file, I name the bucket array , “buckets”. In order to create an FD using the FREQUENCY function one must select an empty array of the same size as the bucket array and then insert the following function,

{=FREQUENCY(IF(names3=F1,values3),buckets)}

The {} means that this is an array function, and must be saved using  [Ctrl][SHIFT] & [ENTER] keys. The frequency distribution appears below.

a filtered frequency distribution

Figure 3: Filtered Frequency Distribution

In case you missed the referene above, I have updated the the sample spreadsheet which is available over http to include the examples in this blog article. The example sheet now includes a pivot table example. (DFL 7 Feb 2011.)

I discovered the boolean values with the help of my Citihub colleague, Tim Reczek. Thanks Tim. He also pointed me at these three links, which I found useful while solving this problem, these sites also have answers to a bunch of other problems.

Edited: Dave 7th Feb 2011

Pie of Pie in Excel

I used an Excell 2007 Pie of Pie chart the other day for the first time. It may be that these are not very sound statistically but they can be powerfull graphics. Here’s how I did it.
I created a data table which occupies B2:C13. This contains all the name value pairs that exist, and I have created a secondary/summary table at E2:I5.  I shall plot the sum of the values in my base data table. The diagram immediately below shows how the summary table is constructed. It uses the=SUMIF function and named ranges. B2:B13 is named “names”, and C2:C13 is named “values”. SUMIF takes three arguments, the test range, the test value and the range to summed. In this case, I have used a relative test value, held in row 2. The Summary table holds the data to be plotted in the Pie Chart.I used the Insert >> Pie Chart to create the Pie Chart at the bottom of the picture below, and used Insert/Format Data Labels to label the Pie Chart. The reason I have gone through this is that a Pie Chart only represents one series of data and in order to make the Pie of Pie charts work, one has to build a new table; a Pie of Pie charts emphasis a sub set of the data plotted, but not by category.
Excell Pie of Pie

Figure 1: A standard Pie Chart.

Since Dave is my higest scoring category, I decide to explode Dave using the secondary pie chart in the Pie of Pie Chart. As stated I need a new table, which has each entry for Dave as a seperate line in the table. This is shown in Figures 2 & 3 below.

Pie of Pie, and menu graphic

Figure 2: Default Pie of Pie Chart.

The table at the top of Figure 2  is the re-arrangement of the data, which you can compare with the original data set in Figure 1. The Graph shows you what “Insert >> Chart >> Pie of Pie” creates as a default. Now we need to set the labels, change the membership of the smaller pie chart and change the size of the smaller pie chart. You need to use “Format Data Series” to do this, by selecting the included category in the larger pie chart and using the right click pull down menu. This opens the Format Data Series menu.

Format Data Series Graphic

Figure 3: Format Data Series.

The Format Data Series menu allows you to manipulate the number of members of the second set, and the size of the second plot. The spreadsheet in which I did this is available for download here….

Having written and published this article, I was thinking about it one the way home and came to the conclusion, that if one had a data series with a few very large members, and a lot of similar sized smaller members, then Pie-of-a-Pie could be used to illustrate the distribution because it offers two scales. You could collect the smaller members into the second pie. This would not involve creating a new table, but you would have to sort your data table so that the candidate members of the smaller chart are contiguous. I have created an example, see Figure 4 below, and this is now in a second tab in the demo spreadsheet DL 17-12-2010

Two Pie Charts

Figure 4: Two Charts showing labeled large and small values.

I have tried to import the speadsheet into Open Office, but OO doesn’t support Pie of Pie.

|