Numbers and US

Story that numbers tell us

Interaction Variable

Suppose that in a laboratory experiment we are trying to figure out the sweetness of tea as a function of two variables, ‘quantity of sugar’ and ‘frequency of stirring’. After a full day of methodical experiments, we entered the data in our experiment book.

[Figure: Sugar Exp]

Now we run a linear regression on our data and get the following equation, with an adjusted R² of 0.90:

Sweetness = 1.93 sugar + 5.37 stirring freq – 6.43
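The original data table survives only as an image, so the sketch below cannot reproduce the article's numbers. It is a minimal NumPy sketch, assuming ordinary least squares with an intercept, of how an equation like this and its adjusted R² are obtained. `fit_ols` is a hypothetical helper name, and the data are invented: sweetness is generated with a built-in sugar × stirring effect, so the purely linear fit is good but not perfect. Adjusted R² is 1 − (1 − R²)(n − 1)/(n − p − 1) for n observations and p predictors.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X plus an intercept.

    Returns (coefficients, adjusted R^2); the intercept is the last coefficient.
    """
    n, p = X.shape                                   # n observations, p predictors
    A = np.column_stack([X, np.ones(n)])             # append the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)    # penalise each extra predictor
    return beta, adj_r2

# Hypothetical data, NOT the article's measurements.
rng = np.random.default_rng(0)
n = 40
sugar = rng.uniform(1, 5, n)                 # assumed: spoons of sugar
sf = rng.integers(1, 6, n).astype(float)     # assumed: stirring frequency on a 1-5 scale
sweetness = 0.5 * sugar + 2.0 * sugar * sf + 1.0 + rng.normal(0, 0.5, n)

beta, adj_r2 = fit_ols(np.column_stack([sugar, sf]), sweetness)
print("coefficients (sugar, sf, intercept):", np.round(beta, 2))
print("adjusted R^2:", round(float(adj_r2), 4))
```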

Although the adjusted R² is already good enough, we create one more variable, sugar*stirring freq, and run the regression model again with the assumed relationship Sweetness = c1 * sugar + c2 * sugar * sf + c3, where sf is the stirring frequency.

Eureka! The adjusted R² is now 0.9966.
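The interaction model is still linear in its coefficients c1, c2, c3, so the same least-squares machinery fits it; the only change is an extra column holding sugar × sf. The sketch below, on invented data where stirring matters more when there is more sugar, compares the plain linear model with the article's form. `adjusted_r2` is a hypothetical helper name. Both models have two predictors, so here the adjusted R² comparison ranks them the same way plain R² would.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    n, p = X.shape
    A = np.column_stack([X, np.ones(n)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data in which stirring matters more when there is more sugar.
rng = np.random.default_rng(0)
n = 40
sugar = rng.uniform(1, 5, n)
sf = rng.integers(1, 6, n).astype(float)
sweetness = 0.5 * sugar + 2.0 * sugar * sf + 1.0 + rng.normal(0, 0.5, n)

linear = np.column_stack([sugar, sf])               # c1*sugar + c2*sf + c3
interaction = np.column_stack([sugar, sugar * sf])  # c1*sugar + c2*sugar*sf + c3
adj_linear = adjusted_r2(linear, sweetness)
adj_interaction = adjusted_r2(interaction, sweetness)
print(f"linear: {adj_linear:.4f}  interaction: {adj_interaction:.4f}")
```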

A valid question now is: how could you have thought of adding a variable like sugar*sf? True, the choices are plenty; we could just as well have tried sugar*sugar*sf or sf*sf.
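One brute-force alternative is to add each candidate term to the linear model in turn and compare adjusted R². The sketch below does that for the three candidates named above, on invented data (again with a built-in sugar × sf effect); `adjusted_r2` is a hypothetical helper. It works for three candidates, but the number of possible products grows quickly as variables and orders are added, which is why exploratory analysis matters.

```python
import numpy as np

def adjusted_r2(X, y):
    """Adjusted R^2 of a least-squares fit of y on the columns of X plus an intercept."""
    n, p = X.shape
    A = np.column_stack([X, np.ones(n)])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Hypothetical data with a sugar x stirring interaction.
rng = np.random.default_rng(1)
n = 100
sugar = rng.uniform(1, 5, n)
sf = rng.integers(1, 6, n).astype(float)
sweetness = 0.5 * sugar + 2.0 * sugar * sf + 1.0 + rng.normal(0, 0.2, n)

# Candidate extra terms from the text; each is added to the plain linear model.
candidates = {
    "sugar*sf": sugar * sf,
    "sugar*sugar*sf": sugar * sugar * sf,
    "sf*sf": sf * sf,
}
base = [sugar, sf]
scores = {name: adjusted_r2(np.column_stack(base + [term]), sweetness)
          for name, term in candidates.items()}
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:>15}: {score:.4f}")
best = max(scores, key=scores.get)
```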

The answer lies in exploratory data analysis. It is always helpful and insightful to plot each explanatory variable against the dependent variable and see how they change with respect to each other.


Looking at the plot above, it would not have been difficult to arrive at the equation we tried. Variables like these are called interaction variables. Think of the experiment we just ran: it seems logical that stirring would have more effect on sweetness when the quantity of sugar is high, and vice versa.
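The plot itself is not preserved, but the same intuition can be checked numerically. Under the fitted form Sweetness = c1 * sugar + c2 * sugar * sf + c3, the change in sweetness per unit of stirring is c2 * sugar, so it grows with sugar; an additive model would give the same slope at every sugar level. The sketch below, on invented data, splits observations at the median sugar quantity and fits a straight line of sweetness against stirring in each half.

```python
import numpy as np

# Hypothetical data with a sugar x stirring interaction.
rng = np.random.default_rng(2)
n = 100
sugar = rng.uniform(1, 5, n)
sf = rng.integers(1, 6, n).astype(float)
sweetness = 0.5 * sugar + 2.0 * sugar * sf + 1.0 + rng.normal(0, 0.5, n)

high = sugar > np.median(sugar)   # True for the high-sugar half
# np.polyfit(x, y, 1) returns [slope, intercept]
slope_low = np.polyfit(sf[~high], sweetness[~high], 1)[0]
slope_high = np.polyfit(sf[high], sweetness[high], 1)[0]
print(f"sweetness per unit of stirring: low sugar {slope_low:.2f}, high sugar {slope_high:.2f}")
```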

A good look at, and understanding of, the explanatory variables is the best guide to creating interaction variables, but when higher-order interaction effects exist, this gets cumbersome. AID (Automatic Interaction Detection) and CHAID (Chi-squared Automatic Interaction Detection) are statistical methods that have been developed to save us from this strenuous mental exercise.
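AID repeatedly splits the sample into two groups on whichever predictor and cut point most reduce the within-group sum of squares of the response; when the best split inside one branch differs from the best split inside the other, that asymmetry hints at an interaction. CHAID follows a similar tree idea but chooses splits with significance tests (chi-square for categorical responses) and allows multiway splits. The sketch below is a simplified, AID-style illustration only: `best_split` and `aid_tree` are hypothetical names, the stopping rule is just a fixed depth and a minimum group size, the data are invented, and it does not implement CHAID.

```python
import numpy as np

def best_split(X, y, names, min_size=5):
    """Best binary split 'x_j <= t' by reduction in within-group sum of squares.

    Returns (name, threshold, gain), or None if no split leaves at least
    min_size observations on both sides.
    """
    total_sse = ((y - y.mean()) ** 2).sum()
    best = None
    for j, name in enumerate(names):
        for t in np.unique(X[:, j])[:-1]:       # every observed value except the largest
            left = X[:, j] <= t
            if left.sum() < min_size or (~left).sum() < min_size:
                continue
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            gain = total_sse - sse
            if best is None or gain > best[2]:
                best = (name, float(t), float(gain))
    return best

def aid_tree(X, y, names, depth=2, min_size=5):
    """Grow a small regression tree by repeated best splits, in the spirit of AID."""
    split = best_split(X, y, names, min_size) if depth > 0 else None
    if split is None:
        return {"mean": round(float(y.mean()), 2), "n": int(len(y))}
    name, t, _ = split
    left = X[:, names.index(name)] <= t
    return {
        "split": f"{name} <= {t:g}",
        "left": aid_tree(X[left], y[left], names, depth - 1, min_size),
        "right": aid_tree(X[~left], y[~left], names, depth - 1, min_size),
    }

# Hypothetical tea data with a sugar x stirring interaction.
rng = np.random.default_rng(3)
n = 200
sugar = rng.uniform(1, 5, n)
sf = rng.integers(1, 6, n).astype(float)
sweetness = 0.5 * sugar + 2.0 * sugar * sf + 1.0 + rng.normal(0, 0.5, n)

tree = aid_tree(np.column_stack([sugar, sf]), sweetness, ["sugar", "sf"])
print(tree)
```

Comparing the leaf means in the printed tree shows how much stirring changes sweetness within each sugar branch; unequal gaps between branches are the signal an analyst would follow up with an explicit interaction term.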

A real-life example:

  • Sale of opera tickets: A statistical profile of opera ticket buyers reveals that they are both highly educated and upper income. This information can be leveraged to build a model of opera ticket buyers, but as we know, not everyone in the upper-income segment is highly educated, nor does everyone who is highly educated belong to the high-income stratum. In this case we would like a third variable that reflects the fact that a person is both highly educated and upper income. This third variable is an interaction variable.
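For yes/no attributes like these, the interaction variable can be built as the product of two 0/1 indicators, which equals 1 only when both conditions hold (a logical AND). The sketch below uses a made-up list of five people; the variable names are illustrative, not from any real dataset.

```python
import numpy as np

# Hypothetical survey of five people: 1 = yes, 0 = no.
highly_educated = np.array([1, 1, 0, 0, 1])
upper_income = np.array([1, 0, 1, 0, 1])

# For 0/1 indicators the product is the same as a logical AND,
# so this column is 1 only for people who are both.
educated_and_upper_income = highly_educated * upper_income
print(educated_and_upper_income)  # → [1 0 0 0 1]
```

In a model, this new column would typically sit alongside the two original indicators, so the model can give the "both" group an effect beyond the sum of the two separate effects.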

Written by SK

August 9, 2009 at 12:54 pm
