A few days ago, Google released a new experimental service called Google Correlate. It is similar to Google Trends in that it analyses the number of times search terms are used on Google. The difference, as the name suggests, is that it lets you find correlations between different search terms. In simple terms, a correlation shows how often a high level of one thing is found at the same time as a high level of another. Correlation doesn’t necessarily mean that one thing causes the other, just that the two things are seen together. For example, the number of wrinkles on someone’s forehead is correlated with the number of heart attacks they have had. Wrinkles don’t cause heart attacks, or vice versa, but there is a correlation between them (one increases as the other increases).
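To make the idea concrete, here is a minimal sketch of how a correlation coefficient can be computed. It assumes the r value is the standard Pearson coefficient, which runs from -1 (one series goes up as the other goes down) through 0 (no linear relationship) to +1 (they rise and fall together). The function name pearson_r and the weekly counts are made up purely for illustration; they are not real search data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the mean (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly search counts for three terms (illustrative only).
term_a = [10, 12, 15, 30, 45, 20, 11]
term_b = [8, 9, 14, 28, 40, 18, 10]    # rises and falls with term_a
term_c = [40, 35, 30, 12, 9, 25, 38]   # moves in the opposite direction

print(pearson_r(term_a, term_b))  # strongly positive, close to +1
print(pearson_r(term_a, term_c))  # strongly negative, close to -1
```

Note that nothing in the calculation knows or cares why the two series move together, which is exactly why a high r says nothing about cause.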
When you do a search on Google Correlate, it returns the search terms that are most highly correlated with the terms you entered. Unfortunately, all too often the top results are just misspellings of those terms, such as ‘obama’ being correlated with ‘0bama’ (Obama misspelt with a zero). A bigger shortcoming is that Google Correlate launched without the feature I think would be the most fun: finding the correlation between any two terms. I have made a little page that will force Google Correlate to do this for you.
Using this method I came up with some interesting observations. My first search was to look at Easter and Christmas:
You can see that as each of these holidays approaches, the number of searches increases. However, they aren’t correlated (r=-0.1292). In fact, if you look below you’ll notice that when people are searching for one, they are pretty much not searching for the other!
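Here is a rough sketch of why two seasonal terms can both show clear peaks and still end up with an r close to zero, or slightly negative. The two series are invented (a year of weekly values with one spike each, at hypothetical week numbers), not real Google data, and the code again assumes r is the ordinary Pearson coefficient.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def seasonal_spike(week, peak_week):
    """Baseline interest of 1, plus a triangular spike centred on peak_week."""
    return 1 + max(0, 10 - 2 * abs(week - peak_week))

weeks = range(52)
# Hypothetical peak positions: spring for one term, late December for the other.
easter_like = [seasonal_spike(w, peak_week=14) for w in weeks]
christmas_like = [seasonal_spike(w, peak_week=50) for w in weeks]

r = pearson_r(easter_like, christmas_like)
print(r)  # slightly negative: the spikes never overlap
```

Whenever one series is at its peak the other is sitting at its baseline, below its own average, so the two pull in opposite directions during the spikes and the result comes out a little below zero.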
I expected two terms that would be more correlated with each other to be Easter and chocolate. To my surprise, that’s not the case. In fact, there’s a bigger correlation between Christmas and chocolate (see below).
Another feature of Google Correlate is that it allows you to map out where in the US terms are highly correlated. So it turns out that in states where people are searching for Christmas, they are also searching for signs of colon cancer:
Does this mean that Christmas causes colon cancer? No, Scrooge, it doesn’t. It simply means that the states that search for one also search for the other. Maybe it has to do with the average socio-economic background of those states, or with religious beliefs (or lack thereof) that lead a person to worry about what might be growing inside their belly.
The last couple of features I would like to introduce to you are the data upload and draw features. If you have your own raw data, you can upload it to see what it correlates with. As most of us don’t have data to upload, it is much more fun to use the draw tool to find correlations. One curve that I drew at random was a wave that gets high around 2006, drops down, goes up again towards 2009, then drops again. It turns out that the most highly correlated term is “2006 mercedes benz”. “2006 audi” and “2006 jaguar” are also highly correlated with it. It seems that people are more interested in buying luxury cars about 3 years after they are released, to give them time to depreciate a little.