Figure 1. Frequency from 2/2004 to 2/2008
Figure 2. Frequency from 4/2005 to 2/2008
Every once in a while, I get the inclination to try to do a little math. It’s a dangerous endeavor, but sometimes I can’t help myself. But before I get ahead of myself, I should give a little background.
Sometimes you learn a word and all of a sudden you see it everywhere. The nagging question is: was it always being used and you just glossed over it, or has the frequency of its use changed? Perhaps you discovered the word precisely because it was starting to be used far more often.
Not too long ago, I found the word “awesomeness” entering my general vocabulary to signify great approval. Awesomeness feels fresh, a slight tweak on the now-vintage “awesome” of the 80s, a personally influential era. However, in the past week, I’ve seen the word Awesomeness appear in a lot of places, from friends and strangers alike. I started wondering whether the word was gaining traction with the public at large, so last weekend I set out to find ways to measure the growing use of the word.
Counting word frequencies is nothing new. Media studies scholars have been doing this kind of research on newspapers and magazines for decades. However, it is becoming much more democratic, mostly because of the decreasing cost of computing and the growing reach of the web, which have made collecting data much easier. I decided to use the blogosphere because it seems like a pretty good proxy for general language usage. Also, the Google blog search feature allows you to set the dates for your search. Of course, there are drawbacks to using Google’s counts, including that the blogosphere is obviously a subset of general usage and that I have no idea how Google tracks and tallies the blogs it indexes. Nevertheless, I can live with the approximation, given that the only cost of retrieving the data is bandwidth and time.
I searched for “awesome,” “awesomeness,” and “are” for each month from February 2004 to February 2008.
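Searching every month from February 2004 through February 2008 works out to 49 date-restricted searches per word. As a minimal sketch (the function name `monthly_windows` is hypothetical, and nothing here talks to Google or models its search interface), this snippet just lists the first and last day of each month so every search covers one full calendar month:

```python
import calendar
from datetime import date


def monthly_windows(start_year, start_month, end_year, end_month):
    """Return (first_day, last_day) pairs for every month in the inclusive range."""
    windows = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        # calendar.monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        windows.append((date(year, month, 1), date(year, month, last_day)))
        # Step to the next month, rolling over to January after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return windows


windows = monthly_windows(2004, 2, 2008, 2)
print(len(windows))  # 49 searches per word
print(windows[0])    # 2004 was a leap year, so this window ends on February 29
```

With three words, that is 147 searches in total.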
The word “are” was used as a control of sorts. Because the blogosphere itself is constantly growing, the number of times any word appears in a given month is expected to increase. A very common word such as “are” can serve as a proxy for the overall growth of the blogosphere, since one would not expect its rate of use to fluctuate much from month to month. Any true increase in a word’s usage would have to outpace the growth of “are.”
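The “outpace the control” test can be written as a ratio of ratios: divide a word’s month-over-month growth by the growth of “are.” Here is a minimal sketch, assuming plain count ratios are a fair stand-in for that comparison (the helper name `relative_growth` is hypothetical), using the January and February 2008 counts reported further down:

```python
def relative_growth(word_before, word_after, control_before, control_after):
    """Word's growth factor divided by the control word's growth factor.

    Above 1: the word outpaced the control ("are").
    At or below 1: blog growth alone could explain the increase.
    """
    word_ratio = word_after / word_before
    control_ratio = control_after / control_before
    return word_ratio / control_ratio


# January 2008 -> February 2008 counts from the post.
are_jan, are_feb = 57_214_958, 61_531_049
awesomeness = relative_growth(9_627, 17_182, are_jan, are_feb)
awesome = relative_growth(429_769, 736_783, are_jan, are_feb)
print(round(awesomeness, 2))  # 1.66
print(round(awesome, 2))      # 1.59
```

Both values are above 1, so over this one month both words outpaced “are,” though a single month is a thin sample.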
I also tracked the usage of the root word “awesome” for a couple of reasons. Sometimes search engines lump different variations of the same root word together in their search results, and I wanted to check that “awesomeness” wasn’t being counted together with “awesome.” The two also make an interesting comparison. If both increased at similar rates, then maybe what I am seeing is just an overall revival of 80s idioms. However, if “awesomeness” is increasing at a greater rate than “awesome,” my original suspicions would be validated.
In the short time since I started on this little math adventure, the word kept appearing, and while writing up my findings, I came across the ultimate reference: a website that apparently declared March 10 (last week) International Day of Awesomeness.
Looking at just the words’ appearances in the past two full months, this is what I found:
February 2008: Awesomeness: 17,182; Awesome: 736,783; Are: 61,531,049
January 2008: Awesomeness: 9,627; Awesome: 429,769; Are: 57,214,958
There is a clear jump in the past two months, but what does that jump mean if the total number of blog pages continues to grow? Both Awesomeness and Awesome nearly doubled from January to February, while Are grew by less than 10 percent, but how do you measure the significance of that? Graphing the frequencies of all three words together is hard because the counts for “are” run orders of magnitude higher than the others. My math coach Pam (yes, I actually call her that) suggested I take the log of my data to make it more comparable. If you recall high school math, log(1,000,000) = 6, log(100,000) = 5, and log(10,000) = 4. If you take the log of all your data points, the curves fit on a single graph of manageable size. It gets even better, because the log turns an exponential curve into a straight line, which makes finding the growth rate (i.e., the slope of the line, its rise over run) much easier.
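To make Pam’s log trick concrete, here is a small sketch. The first loop checks the high school examples; the made-up series growing 5% a month (purely an illustrative assumption, not the post’s data) shows why an exponential curve becomes a straight line in log space; and the last lines take the log step between January and February 2008 using the counts above. `log_step` is a hypothetical helper name.

```python
import math

# High school examples: each factor of 10 adds 1 to the base-10 log.
for n in (1_000_000, 100_000, 10_000):
    print(n, math.log10(n))

# A made-up count growing 5% a month is an exponential curve...
counts = [1000 * 1.05 ** t for t in range(12)]
logs = [math.log10(c) for c in counts]
# ...but in log space every monthly step has the same rise, log10(1.05).
steps = [after - before for before, after in zip(logs, logs[1:])]
print([round(s, 4) for s in steps])


def log_step(before, after):
    """Rise in log10 space from one month to the next."""
    return math.log10(after) - math.log10(before)


# January 2008 -> February 2008 counts from the post.
awesomeness_step = log_step(9_627, 17_182)
are_step = log_step(57_214_958, 61_531_049)
print(round(awesomeness_step, 3), round(are_step, 3))
```

The gap between the two steps, roughly 0.22, is the log of how much faster Awesomeness grew than “are” over that one month.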
If you look at the two figures, you’ll notice an upward trend. There is a peculiar elbow in the spring of 2005, which could be a big spurt of growth or some aberration in Google’s indexing. After looking at the first graph, I decided to draw a second graph focusing on the more recent growth of all three terms and to concentrate the analysis there. I wanted to fit a straight line to each curve, and removing the bend would give me a closer fit. (Is that cheating?)
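Fitting only the part of the data after a visible break is arguably a reasonable judgment call, as long as the choice of window is stated. The sketch below uses made-up log10 counts with an artificial elbow (not the post’s data) and an ordinary least-squares line, the same kind of fit a linear trendline in Excel produces, to show how a single line across the bend fits worse (lower R²) than a line fitted after it. `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = a + b*x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Made-up log10 counts over 49 months: steep growth for 15 months, gentler after.
months = list(range(49))
logs = [3.0 + 0.05 * t if t < 15 else 3.7 + 0.01 * (t - 14) for t in months]

whole = fit_line(months, logs)              # one line straight across the elbow
after = fit_line(months[15:], logs[15:])    # only the months after the elbow
print("whole series: slope", round(whole[0], 4), "R^2", round(whole[2], 3))
print("after elbow:  slope", round(after[0], 4), "R^2", round(after[2], 3))
```

The line across the elbow ends up with a slope somewhere between the two regimes, which describes neither of them well.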
I fed the curves into Excel to fit linear functions and found that Awesomeness has a slope of 0.0003, versus 0.0001 for Awesome. This is good, because it means Awesomeness is growing about three times as fast as Awesome. However, Are (our baseline) has a slope of 0.001. This is sort of bad, because it means 80s slang doesn’t seem to be outpacing the general growth of blogs, which is what I was hoping to see.
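One way to read the fitted slopes, assuming they are slopes of log10(count) against whatever x-axis the spreadsheet used (the unit isn’t stated, so “per step” below means per unit of that axis): a slope b multiplies the count by 10^b per step, the ratio of two slopes compares growth speeds, and subtracting the slope of “are” gives the trend of a word’s share of blog text. A sketch using the slopes reported above:

```python
# Slopes of the fitted lines as reported in the post (log10 count per x-axis step).
slopes = {"awesomeness": 0.0003, "awesome": 0.0001, "are": 0.001}

# A log10 slope b means the count is multiplied by 10**b each step.
growth_factor = {word: 10 ** b for word, b in slopes.items()}

# How many times faster (in log terms) Awesomeness grows than Awesome.
speedup = slopes["awesomeness"] / slopes["awesome"]

# log10(word / "are") has slope b_word - b_are; negative means a shrinking share.
share_trend = {w: slopes[w] - slopes["are"] for w in ("awesomeness", "awesome")}

print(growth_factor)
print(round(speedup, 2))  # 3.0
print(share_trend)
```

Both share trends come out negative, which matches the conclusion that neither word is yet outpacing the overall growth of blogs.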
I’m not sure what to make of it in the end. However, it does have me thinking about blogs from a higher altitude, and thinking that math is pretty awesome. Many thanks to Pam and Wojciech, who gave me some good nudges. Of course, I’ll take the blame for the conclusions. I’m curious to hear what my math friends say, especially if they find mistakes in my logic. Also, it has taken me much too long to post this, which is why I’m just throwing up what I have. I’ll post any corrections, including typos, later.