Martian vacation hotspots: why aliens love Seattle.

I’ve been having a lovely time playing with a dataset containing 60,000 UFO sightings. They come with locations (city/state), and a free-text description field that’s pretty fascinating to read.

To get an idea of the distribution of sightings, I plotted the frequency of UFO sightings by state, normalized by the number of residents. There’s a high number of sightings in northern New England, but the west coast seems to be the biggest epicenter of UFO sightings. Washington state in particular has vastly more UFO sightings per capita than any other state.


Clearly this indicates that when aliens visit Earth, they want to visit the space needle. To get a better sense of what the people reporting the sightings were experiencing, I made a word cloud of the free-text descriptions accompanying the reports from Washington.


Lots of bright lights moving around in the sky!

Continue reading

Freeing my Fitbit data

I love my Fitbit, but I want to play with the data myself, not just look at the graphs the fine folks at Fitbit have decided I should see. Fortunately, there’s a great step-by-step guide with a script to download Fitbit data into a Google Drive spreadsheet, which handles interfacing with the Fitbit API. 

I downloaded my data beginning with April 2012 when I started using Fitbit. There were 18 days without any steps recorded (days when I forgot to wear the Fitbit or it ran out of batteries, plus a few days when it broke and I had to have it replaced), which I eliminated from the analysis. Other than those few days, I have a complete record of my activity for the past 21 months, which is pretty cool. 

I used Python to plot the daily step counts along with a time-smoothed rolling average to make it easier to see long-term trends. 


The most obvious features to me are the high-step-count spikes during the summer, when I’m most likely to go adventuring outdoors on the weekends. To get a better sense of the distribution of my steps, I plotted a histogram:


I average about 10,000 – 11,000 steps per day, but there’s a lot of variation. I usually end up recording at least 12,000 steps on days when I bike to work, because I find that each pedal rotation registers as a step. I suspect that the tail on the left side of the histogram (<5000 step days) are mostly days when I was sick or otherwise feeling down, because I generally find it downright difficult to walk fewer than 5000 steps in a day unless I have a cold or similar. 

Finally, I broke the histogram down by days of the week. It got a little messy to look at, so I used gaussian kernel smoothing to turn the histogram into a density plot that’s easier on the eyes: 


Most of the days are similar to each other, but Saturday clearly has the most variation, and the most days with very high step counts. In fact, I walk 15,000+ steps 17% of the time on Saturdays, but less than 10% of the time on other days of the week. And on that note, the dog and I are going for a walk. 

Here’s the code I used to generate the graphs: 

[gist /]

Dental year-in-review

Earlier this year in May, I started tracking my dental hygiene habits by giving myself a sticker on a calendar every time I flossed my teeth. This started because I really hate flossing — there’s nothing inherently fun or rewarding about it, so I wondered if there was anything I could do to incentivize it or make it more fun.

I just kept going with it, and now I have eight months of flossing history to look at. I entered the data into Excel (giving myself a ‘1’ on days where I had a sticker on the calendar and a ‘0’ otherwise), exported it as a CSV, and used python to graph the results broken down by days of the week.

I used python’s matplotlib to make the graph, then I also used RPy to make the graph using R since I’ve never really directly compared graphs made using the two methods. Here’s the matplotlib plot:


and here’s the RPy plot:


I prefer the bars of the RPy plot because they line up nicely without any effort, but I do enjoy how matplotlib turned the y-axis labels horizontal for me. Of course for any more complex plotting I think the advantages of RPy would become much more apparent.

But more importantly: what can I learn about my oral health from this analysis? My flossing habits during the week are quite good, with >80% success rate regardless of the day, but on the weekends I tend to sleep in, fail to follow a standardized morning routine, and consequently end up promising myself I’ll floss at night (hint: it never happens). Don’t tell my dentist.

Here’s the code I used to generate these images: