Searching for rainy tweets
To help assess the potential of using social media for early-warning and public safety communications, we wanted to explore whether there was a Twitter ‘signal’ from local rain events. The key question was whether the data contained enough geographic metadata to detect that signal at the local level. As described in Part 1 of this series, we interviewed managers of early-warning systems across the United States and, with their help, identified ten rain events of local significance. In our previous post, we presented data from two events in Las Vegas that showed promise in finding a correlation between a local rain gauge and Twitter data.
We continue our discussion by looking at an extreme rain and flood event that occurred in Louisville, KY, on August 4-5, 2009. During this storm, rainfall rates exceeded 8 inches per hour, producing widespread flooding. In hydrologic terms, this event has been characterized as having a 1000-year return period.
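A return period T is conventionally read as an annual exceedance probability of 1/T, so a 1000-year event has roughly a 0.1% chance of being equaled or exceeded in any given year. The sketch below works through that arithmetic, assuming independent years (the usual simplification behind the term); the function names are illustrative only.

```python
def annual_exceedance_probability(return_period_years: float) -> float:
    """Chance that an event with this return period is equaled or exceeded in one year."""
    return 1.0 / return_period_years


def probability_within(return_period_years: float, years: int) -> float:
    """Chance of at least one such event in `years` years, assuming independent years."""
    p = annual_exceedance_probability(return_period_years)
    return 1.0 - (1.0 - p) ** years


print(annual_exceedance_probability(1000))      # 0.001
print(round(probability_within(1000, 100), 3))  # 0.095
```

In other words, even over a century, a 1000-year storm at a given location has less than a 10% chance of occurring.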
During this 48-hour period in 2009, approximately 30 million tweets were posted from around the world. (While that may seem like a lot of tweets, keep in mind that there are now more than 400 million tweets per day.) Using “filtering” methods based on weather-related keywords and geographic metadata, we set out to find a local Twitter response to this particular rain event.
Domain-based Searching – Developing your business logic
Our first round of filtering focused on developing a set of “business logic” keywords around our domain of interest, in this case rain events. Deciding how to filter data from any social media firehose is an iterative process: you analyze the collected data, learn from it, and refine the filters. Since we were focusing on rain events, we searched for words containing the substring “rain,” along with other weather-related words. Accordingly, we started with this set of keywords and substrings:
- Keywords: weather, hail, lightning, pouring
- Substrings: rain, storm, flood, precip
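A first-pass filter over these terms can be sketched as below. The post does not say how matching was implemented, so two choices here are assumptions: keywords are matched as whole words, and substrings are matched anywhere in the lower-cased tweet text. The third example shows the kind of false positive the substring “rain” lets through.

```python
import re

# First-pass terms from the list above.
KEYWORDS = {"weather", "hail", "lightning", "pouring"}  # assumed: whole-word matches
SUBSTRINGS = ("rain", "storm", "flood", "precip")       # assumed: match anywhere in text


def first_pass_match(text: str) -> bool:
    """Return True if a tweet contains any first-pass keyword or substring."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    return bool(KEYWORDS & words) or any(s in lowered for s in SUBSTRINGS)


tweets = [
    "It is pouring in Louisville right now",
    "Flash flooding on the interstate",
    "Signing up for a new training program",  # matches only because of "t-rain-ing"
    "Great day at the park",
]
print([first_pass_match(t) for t in tweets])  # [True, True, True, False]
```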
Applying these filters to the 30 million tweets resulted in approximately 630,000 matches. We soon found that many of these tweets were about training programs, brain dumps, and hundreds of other topics whose words happen to contain the substring ‘rain.’ So we adjusted our filters, including narrowing the rain terms to the specific keywords of interest: rain, raining, rainfall, and rained. By using these domain-specific words, we removed non-rain ‘noise’ and reduced the matches by over 28%, ending up with approximately 450,000 rain- and weather-related tweets from around the world. But how many were from the Louisville area?
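The refinement can be sketched by swapping the “rain” substring for whole-word matches on the four rain keywords. The post does not list its full final filter, so keeping the other first-pass terms unchanged is an assumption made for this example. The last lines check the reported drop from about 630,000 to about 450,000 matches.

```python
import re

# Assumption: only the "rain" substring is replaced; the other first-pass terms stay.
RAIN_WORDS = {"rain", "raining", "rainfall", "rained"}
OTHER_KEYWORDS = {"weather", "hail", "lightning", "pouring"}
OTHER_SUBSTRINGS = ("storm", "flood", "precip")


def refined_match(text: str) -> bool:
    """Return True if a tweet matches the refined, rain-specific filter."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    if words & (RAIN_WORDS | OTHER_KEYWORDS):
        return True
    return any(s in lowered for s in OTHER_SUBSTRINGS)


print(refined_match("Signing up for a new training program"))  # False
print(refined_match("Brain dump before the exam"))             # False
print(refined_match("It rained all night"))                    # True

# Relative reduction in matches, using the approximate counts reported above.
reduction = (630_000 - 450_000) / 630_000
print(f"{reduction:.1%}")  # 28.6%
```

Whole-word matching trades some recall for precision: forms such as “rains” or “rainy” are no longer caught unless they are added to the keyword list.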
Finding Tweets at the County and City Level – Finding the needle in the haystack
The second step was mining this Twitter data for geographic metadata that would allow us to geo-reference these weather-related tweets to the Louisville, KY area. There are generally three methods for geo-referencing Twitter data:
- Activity Location: tweets that are geo-tagged by the user.
- Profile Location: parsing the Twitter account profile location provided by the user, for example:
  - “I live in Louisville, home of the Derby!”
- Mentioned Location: parsing the tweet message for a geographic location, for example:
  - “I’m in Louisville and it is raining cats and dogs”
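The three methods can be illustrated with a small classifier that reports which of them tie a tweet to a place. Everything here is hypothetical: the `Tweet` record, its field names, and the bounding-box test used for Activity Location are simplifications for this sketch, not the actual Twitter data format or the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical area bounds: (min_lat, min_lon, max_lat, max_lon)
BBox = Tuple[float, float, float, float]


@dataclass
class Tweet:
    """Hypothetical, simplified tweet record; real Twitter payloads carry many more fields."""
    text: str
    profile_location: str = ""
    coordinates: Optional[Tuple[float, float]] = None  # (lat, lon) when geo-tagged


def geo_methods(tweet: Tweet, place: str, bbox: Optional[BBox] = None) -> List[str]:
    """Return which of the three geo-referencing methods tie this tweet to `place`."""
    methods: List[str] = []
    # Activity Location: the user's geo-tag falls inside the area's bounding box.
    if tweet.coordinates is not None and bbox is not None:
        lat, lon = tweet.coordinates
        min_lat, min_lon, max_lat, max_lon = bbox
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            methods.append("activity")
    # Profile Location: the account's free-text location mentions the place.
    if place.lower() in tweet.profile_location.lower():
        methods.append("profile")
    # Mentioned Location: the tweet body itself mentions the place.
    if place.lower() in tweet.text.lower():
        methods.append("mentioned")
    return methods


t = Tweet(text="I'm in Louisville and it is raining cats and dogs",
          profile_location="I live in Louisville, home of the Derby!")
print(geo_methods(t, "Louisville"))  # ['profile', 'mentioned']
```

Plain substring matching on a place name cannot tell Louisville, KY from Louisville, CO, so a real filter needs extra disambiguation rules.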
Having a tweet explicitly tied to a specific location or a Twitter Place is extremely useful for any geographic analysis. However, fewer than 2% of tweets have an Activity Location, and geo-tagged tweets were not available at all for this 2009 event. Given that, what chance did we have of correlating tweet activity with local rain events?
For this event, we searched for any tweet that used one of our weather-related terms and either mentioned “Louisville” in the tweet or came from a Twitter account whose Profile Location included “Louisville.” It’s worth noting that since we live near Louisville, CO, we explicitly excluded account locations that mentioned “CO” or “Colorado.” (By the way, the Twitter Profile Geo Enrichments announced yesterday would have really helped our efforts.)
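The combined rule (a weather term, plus “Louisville” in either the tweet or the profile location, minus Colorado profiles) can be sketched as a single predicate. The matching details are assumptions: the weather term list is a simplified stand-in for the refined filter, “Louisville” is matched case-insensitively anywhere, and “CO” is matched only as an upper-case whole word so that words like “county” are not excluded.

```python
import re

# Assumed, simplified weather terms standing in for the refined filter.
WEATHER_WORDS = {"rain", "raining", "rainfall", "rained",
                 "weather", "hail", "lightning", "pouring"}
WEATHER_SUBSTRINGS = ("storm", "flood", "precip")


def is_weather(text: str) -> bool:
    """True if the tweet text contains a weather term."""
    lowered = text.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    return bool(words & WEATHER_WORDS) or any(s in lowered for s in WEATHER_SUBSTRINGS)


def is_colorado(location: str) -> bool:
    """True if a profile location names Colorado ('CO' as a whole word, or 'Colorado')."""
    return bool(re.search(r"\bCO\b", location)) or "colorado" in location.lower()


def louisville_match(text: str, profile_location: str) -> bool:
    """Weather term AND Louisville in text or profile, excluding Colorado profiles."""
    if not is_weather(text) or is_colorado(profile_location):
        return False
    return "louisville" in text.lower() or "louisville" in profile_location.lower()


print(louisville_match("Flooding downtown, stay safe", "Louisville, KY"))   # True
print(louisville_match("Heavy rain here tonight", "Louisville, CO"))        # False
print(louisville_match("Raining hard in Louisville!", ""))                  # True
print(louisville_match("Nice sunny day in Louisville", "Louisville, KY"))   # False
```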
After applying these geographic filters, the number of tweets went from 457,000 to 4,085. So, based on these tweets, did we have any success in finding a Twitter response to this extreme rain event in Louisville?
Did Louisville Tweet about this event?
Figure 1 compares tweets per hour with hourly rainfall from a gauge located just west of downtown Louisville on the Ohio River. As with the Las Vegas data presented previously, the tweets occurring during the rain event display a clear response, especially when compared to the “baseline” level of tweets before the event occurred. Tweets around this event spiked as the storm entered the Louisville area. The number of tweets per hour peaked as the heaviest rain hit central Louisville and remained elevated as the flooding aftermath unfolded.
Figure 1 – Louisville, KY, August 4-5, 2009. Event had 4,085 activities, baseline had 178.
Other examples of Twitter signal compared with local rain gauges
Figure 2 – Boulder event 1, July 13-14, 2011. Event had 1,620 activities, baseline had 546.
Figure 3 – Boulder event 2, July 29-30, 2012. Event had 507 activities, baseline had 416.
Figure 4 – San Diego event, December 19-23, 2010. Event had 15,529 activities, baseline had 2,673.
Figure 5 – Santa Barbara event 1, December 17-23, 2010. Event had 1,724 activities, baseline had 204.
Figure 6 – Santa Barbara event 2, March 19-22, 2011. Event had 856 activities, baseline had 154.
Figure 7 – Little Rock event 1, October 9-10, 2009. Event had 327 activities, baseline had 203.
Figure 8 – Little Rock event 2, November 21-22, 2011. Event had 2,248 activities, baseline had 151.
Based on the ten events we analyzed, it is clear that social media is a popular method of public communication during significant rain and flood events.
In Part 3, we’ll discuss the opportunities and challenges social media communication brings to government agencies charged with public safety and operating early-warning systems.