[Pycon] [new paper] "Michael Salib" - Does rain cause traffic delays? Weather and geophysical processing for data scientists.

info at pycon.it
Thu 3 Jan 2019 17:14:50 CET


Title: Does rain cause traffic delays? Weather and geophysical processing for data scientists.
Duration: 45 (includes Q&A)
Q&A Session: 0
Language: en
Type: Talk

Abstract: We're going to learn how rain impacts traffic delays in Chicago! I'll present a worked example of correlating traffic delays with historical weather data (the same techniques also work for real-time weather data and short-term forecasts).

We'll be using congestion data from Chicago derived from the city's bus system. For weather data, we're going to use NOAA's High-Resolution Rapid Refresh (HRRR) dataset, a free gridded (raster) data product that describes weather variables like:

 - temperature
 - wind direction and speed
 - rain rate

Along the way, we'll learn how to deal with rasterized gridded weather data products. That means learning:

 - how to handle weird data formats
 - how to perform simple geospatial transforms and joins, since each data variable is stored in an array whose elements correspond to points on a conic map projection grid (HRRR uses a Lambert conformal conic projection)
 - how to use a special protocol to pull down only the data fields we need, for the region of time and space that we care about, because HRRR data covers the entire US, has many data fields, and is therefore a massive dataset
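
To make the geospatial transform step concrete, here is a minimal sketch of mapping a latitude/longitude pair to the nearest cell of a Lambert conformal conic grid, using only the standard library. The grid parameters (Earth radius, standard parallel, central meridian, 3 km spacing, first grid point, grid shape, south-to-north row order) are assumptions based on the commonly published HRRR CONUS grid definition and should be checked against the metadata of the files you actually download. The function names `lcc_forward` and `nearest_cell` are hypothetical, and the Chicago coordinates are approximate.

```python
import math

# Assumed HRRR CONUS grid parameters; verify against the file's own metadata.
EARTH_RADIUS_M = 6_371_229.0             # spherical Earth radius in metres
STD_PARALLEL_DEG = 38.5                  # single standard parallel and latitude of origin
CENTRAL_LON_DEG = -97.5                  # central meridian
GRID_SPACING_M = 3_000.0                 # cell size in projected metres
FIRST_POINT = (21.138123, -122.719528)   # lat, lon of cell (row 0, col 0)
GRID_SHAPE = (1059, 1799)                # rows (south to north), cols (west to east)


def _t(phi):
    """Helper term tan(pi/4 + phi/2) used by the conic projection."""
    return math.tan(math.pi / 4 + phi / 2)


def lcc_forward(lat_deg, lon_deg):
    """Project lat/lon to x, y in metres (spherical, one standard parallel)."""
    phi = math.radians(lat_deg)
    phi1 = math.radians(STD_PARALLEL_DEG)
    dlam = math.radians(lon_deg - CENTRAL_LON_DEG)
    n = math.sin(phi1)                            # cone constant
    f = math.cos(phi1) * _t(phi1) ** n / n
    rho = EARTH_RADIUS_M * f / _t(phi) ** n       # radius for this latitude
    rho0 = EARTH_RADIUS_M * f / _t(phi1) ** n     # radius at the origin latitude
    return rho * math.sin(n * dlam), rho0 - rho * math.cos(n * dlam)


def nearest_cell(lat_deg, lon_deg):
    """Return (row, col) of the grid cell nearest to the given point."""
    x0, y0 = lcc_forward(*FIRST_POINT)
    x, y = lcc_forward(lat_deg, lon_deg)
    col = round((x - x0) / GRID_SPACING_M)
    row = round((y - y0) / GRID_SPACING_M)
    if not (0 <= row < GRID_SHAPE[0] and 0 <= col < GRID_SHAPE[1]):
        raise ValueError("point lies outside the grid")
    return row, col


# Approximate coordinates of downtown Chicago.
chicago_row, chicago_col = nearest_cell(41.8781, -87.6298)
print(chicago_row, chicago_col)
```

In practice a projection library would normally do this work; the hand-written version is only meant to show that the "join" between a bus route's coordinates and a weather grid reduces to a projection followed by integer division by the cell size.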

I'm focusing on intermediate Python programmers who have had some experience with NumPy or pandas but know little about weather or geospatial data processing. That definitely includes data scientists who are not application developers. There is a huge amount of weather data and satellite imagery available for free that could drive all sorts of cool applications and analyses, but there are too many barriers to entry in this field. I want to dismantle some of those barriers.

I want attendees to walk away knowing how to find and fetch free weather data, the basics of how geophysical datasets are stored (i.e., gridded data rasters with metadata), and how to extract a geographical and temporal subset from a large rain dataset. I also want them to develop a sense for how we measure meteorological data and how reliable those measurements are (and when they're not).
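
As an illustration of extracting a geographical and temporal subset, the following sketch slices a stack of gridded rain-rate fields by time window and bounding box with NumPy. Everything here is made up for the example: the data are random numbers on a synthetic 0.1-degree grid, not HRRR output, and the `subset` function and its arguments are hypothetical names. Latitude and longitude are kept as 2-D arrays because, on a projected grid, they generally vary along both axes.

```python
import numpy as np

# Synthetic stand-in for hourly rain-rate grids with shape (time, row, col).
rng = np.random.default_rng(0)
times = np.datetime64("2019-01-03T00") + np.arange(24) * np.timedelta64(1, "h")
lons, lats = np.meshgrid(np.linspace(-91.0, -85.0, 61),
                         np.linspace(40.0, 44.0, 41))
rain = rng.random((24, 41, 61))


def subset(data, times, lats, lons, start, end, bbox):
    """Cut data to the window [start, end) and a (south, north, west, east) box."""
    south, north, west, east = bbox
    t_mask = (times >= start) & (times < end)
    inside = (lats >= south) & (lats <= north) & (lons >= west) & (lons <= east)
    rows = np.flatnonzero(inside.any(axis=1))
    cols = np.flatnonzero(inside.any(axis=0))
    if rows.size == 0 or cols.size == 0:
        raise ValueError("bounding box does not overlap the grid")
    # Smallest index rectangle containing every cell inside the box.
    r = slice(rows[0], rows[-1] + 1)
    c = slice(cols[0], cols[-1] + 1)
    return data[t_mask, r, c], times[t_mask], lats[r, c], lons[r, c]


# A rough box around Chicago, 06:00 to 18:00 on the synthetic day.
sub, sub_times, sub_lats, sub_lons = subset(
    rain, times, lats, lons,
    np.datetime64("2019-01-03T06"), np.datetime64("2019-01-03T18"),
    bbox=(41.55, 42.15, -88.05, -87.45),
)
print(sub.shape)
```

The same masking logic carries over to real files once the variable, time axis, and coordinate arrays have been read from the file's metadata.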



Tags: analytics, data-analysis, matplotlib, bigdata, scientific-computing, pandas, pydata

