The answer is provided by Generative Adversarial Networks (GANs) and Bayesian Networks (BNs): a GAN gives me an approximation of the distribution, a BN conditions this distribution on a given route, and temperatures are then shifted to warmer conditions.
As a skier, I have already observed in situ the impact of climate change: snow coming later in the season, less snow at medium elevations… As a data scientist, I am wondering what will happen next: with a warmer climate, will it still be possible to ride the same slopes?
Ski touring is the off-road variant of resort skiing: on the way up it looks like trail running or trekking, but on skis; on the way down it looks like free-ride, even though the skis are slimmer and lighter. As with off-road vehicles, the playground is much larger than the ski resorts.
However, this activity depends on the quality and quantity of the snow cover, and on the avalanche risk. Getting prior information on the routes to follow is the raison d’être of Camp-To-Camp. Up-to-date information is collected through outing reports, somewhat similar to Strava’s reports but with more quantitative and qualitative information on the outdoor conditions. Reports are an indication of the conditions: no report, or a bad report, on a given route or date probably means that conditions are not fit.
For my machine learning task, I selected outing reports from the Mont Blanc region, in the French department of Haute-Savoie. It is the most varied ski region, with summit elevations ranging from 1000 m up to 4810 m at Mont Blanc. Yet the region is not wide enough to have radically different climate conditions. The outing reports span from 2009 to 2019. After discarding all incomplete reports, there is a total of 6656 outing reports.
Reports are both textual and quantitative. In the previous post “Full NLP use case with fastText and Tensorflow 2”, I focused on the textual description of the routes. In this post, I will deal with the quantitative features of the outing reports. I have selected the following features, which are relevant to our goal and widely available in reports:
- Maximum elevation [meter], which is a feature of the landscape, but also a choice of the skiers (e.g. only high elevations are compatible with skiing in May).
- Skis on, way up [meter], telling where the snow cover becomes thick enough to use skis (ski tourers are accustomed to carrying skis on the backpack), filled with Minimum elevation if missing.
- Skis off, way down [meter], similar to the previous feature but on the way down, which requires more snow or a different route (e.g. partially up with a cable car, and down on the skis), filled with Minimum elevation if missing.
- Ski rating, defined for the route (not the outing), on the scale of the publisher Volopress, from 1.1 to 5.5; “normal” routes (accessible to everyone) lie between 2.1 and 3.3. It is indirectly an indication of the terrain topography (slope, cover…).
- Condition rating, a subjective evaluation of the outing by the skier on a 5-level scale from awful to excellent. It is transformed from ordinal to numerical.
- Day of Season, which is extracted from the Start date as the offset to the closest February 15th, the middle of winter. It is transformed from ordinal to numerical.
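The feature preparation described in this list can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the intermediate labels of the condition scale (the reports only say it runs from awful to excellent), and the choice of a signed offset (negative before February 15th) are mine, not taken from the original code.

```python
from datetime import date

# Hypothetical labels: only "awful" and "excellent" are given in the text,
# the three intermediate levels are assumed for illustration.
CONDITION_LEVELS = ["awful", "poor", "average", "good", "excellent"]


def day_of_season(start: date) -> int:
    """Signed offset in days between the start date and the closest February 15th.

    Negative values fall before mid-February, positive values after
    (the sign convention is an assumption).
    """
    candidates = [date(start.year + k, 2, 15) for k in (-1, 0, 1)]
    closest = min(candidates, key=lambda ref: abs((start - ref).days))
    return (start - closest).days


def encode_condition(label: str) -> int:
    """Map an ordinal condition label to its rank, from 0 (awful) to 4 (excellent)."""
    return CONDITION_LEVELS.index(label)


def fill_elevation(value, minimum_elevation):
    """Fall back to the minimum elevation when skis on/off elevation is missing."""
    return minimum_elevation if value is None else value
```

With this convention, an outing on Christmas Day gets a negative day of season and an outing in May a positive one.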
There are also subjective evaluations of the snow cover quantity and quality but they were missing on most reports.
Weather conditions are captured through temperatures. I used the morning temperature [°C] from www.historique-data.net for the resort of Megève, which is located at circa 1000 m elevation. The actual temperature at the outing spot is different, but it is correlated. To account for weather conditions in the days leading up to the outing, two extra features are derived from the morning temperature: its averages over the 7 and 30 days before the outing.
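As a minimal sketch of these two derived features, the trailing averages can be computed from a mapping of dates to morning temperatures. The function and key names are hypothetical, and I assume here that the window excludes the outing day itself and that days without a reading are simply skipped.

```python
from datetime import date, timedelta


def trailing_mean(temps: dict, day: date, window: int):
    """Mean morning temperature over the `window` days before `day`.

    The outing day itself is excluded (an assumption); days with no
    reading are skipped. Returns None if the window holds no reading.
    """
    values = [
        temps[day - timedelta(days=k)]
        for k in range(1, window + 1)
        if day - timedelta(days=k) in temps
    ]
    return sum(values) / len(values) if values else None


def temperature_features(temps: dict, day: date) -> dict:
    """Build the three temperature features for one outing date."""
    return {
        "temp_day": temps.get(day),                 # morning temperature of the day
        "temp_7d": trailing_mean(temps, day, 7),    # average of the previous 7 days
        "temp_30d": trailing_mean(temps, day, 30),  # average of the previous 30 days
    }
```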
In total, I now have 9 features that quantitatively describe a ski outing.
The correlation between features is quite informative. The three temperatures are correlated, even the temperature of the day and the 30-day average. Elevations are also correlated, with an expected strong correlation between the skis-on and skis-off elevations. Day of season is mostly correlated with the three temperatures and the maximum elevation. But condition rating has little correlation with any other feature; it is even less correlated than ski rating, which depends mostly on the topography!
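For readers who want to reproduce this kind of analysis, here is a small sketch of a pairwise correlation matrix. The post does not say which coefficient was used; I assume the Pearson coefficient here, and the column names passed in are up to the caller.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences.

    Raises ZeroDivisionError if one of the sequences is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def correlation_matrix(columns: dict) -> dict:
    """Nested dict {feature_a: {feature_b: correlation}} over all feature pairs."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
```

Reading one row of the result, for example the condition rating row, shows at a glance how weakly or strongly a feature relates to all the others.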
Looking at marginal distributions using histograms, I find that:
- Seasonality is evident: not many outings in summertime 🙂
- Outings are quite uniformly distributed across the years over the range 2010–2019, which is good for statistics
- The season spans from 100 days before to 110 days after February 15th
The observation time range (2010–2019) is too short to account for year-to-year variations and give strong indications of climate evolution. Let’s compare the beginning (2010–2012) and the end (2017–2019) of the observations anyway.
In the following histogram, the day of season distribution changes on the left-hand side of the central peak: the season starts later. Local observations confirm that snow conditions are becoming good later in the season, after Christmas, sometimes after mid-January.
The morning temperature has shifted up with a mean change of 1.8°C!
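The comparison between the two periods boils down to a difference of means. A minimal sketch, assuming the data is available as (year, morning temperature) pairs; the function name and the input layout are hypothetical, and only the period boundaries come from the text above.

```python
from statistics import mean


def mean_shift(records, early=(2010, 2012), late=(2017, 2019)):
    """Difference between the mean temperature of the late and early periods.

    `records` is an iterable of (year, morning_temperature) pairs;
    period bounds are inclusive. A positive result means warming.
    """
    records = list(records)  # allow generators, since we iterate twice
    early_vals = [t for year, t in records if early[0] <= year <= early[1]]
    late_vals = [t for year, t in records if late[0] <= year <= late[1]]
    return mean(late_vals) - mean(early_vals)
```

Years outside both periods (2013–2016) are ignored by this comparison.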