# The ensemble of CMIP6 daily predictor variables for statistical downscaling

Coupled Model Intercomparison Project Phase 5 (CMIP5) predictors documentation is available here

This is the technical documentation for the daily predictor variables of a subset of Coupled Model Intercomparison Project Phase 6 (CMIP6) global climate models (GCMs) that can be used for statistical downscaling. The documentation provides a general description of the datasets included in the ensemble of predictor variables, the methodology for how the variables were created, a description of how the folders and files of data are organized for download, and a summary of how CanESM2 and CanESM5 predictor variables available through the Canadian Climate Data and Scenarios (CCDS) site may differ.

Overview
1. Equilibrium climate sensitivity
2. Description of input datasets
3. Preprocessing of predictor variables
4. Format of predictor datasets
5. Differences between CanESM5 and CanESM2 predictor variables
6. Dataset licence
7. References

## Overview

One way of obtaining local-scale climate change scenarios is to use regression-based statistical downscaling of GCMs. In this approach, an empirical relationship between GCM predictors (i.e., near-surface and upper-level atmospheric circulation variables) and surface predictands (such as observed temperature or precipitation from a station) is derived using linear or non-linear transfer functions. For this purpose, an ensemble of daily predictor variables was produced from CanESM5, MPI-ESM1.2-HR, NorESM2-MM, and two reanalysis datasets.

A total of 26 predictor variables are included in each ensemble, composed of both raw and derived variables, with multiple atmospheric variables available at three different pressure levels. Predictor variables are available at the daily scale on a 64 by 128 latitude-longitude global Gaussian grid with T42 spectral truncation. For each GCM, the historical simulation is available for 1979-2014, and the following Shared Socioeconomic Pathways (SSPs) are available for 2015-2100: the four Tier 1 SSPs prioritized by the Intergovernmental Panel on Climate Change (IPCC) and the Scenario Model Intercomparison Project (ScenarioMIP) (SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5), and SSP1-1.9 (due to its relevance for the Paris Agreement) (Reference 7). Two reanalysis dataset options are available for the historical period 1979-2014 (ECMWF ERA5 and NCEP-DOE Reanalysis 2).

The choice of GCMs for inclusion in the CMIP6 predictors dataset was determined by three factors. Firstly, the equilibrium climate sensitivity (ECS) must have been calculated according to the Gregory methodology, and the selected GCMs must cover a range of ECS values (see sections 1.1. and 1.2.). Secondly, the GCM must have run the historical simulation and as many of the five SSPs as possible (SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, and SSP5-8.5). Thirdly, for the relevant simulations, the seven base variables at all three included pressure levels (if applicable) must be available for download on the Earth System Grid Federation (ESGF) website.

## 1. Equilibrium climate sensitivity

Inclusion of a subset of CMIP6 GCMs for the CMIP6 predictors dataset was done with the intent to include GCMs that span a range of ECS values. This includes CanESM5 and GCMs produced by other, non-Canadian, modelling organizations.

### 1.1. What is equilibrium climate sensitivity?

Understanding the Earth’s response to changes in atmospheric carbon dioxide (CO2), and determining its sensitivity to any perturbation in CO2 level, is a fundamental goal for those studying climate science (Reference 13). Equilibrium climate sensitivity (ECS) is one of the earliest and simplest concepts applied to gauge the climate sensitivity of climate models, and it is used extensively by the climate modelling community (Reference 4).

ECS is a hypothetical value representing the increase in globally averaged surface temperature once a climate system reaches equilibrium after an instantaneous doubling of atmospheric CO2 concentration (References 4 and 13). The most common method for calculating the ECS of a GCM is the Gregory method, which has been used both before and since CMIP5 and produces a value also termed the effective ECS (Reference 4). Using the Gregory method, atmospheric CO2 is instantaneously quadrupled, instead of doubled, and the model is run for 150 years, instead of to equilibrium (Reference 4). The surface temperature at equilibrium can then be extrapolated for a doubling of CO2, assuming that the response of the model is roughly linear, so that a doubling produces half of the warming expected from a quadrupling of CO2 (Reference 4).
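The extrapolation step can be illustrated with a minimal sketch. It assumes the standard form of the Gregory regression, which is not spelled out in this documentation: the annual-mean net top-of-atmosphere radiative imbalance is regressed against the annual-mean surface temperature anomaly, the regression line is extended to zero imbalance, and the result is halved. The function name `gregory_ecs` and the synthetic input values are hypothetical.

```python
def gregory_ecs(delta_t, net_flux):
    """Estimate effective ECS (K) from an abrupt quadrupled-CO2 run.

    delta_t:  annual-mean global surface temperature anomalies (K)
    net_flux: annual-mean net top-of-atmosphere imbalance (W m-2)
    Assumption: ordinary least squares fit of net_flux against delta_t.
    """
    n = len(delta_t)
    if n != len(net_flux) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_t = sum(delta_t) / n
    mean_f = sum(net_flux) / n
    sxx = sum((t - mean_t) ** 2 for t in delta_t)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(delta_t, net_flux))
    slope = sxy / sxx                      # feedback parameter (negative if stable)
    intercept = mean_f - slope * mean_t    # effective forcing for quadrupled CO2
    warming_4x = -intercept / slope        # temperature anomaly where imbalance is zero
    return warming_4x / 2.0                # halve for a doubling of CO2


# Synthetic 150-year series that lies exactly on the line N = 7.4 - 1.0 * dT
temps = [0.05 * i for i in range(1, 151)]
fluxes = [7.4 - 1.0 * t for t in temps]
ecs = gregory_ecs(temps, fluxes)
```

With real model output the points scatter around the line, so the estimate depends on the 150-year fitting window named in the text.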

### 1.2. Involvement of ECS in the IPCC assessment reports and CMIPs

Since the establishment of ECS as a standard metric of climate sensitivity and model response to atmospheric CO2, the CMIP has mandated that every contributing GCM estimate the ECS as one of the requirements for participation (Reference 4). The Diagnostic, Evaluation and Characterization of Klima (DECK) experiments are experiments for which every GCM must produce simulations as a condition for entry into the CMIP (Reference 4). The instantaneous quadrupling of CO2 and the resultant run form one of these required experiments (Reference 4). As such, each generation of GCMs prepared for the CMIPs has produced an ECS range. While past ranges have been quite consistent with each other over generations of models (1.5 K to 4.5 K), the GCMs participating in the current CMIP (phase 6) have produced a wider ECS range (1.8 K to 5.6 K), with a greater number of models producing higher values and numerous models exceeding the previous upper limit of the range (References 4 and 13). Calculations of ECS started with the first IPCC report in the 1990s, and the CMIP6 ECS range is the largest of any generation of models since that time (Reference 4).

### 1.3. Limitations of ECS

It should be noted that ECS is an uncertain quantity and not without weaknesses or assumptions. As previously mentioned, ECS is a hypothetical quantity, as a large instantaneous change in atmospheric CO2 is not a realistic scenario for a climate system. An instantaneous change does not allow for any time-dependent or time-varying responses and effects such as feedbacks. The ECS also does not measure any quantity aside from the change in temperature. However, despite any shortcomings due to its simplicity, ECS is a widely used measure in climate science, as it provides highly relevant information about how a climate system responds to perturbations and about targets for global temperature thresholds (Reference 13).

### 1.4. Selection of CMIP6 GCMs for the predictors dataset

Based on the criteria listed in the Overview section, the GCMs currently included in the CMIP6 predictors dataset are CanESM5, NorESM2-MM, and MPI-ESM1.2-HR. In the case that more than one version of the same model met the aforementioned criteria, the model with the higher resolution was selected. As models with a finer grid generally are able to reproduce climate responses and systems with less error and bias when compared to observations, preference was shown for models with higher atmospheric spatial resolution. Additional models may be added to the predictors dataset in the near future. See Table 1 for a full list of all datasets included in the predictors ensemble and an overview of each dataset.

Table 1. Availability of datasets for each model included in the predictors ensemble. Pressure levels apply to all non-surface-level variables. Reanalysis datasets do not have an ECS or a variant ID, and are only available for the historical time period; therefore, multiple columns are marked ‘not applicable’ (n/a).

| Model | ECS (K) | Pressure levels (hPa) | SSPs | Variant | Leap years |
|---|---|---|---|---|---|
| ECMWF ERA5 | n/a | 500, 850, 1000 | n/a | n/a | Yes |
| NCEP-DOE Reanalysis 2 | n/a | 500, 850, 1000 | n/a | n/a | Yes |
| CanESM5 | 5.6 | 500, 850, 1000 | SSP1-1.9, SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 | r1i1p1f1 | No |
| NorESM2-MM | 2.5 | 500, 850, 1000 | SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 | r1i1p1f1 | No |
| MPI-ESM1.2-HR | 3.0 | 500, 850, 1000 | SSP1-2.6, SSP2-4.5, SSP3-7.0, SSP5-8.5 | r1i1p1f1 | Yes |

## 2. Description of input datasets

### 2.1. Reanalysis datasets

The National Centers for Environmental Prediction-Department of Energy (NCEP-DOE) Atmospheric Model Intercomparison Project (AMIP)-II Reanalysis (also called NCEP-DOE Reanalysis 2) as well as European Centre for Medium-Range Weather Forecasts (ECMWF) Atmospheric Reanalysis Fifth Generation (ERA5) datasets were included as part of the predictors dataset.

NCEP-DOE Reanalysis 2 is an improved version of its predecessor, NCEP/NCAR Reanalysis 1, as it includes updated parameterizations of physical processes and error fixes (Reference 6). ERA5 builds on past ECMWF reanalysis datasets, includes the latest systems and features, and was constructed using research and information from ECMWF and ECMWF partners (Reference 2). ERA5, compared to ERA-Interim, provides higher spatial and temporal resolution, and has advancements such as an improved troposphere, improved representation of tropical cyclones, better global balance of precipitation and evaporation, better precipitation over land in the deep tropics, better soil moisture, and more consistent sea surface temperature and sea ice (Reference 1).

### 2.2. CanESM5

The Canadian Earth System Model version 5 (CanESM5) experiments were prepared as part of CMIP6. CanESM5 is the current version of the Canadian Centre for Climate Modelling and Analysis (CCCma) earth system model and is an updated version of CanESM2, which was made available for CMIP5. For additional details on CanESM5, please see Swart et al. (2019) (Reference 9).

### 2.3. MPI-ESM1.2-HR

The Max Planck Institute for Meteorology Earth System Model version 1.2 (MPI-ESM1.2) experiments were prepared as part of CMIP6. The MPI-ESM1.2 is the current version of the Max Planck Institute for Meteorology’s GCM and is an updated version of the MPI-ESM prepared for CMIP5. Five coupled model configurations of the MPI-ESM1.2 are available, though only two of these meet the inclusion criteria for the predictors dataset as defined in the Overview section. These versions are the MPI-ESM1.2-LR, a low-resolution version, and the MPI-ESM1.2-HR, a high-resolution version. The atmospheric grid spacing of these models is approximately 200 km and 100 km, respectively. The ECS of both versions of the MPI-ESM1.2 is the same, as it was tuned explicitly to 3 K. Therefore, the version with the higher resolution, MPI-ESM1.2-HR, was included in the ensemble of predictor variables (see section 1.4.). For additional details on the MPI-ESM1.2-HR, please see Mauritsen et al. (2019) and Müller et al. (2018) (References 3 and 5).

### 2.4. NorESM2-MM

The Norwegian Earth System Model version 2 (NorESM2) experiments were prepared as part of CMIP6. The NorESM2 is the current version of the Norwegian Climate Center’s GCM and is an updated version of the NorESM1 prepared for CMIP5. Like its predecessor, the NorESM1, the NorESM2 was produced in multiple versions, primarily a low-resolution version (NorESM2-LM) and a medium-resolution version (NorESM2-MM). The atmosphere-land resolution of these models is approximately 2° and 1°, respectively. The two resolutions of NorESM2 have very similar ECS values, at 2.54 K for NorESM2-LM and 2.50 K for NorESM2-MM. Thus, the higher-resolution NorESM2-MM was selected for inclusion in the ensemble of predictor variables (see section 1.4.). For additional details on the NorESM2-MM, please see Seland et al. (2020) (Reference 8).

## 3. Preprocessing of predictor variables

CanESM5, NorESM2-MM, MPI-ESM1.2-HR, and NCEP-DOE data are available from online databases in NetCDF format as global daily time series. Since ERA5 data are only available at hourly or monthly time frequencies, daily means were calculated using values from the hours of 00:00, 06:00, 12:00, and 18:00. These time steps were chosen because NCEP-DOE daily means are calculated using the same times. Total precipitation was the only ERA5 variable downloaded for all 24 hours, as it was the only variable calculated as a sum and not a mean value. The method for calculating daily total precipitation was based on the method provided by the ECMWF (Reference 12). It should be noted that, at the time the predictor variables were calculated, ERA5 data prior to 1979 were not yet available; thus, the sum of daily total precipitation for January 1, 1979 only begins at 07:00.
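The two kinds of daily aggregation described above can be sketched as follows. This is a simplified illustration, not the processing code: it assumes a plain list of hourly values that starts at 00:00 and holds exactly 24 values per day, and it ignores the accumulation-period conventions that the ECMWF recipe for daily total precipitation takes into account. The function names are hypothetical.

```python
SYNOPTIC_HOURS = (0, 6, 12, 18)  # hours used for the daily means


def daily_mean_from_hourly(hourly, hours=SYNOPTIC_HOURS):
    """Average only the selected hours of each day."""
    if len(hourly) % 24 != 0:
        raise ValueError("expected a whole number of days (24 values per day)")
    means = []
    for start in range(0, len(hourly), 24):
        day = hourly[start:start + 24]
        means.append(sum(day[h] for h in hours) / len(hours))
    return means


def daily_total_from_hourly(hourly):
    """Sum all 24 hourly values of each day (used for precipitation)."""
    if len(hourly) % 24 != 0:
        raise ValueError("expected a whole number of days (24 values per day)")
    return [sum(hourly[s:s + 24]) for s in range(0, len(hourly), 24)]


# Two synthetic days of hourly values 0, 1, ..., 47
hourly_values = [float(v) for v in range(48)]
means = daily_mean_from_hourly(hourly_values)
totals = daily_total_from_hourly(hourly_values)
```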

ERA5 data were downloaded in NetCDF format using the Copernicus Climate Data Store (CDS) API for the hours 00:00, 06:00, 12:00, and 18:00. Surface-level variables were downloaded from the ‘reanalysis-era5-single-levels’ dataset and multiple-level variables from the ‘reanalysis-era5-pressure-levels’ dataset. Variables were downloaded on the native grid (0.25° × 0.25°) without altering the grid or interpolating the data in the API request. Interpolation occurred at a later step, using the same method as was used for the NCEP-DOE Reanalysis 2 variables to ensure consistency (see section 3.5.).
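A request of this kind might be assembled as in the sketch below. The dataset names and the four hours come from the text above; the request keys and the variable name follow the conventions of the `cdsapi` Python client as commonly documented, and should be treated as assumptions to check against the current CDS documentation (newer versions of the service may expect a differently named key for the output format). No grid or area key is set, so the data are requested without regridding. The helper names are hypothetical.

```python
def build_era5_request(variable, year, pressure_levels=None):
    """Return (dataset name, request dictionary) for one variable and year."""
    request = {
        "product_type": "reanalysis",
        "variable": variable,
        "year": str(year),
        "month": [f"{m:02d}" for m in range(1, 13)],
        "day": [f"{d:02d}" for d in range(1, 32)],
        "time": ["00:00", "06:00", "12:00", "18:00"],
        "format": "netcdf",  # assumption: key name may differ by API version
    }
    if pressure_levels is None:
        dataset = "reanalysis-era5-single-levels"
    else:
        dataset = "reanalysis-era5-pressure-levels"
        request["pressure_level"] = [str(p) for p in pressure_levels]
    return dataset, request


def download(dataset, request, target):
    """Submit the request; needs the third-party cdsapi package and CDS credentials."""
    import cdsapi  # imported here so the builder works without the package
    cdsapi.Client().retrieve(dataset, request, target)


dataset, request = build_era5_request("u_component_of_wind", 1979, [500, 850, 1000])
```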

### 3.2. Variables

The variables downloaded from each climate dataset are listed in Table 2. Variables that did not require further analysis are listed as ‘raw’ in Table 2. In addition to these variables, four wind variables were derived from the U- and V-wind components using NCAR Command Language (NCL) functions. ERA5 datasets provide two of these derived variables (divergence and relative vorticity) directly, so they did not need to be calculated for ERA5. A further variable, specific humidity, needed to be calculated only for the NCEP-DOE Reanalysis 2 dataset, using air temperature and relative humidity. The other datasets provide specific humidity directly; it is therefore listed as a raw variable in Table 2.

### 3.3. Programs used for processing

The ensemble of scripts used to extract and process the datasets as well as formulate predictor files were executed on a Unix system in a Bourne-again shell (bash) environment. Python version 3.7.6 and NCL version 6.6.2 were used to produce the predictors (specific functions named in section 3.5. and in Table 2). The majority of the preprocessing methodology was the same across all datasets with the goal of producing datasets that are comparable.

Main steps of preprocessing:

1. Isolate pressure levels (multiple-level variables; see section 3.4.)
2. Convert variables to double precision
3. Interpolation (all datasets except CanESM5)
4. Calculate derived variables
5. Conversion of units (if necessary)
6. Standardization

Table 2. Basic description of raw and derived predictor variables. NCL functions used to calculate the derived variables are also listed in the type column.

| Variable | Unit | Level | Type (NCL function) |
|---|---|---|---|
| Air temperature | °C | 2 metres | Raw |
| Total precipitation | mm | Surface | Raw |
| Mean sea level pressure | Pa | Mean sea level | Raw |
| Specific humidity¹ | kg/kg | Pressure levels | Raw (mixhum_ptrh) |
| Geopotential height | m | Pressure levels | Raw |
| Zonal wind | m/s | Pressure levels | Raw |
| Meridional wind | m/s | Pressure levels | Raw |
| Divergence²,³ | s⁻¹ | Pressure levels | Derived (uv2dvG_Wrap) |
| Relative vorticity²,³ | s⁻¹ | Pressure levels | Derived (uv2dvG_Wrap) |
| Wind direction²,⁴ | 0-360° | Pressure levels | Derived (wind_direction) |
| Wind speed² | m/s | Pressure levels | Derived (wind_speed) |

¹ Variable not available and derived for NCEP-DOE predictors only.
² Variable derived using the listed NCL function and the U- and V-wind components.
³ Variable available in the ERA5 dataset and therefore not calculated using the NCL function for ERA5 predictors.
⁴ Wind direction in degrees corresponds to: 0° pointing north, 90° pointing east, 180° pointing south, and 270° pointing west.
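As an illustration of the two simplest derived variables, the sketch below computes wind speed and wind direction from the U and V components. It assumes the meteorological convention (the direction the wind blows from, measured clockwise from north), which is how the NCL `wind_direction` function is understood to behave; the treatment of calm winds and the wrap of 360° to 0° are choices made here for the sketch, not details taken from the documentation.

```python
import math


def wind_speed(u, v):
    """Wind speed (m/s) from zonal (u) and meridional (v) components."""
    return math.hypot(u, v)


def wind_direction(u, v):
    """Direction (degrees) the wind blows FROM: 0 = north, 90 = east.

    Assumptions: calm wind returns 0, and due north is reported as 0
    rather than 360.
    """
    if u == 0.0 and v == 0.0:
        return 0.0
    return (math.degrees(math.atan2(u, v)) + 180.0) % 360.0


speed = wind_speed(3.0, 4.0)
northerly = wind_direction(0.0, -10.0)   # air moving southward, i.e. from the north
easterly = wind_direction(-10.0, 0.0)    # air moving westward, i.e. from the east
```

Divergence and relative vorticity are not sketched here because they depend on spherical-harmonic routines on the Gaussian grid.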

### 3.4. Double precision and pressure level isolation

Desired pressure levels (1000, 850, and 500 hPa) were isolated for multiple level atmospheric variables. This step was not necessary for ERA5 data as pressure levels can be selected and downloaded individually. Values were then converted to double precision prior to any calculation to retain as much raw original information as possible.
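A minimal sketch of this step, assuming the data are held in a NumPy array whose first axis is the pressure level (the array layout and the function name `select_levels` are assumptions made for illustration):

```python
import numpy as np


def select_levels(data, levels_hpa, wanted=(1000.0, 850.0, 500.0)):
    """Keep only the wanted pressure levels and return float64 values.

    data:       array with pressure level as its first axis
    levels_hpa: pressure (hPa) of each entry along that axis
    """
    levels = np.asarray(levels_hpa, dtype=np.float64)
    indices = []
    for level in wanted:
        matches = np.flatnonzero(np.isclose(levels, level))
        if matches.size == 0:
            raise ValueError(f"pressure level {level} hPa not found")
        indices.append(int(matches[0]))
    # Convert to double precision before any further calculation
    return np.asarray(data)[indices].astype(np.float64)


raw = np.arange(5, dtype=np.float32).reshape(5, 1)
subset = select_levels(raw, [1000, 925, 850, 700, 500])
```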

### 3.5. Interpolation

Additional preprocessing for all datasets except CanESM5 consisted primarily of interpolation. Both reanalysis datasets and the GCMs MPI-ESM1.2-HR and NorESM2-MM were interpolated to match the T42 global Gaussian grid of the CanESM5 data using the specialized NCL function ‘f2gsh_Wrap’. The function interpolates scalar values on fixed grids onto a Gaussian grid with optional triangular truncation, which, in this case, was set to 42. The resultant grid has 64 latitudes by 128 longitudes, with a uniform longitudinal spacing of 2.8125° and a nearly uniform latitudinal spacing of approximately 2.79° (see Table 5 and Table 6). ERA5 data required the additional step of conversion from hourly to daily datasets. The NCL function ‘calculate_daily_value’ was used, following conversion to double precision, to calculate daily means from hourly ERA5 data. The sum of daily total precipitation for the ERA5 dataset was calculated in Python using the ‘Dataset.resample’ function of the Xarray package.
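The coordinates of the target grid can be reproduced independently of NCL. The sketch below assumes the usual definition of a Gaussian grid, in which the latitudes are the arcsines of the Gauss-Legendre quadrature nodes and the longitudes are equally spaced starting at 0°. It only generates grid coordinates and does not perform the spectral interpolation done by ‘f2gsh_Wrap’.

```python
import numpy as np


def gaussian_grid(nlat=64, nlon=128):
    """Return (latitudes south to north, longitudes east of Greenwich) in degrees."""
    nodes, _weights = np.polynomial.legendre.leggauss(nlat)  # nodes ascend from -1 to 1
    lats = np.degrees(np.arcsin(nodes))
    lons = np.arange(nlon) * (360.0 / nlon)
    return lats, lons


lats, lons = gaussian_grid()
```

The first latitude produced this way should agree with the southernmost value listed in Table 5, which gives a quick consistency check on the grid definition.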

### 3.6. Unit Conversion

Unit conversion was done for a few variables.

All GCMs and NCEP-DOE Reanalysis 2:

• 2 m air temperature (K converted to °C)
• total precipitation (kg/m²/s converted to mm/day)

ECMWF ERA5:

• 2 m air temperature (K converted to °C)
• total precipitation (m/day converted to mm/day)
• geopotential height (geopotential (m²/s²) converted to geopotential height (m))
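The conversions listed above amount to the following arithmetic. This sketch assumes that GCM precipitation is a flux in kg m⁻² s⁻¹ (the usual CMIP convention, where 1 kg of water per square metre corresponds to a depth of 1 mm) and that geopotential is divided by the standard gravity of 9.80665 m s⁻²; the exact constants used in the original processing are not stated in this documentation.

```python
SECONDS_PER_DAY = 86400.0
STANDARD_GRAVITY = 9.80665  # m s-2 (assumed constant)


def kelvin_to_celsius(temperature_k):
    return temperature_k - 273.15


def precip_flux_to_mm_per_day(flux_kg_m2_s):
    # 1 kg m-2 of liquid water is a 1 mm layer, so only the time unit changes
    return flux_kg_m2_s * SECONDS_PER_DAY


def metres_to_mm(depth_m):
    return depth_m * 1000.0


def geopotential_to_height(geopotential_m2_s2):
    return geopotential_m2_s2 / STANDARD_GRAVITY
```

For example, a precipitation flux of 1e-5 kg m⁻² s⁻¹ corresponds to roughly 0.86 mm/day.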

### 3.7. Standardization

The final step in producing the predictor variables was to standardize the values according to the historical reference period, 1981-2010, for each dataset (each GCM, NCEP-DOE, ERA5) while retaining the original values. Standardization is, in this case, according to a long-term climatic mean and standard deviation over the historical reference period. The 1981-2010 date range was selected as the reference period for standardization of the CMIP6 predictor variables as it is commonly used in climate science (References 10 and 11). All predictor variables were standardized according to the 1981-2010 reference period except for wind direction, for which a standardized value would serve no purpose. Wind direction is a circular variable that is not continuous and not normally distributed, and it varies drastically in space and time. Additionally, standardizing wind direction would remove all information relating to direction. Standardized values (n) are produced from predictor values (x) using the mean (µ) and standard deviation (σ) over the 1981-2010 reference period, calculated separately for each data source and each grid box, with the following expression:

$n_i = \frac{x_i - \mu_{1981\text{-}2010}}{\sigma_{1981\text{-}2010}}$
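The expression can be applied to a single grid-box series as in the sketch below. It assumes that each value is tagged with its calendar year and that the population standard deviation is used; the documentation does not state whether the population or sample form was applied, so this is an assumption. The function name is hypothetical.

```python
import statistics


def standardize(values, years, base_start=1981, base_end=2010):
    """Standardize a series against the mean and standard deviation of a baseline period."""
    baseline = [x for x, y in zip(values, years) if base_start <= y <= base_end]
    if not baseline:
        raise ValueError("no values fall inside the baseline period")
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("baseline standard deviation is zero")
    return [(x - mu) / sigma for x in values]


# Only the three middle values fall inside 1981-2010
series = [1.0, 2.0, 3.0, 4.0, 10.0]
series_years = [1980, 1981, 1990, 2010, 2014]
standardized = standardize(series, series_years)
```

The same function can be reused with a different `base_start` and `base_end` on the original (non-standardized) files if another baseline is preferred.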

## 4. Format of predictor datasets

### 4.1. Structure of grid-box directories and predictor files

Each grid cell is numbered according to its indexed latitude and longitude coordinates. For each grid cell, a folder named Box_iiiX_jjY can be downloaded, where iii is the longitudinal index, ranging from 001 to 128, and jj is the latitudinal index, ranging from 01 to 64 (see Table 5 and Table 6). Each grid-box folder contains subfolders identifying the source of the dataset used to calculate the predictor variables and the year range. GCMs have an individual subfolder for each simulation (historical and each SSP), while reanalysis datasets have one subfolder each. Within each subfolder is a second set of subfolders that separate standardized and original values. A detailed description of folder names is given in Table 3.

Folders of original data (i.e., not standardized) contain 26 predictor variables, while folders of standardized data contain 23 predictor variables as wind direction, at all three pressure levels, was not standardized. Each file contains one column of data in a csv format. The naming structure of the files is derived from the CMIP6 naming template with each file using the format:

variable ID_time frequency_source ID_experiment ID_member ID_grid label_time range_type.csv

Variable IDs, or variable names, are listed below in Table 4, and file formats for each source dataset can be found in Table 3. It should be noted that reanalysis datasets do not possess member IDs and, therefore, the label is omitted from their file names. Grid label is ‘gn’ for CanESM5 predictors as the data are represented on its native grid. For reanalysis predictors, NCEP-DOE and ERA5, as well as all other GCMs, the grid label is ‘gr’ as the data has been regridded. The extra category ‘type’ was added to the naming template to differentiate between files containing standardized, ‘sd,’ and original, ‘og,’ data. It should also be noted that files containing CanESM5 and NorESM2 data have fewer values than those containing NCEP-DOE, ERA5, or MPI-ESM1.2-HR data as CanESM5 and NorESM2 use a 365-day calendar and, therefore, do not include leap years (see Table 1).
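The calendar difference translates directly into the number of daily values to expect in a file. The sketch below counts days for a given year range with and without leap days; it assumes one value per day, and whether a file also carries a header line is not stated in this documentation. The function name is hypothetical.

```python
import calendar


def expected_days(first_year, last_year, has_leap_days):
    """Number of daily values between 1 January first_year and 31 December last_year."""
    total = 0
    for year in range(first_year, last_year + 1):
        total += 366 if (has_leap_days and calendar.isleap(year)) else 365
    return total


historical_noleap = expected_days(1979, 2014, has_leap_days=False)  # e.g. CanESM5
historical_leap = expected_days(1979, 2014, has_leap_days=True)     # e.g. ERA5
```

Comparing these counts with the length of a downloaded file is a quick way to confirm which calendar a dataset uses before aligning it with station observations.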

Table 3. List of dataset subfolders and the template for file formats.

| Subfolders for datasets | Time frame | Structure of file name¹ |
|---|---|---|
| NCEP-DOE2_1979-2014 | 1979 to 2014 | varID_day_NCEP-DOE_RE2_gr_19790101-20141231_type.csv |
| ECMWF_ERA5_1979-2014 | 1979 to 2014 | varID_day_ECMWF_ERA5_gr_19790101-20141231_type.csv |
| CanESM5_historical_1979-2014 | 1979 to 2014 | varID_day_CanESM5_historical_r1i1p1f1_gn_19790101-20141231_type.csv |
| CanESM5_ssp119_2015-2100 | 2015 to 2100 | varID_day_CanESM5_ssp119_r1i1p1f1_gn_20150101-21001231_type.csv |
| CanESM5_ssp126_2015-2100 | 2015 to 2100 | varID_day_CanESM5_ssp126_r1i1p1f1_gn_20150101-21001231_type.csv |
| CanESM5_ssp245_2015-2100 | 2015 to 2100 | varID_day_CanESM5_ssp245_r1i1p1f1_gn_20150101-21001231_type.csv |
| CanESM5_ssp370_2015-2100 | 2015 to 2100 | varID_day_CanESM5_ssp370_r1i1p1f1_gn_20150101-21001231_type.csv |
| CanESM5_ssp585_2015-2100 | 2015 to 2100 | varID_day_CanESM5_ssp585_r1i1p1f1_gn_20150101-21001231_type.csv |
| SourceID_historical_1979-2014 | 1979 to 2014 | varID_day_SourceID_historical_r1i1p1f1_gr_19790101-20141231_type.csv |
| SourceID_ssp126_2015-2100 | 2015 to 2100 | varID_day_SourceID_ssp126_r1i1p1f1_gr_20150101-21001231_type.csv |
| SourceID_ssp245_2015-2100 | 2015 to 2100 | varID_day_SourceID_ssp245_r1i1p1f1_gr_20150101-21001231_type.csv |
| SourceID_ssp370_2015-2100 | 2015 to 2100 | varID_day_SourceID_ssp370_r1i1p1f1_gr_20150101-21001231_type.csv |
| SourceID_ssp585_2015-2100 | 2015 to 2100 | varID_day_SourceID_ssp585_r1i1p1f1_gr_20150101-21001231_type.csv |

¹ Note that the placeholder portions of file names, SourceID (GCM name), varID (variable ID/name) and type, change based on the GCM, the predictor variable, and whether the values are standardized or original, respectively. The remainder of each name is consistent across all files for the associated source dataset.
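The naming template can be expressed as a small helper, sketched below under the assumption that the components are joined with underscores exactly as shown in Table 3 and that reanalysis files omit both the experiment ID and the member ID (as their entries in Table 3 suggest). The function name and argument names are hypothetical.

```python
def predictor_file_name(var_id, source_id, time_range, grid_label, data_type,
                        experiment_id=None, member_id=None):
    """Build a predictor file name; data_type is 'sd' (standardized) or 'og' (original)."""
    if data_type not in ("sd", "og"):
        raise ValueError("data_type must be 'sd' or 'og'")
    parts = [var_id, "day", source_id]
    if experiment_id:
        parts.append(experiment_id)
    if member_id:
        parts.append(member_id)
    parts += [grid_label, time_range, data_type]
    return "_".join(parts) + ".csv"


gcm_file = predictor_file_name("temp", "CanESM5", "19790101-20141231", "gn", "og",
                               experiment_id="historical", member_id="r1i1p1f1")
reanalysis_file = predictor_file_name("mslp", "ECMWF_ERA5", "19790101-20141231", "gr", "sd")
```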

Table 4. List of the 26 predictor variable IDs and corresponding variable names.

| No. | Variable ID | Predictor variable |
|---|---|---|
| 1 | mslp | Mean sea level pressure |
| 2 | p1_f | 1000 hPa Wind speed |
| 3 | p1_u | 1000 hPa Zonal wind component |
| 4 | p1_v | 1000 hPa Meridional wind component |
| 5 | p1_z | 1000 hPa Relative vorticity of true wind |
| 6 | p1th | 1000 hPa Wind direction |
| 7 | p1zh | 1000 hPa Divergence of true wind |
| 8 | p5_f | 500 hPa Wind speed |
| 9 | p5_u | 500 hPa Zonal wind component |
| 10 | p5_v | 500 hPa Meridional wind component |
| 11 | p5_z | 500 hPa Relative vorticity of true wind |
| 12 | p5th | 500 hPa Wind direction |
| 13 | p5zh | 500 hPa Divergence of true wind |
| 14 | p8_f | 850 hPa Wind speed |
| 15 | p8_u | 850 hPa Zonal wind component |
| 16 | p8_v | 850 hPa Meridional wind component |
| 17 | p8_z | 850 hPa Relative vorticity of true wind |
| 18 | p8th | 850 hPa Wind direction |
| 19 | p8zh | 850 hPa Divergence of true wind |
| 20 | p500 | 500 hPa Geopotential |
| 21 | p850 | 850 hPa Geopotential |
| 22 | prcp | Total precipitation |
| 23 | s500 | 500 hPa Specific humidity |
| 24 | s850 | 850 hPa Specific humidity |
| 25 | shum | 1000 hPa Specific humidity |
| 26 | temp | Air temperature at 2 m |

Table 5. Latitude coordinates rounded to four decimal places for the 64 by 128 latitude-longitude global Gaussian grid shown with the associated grid box number according to indexed latitude. Latitudes are indexed from south to north and represent the Y index of the grid box numbering system (Box_iiiX_jjY). Note that latitude coordinates correspond to grid box centres.

jj (Y) Latitude
1 87.8638°S
2 85.0965°S
3 82.3129°S
4 79.5256°S
5 76.7369°S
6 73.9475°S
7 71.1578°S
8 68.3678°S
9 65.5776°S
10 62.7874°S
11 59.997°S
12 57.2066°S
13 54.4162°S
14 51.6257°S
15 48.8352°S
16 46.0447°S
17 43.2542°S
18 40.4636°S
19 37.6731°S
20 34.8825°S
21 32.0919°S
22 29.3014°S
23 26.5108°S
24 23.7202°S
25 20.9296°S
26 18.139°S
27 15.3484°S
28 12.5578°S
29 9.7671°S
30 6.9765°S
31 4.1859°S
32 1.3953°S
33 1.3953°N
34 4.1859°N
35 6.9765°N
36 9.7671°N
37 12.5578°N
38 15.3484°N
39 18.139°N
40 20.9296°N
41 23.7202°N
42 26.5108°N
43 29.3014°N
44 32.0919°N
45 34.8825°N
46 37.6731°N
47 40.4636°N
48 43.2542°N
49 46.0447°N
50 48.8352°N
51 51.6257°N
52 54.4162°N
53 57.2066°N
54 59.997°N
55 62.7874°N
56 65.5776°N
57 68.3678°N
58 71.1578°N
59 73.9475°N
60 76.7369°N
61 79.5256°N
62 82.3129°N
63 85.0965°N
64 87.8638°N

Table 6. Longitude coordinates for the 64 by 128 latitude-longitude global Gaussian grid shown with the associated grid box number according to indexed longitude. Longitudes are indexed from the Greenwich meridian towards the east and are represented as the X index of the grid box numbering system (Box_iiiX_jjY). Note that longitude coordinates correspond to grid box centres.

iii (X) Longitude (°East)
1 0
2 2.8125
3 5.625
4 8.4375
5 11.25
6 14.0625
7 16.875
8 19.6875
9 22.5
10 25.3125
11 28.125
12 30.9375
13 33.75
14 36.5625
15 39.375
16 42.1875
17 45
18 47.8125
19 50.625
20 53.4375
21 56.25
22 59.0625
23 61.875
24 64.6875
25 67.5
26 70.3125
27 73.125
28 75.9375
29 78.75
30 81.5625
31 84.375
32 87.1875
33 90
34 92.8125
35 95.625
36 98.4375
37 101.25
38 104.0625
39 106.875
40 109.6875
41 112.5
42 115.3125
43 118.125
44 120.9375
45 123.75
46 126.5625
47 129.375
48 132.1875
49 135
50 137.8125
51 140.625
52 143.4375
53 146.25
54 149.0625
55 151.875
56 154.6875
57 157.5
58 160.3125
59 163.125
60 165.9375
61 168.75
62 171.5625
63 174.375
64 177.1875
65 180
66 182.8125
67 185.625
68 188.4375
69 191.25
70 194.0625
71 196.875
72 199.6875
73 202.5
74 205.3125
75 208.125
76 210.9375
77 213.75
78 216.5625
79 219.375
80 222.1875
81 225
82 227.8125
83 230.625
84 233.4375
85 236.25
86 239.0625
87 241.875
88 244.6875
89 247.5
90 250.3125
91 253.125
92 255.9375
93 258.75
94 261.5625
95 264.375
96 267.1875
97 270
98 272.8125
99 275.625
100 278.4375
101 281.25
102 284.0625
103 286.875
104 289.6875
105 292.5
106 295.3125
107 298.125
108 300.9375
109 303.75
110 306.5625
111 309.375
112 312.1875
113 315
114 317.8125
115 320.625
116 323.4375
117 326.25
118 329.0625
119 331.875
120 334.6875
121 337.5
122 340.3125
123 343.125
124 345.9375
125 348.75
126 351.5625
127 354.375
128 357.1875
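To find the grid box that contains a given location, the nearest grid-box centre can be looked up as in the sketch below. It assumes that the latitudes in Table 5 are the Gauss-Legendre latitudes of a 64-row Gaussian grid, that longitudes west of Greenwich are first converted to degrees east, and that folder names are zero-padded with literal X and Y letters (for example Box_102X_49Y); the exact folder spelling should be confirmed against the download site. The function name is hypothetical.

```python
import numpy as np


def grid_box_name(lat, lon, nlat=64, nlon=128):
    """Return the Box_iiiX_jjY name of the grid box centred nearest to (lat, lon)."""
    nodes, _weights = np.polynomial.legendre.leggauss(nlat)
    lats = np.degrees(np.arcsin(nodes))            # indexed south to north
    jj = int(np.argmin(np.abs(lats - lat))) + 1    # 1-based latitude index
    spacing = 360.0 / nlon
    # Wrap so that longitudes just west of Greenwich map back to the first column
    iii = int(round((lon % 360.0) / spacing)) % nlon + 1
    return f"Box_{iii:03d}X_{jj:02d}Y"


# Example: a location near 45.42°N, 75.70°W
box = grid_box_name(45.42, -75.70)
```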

## 5. Differences between CanESM5 and CanESM2 predictor variables

The core methodology for calculating CMIP6 (CanESM5) predictor variables was based on the methodology used for calculating CanESM2 predictors. Nonetheless, a few steps differ in producing the CMIP6 predictors. For the CanESM2 predictors, National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) Reanalysis 1 data were also processed and made available to users, whereas for the CMIP6 predictors, NCEP-DOE Reanalysis 2 and ECMWF ERA5 datasets are included. Additionally, the CMIP6 predictors include a number of additional GCMs to cover a range of ECS values. Moreover, while only the CanESM2 dataset was converted to double precision for the CanESM2 predictors, all datasets were converted to double precision for the CMIP6 predictors (see section 3.4.). Lastly, the naming scheme of folders and file names has been changed to reflect the CMIP6 naming convention; however, variable names remain the same to reduce confusion when comparing between the predictor projects (see section 4; see Table 3).

### 5.1. Availability of standardized datasets

While previous predictor datasets (i.e., CanESM2) consisted of only standardized values, both original (non-standardized) and standardized values are available to users for all CMIP6 predictor datasets. The decision to make both original and standardized values available was made for three reasons. Firstly, and most importantly, to justify standardization, values must follow a normal distribution; generally, this is not the case for precipitation and wind variables. Secondly, standardizing values removes much of the valuable information contained within the data, such as the mean, standard deviation, and minimum and maximum values. Finally, providing original data allows users the option to standardize the data using a baseline time period of their choosing. Nevertheless, standardized values were provided for comparison purposes with other predictor datasets.