How to create a subgrid info file for CESM2’s CLM processing?

Why we need this script to save the a “CESM2_subgrid_info.nc” for CESM CLM processing?
Because the CLM files uploaded to AWS only contain:
- dimension: (member_id, time, landunit)
- coordinates: member_id, and time
In general, CESM’s CLM varibales are saved as (time, landunit).
But usually we want to analyze the datasets with the format (time, lat, lon).
So the workflow for CESM’s CLM variables would be - (:, landunit) [1D] -> (:, landtype, lat, lon) [3D] -> (:, lat, lon) [2D]
As a result, we at least need to know:
- land1d_ixy (landunit): for mapping from 1D to 3D
- land1d_jxy (landunit): for mapping from 1D to 3D
- land1d_ityplunit (landunit): for mapping from 1D to 3D
- lat: for setting up the list of lat
- lon: for setting up the list of lon

reference: https://github.com/zzheng93/CLM-1D-to-2D

[1]:
#https://github.com/NCAR/ctsm_python_gallery/blob/master/notebooks/PFT-Gridding.ipynb
import numpy as np
import xarray as xr

class load_clm:
    def __init__(self, args):
        self.ds = xr.open_dataset(args)
        self.lat = self.ds.lat
        self.lon = self.ds.lon
        self.time = self.ds.time
        self.ixy = self.ds.land1d_ixy
        self.jxy = self.ds.land1d_jxy
        self.ltype = self.ds.land1d_ityplunit
        self.ltype_dict = {value:key for key, value in self.ds.attrs.items() if 'ltype_' in key.lower()}
    def get2D(self, var_str):
        var = self.ds[var_str]
        nlat = len(self.lat.values)
        nlon = len(self.lon.values)
        ntim = len(self.time.values)
        nltype = len(self.ltype_dict)
        # create an empty array
        gridded = np.full([ntim,nltype,nlat,nlon],np.nan)
        # assign the values
        gridded[:,
                self.ltype.values.astype(int) - 1, # Fortran arrays start at 1
                self.jxy.values.astype(int) - 1,
                self.ixy.values.astype(int) - 1] = var.values
        grid_dims = xr.DataArray(gridded, dims=("time","ltype","lat","lon"))
        grid_dims = grid_dims.assign_coords(time=self.time,
                                            ltype=[i for i in range(self.ltype.values.min(),
                                                                    self.ltype.values.max()+1)],
                                            lat=self.lat.values,
                                            lon=self.lon.values)
        grid_dims.name = var_str
        return grid_dims
[2]:
fp = "/glade/campaign/cgd/cesm/CESM2-LE/timeseries/lnd/proc/tseries/day_1/TREFMXAV/"
fn = "b.e21.BSSP370smbb.f09_g17.LE2-1281.019.clm2.h6.TREFMXAV.20950101-21001231.nc"
clm = load_clm(fp+fn)
clm.ltype_dict
[2]:
{1: 'ltype_vegetated_or_bare_soil',
 2: 'ltype_crop',
 3: 'ltype_UNUSED',
 4: 'ltype_landice_multiple_elevation_classes',
 5: 'ltype_deep_lake',
 6: 'ltype_wetland',
 7: 'ltype_urban_tbd',
 8: 'ltype_urban_hd',
 9: 'ltype_urban_md'}
[3]:
clm.ds[["lat","lon",
        "land1d_ixy","land1d_jxy","land1d_ityplunit",
        "land1d_lon","land1d_lat",
        "landfrac","landmask","land1d_wtgcell","land1d_active"]].to_netcdf("./CESM2_subgrid_info.nc")
clm.ds
[3]:
<xarray.Dataset>
Dimensions:             (levgrnd: 25, levlak: 10, levdcmp: 25, lon: 288,
                         lat: 192, gridcell: 21013, landunit: 62125,
                         column: 554298, pft: 848480, time: 2191,
                         hist_interval: 2)
Coordinates:
  * levgrnd             (levgrnd) float32 0.01 0.04 0.09 ... 19.48 28.87 42.0
  * levlak              (levlak) float32 0.05 0.6 2.1 4.6 ... 25.6 34.33 44.78
  * levdcmp             (levdcmp) float32 0.01 0.04 0.09 ... 19.48 28.87 42.0
  * lon                 (lon) float32 0.0 1.25 2.5 3.75 ... 356.2 357.5 358.8
  * lat                 (lat) float32 -90.0 -89.06 -88.12 ... 88.12 89.06 90.0
  * time                (time) object 2095-01-01 00:00:00 ... 2101-01-01 00:0...
Dimensions without coordinates: gridcell, landunit, column, pft, hist_interval
Data variables: (12/45)
    area                (lat, lon) float32 29.95 29.95 29.95 ... nan nan nan
    landfrac            (lat, lon) float32 1.0 1.0 1.0 1.0 ... nan nan nan nan
    landmask            (lat, lon) float64 1.0 1.0 1.0 1.0 ... nan nan nan nan
    pftmask             (lat, lon) float64 1.0 1.0 1.0 1.0 ... nan nan nan nan
    nbedrock            (lat, lon) float64 20.0 20.0 20.0 20.0 ... nan nan nan
    grid1d_lon          (gridcell) float64 0.0 1.25 2.5 ... 335.0 328.8 330.0
    ...                  ...
    mscur               (time) int32 0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 0
    nstep               (time) int32 1401600 1401648 1401696 ... 1506672 1506720
    time_bounds         (time, hist_interval) object 2094-12-31 00:00:00 ... ...
    date_written        (time) |S16 b'02/01/21' b'02/01/21' ... b'02/02/21'
    time_written        (time) |S16 b'23:11:01' b'23:25:32' ... b'11:34:39'
    TREFMXAV            (time, landunit) float32 ...
Attributes: (12/102)
    title:                                     CLM History file information
    comment:                                   NOTE: None of the variables ar...
    Conventions:                               CF-1.0
    history:                                   created on 02/01/21 23:11:01
    source:                                    Community Land Model CLM4.0
    hostname:                                  aleph
    ...                                        ...
    cft_irrigated_tropical_corn:               62
    cft_tropical_soybean:                      63
    cft_irrigated_tropical_soybean:            64
    time_period_freq:                          day_1
    Time_constant_3Dvars_filename:             ./b.e21.BSSP370smbb.f09_g17.LE...
    Time_constant_3Dvars:                      ZSOI:DZSOI:WATSAT:SUCSAT:BSW:H...
[4]:
clm.ds.landunit
[4]:
<xarray.DataArray 'landunit' (landunit: 62125)>
array([    0,     1,     2, ..., 62122, 62123, 62124])
Dimensions without coordinates: landunit
[5]:
clm.ds.land1d_ixy
[5]:
<xarray.DataArray 'land1d_ixy' (landunit: 62125)>
array([  1,   1,   1, ..., 265, 265, 265], dtype=int32)
Dimensions without coordinates: landunit
Attributes:
    long_name:  2d longitude index of corresponding landunit
[6]:
clm.ds.land1d_jxy
[6]:
<xarray.DataArray 'land1d_jxy' (landunit: 62125)>
array([  1,   1,   1, ..., 186, 186, 186], dtype=int32)
Dimensions without coordinates: landunit
Attributes:
    long_name:  2d latitude index of corresponding landunit