Collect partitioning results#

Author: Lukas Hörtnagl (holukas@ethz.ch)

Info#

  • NEE was partitioned into GPP and RECO with the daytime and nighttime methods, using previously gap-filled TA, SW_IN (Rg) and VPD

  • The NEE fluxes used for partitioning were gap-filled with random forest

  • Partitioning used the algorithm implementations in REddyProc

Imports#

from pathlib import Path
import pandas as pd
from diive.core.io.files import save_parquet, load_parquet
from diive.core.times.times import TimestampSanitizer

Load partitioning results for fluxes gap-filled with random forest#

partitioning_results = "61.2_CH-CHA_NEE_RF-GAPF_PART_RP-20250319215835.csv"
results = pd.read_csv(partitioning_results)
results = results.set_index("TIMESTAMP")
results.index.name = "TIMESTAMP_END"
results = TimestampSanitizer(data=results).get()
results
Tair_orig Tair_f Tair_fqc Tair_fall Tair_fall_qc Tair_fnum Tair_fsd Tair_fmeth Tair_fwin Rg_orig Rg_f Rg_fqc Rg_fall Rg_fall_qc Rg_fnum ... FP_GPP2000 FP_k FP_beta FP_alpha FP_RRef FP_E0 FP_k_sd FP_beta_sd FP_alpha_sd FP_RRef_sd FP_E0_sd Reco_DT_U50 GPP_DT_U50 Reco_DT_U50_SD GPP_DT_U50_SD
TIMESTAMP_MIDDLE
2005-01-01 00:15:00 1.566667 1.566667 0 1.566667 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.093071 0.0 0.080016 0.0
2005-01-01 00:45:00 1.533333 1.533333 0 1.533333 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.092682 0.0 0.079688 0.0
2005-01-01 01:15:00 1.566667 1.566667 0 1.566667 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.093071 0.0 0.080016 0.0
2005-01-01 01:45:00 1.566667 1.566667 0 1.566667 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.093071 0.0 0.080016 0.0
2005-01-01 02:15:00 1.500000 1.500000 0 1.500000 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0.092295 0.0 0.079361 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-12-31 21:45:00 -1.919472 -1.919472 0 -1.919472 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.091028 0.0 0.265808 0.0
2024-12-31 22:15:00 -2.104678 -2.104678 0 -2.104678 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.078751 0.0 0.264327 0.0
2024-12-31 22:45:00 -2.089444 -2.089444 0 -2.089444 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.079759 0.0 0.264447 0.0
2024-12-31 23:15:00 -2.355761 -2.355761 0 -2.355761 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.062164 0.0 0.262373 0.0
2024-12-31 23:45:00 -2.578839 -2.578839 0 -2.578839 NaN NaN NaN NaN NaN 0.0 0.0 0 0.0 NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.047483 0.0 0.260688 0.0

350640 rows × 129 columns
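The row count shown above matches a complete half-hourly series from 2005 through 2024: 20 years with 5 leap days give 7305 days, and 7305 × 48 = 350640 records. The following is a minimal sketch of how this could be checked with pandas alone. The helper name `is_complete` is hypothetical and is not part of diive, and the start and end timestamps are taken from the output above.

```python
import pandas as pd

# Expected half-hourly timestamps (middle of each averaging period),
# using the first and last timestamps shown in the output above
expected_index = pd.date_range(
    start="2005-01-01 00:15:00",
    end="2024-12-31 23:45:00",
    freq="30min",
    name="TIMESTAMP_MIDDLE",
)
n_expected = len(expected_index)


def is_complete(df: pd.DataFrame, expected: pd.DatetimeIndex) -> bool:
    """Hypothetical helper: True if the index of df equals the expected timestamps."""
    return df.index.equals(expected)


# Toy frame standing in for the loaded results (assumption: one dummy column)
toy = pd.DataFrame({"Tair_f": 0.0}, index=expected_index)
complete = is_complete(toy, expected_index)             # full series
incomplete = is_complete(toy.iloc[:-1], expected_index)  # last record missing
```

With the real data, `is_complete(results, expected_index)` would be the corresponding call after the timestamp has been sanitized.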

Identify GPP and RECO columns#

partcols = [c for c in results.columns if any(substring in c for substring in ["GPP", "Reco"])]
# Drop the quality flags from REddyProc's MDS gap-filling: data were already gap-filled, so the flags are not needed
partcols = [c for c in partcols if not str(c).endswith("_fqc")]
partcols.remove("FP_GPP2000")  # Not needed
partcols
['Reco_U16',
 'GPP_U16_f',
 'Reco_DT_U16',
 'GPP_DT_U16',
 'Reco_DT_U16_SD',
 'GPP_DT_U16_SD',
 'Reco_U84',
 'GPP_U84_f',
 'Reco_DT_U84',
 'GPP_DT_U84',
 'Reco_DT_U84_SD',
 'GPP_DT_U84_SD',
 'Reco_U50',
 'GPP_U50_f',
 'Reco_DT_U50',
 'GPP_DT_U50',
 'Reco_DT_U50_SD',
 'GPP_DT_U50_SD']
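The selection logic above can also be written as a single filter. This is a sketch, assuming a hypothetical helper `select_partitioning_columns` and a short illustrative list of column names rather than the full 129 columns. Unlike `list.remove`, the exact-name exclusion here does not raise an error if `FP_GPP2000` happens to be absent.

```python
def select_partitioning_columns(
    columns,
    keep=("GPP", "Reco"),
    drop_suffixes=("_fqc",),
    drop_exact=("FP_GPP2000",),
):
    """Hypothetical helper: keep GPP/Reco columns, drop quality flags and unwanted names."""
    return [
        c for c in columns
        if any(k in c for k in keep)          # name contains "GPP" or "Reco"
        and not c.endswith(drop_suffixes)     # str.endswith accepts a tuple of suffixes
        and c not in drop_exact               # explicit exclusions
    ]


# Illustrative column names only, not the full list from the results file
toy_columns = [
    "Tair_f",
    "Reco_U50",
    "GPP_U50_f",
    "GPP_U50_fqc",
    "FP_GPP2000",
    "Reco_DT_U50",
    "GPP_DT_U50_SD",
]
selected = select_partitioning_columns(toy_columns)
print(selected)
```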

Create subset with GPP and RECO columns#

subset_partcols = results[partcols].copy()
subset_partcols
Reco_U16 GPP_U16_f Reco_DT_U16 GPP_DT_U16 Reco_DT_U16_SD GPP_DT_U16_SD Reco_U84 GPP_U84_f Reco_DT_U84 GPP_DT_U84 Reco_DT_U84_SD GPP_DT_U84_SD Reco_U50 GPP_U50_f Reco_DT_U50 GPP_DT_U50 Reco_DT_U50_SD GPP_DT_U50_SD
TIMESTAMP_MIDDLE
2005-01-01 00:15:00 1.800748 0.562911 0.476922 0.0 0.293865 0.0 1.746895 1.105825 0.089205 0.0 0.122912 0.0 1.830543 0.918553 0.093071 0.0 0.080016 0.0
2005-01-01 00:45:00 1.799303 0.575336 0.475155 0.0 0.292817 0.0 1.744107 1.101436 0.088843 0.0 0.122413 0.0 1.828898 0.917972 0.092682 0.0 0.079688 0.0
2005-01-01 01:15:00 1.800748 0.170341 0.476922 0.0 0.293865 0.0 1.746895 0.462104 0.089205 0.0 0.122912 0.0 1.830543 0.163001 0.093071 0.0 0.080016 0.0
2005-01-01 01:45:00 1.800748 0.277298 0.476922 0.0 0.293865 0.0 1.746895 0.460866 0.089205 0.0 0.122912 0.0 1.830543 0.190890 0.093071 0.0 0.080016 0.0
2005-01-01 02:15:00 1.797856 0.189333 0.473392 0.0 0.291772 0.0 1.741320 0.402870 0.088482 0.0 0.121916 0.0 1.827253 0.167042 0.092295 0.0 0.079361 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-12-31 21:45:00 0.763007 -0.401174 0.743788 0.0 0.365033 0.0 0.946163 -0.264334 1.500257 0.0 0.321281 0.0 0.825725 -0.334996 1.091028 0.0 0.265808 0.0
2024-12-31 22:15:00 0.759039 -0.369966 0.732102 0.0 0.359192 0.0 0.936404 -0.261633 1.482391 0.0 0.321697 0.0 0.820921 -0.310533 1.078751 0.0 0.264327 0.0
2024-12-31 22:45:00 0.759366 -0.287601 0.733060 0.0 0.359670 0.0 0.937206 -0.109761 1.483858 0.0 0.321663 0.0 0.821317 -0.225651 1.079759 0.0 0.264447 0.0
2024-12-31 23:15:00 0.753639 -0.619035 0.716399 0.0 0.351404 0.0 0.923204 -0.449470 1.458274 0.0 0.322279 0.0 0.814389 -0.558285 1.062164 0.0 0.262373 0.0
2024-12-31 23:45:00 0.748823 -0.287036 0.702581 0.0 0.344608 0.0 0.911507 -0.234188 1.436947 0.0 0.322807 0.0 0.808567 -0.317543 1.047483 0.0 0.260688 0.0

350640 rows × 18 columns

Rename partitioning variables#

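These original NEE flux columns were renamed and then used during partitioning. The U16, U50 and U84 suffixes in the partitioning results correspond to the CUT_16, CUT_50 and CUT_84 columns: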

  • NEE_L3.1_L3.3_CUT_16_QCF_gfRF

  • NEE_L3.1_L3.3_CUT_50_QCF_gfRF

  • NEE_L3.1_L3.3_CUT_84_QCF_gfRF

renaming_dict = {
    # GPP, daytime method (DT)
    'GPP_DT_U16': 'GPP_DT_CUT_16_gfRF',
    'GPP_DT_U16_SD': 'GPP_DT_CUT_16_gfRF_SD',
    'GPP_DT_U50': 'GPP_DT_CUT_50_gfRF',
    'GPP_DT_U50_SD': 'GPP_DT_CUT_50_gfRF_SD',
    'GPP_DT_U84': 'GPP_DT_CUT_84_gfRF',
    'GPP_DT_U84_SD': 'GPP_DT_CUT_84_gfRF_SD',

    # GPP, nighttime method (NT)
    'GPP_U16_f': 'GPP_NT_CUT_16_gfRF',
    'GPP_U50_f': 'GPP_NT_CUT_50_gfRF',
    'GPP_U84_f': 'GPP_NT_CUT_84_gfRF',

    # RECO, daytime method (DT)
    'Reco_DT_U16': 'RECO_DT_CUT_16_gfRF',
    'Reco_DT_U16_SD': 'RECO_DT_CUT_16_gfRF_SD',
    'Reco_DT_U50': 'RECO_DT_CUT_50_gfRF',
    'Reco_DT_U50_SD': 'RECO_DT_CUT_50_gfRF_SD',
    'Reco_DT_U84': 'RECO_DT_CUT_84_gfRF',
    'Reco_DT_U84_SD': 'RECO_DT_CUT_84_gfRF_SD',

    # RECO, nighttime method (NT)
    'Reco_U16': 'RECO_NT_CUT_16_gfRF',
    'Reco_U50': 'RECO_NT_CUT_50_gfRF',
    'Reco_U84': 'RECO_NT_CUT_84_gfRF',
}
subset_partcols = subset_partcols.rename(columns=renaming_dict)
subset_partcols
RECO_NT_CUT_16_gfRF GPP_NT_CUT_16_gfRF RECO_DT_CUT_16_gfRF GPP_DT_CUT_16_gfRF RECO_DT_CUT_16_gfRF_SD GPP_DT_CUT_16_gfRF_SD RECO_NT_CUT_84_gfRF GPP_NT_CUT_84_gfRF RECO_DT_CUT_84_gfRF GPP_DT_CUT_84_gfRF RECO_DT_CUT_84_gfRF_SD GPP_DT_CUT_84_gfRF_SD RECO_NT_CUT_50_gfRF GPP_NT_CUT_50_gfRF RECO_DT_CUT_50_gfRF GPP_DT_CUT_50_gfRF RECO_DT_CUT_50_gfRF_SD GPP_DT_CUT_50_gfRF_SD
TIMESTAMP_MIDDLE
2005-01-01 00:15:00 1.800748 0.562911 0.476922 0.0 0.293865 0.0 1.746895 1.105825 0.089205 0.0 0.122912 0.0 1.830543 0.918553 0.093071 0.0 0.080016 0.0
2005-01-01 00:45:00 1.799303 0.575336 0.475155 0.0 0.292817 0.0 1.744107 1.101436 0.088843 0.0 0.122413 0.0 1.828898 0.917972 0.092682 0.0 0.079688 0.0
2005-01-01 01:15:00 1.800748 0.170341 0.476922 0.0 0.293865 0.0 1.746895 0.462104 0.089205 0.0 0.122912 0.0 1.830543 0.163001 0.093071 0.0 0.080016 0.0
2005-01-01 01:45:00 1.800748 0.277298 0.476922 0.0 0.293865 0.0 1.746895 0.460866 0.089205 0.0 0.122912 0.0 1.830543 0.190890 0.093071 0.0 0.080016 0.0
2005-01-01 02:15:00 1.797856 0.189333 0.473392 0.0 0.291772 0.0 1.741320 0.402870 0.088482 0.0 0.121916 0.0 1.827253 0.167042 0.092295 0.0 0.079361 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2024-12-31 21:45:00 0.763007 -0.401174 0.743788 0.0 0.365033 0.0 0.946163 -0.264334 1.500257 0.0 0.321281 0.0 0.825725 -0.334996 1.091028 0.0 0.265808 0.0
2024-12-31 22:15:00 0.759039 -0.369966 0.732102 0.0 0.359192 0.0 0.936404 -0.261633 1.482391 0.0 0.321697 0.0 0.820921 -0.310533 1.078751 0.0 0.264327 0.0
2024-12-31 22:45:00 0.759366 -0.287601 0.733060 0.0 0.359670 0.0 0.937206 -0.109761 1.483858 0.0 0.321663 0.0 0.821317 -0.225651 1.079759 0.0 0.264447 0.0
2024-12-31 23:15:00 0.753639 -0.619035 0.716399 0.0 0.351404 0.0 0.923204 -0.449470 1.458274 0.0 0.322279 0.0 0.814389 -0.558285 1.062164 0.0 0.262373 0.0
2024-12-31 23:45:00 0.748823 -0.287036 0.702581 0.0 0.344608 0.0 0.911507 -0.234188 1.436947 0.0 0.322807 0.0 0.808567 -0.317543 1.047483 0.0 0.260688 0.0

350640 rows × 18 columns
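By default, `DataFrame.rename` silently ignores mapping keys that do not match any column, so a typo in the renaming dictionary would go unnoticed. The sketch below shows one way to guard against that on a toy frame, using the `errors="raise"` option of `rename` and a check for columns left unmapped. The toy values and the deliberately misspelled key `Reco_U51` are illustrative assumptions.

```python
import pandas as pd

# Toy frame with two of the original REddyProc column names (illustrative values)
toy = pd.DataFrame({"Reco_U50": [1.83], "GPP_U50_f": [0.92]})

mapping = {
    "Reco_U50": "RECO_NT_CUT_50_gfRF",
    "GPP_U50_f": "GPP_NT_CUT_50_gfRF",
}

# errors="raise" turns a key that matches no column into a KeyError
renamed = toy.rename(columns=mapping, errors="raise")

# Columns that exist in the data but have no entry in the mapping
unmapped = [c for c in toy.columns if c not in mapping]

# A misspelled key is now detected instead of being ignored
try:
    toy.rename(columns={"Reco_U51": "RECO_NT_CUT_50_gfRF"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True
```

Applied to the notebook data, the same pattern would be `subset_partcols.rename(columns=renaming_dict, errors="raise")`.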

Export#

Export all data#

filename = "61.4_PARTITIONED_FLUXES_GPP_RECO"
# subset_partcols.to_csv(f"{filename}.csv", index=True)
save_parquet(data=subset_partcols, filename=filename)
Saved file 61.4_PARTITIONED_FLUXES_GPP_RECO.parquet (0.456 seconds).
'61.4_PARTITIONED_FLUXES_GPP_RECO.parquet'

End of notebook#

Congratulations, you reached the end of this notebook! Before you go, let’s store your finish time.

from datetime import datetime
dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"Finished. {dt_string}")
Finished. 2025-05-14 12:14:22