Collect partitioning results#
Author: Lukas Hörtnagl (holukas@ethz.ch)
Info#
NEE was partitioned into GPP and RECO with both the daytime and the nighttime method, using previously gap-filled TA, SW_IN (Rg) and VPD
The NEE fluxes used for partitioning were gap-filled with random forest
Partitioning used the algorithm implementations in REddyProc
Imports#
from pathlib import Path
import pandas as pd
from diive.core.io.files import save_parquet, load_parquet
from diive.core.times.times import TimestampSanitizer
Load partitioning results for fluxes gap-filled with random forest#
partitioning_results = "61.2_CH-CHA_NEE_RF-GAPF_PART_RP-20250319215835.csv"
results = pd.read_csv(partitioning_results)
results = results.set_index("TIMESTAMP")
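results.index.name = "TIMESTAMP_END"  # Name the index so it is recognized as end-of-period timestamps
results = TimestampSanitizer(data=results).get()  # Sanitize the timestamp index; the result below is indexed by TIMESTAMP_MIDDLE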
results
TIMESTAMP_MIDDLE | Tair_orig | Tair_f | Tair_fqc | Tair_fall | Tair_fall_qc | Tair_fnum | Tair_fsd | Tair_fmeth | Tair_fwin | Rg_orig | Rg_f | Rg_fqc | Rg_fall | Rg_fall_qc | Rg_fnum | ... | FP_GPP2000 | FP_k | FP_beta | FP_alpha | FP_RRef | FP_E0 | FP_k_sd | FP_beta_sd | FP_alpha_sd | FP_RRef_sd | FP_E0_sd | Reco_DT_U50 | GPP_DT_U50 | Reco_DT_U50_SD | GPP_DT_U50_SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2005-01-01 00:15:00 | 1.566667 | 1.566667 | 0 | 1.566667 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 00:45:00 | 1.533333 | 1.533333 | 0 | 1.533333 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.092682 | 0.0 | 0.079688 | 0.0 |
2005-01-01 01:15:00 | 1.566667 | 1.566667 | 0 | 1.566667 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 01:45:00 | 1.566667 | 1.566667 | 0 | 1.566667 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 02:15:00 | 1.500000 | 1.500000 | 0 | 1.500000 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.092295 | 0.0 | 0.079361 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-12-31 21:45:00 | -1.919472 | -1.919472 | 0 | -1.919472 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.091028 | 0.0 | 0.265808 | 0.0 |
2024-12-31 22:15:00 | -2.104678 | -2.104678 | 0 | -2.104678 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.078751 | 0.0 | 0.264327 | 0.0 |
2024-12-31 22:45:00 | -2.089444 | -2.089444 | 0 | -2.089444 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.079759 | 0.0 | 0.264447 | 0.0 |
2024-12-31 23:15:00 | -2.355761 | -2.355761 | 0 | -2.355761 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.062164 | 0.0 | 0.262373 | 0.0 |
2024-12-31 23:45:00 | -2.578839 | -2.578839 | 0 | -2.578839 | NaN | NaN | NaN | NaN | NaN | 0.0 | 0.0 | 0 | 0.0 | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1.047483 | 0.0 | 0.260688 | 0.0 |
350640 rows × 129 columns
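The index is named TIMESTAMP_END before sanitizing, while the table above is indexed by TIMESTAMP_MIDDLE with values at :15 and :45. This is consistent with a shift from the end to the middle of each 30-minute averaging period. The following is a minimal pandas sketch of such a shift, not diive's actual implementation; the helper name `end_to_middle` and the three example timestamps are hypothetical.

```python
import pandas as pd


def end_to_middle(index: pd.DatetimeIndex, period: pd.Timedelta) -> pd.DatetimeIndex:
    """Shift end-of-period timestamps back by half an averaging period."""
    middle = index - period / 2
    return middle.rename("TIMESTAMP_MIDDLE")


# Hypothetical end-of-period timestamps for three 30-minute records
end_index = pd.DatetimeIndex(
    ["2005-01-01 00:30:00", "2005-01-01 01:00:00", "2005-01-01 01:30:00"],
    name="TIMESTAMP_END",
)

middle_index = end_to_middle(end_index, pd.Timedelta(minutes=30))
print(list(middle_index.strftime("%H:%M")))  # ['00:15', '00:45', '01:15']
```

Keeping the timestamp convention in the index name makes it easier to spot a mismatch when these results are later merged with other data.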
Identify GPP and RECO columns#
# Keep all columns whose name contains "GPP" or "Reco"
partcols = [c for c in results.columns if any(substring in c for substring in ["GPP", "Reco"])]
# Drop the "_fqc" quality flags: they come from REddyProc's MDS gap-filling, which is not needed because the data were already gap-filled
partcols = [c for c in partcols if not str(c).endswith("_fqc")]
partcols.remove("FP_GPP2000")  # Not needed
partcols
['Reco_U16',
'GPP_U16_f',
'Reco_DT_U16',
'GPP_DT_U16',
'Reco_DT_U16_SD',
'GPP_DT_U16_SD',
'Reco_U84',
'GPP_U84_f',
'Reco_DT_U84',
'GPP_DT_U84',
'Reco_DT_U84_SD',
'GPP_DT_U84_SD',
'Reco_U50',
'GPP_U50_f',
'Reco_DT_U50',
'GPP_DT_U50',
'Reco_DT_U50_SD',
'GPP_DT_U50_SD']
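The selection above can also be wrapped in a small helper that does not fail when an excluded column is absent (`list.remove` raises a `ValueError` in that case). This is only a sketch: the function name `select_partitioning_columns`, its parameters and the short example column list are hypothetical and not part of diive or REddyProc.

```python
def select_partitioning_columns(
    columns,
    keywords=("GPP", "Reco"),
    drop_suffixes=("_fqc",),
    drop_names=("FP_GPP2000",),
):
    """Return the column names that contain a keyword and are not excluded."""
    selected = []
    for col in columns:
        name = str(col)
        if not any(keyword in name for keyword in keywords):
            continue  # Not a GPP or Reco column
        if name.endswith(tuple(drop_suffixes)):
            continue  # Quality flag from MDS gap-filling
        if name in drop_names:
            continue  # Explicitly excluded column
        selected.append(name)
    return selected


# Hypothetical, shortened list of column names for illustration
example_columns = ["Tair_f", "Reco_U50", "GPP_U50_f", "GPP_U50_fqc", "FP_GPP2000", "GPP_DT_U50"]
selected = select_partitioning_columns(example_columns)
print(selected)  # ['Reco_U50', 'GPP_U50_f', 'GPP_DT_U50']
```

In the notebook this would be called as `select_partitioning_columns(results.columns)`.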
Create subset with GPP and RECO columns#
subset_partcols = results[partcols].copy()
subset_partcols
TIMESTAMP_MIDDLE | Reco_U16 | GPP_U16_f | Reco_DT_U16 | GPP_DT_U16 | Reco_DT_U16_SD | GPP_DT_U16_SD | Reco_U84 | GPP_U84_f | Reco_DT_U84 | GPP_DT_U84 | Reco_DT_U84_SD | GPP_DT_U84_SD | Reco_U50 | GPP_U50_f | Reco_DT_U50 | GPP_DT_U50 | Reco_DT_U50_SD | GPP_DT_U50_SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2005-01-01 00:15:00 | 1.800748 | 0.562911 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 1.105825 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.918553 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 00:45:00 | 1.799303 | 0.575336 | 0.475155 | 0.0 | 0.292817 | 0.0 | 1.744107 | 1.101436 | 0.088843 | 0.0 | 0.122413 | 0.0 | 1.828898 | 0.917972 | 0.092682 | 0.0 | 0.079688 | 0.0 |
2005-01-01 01:15:00 | 1.800748 | 0.170341 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 0.462104 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.163001 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 01:45:00 | 1.800748 | 0.277298 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 0.460866 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.190890 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 02:15:00 | 1.797856 | 0.189333 | 0.473392 | 0.0 | 0.291772 | 0.0 | 1.741320 | 0.402870 | 0.088482 | 0.0 | 0.121916 | 0.0 | 1.827253 | 0.167042 | 0.092295 | 0.0 | 0.079361 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-12-31 21:45:00 | 0.763007 | -0.401174 | 0.743788 | 0.0 | 0.365033 | 0.0 | 0.946163 | -0.264334 | 1.500257 | 0.0 | 0.321281 | 0.0 | 0.825725 | -0.334996 | 1.091028 | 0.0 | 0.265808 | 0.0 |
2024-12-31 22:15:00 | 0.759039 | -0.369966 | 0.732102 | 0.0 | 0.359192 | 0.0 | 0.936404 | -0.261633 | 1.482391 | 0.0 | 0.321697 | 0.0 | 0.820921 | -0.310533 | 1.078751 | 0.0 | 0.264327 | 0.0 |
2024-12-31 22:45:00 | 0.759366 | -0.287601 | 0.733060 | 0.0 | 0.359670 | 0.0 | 0.937206 | -0.109761 | 1.483858 | 0.0 | 0.321663 | 0.0 | 0.821317 | -0.225651 | 1.079759 | 0.0 | 0.264447 | 0.0 |
2024-12-31 23:15:00 | 0.753639 | -0.619035 | 0.716399 | 0.0 | 0.351404 | 0.0 | 0.923204 | -0.449470 | 1.458274 | 0.0 | 0.322279 | 0.0 | 0.814389 | -0.558285 | 1.062164 | 0.0 | 0.262373 | 0.0 |
2024-12-31 23:45:00 | 0.748823 | -0.287036 | 0.702581 | 0.0 | 0.344608 | 0.0 | 0.911507 | -0.234188 | 1.436947 | 0.0 | 0.322807 | 0.0 | 0.808567 | -0.317543 | 1.047483 | 0.0 | 0.260688 | 0.0 |
350640 rows × 18 columns
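In the table above, the nighttime-method GPP columns (for example GPP_U16_f) are nonzero at night and sometimes negative, whereas the daytime-method columns (GPP_DT_*) are exactly 0.0. This is consistent with the nighttime method deriving GPP as the residual of modeled respiration and NEE, so noise in nighttime NEE carries over into GPP. A minimal sketch of that residual, assuming the sign convention NEE = RECO - GPP; the function name and the two flux values are hypothetical and chosen only for illustration.

```python
def gpp_nighttime_method(nee: float, reco: float) -> float:
    """GPP as residual, assuming NEE = RECO - GPP."""
    return reco - nee


# Hypothetical nighttime record: NEE slightly exceeds modeled RECO
reco_modeled = 0.80
nee_gapfilled = 1.10

gpp = gpp_nighttime_method(nee=nee_gapfilled, reco=reco_modeled)
print(round(gpp, 2))  # -0.3
```

Negative nighttime values of this kind are therefore expected in the NT columns and do not by themselves indicate a processing error.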
Rename partitioning variables#
These original NEE flux columns were renamed and then used during partitioning:
NEE_L3.1_L3.3_CUT_16_QCF_gfRF
NEE_L3.1_L3.3_CUT_50_QCF_gfRF
NEE_L3.1_L3.3_CUT_84_QCF_gfRF
renaming_dict = {
'GPP_DT_U16': 'GPP_DT_CUT_16_gfRF',
'GPP_DT_U16_SD': 'GPP_DT_CUT_16_gfRF_SD',
'GPP_DT_U50': 'GPP_DT_CUT_50_gfRF',
'GPP_DT_U50_SD': 'GPP_DT_CUT_50_gfRF_SD',
'GPP_DT_U84': 'GPP_DT_CUT_84_gfRF',
'GPP_DT_U84_SD': 'GPP_DT_CUT_84_gfRF_SD',
'GPP_U16_f': 'GPP_NT_CUT_16_gfRF',
'GPP_U50_f': 'GPP_NT_CUT_50_gfRF',
'GPP_U84_f': 'GPP_NT_CUT_84_gfRF',
'Reco_DT_U16': 'RECO_DT_CUT_16_gfRF',
'Reco_DT_U16_SD': 'RECO_DT_CUT_16_gfRF_SD',
'Reco_DT_U50': 'RECO_DT_CUT_50_gfRF',
'Reco_DT_U50_SD': 'RECO_DT_CUT_50_gfRF_SD',
'Reco_DT_U84': 'RECO_DT_CUT_84_gfRF',
'Reco_DT_U84_SD': 'RECO_DT_CUT_84_gfRF_SD',
'Reco_U16': 'RECO_NT_CUT_16_gfRF',
'Reco_U50': 'RECO_NT_CUT_50_gfRF',
'Reco_U84': 'RECO_NT_CUT_84_gfRF',
}
subset_partcols = subset_partcols.rename(columns=renaming_dict)
subset_partcols
TIMESTAMP_MIDDLE | RECO_NT_CUT_16_gfRF | GPP_NT_CUT_16_gfRF | RECO_DT_CUT_16_gfRF | GPP_DT_CUT_16_gfRF | RECO_DT_CUT_16_gfRF_SD | GPP_DT_CUT_16_gfRF_SD | RECO_NT_CUT_84_gfRF | GPP_NT_CUT_84_gfRF | RECO_DT_CUT_84_gfRF | GPP_DT_CUT_84_gfRF | RECO_DT_CUT_84_gfRF_SD | GPP_DT_CUT_84_gfRF_SD | RECO_NT_CUT_50_gfRF | GPP_NT_CUT_50_gfRF | RECO_DT_CUT_50_gfRF | GPP_DT_CUT_50_gfRF | RECO_DT_CUT_50_gfRF_SD | GPP_DT_CUT_50_gfRF_SD |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2005-01-01 00:15:00 | 1.800748 | 0.562911 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 1.105825 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.918553 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 00:45:00 | 1.799303 | 0.575336 | 0.475155 | 0.0 | 0.292817 | 0.0 | 1.744107 | 1.101436 | 0.088843 | 0.0 | 0.122413 | 0.0 | 1.828898 | 0.917972 | 0.092682 | 0.0 | 0.079688 | 0.0 |
2005-01-01 01:15:00 | 1.800748 | 0.170341 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 0.462104 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.163001 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 01:45:00 | 1.800748 | 0.277298 | 0.476922 | 0.0 | 0.293865 | 0.0 | 1.746895 | 0.460866 | 0.089205 | 0.0 | 0.122912 | 0.0 | 1.830543 | 0.190890 | 0.093071 | 0.0 | 0.080016 | 0.0 |
2005-01-01 02:15:00 | 1.797856 | 0.189333 | 0.473392 | 0.0 | 0.291772 | 0.0 | 1.741320 | 0.402870 | 0.088482 | 0.0 | 0.121916 | 0.0 | 1.827253 | 0.167042 | 0.092295 | 0.0 | 0.079361 | 0.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2024-12-31 21:45:00 | 0.763007 | -0.401174 | 0.743788 | 0.0 | 0.365033 | 0.0 | 0.946163 | -0.264334 | 1.500257 | 0.0 | 0.321281 | 0.0 | 0.825725 | -0.334996 | 1.091028 | 0.0 | 0.265808 | 0.0 |
2024-12-31 22:15:00 | 0.759039 | -0.369966 | 0.732102 | 0.0 | 0.359192 | 0.0 | 0.936404 | -0.261633 | 1.482391 | 0.0 | 0.321697 | 0.0 | 0.820921 | -0.310533 | 1.078751 | 0.0 | 0.264327 | 0.0 |
2024-12-31 22:45:00 | 0.759366 | -0.287601 | 0.733060 | 0.0 | 0.359670 | 0.0 | 0.937206 | -0.109761 | 1.483858 | 0.0 | 0.321663 | 0.0 | 0.821317 | -0.225651 | 1.079759 | 0.0 | 0.264447 | 0.0 |
2024-12-31 23:15:00 | 0.753639 | -0.619035 | 0.716399 | 0.0 | 0.351404 | 0.0 | 0.923204 | -0.449470 | 1.458274 | 0.0 | 0.322279 | 0.0 | 0.814389 | -0.558285 | 1.062164 | 0.0 | 0.262373 | 0.0 |
2024-12-31 23:45:00 | 0.748823 | -0.287036 | 0.702581 | 0.0 | 0.344608 | 0.0 | 0.911507 | -0.234188 | 1.436947 | 0.0 | 0.322807 | 0.0 | 0.808567 | -0.317543 | 1.047483 | 0.0 | 0.260688 | 0.0 |
350640 rows × 18 columns
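The renaming dictionary above follows a regular pattern: `U16`, `U50` and `U84` become `CUT_16`, `CUT_50` and `CUT_84`, `Reco` becomes `RECO`, names containing `_DT` keep that tag while all others are tagged `NT`, the `_f` suffix is dropped, and `gfRF` is inserted before an optional `_SD`. As a sketch, the same mapping could be generated with a regular expression instead of being typed by hand. The helper name `build_new_name` and the pattern are assumptions derived from the dictionary entries, not part of diive or REddyProc.

```python
import re

# Matches e.g. "GPP_DT_U16_SD", "GPP_U50_f", "Reco_U84", "Reco_DT_U16"
_PATTERN = re.compile(r"^(GPP|Reco)(_DT)?_U(\d+)(?:_f)?(_SD)?$")


def build_new_name(old_name: str) -> str:
    """Translate a REddyProc output name into the naming used in this notebook."""
    match = _PATTERN.match(old_name)
    if match is None:
        raise ValueError(f"Unexpected column name: {old_name}")
    flux, daytime, percentile, sd = match.groups()
    method = "DT" if daytime else "NT"  # No "_DT" tag means nighttime method
    return f"{flux.upper()}_{method}_CUT_{percentile}_gfRF{sd or ''}"


old_names = ["GPP_DT_U16", "GPP_DT_U16_SD", "GPP_U50_f", "Reco_DT_U84_SD", "Reco_U84"]
generated = {name: build_new_name(name) for name in old_names}
print(generated["GPP_U50_f"])  # GPP_NT_CUT_50_gfRF
```

Unexpected names such as `GPP_U50_fqc` raise a `ValueError` instead of being renamed silently, which helps if the set of columns changes in a later run.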
Export#
Export all data#
filename = "61.4_PARTITIONED_FLUXES_GPP_RECO"
# subset_partcols.to_csv(f"{filename}.csv", index=True)
save_parquet(data=subset_partcols, filename=filename)
Saved file 61.4_PARTITIONED_FLUXES_GPP_RECO.parquet (0.456 seconds).
'61.4_PARTITIONED_FLUXES_GPP_RECO.parquet'
End of notebook#
Congratulations, you have reached the end of this notebook! Before you go, let’s store your finish time.
from datetime import datetime
dt_string = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
print(f"Finished. {dt_string}")
Finished. 2025-05-14 12:14:22