Level-3.2: Outlier flagging

Info

Level-3.2 creates additional quality flags. No data are removed in this step, only the flags are created.

Outlier tests were based on data from which low-quality values had already been excluded. After the creation of additional quality flags in Level-2, the individual Level-2 flags were combined into one overall quality flag, QCF. The storage-corrected fluxes from Level-3.1 were then temporarily filtered with this QCF to remove low-quality fluxes, and the filtered data were used for detecting outliers. This way, the outlier detection functions did not take into account values that were already flagged for removal in Level-2.

Outlier tests were run sequentially: each test was applied to the data that remained after the previous test. Running every test on the original data, without first excluding previously detected outliers, yielded unsatisfactory results.
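The sequential logic can be sketched in plain Python. This is a minimal illustration, not the code used for the dataset: the function names, the second test (far_from_median) and the example values are hypothetical. Each test returns a flag (1 = outlier), flagged values are masked in a working copy before the next test runs, and the original data are left unchanged.

```python
import math
from statistics import median

NAN = float("nan")


def absolute_limits(values, lower, upper):
    """Flag (1) values outside [lower, upper]; missing values stay unflagged."""
    return [int(not math.isnan(v) and not (lower <= v <= upper)) for v in values]


def far_from_median(values, max_dist):
    """Hypothetical second test: flag values far from the overall median."""
    med = median(v for v in values if not math.isnan(v))
    return [int(not math.isnan(v) and abs(v - med) > max_dist) for v in values]


def run_sequential(values, tests):
    """Run tests in order; each test only sees data not flagged by earlier tests."""
    working = list(values)  # working copy, the input list is never modified
    flags = {}
    for name, test in tests:
        flag = test(working)
        flags[name] = flag
        # Mask newly flagged values so that the next test ignores them
        working = [NAN if f else v for v, f in zip(working, flag)]
    return flags


flux = [1.0, 2.0, 500.0, 1.5, 30.0, 2.5, 1.0]
tests = [
    ("absolute_limits", lambda v: absolute_limits(v, -50, 50)),
    ("far_from_median", lambda v: far_from_median(v, 10)),
]
flags = run_sequential(flux, tests)
```

In this toy series, the value 500 is caught by the absolute limits, and the value 30 is caught by the second test, which computes its median without the already flagged 500.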

Description of outlier functions

Generally, the following outlier tests were used:

Absolute limits: flag values outside a physically plausible range.
Manual flag: flag specific time periods, e.g., due to known instrument failure.
Hampel filter, separately for daytime and nighttime. The Hampel filter identifies anomalies in time series data using a sliding window of adjustable size. Within each window, it compares the deviation of each data point from the window median to the median absolute deviation (MAD). Points whose deviation exceeds an adjustable multiple of the MAD are flagged as outliers.
Local standard deviation, with rolling median and constant standard deviation (SD). The SD was calculated across all data and then used together with the rolling median: values further than a specified multiple of the SD from the rolling median are flagged.
Local outlier factor, separately for daytime and nighttime. The local outlier factor (LOF) is an unsupervised anomaly detection method. It calculates an anomaly score from the local density deviation of a sample relative to its k nearest neighbors. Samples with substantially lower density than their neighbors are identified as outliers. See also the official description of the method.
Rolling z-score: identifies outliers based on the rolling z-score of records. For each record, the z-score is calculated from the rolling mean and rolling standard deviation of a window centered on that record.
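As an illustration of the Hampel filter described above, the following is a minimal sketch in plain Python. It is not the implementation used for the dataset. The function name, the scale factor k=1.4826 (commonly used to make the MAD comparable to a standard deviation), the truncated windows at the series edges and the example values are assumptions. Daytime and nighttime data are not separated here.

```python
import math
from statistics import median


def hampel_flags(values, window_length, n_sigma=3.5, k=1.4826):
    """Flag (1) points that deviate from the window median by more than
    n_sigma * k * MAD, using a window centered on each point."""
    half = window_length // 2
    flags = []
    for i, v in enumerate(values):
        if math.isnan(v):
            flags.append(0)  # missing values are never flagged
            continue
        # Centered window, truncated at the series edges (an assumption)
        window = [w for w in values[max(0, i - half): i + half + 1]
                  if not math.isnan(w)]
        med = median(window)
        mad = median(abs(w - med) for w in window)
        # Note: if MAD is 0, any point different from the median is flagged
        flags.append(int(abs(v - med) > n_sigma * k * mad))
    return flags


series = [1.0, 1.2, 0.9, 1.1, 9.0, 1.0, 0.8, 1.1, 1.2]
flags = hampel_flags(series, window_length=5, n_sigma=3.5)
flagged_indices = [i for i, f in enumerate(flags) if f]
```

Because the median and the MAD are both robust statistics, the spike at index 4 hardly affects the threshold that is used to detect it.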

Outlier detection settings

Outlier methods are given for each flux in the order of sequential application.

NEE (net ecosystem exchange of CO₂)

Units: µmol CO₂ m^-2 s^-1

Absolute limits: flag data outside [-50, 50]
Manual flag: flag data between the two dates ['2008-12-01', '2009-05-01']
Hampel filter separate for daytime and nighttime with the settings window_length=48*13 (corresponds to 13 days of half-hourly data), n_sigma_dt=3.5 and n_sigma_nt=3.5 (same n_sigma for daytime and nighttime). This test worked well for NEE. Test repeated until all outliers removed.
Local standard deviation, with rolling median and constant standard deviation with the settings n_sd=3.5 and winsize=48*13. Test repeated until all outliers removed.
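The local standard deviation test with a constant SD, and its repetition until no further outliers are found, can be sketched as follows. This is a simplified illustration under assumptions: the function names, the use of the sample SD, the stopping rule (stop when a pass flags nothing) and the toy data are not taken from the original processing. The toy example uses winsize=7 instead of 48*13.

```python
import math
from statistics import median, stdev

NAN = float("nan")


def localsd_flags(values, n_sd, winsize):
    """Flag (1) values further than n_sd * SD from the rolling median.
    SD is constant: it is calculated once across all available data."""
    sd = stdev(v for v in values if not math.isnan(v))
    half = winsize // 2
    flags = []
    for i, v in enumerate(values):
        if math.isnan(v):
            flags.append(0)
            continue
        window = [w for w in values[max(0, i - half): i + half + 1]
                  if not math.isnan(w)]
        flags.append(int(abs(v - median(window)) > n_sd * sd))
    return flags


def repeat_until_clean(values, test, max_iter=100):
    """Repeat a test, masking flagged values, until a pass flags nothing."""
    working = list(values)
    total = [0] * len(values)
    for _ in range(max_iter):
        flags = test(working)
        if not any(flags):
            break
        total = [int(t or f) for t, f in zip(total, flags)]
        working = [NAN if f else v for v, f in zip(working, flags)]
    return total


flux = [1.0, 1.2, 0.8] * 6
flux[4] = 100.0   # large outlier
flux[13] = 5.0    # smaller outlier, hidden while the large one inflates SD

single_pass = localsd_flags(flux, n_sd=3.5, winsize=7)
repeated = repeat_until_clean(
    flux, lambda v: localsd_flags(v, n_sd=3.5, winsize=7))
```

A single pass flags only the value 100, because that value inflates the overall SD. Once it is masked, the SD shrinks and the second pass also flags the value 5. This is one reason why repeating the test can change the result.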

LE (latent heat)

Units: W m^-2

Absolute limits: flag data outside [-50, 800]
Manual flag: flag data between the two dates ['2008-12-01', '2009-05-01']
Hampel filter separate for daytime and nighttime with the settings window_length=48*13 (corresponds to 13 days of half-hourly data), n_sigma_dt=3.5 and n_sigma_nt=3.5 (same n_sigma for daytime and nighttime). Test repeated until all outliers removed.
Local standard deviation, with rolling median and constant standard deviation with the settings n_sd=4.5 and winsize=48*13. Test repeated until all outliers removed.
Local outlier factor, separate for daytime and nighttime with the settings n_neighbors=50 and contamination=None. Test not repeated, only run once.
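To show how a local outlier factor is derived, here is a minimal one-dimensional sketch in plain Python. It is not the implementation used for the dataset and it ignores tied distances. The function name, the fixed score threshold of 2.0 and the example values are assumptions for illustration only. In particular, the sketch does not reproduce how outliers are selected with contamination=None, and it uses 3 neighbors instead of n_neighbors=50.

```python
def lof_scores(values, n_neighbors):
    """Return the local outlier factor of each value (1-D, no missing values).
    Scores near 1 mean a density similar to that of the neighbors;
    scores much larger than 1 indicate outliers."""
    n = len(values)
    neighbors, k_dist = [], []
    for i in range(n):
        # Distances to all other points, nearest first
        dists = sorted((abs(values[i] - values[j]), j)
                       for j in range(n) if j != i)
        knn = dists[:n_neighbors]
        neighbors.append([j for _, j in knn])
        k_dist.append(knn[-1][0])  # distance to the k-th nearest neighbor

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], abs(values[i] - values[j]))
                 for j in neighbors[i]]
        mean_reach = sum(reach) / len(reach)
        lrd.append(1.0 / mean_reach if mean_reach > 0 else float("inf"))

    # LOF: mean density of the neighbors relative to the point's own density
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


flux = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 10.0]
scores = lof_scores(flux, n_neighbors=3)
threshold = 2.0  # hypothetical cut-off, chosen for this toy example
flagged_indices = [i for i, s in enumerate(scores) if s > threshold]
```

The value 10.0 lies in a region of much lower density than its nearest neighbors, so its score is far above 1, while the scores of the remaining values stay close to 1.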

H (sensible heat)

Units: W m^-2

Absolute limits: flag data outside [-200, 400]
Manual flag: flag data between the two dates ['2008-12-01', '2009-05-01']
Hampel filter separate for daytime and nighttime with the settings window_length=48*13 (corresponds to 13 days of half-hourly data), n_sigma_dt=3.5 and n_sigma_nt=3.5 (same n_sigma for daytime and nighttime). Test repeated until all outliers removed.
Local standard deviation, with rolling median and constant standard deviation with the settings n_sd=5 and winsize=48*13. Test repeated until all outliers removed.
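The manual flag used for NEE, LE and H marks all records within a given period. A minimal sketch is shown below. The function name is hypothetical, and treating both dates as inclusive (starting at midnight) is an assumption, since the text does not specify how the period boundaries are handled.

```python
from datetime import datetime, timedelta


def manual_flag(timestamps, start, end):
    """Flag (1) records with a timestamp inside [start, end].
    Both boundaries are treated as inclusive (an assumption)."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    return [int(start_dt <= ts <= end_dt) for ts in timestamps]


# Four half-hourly timestamps around the start of the flagged period:
# 2008-11-30 23:00, 23:30 and 2008-12-01 00:00, 00:30
first = datetime(2008, 11, 30, 23, 0)
timestamps = [first + timedelta(minutes=30 * i) for i in range(4)]

flags = manual_flag(timestamps, "2008-12-01", "2009-05-01")
```

As with the other tests, only a flag is created. The flux values themselves are not changed.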

FN2O (nitrous oxide flux)

Units: nmol N₂O m^-2 s^-1

Absolute limits: flag data outside [-5, 70]
Rolling z-score, with the settings winsize=48*3 and thres_zscore=10. Test repeated until all outliers removed.
Local standard deviation, with rolling median and rolling standard deviation with the settings n_sd=8 and winsize=48*3. Test repeated until all outliers removed.
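The rolling z-score test can be sketched in plain Python as follows. This is an illustration under assumptions, not the code used for the dataset: the function name, the use of the sample SD, the truncated windows at the series edges and the toy data are hypothetical. The example uses winsize=9 and thres_zscore=2.5 instead of winsize=48*3 and thres_zscore=10. With the sample SD, the z-score of a record within a window of n records cannot exceed (n-1)/sqrt(n), so a small toy window requires a small threshold.

```python
import math
from statistics import mean, stdev


def rolling_zscore_flags(values, winsize, thres_zscore):
    """Flag (1) records whose absolute z-score exceeds thres_zscore.
    The z-score uses the mean and SD of a window centered on the record."""
    half = winsize // 2
    flags = []
    for i, v in enumerate(values):
        window = [w for w in values[max(0, i - half): i + half + 1]
                  if not math.isnan(w)]
        if math.isnan(v) or len(window) < 2:
            flags.append(0)  # not enough data to calculate a z-score
            continue
        sd = stdev(window)
        z = (v - mean(window)) / sd if sd > 0 else 0.0
        flags.append(int(abs(z) > thres_zscore))
    return flags


flux = [1.0, 1.2] * 10 + [1.0]  # 21 records
flux[10] = 50.0                 # spike in the middle of the series

flags = rolling_zscore_flags(flux, winsize=9, thres_zscore=2.5)
flagged_indices = [i for i, f in enumerate(flags) if f]
```

Only the spike is flagged. Its neighbors share windows with the spike, but the inflated SD in those windows keeps their z-scores small.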

FCH4 (methane flux)

Units: nmol CH₄ m^-2 s^-1

Absolute limits: flag data outside [-100, 1100]
Rolling z-score, with the settings winsize=48*3 and thres_zscore=8. Test repeated until all outliers removed.
Local standard deviation, with rolling median and rolling standard deviation with the settings n_sd=7 and winsize=48*3. Test repeated until all outliers removed.