FEEL (Framework for Emotion Evaluation) is the first large-scale benchmarking study of emotion recognition from electrodermal activity (EDA) and photoplethysmography (PPG) signals. It spans 19 publicly available datasets and enables systematic analysis of model generalizability.
Datasets: 19 (publicly available)
Architectures: 16 (models benchmarked)
Paradigms: 4 (modeling approaches)
Signals: 2 (EDA and PPG)
📝 Research Timeline
CSCSW 2024
Translating Emotions to Annotations: A Participant's Perspective of Physiological Emotion Data Collection
Four-Class Classification (HAPV, HANV, LAPV, LANV) - All 19 Datasets
Rank | Dataset | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
1 | WESAD | 0.987 | RF | 0.794 | RF | 0.987 | LDA
2 | ForDigitStress | 0.682 | LDA | 0.821 | RF | 0.826 | RF
3 | ScientISST MOVE | 0.701 | CLSP MLP 25% | 0.740 | CLSP CNN 50% | 0.800 | CLSP CNN 50%
4 | MAUS | 0.700 | HC-MLP | 0.705 | RF | 0.728 | RF
5 | PhyMER | 0.723 | CLSP CNN 50% | 0.300 | RF | 0.342 | RF
6 | UBFCPHYS | 0.705 | CLSP Zero-Shot | 0.551 | LDA | 0.622 | LDA
7 | MOCAS | 0.701 | CLSP MLP 25% | 0.357 | RF | 0.366 | RF
8 | EMOGNITION | 0.572 | RF | 0.601 | CLSP CNN 50% | 0.513 | RF
9 | Dapper | 0.434 | RF | 0.426 | RF | 0.555 | RF
10 | Exercise | 0.552 | CLSP CNN 25% | 0.438 | HC-MLP | 0.480 | RF
11 | LAUREATE | 0.527 | CLSP MLP 5% | 0.460 | RF | 0.461 | RF
12 | CASE | 0.476 | RF | 0.397 | RF | 0.498 | RF
13 | VERBIO | 0.480 | CLSP Zero-Shot | 0.582 | CLSP Zero-Shot | 0.436 | CLSP Zero-Shot
14 | NURSE | 0.433 | CLSP Zero-Shot | 0.667 | CLSP Zero-Shot | 0.520 | CLSP Zero-Shot
15 | CLAS | 0.430 | RF | 0.408 | HC-MLP | 0.459 | RF
16 | Unobtrusive | 0.402 | RF | 0.409 | CLSP Zero-Shot | 0.393 | HC-MLP
17 | ADARP | 0.269 | CLSP Zero-Shot | 0.433 | CLSP Zero-Shot | 0.354 | CLSP Zero-Shot
18 | CEAP-360VR | 0.285 | CLSP MLP 25% | 0.307 | RF | 0.314 | RF
19 | EmoWear | 0.293 | CLSP CNN 50% | 0.270 | HC-MLP | 0.282 | HC-MLP
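The four class names read as combinations of high/low arousal (HA/LA) and positive/negative valence (PV/NV); that expansion is an assumption, since the page does not spell it out. A minimal sketch of how two binary labels could be combined into one of the four classes follows. The function names, the 1-9 rating scale, and the midpoint threshold are hypothetical and not taken from FEEL.

```python
def quadrant_label(high_arousal: bool, positive_valence: bool) -> str:
    """Combine two binary labels into one of HAPV, HANV, LAPV, LANV."""
    arousal = "HA" if high_arousal else "LA"
    valence = "PV" if positive_valence else "NV"
    return arousal + valence


def binarize(rating: float, midpoint: float) -> bool:
    """Hypothetical rule: a rating strictly above the midpoint counts as high/positive."""
    return rating > midpoint


# Example on an assumed 1-9 self-report scale with midpoint 5:
# arousal rating 7 (high) and valence rating 3 (negative).
label = quadrant_label(binarize(7, 5), binarize(3, 5))
print(label)  # HANV
```

Datasets that use a different rating scale or label source would need their own binarization rule before this mapping applies.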
Arousal Classification
Testing Cohort | Training Cohort | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Lab | Real | 0.72 | CLSP CNN 5% | 0.57 | CLSP MLP 5% | 0.71 | CLSP MLP 5%
Lab | Constraint | 0.56 | CLSP MLP 50% | 0.61 | RF | 0.60 | RF
Lab | Lab | 0.50 | RF | 0.50 | RF | 0.52 | RF
Constraint | Real | 0.68 | RF | 0.51 | RF | 0.64 | CLSP MLP 5%
Constraint | Lab | 0.44 | HCMLP | 0.67 | LDA | 0.64 | LDA
Constraint | Constraint | 0.48 | HCMLP | 0.48 | RF | 0.48 | RF
Real | Constraint | 0.65 | CLSP MLP 5% | 0.59 | RF | 0.73 | CLSP MLP 5%
Real | Lab | 0.59 | HCMLP | 0.69 | LDA | 0.72 | CLSP MLP 25%
Real | Real | 0.49 | HCMLP | 0.48 | RF | 0.46 | RF
Valence Classification
Testing Cohort | Training Cohort | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Lab | Real | 0.79 | CLSP MLP 5% | 0.69 | RF | 0.79 | CLSP MLP 25%
Lab | Constraint | 0.66 | CLSP MLP 25% | 0.67 | CLSP CNN 5% | 0.68 | CLSP MLP 25%
Lab | Lab | 0.54 | RF | 0.50 | HCMLP | 0.51 | HCMLP
Constraint | Real | 0.76 | RF | 0.78 | RF | 0.77 | RF
Constraint | Lab | 0.76 | RF | 0.72 | RF | 0.74 | RF
Constraint | Constraint | 0.63 | RF | 0.64 | RF | 0.65 | RF
Real | Constraint | 0.76 | RF | 0.70 | RF | 0.88 | RF
Real | Lab | 0.72 | RF | 0.64 | CLSP MLP 25% | 0.76 | RF
Real | Real | 0.41 | HCMLP | 0.41 | HCMLP | 0.42 | HCMLP
Arousal Classification
Testing Device | Training Device | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Custom Wearable | E4 Wearable | 0.65 | CLSP MLP 50% | 0.82 | RF | 0.73 | CLSP MLP 50%
Custom Wearable | Lab-Based | 0.62 | RF | 0.77 | RF | 0.81 | CLSP CNN 50%
Custom Wearable | Custom Wearable | 0.34 | RF | 0.26 | RF | 0.30 | RF
E4 Wearable | Lab-Based | 0.67 | CLSP CNN 50% | 0.73 | CLSP CNN 50% | 0.73 | RF
E4 Wearable | Custom Wearable | 0.64 | CLSP CNN 50% | 0.57 | RF | 0.66 | CLSP CNN 50%
E4 Wearable | E4 Wearable | 0.62 | RF | 0.60 | RF | 0.61 | RF
Lab-Based | Lab-Based | 0.60 | RF | 0.62 | RF | 0.62 | RF
Lab-Based | E4 Wearable | 0.45 | RF | 0.52 | HCMLP | 0.57 | HCMLP
Lab-Based | Custom Wearable | 0.51 | RF | 0.53 | RF | 0.54 | RF
Valence Classification
Testing Device | Training Device | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Custom Wearable | E4 Wearable | 0.70 | CLSP MLP 50% | 0.82 | RF | 0.82 | CLSP MLP 50%
Custom Wearable | Lab-Based | 0.71 | LDA | 0.81 | LDA | 0.81 | LDA
Custom Wearable | Custom Wearable | 0.34 | RF | 0.26 | RF | 0.28 | RF
E4 Wearable | Lab-Based | 0.67 | CLSP MLP 25% | 0.64 | RF | 0.73 | CLSP CNN 50%
E4 Wearable | Custom Wearable | 0.60 | CLSP CNN 50% | 0.55 | CLSP MLP 50% | 0.62 | CLSP MLP 50%
E4 Wearable | E4 Wearable | 0.59 | RF | 0.56 | HCMLP | 0.61 | HCMLP
Lab-Based | Custom Wearable | 0.62 | CLSP CNN 5% | 0.61 | RF | 0.62 | RF
Lab-Based | E4 Wearable | 0.52 | HCMLP | 0.57 | HCMLP | 0.54 | HCMLP
Lab-Based | Lab-Based | 0.52 | RF | 0.45 | HCMLP | 0.47 | HCMLP
Arousal Classification
Testing Label | Training Label | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Stimulus-Label | Expert-Annotated | 0.64 | CLSP MLP 5% | 0.72 | RF | 0.65 | CLSP MLP 50%
Stimulus-Label | Self-report | 0.62 | RF | 0.44 | CLSP CNN 5% | 0.57 | CLSP CNN 5%
Stimulus-Label | Stimulus-Label | 0.54 | RF | 0.51 | RF | 0.55 | HCMLP
Self-report | Expert-Annotated | 0.65 | CLSP MLP 5% | 0.64 | CLSP CNN 50% | 0.69 | CLSP MLP 5%
Self-report | Stimulus-Label | 0.57 | HCMLP | 0.51 | CLSP CNN 50% | 0.63 | RF
Self-report | Self-report | 0.53 | HCMLP | 0.52 | HCMLP | 0.52 | HCMLP
Expert-Annotated | Self-report | 0.87 | RF | 0.69 | LDA | 0.84 | RF
Expert-Annotated | Stimulus-Label | 0.79 | CLSP CNN 50% | 0.70 | CLSP MLP 50% | 0.82 | RF
Expert-Annotated | Expert-Annotated | 0.52 | RF | 0.28 | RF | 0.48 | HCMLP
Valence Classification
Testing Label | Training Label | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Stimulus-Label | Expert-Annotated | 0.65 | CLSP MLP 5% | 0.65 | CLSP CNN 50% | 0.65 | CLSP CNN 25%
Stimulus-Label | Self-report | 0.63 | CLSP CNN 25% | 0.61 | CLSP CNN 5% | 0.61 | CLSP CNN 5%
Stimulus-Label | Stimulus-Label | 0.61 | RF | 0.53 | RF | 0.52 | RF
Self-report | Expert-Annotated | 0.69 | CLSP MLP 50% | 0.72 | RF | 0.76 | CLSP CNN 50%
Self-report | Stimulus-Label | 0.57 | LDA | 0.59 | CLSP MLP 5% | 0.56 | LDA
Self-report | Self-report | 0.53 | RF | 0.48 | HCMLP | 0.52 | HCMLP
Expert-Annotated | Stimulus-Label | 0.87 | LDA | 0.85 | RF | 0.87 | CLSP CNN 5%
Expert-Annotated | Self-report | 0.83 | CLSP CNN 25% | 0.85 | CLSP CNN 50% | 0.74 | CLSP CNN 50%
Expert-Annotated | Expert-Annotated | 0.56 | HCMLP | 0.42 | RF | 0.49 | HCMLP
Gender-Based Transfer - Arousal Classification
Testing Group | Training Group | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Male | Female | 0.56 | HCMLP | 0.51 | LDA | 0.54 | LDA
Male | Male | 0.56 | RF | 0.51 | HCMLP | 0.56 | RF
Female | Female | 0.52 | RF | 0.55 | HCMLP | 0.56 | HCMLP
Female | Male | 0.50 | LDA | 0.51 | LDA | 0.53 | LDA
Gender-Based Transfer - Valence Classification
Testing Group | Training Group | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Male | Female | 0.69 | CLSP MLP 25% | 0.71 | RF | 0.70 | CLSP CNN 50%
Male | Male | 0.53 | HCMLP | 0.52 | HCMLP | 0.47 | RF
Female | Male | 0.71 | CLSP MLP 50% | 0.70 | CLSP CNN 50% | 0.70 | CLSP MLP 25%
Female | Female | 0.55 | HCMLP | 0.49 | RF | 0.54 | HCMLP
Age-Based Transfer - Arousal Classification
Testing Group | Training Group | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Old (>25 years) | Young (18-25 years) | 0.51 | LDA | 0.56 | HCMLP | 0.56 | HCMLP
Old (>25 years) | Old (>25 years) | 0.55 | RF | 0.55 | RF | 0.53 | RF
Young (18-25 years) | Young (18-25 years) | 0.55 | HCMLP | 0.53 | RF | 0.58 | HCMLP
Young (18-25 years) | Old (>25 years) | 0.50 | LDA | 0.43 | LDA | 0.47 | CLSP MLP 50%
Age-Based Transfer - Valence Classification
Testing Group | Training Group | EDA F1 | EDA Best Model | PPG F1 | PPG Best Model | Combined F1 | Combined Best Model
Old (>25 years) | Young (18-25 years) | 0.73 | CLSP MLP 50% | 0.72 | CLSP CNN 50% | 0.73 | RF
Old (>25 years) | Old (>25 years) | 0.53 | RF | 0.57 | RF | 0.53 | RF
Young (18-25 years) | Old (>25 years) | 0.72 | CLSP MLP 5% | 0.67 | RF | 0.69 | RF
Young (18-25 years) | Young (18-25 years) | 0.54 | RF | 0.51 | RF | 0.48 | RF
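The cohort, device, label, gender, and age results all follow one pattern: train on one group, test on another, and report F1. The sketch below illustrates that loop under stated assumptions. It uses macro-averaged F1 (the page does not say which F1 variant is reported) and a toy nearest-centroid classifier as a stand-in for the benchmarked models. All function names and the toy data are hypothetical. Same-group pairs are skipped here because the page does not describe how within-group splits were made.

```python
from itertools import permutations


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def fit_centroids(X, y):
    """Toy stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, X):
    """Assign each row to the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(centroids, key=lambda c: sq_dist(centroids[c], row)) for row in X]


def transfer_scores(groups):
    """Return {(test_group, train_group): macro-F1} for every cross-group pair."""
    scores = {}
    for train_name, test_name in permutations(groups, 2):
        X_train, y_train = groups[train_name]
        X_test, y_test = groups[test_name]
        model = fit_centroids(X_train, y_train)
        scores[(test_name, train_name)] = macro_f1(y_test, predict(model, X_test))
    return scores


# Toy, well-separated data for two hypothetical cohorts (one feature per sample).
groups = {
    "Lab": ([[0.1], [0.2], [0.8], [0.9]], [0, 0, 1, 1]),
    "Real": ([[0.0], [0.3], [0.7], [1.0]], [0, 0, 1, 1]),
}
scores = transfer_scores(groups)
print(scores)
```

Keys are ordered as (testing group, training group) to match the column order of the tables.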
Model Architectures
CLSP fine-tuning with conditional context optimization (CoCoOp)
Four Modeling Paradigms (16 Architectures)
1. Traditional ML
Models: 2 variants (Random Forest, LDA)
Input: Handcrafted features
Top: 59/171
2. DL + Handcrafted
Models: 4 variants (MLP, ResNet, LSTM+MLP, Attention+MLP)
Input: Handcrafted features
Top: 21/171
3. DL on Raw Signals
Models: 3 variants (Signal ResNet, Signal LSTM+MLP, CNN+Transformer)
Input: Raw time-series
Top: 3/171
4. Pretrained CLSP
Models: 7 variants (Zero-Shot, MLP at 5/25/50%, CNN at 5/25/50%)
Input: Pretrained embeddings
Top: 88/171
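The CLSP MLP and CNN variants are listed at 5%, 25%, and 50%, which the page describes as the share of training data used. One plausible way to draw such subsets is a class-stratified random sample, sketched below. This is an assumption for illustration, not FEEL's documented procedure; the function name, the seed, and the rule of keeping at least one sample per class are hypothetical.

```python
import random
from collections import defaultdict


def stratified_subsample(labels, fraction, seed=0):
    """Return sorted sample indices covering roughly `fraction` of each class."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))  # keep at least one per class
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy label vector: 100 samples for each of two classes.
labels = [0] * 100 + [1] * 100
for fraction in (0.05, 0.25, 0.50):
    subset = stratified_subsample(labels, fraction)
    print(fraction, len(subset))
```

Stratifying keeps the class balance of the full training set in each subset, which matters most at the 5% level where a plain random draw could miss a class entirely.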
Key Model Insights
CLSP is the overall winner with 88/171 (51.5%) top results and dominates binary classification tasks
Few-Shot Power: 23 top instances use only 5% of the training data
Classical ML Competitive: RF and LDA remain strong for small datasets
Handcrafted Features Win: 166/171 (97%) of top models use domain knowledge
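The per-paradigm "Top" counts can be turned into shares with a short calculation. The snippet below uses only the four counts listed under the modeling paradigms and confirms that they sum to 171 and that CLSP's share rounds to 51.5%. The variable names are illustrative.

```python
# "Top" counts as listed for each modeling paradigm.
top_counts = {
    "Traditional ML": 59,
    "DL + Handcrafted": 21,
    "DL on Raw Signals": 3,
    "Pretrained CLSP": 88,
}

total = sum(top_counts.values())
shares = {name: round(100 * count / total, 1) for name, count in top_counts.items()}

for name, share in shares.items():
    print(f"{name}: {top_counts[name]}/{total} = {share}%")
```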
🎯 Contribute to FEEL
Help expand the FEEL benchmark by submitting your model results or proposing new datasets. We welcome contributions that evaluate novel architectures, introduce new preprocessing techniques, or extend analysis to additional heterogeneity dimensions.