Full Result Table for RETAILROCKET



Model MRR@3 MRR@5 MRR@10 MRR@20 HR@3 HR@5 HR@10 HR@20 Coverage@20 Popularity@20 T-time(s) P-time(s)
VSTAN 0.597 0.621 0.630 0.631 0.804 0.906 0.971 0.980 0.045 0.034 4.997 0.013
SR-GNN 0.608 0.621 0.629 0.631 0.776 0.833 0.886 0.931 0.054 0.044 4411.257 0.033
SFSKNN 0.555 0.581 0.599 0.603 0.702 0.812 0.939 0.980 0.036 0.031 4.785 0.004
CM-HGNN 0.533 0.552 0.562 0.568 0.661 0.739 0.812 0.890 0.051 0.035 7489.880 0.005
GNRRW 0.524 0.542 0.553 0.558 0.649 0.722 0.804 0.869 0.047 0.036 1135.769 0.008
MGS 0.520 0.541 0.553 0.558 0.637 0.727 0.820 0.878 0.049 0.040 38992.823 0.022
COTREC 0.517 0.542 0.551 0.556 0.620 0.731 0.792 0.861 0.048 0.031 37114.802 1.269
STAN 0.508 0.530 0.544 0.548 0.665 0.763 0.873 0.931 0.052 0.037 2.546 0.009
TAGNN 0.508 0.525 0.540 0.545 0.624 0.698 0.804 0.882 0.052 0.049 23463.542 0.006
SR 0.505 0.539 0.553 0.560 0.624 0.771 0.865 0.959 0.045 0.034 4.887 0.004
VSKNN 0.489 0.511 0.529 0.537 0.624 0.722 0.849 0.947 0.046 0.043 3.792 0.015
GCE-GNN 0.405 0.417 0.423 0.429 0.498 0.547 0.596 0.678 0.047 0.045 7313.486 0.653
FLCSP 0.399 0.432 0.451 0.455 0.514 0.661 0.800 0.865 0.057 0.037 1939.250 0.010

One-way ANOVA

We perform a significance test by computing the metric value of each session individually (e.g., the MRR of every session in the test data), which yields one sample of per-session values per model. We then use a one-way ANOVA to test whether the differences between the compared models' mean values are significant.
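The per-session values that feed the test can be sketched as follows. This is a minimal illustration, not the evaluation code behind these results: it assumes a single next-item target per session, and the function names and toy sessions are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def reciprocal_rank_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Per-session RR@k: 1/rank of the target in the top-k list, 0.0 if absent."""
    top_k = list(ranked[:k])
    return 1.0 / (top_k.index(target) + 1) if target in top_k else 0.0


def hit_at_k(ranked: Sequence[str], target: str, k: int) -> float:
    """Per-session HR@k: 1.0 if the target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


# Hypothetical test sessions for one model: (ranked recommendations, true next item)
sessions = [
    (["a", "b", "c", "d"], "a"),  # target at rank 1
    (["b", "c", "a", "d"], "a"),  # target at rank 3
    (["c", "d", "b", "e"], "a"),  # target not in the list
]

# One value per session: these samples are what the significance test compares
rr_values = [reciprocal_rank_at_k(recs, target, k=3) for recs, target in sessions]
hit_values = [hit_at_k(recs, target, k=3) for recs, target in sessions]

# Averaging the per-session values gives the model-level MRR@3 and HR@3
print(rr_values, fmean(rr_values), fmean(hit_values))
```

Each model contributes one list like `rr_values`; the ANOVA then compares the means of these lists across models.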

Null hypothesis (H0): There is no difference in the model means
Alternative hypothesis (H1): At least one model's mean differs from the others
Significance level = 0.05
F-value = 3.725
P-value = 0.00004
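For reference, the F-value of a one-way ANOVA is the ratio of between-model variance to within-model variance. The sketch below computes it with the standard library only; the three samples are hypothetical per-session values, not data from the table above.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples, one sample per model."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    # Between-group sum of squares: how far each model's mean is from the grand mean
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: how far each session's value is from its model's mean
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical per-session reciprocal ranks for three models (not data from the table)
model_a = [1.0, 0.5, 0.0, 0.5]
model_b = [1.0, 1.0, 0.5, 0.5]
model_c = [0.0, 0.5, 0.0, 0.5]

f_value, df_b, df_w = one_way_anova_f([model_a, model_b, model_c])
print(round(f_value, 3), df_b, df_w)  # -> 2.25 2 9
```

H0 is rejected when the p-value of F under an F distribution with (df_between, df_within) degrees of freedom falls below 0.05. If SciPy is available, `scipy.stats.f.sf(f_value, df_b, df_w)` gives that p-value, and `scipy.stats.f_oneway(model_a, model_b, model_c)` returns F and p in one call.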

The p-value is below the significance level, so we have enough evidence to reject H0: the mean of at least one model differs from the others. To determine which models differ, we performed a post hoc analysis using the Tukey test. The pairs listed below have significantly different means; for all other pairs, the test detects no significant difference. Moreover, the first table shows that the MRR values of the best-performing models (e.g., SFSKNN, STAN, VSTAN, SR, MGS, and COTREC) are nearly equal. Consistent with this, the post hoc analysis finds no significant difference among the mean MRR values of the top-performing models.
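A minimal sketch of the Tukey (Tukey-Kramer) pairwise comparison, under stated assumptions: the critical value `q_crit` of the studentized range is supplied by the caller rather than computed, and the model names and samples are hypothetical. The sketch uses meandiff = mean(group2) - mean(group1), which appears to be the sign convention of the Tukey table in this section; a pair is rejected when its confidence interval excludes zero.

```python
import math
from itertools import combinations
from statistics import fmean


def tukey_hsd(groups, q_crit):
    """Pairwise Tukey-Kramer comparisons.

    groups: dict mapping model name -> per-session metric values.
    q_crit: studentized-range critical value for (number of groups,
            within-group degrees of freedom) at the chosen alpha (assumed given).
    Returns rows (group1, group2, meandiff, lower, upper, reject).
    """
    names = list(groups)
    means = {n: fmean(groups[n]) for n in names}
    n_total = sum(len(v) for v in groups.values())
    df_within = n_total - len(names)
    # Pooled within-group variance (mean square within), as in the ANOVA
    ss_within = sum((x - means[n]) ** 2 for n in names for x in groups[n])
    ms_within = ss_within / df_within
    rows = []
    for g1, g2 in combinations(names, 2):
        diff = means[g2] - means[g1]
        se = math.sqrt(ms_within / 2 * (1 / len(groups[g1]) + 1 / len(groups[g2])))
        lower, upper = diff - q_crit * se, diff + q_crit * se
        reject = lower > 0 or upper < 0  # interval excludes zero
        rows.append((g1, g2, diff, lower, upper, reject))
    return rows


# Hypothetical per-session values for three models
groups = {
    "model_a": [1.0, 1.0, 0.5, 0.5],
    "model_b": [1.0, 0.5, 1.0, 0.5],
    "model_c": [0.0, 0.0, 0.5, 0.0],
}
# Assumption: ~3.95 is the tabulated alpha = 0.05 studentized-range value
# for 3 groups and 9 within-group degrees of freedom
rows = tukey_hsd(groups, q_crit=3.95)
for g1, g2, diff, lower, upper, reject in rows:
    if reject:  # report only the pairs where H0 is rejected
        print(g1, g2, round(diff, 4), round(lower, 4), round(upper, 4))
```

In practice, `scipy.stats.studentized_range.ppf(0.95, k, df_within)` (SciPy 1.7 or later) can supply `q_crit`, and statsmodels' `pairwise_tukeyhsd` produces a summary with the same columns as the table below.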

Note: we report only the pairs in the Tukey test for which H0 is rejected.
group1 group2 meandiff p-adj lower upper reject
FLCSP GGNN 0.1368 0.001 0.0332 0.2404 TRUE
FLCSP VSTAN 0.1405 0.0006 0.0369 0.2442 TRUE
FLCSP sfcknn 0.1246 0.005 0.0209 0.2282 TRUE
GCEGNN GGNN 0.1239 0.0054 0.02 0.2276 TRUE
GCEGNN VSTAN 0.1277 0.0034 0.0241 0.2313 TRUE
GCEGNN sfcknn 0.1117 0.0219 0.0081 0.2153 TRUE
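Each row of the table can be read with a simple rule: H0 is rejected exactly when the confidence interval [lower, upper] excludes zero, which coincides with p-adj < 0.05, and meandiff sits at the centre of the interval. The sketch below checks the reported rows against that rule; the function name is hypothetical and the tolerance allows for the 4-decimal rounding in the table.

```python
# Rows copied from the Tukey table: (group1, group2, meandiff, p_adj, lower, upper, reject)
ROWS = [
    ("FLCSP", "GGNN", 0.1368, 0.001, 0.0332, 0.2404, True),
    ("FLCSP", "VSTAN", 0.1405, 0.0006, 0.0369, 0.2442, True),
    ("FLCSP", "sfcknn", 0.1246, 0.005, 0.0209, 0.2282, True),
    ("GCEGNN", "GGNN", 0.1239, 0.0054, 0.02, 0.2276, True),
    ("GCEGNN", "VSTAN", 0.1277, 0.0034, 0.0241, 0.2313, True),
    ("GCEGNN", "sfcknn", 0.1117, 0.0219, 0.0081, 0.2153, True),
]


def inconsistent_rows(rows, alpha=0.05):
    """Return the (group1, group2) pairs whose reported values break the reading rule."""
    bad = []
    for g1, g2, diff, p_adj, lower, upper, reject in rows:
        ci_excludes_zero = lower > 0 or upper < 0
        centred = abs((lower + upper) / 2 - diff) < 1e-3
        if not (reject == ci_excludes_zero == (p_adj < alpha)) or not centred:
            bad.append((g1, g2))
    return bad


print(inconsistent_rows(ROWS))  # -> []
```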