You can sort the results by clicking on the table headers.
Metrics | MRR@3 | MRR@5 | MRR@10 | MRR@20 | HR@3 | HR@5 | HR@10 | HR@20 | Coverage@20 | Popularity@20 | T-time (s) | P-time (s) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
VSTAN | 0.597 | 0.621 | 0.630 | 0.631 | 0.804 | 0.906 | 0.971 | 0.980 | 0.045 | 0.034 | 4.997 | 0.013 |
SR-GNN | 0.608 | 0.621 | 0.629 | 0.631 | 0.776 | 0.833 | 0.886 | 0.931 | 0.054 | 0.044 | 4411.257 | 0.033 |
SFSKNN | 0.555 | 0.581 | 0.599 | 0.603 | 0.702 | 0.812 | 0.939 | 0.980 | 0.036 | 0.031 | 4.785 | 0.004 |
CM-HGNN | 0.533 | 0.552 | 0.562 | 0.568 | 0.661 | 0.739 | 0.812 | 0.890 | 0.051 | 0.035 | 7489.880 | 0.005 |
GNRRW | 0.524 | 0.542 | 0.553 | 0.558 | 0.649 | 0.722 | 0.804 | 0.869 | 0.047 | 0.036 | 1135.769 | 0.008 |
MGS | 0.520 | 0.541 | 0.553 | 0.558 | 0.637 | 0.727 | 0.820 | 0.878 | 0.049 | 0.040 | 38992.823 | 0.022 |
COTREC | 0.517 | 0.542 | 0.551 | 0.556 | 0.620 | 0.731 | 0.792 | 0.861 | 0.048 | 0.031 | 37114.802 | 1.269 |
STAN | 0.508 | 0.530 | 0.544 | 0.548 | 0.665 | 0.763 | 0.873 | 0.931 | 0.052 | 0.037 | 2.546 | 0.009 |
TAGNN | 0.508 | 0.525 | 0.540 | 0.545 | 0.624 | 0.698 | 0.804 | 0.882 | 0.052 | 0.049 | 23463.542 | 0.006 |
SR | 0.505 | 0.539 | 0.553 | 0.560 | 0.624 | 0.771 | 0.865 | 0.959 | 0.045 | 0.034 | 4.887 | 0.004 |
VSKNN | 0.489 | 0.511 | 0.529 | 0.537 | 0.624 | 0.722 | 0.849 | 0.947 | 0.046 | 0.043 | 3.792 | 0.015 |
GCE-GNN | 0.405 | 0.417 | 0.423 | 0.429 | 0.498 | 0.547 | 0.596 | 0.678 | 0.047 | 0.045 | 7313.486 | 0.653 |
FLCSP | 0.399 | 0.432 | 0.451 | 0.455 | 0.514 | 0.661 | 0.800 | 0.865 | 0.057 | 0.037 | 1939.250 | 0.010 |
To test for significance, we compute the metric values for each session individually (for example, the mean MRR value of each session in the test data) and then use the ANOVA test to check whether the differences between the compared models are significant.
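The per-session values described above can be sketched as follows. This is a minimal illustration, not the evaluation code behind the table: the helper names, the toy sessions, and the assumption of one target item per session are all hypothetical.

```python
from statistics import mean

def reciprocal_rank_at_k(ranked_items, target, k):
    """Return 1/rank if the target is within the top-k items, else 0.0."""
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item == target:
            return 1.0 / rank
    return 0.0

def hit_at_k(ranked_items, target, k):
    """Return 1.0 if the target is within the top-k items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0

# Hypothetical toy data: one (ranked recommendation list, target item) per session.
sessions = [
    (["a", "b", "c"], "a"),  # target ranked first
    (["b", "a", "c"], "a"),  # target ranked second
    (["b", "c", "d"], "a"),  # target not recommended
]

# One value per session; these per-session lists are what a significance test consumes.
per_session_rr = [reciprocal_rank_at_k(r, t, k=3) for r, t in sessions]
per_session_hit = [hit_at_k(r, t, k=3) for r, t in sessions]

# Averaging over sessions gives the aggregate MRR@3 and HR@3.
mrr_at_3 = mean(per_session_rr)
hr_at_3 = mean(per_session_hit)
```

Keeping the per-session lists, rather than only their means, is what makes a test across models possible, since the test needs the spread of values within each model.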
Null hypothesis (H0): There is no difference in the model means
Alternative hypothesis (H1): At least one model's mean differs from the others
Significance level = 0.05
F-value = 3.725
P-value = 0.00004
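As an illustration of how an F-value of this kind is obtained, here is a minimal sketch that assumes a one-way ANOVA over per-session scores, with one group of scores per model. The function name, the model names, and the scores are hypothetical; they are not the data behind the values reported above.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of per-model score lists."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of models
    n = len(all_scores)   # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical per-session reciprocal ranks for three models.
scores_by_model = {
    "model_a": [1.0, 0.5, 1.0, 0.5],
    "model_b": [0.5, 0.0, 0.5, 0.0],
    "model_c": [1.0, 1.0, 0.5, 0.5],
}

f_value, df_b, df_w = one_way_anova_f(list(scores_by_model.values()))
```

The P-value is the upper tail of the F distribution at `f_value` with `(df_b, df_w)` degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_value, df_b, df_w)` gives it, and `scipy.stats.f_oneway` computes the statistic and the P-value in one call.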
The P-value is less than the significance level, so we have enough evidence to reject H0: at least one model's mean differs from the others. We therefore performed a post hoc analysis using the Tukey test to determine which models' means differ. Our analysis reveals that the following pairs of models have different means, while all other pairs have statistically indistinguishable means. Moreover, in the first table, we can see that the MRR values of the best-performing models, e.g., SFSKNN, STAN, VSTAN, SR, MGS and COTREC, are nearly equal. Consistent with this, the post hoc analysis finds no significant difference among the mean MRR values of the top-performing models.
Note: we report only those cases of the Tukey test where H0 is rejected.

group1 | group2 | meandiff | p-adj | lower | upper | reject |
---|---|---|---|---|---|---|
FLCSP | GGNN | 0.1368 | 0.001 | 0.0332 | 0.2404 | TRUE |
FLCSP | VSTAN | 0.1405 | 0.0006 | 0.0369 | 0.2442 | TRUE |
FLCSP | SFSKNN | 0.1246 | 0.005 | 0.0209 | 0.2282 | TRUE |
GCE-GNN | GGNN | 0.1239 | 0.0054 | 0.02 | 0.2276 | TRUE |
GCE-GNN | VSTAN | 0.1277 | 0.0034 | 0.0241 | 0.2313 | TRUE |
GCE-GNN | SFSKNN | 0.1117 | 0.0219 | 0.0081 | 0.2153 | TRUE |
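To make the columns of the Tukey table easier to read, the sketch below checks two of its rows. It assumes the usual Tukey HSD summary layout, in which `lower` and `upper` bound the confidence interval of `meandiff` and H0 is rejected when the adjusted P-value is below the significance level. The variable names are illustrative.

```python
ALPHA = 0.05  # significance level stated above

# (group1, group2, meandiff, p_adj, lower, upper), copied from the table.
rows = [
    ("FLCSP", "GGNN", 0.1368, 0.0010, 0.0332, 0.2404),
    ("FLCSP", "VSTAN", 0.1405, 0.0006, 0.0369, 0.2442),
]

checks = []
for g1, g2, meandiff, p_adj, lower, upper in rows:
    # The interval is centred on the mean difference (up to rounding).
    midpoint = (lower + upper) / 2
    # Rejecting H0 corresponds to an interval that does not contain zero.
    excludes_zero = lower > 0 or upper < 0
    reject = p_adj < ALPHA
    checks.append((g1, g2, midpoint, excludes_zero, reject))
    print(f"{g1} vs {g2}: midpoint={midpoint:.4f}, "
          f"excludes_zero={excludes_zero}, reject={reject}")
```

For both rows the interval lies entirely above zero and the adjusted P-value is below 0.05, which agrees with the `reject` column.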