Full Result Table for RETAILROCKET

Metrics	MRR@3	MRR@5	MRR@10	MRR@20	HR@3	HR@5	HR@10	HR@20	Coverage@20	Popularity@20	T-time (s)	P-time (s)
VSTAN	0.597	0.621	0.630	0.631	0.804	0.906	0.971	0.980	0.045	0.034	4.997	0.013
SR-GNN	0.608	0.621	0.629	0.631	0.776	0.833	0.886	0.931	0.054	0.044	4411.257	0.033
SFSKNN	0.555	0.581	0.599	0.603	0.702	0.812	0.939	0.980	0.036	0.031	4.785	0.004
CM-HGNN	0.533	0.552	0.562	0.568	0.661	0.739	0.812	0.890	0.051	0.035	7489.880	0.005
GNRRW	0.524	0.542	0.553	0.558	0.649	0.722	0.804	0.869	0.047	0.036	1135.769	0.008
MGS	0.520	0.541	0.553	0.558	0.637	0.727	0.820	0.878	0.049	0.040	38992.823	0.022
COTREC	0.517	0.542	0.551	0.556	0.620	0.731	0.792	0.861	0.048	0.031	37114.802	1.269
STAN	0.508	0.530	0.544	0.548	0.665	0.763	0.873	0.931	0.052	0.037	2.546	0.009
TAGNN	0.508	0.525	0.540	0.545	0.624	0.698	0.804	0.882	0.052	0.049	23463.542	0.006
SR	0.505	0.539	0.553	0.560	0.624	0.771	0.865	0.959	0.045	0.034	4.887	0.004
VSKNN	0.489	0.511	0.529	0.537	0.624	0.722	0.849	0.947	0.046	0.043	3.792	0.015
GCE-GNN	0.405	0.417	0.423	0.429	0.498	0.547	0.596	0.678	0.047	0.045	7313.486	0.653
FLCSP	0.399	0.432	0.451	0.455	0.514	0.661	0.800	0.865	0.057	0.037	1939.250	0.010

One-way ANOVA

We perform the significance test on per-session metric values. For example, we compute the MRR for each session in the test data and then use the ANOVA test to analyze whether the difference between the mean values of the compared models is significant.
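The per-session values described above can be sketched as follows. This is a minimal illustration, not the evaluation code behind the tables: the function names, the toy sessions, and the assumption that each session has a single target item are all hypothetical.

```python
from statistics import mean


def reciprocal_rank_at_k(ranked_items, target, k):
    """Return 1/rank if the target is within the top-k items, else 0.0."""
    top_k = ranked_items[:k]
    if target in top_k:
        return 1.0 / (top_k.index(target) + 1)
    return 0.0


def hit_at_k(ranked_items, target, k):
    """Return 1.0 if the target is within the top-k items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


# Hypothetical toy data: (ranked recommendations, true next item) per session.
sessions = [
    (["a", "b", "c"], "a"),  # target at rank 1
    (["a", "b", "c"], "c"),  # target at rank 3
    (["a", "b", "c"], "z"),  # target not recommended
]

# One value per session; these per-session values are what the test compares.
rr = [reciprocal_rank_at_k(ranked, target, 3) for ranked, target in sessions]
hits = [hit_at_k(ranked, target, 3) for ranked, target in sessions]

# Averaging over sessions gives the table-style MRR@3 and HR@3.
mrr = mean(rr)
hr = mean(hits)
```

Keeping the per-session lists, rather than only their means, is what makes a test such as ANOVA possible, since it needs the spread within each model as well as the mean.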

Null hypothesis (H₀): There is no difference between the model means
Alternative hypothesis (H₁): At least one model's mean differs from the others
Significance level = 0.05
F-value = 3.725
P-value = 0.00004
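For readers who want to see how an F-value of this kind is obtained, here is a minimal pure-Python sketch of the one-way ANOVA F statistic. The model names and scores are made-up toy values, not the data behind the numbers above, and `one_way_anova_f` is a hypothetical helper. The P-value additionally requires the F distribution; a library routine such as `scipy.stats.f_oneway` returns both the statistic and the P-value.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical per-session scores for three models (toy values only).
scores = {
    "model_a": [1.0, 2.0, 3.0],
    "model_b": [2.0, 3.0, 4.0],
    "model_c": [5.0, 6.0, 7.0],
}

f_value, df_b, df_w = one_way_anova_f(list(scores.values()))
```

A large F-value means the model means are spread out far more than the session-to-session variation inside each model would explain.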

The P-value is less than the significance level, so we have enough evidence to reject H₀: the mean of at least one model differs from the others. We therefore performed a post hoc analysis with the Tukey test to determine which models' means differ. The analysis shows that only the model pairs listed below have significantly different means; for all other pairs the difference is not significant. Moreover, the first table shows that the MRR values of the best-performing models, e.g., SFSKNN, STAN, VSTAN, SR, MGS and COTREC, are nearly equal. The analysis therefore indicates that the top-performing models do not differ significantly in their mean MRR values.

Note: we report only those cases of the Tukey test where H₀ is rejected.

group1	group2	meandiff	p-adj	lower	upper	reject
FLCSP	GGNN	0.1368	0.001	0.0332	0.2404	TRUE
FLCSP	VSTAN	0.1405	0.0006	0.0369	0.2442	TRUE
FLCSP	sfcknn	0.1246	0.005	0.0209	0.2282	TRUE
GCEGNN	GGNN	0.1239	0.0054	0.02	0.2276	TRUE
GCEGNN	VSTAN	0.1277	0.0034	0.0241	0.2313	TRUE
GCEGNN	sfcknn	0.1117	0.0219	0.0081	0.2153	TRUE
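
As a reading aid for the columns above, the standard Tukey (Tukey-Kramer) confidence interval for a pair of groups is given below. We assume here that lower and upper are the bounds of this interval, which is the usual convention for tables in this format. The symbols are introduced only for illustration: k is the number of models, N the total number of observations, n_i the number of observations for model i, MS_within the within-group mean square from the ANOVA, and q the critical value of the studentized range distribution.

```latex
\[
(\bar{y}_i - \bar{y}_j) \;\pm\; \frac{q_{\alpha;\,k,\,N-k}}{\sqrt{2}}
\sqrt{\mathrm{MS}_{\text{within}} \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}
\]
% With equal group sizes n_i = n_j = n the half-width reduces to
\[
q_{\alpha;\,k,\,N-k} \sqrt{\frac{\mathrm{MS}_{\text{within}}}{n}}
\]
```

H₀ is rejected for a pair when this interval does not contain zero, which is the case for every row reported above.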