Norbert Fuhr's recommendations for gaining scientific knowledge from experiments:
1. Do not use MRR or MAP;
2. Instead of relative improvements, regard the effect size!
3. For multiple significance tests, use a correction, such as Bonferroni or Tukey's HSD (NB: comparing only to the 2nd best method does not help!)
4. There are no significant improvements for re-usable test collections! (hypotheses have to be formulated before the work)
5. Ignore results for collections when there is no baseline from independent research;
6. Test collections wear out! The expected maximum result increases with the number of runs (see leaderboards! -- Carterette's SIGIR paper)
7. Conferences and journals need to accept papers with "null" results. (to prevent the busy beaver / the p-hackers) Reproducibility is important: Publish your code and data
8. Evaluation initiatives are important (but they should only run proper measures and methods)
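To make points 2 and 3 concrete, here is a minimal sketch in Python. It assumes per-topic scores for two systems on the same topics; the function names and all numbers are made up for illustration and are not from the talk. It computes a paired effect size (Cohen's d on the per-topic differences) and applies a plain Bonferroni correction.

```python
from statistics import mean, stdev

def paired_effect_size(scores_a, scores_b):
    """Cohen's d for paired samples: mean per-topic difference
    divided by the standard deviation of those differences.
    Needs at least two topics and non-constant differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return mean(diffs) / stdev(diffs)

def bonferroni(p_values, alpha=0.05):
    """For each p-value, report whether it is still significant
    when alpha is divided by the number of tests performed."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

# Hypothetical per-topic scores for a new system and a baseline.
new_system = [0.5, 0.6, 0.7, 0.8]
baseline = [0.4, 0.6, 0.5, 0.7]
d = paired_effect_size(new_system, baseline)

# Hypothetical p-values from three separate comparisons.
still_significant = bonferroni([0.01, 0.04, 0.03])
```

With three tests, only p-values at or below 0.05 / 3 survive, so two of the three hypothetical "significant" results disappear after correction.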
Towards better experimentation! #SIGIR2020
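Point 6 can be illustrated with a small simulation. This is only a sketch under stated assumptions: every run has the same true score, the measured score is that value plus Gaussian noise, and the parameters (`true_score`, `noise_sd`, `trials`) are hypothetical. Even though no run is truly better, the best observed score grows with the number of runs.

```python
import random

def expected_best_score(n_runs, true_score=0.30, noise_sd=0.02,
                        trials=2000, seed=0):
    """Estimate the expected maximum measured score over n_runs runs
    that all share the same true score (assumed Gaussian noise)."""
    rng = random.Random(seed)  # fixed seed for repeatable estimates
    total = 0.0
    for _ in range(trials):
        total += max(rng.gauss(true_score, noise_sd) for _ in range(n_runs))
    return total / trials

best_of_1 = expected_best_score(1)
best_of_10 = expected_best_score(10)
best_of_100 = expected_best_score(100)
```

Under these assumptions `best_of_1` stays close to the true score, while `best_of_10` and `best_of_100` climb above it purely by chance.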
Advice for #SIGIR: