Norbert Fuhr's recommendations for gaining scientific knowledge from experiments:

1. Do not use MRR or MAP;
2. Instead of relative improvements, look at the effect size!
3. For multiple significance tests, use a correction, such as Bonferroni or Tukey's HSD (NB comparing only to the 2nd best method does not help!)
4. There are no significant improvements for re-usable test collections! (hypotheses have to be formulated before the work)

5. Ignore results for collections when there is no baseline from independent research;
6. Test collections wear out! The expected maximum result increases with the number of runs (as seen on leaderboards! -- Carterette's SIGIR paper)
7. Conferences and journals need to accept papers with "null" results. (to prevent the busy beaver / the p-hackers) Reproducibility is important: Publish your code and data
8. Evaluation initiatives are important (but they should only run proper measures and methods)
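
Points 2 and 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-topic scores and p-values are made up, the function names are hypothetical, and the standardized mean difference for paired samples is just one common effect size, not necessarily the one meant above.

```python
from statistics import mean, stdev


def paired_effect_size(system_scores, baseline_scores):
    """Standardized mean difference for paired samples:
    mean per-topic difference divided by the standard deviation of the differences."""
    diffs = [s - b for s, b in zip(system_scores, baseline_scores)]
    return mean(diffs) / stdev(diffs)


def bonferroni_reject(p_values, alpha=0.05):
    """Bonferroni correction: reject a hypothesis only if its p-value
    is below alpha divided by the number of tests performed."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


# Hypothetical per-topic scores for a baseline and a new system (same topics, same order).
baseline = [0.40, 0.50, 0.45, 0.60, 0.55]
system = [0.45, 0.52, 0.50, 0.58, 0.60]
effect = paired_effect_size(system, baseline)

# Hypothetical p-values from three significance tests run in the same study.
p_values = [0.012, 0.030, 0.004]
decisions = bonferroni_reject(p_values)

print(f"effect size: {effect:.3f}")
print(f"reject after Bonferroni: {decisions}")
```

With three tests the per-test threshold drops from 0.05 to about 0.0167, so the p-value of 0.030 no longer counts as significant even though it would in isolation.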
