Oops, MRR... triggering a strong "Norbert Fuhr gets angry" feeling.

@arjen but... R-precision = MRR if there's just one relevant result. Angry about precision too?

@djoerd In a critical article for the SIGIR Forum, Norbert Fuhr complained about how MRR is used; see paragraph 2.1:

I think in this specific case, the problem did not assume a single answer.

I think that, indeed, averaging the Rprec is problematic too for the same reasons, but might be justified probabilistically? Not sure...

@arjen Fuhr's concerns are interesting and relevant, but if my user model assumes: "go down the rank list until you find the first relevant document, then stop" (which is not unrealistic) then what should I measure?

@djoerd you could accumulate the number of docs accessed (and minimize that) instead of taking the reciprocal, just as one example! ESL would be interesting.
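
To make the difference concrete, here is a rough Python sketch (not from the thread) comparing MRR with the mean number of docs a "stop at the first relevant document" user would read. The function names, the toy rankings, and the rule that a user who finds nothing reads the whole list are all assumptions made for illustration; this is a simplification, not a full expected search length (ESL) computation.

```python
from statistics import mean


def first_relevant_rank(ranking, relevant):
    """Return the 1-based rank of the first relevant doc, or None if absent."""
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return rank
    return None


def mean_reciprocal_rank(runs):
    """Average 1/rank of the first relevant doc; a miss contributes 0."""
    ranks = [first_relevant_rank(ranking, rel) for ranking, rel in runs]
    return mean(1.0 / r if r is not None else 0.0 for r in ranks)


def mean_docs_accessed(runs):
    """Average number of docs read before the user stops.

    Assumption: a user who never finds a relevant doc reads the whole list.
    """
    return mean(
        first_relevant_rank(ranking, rel) or len(ranking)
        for ranking, rel in runs
    )


# Hypothetical toy data: the same 10-doc ranking, different relevant docs.
ranking = [f"d{i}" for i in range(1, 11)]

# System A: first relevant doc at rank 1 for one query, rank 10 for the other.
system_a = [(ranking, {"d1"}), (ranking, {"d10"})]
# System B: first relevant doc at rank 2 for both queries.
system_b = [(ranking, {"d2"}), (ranking, {"d2"})]

for name, runs in [("A", system_a), ("B", system_b)]:
    print(name, mean_reciprocal_rank(runs), mean_docs_accessed(runs))
```

On this toy data the two measures disagree: MRR prefers system A (about 0.55 versus 0.5), while the docs-accessed count prefers system B (2 docs on average versus 5.5), because the reciprocal compresses the cost of the query where the user has to read ten docs.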

