Hannah Bast's slides on the European Symposium on Algorithms (ESA) 2018 Track B experiment, in which two independent program committees decided on the same set of papers and the conference then accepted the union of their acceptances: http://ad-publications.informatik.uni-freiburg.de/ESA_experiment_Bast_2018.pdf
Some conclusions: the initial scoring is remarkably consistent between the two committees, and per-paper discussions to reconcile scoring differences are useful, but the final decision on which "gray zone" papers to keep is random and could be replaced by a simple score threshold.
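
To make the setup concrete, here is a toy sketch of the two mechanisms mentioned above: accepting the union of two committees' acceptances, and deciding by a plain score threshold instead of a final discussion round. Everything in it is an assumption for illustration: the paper ids, the scores, the cutoff value, and the `accepted` helper are made up and are not taken from the slides or from the actual ESA process.

```python
def accepted(scores, threshold):
    """Return the ids of papers whose (hypothetical) score meets the threshold."""
    return {paper for paper, score in scores.items() if score >= threshold}


# Made-up average review scores from two independent committees.
pc1_scores = {"A": 4.5, "B": 3.1, "C": 2.0, "D": 3.4}
pc2_scores = {"A": 4.2, "B": 3.6, "C": 1.8, "D": 2.9}

# Hypothetical cutoff standing in for the "simple threshold".
THRESHOLD = 3.3

pc1_accepts = accepted(pc1_scores, THRESHOLD)
pc2_accepts = accepted(pc2_scores, THRESHOLD)

# The conference program is the union of both committees' acceptances.
final_program = pc1_accepts | pc2_accepts

# Papers accepted by exactly one committee: the ones near the cutoff,
# where the two committees disagree.
disputed = pc1_accepts ^ pc2_accepts

print("PC1 accepts:", sorted(pc1_accepts))
print("PC2 accepts:", sorted(pc2_accepts))
print("Final program:", sorted(final_program))
print("Accepted by only one PC:", sorted(disputed))
```

In this toy data the clear accept ("A") and the clear reject ("C") come out the same for both committees, and the disagreement is confined to the two papers whose scores sit close to the cutoff.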