Thread - TVC XII — 31st March-2nd April

motris (Posts: 199, Country: United States) posted @ 2012-04-05 5:12 AM (#7098, in reply to #7097)

And I'll have to agree to disagree. I'd rather have an imperfect system with a fixed ceiling than a different imperfect system with a variable ceiling and a much greater risk of overweighting a given test. This is especially true because I see the largest problem coming from the range of scoring systems and bonus sizes on the monthly tests, which makes them a lot more like apples, oranges, and umbrellas than a pile of apples. Some give a partial time bonus for n-1 of n puzzles correct; others do not. Some give a proportional time bonus; others do not. Some have 50 finishers; others have one or none. And then there are outliers like my Decathlon test (huge points for the last puzzle) or Tom's Nikoli Selection (huge points for puzzles you aren't intended to finish), which are built to create huge point differences among the top 5 or so and no one else, and certainly not among the 10th percentile, who never get to the big puzzles.

Curve around the Nikoli Selection and I bet it counts as 1.7 tests for H. Jo and 1.5 for me, compared to, say, the Screen Test. So am I wrong to think you would give H. Jo 1700 in your system? Why should Tom's test be valued more than the others, when it only became an issue because of its particular scoring and timing? Imagine those individual marathons on the Nikoli Selection were each worth, say, 50 more points. The point value was arbitrary. Now H. Jo might earn 2000 points, yet his relative performance has not changed at all. So if we cannot get objective measures of relative performance that are uniform across tests, I do not want any system that inflates those performances without a fixed bound. When a test is an oddball, I'll take a "less valuable" 1000 from normalization over an artificially valuable 2000 any day. I also wouldn't mind curving 800 to the 80th percentile, or something like that. The median is probably too low for the other pivot point, given that all zero-score tests are dropped anyway.
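A minimal Python sketch of the kind of fixed-ceiling curve described above, assuming a piecewise-linear mapping: the top raw score maps to 1000, the raw score at the 80th percentile of non-zero entries maps to 800, and zero stays zero. The names `normalize`, `percentile`, `pivot_pct`, and `pivot_value` are hypothetical illustrations, not the actual TVC formula. Because both pivots come from the test's own results, multiplying every raw score by a constant (an arbitrarily larger point scale) leaves the normalized scores unchanged, and no one can exceed the ceiling.

```python
def percentile(values, pct):
    """Linearly interpolated percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalize(raw_scores, ceiling=1000.0, pivot_pct=80.0, pivot_value=800.0):
    """Hypothetical piecewise-linear curve with a fixed ceiling.

    top raw score -> ceiling, raw score at pivot_pct -> pivot_value, 0 -> 0.
    Zero scores are excluded when locating the pivots.
    """
    positive = [s for s in raw_scores if s > 0]
    if not positive:
        return [0.0 for _ in raw_scores]
    top = max(positive)
    pivot_raw = percentile(positive, pivot_pct)

    def curve(s):
        if s <= 0:
            return 0.0
        if s >= top:
            return ceiling  # the winner always gets exactly the ceiling
        if s <= pivot_raw:
            return pivot_value * s / pivot_raw  # straight line from 0 to the pivot
        # straight line from the pivot up to the top score
        return pivot_value + (ceiling - pivot_value) * (s - pivot_raw) / (top - pivot_raw)

    return [curve(s) for s in raw_scores]


print(normalize([120, 100, 80, 60, 40, 20, 0]))
# [1000.0, 800.0, 640.0, 480.0, 320.0, 160.0, 0.0]
print(normalize([240, 200, 160, 120, 80, 40, 0]))  # same test on a doubled point scale
# [1000.0, 800.0, 640.0, 480.0, 320.0, 160.0, 0.0]
```

Moving the pivot (for example to the median with `pivot_pct=50`) changes how the middle of the field is spread out, but the winner still receives exactly 1000.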
If I were designing a yearly scoring system from scratch, I would never consider test "points" at all. I would build a system that projected finish times from each solver's puzzle solves and times throughout the test, and then ranked everyone by exactly those actual and projected finish times. A good implementation of instant grading could collect enough time-dependent data to make this modeling fair, and to separate solvers who are done from those who have entered something wrong, giving a true measure of position in the test. It would be like monitoring runners during a race: I don't need to know beforehand where the hills and valleys are, so long as I see some finishers and have a handful of splits. Data makes better scoring easy. We knew a lot more about all the puzzles after seeing the Marathon results than before. Just the number of solvers of each puzzle might be enough data to project things correctly.

Edited by motris 2012-04-05 5:44 AM
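A rough Python sketch of the projected-finish idea, under loudly stated assumptions: each puzzle's weight is taken as 1/(solvers + 1), a hypothetical proxy built from "the number of solvers of each puzzle"; anyone who solved every puzzle is ranked by their actual time; everyone else is extrapolated linearly as time used divided by the weighted share of the test completed. The `Entry` type, the weighting rule, and the sample solvers are illustrative only, not an existing grading system, and a real version would want per-puzzle split times from instant grading rather than a single final snapshot.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    solved: set          # names of puzzles solved correctly
    minutes_used: float  # actual finish time for finishers, time spent otherwise


def puzzle_weights(entries, puzzles):
    """Hypothetical difficulty proxy: fewer solvers means a heavier puzzle."""
    counts = {p: sum(p in e.solved for e in entries) for p in puzzles}
    return {p: 1.0 / (counts[p] + 1) for p in puzzles}  # +1 avoids dividing by zero


def projected_finish(entry, weights):
    """Actual time for finishers; otherwise extrapolate from the weighted share solved."""
    if all(p in entry.solved for p in weights):
        return entry.minutes_used
    done = sum(weights[p] for p in entry.solved if p in weights)
    if done == 0:
        return float("inf")  # nothing solved: no basis for a projection
    return entry.minutes_used * sum(weights.values()) / done


def rank(entries, puzzles):
    """Order solvers by actual or projected finish time, fastest first."""
    weights = puzzle_weights(entries, puzzles)
    times = {e.name: projected_finish(e, weights) for e in entries}
    return sorted(times.items(), key=lambda item: item[1])


puzzles = ["A", "B", "C"]
entries = [
    Entry("alice", {"A", "B", "C"}, 50.0),  # hypothetical finisher
    Entry("bob", {"A", "B"}, 60.0),         # hypothetical, ran out of time
    Entry("carol", {"A"}, 60.0),            # hypothetical, ran out of time
]
for name, minutes in rank(entries, puzzles):
    print(f"{name}: {minutes:.1f}")
# alice: 50.0
# bob: 111.4
# carol: 260.0
```

Linear extrapolation is the simplest possible choice here; the point is only that positions come from times rather than from each test's arbitrary point values.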

Para (Posts: 315, Country: The Netherlands) posted @ 2012-04-06 12:14 AM (#7104, in reply to #7007)

I think the main point I wanted to address was making sure that performances in different tests can be accurately compared, given that you employ a best-3-out-of-4 scoring system. In the LMI scoring system every test is counted, so when someone has a runaway performance, the differences between the other players still count towards the standings, and all players are still compared against one another.

I needed a normalised score of at least 770 in TVC XII to beat Hideaki in the overall standings: I had to gain 77 points on him, and my lowest normalised score before that test was 693 (693 + 77 = 770). But that would have meant beating Hideaki by 230 normalised points in the last test. So if I had beaten him by a whopping 225, I still wouldn't have taken 3rd place, even though I clearly beat him in 2 out of 3 tests, which happened to be the tests with the lowest normalised scores. This is the problem I think currently exists and should be dealt with.

The best-3-out-of-4 system is what causes problems in the current TVC scoring system, and in my opinion it should be adjusted somehow. The easiest fix would be to abolish it and use all 4 tests for the final standings, although I assume it was introduced because using all 4 tests had caused problems before.
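The arithmetic in the post can be checked with a short Python sketch. Only the 693, the 77-point gap, and the 225-point margin come from the post; every other score, and the helper names `best_k_total` and `min_score_to_gain`, are hypothetical illustrations. It assumes each player has exactly three earlier tests, so a fourth score counts only by displacing the lowest of them.

```python
def best_k_total(scores, k=3):
    """Sum of a player's k highest normalised scores (best 3 of 4 when k=3)."""
    return sum(sorted(scores, reverse=True)[:k])


def min_score_to_gain(counted_scores, gain):
    """With three tests already counted, a fourth score only helps by
    displacing the lowest counted one, so it must exceed that score by `gain`."""
    return min(counted_scores) + gain


# From the post: lowest prior score 693, 77 points needed.
print(min_score_to_gain([850, 800, 693], 77))  # 770

# Hypothetical scores (only 693 is from the post); Hideaki leads by 77 before TVC XII.
para = [850, 800, 693, 765]      # last entry: TVC XII
hideaki = [1000, 760, 660, 540]  # Para wins TVC XII by 225

print(best_k_total(para), best_k_total(hideaki))  # 2415 2420: Hideaki still ahead
print(sum(para), sum(hideaki))                    # 3108 2960: Para ahead if all 4 count
```

Under best 3 of 4, the 225-point win in the last test raises Para's total by only 765 - 693 = 72 points, while Hideaki's 540 is simply dropped, so the head-to-head margin never reaches the standings; counting all four tests lets the full margin through.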