Thread - LMI Players' Rating System

Posts: 774

Country : India

rakesh_rai posted @ 2011-07-20 8:38 AM

euklid - 2011-07-20 12:55 AM

P.S.: Thanks Rakesh that you are updating the rating numbers VERY fast after each test!

As Deb mentioned some time back, we have now managed to automate the ratings calculation process, so we are able to generate the ratings faster.

@ 2011-07-20 10:59 AM (#5243 - in reply to #5242)

Posts: 774

Country : India

rakesh_rai posted @ 2011-07-20 10:59 AM

jhrdina - 2011-07-17 3:40 AM

(3) Base the calculation of NS on linear extrapolation between 0 and the median point. That means using the same calculation for all players, even those above the median.
E.g. the median is 50 points and the top two players have 112.5 and 100 points. In the current system they would get NS 1000 and 900 respectively, so the second player would be heavily penalized. My suggestion would be to give them NS 1125 and 1000 instead.
So there would be no upper limit on NS, and NS would always be calculated in relation to the median performance.

The only condition to keep the system fair is that each competition should have no upper limit on points. So there should always be some time bonus for saved minutes.

Thanks for sharing your views, Jiri.

We had thought of a similar system earlier. But with no upper limit, it does not work out well in terms of consistent results, and linear extrapolation can come up with arbitrarily high numbers. For example, the median in Nikoli Selection was 150. So, a score of 492 can translate to something like 3000+ on a rating scale of 1000. With an upper limit in place, the results are better. It can also serve as a quantifiable goal/target for the top solvers. And the time bonus is (mostly) already included in scores, so we should not try to duplicate its effect.
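
The two approaches discussed above can be sketched as follows. This is only an illustration, not LMI's actual code: it assumes the current score is piecewise linear through (0, 0), (median, 500) and (top score, 1000), which reproduces the 1000 and 900 in Jiri's example. The function names are hypothetical and the rank-score component is ignored.

```python
def capped_score(score, median, top):
    """Assumed current shape: 0 -> 0, median -> 500, top score -> 1000."""
    if score <= median:
        return 500 * score / median
    return 500 + 500 * (score - median) / (top - median)


def uncapped_score(score, median):
    """Proposal (3): extend the 0-to-median line to every player, no upper limit."""
    return 500 * score / median


# Jiri's example: median 50, top two players on 112.5 and 100 points.
for pts in (112.5, 100):
    print(pts, capped_score(pts, 50, 112.5), uncapped_score(pts, 50))
```

With the cap, the second player gets 900. Without it, the two values are 1125 and 1000, and they keep growing as the top score pulls away from the median.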

@ 2011-07-20 10:59 AM (#5244 - in reply to #1357)

Posts: 774

Country : India

rakesh_rai posted @ 2011-07-20 10:59 AM

forcolin - 2011-07-12 7:05 PM

What I think is that you have implemented a very good system.
There may be a (small) margin for improvement; just bear my opinion in mind alongside those of others, and if there are enough people who say a certain thing and justify a variation, do it.
But I also believe that you cannot change the mechanism after every contest, so once an equilibrium point has been achieved the system should be left running without major perturbations.

jhrdina - 2011-07-17 3:40 AM

But I think that the system is great even as it is, and I agree with forcolin that it should not be changed too often. You may let it run for a few more months and apply the changes (if desirable) e.g. from the beginning of the new year.

euklid - 2011-07-20 12:55 AM

I prefer the current rating system over all the changes (1),(2),(3) proposed above.

As mentioned by all of you, we'll keep it stable for a long enough period, and review it for further improvements after that.

@ 2011-07-23 5:06 PM (#5256 - in reply to #5243)

jhrdina

Posts: 9

Country : Czech Republic

jhrdina posted @ 2011-07-23 5:06 PM

rakesh_rai - 2011-07-20 10:59 AM

You are right. I had a look at some previous competitions myself and I have to admit that calibrating points on median only is not enough. The top player points would be too volatile. There will always be some objections, but the current rating system looks fair enough.
Thanks
Jiri

@ 2011-08-01 12:17 PM (#5301 - in reply to #1357)

Posts: 1869

Country : India

debmohanty posted @ 2011-08-01 12:17 PM

forcolin - 2011-07-12 3:25 AM

In my opinion the current system can be affected negatively by some individual factors. For example, if a player gets an extremely high score (the case of deu in the Nikoli contest), all the scores of the mid-range players will be low. In my case, I scored approximately 50% of his points, so although my score is higher than the median value my PS is relatively low compared to other contests. My RS is higher but this accounts only for 25%. The conclusion is that my NS for this particular contest, in which I believe I played better than my average performance, is in fact lower than my current rating, as opposed to other contests in which perhaps I played worse but either there was no such uncommon performance from a single player or the scoring mechanism was less rewarding towards high scorers.

Stefano

Same problem in Magic Cube, with motris winning by a huge margin.

@ 2011-08-01 11:01 PM (#5308 - in reply to #5301)

euklid

Posts: 28

Country : Austria

euklid posted @ 2011-08-01 11:01 PM

Again I am wondering about my rating. Before the Magic Cube test I think I had 717 rating points and 717 as my "Best Rating". Now I still have 717 rating points but 720 as my "Best Rating" (according to my stats page).

According to my calculations I should have 720 rating points right now. Perhaps there is some calculation error because I am a "very regular player" with K>U now. My weakest test (Twist) thus has weight 0 and my second-weakest test (the most recent Magic Cube) has a weight of 0.8 instead of 1.0. But perhaps there is a mis-calculation on my side...

Have fun,
Stefan

@ 2011-08-01 11:08 PM (#5309 - in reply to #5308)

Posts: 1869

Country : India

debmohanty posted @ 2011-08-01 11:08 PM

For once, LMI calculations are not wrong. Your calculations are also not wrong.
Your rating after Nikoli2011 is 717. This is the current rating.
Your rating after Magic Cube will be 720 (as you can see here - http://logicmastersindia.com/forum/lmi/ratings/?test=M201107P2), but it has not been made the 'current' rating yet.

@ 2011-08-01 11:31 PM (#5310 - in reply to #5301)

Posts: 199

Country : United States

motris posted @ 2011-08-01 11:31 PM

debmohanty - 2011-07-31 11:17 PM
Same problem in Magic Cube. motris winning by a huge margin

It's not a bug, it's a feature.

If you look at almost all of the tests here, having the lead score 10% above the rest is not that uncommon (and if you look at puzzle result distributions on a site like croco-puzzles it's not that uncommon there with this same group of competitors).

Then there are tests like FLIP or Puzzle Jackpot where you can't easily separate the top two, but there is still a rather large gap to the next spot(s): 25% to 3rd in FLIP, 25% to 5th in Puzzle Jackpot. The truth is whenever H. Jo or uvo or someone else has a really good day, there can be a phenomenal score or two.

I think a consistent rating needs to be fair for solvers at the top and in the middle. The top is suited ok with the system as is with absolute top score contributing a lot more than rank score, rewarding a very good performance. The middle is suited ok by using the median score as the measure of 500, instead of actual score/top score which is where the median might drop to 300 or 400 on an exceptional test day. And the rank score further brings back the front-runner a bit and also separates ties by score (but not time) in the middle.

I think drastic changes from the current formula will result in a worse ranking system either for the top, or for the middle, based on the test data we have for the last 15 months. At most, I might like to see modeled what would happen if a third inflection point was built into the system, perhaps at the 1st standard deviation above and below the median, equaling another set of fixed score points. My guess is that the middle and top are stable but players at 70-85% in rank are more affected by test to test variation outside of their own performance.

Edited by motris 2011-08-01 11:44 PM

@ 2011-08-02 12:58 AM (#5312 - in reply to #1357)

MellowMelon

Country : United States

MellowMelon posted @ 2011-08-02 12:58 AM

As far as individual tests are concerned, I think the rating system is fairly well-balanced given what was stated above. Where I think the problem enters is missed/excluded tests (excluded meaning the system for regular players to drop their low scores), because the variability at the top makes it so that an "equivalent" performance on two tests can result in very different ratings for people above the median. The rank score is far more consistent as an indicator across tests. I understand and agree that a good performance should be rewarded, but I think the rank score also plays an important role to the point that it's not getting the weight it should.

The biggest effect of this is probably on the 600-800 range, as you say, since for the top when a test goes your way you're generally contributing to lopsided rankings, and most of the people at the top of the ratings list play regularly. But it still has some effect; I think uvo was lucky to miss the recent Nikoli test, which had the steepest score gradient at the top that I can recall (incidentally he's the one right ahead of me right now; sorry for the personal groaning).

(EDIT: I managed to word this in a way that said something different from my intention. I meant that if one had to miss one test, that would have been the one. Of course, the data below suggests that may have been off anyway. Bit hasty on my part.)

Edited by MellowMelon 2011-08-02 6:42 PM

@ 2011-08-02 4:16 AM (#5314 - in reply to #5312)

Posts: 199

Country : United States

motris posted @ 2011-08-02 4:16 AM

MellowMelon - 2011-08-01 11:58 AM
I think uvo was lucky to miss the recent Nikoli test, which had the steepest score gradient at the top that I can recall (incidentally he's the one right ahead of me right now; sorry for the personal groaning).

I strongly disagree with this perception. No one is "lucky" to not have taken a test. I expect uvo may easily have claimed a high spot as he had on the first Nikoli Selection by being 4th.

So, was that test (Nikoli Selection) really an outlier and if so why? Consider these numbers. This is the prorated score (PS) for 2nd, 3rd, 5th, 7th, and 10th on the last 6 puzzle contests in no particular order.

Test - PS at 2nd, 3rd, 5th, 7th, 10th place
A - 890, 882, 866, 780, 751
B - 951, 924, 883, 771, 717
C - 911, 795, 749, 735, 716
D - 905, 884, 761, 735, 709
E - 919, 905, 790, 771, 710
F - 909, 906, 884, 813, 742

Does the Nikoli Selection stand out on this list? Yes, if you look exactly at 3rd place. But not as much at 5th and certainly not at 7th or 10th place. There is a range, but nothing that strikes me as really large differences in PS that are outside general performance variation. 4 tests are around 710 at 10th place. Two are higher at 750. The outliers are probably A and F for being so flat.
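
The place-by-place comparison above can be checked with a short script. This is a minimal sketch using only the six rows quoted in the post; the helper name `spread_by_place` is made up for illustration.

```python
# Prorated scores (PS) at 2nd, 3rd, 5th, 7th and 10th place, copied from the post.
PLACES = (2, 3, 5, 7, 10)
PS = {
    "A": (890, 882, 866, 780, 751),
    "B": (951, 924, 883, 771, 717),
    "C": (911, 795, 749, 735, 716),
    "D": (905, 884, 761, 735, 709),
    "E": (919, 905, 790, 771, 710),
    "F": (909, 906, 884, 813, 742),
}


def spread_by_place(table):
    """Return {place: (lowest PS, highest PS, difference)} across all tests."""
    result = {}
    for i, place in enumerate(PLACES):
        column = [row[i] for row in table.values()]
        result[place] = (min(column), max(column), max(column) - min(column))
    return result


for place, (low, high, diff) in spread_by_place(PS).items():
    print(f"place {place:>2}: {low}-{high} (spread {diff})")
```

On these figures the spread across tests is 129 points at 3rd place and narrows to 42 points at 10th place.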

Why does the Nikoli Selection have a large scoring problem for 3rd - 7th that resolves afterwards? I've proffered my rationales before but I'll do so here again. The best measure of a group of solvers outside of exact time to finish is their points earned/minute rate. No matter what the test length (2 minutes, 2 hours, 2 days, 2 weeks, ...) if you are measuring this rate you will eventually get an accurate measure of their ability removing the puzzle to puzzle variance. When a person "finishes" a test, you have an absolute measure of that solver's points earned/minute rate and hopefully the extra time is rewarded close to this number.

The problem with Nikoli Selection was two-fold:
First, for the top 10 solvers, but not for the larger field, the test had two different parts with very different points/minute rates. The non-traditional "marathon" bonus system was not very granular at the top, and a lot of solvers got much less value for finishing early unless they were the right increment of fifteen to twenty minutes away from total test time for a big bonus jump. Further, it was even more valuable to solve 2 or 3 of these than just 1. Even so, the point/minute rates were grossly compressed for H. Jo and myself. In the first ~45 minutes, he earned 280 points for finishing the main test. In the last 45 minutes, he earned 170 + 42 time bonus points for finishing three marathons, about 75% of the original rate on the test everyone else was being compared on. My second half was 105+46, or about 50% of my original rate. I'd suppose those who got only 1 (or 0) marathons out were much lower even than this.

Second, the "main test" was being scored with a time bonus that would, if it were for the whole test, have worked ok except for being (1) slightly undervalued and (2) lost by basically everyone from 5th to 9th (who would normally have filled 3rd to 6th) for making a typo in a main puzzle answer key. They did not check those answer keys as carefully because, unlike on every other test, there was something else to move on to. So a 20 point puzzle error + 30 time bonus point loss was a more significant penalty (11% max score value) than it usually is. But the performance gap falls away from 10th place onward, and the test resembles others in its curve, because for everyone else there was no point/minute drop-off from transitioning from sprint to marathon territory: there was no marathon territory to deal with. The problem is not H. Jo (or on other tests me or uvo or psyho or whomever). The problem this time was the test/scoring structure, which depressed the points earned for the top 10 universally, and most markedly for 3rd through 10th, compared to what the usual test would look like. In other words: a compressed score range (higher relative median score), but less realized bonus for 3rd to 10th place, equals the scoring crunch which is coming from both the 1000 mark and the 500 mark.
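
The points/minute comparison for H. Jo can be reproduced with a few lines. This is a rough sketch that takes both halves as 45 minutes each, as the post approximates; the function name `rate` is just illustrative.

```python
def rate(points, minutes):
    """Points earned per minute."""
    return points / minutes


# H. Jo's figures as quoted in the post; both halves are taken as ~45 minutes.
first_half = rate(280, 45)        # main test
second_half = rate(170 + 42, 45)  # three marathons plus time bonus
ratio = second_half / first_half

print(f"second half ran at {ratio:.1%} of the first-half rate")
```

The ratio comes out a little under 0.76, in line with the "about 75%" quoted above.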

Edited by motris 2011-08-02 5:31 AM

@ 2011-08-02 2:29 PM (#5316 - in reply to #1357)

Posts: 774

Country : India

rakesh_rai posted @ 2011-08-02 2:29 PM

Updated LMI Puzzle Ratings after Magic Cube (July 2011 LMI puzzle test #2) are now available.

The Top Five: 1. motris, 2. deu, 3. uvo, 4. MellowMelon, 5. Nikola

India Top Five: 1. Rohan, 2. Rakesh, 3. Rajesh, 4. Amit, 5. Jaipal

@ 2011-08-03 12:53 AM (#5325 - in reply to #1357)

Posts: 199

Country : United States

motris posted @ 2011-08-03 12:53 AM

For the goal of keeping a consistent rating, can I request that test authors keep control of the base scoring of tests (twists like Puzzle Jackpot and Twist and such are ok, as this is the base value), but that LMI admins set the value of the time bonus (which really just serves to standardize scores from month to month) with a fixed standard for all tests? When it is set by authors, we get a very wide range of these values, which is really problematic for comparing tests. The upcoming sudoku test has a 3x value for time bonus compared to puzzle minutes, so finishers will benefit excessively. Past tests have had closer to 0x value for time bonus compared to puzzle minutes, so finishers are punished excessively. Is total value/total time that hard a ratio to fix? Time bonus will never affect the ranks in a given test (which is why 1000 points a minute or .0001 points a minute could be chosen by test authors if they wanted), but it does affect the yearly rankings, which is why it should not be a variable in the constructor's mind at all.
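
A fixed standard of the kind requested here might look like the sketch below. It is an assumption-laden illustration, not an LMI rule: the function names and the `multiplier` parameter are made up, and the 1000-point, 120-minute test is a hypothetical example.

```python
def bonus_per_minute(total_points, test_minutes, multiplier=1.0):
    """Points per minute saved; multiplier 1.0 equals total value / total time."""
    return multiplier * total_points / test_minutes


def final_score(puzzle_points, minutes_saved, total_points, test_minutes, multiplier=1.0):
    """Puzzle points plus time bonus for the minutes left on the clock."""
    return puzzle_points + minutes_saved * bonus_per_minute(
        total_points, test_minutes, multiplier
    )


# Hypothetical test: 1000 puzzle points in 120 minutes, finished 12 minutes early.
for m in (0.0, 1.0, 3.0):
    print(f"{m}x bonus:", round(final_score(1000, 12, 1000, 120, m), 1))
```

The finisher's rank within the test is the same under every multiplier, but the gap to the rest of the field, and so the score fed into the ratings, changes with it.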

@ 2011-08-03 4:49 AM (#5326 - in reply to #5325)

Posts: 1869

Country : India

debmohanty posted @ 2011-08-03 4:49 AM

Very valid argument, we will do that.

For Sudoku City, Nikola had asked me to review the bonus system, but I had missed it completely. I'll get it rectified. Thanks for bringing it up. And it indeed helps when the anomalies are brought up before the test itself, so that we can get it fixed as needed.

@ 2011-08-03 12:52 PM (#5328 - in reply to #5310)

Posts: 774

Country : India

rakesh_rai posted @ 2011-08-03 12:52 PM

motris - 2011-08-01 11:31 PM

I think a consistent rating needs to be fair for solvers at the top and in the middle. The top is suited ok with the system as is with absolute top score contributing a lot more than rank score, rewarding a very good performance. The middle is suited ok by using the median score as the measure of 500, instead of actual score/top score which is where the median might drop to 300 or 400 on an exceptional test day. And the rank score further brings back the front-runner a bit and also separates ties by score (but not time) in the middle.

I think drastic changes from the current formula will result in a worse ranking system either for the top, or for the middle, based on the test data we have for the last 15 months. At most, I might like to see modeled what would happen if a third inflection point was built into the system, perhaps at the 1st standard deviation above and below the median, equaling another set of fixed score points. My guess is that the middle and top are stable but players at 70-85% in rank are more affected by test to test variation outside of their own performance.

Very nicely put.

Just one question: when you say 70-85% in rank, do you mean those ranked 71st-85th (out of 100) or those ranked 15th-30th? And, I would also like to know the degree (slightly/moderately/heavily) to which they are affected, so that we can think of further improvements at some point of time.

@ 2011-08-04 10:53 AM (#5336 - in reply to #5328)

Posts: 199

Country : United States

motris posted @ 2011-08-04 10:53 AM

rakesh_rai - 2011-08-02 11:52 PM

By 70-85% in rank I meant those with a rank score around 700 to 850. So 15th to 30th place in your out of 100 example.
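
The mapping between place and rank score in this reply can be sketched as below. The exact LMI formula is not given in the thread, so `rank_score` is an assumed linear form chosen only because it matches the 15th-to-30th-of-100 example; the 25% weight on rank score comes from forcolin's earlier post, and both function names are hypothetical.

```python
def rank_score(rank, participants):
    """Assumed linear rank score: 15th of 100 -> 850, 30th of 100 -> 700."""
    return 1000 * (participants - rank) / participants


def normalized_score(ps, rs, rank_weight=0.25):
    """Blend of prorated score (PS) and rank score (RS); the weight is an assumption."""
    return (1 - rank_weight) * ps + rank_weight * rs


for place in (15, 30):
    print(place, rank_score(place, 100))
```

Under these assumptions a player with PS 900 who finishes 15th of 100 would get `normalized_score(900, 850)`, i.e. 887.5.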

@ 2011-08-08 7:50 PM (#5371 - in reply to #5316)

Posts: 774

Country : India

rakesh_rai posted @ 2011-08-08 7:50 PM

Updated LMI Sudoku Ratings after Sudoku City (August 2011 LMI sudoku test) are now available.

The Top five: 1. motris, 2. nikola, 3. purifire, 4. deu, 5. misko

India Top five: 1. Rishi, 2. Rohan, 3. Rakesh, 4. Gaurav, 5. Sumit

@ 2011-08-24 11:14 AM (#5453 - in reply to #1357)

Posts: 774

Country : India

rakesh_rai posted @ 2011-08-24 11:14 AM

Updated LMI Puzzle Ratings after Japanese Puzzle Land (August 2011 LMI puzzle test) are now available.

xevs is the only new name in the Top 10.

Rohan Rao at #43 is the top ranked puzzler from India.

@ 2011-08-30 11:50 AM (#5488 - in reply to #5244)

anand2100

Posts: 1

anand2100 posted @ 2011-08-30 11:50 AM

How to know my LMI rating?

@ 2011-08-30 12:14 PM (#5489 - in reply to #5488)

Posts: 1869

Country : India

debmohanty posted @ 2011-08-30 12:14 PM

anand2100 - 2011-08-30 11:50 AM

How to know my LMI rating?

Didn't you register at LMI a few minutes back?

@ 2011-09-05 5:51 PM (#5542 - in reply to #1357)

Posts: 774

Country : India

rakesh_rai posted @ 2011-09-05 5:51 PM

Updated LMI Puzzle Ratings after Sprint Test (September 2011 LMI puzzle test) are now available.

MellowMelon sprints to third place.

@ 2011-09-14 5:48 PM (#5631 - in reply to #1357)

Posts: 774

Country : India

rakesh_rai posted @ 2011-09-14 5:48 PM

Updated LMI Sudoku Ratings after Crazy Arrows (September 2011 LMI sudoku test) are now available.

532 players representing 50 countries are now a part of the ratings. motris and nikola remain #1 and #2 respectively, and deu moved up to #3.

Amongst the heavy gainers, ppeetteerr gained 203 rating points to move to #40, and tarotaro gained 164 rating points to move up to #76.

USA has 6 players in the Top 50, while Japan and India have 5 each. Germany has 4 players in the Top 50.

@ 2011-10-09 9:08 PM (#5778 - in reply to #5631)

Posts: 774

Country : India

rakesh_rai posted @ 2011-10-09 9:08 PM

Updated LMI Sudoku Ratings after A or B (October 2011 LMI sudoku test) are now available.

motris inched closer to the 1000 mark, gaining one point to reach 993. deu and Nikola exchanged places and are within a point of each other now. jaku111 and uvo both made major gains to enter the Top 10. Four of the top 10 have palindromic ratings (which read the same whether read from left or right).

TiiT (palindrome again), who finished 2nd in A or B, gained 36 rating points to move to #15. Gotroch has a rating of 888 (which is a flush in poker). Hunsudoku, tarotaro, SCORPPROCS (another palindrome), tilansia and spelvin all gained over 100 points and should be rapidly moving towards the top in the coming months.

Among players from India, Rohan moved to within 2 points of 900 and is ranked 8th now. Prasanna gained 27 points.

@ 2011-10-17 10:25 AM (#5818 - in reply to #1357)

Posts: 1869

Country : India

debmohanty posted @ 2011-10-17 10:25 AM

Ratings are updated after Double Decathlon - Link
I would wait till Rakesh double-checks once and makes it official.

@ 2011-11-05 8:37 AM (#5917 - in reply to #5818)