Thread - LMI Players' Rating System

@ 2011-04-27 4:44 AM (#4273 - in reply to #4268)

debmohanty

Posts: 1869

Country : India

motris - 2011-04-26 9:45 PM

So on April Sudoku, I would certainly have just used ~5 points per minute (600/120). On Twist I would have used the same staggered "value of minutes" for the bonus, instead of going down to a nominal .5 points per minute. So I would have earned 1 minute at 100%, 10 minutes at 75%, 10 minutes at 60%, and 10 minutes at 50%. In each case, simply making the time worth the value of those minutes for other solvers allows the relative performance of all finishers to be correctly staggered for your ratings, for UKPA rankings, and so on. Unlike a one-time WPC where it doesn't matter as much, LMI has really become one of the big forums for monthly competition and so consistency is absolutely required in the ranking system.

I agree on the April Sudoku part.

On Twist, I'm probably missing your point. Since the puzzle points were reduced after 90 minutes, we can't give a significant bonus after 90 minutes. Otherwise, it will work as a double penalty for others. Again, I could have misunderstood your point.

[ The idea of .5 per minute is to separate solvers solving at 95 minutes vs 99 minutes. ]

@ 2011-04-27 5:32 AM (#4274 - in reply to #4273)

motris

Posts: 199

Country : United States

debmohanty - 2011-04-27 4:44 AM
On Twist, I'm probably missing your point. Since the puzzle points were reduced after 90 minutes, we can't give a significant bonus after 90 minutes. Otherwise, it will work as a double penalty for others. Again, I could have misunderstood your point.
[ The idea of .5 per minute is to separate solvers solving at 95 minutes vs 99 minutes. ]

I'm not asking for a significant bonus (meaning "more than the value of puzzles of that time"). I'm asking for the time bonus to scale exactly as the expected points-per-minute value did for the other solvers during the "extra" time.

If the standard value (as you chose for time bonus) is ~5 points per minute, then when the rest of the solvers entered 75% value time, and could still effectively earn 3.75 or more points per minute for submitting solutions, the time bonus should also scale to 75% of the value for that time, or 3.75 points per minute. Instead, the time bonus dropped to 10% of its value for the whole extra time and everyone effectively gained in relative score based on extra time despite my large margin of victory by time.

You can view the 90 minute and 120 minute flat results to see that "normal" scoring would have uvo at around 79-80% of my score. The result with only 10% time bonus was uvo at 87% of my score. With a bonus that is instead 3.75 for 10 minutes, 3 for 10 minutes, and 2.5 for 10 minutes, the scores would now be 813.5 versus 641.8 and this would give uvo about 79% of my score. I hope this shows how the balance of results can be preserved with scaling, provided all points scale the same way.
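A minimal sketch of the arithmetic above, assuming the base rate is exactly 600/120 = 5 points per minute and the tiers are the ones listed earlier in the thread (1 minute at 100%, then 10 minutes each at 75%, 60% and 50%). The function and variable names are illustrative only, and the two totals are simply the figures quoted in the post, not recomputed from the test data:

```python
def staggered_bonus(base_rate, tiers):
    """Sum the bonus over (minutes, fraction) tiers.

    Each tier is worth base_rate * fraction points per minute.
    """
    return sum(base_rate * fraction * minutes for minutes, fraction in tiers)


# Assumed base rate: 600 points over 120 minutes (April Sudoku figures).
base_rate = 600 / 120

# Tiers as listed in the thread: (minutes, fraction of full value).
twist_tiers = [(1, 1.00), (10, 0.75), (10, 0.60), (10, 0.50)]
bonus = staggered_bonus(base_rate, twist_tiers)

# Ratio of the two rescaled totals quoted in the post (641.8 versus 813.5).
ratio = 641.8 / 813.5

print(f"staggered bonus = {bonus:.1f} points")
print(f"second score as a share of the first = {ratio:.1%}")
```

Under these assumptions the ratio comes out just under 79%, in line with the 79-80% seen in the flat 90-minute and 120-minute results.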

@ 2011-04-27 10:03 AM (#4275 - in reply to #4274)

debmohanty

Posts: 1869

Country : India

Thanks for explaining, and I agree with what you are saying.

The current ratings system doesn't take time-to-finish into account. So, ideally, we should compensate top solvers by adding an appropriate time bonus in each test.

@ 2011-04-27 10:28 AM (#4276 - in reply to #4275)

debmohanty

Posts: 1869

Country : India

Back to the Rating System, what are the different options to normalize scores in each test so that we can compare scores across tests?
We have a basic rule which works quite well, but would love to hear independent ideas.

@ 2011-04-27 10:31 AM (#4277 - in reply to #1357)

debmohanty

Posts: 1869

Country : India

And one more - Rakesh mentioned that we don't consider a player's rank for the ratings.
Is it something we need to consider? In a way, I guess it is all related to normalization.

@ 2011-04-27 1:33 PM (#4279 - in reply to #4277)

rakesh_rai

Posts: 774

Country : India

Sharing some views and some aspects of the current rating system:

(1) Rank: I personally think we should also include test ranks in our calculation, even though we ultimately arrive at ratings and deduce the rank from them. Take this case: In a test, X scores 800 and comes 1st, Y scores 620 for 2nd, Z scores 590 and comes 10th. If we use only the score, Y does not get enough mileage from the test (as compared to Z). But if we do include ranks in our scheme of things, Y will tend to be compensated enough.

(2) 0 scores: To me, this is not a matter of great concern in general. However, the solution suggested by motris seems good to me. There have been a few cases in some tests where X has attempted a few puzzles and still got zero. So far, we have treated even such cases as "non-participation" when it is actually a zero score after participation.

(3) Players not playing frequently: We have to ensure that such cases get the "right" rating. This is something which is difficult to do. For example, we do not want someone who played one test to jump into the Top 10. So we have built the current logic such that any player will have to play some tests consistently to be where he/she belongs. Again, this "waiting time" should not be too large. If you have any suggestions around this, please do share.

(4) Scores across different tests: If we do want "time taken" as a factor, we can build the logic so that the scores in the test differ from the scores used for calculations (perhaps using bonus factor = [total points]/[total time]). I agree that if we take the scores as-is (e.g., the April Sudoku test), the performances are at times not adequately translated into scores. But this issue has been encountered only once so far - in the April sudoku and puzzle tests.

(5) Dependency on top score: This is one area where there are going to be definite changes. So far, we are heavily dependent on the top score for rating calculations. And this leads to certain issues during calculations. For example, an 80% score in an easy test like FLIP should not be treated equally with an 80% score in a Zoo type test.

(6) Others: We are also evaluating certain other factors for any effect on the ratings whatsoever -
Number of participants in a test (should performance in a 200-participant test be accorded more weight than performance in a 100-participant test?),
Quality/Index of participants in a test (should performance in a test where only 3 of the Top 10 participated be treated equally with performance in a test where all 10 of the Top 10 participated?),
weights to tests (should recent tests carry more weight?),
number of tests (how many tests should be considered for ratings - 6/8/10/12/all, or should it be all tests in the last 3/6/9/12 months?),
bonus (should I get some bonus if I defeat a Top-3 player, or a Top-10 player?)

Please feel free to share your views on any of these factors. And, anything else we have missed out.
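To make the example in (1) concrete, here is a small sketch that normalizes each score as a percentage of the top score. That normalization is an assumption chosen for illustration (it follows the "dependency on top score" mentioned in (5)); it is not the actual rating formula, and the function name is made up:

```python
def percent_of_top(scores):
    """Express every score as a percentage of the highest score in the test."""
    top = max(scores.values())
    return {name: 100 * score / top for name, score in scores.items()}


# Scores and ranks from the example in (1).
test_scores = {"X": 800, "Y": 620, "Z": 590}
test_ranks = {"X": 1, "Y": 2, "Z": 10}

normalized = percent_of_top(test_scores)
for name, value in normalized.items():
    print(f"{name}: rank {test_ranks[name]:>2}, {value:.2f}% of top score")

# Gap between 2nd place and 10th place when only the score is used.
gap = normalized["Y"] - normalized["Z"]
print(f"Y leads Z by {gap:.2f} percentage points despite finishing 8 places higher")
```

With a score-only measure like this one, Y and Z end up within a few percentage points of each other, which is the effect that including the rank would be meant to offset.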

@ 2011-04-28 8:44 AM (#4281 - in reply to #4279)

debmohanty

Posts: 1869

Country : India

One more - How do you not penalize authors / testers for 'missing' the test?

@ 2011-04-28 8:55 AM (#4282 - in reply to #4281)

purifire

Posts: 460

Country : India

debmohanty - 2011-04-28 8:44 AM

One more - How do you not penalize authors / testers for 'missing' the test?

You mean penalize them if they do not take part in tests by other authors????

If so, then I think that is a bit harsh, as at times someone can have a genuine reason not to participate... a prior commitment or a family event or any other legitimate reason under the sun :)

Rishi

@ 2011-04-28 8:57 AM (#4283 - in reply to #4282)

debmohanty

Posts: 1869

Country : India

I said "Not" penalize.
I just meant that we shouldn't penalize them because they 'missed' their own test. [They don't figure in the score page ]

@ 2011-04-28 9:28 AM (#4285 - in reply to #4283)

purifire

Posts: 460

Country : India

debmohanty - 2011-04-28 8:57 AM

I said "Not" penalize.
I just meant that we shouldn't penalize them because they 'missed' their own test. [They don't figure in the score page ]

Oh that way then I agree with you :)

@ 2011-05-11 7:42 PM (#4373 - in reply to #4268)

Administrator

Posts: 3574

Country : India

motris - 2011-04-26 9:45 PM

I'm sure I'm not the only one who will be interested to learn the methodology behind the system as it is a real "world leader board" these days.

We have been working over the last couple of months to come up with a revamped LMI Players' Rating System. And, we are happy to share the details (including the rating calculation mechanism) of the new system with everyone.

The details of the rating system have been captured in a pdf. You can either download it or view it. And, feel free to discuss the ratings in this thread.

As for the new rating list, it will be published after MAYnipulation, for both Sudoku and Puzzles.

@ 2011-05-11 11:37 PM (#4375 - in reply to #4373)

neerajmehrotra

Posts: 329

Country : India

Administrator - 2011-05-11 7:42 PM

motris - 2011-04-26 9:45 PM

I'm sure I'm not the only one who will be interested to learn the methodology behind the system as it is a real "world leader board" these days.

The V2.0 of rating system looks interesting but needs thorough discussion. I request all the active players of LMI to please comment to make this system more robust.
Kudos to Rakesh Rai for designing the algorithm. I think it takes care of all the variables required for a proper rating system.

Edited by neerajmehrotra 2011-05-11 11:39 PM

@ 2011-05-12 7:34 AM (#4377 - in reply to #1357)

debmohanty

Posts: 1869

Country : India

As Neeraj mentioned, this system looks like it covers all variables, although at the cost of being a little complex.
It would help if we can show 3 different cases in action ( players getting advantage, players being penalized ) with some numbers.

@ 2011-05-13 6:45 PM (#4390 - in reply to #4377)

rakesh_rai

Posts: 774

Country : India

debmohanty - 2011-05-12 7:34 AM

As Neeraj mentioned, this system looks like it covers all variables, although at the cost of being a little complex.
It would help if we can show 3 different cases in action ( players getting advantage, players being penalized ) with some numbers.

The image shows the working of ratings using a few fictitious players:

- Players A, B and E have played very few tests and are, therefore, penalized. The level of penalty depends on the overall weight of tests played. As they play more regularly, the level of penalty will reduce and ultimately go away.
- Player D has played an average number of tests. The rating takes the weighted average of NS from all participated tests. This player neither gets any benefit nor is penalized.
- Player C is a regular player. This player gets the benefit of only his best performances being considered for ratings.

@ 2011-05-13 7:26 PM (#4391 - in reply to #1357)

debmohanty

Posts: 1869

Country : India

Am I correct in saying

For a player who starts playing at LMI
1) he needs to play 4 tests before he gets the ranking he deserves? After 3 tests his penalty will be very small though
2) to get benefit of being a regular player, he has to play more than 7 tests consecutively. (after 7 tests K=6.4), and if he misses some tests, it takes longer

Edited by Rohan Rao 2011-05-13 8:14 PM

@ 2011-05-13 7:31 PM (#4392 - in reply to #1357)

debmohanty

Posts: 1869

Country : India

Also, we discussed a few posts back in this thread that we'll add an option for players to choose whether their results should be considered for ratings.
It will be implemented starting with MAYnipulation.

This is how it will look:

Do we still remove 0-scores after that?

@ 2011-05-13 7:33 PM (#4393 - in reply to #4392)

purifire

Posts: 460

Country : India

debmohanty - 2011-05-13 7:31 PM

Also, we discussed a few posts back in this thread that we'll add an option for players to choose whether their results should be considered for ratings.
It will be implemented starting with MAYnipulation.

This is how it will look:

Do we still remove 0-scores after that?

If someone checks the box allowing the score to be considered then I would say even zero scores should be considered.

Rishi

@ 2011-05-13 7:52 PM (#4394 - in reply to #4393)

MellowMelon

Country : United States

There's a slight typo in the image by Rakesh Rai: Player E's base rating should read 1000 instead of 775 (although the final calculation is correct). Wonder who he was based off of? :P

I think considering the 0-scores is okay if you clearly say that near the "Start" button and the check box.

I don't know if I like the current weighting system. For one thing, the simulation. Relative to a 738 rating Player C has had a lot of dismal recent performances in that simulation. I suppose the ratings would eventually reflect that if it continued, but perhaps the "penalty" for those performances should kick in sooner. The fact that they never take effect if he picks his game back up is a feature I'm undecided about.

Another related issue is the following case of my own design: two regular players F and G.
-- F gets four scores around 700, then tanks for a bit and gets four scores around 500, then improves and gets four scores around 900.
-- G is consistently improving. He gets four scores around 500, then four scores around 700, then four scores around 900.
If I understand how the weighted average is calculated correctly (all of this may be moot if not), player F gets the higher rating here, because his 500s that are thrown out have a higher weight so the 900s get emphasized more in the calculation. In my opinion G's performance warrants the better rating.

Both of these issues would be fixed if the weighted average divided by the total weights of the most recent U tests, instead of the weights of the highest scoring tests. But this has its own issue in that it is a very harsh penalty on a recent bad performance. You would not want a test of weight 1 thrown out in this method. Not sure what a fix would be.
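A rough sketch of the F versus G scenario under one possible reading of the rule. Everything numeric here apart from the scores is invented for illustration: the weights (1 for the oldest four tests, 2 for the middle four, 3 for the newest four), the choice of keeping the best 8 of 12 tests, and both function names. The real weights and cutoff are defined in the rating document, so the actual ratings may come out differently:

```python
def best_k_rating(scores, weights, k):
    """Weighted average over the k highest-scoring tests.

    The denominator is the total weight of the tests that were kept.
    """
    kept = sorted(zip(scores, weights), key=lambda pair: pair[0], reverse=True)[:k]
    return sum(s * w for s, w in kept) / sum(w for _, w in kept)


def recent_denominator_rating(scores, weights, k):
    """Same numerator as above, divided by the weights of the k most recent tests."""
    kept = sorted(zip(scores, weights), key=lambda pair: pair[0], reverse=True)[:k]
    return sum(s * w for s, w in kept) / sum(weights[-k:])


# Hypothetical weights, listed oldest test first.
weights = [1] * 4 + [2] * 4 + [3] * 4
# Scores from the scenario, oldest test first.
f_scores = [700] * 4 + [500] * 4 + [900] * 4
g_scores = [500] * 4 + [700] * 4 + [900] * 4
K = 8  # hypothetical number of tests kept

for name, scores in (("F", f_scores), ("G", g_scores)):
    print(name,
          best_k_rating(scores, weights, K),
          recent_denominator_rating(scores, weights, K))
```

With these made-up weights, the first function rates F above G, because F's discarded 500s carry the middle weights while G's discarded 500s carry the lowest ones. The second function puts G ahead, but it also pulls F far below any of his counted scores, which is the harsh penalty described above.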

@ 2011-05-13 8:15 PM (#4395 - in reply to #4391)

rakesh_rai

Posts: 774

Country : India

debmohanty - 2011-05-13 7:26 PM

Am I correct in saying

For a player who starts playing at LMI
1) he needs to play 4 tests before he gets the ranking he deserves? After 3 tests his penalty will be very small though

Yes. For normal players, who do not author/test any tests, this would be true. For authors/testers, there can be cases where playing 3 tests may be enough. And, they are justified to get the benefits of a reduced N.

2) to get benefit of being a regular player, he has to play more than 7 tests consecutively. (after 7 tests K=6.4), and if he misses some tests, it takes longer

Yes. In order for the player's poor performances to be ignored from rating calculations, about 7 tests out of 12 would be needed. The author/tester logic applies here too.

@ 2011-05-13 8:19 PM (#4396 - in reply to #4393)

rakesh_rai

Posts: 774

Country : India

purifire - 2011-05-13 7:33 PM

If someone checks the box allowing the score to be considered then I would say even zero scores should be considered.

I agree that 0 scores should be considered (manipulated) for ratings from now on. But, since this is the first test after the change is implemented, we'll take a call after the test depending on how the players have adapted to the change.

Edited by rakesh_rai 2011-05-13 8:19 PM

@ 2011-05-13 9:39 PM (#4397 - in reply to #4394)

rakesh_rai

Posts: 774

Country : India

MellowMelon - 2011-05-13 7:52 PM

There's a slight typo in the image by Rakesh Rai: Player E's base rating should read 1000 instead of 775 (although the final calculation is correct).

Yes. It's corrected now.

Wonder who he was based off of? :P

These are all fictional players. Any resemblances may at best be coincidental.

For one thing, the simulation. Relative to a 738 rating Player C has had a lot of dismal recent performances in that simulation. I suppose the ratings would eventually reflect that if it continued, but perhaps the "penalty" for those performances should kick in sooner. The fact that they never take effect if he picks his game back up is a feature I'm undecided about.

We deliberated whether we should keep 12 tests or 8 tests. Ultimately we decided on a longer duration. So the ratings will be based on all performances during this period. And, as mentioned earlier too, regular players are entitled to some benefits - they can afford to have a few bad days, for example. And, I would view the Player C example as the system allowing regular players to recover too.

Also, these ratings should reflect the whole 12-month period without being too volatile. One bad or good performance should not shake up the ratings.

Another related issue is the following case of my own design: two regular players F and G.
-- F gets four scores around 700, then tanks for a bit and gets four scores around 500, then improves and gets four scores around 900.
-- G is consistently improving. He gets four scores around 500, then four scores around 700, then four scores around 900.

If I understand how the weighted average is calculated correctly (all of this may be moot if not), player F gets the higher rating here, because his 500s that are thrown out have a higher weight so the 900s get emphasized more in the calculation. In my opinion G's performance warrants the better rating.

Both F and G would get a rating of 822 in this case. But next month, F's 700 and G's 500 go out of the calculations. So G will have a better rating. And so on.

Both of these issues would be fixed if the weighted average divided by the total weights of the most recent U tests, instead of the weights of the highest scoring tests. But this has its own issue in that it is a very harsh penalty on a recent bad performance. You would not want a test of weight 1 thrown out in this method. Not sure what a fix would be.

I get your point and I agree that the ratings are a little slow to reflect the recent performances, but the weights are still an improvement over what we had so far.

Edited by rakesh_rai 2011-05-13 9:40 PM

@ 2011-05-16 6:13 AM (#4424 - in reply to #4392)

debmohanty

Posts: 1869

Country : India

Do we still remove 0-scores after that?

This didn't seem to work. Either the check box was too small or the purpose of it was not clear.
Most of the players having zero score still have the check box selected.

@ 2011-05-16 11:06 AM (#4427 - in reply to #4424)

rakesh_rai

Posts: 774

Country : India

debmohanty - 2011-05-16 6:13 AM

This didn't seem to work. Either the check box was too small or the purpose of it was not clear.
Most of the players having zero score still have the check box selected.

So we'll exclude all zero scores from ratings...

@ 2011-05-17 4:48 PM (#4437 - in reply to #4427)

debmohanty

Posts: 1869

Country : India

rakesh_rai - 2011-05-16 11:06 AM

So we'll exclude all zero scores from ratings...

To put in perspective, here are the numbers -
Out of 116 players who started the test, exactly 7 marked that their results shouldn't be considered for ratings. (1 of them got non-zero score)
Of the remaining 109 players, 32 got zero scores.

@ 2011-05-17 6:34 PM (#4438 - in reply to #1357)

rakesh_rai

Posts: 774

Country : India

Updated LMI Puzzle Ratings after MAYnipulation (May 2011 LMI puzzle test), and LMI Sudoku Ratings after the April 2011 LMI sudoku test are now available.

The ratings are based on the new logic shared earlier. Four players find a place in the Top 10 in both lists - motris, deu, nikola and misko.

Overall, 487 players (from 45 countries) are included in the sudoku ratings and 425 (from 44 countries) in the puzzle ratings.