Thread - Logirace — LMI July Puzzle Test

Posts: 199

Country : United States

motris posted @ 2012-07-24 11:38 AM

MellowMelon - 2012-07-23 8:43 PM

With all loop puzzles ... it would have been harder to give me a better gift from a competitive standpoint...

I was indeed expecting a TVC-like result from you on this contest, so I am glad to have achieved at least a runaway second place, even if still clearly not as fast as first. I'm fortunate to have even found any time this weekend to take the test.

On instant grading: My most recent thinking for the penalty is to have it be high, but give solvers one cheap miss. It probably wouldn't work well on a test like this with a 6 point puzzle, but with something like my two-value point system, say 20 or 50 per puzzle, I'd say 5 points for the first mistake, then 15 points for each subsequent mistake. I think 15 points (~1.5 minutes of test value) is high enough that I would have checked a bit harder before the second very quick resubmit on a still not fully retweaked Dotted Loop. Your comment suggests about the same.
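
As a rough sketch of the "one cheap miss" idea above (5 points for the first wrong submission, 15 for each one after that), the total penalty could be computed like this. The function name, and counting mistakes across the whole test rather than per puzzle, are assumptions for illustration only, not how the LMI server works.

```python
def escalating_penalty(wrong_submissions, first=5, later=15):
    """Total points lost: one cheap miss, then a higher flat cost per miss."""
    if wrong_submissions <= 0:
        return 0
    return first + later * (wrong_submissions - 1)


# Show how the cost grows with the number of wrong submissions.
for n in range(4):
    print(n, "wrong submissions cost", escalating_penalty(n), "points")
```

With these defaults the second mistake already costs three times as much as the first, which is the incentive to check before a quick resubmit.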

@ 2012-07-24 11:31 PM (#8008 - in reply to #7813)

Posts: 315

Country : The Netherlands

Para posted @ 2012-07-24 11:31 PM

I think the bonus system is good for varied rounds. But I think in these single source or type tests it can cause these runaway performances. Like here with all Loop puzzles, or for example all Nikoli puzzles. Extremely high bonuses can become a bit dangerous. Only 10 people ended up coming within 50% of Palmer's top score, which is a bit over the top.

I like Instant Grading in some sense, mostly to avoid losing points on entry key errors (although I do hate waiting 20 times for the score to turn green, as I still dread making a mistake). I would also opt for a two-value penalty system, but I would differentiate between a typo and a solving error. I think the cutoff would fit nicely at 60 seconds: a fix within 60 seconds would cost 5 points or 20% (which is what we now use for errors, I think), and a fix after that would cost 15 points or 50%. Of course people should really avoid hitting submit all then.
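
A minimal sketch of the time cutoff described above, using the point values rather than the percentages. The function name is hypothetical, and treating a fix at exactly 60 seconds as a "typo" is an assumption.

```python
def fix_penalty(seconds_to_fix, cutoff=60, quick=5, slow=15):
    """Small penalty for a fast fix (likely a typo), large one otherwise."""
    return quick if seconds_to_fix <= cutoff else slow


print(fix_penalty(20))   # corrected quickly, treated as an entry error
print(fix_penalty(300))  # took five minutes, treated as a solving error
```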

Edited by Para 2012-07-25 12:07 AM

@ 2012-07-25 1:48 AM (#8009 - in reply to #7813)

Gareth

Posts: 17

Country : United Kingdom

Gareth posted @ 2012-07-25 1:48 AM

One of the great things about the instant grading is that if you misunderstand the key you can still get points because you have a chance to realise; but it can take a few minutes, not a few seconds, to re-read the instructions and try to see what you've misunderstood. I can see that at the top level it's possible to somewhat hammer the instant grading for extra speed, but for the average player (which by definition most of us are) this is less significant and I would suggest the advantages to the many should outweigh the disadvantages for the few. Time limits to re-submit would penalise average players even more than better players too, since they would inevitably take longer to fix their errors in any case...

So I would say even if it's not a perfect system for comparing the top players, the fact that it makes the tests much more inviting for the vast majority of other users is a very good reason to stick to something close to the current system. For this to remain true the penalty needs to stay relatively small - this is particularly important to solvers who don't get through many puzzles in a test. And I think time limits would excessively penalise less experienced players (or people like me who get the Masyu key wrong and take 5 minutes to work out why...)

Edited by Gareth 2012-07-25 1:52 AM

@ 2012-07-25 7:14 AM (#8012 - in reply to #8008)

Posts: 199

Country : United States

motris posted @ 2012-07-25 7:14 AM

Para - 2012-07-24 10:31 AM

I think the bonus system is good for varied rounds. But I think in these single source or type tests it can cause these runaway performances. Like here with all Loop puzzles, or for example all Nikoli puzzles. Extremely high bonuses can become a bit dangerous. Only 10 people ended up coming within 50% of Palmer's top score, which is a bit over the top.

Could you please give a little more indication of which tests are ok for you? I don't know what you are excluding when you say "all Nikoli puzzles". That seems to me to be a very broad mix of puzzle genres (not just one, like Fillomino-Fillia), constructed with elegance and logical solutions. I would not want to discourage tests with a broad range of puzzles, even if using classic styles and approachable puzzles means there are 3-4 solvers in the world who can have a remarkable time as deu does on a Nikoli Selection (3rd place at 72% of his score) or, with deu test-solving, me on Japanese Puzzle Land (2nd place at 79% of my score).

For example, does your description also include tests with "all Serkan puzzles" (Best of LMI Puzzle Tests was a runaway by Palmer, 10th at 63% of his score)? Does it include "all Deb puzzles" (Twist, with me finishing 30 minutes before second place, was one of the largest runaways ever, and even without proper bonus 10th was 60% of my score)? Do you really mean single source in the way you seem to imply? I think your problem is not with the bonus system per se, but with dealing with the fact that the best solver may indeed be twice as fast as the eleventh place finisher if it takes him 60 minutes to finish, and the player in 11th has 2 puzzles to go, both high value ones.

The most objective measure of two solvers is the total time they take to complete the test. Here only 6 completed, and 6th place took 40 more minutes than Palmer. 60/100 = 60%. That the ratio of scores earned (991.4/1665.3) also equals 60% is a feature, not a bug, of the system. Palmer just solved that well. I'll welcome a more developed argument that normalization should be made to a solver besides #1 for the monthly rankings -- I'm not sure I'll agree -- but I will not accept an argument that a bonus system like each extra minute = 5 points is better. 991.4/1200 does make Kota much closer to Palmer, but he still finished 40 minutes slower. At the end of the day, that should mean something. This is the right bonus system to get raw score. Let's debate what to do with the raw score (and rank) to get the fairest annual ranking. In other words, the right discussion is on the ranking formula, not the bonus system, because in the fairest evaluation there were only 10-12 solvers likely to finish within 2x of Palmer's time.
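
The ratios in this post can be checked with a few lines. All figures are the ones quoted above; the variable names are just for illustration, and the 1200 figure is the hypothetical score Palmer would have had under a flat "each extra minute = 5 points" bonus.

```python
palmer_minutes = 60       # Palmer's finishing time
sixth_minutes = 100       # 6th place finished 40 minutes later
palmer_score = 1665.3     # Palmer's actual score
sixth_score = 991.4       # 6th place's actual score
palmer_flat_score = 1200  # Palmer's score under the flat bonus alternative

time_ratio = palmer_minutes / sixth_minutes
score_ratio = sixth_score / palmer_score
flat_ratio = sixth_score / palmer_flat_score

print(f"time ratio:         {time_ratio:.1%}")
print(f"score ratio:        {score_ratio:.1%}")
print(f"ratio w/ flat bonus: {flat_ratio:.1%}")
```

The score ratio tracks the time ratio closely, while the flat bonus would put 6th place at roughly 83% of the winner despite the 40 minute gap.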

Edited by motris 2012-07-25 7:46 AM

@ 2012-07-25 7:26 AM (#8013 - in reply to #8009)

Posts: 199

Country : United States

motris posted @ 2012-07-25 7:26 AM

On the Instant Grading discussion:

I agree that it seems possible to consider typo versus puzzle error, on a case by case basis, as a test administrator, and give a "small" or "big" penalty. But there are many gray areas where it is not possible. For my Dotted Loop mistake, I entered 6 instead of 7 for the last row. This is a possible typo or miscount, as it is one off from correct. It also can come from a wrong loop with a touching segment. I don't know how, for this puzzle, to judge an answer ending with a 6 as just a typo, when an error in the solved form seems as likely or more likely. So I prefer to treat wrong as wrong.

I also hesitate, for the reasons Gareth points out, to use a strict time cut-off to try to separate one from the other. Having to spend extra time sorting it out seems extra penalty enough in combination with the flat cost of submitting something wrong.

So to me, the question is should each wrong submission be a fixed percentage of puzzle value, or a fixed point value, and I lean to the latter. But I think you have to treat wrong as wrong and nothing "cute" to separate particular kinds of answers can be fairly and uniformly done on these kinds of tests.

@ 2012-07-26 2:09 AM (#8019 - in reply to #8012)

Posts: 315

Country : The Netherlands

Para posted @ 2012-07-26 2:09 AM

motris - 2012-07-25 7:14 AM

I have no problems with the tests. I enjoy all types of tests. I just meant that these types of tests can facilitate certain runaway performances, which maybe doesn't suit this bonus system. For that reason the single-themed rounds in WPCs are usually shorter rounds with lower total scores, so the bonus can't get too big. If you were to use this bonus system in such a round, the bonus could get ridiculously big. Hideaki's bonus for Nikoli Selection would have gone towards 100% of the total score as he saved 42 minutes on 90 minutes. Your bonus would have gone over 100% as you saved 46 minutes. At least that is my interpretation of the score page. I feel bonuses shouldn't be able to get this high. So my point was that using your bonus system for those types of tests might not be the best idea. I don't have a problem with holding these types of tests. I just brought up Nikoli, as I remember Hideaki and you doing particularly well on Tom's last Nikoli Selection. And that is very much caused by a high exposure to Nikoli genre puzzles and being far better trained in those genres. You won't find that much practice material in all Deb, Serkan or Naoki Inaba puzzle genres. That's really why I singled out Nikoli, as it makes a lot of difference.

So my main point is really that the bonus shouldn't get too big. I think it would be better if a big score difference comes from solving more puzzles than getting a really big bonus.

@ 2012-07-26 2:26 AM (#8020 - in reply to #8013)

Posts: 315

Country : The Netherlands

Para posted @ 2012-07-26 2:26 AM

motris - 2012-07-25 7:26 AM

On the Instant Grading discussion:

I agree that it seems possible to consider typo versus puzzle error, on a case by case basis, as a test administrator, and give a "small" or "big" penalty. But there are many gray areas where it is not possible. For my Dotted Loop mistake, I entered 6 instead of 7 for the last row. This is a possible typo or miscount, as it is one off from correct. It also can come from a wrong loop with a touching segment. I don't know how, for this puzzle, to judge an answer ending with a 6 as just a typo, when an error in the solved form seems as likely or more likely. So I prefer to treat wrong as wrong.

I also hesitate, for the reasons Gareth points out, to use a strict time cut-off to try to separate one from the other. Having to spend extra time sorting it out seems extra penalty enough in combination with the flat cost of submitting something wrong.

So to me, the question is should each wrong submission be a fixed percentage of puzzle value, or a fixed point value, and I lean to the latter. But I think you have to treat wrong as wrong and nothing "cute" to separate particular kinds of answers can be fairly and uniformly done on these kinds of tests.

I understand that. But your own reason to introduce instant grading was that you hated it when you would click submit bonus and then find out there was a small counting error on your part in the entry key. I'm not against instant grading, it's just always been a problem for me that instant grading doesn't differentiate between the two. I always claim my entry key errors when there's no instant grading and don't claim when there was a solving error. In my opinion instant grading is all about getting the points you deserve, and I don't think making a solving mistake deserves the same amount of points as counting wrong in the entry key. It might not be the easiest to implement, but it is fairer.

@ 2012-07-26 6:05 AM (#8021 - in reply to #8020)

Posts: 199

Country : United States

motris posted @ 2012-07-26 6:05 AM

Instant grading actually came about because of the competing desires to fairly rank fast solvers with running tests that many can finish. I think a good LMI test should have at least five finishers, ideally more, meaning it is quite likely that someone is done with over thirty minutes left. Before "Claim Bonus" there was a long wait to just check and check and check work since last submit would equal the final time. Actually handing in is a huge improvement, but now makes the meta choice of how long to check far too heavy in weight when you want a test to be long enough for many other solvers. Instant grading solves this imbalance, just as quick grading at WPC playoffs does, to emphasize solving speed over everything else.

Where we disagree is the view a typo is different from say not marking the last mine in a minesweeper puzzle. In both cases I think the solver is effectively done and slightly inaccurate so they are treated the same. Because you can't run a perfect test to tell them apart, and self reporting will be unfair if not all solvers know to use the system or only half of typos can be identified as such, it is certainly fairest if not most comforting to treat wrong as wrong. The solver gets an X and then proves they know the right answer by getting to resubmit. This is much better than an organizer guessing each and every time.

Edited by motris 2012-07-26 6:07 AM

@ 2012-07-26 12:21 PM (#8024 - in reply to #8021)

Posts: 1956

Country : India

prasanna16391 posted @ 2012-07-26 12:21 PM

motris - 2012-07-26 6:05 AM
Where we disagree is the view a typo is different from say not marking the last mine in a minesweeper puzzle. In both cases I think the solver is effectively done and slightly inaccurate so they are treated the same. Because you can't run a perfect test to tell them apart, and self reporting will be unfair if not all solvers know to use the system or only half of typos can be identified as such, it is certainly fairest if not most comforting to treat wrong as wrong. The solver gets an X and then proves they know the right answer by getting to resubmit. This is much better than an organizer guessing each and every time.

I agree that a typo is similar to say one mine not put in. I don't think that's what Para means by a solving error. The valid example of where a solving error is helped by instant grading is say when you're done with 60% of a puzzle and you've narrowed it down to two possible paths towards the solution. Now maybe narrowing it down to one requires some difficult logic which you can avoid by quickly trying one path, clicking submit, and if it's wrong, trying the other. In this case, just a -5 isn't really enough. Of course going back and solving it correctly will ideally take more time than say the last mine or a typo in the answer key. The suggestion of either the second penalty being higher, or of the penalty increasing per second after the first wrong submission, reduces the possibility of someone using this instant grading as a means of solve-checking rather than answer-checking.

Take Palmer's 2 errors for example - He himself has said his solves were mostly instinctive and the instant grading helped as a checker. If the penalty conditions weren't so friendly I feel he may have taken more time to check and been more cautious, and so may not have had as much of a runaway performance.

@ 2012-07-26 9:37 PM (#8030 - in reply to #7813)

Posts: 199

Country : United States

motris posted @ 2012-07-26 9:37 PM

I've been using the definition of solving error as "what is on the paper would not get points at a WPC." So one mine off is different from a typo, which is only an error in sending along your perfect paper solution. A serious puzzle error is certainly different, and even in your hypothetical case I would hope a 15 point (1.5 minute) penalty would be high enough. I think only on my last puzzle, where I did not check that all pentominoes were in and had a tweak to make, did I feel I was using the checking in a way a five point penalty was worth it as I would be finished.

@ 2012-07-26 10:30 PM (#8031 - in reply to #8030)

Posts: 1956

Country : India

prasanna16391 posted @ 2012-07-26 10:30 PM

motris - 2012-07-26 9:37 PM

I've been using the definition of solving error as "what is on the paper would not get points at a WPC." So one mine off is different from a typo, which is only an error in sending along your perfect paper solution. A serious puzzle error is certainly different, and even in your hypothetical case I would hope a 15 point (1.5 minute) penalty would be high enough. I think only on my last puzzle, where I did not check that all pentominoes were in and had a tweak to make, did I feel I was using the checking in a way a five point penalty was worth it as I would be finished.

If I remember correctly, there was a rule in the last WPC that if the organizers/judges felt that a puzzle was 90% correct, a percentage of the points would be awarded. So while a one-mine-off error may be slightly different from answer key errors, it is however very different from a serious puzzle error. The thing is, while you can't differentiate between answer-key/one-mine-off in terms of time required for correcting it, you can do that for a major solving error. We are trying to bring the online tests as much to the level of an offline event as possible, so at least where the differentiation is possible, I think it should be implemented. That's why the penalty increasing per second after the first incorrect submission until the correct submission (and no penalty, or just the base penalty, if it is never correctly submitted) makes sense to me. To your end of keeping a 15 point penalty, we can keep the base penalty low, and make sure it increases at a fair rate.
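
One way the growing penalty could look, as a sketch only. The base of 2 points, the growth rate of 6 points per minute and the cap at 15 points are all made-up values for illustration; the post only asks for a low base that increases "at a fair rate".

```python
def growing_penalty(seconds_to_fix, base=2.0, points_per_minute=6.0, cap=15.0):
    """Penalty that starts low and grows until the correct resubmission.

    seconds_to_fix is the time between the first wrong submission and the
    correct one, or None if the puzzle was never corrected (base only).
    """
    if seconds_to_fix is None:
        return base
    return min(cap, base + points_per_minute * seconds_to_fix / 60)


print(growing_penalty(10))    # quick typo fix
print(growing_penalty(300))   # long re-solve, hits the cap
print(growing_penalty(None))  # never corrected
```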

@ 2012-07-26 11:29 PM (#8032 - in reply to #7813)

Posts: 199

Country : United States

motris posted @ 2012-07-26 11:29 PM

The 90% rule was simply for granting the time bonus for the rest of the round, even with the mistake. It has never awarded, to my knowledge, puzzle points directly. (We created the rule for WSC 5 and the Hungarian organizers used it again last fall.)

@ 2012-07-26 11:42 PM (#8033 - in reply to #7813)

Posts: 1956

Country : India

prasanna16391 posted @ 2012-07-26 11:42 PM

Directly or indirectly, the point I'm trying to make is that concessions were made when the puzzle was almost correct. Concessions weren't made when the puzzle was mostly wrong. So simply, returning to the topic of penalty, a system that provides less penalty for a 90% correct puzzle or an answer entry error, and more penalty for a major solving error, should be encouraged as opposed to a flat penalty for all kinds of errors. It may not be perfect, but all I'm saying is it's closer to a perfect evaluation than the current system, whether the current system be -5 for all errors, or -15 for all errors.

@ 2012-07-26 11:51 PM (#8034 - in reply to #8030)

Posts: 315

Country : The Netherlands

Para posted @ 2012-07-26 11:51 PM

I don't think the WPC playoff is an accurate measure to compare this competition to. The WPC playoff is a single round that decides the winner of the competition. It was introduced to make the competition more exciting and add to the spectator element. Palmer himself said he didn't feel like he was the best overall puzzler of the last WPC, as Ulrich outscored him in rounds where he managed to do the best he could. So for judging the overall performance of solvers, a playoff system is not accurate.
This competition seems more like the regular rounds of a WPC, where you have a bunch of varied rounds and the overall score of those rounds determines a ranking, especially the long Varia rounds at the WPC. You get a lot of puzzles and you try to solve as many as possible and get points accordingly. Making mistakes in those rounds can cost you points, but in the long run a single mistake won't cost you that much as you can make it up.
Sometimes this can happen in a championship, as it did to me in Brazil. I missed a single bridge in a Hashiwokakero puzzle in the first round. It was carelessness on my part, caused by me wanting to be extra fast as people were finishing. I lost a lot of bonus in that round. The other rounds weren't very well suited to making up that loss. I therefore lost any chance of getting a good finish.

But this situation won't occur in the LMI ranking, as all scores are normalised. So all 12 tests that count towards your rating are comparably scored, and no one test will be worth far more than others. So over 12 tests, making a mistake in a single test won't hurt the overall assessment on the LMI ranking. It will only hurt the LMI ranking if you constantly keep making mistakes in your tests. But then that should be something you have to work on and improve. It shouldn't be something you want to fix through the scoring system.
Not making mistakes is a part of puzzling. You don't want to take that out by changing the scoring system. It's comparable to the starting gun in athletics, swimming or speedskating. If you don't react well to the starting gun, you can lose the race and the gold. You might have gone the fastest of everyone, but because you reacted 2 tenths of a second slower than the rest, you miss out on the gold by 1 tenth of a second. In the same way, not making mistakes is part of puzzling, and making mistakes should cost you points.

Just to make clear why I think the bonus shouldn't get too big with instant grading: the penalty was set at approximately 30 seconds of solving time. Except that is only accurate if you solve for 1000 points in 100 minutes. Palmer solved for 1600+ points in 100 minutes. So those 5 points are worth less than 20 seconds of solving time. So if he had spent 25 seconds checking, noticed his mistake and fixed it, he would have scored lower than he did by submitting and then fixing his mistake, even though he would have beaten the 30 seconds by which you defined the penalty. So this means the better you do, the less time an error is worth. So 30 seconds of solving time in penalty should really have been 0.5% of the total score. Or to make it more general ([30/"total time of the test in seconds"]*100)%.
I think that a test shouldn't be beaten by more than 25% of the total time. 30 minutes bonus on a 120 minute test should be maximum. 40 minutes on a 100 minute test is just too much. Optimally the top time on a 120 minute test would be 100 minutes for the best solvers in my opinion.
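
The penalty arithmetic in this post can be reproduced as follows. The figures (5 points, 1000 and 1600 points, 100 minutes, 30 seconds) are the ones used above; the function names are hypothetical.

```python
def penalty_in_seconds(penalty_points, score, minutes):
    """Solving time that a point penalty is worth at a given scoring rate."""
    return penalty_points * minutes * 60 / score


def penalty_percent(penalty_seconds, test_minutes):
    """Share of the total score equal to penalty_seconds of a test."""
    return 100 * penalty_seconds / (test_minutes * 60)


print(penalty_in_seconds(5, 1000, 100))  # the intended 30 seconds
print(penalty_in_seconds(5, 1600, 100))  # under 20 seconds for a 1600 score
print(penalty_percent(30, 100))          # 30 seconds as a % of a 100 min test
```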

@ 2012-07-27 4:55 AM (#8035 - in reply to #7813)

Posts: 199

Country : United States

motris posted @ 2012-07-27 4:55 AM

Except his penalty was no longer five points per mistake. It was five points times time correction or about 8.5 points. That is still the right proportion of score, .5% of score per mistake as you say. I'm pretty sure penalty was in the prefactor. If not it should have been.

@ 2012-07-27 5:17 AM (#8036 - in reply to #7813)

Posts: 199

Country : United States

motris posted @ 2012-07-27 5:17 AM

Maybe the right system, to be fair as Gareth wants for the range of solvers, fast and slow, is a set percentage of final score? If every penalty was 1% or 1.5% of final score (therefore the exact value of a minute or so for that solver) would one bother to check more? Probably. But unlike a flat fifteen points this doesn't excessively subtract from a solver who completes only five or ten of twenty puzzles.
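
To make the comparison concrete, here is a small sketch of a percentage penalty next to a flat 15 points. The 300 point score is a made-up example of a solver who finishes only a few puzzles; 1665.3 is the top score from this test; the function name is hypothetical.

```python
def percent_penalty(final_score, percent=1.0):
    """Penalty per wrong submission as a percentage of the final score."""
    return final_score * percent / 100


for score in (300, 1665.3):
    flat_share = 100 * 15 / score  # what a flat 15 points costs, in percent
    print(f"score {score}: 1% penalty = {percent_penalty(score):.1f} points, "
          f"flat 15 points = {flat_share:.1f}% of score")
```

Under the flat penalty the slower solver loses a far larger share of the score per mistake, which is the imbalance the percentage version avoids.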

@ 2012-07-29 8:52 PM (#8052 - in reply to #7813)

sudokufan

Posts: 3

Country : India

sudokufan posted @ 2012-07-29 8:52 PM

I went to the Monthly Sudoku Contest but the puzzle is not showing, in spite of clicking on the Reset button and pressing the 'Control' button while left-clicking the mouse. What should I do?

@ 2012-07-30 1:01 AM (#8055 - in reply to #8052)