Thread - Puzzle Marathon — 21st-29th January

@ 2012-02-01 9:32 PM (#6571 - in reply to #6396) (#6571) Top

forcolin

Posts: 172

Country : ITALY

forcolin posted @ 2012-02-01 9:32 PM

I have done some analysis of the score and bonus system.

First of all, in my opinion the scoring system in this contest was very good because every player had a realistic possibility of gaining bonuses on most puzzles, which means that the rank is very close to the sum of the times obtained in the individual puzzles. The exception is the Samurai Sudoku, which was much more difficult than the remaining puzzles: only 20 players were awarded bonus points on it, and even the best solver (motris) had to drop it as his worst result.

Overall, the number of players gaining a bonus was 1108 out of 1927 (57.5%), relatively high, and this is the distribution among the various puzzles.

If we consider the percentage of players who gained a bonus as a measure of the difficulty of a puzzle, we must conclude that the easiest puzzle was the Braille wordsearch, with 84% of the submissions gaining a bonus. This was originally indicated as an AVERAGE puzzle. This means that an attempt to allocate different times (or different bonus thresholds) to puzzles of different difficulties, as proposed by detuned, may be affected by a wrong evaluation of the difficulty.

I have also analysed the proposal of awarding bonuses only to those players completing a puzzle within a fixed time (30 minutes) of the top solver. The total number of players earning a bonus in this case would be 786 (40.8%), and the distribution is the following.

In my personal opinion, this system would be much worse. Not only is the peculiarity of the Samurai Sudoku not solved (of course, a 30-minute margin means much less on a very tough puzzle than on an easy one), but the total number of bonuses decreases dramatically, punishing the players earning 10-15 points with a solution time between 40 and 50 minutes. Overall, almost 350 submissions would earn no bonus at all, and this would be concentrated among the middle-category solvers. Also, for those players the average bonus would be reduced, so the score would favour a player with a very good time in just one puzzle over a player with decent times overall, and I do not think this is (or should be) the target of this competition.

I have tried to develop a different system. It assigns the top solver a bonus of, say, 50 points, sets the bonus threshold at n times the time of the top solver, and calculates the bonus by linear interpolation between these two values. I have prepared 3 scenarios, with n = 4, 5 and 6 respectively. This means that a player would earn a bonus if his/her time was within 4, 5 or 6 times the time of the best solver.
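A minimal Python sketch of the interpolation described above may help. The function name and argument names are hypothetical, and it assumes the bonus falls linearly from 50 points at the top solver's time to 0 at n times that time.

```python
def interpolated_bonus(time_min, top_time_min, n=5, top_bonus=50.0):
    """Bonus by linear interpolation between the top time and n * top time.

    Assumption: top_bonus at the top solver's time, 0 at the threshold,
    and a straight line in between.
    """
    threshold = n * top_time_min
    if time_min <= top_time_min:
        return top_bonus          # the top solver gets the full bonus
    if time_min >= threshold:
        return 0.0                # at or beyond the threshold: no bonus
    return top_bonus * (threshold - time_min) / (threshold - top_time_min)


# Example: top time 10 minutes, n = 5, so the threshold is 50 minutes.
print(interpolated_bonus(30, 10, n=5))  # halfway between 10 and 50 minutes
```

With n=5 and a top time of 10 minutes, a solver finishing in 30 minutes sits halfway between the top time and the threshold and so earns half of the top bonus.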

The following distributions are obtained

The total number of bonus scores is 905 (46.9%) for n=4, 1118 (58%) for n=5 and 1275 (66%) for n=6.

All these calculations give a better distribution of the scores among the puzzles (the number of players earning a bonus on the Samurai is now comparable to the other puzzles), and the scenario closest to the system actually adopted is n=5. Of course this system would give different results in terms of final ranking, benefitting mainly those players who had a good time on the Samurai, but not good enough to gain a bonus on it.

The negative consequence is that the cutoff is no longer known in advance. With the system adopted for this competition, it was clear that after one hour a player could put the puzzle aside and solve it the next day. With this system it would be possible (except for the very first player to start a puzzle) to show a “current bonus threshold” as an indication of when a player could give up, and also as an indication of the level of difficulty of the puzzle as requested by Puzzlescot, but this indication may change over time as strong players set new best times.

Overall, I think that the system adopted for Marathon number one has the advantage of being simple, and could be adopted again without variation, provided the organizers avoid using puzzles with a remarkable difference in difficulty, such as the Samurai Sudoku. If a new system has to be adopted, a calculation based on a bonus threshold of at least 5 or 6 times the time of the best solver could be an improvement and could allow puzzles of different levels of difficulty to be used, but I am convinced that very difficult puzzles, requiring the average solver much more than an hour to solve, should be avoided for practical reasons.

Excel Analysis : http://logicmastersindia.com/M201201P/MarathonSolvingTimes_forcolin...

@ 2012-02-02 8:38 AM (#6572 - in reply to #6571) (#6572) Top

debmohanty

Posts: 1869

Country : India

debmohanty posted @ 2012-02-02 8:38 AM

Stefano,

Thanks for the detailed analysis and many insights. The %bonus per puzzle is indeed useful information.

1) About Braille Word Search - Yes, this puzzle was marked as AVERAGE difficulty. For some reason, this puzzle looked scary (to me, and I guess many others). As you can see, this puzzle has the least number of submissions, even compared to Graffiti, which was uploaded 48 hours later.

2) About Samurai - A lot has been said in the forum about this puzzle. All I can repeat is that it was a bad choice. It is doubly bad considering that I insisted that all authors make puzzles with a 12-18 minute target time for top solvers.
The low percentage for Kakuro is not really surprising. This is the only classic-Nikoli puzzle and we know that some players are extremely fast in those. (That is also the reason we had exactly 1 classic-Nikoli puzzle)

3) About 5XN or 6XN bonus system - This is really innovative. If the puzzle difficulties are varying a lot, we might have to follow something similar.
But as you mentioned, if there were no Samurai, there is little need for changing the current bonus system. The current bonus system has 2 major benefits
a) it is extremely simple
b) the target for each puzzle is published and is well known

So, in future marathons, we would first make sure that there are no Samurai-like puzzles. That solves the majority of the problems. It will be impossible to make all puzzles of similar difficulty, but as long as no puzzle is extremely difficult, we should be ok.

The other points are
1) whether it is fair to compare scores by just adding up individual puzzle times of varying difficulties
2) whether ranks in individual puzzles should be given any importance (like LMI Ratings)
I think this post from motris briefly covers these two, but it does not have a specific formula.

Thanks once again for your analysis and your suggestions to improve everything that we should.

@ 2012-02-02 9:50 AM (#6575 - in reply to #6571) (#6575) Top

motris

Posts: 199

Country : United States

motris posted @ 2012-02-02 9:50 AM

This is incredible data and I'm glad to finally have something like this in hand. I'm not sure what Stefano is trying to optimize (uniformity of percent achieving bonus? - is this really the relevant parameter?) and I haven't had time to dive too deep into the info myself. But I think the most fascinating graphs so far are just looking at the trends in time for each puzzle across the top 100 solvers and seeing how using the "nth solver" at any point is a good measure of the relative ranking of a puzzle's difficulty.

I've linked to two images, one with a view of the whole test and one with just a view of the first hour which cuts Pentomino, Kakuro, and Samurai from the top 100 solver graph but gives a much better picture of the other puzzles. I think the data establishes a clear order of Tapa < Loop The Loop < Braille/Small Regions < Diff Neighbors/Graffiti < Black and White Loop < Pentomino/Kakuro < Samurai.

Notice that top time is probably the worst of the 100 choices for ranking the difficulty of the puzzles (and therefore the worst to use to normalize by multiplication or other means). Looking at the 10th solver (95th percentile) seems much better though. The top time suggests Pentomino is slightly easier than Black and White Loop. The 10th place (or any spot from 10-100) shows it is a 20-25% harder puzzle than the Black and White Loop for the vast majority of solvers.
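As a rough illustration of using the nth solver's time instead of the top time as a difficulty measure, here is a minimal Python sketch. The function name and the sample times are hypothetical and are not taken from the contest data.

```python
def nth_fastest_time(times_min, n=10):
    """Return the n-th fastest solving time, or None if fewer than n solvers."""
    ordered = sorted(times_min)
    return ordered[n - 1] if len(ordered) >= n else None


# Hypothetical solving times in minutes (illustration only, not real results).
puzzle_a = [12, 14, 15, 16, 18, 19, 21, 22, 24, 25, 30]
puzzle_b = [11, 17, 19, 21, 23, 25, 27, 28, 29, 31, 36]

# Top time says puzzle_b is easier; the 10th-fastest time says it is harder.
print(nth_fastest_time(puzzle_a, 1), nth_fastest_time(puzzle_b, 1))
print(nth_fastest_time(puzzle_a, 10), nth_fastest_time(puzzle_b, 10))
```

In this made-up data a single very fast solver makes puzzle_b look easier by top time, while the 10th-fastest time ranks it as the harder puzzle, which is the kind of reversal described above for Pentomino and Black and White Loop.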

These graphs also show me good characteristics to fit to either a rank-based or a normalized scoring model to get all the puzzles back on par with each other. The linear nature actually suggests rank may be best, with perhaps 150 to the top solver, 149 to the second, down to 100 for the 51st and later. For only Samurai, which we agree is too hard, would this system break down at one hour. But I do think you need to treat Kakuro and Pentomino differently from the easiest 6, and maybe even Black and White Loop as well. I disagree that only Samurai was an outlier on this test, and I'll let these graphs speak for themselves on that point.

Edited by motris 2012-02-02 10:14 AM

(top100.png)

(top100-zoom.png)

Attachments
----------------

top100.png (60KB - 0 downloads)

top100-zoom.png (87KB - 1 downloads)

@ 2012-02-02 5:20 PM (#6578 - in reply to #6575) (#6578) Top

Realshaggy

Posts: 69

Country : Germany

Realshaggy posted @ 2012-02-02 5:20 PM

First of all thank you for the nice contest.

Beside any data analysis: for me (as a mediocre solver) the Sudoku was the only puzzle that felt a little bit like a marathon. All the other ones were just a little bit bigger than usual, which didn't matter, because I could solve one or two per day. If you want to test endurance, I would suggest the following: give an even longer general time window (maybe four weeks), so that many people can find the time to participate. Within this window, you can start the contest at any time, which opens a 24h window that works like the last contest.

I think a general problem of these contests is the time difference between a top solver and an average solver. In a 2-hour contest, which the best solvers barely finish, I will get 1/3-1/2 of the points and need maybe 2-3 more hours if I want to finish all puzzles. If the contest should feel like a marathon for the best, this would mean at least 4-5h for them. But if it aims for "time needed for a fixed amount of puzzles" instead of "finished puzzles in a fixed time", that would mean something like 15 hours for me, which isn't suitable. And if I can do it in different sessions, it's not really a marathon for me.

(This reminds me of an interview with a hobby marathon runner which I read a while ago. He said things get easier after you can beat the 3h mark, because you don't have to run so long if you're fast enough ;-) )

@ 2012-02-03 11:10 AM (#6581 - in reply to #6565) (#6581) Top

reesylou

Posts: 10

Country : Australia

reesylou posted @ 2012-02-03 11:10 AM

debmohanty - 2012-02-01 3:26 PM

reesylou - 2012-02-01 6:10 AM

I'd really appreciate someone giving a breakdown of an entry point into Different Numbers - I really struggle with these and got absolutely nowhere with this particular one.

There is a cheeky start to the Different Neighbours at the top right corner.
Note that X has to be 1 or 2, otherwise the top right is not solvable uniquely.

Then transferring the 4 we get that the 2X2 cell can only be 3.

Ahhh.. of course. I assumed uniqueness in some of the other puzzles, but the Different Numbers type always causes me problems, so I didn't think to use that here - and I unfortunately chose to focus on the bottom left corner.

I'll give it another go with that in mind. Thanks.

@ 2012-02-04 8:56 PM (#6582 - in reply to #6396) (#6582) Top

detuned

Posts: 152

Country : United Kingdom

detuned posted @ 2012-02-04 8:56 PM

This thread is turning into a bit of a monster, but as a point of interest, I've just posted some kakuro thoughts on the UKPA boards:

http://forum.ukpuzzles.org/viewtopic.php?f=5&t=534#p5675

@ 2012-02-04 11:51 PM (#6583 - in reply to #6582) (#6583) Top

macherlakumar

Posts: 123

Country : India

macherlakumar posted @ 2012-02-04 11:51 PM

detuned - 2012-02-04 8:56 PM

This thread is turning into a bit of a monster, but as a point of interest, I've just posted some kakuro thoughts on the UKPA boards:

http://forum.ukpuzzles.org/viewtopic.php?f=5&t=534#p5675

I want to say one thing about your Kakuro, it is simply "Beauty and Beast" :).
Beauty : In the way it is designed.
Beast : The toughness in solving.
I am not sure about the break-in at the top left that you mentioned; when I solved this, I started from the bottom of the 'I' at the lower left and worked up to the top.

Regards,
Ravi

@ 2012-02-05 3:22 AM (#6584 - in reply to #6575) (#6584) Top

motris

Posts: 199

Country : United States

motris posted @ 2012-02-05 3:22 AM

I've now gone ahead and played with the scoring model and tried the 150 for 1st, 149 for 2nd, 148 for 3rd, down to 100 for 51st through last finisher. I like this system a lot because it makes all puzzles equal for potential bonus. Each puzzle has a total of 1275 bonus points split between the top 50 finishers. This means an "easy" puzzle will not give too much bonus to too many solvers. A "hard" puzzle will not give too little bonus to too few solvers. Each puzzle has same final value. Obviously, the choice of top 50, and the linear progression of bonus, were arbitrary and can be adjusted for a given test. I kept the best 9 out of 10 approach.

I've attached the stats for the top 20 (yellow shading is the dropped puzzle with rank scoring and I've shaded all tied options where those exist, red font is the dropped puzzle with current scoring). The average rank column is for the top 9 puzzles for that solver, but shows that deu and kota were very close and para and misko also very close in overall performance based on rank across the test. The time scoring had a different rank in these cases.

I have also attached my excel spreadsheet if someone wants to play with this type of system further. I'm already looking at how to use it on my next decathlon test.

Edited by motris 2012-02-05 3:29 AM

(rank-example.png)

Attachments
----------------

rank-example.png (55KB - 1 downloads)

rank.xlsx (64KB - 10 downloads)

@ 2012-02-05 5:07 AM (#6585 - in reply to #6396) (#6585) Top

Tablesaw

Posts: 12

Country : United States

Tablesaw posted @ 2012-02-05 5:07 AM

Hello, all. I consider myself a medium-level solver, and this was definitely one of the most exciting tests I've seen here. It's the first test in which I've solved all puzzles while the test was running. The fact that solving time was not a major factor in the test (both in terms of the time allotted to take the test, and the weight that time had in assigning a score) helped to relieve a lot of pressure from solving, making for a more enjoyable experience for me. As solvers talk about different scoring systems, I hope that the appeal of a test like this to the not-top solvers is retained.

I'd like to see us limit the number of solvers getting no bonus, because when that happens, ties accumulate around the multiples of the base points per puzzle, which puts more focus on the time solved.

@ 2012-02-05 6:00 AM (#6586 - in reply to #6396) (#6586) Top

MellowMelon

Country : United States

MellowMelon posted @ 2012-02-05 6:00 AM

That was one of the things that came to mind when I read motris's system. It seems there's four things a ranking system for a test like this has to deal with
1. Getting a proper ordering at the top.
2. Giving out enough bonus in the middle to avoid huge ties.
3. Being able to set a hard cutoff for no bonus (60 minutes here) so solvers that want to be competitive don't have to set aside an indeterminate amount of time to do each puzzle.
4. Being able to throw out a contestant's worst performance in a sensible way. (This basically requires that the top time be given the same amount of bonus for all puzzles.)

I'm inclined to agree with motris that the rank-based system is the best way to do 1 and 4. I've played around with several modifications to his system to try to do 2 and 3 better, but only two ideas don't have anything egregiously wrong with them:

A. Have a double-layered system where the top 25 or so use the system motris has (50-25 bonus points) and everyone else between 25th and the 60 minutes has points assigned by linear interpolation (25-0 bonus points). The interpolation could be done either by rank or by time. Some weaknesses are that it seems needlessly complex and that if there aren't many more than 25 people who solved in under 60 minutes the gradient for the 25-0 range could be inappropriately steep.

B. If a puzzle has N people finish in under 60 minutes, award 201-[rank] points for solvers finishing faster than 60 (so 200 for 1st) and 201-N points for everyone else. Have some floor (50?), a point total which anyone who solves the puzzle is guaranteed, in case close to 200 people finish in an hour. Numbers can be tweaked obviously. One "weakness" of this system is that it gives a different amount of points for finishing each puzzle assuming you cross the 60 minutes mark. But that might be a feature as opposed to a bug, since it will make the hardest puzzles worth much more to finish.
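Option B can be sketched in a few lines of Python. This is only an illustration under the numbers quoted above (200 for 1st, a floor of 50); the function and parameter names are hypothetical.

```python
def option_b_points(rank, under_cutoff, n_under_cutoff, top=200, floor=50):
    """Points for one puzzle under option B.

    rank            1-based finishing rank on the puzzle
    under_cutoff    True if the solver finished in under 60 minutes
    n_under_cutoff  N, the number of solvers who finished in under 60 minutes
    """
    if under_cutoff:
        points = top + 1 - rank            # 201 - rank
    else:
        points = top + 1 - n_under_cutoff  # 201 - N for everyone else
    return max(floor, points)              # guaranteed minimum for any solve


# A hard puzzle (N = 20) pays late finishers more than an easy one (N = 120).
print(option_b_points(80, False, 20), option_b_points(130, False, 120))
```

The comparison at the end shows the "feature or bug" mentioned above: simply finishing a puzzle that few people solved within the hour is worth more than finishing one that many solved quickly.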

@ 2012-02-05 7:48 AM (#6587 - in reply to #6586) (#6587) Top

motris

Posts: 199

Country : United States

motris posted @ 2012-02-05 7:48 AM

MellowMelon - 2012-02-04 5:00 PM

That was one of the things that came to mind when I read motris's system. It seems there's four things a ranking system for a test like this has to deal with
1. Getting a proper ordering at the top.
2. Giving out enough bonus in the middle to avoid huge ties.
3. Being able to set a hard cutoff for no bonus (60 minutes here) so solvers that want to be competitive don't have to set aside an indeterminate amount of time to do each puzzle.
4. Being able to throw out a contestant's worst performance in a sensible way. (This basically requires that the top time be given the same amount of bonus for all puzzles.)

This is a terrific framework to view the scoring concerns. I chose 50 because it would "work" for 9 of the 10 puzzles, but I could have chosen 100 if I wanted 7 of the 10 puzzles to work, dropping kakuro and pentomino as being too hard too. But I agree a stable system that only rewards bonus to those under an hour is good and I think of your options, (A) is closest to what I might imagine being a good leveraged system. Let's say this:

"For all solvers that finish in under 60 minutes, they will earn a bonus based on their final rank. The first 25 solvers will earn a bonus of 25 points for 1st, down to 1 point for 25th. Also, with M solvers under one hour on a puzzle, the Nth solver will earn (M+1-N)/M * 25 additional points."

That certainly does (3). And while this might seem complex with two different parts, they independently serve to do (1) and (2) fairly. You cannot do just the former and capture (2). But you also cannot do just the latter and fairly do (1), as on a puzzle where many people qualify versus a puzzle where few people qualify, the top tier being 125, 124.8, 124.6 is much different than 125, 124, 123. The half-half system ends up being a good compromise. Taking the best score on 9 of 10 does the last part and your parameters are met.
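The two-part bonus quoted above can be written out as a short Python sketch. The function name is hypothetical, and the 100-point base per solved puzzle is an assumption carried over from the earlier 150-to-100 model (it is what makes the top tier come out as 125 or 150 in the examples).

```python
def combined_score(rank, m_under_hour, base=100, top_n=25, scaled_max=25.0):
    """Score for one puzzle under the half-rank, half-proportional scheme.

    rank          1-based rank among solvers under 60 minutes, or None if the
                  solver took 60 minutes or more
    m_under_hour  M, the number of solvers under 60 minutes on this puzzle
    """
    if rank is None:
        return float(base)                      # solved, but no bonus
    rank_part = max(0, top_n + 1 - rank)        # 25 for 1st ... 1 for 25th
    scaled_part = (m_under_hour + 1 - rank) / m_under_hour * scaled_max
    return base + rank_part + scaled_part


# First place always gets the full 150; the last solver under the hour
# still gets a small proportional bonus.
print(combined_score(1, 20), combined_score(20, 20), combined_score(None, 20))
```

The rank part separates the top 25 by a full point each, while the proportional part guarantees that every solver under the hour gets a distinct, non-zero bonus.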

I've remodeled the scoring with this new melon-like (A) as I specifically restated it. I was curious to see which puzzles, if any, really broke the scoring. Samurai sort of does, as it only has 20 solvers in the one-hour bonus zone, but the formula is still basically ok. Also, now only two solvers (158 and 159) tie at 900, and there is a lot more grading of intermediate scores. Give it a look if you are really interested.

rank2.xlsx (hosted on my webspace because of a 100kb rule here.)

Edited by motris 2012-02-05 7:55 AM

@ 2012-02-05 12:10 PM (#6588 - in reply to #6586) (#6588) Top

debmohanty

Posts: 1869

Country : India

debmohanty posted @ 2012-02-05 12:10 PM

Fully agree with Melon's list, especially point 3

MellowMelon - 2012-02-05 6:00 AM
3. Being able to set a hard cutoff for no bonus (60 minutes here) so solvers that want to be competitive don't have to set aside an indeterminate amount of time to do each puzzle.

While we are trying to design a robust system that determines the relative scores / ranks at the top accurately and fairly, it is equally important to keep most other players in mind. In my view, the 60 minutes cut off for each puzzle in this test has been a key parameter for the success of this test. I would always vote for something that a player knows as his target, rather than bonus for top-50 or bonus based on n*top player's time which players don't know when they start solving.

motris' rank2.xlsx captures all the points logically and can be used in future marathons. It might sound complicated when someone reads it for the first time, but for those who are not interested in details, it simply means "you get bonus if you solve within 60 minutes".
Yes, Samurai sort of breaks the scoring. But it is part of organizers' responsibility to have puzzles based on the scoring system in place.

@ 2012-02-05 5:16 PM (#6589 - in reply to #6396) (#6589) Top

Para

Posts: 315

Country : The Netherlands

Para posted @ 2012-02-05 5:16 PM

I guess this system solves many scoring ambiguities. I think the only scenario that is not captured is a TVC V-like scenario, where one player is far superior to the rest. Isn't it possible to implement a system that is similar to the LMI Ranking score? There, part of the rating is based on ranking and part on actual score, with the top score being 1000. So a system based partly on ranking and partly on actual time, where the fastest time earns a set bonus, 60 minutes earns 0 bonus and the rest is scaled.

I think the most annoying part of these grading systems is that you have no clue how you stand relative to others when you're done solving, and your relative rank will shift constantly. You can be ahead of someone when you're done solving and behind them when the test ends. This will especially affect people in the middle, I think. I assume there are people in the middle who will try competing against each other a bit too, and they will have no clue whether they beat their friend in this test until days after they are done. At least I always like to see how I have done against players who are close to me on the LMI rank when I'm done solving.

Edited by Para 2012-02-05 5:17 PM

@ 2012-02-05 5:53 PM (#6590 - in reply to #6584) (#6590) Top

Valezius

Posts: 66

Country : Hungary

Valezius posted @ 2012-02-05 5:53 PM

motris - 2012-02-05 3:22 AM

I've now gone ahead and played with the scoring model and tried the 150 for 1st, 149 for 2nd, 148 for 3rd, down to 100 for 51st through last finisher.

I don't think this system is too fair if somebody wins a round by 6 minutes ;)

I propose that first position gets a 50-point bonus, and this is the basis for calculating the bonus points per minute.

For instance, if the first solver's time is 10 minutes, then every minute is worth 1 point (50/50).
If the solving time is 15 minutes, then every minute is worth 50/45 = 1.11 points.

So if the puzzle is too easy the bonus per minute will be lower than 1, but in general it will be higher than 1; in extreme cases it can be almost 2.

Every player still knows that if he solves the puzzle within one hour, he gets a bonus (and the bonus per minute will be 1-1.5 in most cases).
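A minimal Python sketch of this per-minute bonus, assuming the rate is 50 points divided by the minutes the top solver finished ahead of the 60-minute cutoff, and that each solver earns that rate for every minute under the cutoff. The function name is hypothetical.

```python
def per_minute_bonus(time_min, top_time_min, cutoff=60.0, top_bonus=50.0):
    """Bonus scaled so that the top solver gets exactly top_bonus points."""
    if time_min >= cutoff or top_time_min >= cutoff:
        return 0.0                                # nobody past the cutoff earns bonus
    rate = top_bonus / (cutoff - top_time_min)    # bonus points per minute
    return rate * (cutoff - time_min)


# Top time 10 minutes: the rate is 50/50 = 1 point per minute.
print(per_minute_bonus(30, 10))
# Top time 15 minutes: the rate is 50/45, about 1.11 points per minute.
print(round(50 / 45, 2))
```

Because the rate depends only on the top time, the 60-minute target stays fixed for everyone, which keeps the property that players know their cutoff before they start.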

@ 2012-02-05 7:25 PM (#6591 - in reply to #6563) (#6591) Top

rob

Posts: 172

Country : Germany

rob posted @ 2012-02-05 7:25 PM

Regarding the Different Neighbours puzzle, it's also doable if you miss the uniqueness (I did). I've recorded a possible start. The notes are kind of hard to make out, but you should be able to follow the solve.

@ 2012-02-05 9:27 PM (#6592 - in reply to #6591) (#6592) Top

prasanna16391

Posts: 2000

Country : India

prasanna16391 posted @ 2012-02-05 9:27 PM

In all my struggles with the Different Neighbors puzzle (I took a certain part as correct which was wrong and kept thinking the mistake was elsewhere), I found about 3-4 openings of different complexities. The easiest one is what Deb mentioned but there are other tricks there. I guess if one wants to stare at it and start over about 5 times like I ended up doing, they'll find all of them :\

@ 2012-02-05 11:07 PM (#6593 - in reply to #6589) (#6593) Top

motris

Posts: 199

Country : United States

motris posted @ 2012-02-05 11:07 PM

Para - 2012-02-05 4:16 AM
Isn't it possible to implement a system that is similar to the LMI Ranking score.

Yes, and I think you've given a great idea for how this system would look, with half of bonus scaling by time and half being flat based on rank. That seems perfectly appropriate for an LMI test comparing 10 puzzles just as it works over the year for comparing 10 LMI tests with each other.

Of course, all systems have problems. The scoring of the Marathon test is a huge outlier compared to the normal monthly tests with time bonus, so it is not a good test for the overall rankings as it raises almost everyone more than usual. In this test, the scoring of three of the puzzles led to much less bonus than the other seven. So we are proposing possibilities to address these issues. I don't think there is any dominant answer here, but there are better and worse approaches and it is good to hear from many solvers and aim for better next time.

Para - 2012-02-05 4:16 AM
I think the most annoying part of these grading systems is that you have no clue how you stand opposed to others when you're done solving and your relative rank will shift constantly.

Well, this is sort of a problem on all tests (as your rank will only ever fall) but with variable scoring there could indeed be small rank changes when solvers are at values where they are "effectively tied". I don't think anyone can cleanly claim victory when things are this close (like when I beat Ulrich by 1 second!) but the rank score will eventually favor one over the other. My sense though is that these effects will be rather small, as they were during Puzzle Jackpot when final scoring wasn't known until all solvers had completed. I could do subsampling analysis to be sure, but I think using rank and not time you will have greater stability. And until results are finalized, you can always just use relative performance for "bragging rights".

"I beat you on 6 of 10 puzzles!"
"Yes, but I beat you by 5 minutes overall!"

Like many things in sports, there aren't always winners but there is always debate.

@ 2012-02-08 5:44 PM (#6635 - in reply to #6591) (#6635) Top

reesylou

Posts: 10

Country : Australia

reesylou posted @ 2012-02-08 5:44 PM

rob - 2012-02-06 12:25 AM

Regarding the Different Neighbours puzzle, it's also doable if you miss the uniqueness (I did). I've recorded a possible start. The notes are kind of hard to make out, but you should be able to follow the solve.

Wow. Thanks for that video... I now have a better understanding of how to make limiting assumptions on possibilities. Seeing the thought process unfold just made it click :)

@ 2012-12-28 3:39 PM (#9262 - in reply to #6396) (#9262) Top

poonamc306

Posts: 2

Country : India

poonamc306 posted @ 2012-12-28 3:39 PM

This looks really interesting. I have never seen this kind of game. I would like to participate in this amazing and different game. I really like this kind of thing, and I think it would be educational. Can anyone tell me how I can be part of this game?