x64 ArchLinux : AMD® Ryzen 7 4700U®

Each chart bar shows *how many times slower* one **fannkuch-redux** program was, compared to the fastest program.

These are not the only programs that could be written. These are not the only compilers and interpreters. These are not the only programming languages.

Column × shows *how many times more* each program used compared to the benchmark program that used least.

| × | Program Source Code | CPU secs | Elapsed secs | Memory KB | Code B | ≈ CPU Load |
|---|---|---|---|---|---|---|
| 1.0 | PyPy 2 | 1.28 | 0.49 | 79,176 | 1009 | 18% 56% 59% 4% 40% 36% 0% 56% |
| 1.1 | Pyston #3 | 3.47 | 0.51 | 13,476 | 894 | 84% 88% 87% 90% 86% 86% 87% 92% |
| 1.2 | Nuitka #3 | 4.05 | 0.56 | 14,572 | 894 | 93% 89% 82% 89% 96% 91% 89% 91% |
| 1.2 | Nuitka #4 | 4.15 | 0.58 | 14,552 | 1069 | 91% 91% 88% 86% 90% 88% 93% 90% |
| 1.4 | PyPy 3 #4 | 2.49 | 0.66 | 81,224 | 1069 | 43% 41% 44% 40% 81% 40% 43% 49% |
| 1.4 | PyPy 3 #3 | 2.39 | 0.67 | 81,384 | 894 | 37% 37% 39% 42% 39% 39% 89% 38% |
| 1.5 | Pyston #4 | 4.78 | 0.72 | 13,464 | 1069 | 87% 87% 86% 81% 84% 91% 92% 85% |
| 1.6 | PyPy 3 | 1.61 | 0.76 | 80,796 | 1271 | 39% 5% 39% 1% 49% 42% 1% 43% |
| 1.6 | PyPy 3 #6 | 0.74 | 0.76 | 66,624 | 552 | 0% 1% 0% 0% 0% 0% 97% 0% |
| 1.7 | PyPy 3 #2 | 1.88 | 0.85 | 79,936 | 1008 | 36% 0% 1% 43% 42% 56% 44% 1% |
| 1.9 | Python development version #4 | 6.76 | 0.92 | 15,148 | 1069 | 93% 92% 93% 95% 96% 95% 93% 98% |
| 1.9 | Python development version #3 | 6.35 | 0.92 | 15,068 | 894 | 91% 96% 90% 91% 90% 89% 83% 91% |
| 1.9 | Pyston #2 | 3.53 | 0.94 | 13,460 | 1008 | 33% 15% 92% 96% 81% 6% 94% 12% |
| 2.5 | Nuitka #2 | 4.72 | 1.23 | 14,408 | 1008 | 5% 98% 97% 2% 96% 94% 1% 1% |
| 2.9 | Nuitka | 5.27 | 1.39 | 14,560 | 1271 | 87% 18% 1% 97% 8% 1% 91% 95% |
| 3.4 | Python development version #2 | 6.38 | 1.65 | 15,040 | 1008 | 95% 19% 97% 5% 4% 7% 94% 97% |
| 4.1 | Python development version | 7.68 | 1.97 | 15,264 | 1271 | 94% 28% 97% 18% 23% 96% 98% 13% |
| 4.5 | Python 3 #4 | 14.55 | 2.20 | 12,812 | 1069 | 94% 95% 94% 90% 98% 90% 93% 90% |
| 4.8 | Python 3 #3 | 13.67 | 2.31 | 12,852 | 894 | 87% 84% 97% 85% 90% 94% 89% 87% |
| 5.8 | Nuitka #6 | 2.78 | 2.80 | 10,700 | 552 | 0% 100% 0% 0% 1% 0% 1% 0% |
| 5.9 | Python 2 | 11.02 | 2.85 | 11,460 | 1009 | 6% 4% 97% 95% 94% 0% 1% 96% |
| 6.0 | Pyston #6 | 2.93 | 2.93 | 8,508 | 552 | 12% 8% 4% 100% 1% 2% 4% 1% |
| 6.3 | Python 3 #2 | 11.97 | 3.07 | 12,672 | 1008 | 97% 26% 98% 14% 8% 96% 98% 6% |
| 7.2 | Graal #6 | 3.86 | 3.47 | 735,676 | 552 | 93% 75% 86% 90% 76% 86% 98% 70% |
| 7.5 | Python development version #6 | 3.64 | 3.64 | 9,548 | 552 | 16% 8% 100% 4% 7% 6% 5% 7% |
| 8.5 | Python 3 | 16.04 | 4.12 | 12,888 | 1271 | 98% 94% 29% 28% 30% 98% 99% 21% |
| 12 | Python 3 #6 | 5.63 | 5.64 | 8,096 | 552 | 49% 21% 20% 16% 69% 19% 12% 13% |
| 13 | MicroPython #6 | 6.21 | 6.22 | 4,348 | 552 | 0% 0% 0% 100% 0% 0% 0% 0% |
| 464 | RustPython #6 | 224.80 | 224.81 | 14,552 | 552 | 0% 0% 0% 0% 100% 0% 0% 0% |

Missing benchmark programs:

- Jython: No program
- IronPython: No program
- Cython: No program
- Shedskin: No program
- Numba: No program
- Grumpy: No program

**diff** program output N = 7 with this output file to check your program is correct before contributing.

We are trying to show the performance of various programming language implementations - so we ask that contributed programs not only give the correct result, but also **use the same algorithm** to calculate that result.

For N = 7 programs should generate these permutations (40KB) - which, incidentally, seem to be in the same order as permutations generated by the Tompkins-Paige algorithm; see pages 150-151 of *Permutation Generation Methods* by Robert Sedgewick.

The fannkuch benchmark is defined by programs in *Performing Lisp Analysis of the FANNKUCH Benchmark* by Kenneth R. Anderson and Duane Rettig.

Each program should

- Take a permutation of {1,...,n}, for example: {4,2,1,5,3}.
- Take the first element, here 4, and reverse the order of the first 4 elements: {5,1,2,4,3}.
- Repeat this until the first element is a 1, so flipping won't change anything more: {3,4,2,1,5}, {2,4,3,1,5}, {4,2,3,1,5}, {1,3,2,4,5}.
- Count the number of flips, here 5.
- Keep a checksum, written either way:
  - checksum = checksum + (if permutation_index is even then flips_count else -flips_count)
  - checksum = checksum + (toggle_sign_-1_1 * flips_count)

- Do this for all n! permutations, and record the maximum number of flips needed for any permutation.
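
The flip-counting and checksum steps above can be sketched in Python as follows. This is only an illustration: the names `count_flips` and `checksum_and_max` are made up for this example, and the caller is assumed to supply the permutations in the benchmark's required order, since the checksum depends on each permutation's index.

```python
def count_flips(permutation):
    """Count prefix reversals until the first element is 1 (values are 1..n)."""
    p = list(permutation)
    flips = 0
    while p[0] != 1:
        k = p[0]
        p[:k] = p[:k][::-1]  # reverse the order of the first k elements
        flips += 1
    return flips


def checksum_and_max(permutations):
    """Return (checksum, maximum flips) over permutations taken in order.

    Hypothetical helper: the benchmark fixes the order of the permutations,
    which this function does not generate itself.
    """
    checksum = 0
    max_flips = 0
    for index, perm in enumerate(permutations):
        flips = count_flips(perm)
        # Add flips at even permutation indices, subtract at odd ones.
        checksum += flips if index % 2 == 0 else -flips
        max_flips = max(max_flips, flips)
    return checksum, max_flips


print(count_flips([4, 2, 1, 5, 3]))  # 5, matching the worked example above
```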

The conjecture is that this maximum count is approximated by n*log(n) as n goes to infinity.

*FANNKUCH* is an abbreviation for the German word *Pfannkuchen*, or pancakes, in analogy to flipping pancakes.

Thanks to Oleg Mazurov for insisting on a checksum and providing this helpful description of the approach he took -

- A common idea for parallel implementation is to divide all work (n! permutations) into chunks small enough to avoid load imbalance but large enough to keep overhead low. I set the number of chunks as a parameter (NCHUNKS = 150) from which I derive the size of a chunk (CHUNKSZ) and the actual number of chunks/tasks to be processed (NTASKS), which may be different from NCHUNKS because of rounding.
- Task scheduling is trivial: threads will atomically get and increment the taskId variable to derive a range of permutation indices to work on:
  `task = taskId.getAndIncrement(); idxMin = task * CHUNKSZ; idxMax = min( idxMin + CHUNKSZ, n! );`

- Maximum flip counts and partial checksums can be computed for chunks in arbitrary order and recombined to generate the required result at the final step (CHUNKSZ must be even for adding partial checksums to be associative - I didn't enforce it in my submission).
- Now I need to go from a permutation index to the permutation itself.
- The predefined order in which all permutations are to be generated can be described as follows: to generate n! permutations of n arbitrary numbers, rotate the numbers left (from higher position to lower) n times, so that each number appears in the n-th position, and for each rotation recursively generate (n-1)! permutations of the first n-1 numbers whatever they are.
- To optimize the process I use an intermediate data structure, count[], which keeps count of how many rotations have been done at every level. Apparently, count[0] is always 0, as there is only one element at that level, which can't be rotated; count[1] = 0..1 for two elements, count[2] = 0..2 for three elements, etc.
- To generate next permutation I swap the first two elements and increase count[1]. If count[1] becomes greater than 1, I'm done with rotations at level 1 and need to "return" (as it would have been in the recursive implementation) to level 2. Now, I rotate 3 elements and increment count[2]. If it becomes greater than 2, I'm done with level 2 and need to go to level 3, etc.
- It should be clear now how to generate a permutation and corresponding count[] array from an arbitrary index. Basically, count[k] = ( index % (k+1)! ) / k! is the number of rotations we need to perform on elements 0..k. Doing it in the descending order from n-1 to 1 gives us both the count[] array and the permutation.
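
A minimal Python sketch of the approach described above, under stated assumptions, might look like this. It is not Oleg Mazurov's submission: the function names are invented for illustration, values are 0-based (0..n-1 rather than 1..n), the chunk-size rounding is one plausible reading of "derive the size of a chunk", and the chunks are processed sequentially instead of being handed to threads. The sign of each flip count is taken from the absolute permutation index, so this sketch does not need the chunk size to be even.

```python
from math import factorial


def permutation_from_index(n, index):
    """Return (perm, count) for the index-th permutation of 0..n-1."""
    perm = list(range(n))
    count = [0] * n
    for k in range(n - 1, 0, -1):
        # count[k] = (index % (k+1)!) / k!, as in the description above.
        d = (index % factorial(k + 1)) // factorial(k)
        count[k] = d
        # Rotate the first k+1 elements left by d positions.
        perm[:k + 1] = perm[d:k + 1] + perm[:d]
    return perm, count


def next_permutation(perm, count):
    """Advance perm and count in place to the next permutation."""
    perm[0], perm[1] = perm[1], perm[0]  # a rotation of two elements
    k = 1
    count[k] += 1
    while count[k] > k:
        # Level k is exhausted: reset it and rotate one level up.
        count[k] = 0
        k += 1
        perm[:k + 1] = perm[1:k + 1] + perm[:1]
        count[k] += 1


def count_flips_zero_based(perm):
    """Count prefix reversals until the first element is 0."""
    p = list(perm)
    flips = 0
    while p[0] != 0:
        k = p[0] + 1
        p[:k] = p[:k][::-1]
        flips += 1
    return flips


def run_task(n, idx_min, idx_max):
    """Partial (checksum, max flips) for permutation indices [idx_min, idx_max)."""
    perm, count = permutation_from_index(n, idx_min)
    checksum = max_flips = 0
    for index in range(idx_min, idx_max):
        flips = count_flips_zero_based(perm)
        max_flips = max(max_flips, flips)
        checksum += flips if index % 2 == 0 else -flips
        if index + 1 < idx_max:  # never step past the end of the chunk
            next_permutation(perm, count)
    return checksum, max_flips


def fannkuch_chunked(n, nchunks=150):
    """Combine per-chunk results; the chunks could run in any order."""
    total = factorial(n)
    chunk_size = (total + nchunks - 1) // nchunks    # assumed rounding
    ntasks = (total + chunk_size - 1) // chunk_size  # may differ from nchunks
    checksum = max_flips = 0
    for task in range(ntasks):
        idx_min = task * chunk_size
        idx_max = min(idx_min + chunk_size, total)
        part_sum, part_max = run_task(n, idx_min, idx_max)
        checksum += part_sum
        max_flips = max(max_flips, part_max)
    return checksum, max_flips
```

Calling `fannkuch_chunked(7)` returns `(228, 16)`, and the result does not change with the number of chunks, because every chunk rebuilds its starting permutation from the index alone.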