Quicksort improvement

DSP, Plugin and Host development discussion.

Post

Must admit I didn't read any of the references, but isn't it so that no matter what approach you use to pick the pivot value, there is always a worst-case input which needs quadratic time?
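For illustration (a minimal sketch, not from any of the references): a quicksort that always picks the first element as its pivot makes N(N-1)/2 comparisons on already-sorted input, because every partition is maximally lopsided.

Code: Select all

// Count the comparisons made by a quicksort whose pivot is always the
// first element. On already-sorted input the partition puts everything
// on one side, so the total grows quadratically.
#include <cstdio>
#include <utility>
#include <vector>

static long long comparisons = 0;

void naiveQuicksort(std::vector<int>& a, int lo, int hi) {
    if (lo >= hi) return;
    int pivot = a[lo];                     // deterministic pivot choice
    int i = lo;
    for (int j = lo + 1; j <= hi; ++j) {   // Lomuto-style partition
        ++comparisons;
        if (a[j] < pivot) std::swap(a[++i], a[j]);
    }
    std::swap(a[lo], a[i]);                // pivot into its final place
    naiveQuicksort(a, lo, i - 1);
    naiveQuicksort(a, i + 1, hi);
}

int main() {
    const int n = 2000;
    std::vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i;  // already sorted: worst case
    naiveQuicksort(v, 0, n - 1);
    std::printf("%lld comparisons\n", comparisons);  // 1999000 = N(N-1)/2
    return 0;
}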

In comparison, IIRC, heapsort always guarantees O(N log N) time in the worst case, and recently there have been implementations which do heapsort in-place (originally it required a second array).

Post

I would test your code to see how much faster it actually is, but it doesn't compile with clang.

Post

Z1202 wrote: Thu Feb 21, 2019 9:41 am In comparison, IIRC, heapsort always guarantees O(N log N) time in the worst case, and recently there have been implementations which do heapsort in-place (originally it required a second array).
Heapsort is always O(N log N) as long as the keys are distinct. According to Wikipedia, the in-place version has been known for about as long as the algorithm itself:
Wikipedia wrote: Heapsort was invented by J. W. J. Williams in 1964.[2] This was also the birth of the heap, presented already by Williams as a useful data structure in its own right.[3] In the same year, R. W. Floyd published an improved version that could sort an array in-place, continuing his earlier research into the treesort algorithm.[3]
Practically speaking, a fairly trivial heapsort of small to moderate (= fits in L2) datasets takes about 1.5 to 2 times as long as std::sort() on this MBP, but heapsort has a terrible memory access pattern that starts to hurt as your dataset outgrows your caches. In comparison, in-place quicksort (assuming ideal pivots, at least) is essentially cache-oblivious.
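For anyone wanting to reproduce that kind of measurement, here is a minimal sketch of such a benchmark (not the exact code behind the numbers above): heapsort built from the standard heap primitives, timed against std::sort on identical random data.

Code: Select all

// Heapsort via the standard heap primitives, timed against std::sort.
// The array size (1M ints, ~4 MB) is arbitrary; vary it to see the
// cache effects discussed above.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <random>
#include <vector>

template <typename It>
void heapsort(It first, It last) {
    std::make_heap(first, last);   // O(N): build a max-heap in place
    std::sort_heap(first, last);   // O(N log N): repeatedly pop the max
}

int main() {
    std::mt19937 rng(1234);
    std::vector<int> a(1 << 20);
    for (int& x : a) x = (int)rng();
    std::vector<int> b = a;        // identical input for both sorts

    auto t0 = std::chrono::steady_clock::now();
    heapsort(a.begin(), a.end());
    auto t1 = std::chrono::steady_clock::now();
    std::sort(b.begin(), b.end());
    auto t2 = std::chrono::steady_clock::now();

    using ms = std::chrono::duration<double, std::milli>;
    std::printf("heapsort:  %.1f ms\n", ms(t1 - t0).count());
    std::printf("std::sort: %.1f ms\n", ms(t2 - t1).count());
    return 0;
}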

Post

mystran wrote: Thu Feb 21, 2019 11:34 am Practically speaking, a fairly trivial heapsort of small to moderate (= fits in L2) datasets takes about 1.5 to 2 times as long as std::sort() on this MBP, but heapsort has a terrible memory access pattern that starts to hurt as your dataset outgrows your caches. In comparison, in-place quicksort (assuming ideal pivots, at least) is essentially cache-oblivious.
Notably, since C++11 std::sort is required to be O(N log N). At any rate, for realtime usage I'd be more concerned about worst-case performance (where O(N^2) would usually be prohibitive), and for non-realtime usage I wouldn't care about being 1.5 to 2 times slower, except for offline processing of large datasets (where I'd still be concerned about the worst case, as O(N^2) for large N doesn't sound like a good idea to me).

Post

Z1202 wrote: Thu Feb 21, 2019 12:16 pm
mystran wrote: Thu Feb 21, 2019 11:34 am Practically speaking, a fairly trivial heapsort of small to moderate (= fits in L2) datasets takes about 1.5 to 2 times as long as std::sort() on this MBP, but heapsort has a terrible memory access pattern that starts to hurt as your dataset outgrows your caches. In comparison, in-place quicksort (assuming ideal pivots, at least) is essentially cache-oblivious.
Notably, since C++11 std::sort is required to be O(N log N). At any rate, for realtime usage I'd be more concerned about worst-case performance (where O(N^2) would usually be prohibitive), and for non-realtime usage I wouldn't care about being 1.5 to 2 times slower, except for offline processing of large datasets (where I'd still be concerned about the worst case, as O(N^2) for large N doesn't sound like a good idea to me).
You can get O(N log N) by doing quicksort, limiting the recursion depth to K log N for some constant K (which could be anything as far as the asymptotic bound is concerned), and switching to another algorithm (usually heapsort) when you hit that limit. This is usually what "introsort" does. When the fallback actually triggers, pure heapsort would have been faster, but you keep the asymptotic complexity bounds.
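A minimal sketch of that scheme (assuming random-access iterators and copyable elements; not any particular library's implementation):

Code: Select all

// Depth-limited quicksort: once the depth budget of ~2*log2(N) is
// spent, the remaining range is finished with heapsort, so the whole
// sort is O(N log N) even against adversarial input.
#include <algorithm>

template <typename It>
void introsortRec(It first, It last, int depthLimit) {
    if (last - first <= 1) return;
    if (depthLimit == 0) {
        std::make_heap(first, last);   // pivots went badly:
        std::sort_heap(first, last);   // heapsort the rest of this range
        return;
    }
    // Median-of-three pivot, taken by value.
    auto a = *first, b = *(first + (last - first) / 2), c = *(last - 1);
    auto pivot = std::max(std::min(a, b), std::min(std::max(a, b), c));

    // Partition into < pivot, == pivot, > pivot. The middle block is
    // already in place, which also guarantees progress on equal keys.
    It lo = std::partition(first, last,
                           [&](const auto& x) { return x < pivot; });
    It hi = std::partition(lo, last,
                           [&](const auto& x) { return !(pivot < x); });
    introsortRec(first, lo, depthLimit - 1);
    introsortRec(hi, last, depthLimit - 1);
}

template <typename It>
void introsort(It first, It last) {
    int depth = 0;
    for (auto n = last - first; n > 1; n >>= 1) ++depth;  // ~log2(N)
    introsortRec(first, last, 2 * depth);
}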

As for the worst case of pure quicksort: it's worth keeping in mind that, as a recursive divide-and-conquer algorithm, it also needs auxiliary memory (i.e. stack space) that's linear in the recursion depth (to keep track of where you split the array), so if your N is large and you hit the O(N^2) case, you also need O(N) stack space to avoid crashing. In contrast, heapsort only needs O(1) storage that typically fits into registers (since it's just a few pointers and iterators), which makes it an ideal fallback.

Either way, if you are sorting very large datasets, you can get an O(N log N) worst case with O(N) extra storage by using merge sort (typically combined with another sorting algorithm for the small blocks), which might be the best strategy once you run out of RAM.
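A sketch of that merge-sort route (in-RAM version only; once the data outgrows RAM you would merge sorted runs from disk instead, but the structure is the same):

Code: Select all

// Top-down merge sort: O(N log N) worst case in exchange for an O(N)
// scratch buffer. Small blocks are delegated to std::sort to illustrate
// the hybrid approach mentioned above.
#include <algorithm>
#include <cstddef>
#include <vector>

template <typename T>
void mergesortRec(std::vector<T>& a, std::vector<T>& tmp,
                  std::size_t lo, std::size_t hi) {    // sorts [lo, hi)
    if (hi - lo <= 32) {
        std::sort(a.begin() + lo, a.begin() + hi);     // small block
        return;
    }
    std::size_t mid = lo + (hi - lo) / 2;
    mergesortRec(a, tmp, lo, mid);
    mergesortRec(a, tmp, mid, hi);
    std::merge(a.begin() + lo, a.begin() + mid,        // merge the halves
               a.begin() + mid, a.begin() + hi, tmp.begin() + lo);
    std::copy(tmp.begin() + lo, tmp.begin() + hi, a.begin() + lo);
}

template <typename T>
void mergesort(std::vector<T>& a) {
    std::vector<T> tmp(a.size());                      // O(N) extra storage
    if (!a.empty()) mergesortRec(a, tmp, 0, a.size());
}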

Post

Also, you can often beat the O(N log N) bound if you can use information other than pairwise comparisons. For example, if all keys are integers of K bits, then in-place binary radix sort can be done in K passes over the data (e.g. partitioning similar to in-place quicksort, but looking at the bits of the keys rather than comparing them to a pivot), without doing any pairwise comparisons at all. You can also use a higher radix (still in-place) if you do a separate counting pass at each level of recursion to figure out how large the buckets should be. For example, you could sort 32-bit keys with a radix of 256 and never have to recurse more than 4 levels.
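A minimal sketch of the binary (one bit per pass) in-place variant, for unsigned 32-bit keys:

Code: Select all

// In-place MSD binary radix sort: partition on the current bit (no
// pairwise comparisons), then recurse on each half with the next
// lower bit. At most 32 levels for 32-bit keys.
#include <algorithm>
#include <cstdint>

void radixSortBits(uint32_t* lo, uint32_t* hi, int bit) {  // sorts [lo, hi)
    if (hi - lo <= 1 || bit < 0) return;
    const uint32_t mask = UINT32_C(1) << bit;
    // Quicksort-style partition, but testing a key bit instead of
    // comparing against a pivot value.
    uint32_t* mid = std::partition(lo, hi,
        [mask](uint32_t x) { return (x & mask) == 0; });
    radixSortBits(lo, mid, bit - 1);   // keys with the bit clear
    radixSortBits(mid, hi, bit - 1);   // keys with the bit set
}

// usage: radixSortBits(v.data(), v.data() + v.size(), 31);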

Post

mystran wrote: Thu Feb 21, 2019 9:54 am I would test your code to see how much faster it actually is, but it doesn't compile with clang.
Strange, I don't recall using anything but the standard C/C++ stuff. Maybe it's some Microsoft-specific keyword like __inline instead of inline, and so on. I'll see if I can install Clang and check what's happening.
mystran wrote: Thu Feb 21, 2019 4:33 pm Also, you can often beat the O(N log N) bound if you can use information other than pairwise comparisons. For example, if all keys are integers of K bits, then in-place binary radix sort can be done in K passes over the data (e.g. partitioning similar to in-place quicksort, but looking at the bits of the keys rather than comparing them to a pivot), without doing any pairwise comparisons at all. You can also use a higher radix (still in-place) if you do a separate counting pass at each level of recursion to figure out how large the buckets should be. For example, you could sort 32-bit keys with a radix of 256 and never have to recurse more than 4 levels.
I did similar stuff with bucket sort, as a hybrid with quicksort. It works really well when the input space is known, as you mentioned. But the tricky thing is how to adapt that into a general-purpose sort for any input while still maintaining very good performance, assuming that the bulk of the input is going to be in a certain range yet has some out-of-range values.

Edit: Actually, now that I read it again, it's different.
www.solostuff.net
Advice is heavy. So don’t send it like a mountain.

Post

Today I woke up with an itch about what you guys were saying regarding the O(N^2) worst case. Back in 2015 I had the best mindset for this; now I've forgotten what I did. But have a look at this:

https://www.cs.cmu.edu/~avrim/451f11/le ... ct0906.pdf

You can enable randomized worst-case mitigation in the code by defining the RANDOM_SAMPLES compiler directive.
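(RANDOM_SAMPLES refers to the author's own code, which isn't reproduced here; as a generic illustration of the idea, randomized pivot selection looks something like this:)

Code: Select all

// Generic illustration only, not the RANDOM_SAMPLES code: picking the
// pivot index uniformly at random means no fixed input can reliably
// trigger the O(N^2) case; expected time is O(N log N) for every input.
#include <cstddef>
#include <random>

inline std::size_t pickPivotIndex(std::size_t lo, std::size_t hi,
                                  std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(lo, hi);  // inclusive
    return dist(rng);
}

// usage inside a quicksort step, before partitioning [lo, hi]:
//   std::swap(a[lo], a[pickPivotIndex(lo, hi, rng)]);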

For a guaranteed O(N log N) worst case, it would actually be easy to modify the algorithm to fall back to heapsort or similar after a certain tree depth.
www.solostuff.net
Advice is heavy. So don’t send it like a mountain.
