Parallel Sort

by Blog Admin

Everyone knows what sorting is, and many computer algorithms have emerged to support it. Some of the well-known ones are quick sort, heap sort, and merge sort.

All these algorithms are traditionally implemented as sequential sorts: a single thread performs the complete sorting operation. That approach works well when the requirement is to handle small to medium amounts of data, but for a huge dataset (in the billions of entries) sequential sorting does not scale well. This is where we need 'parallel sort.'

Java 8 Support

From Java 8 onwards, the standard library supports parallel sorting, with the help of the Streams API and the Arrays class.

Arrays and Lists are the popular data structures that benefit from parallel sorting, and they support it directly. Prior to Java 8, however, parallelism was not directly supported.

Streams and Parallelism

Java provides parallelism support in the Stream API. For example, java.util.List has a stream() method, and you can turn the resulting stream into a parallel one by calling Stream.parallel():

List<String> list = new ArrayList<>();
list.add("a");
…
Stream<String> parallelStream = list.stream().parallel();


The above code prepares a parallel stream for further processing. Arrays can be used in a similar way: we can create parallel streams from arrays as well.
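As a minimal sketch (variable names are my own), here is how both a List and an array can produce a parallel stream:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

public class ParallelStreamDemo {
    public static void main(String[] args) {
        // Parallel stream from a List
        List<String> list = new ArrayList<>(Arrays.asList("b", "a", "c"));
        Stream<String> fromList = list.stream().parallel();
        System.out.println(fromList.isParallel());

        // Parallel stream from an array
        Integer[] numbers = {3, 1, 2};
        Stream<Integer> fromArray = Arrays.stream(numbers).parallel();
        System.out.println(fromArray.isParallel());
    }
}
```

Both calls print `true`, confirming that the streams will execute their pipeline in parallel.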

Arrays and Parallel Sort

From Java 8 onwards, there are a few ways to sort an array in a concurrent manner. They are listed below:

  1. Arrays.parallelSort(array)
  2. Arrays.stream(array).parallel().sorted()
  3. Arrays.stream(array).parallel().sorted(Comparator.comparing(…))

Similarly, a List can be sorted this way via list.parallelStream().sorted().
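The approaches above can be sketched as follows (a minimal illustration with my own variable names):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ParallelSortWays {
    public static void main(String[] args) {
        // 1. Arrays.parallelSort – sorts the array in place, ascending
        int[] a = {5, 1, 4};
        Arrays.parallelSort(a);
        System.out.println(Arrays.toString(a));

        // 2. Parallel stream + sorted() – natural (ascending) order
        Integer[] b = {5, 1, 4};
        List<Integer> asc = Arrays.stream(b).parallel()
                .sorted()
                .collect(Collectors.toList());
        System.out.println(asc);

        // 3. Parallel stream + sorted(Comparator) – custom order (reverse here)
        List<Integer> desc = Arrays.stream(b).parallel()
                .sorted(Comparator.reverseOrder())
                .collect(Collectors.toList());
        System.out.println(desc);
    }
}
```

The first two print `[1, 4, 5]`; the comparator variant prints `[5, 4, 1]`.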


However, there are some differences between these methods. Arrays.parallelSort() and sorted() without a comparator can only sort in ascending (natural) order, but sorted() with a comparator, for example Comparator.reverseOrder(), can be used to sort an array in reverse.

How Does Parallel Sort Work?

The parallelSort() method uses the fork/join approach, in which the array is divided into smaller units until each unit is easily manageable, and each unit is then sorted individually. The sorted units are then merged back together, and this entire operation happens in parallel. The approach uses multithreading, which makes it more efficient and fast.
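To make the idea concrete, here is a rough sketch of a fork/join merge sort using RecursiveAction. This illustrates the divide, sort, and merge pattern described above; it is not the actual JDK implementation of parallelSort(), and the threshold and names are my own:

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Illustrative fork/join sort: split until small, sort pieces, merge.
public class ForkJoinSortSketch extends RecursiveAction {
    private static final int THRESHOLD = 8; // below this, sort sequentially
    private final int[] data;
    private final int lo, hi; // half-open range [lo, hi)

    ForkJoinSortSketch(int[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected void compute() {
        if (hi - lo <= THRESHOLD) {
            Arrays.sort(data, lo, hi); // small unit: sort it directly
            return;
        }
        int mid = (lo + hi) >>> 1;
        // Fork both halves, wait for them, then merge the results
        invokeAll(new ForkJoinSortSketch(data, lo, mid),
                  new ForkJoinSortSketch(data, mid, hi));
        merge(mid);
    }

    private void merge(int mid) {
        int[] left = Arrays.copyOfRange(data, lo, mid);
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            data[k++] = (left[i] <= data[j]) ? left[i++] : data[j++];
        }
        while (i < left.length) data[k++] = left[i++];
    }

    public static void main(String[] args) {
        int[] a = new int[100];
        for (int i = 0; i < a.length; i++) a[i] = a.length - i; // reverse order
        ForkJoinPool.commonPool().invoke(new ForkJoinSortSketch(a, 0, a.length));
        System.out.println(a[0] + " " + a[99]); // smallest and largest element
    }
}
```

Running this prints `1 100`, confirming the reversed array was sorted ascending.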

Parallelism Value

By default, when parallel sorting is performed, the ForkJoinPool creates a default thread pool. The size of this pool is determined by the number of CPUs available on the current machine. In most circumstances this value is sufficient, but we can fine-tune it based on the needs of the application. In that case, it is possible to control the size of the thread pool by setting the "java.util.concurrent.ForkJoinPool.common.parallelism" property with the System.setProperty() method.

System.setProperty("java.util.concurrent.ForkJoinPool.common.parallelism", "10");

The above code sets the parallelism property to 10, and the ForkJoinPool framework will create ten worker threads. You will see this behavior later in the article.
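A small sketch of this in context. Note that the property only takes effect if it is set before the common pool is first used (for example, before any parallelSort() or parallel stream call):

```java
import java.util.concurrent.ForkJoinPool;

public class ParallelismDemo {
    public static void main(String[] args) {
        // Must run before the common pool is first touched,
        // otherwise the default (CPU-based) value is kept.
        System.setProperty(
                "java.util.concurrent.ForkJoinPool.common.parallelism", "10");

        // The common pool now reports the configured parallelism
        System.out.println(ForkJoinPool.commonPool().getParallelism());
    }
}
```

This prints `10`, matching the configured property.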

Benchmark for the Parallel Sort

Let’s develop a small piece of code that will demonstrate the performance of parallel sort. 

In this code, we will populate a list with 10 million entries and will sort it without using parallelism. Next, we will sort it using parallelism and compare the results. 

First, sort the list sequentially (without using parallelism).

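The sequential version might look roughly like this (a sketch with my own names and random data, since the original code is only shown as a screenshot; timings will differ on your machine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class SequentialSortBenchmark {
    public static void main(String[] args) {
        // Populate a list with 10 million random integers
        Random random = new Random(42);
        List<Integer> list = new ArrayList<>(10_000_000);
        for (int i = 0; i < 10_000_000; i++) {
            list.add(random.nextInt());
        }

        long start = System.currentTimeMillis();
        List<Integer> sorted = list.stream()   // sequential stream
                .sorted()
                .collect(Collectors.toList());
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Sequential sort took " + elapsed + " ms");
    }
}
```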

It takes 14389 milliseconds to perform the operation.

Next, sort the list using a parallel stream.

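The parallel variant differs only in using parallelStream() (again a sketch with my own names; timings will vary):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

public class ParallelSortBenchmark {
    public static void main(String[] args) {
        // Same 10 million random integers as before
        Random random = new Random(42);
        List<Integer> list = new ArrayList<>(10_000_000);
        for (int i = 0; i < 10_000_000; i++) {
            list.add(random.nextInt());
        }

        long start = System.currentTimeMillis();
        List<Integer> sorted = list.parallelStream()  // parallel stream
                .sorted()
                .collect(Collectors.toList());
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("Parallel sort took " + elapsed + " ms");
    }
}
```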

The time taken for this operation is 3906 milliseconds.

That is roughly a quarter of the time required for sequential sorting.

It takes less time because parallel sorting employs multiple threads for the sorting operation. Remember, in the case of the sequential sort, only ONE thread is used for the entire operation. This is the main reason the parallel sort comes out ahead of the sequential sort.

Note: The time will vary on different systems.

What Happens Inside the JVM?

Now, we are going to perform a thread dump analysis to understand what happens inside the JVM. This process will give you a complete picture of the threads that are used to perform both sequential and parallel operations. In the case of parallel sort, more threads will be used.

We will analyze the thread dumps for the sequential and parallel sorts to find out this difference.

Thread Dump Analysis in Sequential Sorting

Fig: fastThread tool highlighting 32 RUNNABLE threads

Above is the thread dump analysis report captured from the sequential sorting program. In the report, you can see that there are 32 threads in a RUNNABLE state. In the case of sequential sort, Java is NOT using the ForkJoinPool API. Please take a look at the Thread Group section as well.

Access the full report.

Thread Dumps in Parallel Sorting

In the case of parallelism, Java uses the ForkJoinPool API, which splits tasks into smaller chunks through an operation called "forking" and then joins the results. This brings a significant performance boost to complex operations. For parallel sorting, the ForkJoinPool API internally maintains a thread pool that does the work.

But we can control this thread creation process by setting the system property "java.util.concurrent.ForkJoinPool.common.parallelism," which was mentioned in the previous section of the article. Based on the value of this property, the ForkJoinPool API will create that many worker threads. In the code above, the value is 10, which means the API will create a pool of ten worker threads.

Fig: fastThread tool highlighting 42 RUNNABLE threads

Fig: fastThread tool highlighting newly created ‘ForkJoinPool.commonPool’

The Thread Group section of the report reveals the worker threads generated by the ForkJoinPool API: there are ten of them in a RUNNABLE state. You can also see that in the case of parallel sorting, 42 threads are used, as opposed to 32 for sequential sorting.

A few words about the ForkJoinPool API: it is the framework Java uses by default to perform parallel operations. The fork/join framework itself was introduced in Java 7, and the common pool used here was added in Java 8. You can see from the Thread Group image that it generates around ten threads for this operation.

A complete overview of the report.


Parallelism makes the code more scalable and efficient. In the parallel sort example above, there was no need to develop a complex algorithm, and it still achieved the same result much faster. This removes the overhead of writing complicated parallel logic by hand.
