TOPIC: Parallel Processing in Python
Parallelization in programming means running multiple tasks at the same time. Modern computers
have multiple CPU cores, but a traditional sequential program uses only one of them at a time and
leaves the rest idle. Through parallelization, we can put that underutilized hardware to work.
Parallelization in Python can be achieved in two ways: through multiple threads or through multiple
processes.
• Process: A running instance of a program. When you run a program, the operating system
creates a process for it.
• Thread: A single flow of execution inside a process. By default, a process has just one
thread, but it can create more.
• Multiprocessing: Running multiple processes on your CPU at the same time.
• Multithreading: Running multiple threads of one process at the same time. Note that in
standard CPython builds the global interpreter lock (GIL) lets only one thread execute Python
code at a time, so threads mainly help with tasks that spend their time waiting on I/O.
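To make the difference between a process and a thread concrete, here is a minimal sketch. The helper names report and thread_pids are made up for this illustration. It shows that threads run inside the process that created them and therefore see the same process ID, while a new process gets its own ID from the operating system.

```python
import os
import threading
from multiprocessing import Process


def report(label):
    # Hypothetical helper: print which process and thread is running.
    name = threading.current_thread().name
    print(f"{label}: pid={os.getpid()}, thread={name}")


def thread_pids(count=2):
    """Run `count` threads and return the process ID each one observed."""
    pids = []
    threads = [
        threading.Thread(target=lambda: pids.append(os.getpid()))
        for _ in range(count)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every thread to finish
    return pids


if __name__ == "__main__":
    report("main")
    # Threads live inside the current process, so they all report its PID.
    print("PIDs seen by threads:", thread_pids())
    # A separate process is given its own PID.
    child = Process(target=report, args=("child process",))
    child.start()
    child.join()
```

When run as a script, the PIDs seen by the threads should match the PID of the main program, while the child process should report a different one.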
In practice, parallelization is less about running unrelated tasks side by side and more about
dividing one big task into several smaller tasks and running them in parallel to reduce the total
execution time. The speedup is not exponential: at best it is roughly proportional to the number of
CPU cores, and usually less because of the overhead of splitting the work and combining the results.
You can divide a single task either into multiple processes (multiprocessing) or into multiple
threads (multithreading).
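As an illustration of dividing one big task into smaller ones, the sketch below splits a sum of squares into chunks and hands each chunk to a worker process through multiprocessing.Pool. The task itself, the helper names split_range and sum_of_squares, and the choice of four chunks are assumptions made for this example only.

```python
from multiprocessing import Pool


def split_range(start, stop, parts):
    """Split the range [start, stop) into `parts` contiguous (lo, hi) pairs."""
    step, extra = divmod(stop - start, parts)
    bounds = []
    lo = start
    for i in range(parts):
        # The first `extra` chunks get one additional element each.
        hi = lo + step + (1 if i < extra else 0)
        bounds.append((lo, hi))
        lo = hi
    return bounds


def sum_of_squares(bounds):
    """The small task: sum n*n for n in [lo, hi)."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


if __name__ == "__main__":
    chunks = split_range(0, 1_000_000, 4)
    # Each chunk is processed by one of four worker processes.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(sum_of_squares, chunks)
    # Combine the partial results into the final answer.
    print(sum(partial_sums))
```

The combined result is the same as computing the whole sum in one process. Only the way the work is distributed changes.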
Example:
Now, let me show you an example of how parallelism is achieved in Python. To achieve
process-based parallelism (multiprocessing), we can use a Python module called multiprocessing.
To achieve thread-based parallelism (multithreading), we can use a module called threading.
Process-based parallelism:
Here is an example of Python code that creates multiple processes to run a task.
Code:
Figure 1: process-based parallelism
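The listing for Figure 1 is not available in this text. The following is a minimal sketch consistent with the description of the figure, not the original code. The input values 10 and 20, the loop body, and the printed format are assumptions. Only the function name numbers, the process names p1 and p2, and the use of Process with a target come from the description.

```python
from multiprocessing import Process


def numbers(number):
    # Loop from 1 to 5 and add the iteration number to the input argument.
    # Printing each sum is an assumption about the original listing.
    for i in range(1, 6):
        print(number + i)


if __name__ == "__main__":
    # Create two processes that run the same function with different inputs.
    # The inputs 10 and 20 are illustrative values.
    p1 = Process(target=numbers, args=(10,))
    p2 = Process(target=numbers, args=(20,))

    # Start both processes so that they run in parallel.
    p1.start()
    p2.start()

    # Wait until both processes have finished.
    p1.join()
    p2.join()
```

Because the two processes run independently, the order in which their lines appear in the output can vary between runs.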
The code in Figure 1 uses the multiprocessing module. This Python module allows you to create
multiple processes in your code and hence achieve parallelism.
As you can see in the figure, we first import the Process class from the multiprocessing module.
Then we create a function. The function is the task that we want to execute in parallel for different
inputs. The function itself is pretty basic: it takes a number, runs a loop from 1 to 5, and adds the
iteration number to the input argument.
In the main block, we use the Process class to create a process. The work we want this process to
do is the numbers function we created, which takes a number as input and adds the numbers from 1 to
5 to it. We pass that function to the target parameter of the Process class. The second argument
(args in the multiprocessing API, given as a tuple) holds the arguments that we want to pass to our
numbers function. The Process class then creates a process that runs the numbers function with the
given input value.
We then create another process (p2) by following the same steps, but this time we pass a different
argument to our numbers function. Process p2 does the same job as process p1, but its input and
output are different from those of p1. Then we start the two processes, which now run in parallel.
Finally, we call join() on each process. This does not combine their results: it makes the main
program wait until both processes have finished. The output looks as if it had been generated by a
single process, but it was actually generated by two different processes running in parallel. Note
that you can create as many processes as you want, but for CPU-bound work, creating more processes
than your CPU has cores brings little or no extra speedup.
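One way to respect that limit is to cap the number of processes at the number of cores that Python reports. The sketch below is an illustration only: worker_count is a hypothetical helper, while os.cpu_count() is a standard library call that may return None when the count cannot be determined.

```python
import os


def worker_count(requested):
    """Return how many processes to start: at least 1, at most the core count."""
    cores = os.cpu_count() or 1  # fall back to 1 if the count is unknown
    return max(1, min(requested, cores))


if __name__ == "__main__":
    print("cores reported:", os.cpu_count())
    print("processes to start for 64 requested:", worker_count(64))
```

The capped value could then be used to decide how many Process objects to create for a CPU-bound task.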