In this story, I will summarize three different ways to achieve asynchronous programming in Python.
There are usually two types of problems where asynchronous programming can have a big impact; these are usually called CPU-intensive and I/O-intensive.

CPU-intensive: tasks that require a lot of CPU cycles, such as sorting, searching, graph traversal, and matrix multiplication.

I/O-intensive: tasks that require a lot of I/O, such as reading/writing files and making HTTP requests.
We will use these three libraries and look at the differences between them:
threading
multiprocessing
asyncio
Because I/O is asynchronous in Node.js, we don't need to worry about it there, but things are completely different in Python. We will use requests as our test tool, since making HTTP requests is I/O-intensive.
Let's start running some code. First, let's try the synchronous version:
```python
import requests
import time

def download_site(url, session):
    with session.get(url) as response:
        print("Got content from website: {}".format(url))

def download_all_sites(sites):
    with requests.Session() as session:
        for url in sites:
            download_site(url, session)

if __name__ == "__main__":
    sites = ["https://stackoverflow.com", "https://github.com"] * 10
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print("Download time: {}".format(duration))
```
In this sample, we download each site 10 times, and it took 8.953143119812012 seconds.
Let's try using multiprocessing to improve this program.
```python
import requests
import multiprocessing
import time

session = None

def set_global_session():
    global session
    if not session:
        session = requests.Session()

def download_site(url):
    with session.get(url) as response:
        print("Got content from website: {}".format(url))

def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)

if __name__ == "__main__":
    sites = ["https://stackoverflow.com", "https://github.com"] * 10
    start_time = time.time()
    download_all_sites(sites)
    duration = time.time() - start_time
    print("Download time: {}".format(duration))
```
This version took around 2.18524312973022461 seconds.
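The threading version was listed among the three libraries above, so here is a minimal sketch of it using `concurrent.futures.ThreadPoolExecutor`. To keep the sketch runnable without network access, the request is simulated with `time.sleep`; for real downloads you would replace it with `session.get(url)` using a per-thread `requests.Session` (a `requests.Session` is not guaranteed to be thread-safe, so each thread should get its own):

```python
import concurrent.futures
import time

def download_site(url):
    # Simulate an I/O-bound request; swap in session.get(url) for real use.
    time.sleep(0.2)
    return url

def download_all_sites(sites):
    # Five worker threads overlap the waiting time of the simulated requests,
    # so 20 tasks finish in roughly 4 batches instead of 20 sequential waits.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        return list(executor.map(download_site, sites))

if __name__ == "__main__":
    sites = ["https://stackoverflow.com", "https://github.com"] * 10
    start_time = time.time()
    download_all_sites(sites)
    print("Download time: {}".format(time.time() - start_time))
```

Threads share one process, so the GIL is not a problem here: while one thread waits on I/O, the others can run.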
So for CPU-intensive tasks, we should use multiprocessing, because it runs on multiple CPU cores and can reduce computation time.
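To see the CPU-intensive case concretely, here is a minimal sketch; the workload (a sum of squares) and the numbers are illustrative, not from the original benchmark:

```python
import multiprocessing
import time

def cpu_task(n):
    # A CPU-bound computation: sum of squares below n.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    numbers = [1_000_000] * 8
    start_time = time.time()
    # Pool() defaults to one worker process per CPU core, so the eight
    # computations run in parallel instead of one after another.
    with multiprocessing.Pool() as pool:
        results = pool.map(cpu_task, numbers)
    print("Pool time: {}".format(time.time() - start_time))
```

Unlike threads, each worker process has its own interpreter and its own GIL, which is why this helps for pure computation.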
For I/O-intensive tasks, we can choose threading or asyncio. Either will help your tasks run with much higher performance.
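For completeness, here is a minimal asyncio sketch of the same download pattern. Since requests is synchronous, real async HTTP would typically use a third-party library such as aiohttp (`async with session.get(url) as response:`); to keep this sketch self-contained and runnable without network access, the request is simulated with `asyncio.sleep`:

```python
import asyncio
import time

async def download_site(url):
    # Simulate a non-blocking request; with aiohttp you would await
    # an HTTP call here instead of sleeping.
    await asyncio.sleep(0.2)
    return url

async def download_all_sites(sites):
    # gather schedules all coroutines concurrently on a single event loop,
    # so 20 simulated 0.2s requests complete in roughly 0.2s total.
    return await asyncio.gather(*(download_site(url) for url in sites))

if __name__ == "__main__":
    sites = ["https://stackoverflow.com", "https://github.com"] * 10
    start_time = time.time()
    asyncio.run(download_all_sites(sites))
    print("Download time: {}".format(time.time() - start_time))
```

asyncio uses a single thread and a single process; concurrency comes from cooperatively yielding at each `await`, which makes it the cheapest of the three options when the work is purely I/O.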
There are many other approaches to running Python tasks asynchronously. My suggestion is Celery (https://docs.celeryproject.org/), a superb tool that can run a great many tasks asynchronously.