Everyone complains that Python is slow. What bothered us was not Python's execution time but its startup time. One of our command-line utilities, written in Python, was very slow to start.
$ time elf --help
...
real    0m5.450s
user    0m4.095s
sys     0m0.632s
It took more than 5 seconds just to start the application. We could not live with that, so we had to do something about it.
After some investigation, we found that importing a few heavy modules was taking a very long time.
$ time python -c 'import pandas'
real    0m1.008s
user    0m0.960s
sys     0m0.142s
$ time python -c 'import google.cloud.bigquery'
real    0m2.083s
user    0m1.839s
sys     0m0.278s
While some parts of the application do use these modules, it doesn't make sense to load them on startup and pay the price whether or not we use them.
Identifying the Heavy Modules
Python 3.7 introduced a beautiful feature, the -X importtime option, to profile import times.
$ python -X importtime program.py
import time: self [us] | cumulative | imported package
...
import time:      1601 |     100679 | numpy.core
import time:      3817 |     219544 | numpy
...
import time:       310 |     369807 | pandas.core.groupby
import time:       401 |     442950 | pandas.core.api
import time:      4122 |    1090869 | pandas
It prints the time taken to import every module used by the program. The first column is the time spent in the module itself. The second column is the cumulative time taken to import that module and all the modules it imports.
Please note that the times are reported in microseconds.
With this, we can easily filter out the modules that take too long to import.
$ python -X importtime -m elf.main 2>&1 | awk -F '|' '$2 > 200000'
import time:      4324 |     205209 | numpy
...
import time:      1568 |     921741 | pandas
...
import time:      5249 |    1023674 | google.cloud.storage
...
import time:      1912 |     127621 | google.cloud.bigquery
...
import time:      1202 |    1161596 | statsmodels.tsa
import time:      2658 |     353493 | scipy.optimize
import time:      1204 |     253559 | scipy.stats
import time:      2381 |    1731564 | statsmodels.tsa.holtwinters
The command above keeps only the lines whose second column (cumulative time) exceeds 200000 microseconds (0.2 seconds). The 2>&1 redirect is needed because the import timings are written to standard error.
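The same filtering can be done in Python when awk is not at hand. The following is only a sketch: the function name slow_imports and its threshold parameter are made up for illustration, and it assumes the three-column line format shown in the output above.

```python
def slow_imports(lines, threshold_us=200_000):
    """Return (cumulative_us, module) pairs above the threshold, slowest first."""
    prefix = "import time:"
    results = []
    for line in lines:
        if not line.startswith(prefix):
            continue  # not an importtime line
        fields = line[len(prefix):].split("|")
        if len(fields) != 3:
            continue
        cumulative = fields[1].strip()
        if not cumulative.isdigit():
            continue  # skips the "self [us] | cumulative | ..." header row
        if int(cumulative) > threshold_us:
            results.append((int(cumulative), fields[2].strip()))
    return sorted(results, reverse=True)


# Sample lines taken from the numpy/pandas output shown earlier.
sample = [
    "import time: self [us] | cumulative | imported package",
    "import time:      1601 |     100679 | numpy.core",
    "import time:      3817 |     219544 | numpy",
    "import time:      4122 |    1090869 | pandas",
]
for cumulative_us, module in slow_imports(sample):
    print(f"{cumulative_us / 1e6:.2f}s  {module}")
```

To run it on a real program, redirect standard error to a file first and pass the file's lines to the function.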
Lazily Loading the Modules
All problems in computer science can be solved by another level of indirection.
– David Wheeler, as quoted by Butler Lampson
Now that we had identified the modules causing the trouble, we needed a way to load them on demand instead of on startup.
So we wrote a small utility to load a module lazily.
The lazy_import function returns a proxy object that loads the module only on first use and delegates all attribute access to the loaded module. However, there is a small price to pay for that indirection: every attribute access goes through the proxy.
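A minimal sketch of what such a lazy_import utility might look like is given below. The LazyModule class and its internals are assumptions made for illustration and may differ from the utility we actually used; the standard-library colorsys module stands in for a heavy dependency.

```python
import importlib


class LazyModule:
    """Proxy that imports the real module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None  # filled in on first use

    def _load(self):
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return self._module

    def __getattr__(self, attr):
        # __getattr__ runs only when normal lookup fails, so it sees the
        # module's attributes but not _name/_module once they are set.
        if attr in ("_name", "_module"):
            raise AttributeError(attr)  # guards against recursion before __init__
        return getattr(self._load(), attr)


def lazy_import(name):
    """Return a proxy for the named module without importing it yet."""
    return LazyModule(name)


colorsys = lazy_import("colorsys")   # nothing is imported at this point
print(colorsys.rgb_to_hsv(1, 0, 0))  # first attribute access triggers the import
```

The price of the indirection is one extra Python-level call for each attribute that is looked up through the proxy.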
Calling lazy_import all around the code is not pleasant. So we created a new helper module in our application that lazily imports all the heavy modules.
And all we had to do in the rest of the application was to change imports slightly.
Handling Rarely Used Modules
Some of the heavy modules were used only in a function or two. We decided to import them inside the functions where they are used instead of adding them to the helper module.
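A function-level import might look like the sketch below. The function summarize is hypothetical, and the standard-library statistics module stands in for a heavy dependency such as scipy.stats.

```python
def summarize(values):
    """Return the mean of the values, importing the dependency on demand."""
    # The import runs the first time the function is called. Later calls find
    # the module already cached in sys.modules, so the repeated cost is small.
    import statistics  # stand-in for a heavy module such as scipy.stats

    return statistics.mean(values)


print(summarize([1, 2, 3]))
```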
Avoiding Initialization on Startup
While lazy importing solved part of the problem, it didn't solve the issue altogether. Some parts of the code initialized global variables at import time. One such example is initializing the database on startup. These initializations add considerable overhead to the startup time, and it is possible to avoid that by moving the initialization into a function that is called only when the value is needed.
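One possible way to defer such initialization is to wrap it in a cached function, as in the sketch below. The name get_db is hypothetical, and an in-memory sqlite3 connection stands in for whatever database the application really talks to.

```python
import functools
import sqlite3


# Before: a module-level global, created as soon as the module is imported.
# db = sqlite3.connect(":memory:")


@functools.lru_cache(maxsize=None)
def get_db():
    """Open the connection on the first call; later calls return the same object."""
    return sqlite3.connect(":memory:")


def count_rows():
    # Callers ask for the connection only when they need it.
    return get_db().execute("SELECT 1").fetchone()[0]
```

Commands such as --help never call get_db, so they no longer pay for the connection.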
The Final Result
With all these improvements, the startup time came down to about one second.
$ time elf --help
...
real    0m1.102s
user    0m0.973s
sys     0m0.116s
It turned out that about half of that time was overhead from launching the script; running the module directly took even less time.
$ time python -m elf.main
...
real    0m0.646s
user    0m0.580s
sys     0m0.052s
At this point we had reached the point of diminishing returns, so we decided to stop here and leave that last half a second for another time.