Everyone complains that Python is slow. What bothered us was not Python's execution time, but its startup time. One of our command-line utilities written in Python was very slow to start.

$ time elf --help
real  0m5.450s
user  0m4.095s
sys 0m0.632s

It took 5 seconds just to start the application. That was not something we could live with, and we had to do something to solve it.

After some investigation, we found that importing some heavy modules was taking a very long time.

$ time python -c 'import pandas'

real  0m1.008s
user  0m0.960s
sys 0m0.142s
$ time python -c 'import google.cloud.bigquery'

real  0m2.083s
user  0m1.839s
sys 0m0.278s

While some parts of the application use these modules, it doesn’t make sense to load them on startup and pay the price whether or not we use them.

Identifying the Heavy Modules

Python 3.7 introduced a beautiful feature, -X importtime, to profile import times.

$ python -X importtime program.py
import time: self [us] | cumulative | imported package
import time:      1601 |     100679 |     numpy.core
import time:      3817 |     219544 |   numpy
import time:       310 |     369807 |     pandas.core.groupby
import time:       401 |     442950 |   pandas.core.api
import time:      4122 |    1090869 | pandas

It prints the time taken to import every module used by the program. The second column is the cumulative time taken to import that module and all the modules imported by that module.

Please note the time reported is in microseconds.

With this, we can easily filter out the ones that take too long to import.

$ python -X importtime -m elf.main 2>&1 | awk -F '|' '$2 > 200000'
import time:      4324 |     205209 |       numpy
import time:      1568 |     921741 |     pandas
import time:      5249 |    1023674 |     google.cloud.storage
import time:      1912 |     127621 |     google.cloud.bigquery
import time:      1202 |    1161596 |   statsmodels.tsa
import time:      2658 |     353493 |   scipy.optimize
import time:      1204 |     253559 |   scipy.stats
import time:      2381 |    1731564 | statsmodels.tsa.holtwinters

The above command filters the output to show only the entries whose second column (cumulative time) is more than 200000 microseconds (0.2 seconds).

Lazily Loading the Modules

All problems in computer science can be solved by another level of indirection.

– Butler Lampson

Now that we’ve identified the modules that are causing the trouble, we need to figure out a way to load them on demand instead of loading them on startup.

So, we write a small utility to load a module lazily.
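A minimal sketch of such a utility (the implementation details here are illustrative, not the application's exact code):

```python
import importlib


class _LazyModule:
    """Proxy that defers importing a module until the first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Called only for attributes not found on the proxy itself,
        # i.e. anything other than _name and _module.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


def lazy_import(name):
    """Return a module proxy that imports `name` on first use."""
    return _LazyModule(name)
```

With this, `pandas = lazy_import("pandas")` behaves like `import pandas`, except that the real import happens only on the first attribute access.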

The lazy_import function returns a proxy object that loads the module only on first use and delegates all attribute access to the loaded module. However, there is a small price to pay for that indirection.

But using lazy_import all over the code is not pleasant. So we created a new helper module in our application to lazily import all the heavy modules.

And all we had to do in the rest of the application was to change imports slightly.

Handling Rarely Used Modules

Some of the heavy modules were used in only a function or two. We decided to import them inside the functions where they are used, instead of adding them to heavy_module.py.
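The pattern looks like this (with a stdlib module standing in for a heavy one; the function is illustrative):

```python
def parse_payload(text):
    # The import runs only the first time this function is called,
    # not at application startup.
    import json
    return json.loads(text)
```

Python caches modules in sys.modules, so repeated calls pay only a cheap dictionary lookup, not a fresh import.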

Avoiding Initialization on Startup

While lazy importing solved part of the problem, it didn’t solve the issue altogether. There were parts of the code that were initializing global variables on startup. One such example is initializing the database connection on startup. These add considerable overhead to the startup time, and it is possible to avoid it by moving the initialization into a function.
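One way to sketch that change (sqlite3 is used here as a stand-in for the real database setup):

```python
import sqlite3

_db = None


def get_db():
    """Create the database connection on first use instead of at import time."""
    global _db
    if _db is None:
        _db = sqlite3.connect(":memory:")  # stand-in for the real connection
    return _db
```

Callers use get_db() instead of a module-level db variable, so code paths that never touch the database never pay for the connection.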

The Final Result

With all these improvements, the startup time came down to almost a second.

$ time elf --help
real  0m1.102s
user  0m0.973s
sys 0m0.116s

It turned out that about half of that time was the overhead of launching the script; invoking the module directly takes even less time.

$ time python -m elf.main
real  0m0.646s
user  0m0.580s
sys 0m0.052s

At this point, we had reached diminishing returns, so we decided to stop here and leave that half a second for some other time.

We’re hiring! We are keen to work with enthusiastic engineers who are passionate about product development, software engineering and machine learning. Please visit our careers page for more details.