1. A data science team is trying to speed up their Python script by using the `threading` module to process multiple large data files at once on a multi-core machine. They are surprised to observe no significant performance improvement. Please explain the following:
- What is the Global Interpreter Lock (GIL)? Describe its purpose and how it functions in CPython.
- Why does the GIL prevent true parallelism when using the `threading` module for CPU-bound tasks (like processing large data files) on multi-core machines, leading to the team's observed lack of performance improvement?
- Suggest an alternative solution that would allow for true parallelism in this scenario. Specifically mention the `multiprocessing` module.
- Explain how the suggested alternative (`multiprocessing`) overcomes the limitations of the GIL to achieve actual speedup on multi-core systems for CPU-bound tasks. Compare and contrast its execution model with `threading`.
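
To experiment with the scenario before answering, the following minimal sketch may help. It is only an illustration, not the team's actual script: `cpu_bound_work` is a hypothetical stand-in for processing one data file, and the workload sizes are arbitrary assumptions. It runs the same pure-Python computation once with `threading` and once with `multiprocessing` so the two timings can be compared on a standard (GIL-enabled) CPython build.

```python
import multiprocessing
import threading
import time


def cpu_bound_work(n):
    """Hypothetical stand-in for processing one file: a pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(workloads):
    """Run one thread per workload; all threads share a single interpreter."""
    results = [None] * len(workloads)

    def worker(index, n):
        results[index] = cpu_bound_work(n)

    threads = [
        threading.Thread(target=worker, args=(i, n))
        for i, n in enumerate(workloads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def run_processes(workloads):
    """Run one worker process per workload; each process has its own interpreter."""
    with multiprocessing.Pool(processes=len(workloads)) as pool:
        return pool.map(cpu_bound_work, workloads)


def timed(fn, workloads):
    """Return (results, elapsed seconds) for fn(workloads)."""
    start = time.perf_counter()
    results = fn(workloads)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    # Arbitrary sizes, chosen only so the run finishes quickly.
    workloads = [2_000_000] * 4
    _, thread_time = timed(run_threads, workloads)
    _, process_time = timed(run_processes, workloads)
    print(f"threading:       {thread_time:.2f}s")
    print(f"multiprocessing: {process_time:.2f}s")
```

The `if __name__ == "__main__":` guard matters for the `multiprocessing` part: on platforms that start workers by re-importing the main module, unguarded pool creation would run again in every child. Actual timings depend on the machine and Python build, so none are shown here.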