Python multiprocessing WHY and HOW

Introduction:

I am working on a Python script that migrates data from one database to another. In short, I need to select from one database and then insert into another.
In the first version I used multithreading, simply because I am more familiar with it than with multiprocessing. But after a few months, I found several problems with that approach.

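  • Because of the GIL, only one thread runs Python code at a time, so the script can only use one of the 24 CPUs.
  • Signals cannot be handled per thread. I want to use a simple timeout decorator that sets a signal.SIGALRM for a specified function. But with multithreading, a signal cannot be aimed at the thread running that function: Python only allows handlers to be set from the main thread and always runs them there.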
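
The timeout decorator mentioned in the last point might look roughly like the sketch below. It is not the code from the original script: the names timeout and slow_query are made up for illustration, it is written for Python 3 on Unix, and it only works when the decorated function is called from the main thread of a process.

```python
import functools
import signal
import time


def timeout(seconds):
    """Hypothetical decorator: raise TimeoutError if the call runs too long.

    Unix only, and only usable from the main thread of a process.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError('%s timed out after %ss'
                                   % (func.__name__, seconds))

            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.setitimer(signal.ITIMER_REAL, seconds)  # fractions allowed
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)    # cancel the timer
                signal.signal(signal.SIGALRM, previous)    # restore old handler
        return wrapper
    return decorator


@timeout(0.2)
def slow_query():
    time.sleep(5)  # stands in for a long-running database call
```

Calling slow_query() from a worker thread fails at signal.signal() with a ValueError, which is the limitation described above. In a multiprocessing design each worker process has its own main thread, so the same decorator can be applied inside every worker.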

So I started refactoring the script to use multiprocessing.

multiprocessing

multiprocessing is a package that supports spawning processes using an API similar to the threading module.

But it is not as elegant and sweet as that description suggests.

multiprocessing.pool

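from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    p = Pool(5)  # pool of 5 worker processes
    # Pass both arguments positionally: a positional argument
    # after func=... is a SyntaxError.
    print(p.map(f, [1, 2, 3]))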

It looks like a good solution, but in Python 2 we cannot use a bound method as the target function, because bound methods cannot be pickled there, and multiprocessing.Pool uses pickle to serialize objects and send them to the worker processes. pathos.multiprocessing is a good alternative: it uses dill instead of pickle, which gives better serialization.
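
If adding a third-party package is not an option, one common way around the pickling limit is to give the pool a plain module-level function and send the object along as data. The sketch below assumes the instance itself can be pickled; Migrator, transform and call_transform are hypothetical names, not part of the original script.

```python
from multiprocessing import Pool


class Migrator(object):
    """Hypothetical stand-in for an object whose method does the work."""

    def __init__(self, factor):
        self.factor = factor

    def transform(self, row):
        return row * self.factor


def call_transform(args):
    """Module-level helper: unpack (instance, row) and call the method."""
    migrator, row = args
    return migrator.transform(row)


if __name__ == '__main__':
    migrator = Migrator(10)
    pool = Pool(2)
    try:
        # Each task carries the instance as data, so the only callable
        # that has to be pickled is the plain function call_transform.
        print(pool.map(call_transform, [(migrator, r) for r in [1, 2, 3]]))
    finally:
        pool.close()
        pool.join()
```

The cost of this approach is that the instance is pickled again for every task, so it suits small objects better than ones holding large state or open database connections.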

shared memory

Threads share memory naturally. In a multiprocessing environment, shared objects have to go through one of the wrappers the package provides.

  • multiprocessing.Value and multiprocessing.Array are the simplest way to share objects between processes, but they can only hold ctypes types.
  • multiprocessing.Queue is very useful and has an API similar to Queue.Queue.
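
As a rough illustration of both wrappers, the sketch below has each worker add to a shared counter held in a Value and report its own result through a Queue. The function count_rows and the sample batches are invented for the example.

```python
from multiprocessing import Process, Queue, Value


def count_rows(counter, results, rows):
    """Worker: add len(rows) to the shared counter and report it on the queue."""
    with counter.get_lock():      # += on a shared Value is not atomic
        counter.value += len(rows)
    results.put(len(rows))


if __name__ == '__main__':
    counter = Value('i', 0)       # 'i' is the typecode for a signed C int
    results = Queue()
    batches = [[1, 2, 3], [4, 5], [6]]
    workers = [Process(target=count_rows, args=(counter, results, batch))
               for batch in batches]
    for worker in workers:
        worker.start()
    # Drain the queue before joining so no worker blocks on a full pipe.
    sizes = sorted(results.get() for _ in workers)
    for worker in workers:
        worker.join()
    print(counter.value, sizes)
```

The lock matters here: without get_lock(), two processes can read the same old value and one of the updates is lost.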

Python and GCC version

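I didn't know that even the GCC version could affect the behavior of my code. On my CentOS 5 machine, the same Python version built with different GCC versions behaves differently: the build made with GCC 3.4.6 fails to import JoinableQueue, while the build made with GCC 4.1.2 imports it without error.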

Python 2.7.2 (default, Jan 10 2012, 11:17:45)
[GCC 3.4.6 20060404 (Red Hat 3.4.6-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/queues.py", line 48, in <module>
    from multiprocessing.synchronize import Lock, BoundedSemaphore, Semaphore, Condition
  File "/home/oracle/dbapython/lib/python2.7/multiprocessing/synchronize.py", line 59, in <module>
    " function, see issue 3770.")
ImportError: This platform lacks a functioning sem_open implementation, therefore, the required synchronization primitives needed will not function, see issue 3770.


Python 2.7.2 (default, Oct 15 2013, 13:15:26)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from multiprocessing.queues import JoinableQueue
>>>
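
A script that depends on these primitives could check for them at startup instead of failing in the middle of a run. This is only a sketch: have_working_semaphores is a made-up helper name, and it relies on the behavior visible in the first traceback, where importing multiprocessing.synchronize raises ImportError on a build without a usable sem_open.

```python
def have_working_semaphores():
    """Return True if multiprocessing's locks and queues can be imported."""
    try:
        # Raises ImportError on builds without a functioning sem_open.
        import multiprocessing.synchronize  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == '__main__':
    if have_working_semaphores():
        print('multiprocessing synchronization primitives are available')
    else:
        print('this Python build lacks sem_open; '
              'multiprocessing queues and locks will not work')
```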
