Motor: Asynchronous Python driver for Tornado and MongoDB

简介:
使用pymongo驱动连接mongodb插入100W记录, 需要364秒. 性能较差,  原因是pymongo未使用异步接口.
motor驱动使用异步接口, 性能有所提升.
安装motor驱动, 可以使用pip快速安装, 解决依赖问题. pip是一个脚本, 可以自动从PyPI安装module. 如下
# cat /usr/local/bin/pip3.4
#!/usr/local/bin/python3.4

# -*- coding: utf-8 -*-
import re
import sys

from pip import main

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw|\.exe)?$', '', sys.argv[0])
    sys.exit(main())

安装motor
[root@localhost bin]# pip3.4 install motor

或
[root@localhost bin]# pip3.4 install motor --upgrade

motor的使用手册如下
修改测试脚本 : 
使用批量写入.
[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

print(time.time())

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))

    collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(0,100000)), callback=my_callback)
    tornado.ioloop.IOLoop.instance().start()

    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)

def test():
  t_names = dict()
  for i in range(0,8):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

报了一堆错误, 除了0号线程未报错, 其他线程都报类似以下错误 : 
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/threading.py", line 921, in _bootstrap_inner
    self.run()
  File "t.py", line 33, in run
    tornado.ioloop.IOLoop.instance().start()
  File "/usr/local/lib/python3.4/site-packages/tornado/ioloop.py", line 704, in start
    raise RuntimeError("IOLoop is already running")
RuntimeError: IOLoop is already running

ERROR:tornado.application:Uncaught exception, closing connection.
Traceback (most recent call last):
  File "/usr/local/lib/python3.4/site-packages/tornado/iostream.py", line 508, in wrapper
    return callback(*args)
  File "/usr/local/lib/python3.4/site-packages/tornado/stack_context.py", line 275, in null_wrapper
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.4/site-packages/motor/__init__.py", line 138, in callback
    child_gr.switch(result)
greenlet.error: cannot switch to a different thread

改为单线程执行, 一次批量1000条 : 
[root@localhost ~]# cat t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

print(time.time())

class n_t(threading.Thread):   #The timer class is derived from the class threading.Thread
  def __init__(self, num):
    threading.Thread.__init__(self)
    self.thread_num = num

  def run(self): #Overwrite run() method, put what you want the thread do here
    start_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(start_t))

    all = 1000000
    pice = 1000
    for i in range(1,int(all/pice)+1):
      if i == 1:
        s=0
        e=pice
      else:
        s=pice*(i-1)
        e=pice*i
      collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
      tornado.ioloop.IOLoop.instance().start()

    stop_t = time.time()
    print("TID:" + str(self.thread_num) + " " + str(stop_t))
    print(stop_t-start_t)
    tornado.ioloop.IOLoop.current().run_sync(do_count)

def test():
  t_names = dict()
  for i in range(0,1):
    t_names[i] = n_t(i) 
    t_names[i].start()
  return

if __name__ == '__main__':
  test()

插入耗时38秒 : 
[root@localhost ~]# python t.py
0 documents in collection
1423138288.7702804
TID:0 1423138288.7712061
TID:0 1423138326.7712593
38.00005316734314
1000000 documents in collection

以上相比PostgreSQL 使用pgbench 单步提交测试16秒还有差距. 

开16个一起测, 220秒插入1600W.
不删除collection: 注释
#db.drop_collection('test_collection')
mongo 127.0.0.1:5281/test_database
use test_database
db.test_collection.drop()

# vi t.py
import motor
import tornado
import time
import threading

c = motor.MotorClient('/tmp/mongodb-5281.sock', max_pool_size=1000)
db = c.test_database
#db.drop_collection('test_collection')
collection = db.test_collection

@motor.gen.coroutine
def do_count():
  n = yield collection.count()
  print(str(n) + ' documents in collection')

tornado.ioloop.IOLoop.current().run_sync(do_count)

def my_callback(result, error):
  tornado.ioloop.IOLoop.instance().stop()

start_t = time.time()
print(str(start_t))

all = 1000000
pice = 1000
for i in range(1,int(all/pice)+1):
  if i == 1:
    s=0
    e=pice
  else:
    s=pice*(i-1)
    e=pice*i
  collection.insert(({'id':i, 'username': 'digoal.zhou', 'age':32, 'email':'digoal@126.com', 'qq':'276732431'} for i in range(s,e)), callback=my_callback)
  tornado.ioloop.IOLoop.instance().start()

stop_t = time.time()
print(str(stop_t))
print(stop_t-start_t)
tornado.ioloop.IOLoop.current().run_sync(do_count)

# vi test.sh
for ((m=1;m<17;m++))
do
  python /root/t.py &
done


测试结果 : 
[root@localhost ~]# . ./test.sh
[root@localhost ~]# 0 documents in collection
1423138761.355559
TID:0 1423138761.3678153
0 documents in collection
0 documents in collection
1423138761.4109175
1423138761.4113982
TID:0 1423138761.4338183
TID:0 1423138761.4408538
1255 documents in collection
1127 documents in collection
1423138761.4774377
1423138761.4775264
2383 documents in collection
1423138761.4832406
TID:0 1423138761.483632
TID:0 1423138761.4897928
TID:0 1423138761.5128014
6506 documents in collection
1423138761.5683684
8037 documents in collection
1423138761.58344
TID:0 1423138761.58417
7550 documents in collection
1423138761.5864558
8932 documents in collection
1423138761.5868194
TID:0 1423138761.5982714
TID:0 1423138761.5984921
TID:0 1423138761.613552
11407 documents in collection
1423138761.620963
TID:0 1423138761.623182
15278 documents in collection
1423138761.6682127
TID:0 1423138761.6716254
17677 documents in collection
17677 documents in collection
17677 documents in collection
1423138761.713706
1423138761.7137444
17677 documents in collection
1423138761.7138374
1423138761.7139668
TID:0 1423138761.7142007
TID:0 1423138761.714226
TID:0 1423138761.714233
TID:0 1423138761.714348
TID:0 1423138980.550259
218.92707705497742
15900545 documents in collection
TID:0 1423138980.9974186
219.56360030174255
15932365 documents in collection
TID:0 1423138981.2562747
219.54192662239075
15951346 documents in collection
TID:0 1423138981.4110239
219.81253170967102
15961596 documents in collection
TID:0 1423138981.4463546
219.96272253990173
15964366 documents in collection
TID:0 1423138981.4915378
219.90736770629883
15967637 documents in collection
TID:0 1423138981.540123
220.17230772972107
15970868 documents in collection
TID:0 1423138981.6574204
220.05914902687073
15979686 documents in collection
TID:0 1423138981.681502
220.00987672805786
15981430 documents in collection
TID:0 1423138981.7540042
220.03977823257446
15985686 documents in collection
TID:0 1423138981.792348
220.3514940738678
15988377 documents in collection
TID:0 1423138981.8279915
220.315190076828
15991149 documents in collection
TID:0 1423138981.8720205
220.38222765922546
15994000 documents in collection
TID:0 1423138981.874455
220.26090288162231
15994000 documents in collection
TID:0 1423138981.9668252
220.25259232521057
15998000 documents in collection
TID:0 1423138982.0496628
220.33546209335327
16000000 documents in collection

[1]   Done                    /usr/local/bin/python3 /root/t.py
[2]   Done                    /usr/local/bin/python3 /root/t.py
[3]   Done                    /usr/local/bin/python3 /root/t.py
[4]   Done                    /usr/local/bin/python3 /root/t.py
[5]   Done                    /usr/local/bin/python3 /root/t.py
[6]   Done                    /usr/local/bin/python3 /root/t.py
[7]   Done                    /usr/local/bin/python3 /root/t.py
[8]   Done                    /usr/local/bin/python3 /root/t.py
[9]   Done                    /usr/local/bin/python3 /root/t.py
[10]   Done                    /usr/local/bin/python3 /root/t.py
[11]   Done                    /usr/local/bin/python3 /root/t.py
[12]   Done                    /usr/local/bin/python3 /root/t.py
[13]   Done                    /usr/local/bin/python3 /root/t.py
[14]   Done                    /usr/local/bin/python3 /root/t.py
[15]-  Done                    /usr/local/bin/python3 /root/t.py
[16]+  Done                    /usr/local/bin/python3 /root/t.py


将PostgreSQL改为批量提交测试结果42秒插入1600W .
postgres@localhost-> pgbench -M prepared -n -r -f ./test.sql -c 16 -j 4 -t 50000
transaction type: Custom query
scaling factor: 1
query mode: prepared
number of clients: 16
number of threads: 4
number of transactions per client: 50000
number of transactions actually processed: 800000/800000
tps = 18835.837413 (including connections establishing)
tps = 18840.926458 (excluding connections establishing)
statement latencies in milliseconds:
        0.842201        insert into tt values(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431'),(1, 'digoal.zhou',32,'digoal@126.com','276732431');
postgres@localhost-> psql
psql (9.3.5)
Type "help" for help.

postgres=# select count(*) from tt;
  count   
----------
 16000000
(1 row)
postgres=# select 16000000/20/18835.837413;
      ?column?       
---------------------
 42.4722289993786537
(1 row)



[参考]
目录
相关文章
|
NoSQL MongoDB Python
【Python】已完美解决(MongoDB安装报错)Service ‘MongoDB Server (MongoDB)’ (MongoDB) failed tostart
【Python】已完美解决(MongoDB安装报错)Service ‘MongoDB Server (MongoDB)’ (MongoDB) failed tostart
1016 1
|
缓存 自然语言处理 数据库
构建高效Python Web应用:异步编程与Tornado框架
【5月更文挑战第30天】在追求高性能Web应用开发的时代,异步编程已成为提升响应速度和处理并发请求的关键手段。本文将深入探讨Python世界中的异步编程技术,特别是Tornado框架如何利用非阻塞I/O和事件循环机制来优化Web服务的性能。我们将剖析Tornado的核心组件,并通过实例演示如何构建一个高效的Web服务。
|
存储 NoSQL MongoDB
MongoDB数据库转换为表格文件的Python实现
MongoDB数据库转换为表格文件的Python实现
410 0
|
API 开发者 Python
探索Python中的异步编程:Asyncio与Tornado的对决
在这个快节奏的世界里,Python开发者面临着一个挑战:如何让代码跑得更快?本文将带你走进Python异步编程的两大阵营——Asyncio和Tornado,探讨它们如何帮助我们提升性能,以及在实际应用中如何选择。我们将通过一场虚拟的“对决”,比较这两个框架的性能和易用性,让你在异步编程的战场上做出明智的选择。
|
NoSQL MongoDB 数据库
python3操作MongoDB的crud以及聚合案例,代码可直接运行(python经典编程案例)
这篇文章提供了使用Python操作MongoDB数据库进行CRUD(创建、读取、更新、删除)操作的详细代码示例,以及如何执行聚合查询的案例。
311 6
|
数据处理 开发者 Python
浅析Python中的异步编程:从asyncio到Tornado
Python的异步编程是提升应用性能的关键。本文从Python的异步编程概念入手,探讨了asyncio库的使用及其在实际开发中的应用,并分析了Tornado框架的异步模型,以及如何将异步思维运用于实际项目中。
|
NoSQL JavaScript Java
Java Python访问MongoDB
Java Python访问MongoDB
105 4
|
NoSQL 安全 MongoDB
用python安装mongodb
用python安装mongodb
143 0
|
API 数据库 开发者
Python连接Neo4j工具比较 Neo4j Driver、py2neo
Python连接Neo4j工具比较 Neo4j Driver、py2neo
523 0
|
NoSQL Shell MongoDB
【Python】已解决:(MongoDB安装报错)‘mongo’ 不是内部或外部命令,也不是可运行的程序
【Python】已解决:(MongoDB安装报错)‘mongo’ 不是内部或外部命令,也不是可运行的程序
1419 0