
Running parallel request sessions in Python

I am trying to open multiple web sessions and save the data to CSV. I wrote my code with a for loop and requests.get, but accessing 90 web locations takes a very long time. Can anyone tell me how to run the whole process in parallel over loc_var?

The code works fine; the only problem is that it runs through loc_var one entry at a time, which takes a long time.

I want to access all the loc_var URLs in the for loop in parallel and perform the CSV write for each.

Here is the code:

import pandas as pd
import numpy as np
import os
import requests
import datetime
import zipfile

t = datetime.date.today() - datetime.timedelta(2)
server = [("A", "web1", ":5000", "username=usr&password=p7Tdfr")]
# List of all web IPs
web_1 = ["Web1","Web2","Web3","Web4","Web5","Web6","Web7","Web8","Web9","Web10","Web11","Web12","Web13","Web14","Web15"]
# List of all locations
loc_var = ["post1","post2","post3","post4","post5","post6","post7","post8","post9","post10","post11","post12","post13","post14","post15","post16","post17","post18"]

for s, web, port, usr in server:
    login_url = 'http://' + web + port + '/api/v1/system/login/?' + usr
    print(login_url)
    s = requests.Session()  # note: this rebinds s, the server name unpacked above
    login_response = s.post(login_url)
    print("login response", login_response)
    # Start accessing the web for each entry in loc_var
    for mkt in loc_var:
        # Output is a CSV file
        com_actions_url = ('http://' + web + port + '/api/v1/3E+date(%5C%22' + str(t) +
                           '%5C%22)and+location+%3D%3D+%27' + mkt +
                           '%27%22&page_size=-1&format=%22csv%22')
        print("com_action_url", com_actions_url)
        r = s.get(com_actions_url)
        print("action", r)
        if r.ok:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                f.write(r.content)
        else:
            # If the location is not accessible, retry with the other servers in web_1
            # (warning: this loops forever if every server in web_1 fails)
            while not r.ok:
                for web_2 in web_1:
                    login_url = 'http://' + web_2 + port + '/api/v1/system/login/?' + usr
                    com_actions_url = ('http://' + web_2 + port + '/api/v1/3E+date(%5C%22' + str(t) +
                                       '%5C%22)and+location+%3D%3D+%27' + mkt +
                                       '%27%22&page_size=-1&format=%22csv%22')
                    login_response = s.post(login_url)
                    print("login response", login_response)
                    print("com_action_url", com_actions_url)
                    r = s.get(com_actions_url)
                    if r.ok:
                        with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                            f.write(r.content)
                        break

一码平川MACHEL 2019-02-28 14:10:40
2 Answers
  • Python multithreading does not achieve true parallelism (because of the GIL); it is only suited to I/O-bound workloads, not CPU-bound ones.

    I would suggest the third-party coroutine library gevent; a minimal sketch is shown below.
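
    (A minimal illustration of the gevent approach, not part of the original answer; the URLs below are placeholders:)

    from gevent import monkey
    monkey.patch_all()  # patch blocking stdlib I/O so greenlets yield cooperatively

    import gevent
    import requests

    def fetch(url):
        # after monkey-patching, this blocking call lets other greenlets run
        return requests.get(url).text

    urls = ['http://www.google.com', 'http://www.yahoo.com']
    jobs = [gevent.spawn(fetch, url) for url in urls]  # one greenlet per URL
    gevent.joinall(jobs)                               # wait for all greenlets to finish
    for job in jobs:
        print(job.value[0:100])                        # job.value holds fetch's return value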

    2019-11-18 18:07:10
  • There are several approaches you can take to make concurrent HTTP requests. The two I use most often are multiple threads via concurrent.futures.ThreadPoolExecutor, or sending the requests asynchronously with asyncio/aiohttp.

    To send the requests in parallel with a thread pool, you would first build the list of URLs you want to fetch (in your case, lists of login_urls and com_action_urls), and then request them all concurrently, like this:

    from concurrent.futures import ThreadPoolExecutor
    import requests

    def fetch(url):
        page = requests.get(url)
        return page.text

    pool = ThreadPoolExecutor(max_workers=5)

    # Create a list of urls
    urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.evergreen.edu']

    for page in pool.map(fetch, urls):
        # Do whatever you want with the results ...
        print(page[0:100])
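
    Applied to your code, a rough sketch of the same idea (assuming the web, port, t, and loc_var variables and the logged-in session s from your script; the web_1 fallback logic is omitted):

    from concurrent.futures import ThreadPoolExecutor
    import os
    import requests

    def fetch_and_save(mkt):
        # requests.Session is not guaranteed thread-safe, so each call
        # copies the login cookies into its own short-lived session
        local = requests.Session()
        local.cookies.update(s.cookies)
        url = ('http://' + web + port + '/api/v1/3E+date(%5C%22' + str(t) +
               '%5C%22)and+location+%3D%3D+%27' + mkt +
               '%27%22&page_size=-1&format=%22csv%22')
        r = local.get(url)
        if r.ok:
            with open(os.path.join("/home/Reports_DC/", "relation_%s.csv" % mkt), 'wb') as f:
                f.write(r.content)
        return mkt, r.ok

    with ThreadPoolExecutor(max_workers=8) as pool:
        for mkt, ok in pool.map(fetch_and_save, loc_var):
            print(mkt, "downloaded" if ok else "failed")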

    Using asyncio/aiohttp is generally faster than the threading approach above, but the learning curve is steeper. Here is a simple example (Python 3.7+):

    import asyncio
    import aiohttp

    urls = ['http://www.google.com', 'http://www.yahoo.com', 'http://www.evergreen.edu']

    async def fetch(session, url):
        async with session.get(url) as resp:
            return await resp.text()

    async def fetch_concurrent(urls):
        loop = asyncio.get_event_loop()
        async with aiohttp.ClientSession() as session:
            tasks = []
            for u in urls:
                tasks.append(loop.create_task(fetch(session, u)))

            for result in asyncio.as_completed(tasks):
                page = await result
                # Do whatever you want with results
                print(page[0:100])

    asyncio.run(fetch_concurrent(urls))
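
    One note for your use case: you write r.content (raw bytes) to the CSV files, so the fetch coroutine would need to return bytes instead of text; resp.read() is aiohttp's counterpart to r.content in requests:

    async def fetch_bytes(session, url):
        # read() returns the raw response body as bytes,
        # the aiohttp equivalent of r.content in requests
        async with session.get(url) as resp:
            return await resp.read()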

    2019-07-17 23:29:43