Why does the open-source project requests have 3.7k more stars than Python?

Introduction:


Building on the previous article, "An urllib3 source-code analysis triggered by an algorithm image-read timeout", we covered urllib3's basic syntax, common usage patterns and request-management model, along with parts of the source code of modules such as PoolManager, HTTPConnectionPool and HTTPConnection. For anyone learning Python, urllib3 is powerful enough to cover almost every HTTP request scenario. But is that enough?

Let's put that to the test with a small example: send a POST request and store the result as JSON:

Sending a POST request with urllib3

  import json
  from urllib.parse import urlencode

  import urllib3

  # 1 Create the connection pool manager
  http = urllib3.PoolManager()
  # 2 Encode the query parameters
  encoded_args = urlencode({'arg': 'value'})
  # 3 Send the request
  url = 'http://httpbin.org/post?' + encoded_args
  r = http.request('POST', url)
  # 4 Decode the response body
  decode_data = r.data.decode('utf-8')
  # 5 Parse the JSON
  data = json.loads(decode_data)['args']
  print(data)
  
  # Output
  {'arg': 'value'}

Sending a single POST request took us five steps in total. Isn't that a bit cumbersome? 😂

In the spirit of a programmer's minimalism: if a ready-made wheel solves the problem, why hand-roll the code? Let's see how requests handles the same task:

Sending a POST request with requests

  import requests
  # 1 Send the request
  r = requests.post('https://httpbin.org/post', data={'key': 'value'})
  # 2 Get the result
  data = r.json()
  print(data['form'])
  
  # Output
  {'key': 'value'}

As you can see, sending a POST request with requests takes only two simple steps, and the request-then-read pattern reads much more like everyday language. Perhaps that is why requests has become one of the most downloaded Python packages today!

Requests is one of the most downloaded Python packages today, pulling in around 30M downloads / week— according to GitHub, Requests is currently depended upon by 1,000,000+ repositories. You may certainly put your trust in this code.

https://github.com/psf/requests#requests
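One more small aside before we dive in: requests can also serialize the request body as JSON for you via the json= keyword. A minimal sketch against httpbin.org, which simply echoes the parsed body back:

  import requests

  # The json= keyword serializes the dict as a JSON body and sets the
  # Content-Type header; httpbin echoes the parsed body under the 'json' key.
  r = requests.post('https://httpbin.org/post', json={'key': 'value'})
  print(r.json()['json'])

  # Output
  {'key': 'value'}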

The rest of this article covers the basics of requests, its timeout mechanism and the request flow, aided by flow diagrams and analysis of selected source code. It is fairly short, with an estimated reading time of 15 minutes. If you find it helpful, please like, comment and share!

Before we start, let's briefly go over the differences between urllib, urllib2, urllib3 and requests.

  • urllib and urllib2 are both Python standard-library modules for working with URL requests, but they offer different functionality

    • urllib2 can accept a Request object to set the request headers, while urllib only accepts a URL
    • urllib provides urlencode/unquote for building GET query strings, which urllib2 lacks; that is why the two were so often used together
  • urllib3 is a third-party URL library that provides many key features missing from the standard library: thread safety, connection pooling, SSL/TLS verification, request retries, HTTP redirects and more
  • requests wraps urllib3 to make it simpler and easier to use (see the short sketch after this list)
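To make that last point concrete, here is a minimal sketch (httpbin.org is used purely for illustration) contrasting the standard library's manual urlencode step with the params= keyword that requests offers:

  from urllib.parse import urlencode

  import requests

  # Standard library: build and append the query string yourself.
  qs = urlencode({'arg': 'value'})                  # 'arg=value'
  manual = 'http://httpbin.org/get?' + qs
  print(manual)            # http://httpbin.org/get?arg=value

  # requests: hand over a dict and let the library encode it.
  r = requests.get('http://httpbin.org/get', params={'arg': 'value'})
  print(r.url)             # http://httpbin.org/get?arg=value
  print(r.json()['args'])  # {'arg': 'value'}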

requests was designed with simplicity as its starting point. The author's own words, quoted below, "built for human beings", show how much care went into it. This article, of course, is likewise written for you, dear readers, so remember to give it a like 👍!

A simple, yet elegant, HTTP library.
Requests is an elegant and simple HTTP library for Python, built for human beings.


Two common ways to use requests

1. Direct use

>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]

2. Using a Session

>>> import requests
>>> s = requests.Session()
>>> r = s.get("https://api.github.com/events")
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf-8'
>>> r.encoding
'utf-8'
>>> r.text
'[{"id":"20714286674","type":"PushEven... '
>>> r.json()
[{'id': '20714286674', 'type': 'PushEvent'..}]

The first form covers most everyday request scenarios. The second, the requests.Session object, lets certain parameters persist across requests, persists cookies, and uses urllib3's connection pooling. So when you send multiple requests to the same host, the underlying TCP connection is reused, which can significantly improve performance.
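A minimal sketch of that persistence, again against httpbin.org: headers and parameters set on the Session are merged into every request, and a cookie set by one response is carried into the next.

  import requests

  s = requests.Session()

  # Headers and query parameters set on the Session are merged into every request.
  s.headers.update({'User-Agent': 'my-app/0.0.1'})
  s.params.update({'lang': 'en'})

  # Cookies set by one response are sent automatically with the next request,
  # and the urllib3 pool reuses the TCP connection to the same host.
  s.get('https://httpbin.org/cookies/set/sessioncookie/123456789')
  r = s.get('https://httpbin.org/cookies')
  print(r.json())

  # Output
  {'cookies': {'sessioncookie': '123456789'}}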

The requests architecture is actually very simple

The whole architecture comes down to two parts: Session, which persists parameters, and HTTPAdapter, which adapts the request onto a connection; everything else is urllib3.
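To make the Session plus HTTPAdapter split tangible, here is a hedged sketch that mounts a custom HTTPAdapter onto a Session; the pool sizes and retry policy are illustrative values, not recommendations:

  import requests
  from requests.adapters import HTTPAdapter
  from urllib3.util.retry import Retry

  s = requests.Session()

  # HTTPAdapter is the piece that hands the prepared request to urllib3's
  # PoolManager. Mounting your own adapter lets you tune pool sizes and the
  # retry policy for every URL whose prefix matches.
  adapter = HTTPAdapter(
      pool_connections=10,                              # cached urllib3 connection pools
      pool_maxsize=10,                                  # connections kept per pool
      max_retries=Retry(total=3, backoff_factor=0.3),   # retry transient failures
  )
  s.mount('https://', adapter)
  s.mount('http://', adapter)

  print(s.get('https://httpbin.org/get').status_code)   # 200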

At this point the main content I wanted to share is basically done. The remaining sections dig into the source code, so if that does not interest you, feel free to skip to the end of the article.

The requests source code is no big deal

Source version: v2.26.0
Source path: https://github.com/psf/requests/tree/v2.26.0

Besides the familiar GET, POST, DELETE and PUT, the requests package also offers a very handy timeout parameter that keeps a request from blocking for too long:

>>> import requests
>>> requests.get("https://api.github.com/events", timeout=1)
<Response [200]>
>>> requests.get("https://api.github.com/events", timeout=0.00001)
Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
socket.timeout: timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/connection.py", line 146, in _new_conn
    (self.host, self.timeout))
urllib3.exceptions.ConnectTimeoutError: (<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  ...
  File "/opt/anaconda3/envs/python37/lib/python3.7/site-packages/requests/adapters.py", line 496, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: HTTPSConnectionPool(host='api.github.com', port=443): Max retries exceeded with url: /events (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x7f98ba78e110>, 'Connection to api.github.com timed out. (connect timeout=1e-05)'))

Tracing the timeout error backwards through the traceback above: requests/adapters.py (ConnectTimeout) -> urllib3/util/retry.py (MaxRetryError) -> urllib3/connection.py (ConnectTimeoutError) -> urllib3/util/connection.py (socket.timeout: timed out).
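In application code you normally catch only the exception at the top of that chain. A minimal sketch, assuming you want to tell a slow connection apart from a slow response (the tiny connect timeout simply forces the error):

  import requests
  from requests.exceptions import ConnectTimeout, ReadTimeout

  # timeout=(connect, read): the first value bounds the TCP handshake,
  # the second bounds the wait for the server's response data.
  try:
      r = requests.get('https://api.github.com/events', timeout=(0.00001, 5))
  except ConnectTimeout:
      print('could not even connect in time')   # the exception the traceback above ends with
  except ReadTimeout:
      print('connected, but the server was too slow to respond')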

Next, let's follow the timeout exception through the code in the same reverse order. Since the HTTP requests issued by the requests package are built on top of urllib3, the timeout mechanism simply reuses urllib3's timeout logic:

# Entry point
# https://github.com/psf/requests/blob/v2.26.0/requests/api.py#L16

def request(method, url, **kwargs):
    """Constructs and sends a :class:`Request <Request>`.

    :param method: method for the new :class:`Request` object: ``GET``, ``OPTIONS``, ``HEAD``, ``POST``, ``PUT``, ``PATCH``, or ``DELETE``.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary, list of tuples or bytes to send
        in the query string for the :class:`Request`.
    ...
    """
    with sessions.Session() as session:
        return session.request(method=method, url=url, **kwargs)

The request is forwarded to session.request:

# https://github.com/psf/requests/blob/v2.26.0/requests/sessions.py#L470

def request(self, method, url,
        params=None, data=None, headers=None, cookies=None, files=None,
        auth=None, timeout=None, allow_redirects=True, proxies=None,
        hooks=None, stream=None, verify=None, cert=None, json=None):
    """Constructs a :class:`Request <Request>`, prepares it and sends it.
    Returns :class:`Response <Response>` object.

    :param method: method for the new :class:`Request` object.
    :param url: URL for the new :class:`Request` object.
    :param params: (optional) Dictionary or bytes to be sent in the query
        string for the :class:`Request`.
    ...
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple
    ...
    :rtype: requests.Response
    """
    # Create the Request.
    req = Request(
        method=method.upper(),
        url=url,
        headers=headers,
        files=files,
        data=data or {},
        json=json,
        params=params or {},
        auth=auth,
        cookies=cookies,
        hooks=hooks,
    )
    prep = self.prepare_request(req)

    proxies = proxies or {}

    settings = self.merge_environment_settings(
        prep.url, proxies, stream, verify, cert
    )

    # Send the request.
    send_kwargs = {
        'timeout': timeout,
        'allow_redirects': allow_redirects,
    }
    send_kwargs.update(settings)
    resp = self.send(prep, **send_kwargs)

    return resp
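For intuition, here is a rough sketch of those same steps written with the public prepared-request API; it is an approximation of what Session.request does internally, not a line-for-line equivalent:

  import requests

  s = requests.Session()

  # Build a Request, prepare it, then send it: the same three moves
  # Session.request makes above (Request -> prepare_request -> send).
  req = requests.Request('GET', 'https://httpbin.org/get', params={'arg': 'value'})
  prep = s.prepare_request(req)        # Request -> PreparedRequest
  resp = s.send(prep, timeout=5)       # Session.send hands the work to HTTPAdapter.send
  print(resp.status_code)              # 200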

As the architecture overview above indicates, the request is handled by a call to HTTPAdapter.send, shown below:

# https://github.com/psf/requests/blob/v2.26.0/requests/adapters.py#L394

def send(self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None):
    """Sends PreparedRequest object. Returns Response object.

    :param request: The :class:`PreparedRequest <PreparedRequest>` being sent.
    :param stream: (optional) Whether to stream the request content.
    :param timeout: (optional) How long to wait for the server to send
        data before giving up, as a float, or a :ref:`(connect timeout,
        read timeout) <timeouts>` tuple.
    :type timeout: float or tuple or urllib3 Timeout object
    ...
    :rtype: requests.Response
    """

    try:
        conn = self.get_connection(request.url, proxies)
    except LocationValueError as e:
        raise InvalidURL(e, request=request)
    ...
    # The timeout can be given either as a (connect, read) tuple or as a single float
    if isinstance(timeout, tuple):
        try:
            connect, read = timeout
            timeout = TimeoutSauce(connect=connect, read=read)
        except ValueError as e:
            # this may raise a string formatting error.
            err = ("Invalid timeout {}. Pass a (connect, read) "
                    "timeout tuple, or a single float to set "
                    "both timeouts to the same value".format(timeout))
            raise ValueError(err)
    elif isinstance(timeout, TimeoutSauce):
        pass
    else:
        timeout = TimeoutSauce(connect=timeout, read=timeout)

    try:
        if not chunked:
            resp = conn.urlopen(
                method=request.method,
                url=url,
                body=request.body,
                headers=request.headers,
                redirect=False,
                assert_same_host=False,
                preload_content=False,
                decode_content=False,
                retries=self.max_retries,
                timeout=timeout
            )

        # Send the request.
        else:
            ...

    except (ProtocolError, socket.error) as err:
        raise ConnectionError(err, request=request)

    except MaxRetryError as e:
        if isinstance(e.reason, ConnectTimeoutError):
            # TODO: Remove this in 3.0.0: see #2811
            if not isinstance(e.reason, NewConnectionError):
                raise ConnectTimeout(e, request=request)
        ...
        raise ConnectionError(e, request=request)
    ...
    except (_SSLError, _HTTPError) as e:
        if isinstance(e, _SSLError):
            # This branch is for urllib3 versions earlier than v1.22
            raise SSLError(e, request=request)
        elif isinstance(e, ReadTimeoutError):
            raise ReadTimeout(e, request=request)
        elif isinstance(e, _InvalidHeader):
            raise InvalidHeader(e, request=request)
        else:
            raise
    return self.build_response(request, resp)
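Mapping the three timeout branches above back to user-facing calls, a short sketch; the last form, passing a urllib3 Timeout directly, is my reading of the isinstance check rather than something the requests documentation promises:

  import requests
  from urllib3.util import Timeout   # imported in adapters.py as TimeoutSauce

  url = 'https://api.github.com/events'

  # A single float sets the connect and read timeouts to the same value.
  requests.get(url, timeout=3.05)

  # A (connect, read) tuple sets them separately; this is the tuple branch above.
  requests.get(url, timeout=(3.05, 27))

  # A ready-made urllib3 Timeout appears to pass straight through,
  # since send() keeps any TimeoutSauce instance unchanged.
  requests.get(url, timeout=Timeout(connect=3.05, read=27))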

To sum up



Comparing the Sponsor, Watch, Fork and Star metrics of the three open-source projects Python, urllib3 and requests, it turns out that requests really does have 3.7k more stars than Python itself!

A good, successful open-source project either has a strong enough technical moat, or, with a clever enough design, can still overtake the competition even when the technology itself is not that complex.


❤️❤️❤️ Every bit of enthusiasm from my readers is what keeps me moving forward!
I'm 三十一. Thanks, everyone: please like, comment and share, and see you next time!
