
How do I set up a proxy in the Scrapy framework?

珍宝珠 2019-11-22 13:56:31 2048 0
1 answer
  • Method 1: use Scrapy's built-in proxy support
    # -*- coding: utf-8 -*-
    import os
    import scrapy
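    from scrapy.http import Request

    # Scrapy's built-in HttpProxyMiddleware reads the proxy environment
    # variables once, when the middleware is created, so they are set here at
    # import time; setting them inside start_requests() would be too late.
    # start_urls uses https, so HTTPS_PROXY is set as well as HTTP_PROXY.
    os.environ['HTTP_PROXY'] = "http://192.168.11.11"
    os.environ['HTTPS_PROXY'] = "http://192.168.11.11"

    class ChoutiSpider(scrapy.Spider):
        name = 'chouti'
        allowed_domains = ['chouti.com']
        start_urls = ['https://dig.chouti.com/']

        def start_requests(self):
            for url in self.start_urls:
                yield Request(url=url, callback=self.parse)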
    
        def parse(self, response):
            print(response)
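
Method 1 depends on proxy environment variables. As far as I know, Scrapy's built-in HttpProxyMiddleware discovers them through the standard library's `urllib.request.getproxies()`, so it is worth checking that against your Scrapy version. The sketch below uses only the standard library to show which variable covers which URL scheme. The helper name `proxies_seen_with` is made up for illustration.

```python
import os
from unittest import mock
from urllib.request import getproxies


def proxies_seen_with(env):
    """Return the scheme -> proxy mapping that urllib derives from `env`."""
    # Temporarily replace the whole environment so the result does not
    # depend on proxy variables already set on this machine.
    with mock.patch.dict(os.environ, env, clear=True):
        return getproxies()


if __name__ == '__main__':
    only_http = proxies_seen_with({'HTTP_PROXY': 'http://192.168.11.11'})
    # Is there an entry for https URLs when only HTTP_PROXY is set?
    print('https' in only_http)

    both = proxies_seen_with({
        'HTTP_PROXY': 'http://192.168.11.11',
        'HTTPS_PROXY': 'http://192.168.11.11',
    })
    print(sorted(both))
```

The point of the check: a start URL such as https://dig.chouti.com/ is matched against the `https` entry, so `HTTP_PROXY` on its own would not route it through the proxy.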
    
    Method 2: write a custom downloader middleware
    import random
    import base64

    def to_bytes(text, encoding=None, errors='strict'):
        """Return the binary representation of `text`. If `text`
        is already a bytes object, return it as-is."""
        if isinstance(text, bytes):
            return text
        # On Python 3 the only text type is str, so the six dependency is not needed.
        if not isinstance(text, str):
            raise TypeError('to_bytes must receive a str or bytes '
                            'object, got %s' % type(text).__name__)
        if encoding is None:
            encoding = 'utf-8'
        return text.encode(encoding, errors)
        
    class MyProxyDownloaderMiddleware(object):
        def process_request(self, request, spider):
            # Sample entries; an empty user_pass means the proxy needs no login.
            proxy_list = [
                {'ip_port': '111.11.228.75:80', 'user_pass': 'xxx:123'},
                {'ip_port': '120.198.243.22:80', 'user_pass': ''},
                {'ip_port': '111.8.60.9:8123', 'user_pass': ''},
                {'ip_port': '101.71.27.120:80', 'user_pass': ''},
                {'ip_port': '122.96.59.104:80', 'user_pass': ''},
                {'ip_port': '122.224.249.122:8088', 'user_pass': ''},
            ]
            proxy = random.choice(proxy_list)
            # The proxy URL is passed as a str in request.meta['proxy'].
            request.meta['proxy'] = "http://%s" % proxy['ip_port']
            # Add credentials only when the entry has them. An empty string is
            # falsy, whereas an `is not None` test would let it through.
            if proxy['user_pass']:
                # b64encode returns bytes with no trailing newline;
                # base64.encodestring no longer exists on Python 3.9+.
                encoded_user_pass = base64.b64encode(to_bytes(proxy['user_pass']))
                request.headers['Proxy-Authorization'] = b'Basic ' + encoded_user_pass
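
The Proxy-Authorization value built in the middleware is just `Basic ` followed by the base64 of `user:password`. The sketch below isolates that step using only the standard library, and also shows a second option: putting the credentials into the proxy URL itself, which Scrapy's documentation describes for the `proxy` meta key. Both function names are hypothetical helpers, not part of Scrapy.

```python
import base64


def basic_proxy_auth(user_pass, encoding='utf-8'):
    """Build a Proxy-Authorization header value from 'user:password'."""
    token = base64.b64encode(user_pass.encode(encoding))
    return b'Basic ' + token


def proxy_url_with_credentials(ip_port, user_pass=''):
    """Alternative: embed the credentials in the proxy URL itself."""
    if user_pass:
        return 'http://%s@%s' % (user_pass, ip_port)
    return 'http://%s' % ip_port


if __name__ == '__main__':
    print(basic_proxy_auth('xxx:123'))
    print(proxy_url_with_credentials('111.11.228.75:80', 'xxx:123'))
```

If the user name or password contains characters such as `@` or `:`, they would presumably need percent-encoding (for example with `urllib.parse.quote`) before being placed in the URL form.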
    
    
    
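    Configuration (settings.py):
        DOWNLOADER_MIDDLEWARES = {
           # This entry must not be commented out, or the middleware never runs.
           # 543 places it before the built-in HttpProxyMiddleware (750 by default).
           'xiaohan.middlewares.MyProxyDownloaderMiddleware': 543,
        }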
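
Hard-coding the proxy list inside `process_request` means editing the middleware whenever the list changes. One possible variation, sketched below, reads the list from the project settings through the `from_crawler` hook. The class name `SettingsProxyMiddleware` and the setting name `PROXY_LIST` are assumptions for illustration; `PROXY_LIST` is not a built-in Scrapy setting.

```python
import random


class SettingsProxyMiddleware:
    """Pick a random proxy from a list defined in the project settings."""

    def __init__(self, proxy_urls):
        self.proxy_urls = list(proxy_urls)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical, project-defined setting, e.g.
        # PROXY_LIST = ['http://111.11.228.75:80', 'http://120.198.243.22:80']
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_request(self, request, spider):
        # Leave the request alone when no proxies are configured or when
        # the request already carries its own proxy.
        if not self.proxy_urls or 'proxy' in request.meta:
            return None
        request.meta['proxy'] = random.choice(self.proxy_urls)
        # Returning None lets Scrapy continue processing the request.
        return None
```

It would be registered in `DOWNLOADER_MIDDLEWARES` the same way as the middleware above, with the module path adjusted to wherever the class lives in your project.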
    
    2019-11-22 13:56:53