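Error when navigating to the next page while scraping all data from a website with Selenium?

Hi, I am trying to scrape all teacher jobs from https://www.naukri.com/. I want the data from every page, but I only get the data from one page and then receive this error: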

Traceback (most recent call last):
  File "naukri.py", line 48, in <module>
    driver.execute_script("arguments.click();", next_page)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 636, in execute_script
    'args': converted_args})['value']
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 321, in execute
    self.error_handler.check_response(response)
  File "/home/nyros/Documents/mypython/lib/python3.6/site-packages/selenium/webdriver/remote/errorhandler.py", line 242, in check_response
    raise exception_class(message, screen, stacktrace)
selenium.common.exceptions.JavascriptException: Message: javascript error: arguments.click is not a function
  (Session info: chrome=80.0.3987.116)

The code I wrote is:

import selenium.webdriver

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url ='https://www.naukri.com/'
driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")


driver.get(url)
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, '#qsbClick > span.blueBtn'))).click()
driver.find_element_by_xpath('//*[@id="skill"]/div[1]/div[2]/input').send_keys("teacher")
driver.find_element_by_xpath('//*[@id="qsbFormBtn"]').click()

data = WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
result = WebDriverWait(data, 10).until(
    EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
for r in result:
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "srp_container.fl")))
    result = WebDriverWait(data, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    for r in result:
        try:
            title=r.find_element_by_class_name("desig").text
            print('title:',title)
            school=r.find_element_by_class_name("org").text
            print('school:',school)
            location=r.find_element_by_class_name("loc").text
            print("location:",location)
            salary=r.find_element_by_class_name("salary").text
            print("salary:",salary)
        except:
            pass
            print('-------')
    next_page = r.find_elements_by_xpath("/html/body/div[5]/div/div[3]/div[1]/div[59]/a/button")
    driver.execute_script("arguments.click();", next_page)

Can anyone please help me? Thanks!

Question source: Stack Overflow

is大龙 2020-03-24 22:27:53
1 answer
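  • Because the element index of the "Next" button changes from 59 on the first page to 60 on the following pages, the absolute XPath stops matching after the first page. Instead, you can find all elements on the page with the class "grayBtn" and click the one at index [-1] of the returned list, since that will always be the Next button. I also removed the unnecessary parts of your code, such as the duplicate imports and the unneeded button clicks. Rather than typing "teacher" into the search field on the home page, I go straight to the page that lists the teacher results. That leaves the following: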

    import re
    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    Category = input("Category?")
    # Build the URL slug from the raw input first, then encode spaces for the query string
    Type = re.sub(" ", "-", Category.lower())
    Category = re.sub(" ", "%20", Category)

    url = 'https://www.naukri.com/' + Type + '-jobs?k=' + Category
    # Selenium 3 style: the first positional argument is the chromedriver path
    driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
    driver.get(url)

    # Wait for the results container, then for the job rows inside it
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".srp_container.fl")))
    result = WebDriverWait(data, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    # The outer loop runs once per row found on the first page, so it visits
    # a fixed number of pages rather than detecting the last one
    for res in result:
        # Re-locate the container and rows on every page
        data = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".srp_container.fl")))
        jobs = WebDriverWait(data, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
        for job in jobs:
            try:
                title = job.find_element(By.CLASS_NAME, "desig").text
                print('title:', title)
                school = job.find_element(By.CLASS_NAME, "org").text
                print('school:', school)
                location = job.find_element(By.CLASS_NAME, "loc").text
                print("location:", location)
                salary = job.find_element(By.CLASS_NAME, "salary").text
                print("salary:", salary)
            except Exception:
                # A field is missing on this row; skip the rest of it
                print('-------')
        # The last element with class "grayBtn" is the Next button
        Button = driver.find_elements(By.CLASS_NAME, "grayBtn")[-1]
        time.sleep(1)
        # Scroll down toward the pagination controls before clicking
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
        Button.click()
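
The JavascriptException in the question comes from the script string itself. Inside `execute_script`, `arguments` is the array-like list of values passed after the script, so the element has to be addressed as `arguments[0]`. In addition, `find_elements_by_xpath` returns a list rather than a single element. Below is a minimal sketch of a JavaScript click on the last "grayBtn" element. The helper name `js_click_last` is hypothetical and not part of the original answer, and `driver` is assumed to be any Selenium WebDriver instance.

```python
# "class name" is the string value behind Selenium's By.CLASS_NAME; it is
# spelled out here only so the sketch has no import-time dependency.
CLASS_NAME = "class name"


def js_click_last(driver, class_name="grayBtn"):
    """Click the last element carrying `class_name` through JavaScript.

    Returns True if an element was clicked, False if none was found.
    """
    buttons = driver.find_elements(CLASS_NAME, class_name)
    if not buttons:
        return False
    # Pass ONE element and index into `arguments` inside the script
    driver.execute_script("arguments[0].click();", buttons[-1])
    return True
```

Returning False when no button is found gives the calling loop a way to stop instead of raising an IndexError on `[-1]`.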
    
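
The results URL in the code above is assembled from two forms of the same input: a lowercase, hyphen-separated slug and a percent-encoded query value. Below is a minimal sketch of that step using the standard library. The function name `build_search_url` is hypothetical, and the `<slug>-jobs?k=<query>` pattern is taken from the answer's code rather than from any documented site API.

```python
from urllib.parse import quote

BASE_URL = "https://www.naukri.com/"


def build_search_url(category):
    """Return the results-page URL for a free-text job category."""
    text = category.strip()
    # Slug: lowercase words joined by hyphens, e.g. "math-teacher"
    slug = "-".join(text.lower().split())
    # Query value: percent-encoded, so spaces become %20
    query = quote(text)
    return f"{BASE_URL}{slug}-jobs?k={query}"
```

Deriving both parts from the untouched input avoids the ordering problem where spaces are already replaced by `%20` before the slug is built.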

    As requested, here is the modified code, which appends the data to a pandas DataFrame and then writes that DataFrame to an Excel file:

    import re
    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    df = pd.DataFrame(columns=['Title', 'School', 'Location', 'Salary'])

    Category = input("Category?")
    # Build the URL slug from the raw input first, then encode spaces for the query string
    Type = re.sub(" ", "-", Category.lower())
    Category = re.sub(" ", "%20", Category)

    url = 'https://www.naukri.com/' + Type + '-jobs?k=' + Category
    # Selenium 3 style: the first positional argument is the chromedriver path
    driver = webdriver.Chrome(r"mypython/bin/chromedriver_linux64/chromedriver")
    driver.get(url)

    # Wait for the results container, then for the job rows inside it
    data = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, ".srp_container.fl")))
    result = WebDriverWait(data, 10).until(
        EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
    i = 0  # index of the next DataFrame row
    # The outer loop runs once per row found on the first page, so it visits
    # a fixed number of pages rather than detecting the last one
    for res in result:
        # Re-locate the container and rows on every page
        data = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, ".srp_container.fl")))
        jobs = WebDriverWait(data, 10).until(
            EC.presence_of_all_elements_located((By.CLASS_NAME, "row")))
        for job in jobs:
            try:
                title = job.find_element(By.CLASS_NAME, "desig").text
                print('title:', title)
                school = job.find_element(By.CLASS_NAME, "org").text
                print('school:', school)
                location = job.find_element(By.CLASS_NAME, "loc").text
                print("location:", location)
                salary = job.find_element(By.CLASS_NAME, "salary").text
                print("salary:", salary)
                # Only rows with all four fields reach this line
                df.loc[i] = [title, school, location, salary]
                i += 1
            except Exception:
                # A field is missing on this row; it is not added to the DataFrame
                print('-------')
        # The last element with class "grayBtn" is the Next button
        Button = driver.find_elements(By.CLASS_NAME, "grayBtn")[-1]
        time.sleep(1)
        # Scroll down toward the pagination controls before clicking
        driver.execute_script("window.scrollTo(0,document.body.scrollHeight - 1300)")
        Button.click()
    # Writing .xlsx requires an Excel engine such as openpyxl to be installed
    df.to_excel("all_results.xlsx")
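
In the loop above, a listing that lacks any one field (for example the salary) raises inside the `try` block and is never written to the DataFrame. Below is a minimal sketch of one way to keep such rows with the missing value left empty. The helper names `field_text` and `extract_job` are hypothetical, and `row` is assumed to be a Selenium WebElement for one job listing. It relies on `find_elements`, which returns an empty list when nothing matches instead of raising.

```python
# "class name" is the string value behind Selenium's By.CLASS_NAME
CLASS_NAME = "class name"

# Column name -> CSS class used for that field in the answer's code
FIELDS = {
    "Title": "desig",
    "School": "org",
    "Location": "loc",
    "Salary": "salary",
}


def field_text(row, class_name):
    """Return the text of the first matching child, or None if absent."""
    matches = row.find_elements(CLASS_NAME, class_name)
    return matches[0].text if matches else None


def extract_job(row):
    """Build one record per listing, keeping missing fields as None."""
    return {column: field_text(row, cls) for column, cls in FIELDS.items()}
```

The records can be collected in a plain list during scraping and passed to `pd.DataFrame(rows)` once after the loop, in place of the per-row `df.loc[i]` assignment.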
    

    Answer source: Stack Overflow

    2020-03-24 22:28:03