Selenium
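Reference: http://selenium-python.readthedocs.io
Selenium is a browser automation and testing tool. It supports the mainstream graphical browsers, including Chrome, Safari, and Firefox; each browser is controlled through its own driver (for example chromedriver for Chrome), which makes it straightforward to test web interfaces. PhantomJS, a headless browser, is also supported by the Selenium versions current when these notes were written (later releases dropped it), so the two can work together seamlessly.
How it works: WebDriver waits until the page has fully loaded (that is, the onload event has fired) before returning control to your test or script. Note that if the page uses a lot of AJAX while loading, WebDriver may not know when it has completely loaded. If you need to be sure such pages are fully loaded, use waits.
Selenium is simple and practical, but it runs relatively slowly.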
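The waiting described above can be pictured as a polling loop. The sketch below is a simplified, pure-Python illustration of what an explicit wait does, not Selenium's own implementation; `wait_until` and its parameters are hypothetical names introduced here. In real scripts, Selenium's `WebDriverWait` (in `selenium.webdriver.support.ui`) plays this role.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    (Hypothetical helper for illustration; not part of Selenium.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

With a driver in hand, something like `wait_until(lambda: driver.find_elements_by_css_selector('span.txt_nowprice'))` would block until at least one matching element exists, because an empty result list is falsy.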
I. Installation (the steps differ slightly between operating systems; this example uses macOS)
pip3 install selenium  # install the module
wget -O chromedriver.zip https://chromedriver.storage.googleapis.com/2.33/chromedriver_mac64.zip  # download the driver
unzip chromedriver.zip
cp chromedriver /usr/local/bin/
$ python3
>>> import sys
>>> sys.path.append("/usr/local/bin")
>>> print(sys.path)  # the original setup made sure /usr/local/bin appears in sys.path
['', '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python36.zip', '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6', '/usr/local/Cellar/python3/3.6.2/Frameworks/Python.framework/Versions/3.6/lib/python3.6/lib-dynload', '/usr/local/lib/python3.6/site-packages', '/usr/local/bin']
>>> from selenium import webdriver
>>> driver = webdriver.Chrome()
>>>
Note: Selenium finds chromedriver through the PATH environment variable, not through sys.path, so what actually matters is that /usr/local/bin is on PATH.
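To check that the driver is reachable before starting a browser, a small standard-library lookup is enough. This is only a sketch; `locate_driver` and `path_entries` are hypothetical helper names introduced here, and the default name `chromedriver` matches the file copied to /usr/local/bin above.

```python
import os
import shutil


def locate_driver(name="chromedriver"):
    """Return the full path of `name` if it is found on PATH, else None."""
    return shutil.which(name)


def path_entries():
    """Return the directories currently listed in the PATH variable."""
    return [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
```

For example, `print(locate_driver() or "chromedriver is not on PATH")` reports where the driver would be picked up from, and `path_entries()` shows whether /usr/local/bin is included.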
II. Locating elements with Selenium
There are various strategies to locate elements in a page. You can use the most appropriate one for your case. Selenium provides the following methods to locate elements in a page:
find_element_by_id
find_element_by_name
find_element_by_xpath
find_element_by_link_text
find_element_by_partial_link_text
find_element_by_tag_name
find_element_by_class_name
find_element_by_css_selector
To find multiple elements (these methods will return a list):
find_elements_by_name
find_elements_by_xpath
find_elements_by_link_text
find_elements_by_partial_link_text
find_elements_by_tag_name
find_elements_by_class_name
find_elements_by_css_selector
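These notes use the `find_element_by_*` helpers throughout. In Selenium 4 those helpers were deprecated and later removed in favor of `find_element(by, value)` and `find_elements(by, value)`. The sketch below shows how the old names map onto locator strategy strings (the same values that `selenium.webdriver.common.by.By` exposes as constants); the `find` wrapper and the dictionary name are hypothetical, introduced only for illustration.

```python
# Locator strategy strings; these equal the constants on
# selenium.webdriver.common.by.By (By.ID == "id", By.CSS_SELECTOR == "css selector", ...).
LEGACY_SUFFIX_TO_STRATEGY = {
    "id": "id",
    "name": "name",
    "xpath": "xpath",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "tag_name": "tag name",
    "class_name": "class name",
    "css_selector": "css selector",
}


def find(driver, legacy_name, value):
    """Emulate a legacy call such as find_element_by_id on a newer driver.

    Hypothetical helper: `legacy_name` is the old method name as a string.
    """
    if legacy_name.startswith("find_elements_by_"):
        suffix = legacy_name[len("find_elements_by_"):]
        return driver.find_elements(LEGACY_SUFFIX_TO_STRATEGY[suffix], value)
    if legacy_name.startswith("find_element_by_"):
        suffix = legacy_name[len("find_element_by_"):]
        return driver.find_element(LEGACY_SUFFIX_TO_STRATEGY[suffix], value)
    raise ValueError("not a legacy locator method: " + legacy_name)
```

So `driver.find_element_by_id('loginForm')` corresponds to `driver.find_element("id", "loginForm")`, usually written as `driver.find_element(By.ID, "loginForm")`.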
(1) Example from the official documentation:
<html>
 <body>
  <h1>Welcome</h1>
  <form id="loginForm">
   <input name="username" type="text" />
   <input name="password" type="password" />
   <input name="continue" type="submit" value="Login" />
   <input name="continue" type="button" value="Clear" />
  </form>
  <p class="content">Site content goes here.</p>
 </body>
</html>
login_form = driver.find_element_by_id('loginForm')
username = driver.find_element_by_name('username')
password = driver.find_element_by_name('password')
login_form = driver.find_element_by_xpath("/html/body/form[1]")
login_form = driver.find_element_by_xpath("//form[1]")
login_form = driver.find_element_by_xpath("//form[@id='loginForm']")
username = driver.find_element_by_xpath("//form[input/@name='username']")
username = driver.find_element_by_xpath("//form[@id='loginForm']/input[1]")
username = driver.find_element_by_xpath("//input[@name='username']")
clear_button = driver.find_element_by_xpath("//input[@name='continue'][@type='button']")
clear_button = driver.find_element_by_xpath("//form[@id='loginForm']/input[4]")
heading1 = driver.find_element_by_tag_name('h1')
content = driver.find_element_by_class_name('content')
content = driver.find_element_by_css_selector('p.content')
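Several of the XPath expressions above can be tried without a browser. The sketch below uses the standard library's `xml.etree.ElementTree` as a stand-in; it supports only a subset of XPath and needs relative paths (`.//` instead of `//`), so it is a way to sanity-check expressions against the sample page, not a replacement for Selenium's locators.

```python
import xml.etree.ElementTree as ET

# The sample page from the official example, as well-formed markup.
PAGE = """<html>
 <body>
  <h1>Welcome</h1>
  <form id="loginForm">
   <input name="username" type="text" />
   <input name="password" type="password" />
   <input name="continue" type="submit" value="Login" />
   <input name="continue" type="button" value="Clear" />
  </form>
  <p class="content">Site content goes here.</p>
 </body>
</html>"""

root = ET.fromstring(PAGE)

# Same predicates as the Selenium calls, written relative to the root element.
form = root.find(".//form[@id='loginForm']")
username = root.find(".//input[@name='username']")
clear_button = root.find(".//input[@name='continue'][@type='button']")
fourth_input = root.find(".//form[@id='loginForm']/input[4]")
heading = root.find(".//h1")
content = root.find(".//p[@class='content']")
```

Here `clear_button` and `fourth_input` end up pointing at the same element, which mirrors the two alternative `clear_button` lookups in the official example.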
(2) Scraping product listings from the 17huo ("一起火") wholesale site
import time
from selenium import webdriver

browser = webdriver.Chrome()
url = 'http://www.17huo.com/search.html?sq=2&keyword=%E5%86%AC%E6%9C%8D'
browser.get(url)
page_info = browser.find_element_by_css_selector('body > div.wrap > div.pagem.product_list_pager > div')
pages = int(page_info.text.split(', ')[0].split(' ')[1])
# print(pages)  # total number of pages
for i in range(pages):
    url = 'http://www.17huo.com/?mod=search&sq=2&keyword=%E5%86%AC%E6%9C%8D&page=' + str(i + 1)
    browser.get(url)
    # scroll to the bottom so that lazily loaded items are rendered
    browser.execute_script("window.scrollBy(0,document.body.scrollHeight);")
    time.sleep(3)
    try:
        goods = browser.find_element_by_css_selector('body > div.wrap > div:nth-child(2) > div.p_main > ul').find_elements_by_tag_name('li')
        # print(len(goods))  # number of product items on this page
        for good in goods:
            title = good.find_element_by_css_selector('a:nth-child(1) > p:nth-child(2)').text
            prod_id = good.find_element_by_css_selector('a:nth-child(1) > p:nth-child(3) > span').text
            price = good.find_element_by_css_selector('span.txt_nowprice').text
            print(title, ";", price, ";", prod_id)
    except Exception as exc:  # avoid a bare except so Ctrl+C still works
        print("exception:", exc)
The example above first uses browser.find_element_by_css_selector('...ul').find_elements_by_tag_name('li') to collect the li elements, then calls find_element_by_css_selector('span.txt_nowprice').text (and similar lookups) on each item to read its values.
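The page-count parsing and URL concatenation in the example are the most fragile parts, so here is a hedged sketch of how they could be factored out. `build_page_url` and `parse_page_count` are hypothetical helpers; the assumption that the first integer in the pager text is the total page count is mine, inferred from the `split` calls above, and has not been checked against the live site.

```python
import re
from urllib.parse import urlencode

BASE_URL = "http://www.17huo.com/"


def build_page_url(keyword, page):
    """Build the search URL for a 1-based page number.

    urlencode percent-encodes the keyword, so it can be passed as plain text.
    """
    query = urlencode({"mod": "search", "sq": 2, "keyword": keyword, "page": page})
    return BASE_URL + "?" + query


def parse_page_count(text):
    """Return the first integer in the pager text (assumed to be the page count)."""
    match = re.search(r"\d+", text)
    if match is None:
        raise ValueError("no page count found in %r" % text)
    return int(match.group())
```

`build_page_url("冬服", 1)` reproduces the first-page URL that the example builds by hand, with the keyword encoded as `%E5%86%AC%E6%9C%8D`.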
(3) Method two (same result as the example above)
Note: You can also look for a link by its text, but be careful: the text must be an exact match. Be careful when using XPath in WebDriver as well. If more than one element matches the query, only the first is returned. If nothing can be found, a NoSuchElementException is raised.
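Because the singular lookups raise NoSuchElementException when nothing matches, one common workaround is to use the plural form, which returns an empty list instead. A minimal sketch, assuming a driver or element that offers `find_elements(by, value)`; `first_or_none` is a hypothetical helper name:

```python
def first_or_none(context, by, value):
    """Return the first element matching the locator, or None if none match.

    `context` may be a driver or an element: anything with
    find_elements(by, value). Hypothetical helper for illustration.
    """
    matches = context.find_elements(by, value)
    return matches[0] if matches else None
```

In the scraping loop this would let a missing price be skipped, for example `price = first_or_none(product, "class name", "txt_nowprice")`, rather than aborting the whole page through the except branch.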
import time
from selenium import webdriver

browser = webdriver.Chrome()
url = 'http://www.17huo.com/search.html?sq=2&keyword=%E5%86%AC%E6%9C%8D'
browser.get(url)
page_info = browser.find_element_by_xpath('//div[@class="page_count"]')
pages = int(page_info.text.split(', ')[0].split(' ')[1])
print(pages)
for i in range(pages):
    url = 'http://www.17huo.com/?mod=search&sq=2&keyword=%E5%86%AC%E6%9C%8D&page=' + str(i + 1)
    if i >= 2:  # stop after the first two pages
        break
    browser.get(url)
    # scroll to the bottom so that lazily loaded items are rendered
    browser.execute_script("window.scrollBy(0,document.body.scrollHeight);")
    time.sleep(3)
    try:
        product_info = browser.find_elements_by_xpath('//div[@class="p_main"]/ul[@class="item"]/li')
        for product in product_info:
            title = product.find_element_by_xpath('a/p[2]').text
            prod_id = product.find_element_by_xpath('a/p[3]/span').text
            price = product.find_element_by_class_name('txt_nowprice').text
            print(title, ";", price, ";", prod_id)
    except Exception as exc:  # avoid a bare except so Ctrl+C still works
        print("exception:", exc)
The example above uses find_elements_by_xpath directly to collect all the li elements, then uses find_element_by_xpath('a/p[2]').text and find_element_by_class_name('txt_nowprice').text to read each item's values.
III. Interacting with pages in Selenium
(1) Example: searching the python.org website
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get("http://www.python.org")
assert "Python" in driver.title
elem = driver.find_element_by_name("q")  # locate the search box
elem.send_keys("pycon")  # type the keyword "pycon" into the search box
elem.send_keys(Keys.RETURN)  # start the search
# print(driver.page_source)
for result in driver.find_elements_by_xpath('//ul[@class="list-recent-events menu"]/li'):
    print(result.find_element_by_xpath('h3/a').text)  # print the title of each search result
    print(result.find_element_by_xpath('h3/a').get_attribute('href'))  # and its link
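Notes:
a. Press Enter in the password field instead of clicking the login button
driver.find_element_by_id("user_pwd").send_keys(Keys.ENTER)
b. Ctrl+A selects everything in the input box
driver.find_element_by_id("kw").send_keys(Keys.CONTROL,'a')
c. Refresh the page
driver.refresh()  # reload the page (equivalent to pressing F5)
d. Scroll the page down (like dragging the scrollbar down; each Page Down keypress moves one screen)
driver.find_element_by_id("kw").send_keys(Keys.PAGE_DOWN)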
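e. Saving a given image through the right-click menu, using keyboard and mouse actions
Use Selenium to simulate the mouse and keyboard: move the pointer over the image, right-click it, then press V on the keyboard to save it. The core code is:
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
action = ActionChains(driver).move_to_element(element)  # move to the element
action.context_click(element)  # right-click the element
action.send_keys(Keys.ARROW_DOWN)  # press the down-arrow key
action.send_keys('v')  # press V to save the image
action.perform()  # run the queued actions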
(2) Logging in to Douban interactively
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()
browser.get("https://www.douban.com/login")
assert "豆瓣" in browser.title
element_username = browser.find_element_by_name('form_email')
element_username.send_keys('******@***.com')  # enter the username
element_userpwd = browser.find_element_by_name('form_password')
element_userpwd.send_keys('**********')  # enter the password
browser.find_element_by_name('login').click()  # click the login button
account = browser.find_element_by_xpath('//li[@class="nav-user-account"]/a/span[1]').text
print(account)  # print the account name shown after a successful login
(3) Logging in to JD.com and selecting a product
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

browser = webdriver.Chrome()
browser.get("https://passport.jd.com/uc/login?ltype=login")
assert "京东" in browser.title
browser.find_element_by_xpath('//div[@class="login-tab login-tab-r"]/a').click()
element_username = browser.find_element_by_id('loginname')
element_username.send_keys('****@***.com')
element_userpwd = browser.find_element_by_id('nloginpwd')
element_userpwd.send_keys('****')
browser.find_element_by_id('loginsubmit').click()
print(browser.get_cookies())  # get the cookies (not actually used in this example)
browser.implicitly_wait(10)  # implicit wait: later element lookups poll for up to 10 s
print(browser.find_element_by_class_name('box_tit').text)
browser.find_element_by_class_name('box_subtit_arrow').click()
browser.implicitly_wait(15)
windows = browser.window_handles  # handles of all open windows, used to switch pages
browser.switch_to.window(windows[1])  # switch_to_window() is the older, deprecated spelling
print(browser.find_element_by_xpath('//div[@class="grid_c1"]/ul/li[4]/a/h4').text)
Note the page waits and the window switching in the example above.
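Picking the new window by index assumes the handles come back in opening order, which may not always hold. A slightly more defensive sketch is to remember the handles before the click and switch to whichever one is new; `switch_to_new_window` is a hypothetical helper, and it relies only on `driver.window_handles` and `driver.switch_to.window(handle)`.

```python
def switch_to_new_window(driver, handles_before):
    """Switch to the first window handle that was not in `handles_before`.

    Returns the new handle, or None if no new window has appeared yet.
    (Hypothetical helper for illustration.)
    """
    for handle in driver.window_handles:
        if handle not in handles_before:
            driver.switch_to.window(handle)
            return handle
    return None
```

Usage would look like `before = browser.window_handles`, then the click that opens the new page, then `switch_to_new_window(browser, before)`. If the result is None, the window has not opened yet and the call can be retried after a short wait.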
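The above are only my own study notes, organized and shared mainly for my own later use. Pointers from more experienced readers are welcome, but please be kind.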