How to Crack Text Click-Select CAPTCHAs

2022-06-17

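Summary: Hi everyone, I'm Zhibin. I have been writing a series of articles on anti-scraping measures, but my own skills are limited, so I never got around to covering text click-select CAPTCHAs. This time I invited an expert, Mr. Zhang, to share how he solves them.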

Basic approach

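1. Get the image
2. Binarize the image
3. Find the contours in the image
4. Extract the characters from the contours
5. Render the candidate characters as images
6. Compare the extracted characters with the candidate images
7. Submit the click result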

This article only presents the basic approach; adapt it to the specific type of click-select CAPTCHA you are dealing with.

Implementation

0. Import dependencies

import base64
import json
import cv2
import numpy as np
import requests
from PIL import Image, ImageDraw, ImageFont

1. Get the image

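The first step is to get the image. Some CAPTCHAs return the image as raw binary data; the one tested here returns a base64-encoded string, so it has to be decoded first.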

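def getData(captchaSession: requests.Session):
    # "captcha-fetch-url" is a placeholder for the endpoint that returns the captcha
    resp = captchaSession.post("captcha-fetch-url")
    return resp.json()["repData"]
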
def getImageFromBase64(b64):
    buffer = base64.b64decode(b64)
    nparr = np.frombuffer(buffer, np.uint8)
    image = cv2.imdecode(nparr, cv2.IMREAD_COLOR)
    return image
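
Some endpoints wrap the base64 string in a data URI (for example `data:image/png;base64,...`), and passing that whole string to `base64.b64decode` would corrupt the result. The sketch below shows one way to handle both forms. It is an assumption about how such responses may look, not something confirmed for the site tested in this article, and `decodeCaptchaPayload` is a hypothetical helper name.

```python
import base64


def decodeCaptchaPayload(payload: str) -> bytes:
    """Return raw image bytes from a base64 string, with or without a data-URI prefix."""
    # Hypothetical helper: drop everything up to and including "base64," if present.
    marker = "base64,"
    idx = payload.find(marker)
    if payload.startswith("data:") and idx != -1:
        payload = payload[idx + len(marker):]
    return base64.b64decode(payload)


raw = b"\x89PNG\r\n\x1a\n"  # the 8-byte PNG signature, used here as stand-in data
plain = base64.b64encode(raw).decode("ascii")
wrapped = "data:image/png;base64," + plain
```

The returned bytes can then go through `np.frombuffer` and `cv2.imdecode` exactly as in `getImageFromBase64`.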

2. Binarize the image

To make the character outlines easier to detect, we binarize the image.

def normalizeImage(img):
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, img = cv2.threshold(img, 1, 255, cv2.THRESH_BINARY)
    img = cv2.bitwise_not(img)
    return img
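
To see what `normalizeImage` does to individual pixels, here is a NumPy-only sketch of the same two operations (`normalizeWithNumpy` is a hypothetical name). With a threshold of 1, only pixels whose gray value is 0 or 1 end up white after the inversion, so this setting fits CAPTCHAs whose characters are drawn in almost pure black; other CAPTCHAs will likely need a different threshold.

```python
import numpy as np


def normalizeWithNumpy(gray):
    """NumPy-only equivalent of threshold(gray, 1, 255, THRESH_BINARY) + bitwise_not."""
    # THRESH_BINARY: pixels strictly above the threshold become 255, the rest 0.
    binary = np.where(gray > 1, 255, 0).astype(np.uint8)
    # Inversion: the near-black pixels (0 or 1) become white foreground.
    return 255 - binary


gray = np.array([[0, 1, 2], [50, 200, 255]], dtype=np.uint8)
result = normalizeWithNumpy(gray)
```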

3. Find the contours in the image

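When finding contours, we first merge contours that lie within 5 pixels of each other, then filter the merged hulls by aspect ratio (between 0.6 and 1.7) and area (above 200) and keep the four largest. This improves speed and accuracy and leaves contours that are as suitable as possible for character recognition.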

def findContour(img):
    contours, _ = cv2.findContours(img, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    def find_if_close(cnt1, cnt2):
        row1, row2 = cnt1.shape[0], cnt2.shape[0]
        for i in range(row1):
            for j in range(row2):
                dist = np.linalg.norm(cnt1[i] - cnt2[j])
                if abs(dist) < 5:
                    return True
                elif i == row1 - 1 and j == row2 - 1:
                    return False
    LENGTH = len(contours)
    status = np.zeros((LENGTH, 1))
    for i, cnt1 in enumerate(contours):
        x = i
        if i != LENGTH - 1:
            for j, cnt2 in enumerate(contours[i + 1:]):
                x = x + 1
                dist = find_if_close(cnt1, cnt2)
                if dist == True:
                    val = min(status[i], status[x])
                    status[x] = status[i] = val
                else:
                    if status[x] == status[i]:
                        status[x] = i + 1
    unified = []
    maximum = int(status.max()) + 1
    for i in range(maximum):
        pos = np.where(status == i)[0]
        if pos.size != 0:
            cont = np.vstack([contours[i] for i in pos])
            hull = cv2.convexHull(cont)
            unified.append(hull)
    cnt = list(filter(aspectRatio,unified))
    cnt.sort(key=cv2.contourArea,reverse=True)
    return cnt[:4]
def aspectRatio(cnt):
    _,_,w,h = cv2.boundingRect(cnt)
    return (0.6<float(w)/h<1.7) and (cv2.contourArea(cnt)>200.0)
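
A character made of several separate strokes can show up as several external contours, which is presumably why `findContour` merges contours that are close together. The sketch below illustrates the same idea with a union-find structure instead of the `status` array. It is a swapped-in technique for illustration, not the author's code, and it works on plain `(N, 2)` point arrays (an OpenCV contour would first need `reshape(-1, 2)`). The names `minPointDistance` and `groupClosePointSets` are hypothetical.

```python
import numpy as np


def minPointDistance(a, b):
    """Smallest Euclidean distance between any point of a and any point of b."""
    diff = a[:, None, :] - b[None, :, :]  # shape (len(a), len(b), 2)
    return float(np.sqrt((diff ** 2).sum(axis=2)).min())


def groupClosePointSets(pointSets, maxDist=5.0):
    """Merge point sets whose closest points are less than maxDist apart."""
    parent = list(range(len(pointSets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(pointSets)):
        for j in range(i + 1, len(pointSets)):
            if minPointDistance(pointSets[i], pointSets[j]) < maxDist:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(pointSets)):
        groups.setdefault(find(i), []).append(i)
    return [np.vstack([pointSets[i] for i in idx]) for idx in groups.values()]


strokes = [
    np.array([[0, 0], [4, 0]]),      # 3 px away from the next stroke
    np.array([[7, 0], [10, 0]]),
    np.array([[50, 50], [55, 50]]),  # far from both
]
merged = groupClosePointSets(strokes)
```

Each merged point set could then be passed to `cv2.convexHull` and filtered with `aspectRatio`, as in the function above.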

4. Extract the characters from the contours

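For each character contour we crop the patch, rotate it upright, resize it to 20x20 and sharpen it, then return the patches together with their center coordinates.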

def extractCharContour(img, contour):
    mult = 1.2
    ret = []
    point = []
    for cnt in contour:
        rect = cv2.minAreaRect(cnt)
        box = cv2.boxPoints(rect)
        box = box.astype(np.intp)  # np.int0 is deprecated since NumPy 1.24 and gone in NumPy 2
        W = rect[1][0]
        H = rect[1][1]
        Xs = [i[0] for i in box]
        Ys = [i[1] for i in box]
        x1 = min(Xs)
        x2 = max(Xs)
        y1 = min(Ys)
        y2 = max(Ys)
        rotated = False
        angle = rect[2]
        if angle < -45:
            angle += 90
            rotated = True
        center = (int((x1 + x2) / 2), int((y1 + y2) / 2))
        size = (int(mult * (x2 - x1)), int(mult * (y2 - y1)))
        try:
            M = cv2.getRotationMatrix2D((size[0] / 2, size[1] / 2), angle,
                                        1.0)
            cropped = cv2.getRectSubPix(img, size, center)
            cropped = cv2.warpAffine(cropped, M, size)
            croppedW = W if not rotated else H
            croppedH = H if not rotated else W
            croppedRotated = cv2.getRectSubPix(
                    cropped,
                    (int(croppedW * mult), int(croppedH * mult)),
                    (size[0] / 2, size[1] / 2),
            )
            im = cv2.resize(croppedRotated, (20, 20))
            kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
                                  np.float32)
            im = cv2.filter2D(im, -1, kernel=kernel)
            ret.append(im)
            point.append((rect[0][0], rect[0][1]))
        except Exception:
            # Skip contours that cannot be cropped (e.g. a degenerate size)
            pass
    return ret, point
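
The 3x3 kernel passed to `cv2.filter2D` above is a common sharpening kernel: its entries sum to 1, so flat regions keep their value while differences across an edge are amplified. The NumPy-only sketch below demonstrates this on interior pixels only; unlike `cv2.filter2D` it leaves the border pixels untouched. `sharpenInterior` is a hypothetical name used for illustration.

```python
import numpy as np

SHARPEN = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], np.float32)


def sharpenInterior(patch):
    """Apply the 3x3 kernel to interior pixels; border pixels are left unchanged."""
    src = patch.astype(np.float32)
    h, w = src.shape
    acc = np.zeros((h - 2, w - 2), np.float32)
    for dy in range(3):
        for dx in range(3):
            # Shifted view of the image weighted by the matching kernel entry
            acc += SHARPEN[dy, dx] * src[dy:dy + h - 2, dx:dx + w - 2]
    out = src.copy()
    out[1:-1, 1:-1] = acc
    return np.clip(out, 0, 255).astype(np.uint8)  # saturate like uint8 output


flat = np.full((3, 3), 100, np.uint8)
edge = np.tile(np.array([100, 100, 150, 150], np.uint8), (3, 1))
```

On `edge`, the two interior pixels of the middle row move from 100 and 150 to 50 and 200, which is the contrast boost that helps the later pixel comparison.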

5. Render the candidate characters as images

Render each candidate character returned by the CAPTCHA as an image.

def genCharacter(ch, size):
    img = Image.new("L", size, 0)
    font = ImageFont.truetype("simsun.ttc", min(size))
    draw = ImageDraw.Draw(img)
    draw.text((0, 0), ch, font=font, fill=255)
    return np.asarray(img)

6. Compare the extracted characters with the candidate images

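We compare each extracted character patch with each candidate character image to find the matching position, then add the click coordinates in the order the CAPTCHA requires.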

def compareCharImage(words, chars, point, word_list):
    scores = []
    for i, word in enumerate(words):
        for j, char in enumerate(chars):
            scores.append(((i, j), cv2.bitwise_xor(char, word).sum()))
    scores.sort(key=lambda x: x[1])
    word_set = set()
    char_set = set()
    answers = {}
    for score in scores:
        # Skip pairs whose candidate word or contour has already been matched
        if (score[0][0] in word_set) or (score[0][1] in char_set):
            continue
        word_set.add(score[0][0])
        char_set.add(score[0][1])
        answers[word_list[score[0][0]]] = point[score[0][1]]
    return [{
        "x": int(answers[word][0]),
        "y": int(answers[word][1])
    } for word in word_list]
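
The matching step is a greedy assignment: score every (candidate, patch) pair by the sum of the XOR image, sort ascending, and accept a pair only if neither side has been used yet. The toy sketch below runs that logic on 2x3 binary arrays so the scores can be checked by hand. `greedyMatch` and the sample arrays are made up for illustration.

```python
import numpy as np


def greedyMatch(wordImages, charImages):
    """Pair each word image with one char image, taking the lowest XOR score first."""
    scores = []
    for i, word in enumerate(wordImages):
        for j, char in enumerate(charImages):
            # On 0/255 images each differing pixel adds 255, so smaller is more similar.
            scores.append((int(np.bitwise_xor(word, char).sum()), i, j))
    scores.sort()
    usedWords, usedChars, pairs = set(), set(), {}
    for _, i, j in scores:
        if i in usedWords or j in usedChars:
            continue  # each image may be matched only once
        usedWords.add(i)
        usedChars.add(j)
        pairs[i] = j
    return pairs


a = np.array([[255, 255, 0], [0, 0, 0]], np.uint8)
b = np.array([[0, 0, 0], [0, 255, 255]], np.uint8)
c0 = np.array([[0, 0, 0], [255, 255, 255]], np.uint8)  # b plus one extra pixel
c1 = np.array([[255, 255, 0], [0, 0, 255]], np.uint8)  # a plus one extra pixel
pairs = greedyMatch([a, b], [c0, c1])
```

Here `a` is paired with `c1` and `b` with `c0`. A greedy pass is simple but not guaranteed to minimize the total score; with only a handful of characters per CAPTCHA that trade-off is usually acceptable.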

7. Submit the click result

Finally, send the click coordinates back to the server for verification.

def checkCaptcha(captchaSession: requests.Session, data, point):
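    # encrypt() is not defined in this article; it must implement the target site's own scheme
    enc = encrypt(json.dumps(point).replace(" ", ""), data["secretKey"])
    resp = captchaSession.post(
        "captcha-submit-url",  # placeholder for the verification endpoint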
        json={
            "token": data["token"],
            "pointJson": enc,
        },
    )
    return resp.json()
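
`checkCaptcha` removes spaces from the serialized JSON before encrypting it. As a small aside, `json.dumps` can produce the same compact string directly through its `separators` argument, which also avoids touching spaces inside string values (not an issue here, since the coordinates are integers). The coordinates below are made up.

```python
import json

point = [{"x": 104, "y": 58}, {"x": 212, "y": 97}]  # made-up coordinates

viaReplace = json.dumps(point).replace(" ", "")
viaSeparators = json.dumps(point, separators=(",", ":"))
```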
