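1. Using the val script

During training, the val script is called once at the end of each epoch to validate the model (if noval is set, only after the final epoch).

# Validate once at the end of each epoch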
        if RANK in [-1, 0]:
            # mAP
            callbacks.run('on_train_epoch_end', epoch=epoch)
            ema.update_attr(model, include=['yaml', 'nc', 'hyp', 'names', 'stride', 'class_weights'])
            final_epoch = (epoch + 1 == epochs) or stopper.possible_stop
            if not noval or final_epoch:  # Calculate mAP
                results, maps, _ = val.run(data_dict,
                                           batch_size=batch_size // WORLD_SIZE * 2,
                                           imgsz=imgsz,
                                           model=ema.ema,
                                           single_cls=single_cls,
                                           dataloader=val_loader,
                                           save_dir=save_dir,
                                           plots=False,
                                           callbacks=callbacks,
                                           compute_loss=compute_loss)
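
The condition that decides whether an epoch is validated can be read as a small predicate. The sketch below is only an illustration: should_validate is a hypothetical helper name (it is not part of YOLOv5), and possible_stop stands in for stopper.possible_stop from the excerpt above.

```python
def should_validate(epoch, epochs, noval=False, possible_stop=False):
    """Hypothetical helper mirroring `if not noval or final_epoch` in the excerpt."""
    final_epoch = (epoch + 1 == epochs) or possible_stop
    return (not noval) or final_epoch


# With noval=True, only the last epoch (or an early stop) triggers validation
schedule = [should_validate(e, epochs=3, noval=True) for e in range(3)]
print(schedule)
```

With noval left at False, every epoch is validated, which is the default behaviour described above.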

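When the whole training run has finished, the val script is called one more time, this time on the best checkpoint.

# Validate once more after all epochs have finished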
    if RANK in [-1, 0]:
        LOGGER.info(f'\n{epoch - start_epoch + 1} epochs completed in {(time.time() - t0) / 3600:.3f} hours.')
        for f in last, best:
            if f.exists():
                strip_optimizer(f)  # strip optimizers
                if f is best:
                    LOGGER.info(f'\nValidating {f}...')
                    results, _, _ = val.run(data_dict,
                                            batch_size=batch_size // WORLD_SIZE * 2,
                                            imgsz=imgsz,
                                            model=attempt_load(f, device).half(),
                                            iou_thres=0.65 if is_coco else 0.60,  # best pycocotools results at 0.65
                                            single_cls=single_cls,
                                            dataloader=val_loader,
                                            save_dir=save_dir,
                                            save_json=is_coco,
                                            verbose=True,
                                            plots=True,
                                            callbacks=callbacks,
                                            compute_loss=compute_loss)  # val best model with plots
        callbacks.run('on_train_end', last, best, plots, epoch)
        LOGGER.info(f"Results saved to {colorstr('bold', save_dir)}")

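To run validation yourself, set data and the trained model weights in the parser. Here data is a yaml file, the same one that was configured for training. Assuming the model was trained on a face-mask dataset, the yaml file looks like this: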

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ./dataset/mask
train: # train images (relative to 'path')  16551 images
  - images/train
val: # val images (relative to 'path')  4952 images
  - images/val
# Classes
nc: 3  # number of classes
names: ['with_mask', 'without_mask', 'mask_weared_incorrect']

Once the yaml path and the weights path are configured, validation can be run directly:

parser = argparse.ArgumentParser()
parser.add_argument('--data', type=str, default='./dataset/mask/mask.yaml', help='dataset.yaml path')
parser.add_argument('--weights', nargs='+', type=str, default='./runs/train/mask/weights/best.pt', help='model.pt path(s)')
....
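
One detail of the --weights argument is worth seeing in isolation. With nargs='+', argparse leaves a string default untouched but returns a list when the flag is given on the command line, which is why run later checks isinstance(weights, list). The sketch below is a minimal illustration using only the two arguments shown above; build_parser is a hypothetical helper name and a.pt, b.pt are placeholder file names.

```python
import argparse


def build_parser():
    # Only the two arguments quoted in the article; the real parser has more
    parser = argparse.ArgumentParser()
    parser.add_argument('--data', type=str, default='./dataset/mask/mask.yaml', help='dataset.yaml path')
    parser.add_argument('--weights', nargs='+', type=str,
                        default='./runs/train/mask/weights/best.pt', help='model.pt path(s)')
    return parser


defaults = build_parser().parse_args([])                             # nothing passed: default is used
explicit = build_parser().parse_args(['--weights', 'a.pt', 'b.pt'])  # placeholder file names

print(type(defaults.weights).__name__)  # the default stays a plain string
print(type(explicit.weights).__name__)  # values from the command line arrive as a list
```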

Output:

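2. Walkthrough of the val script

2.1 Main body

As with training, the main body of the val script is a run function. It has to distinguish between being called directly during training and being run on its own for validation. There are many details as well; most of the explanation is in the code comments, a large part of which draws on references 1 and 2.

Rough outline of the implementation:

Load the model and the dataset
Run inference on each batch of images and apply non-maximum suppression to get one prediction matrix per image
For each image, match its predicted boxes to gt boxes one-to-one. For every matched prediction, check at each IoU threshold whether it passes, and build an evaluation matrix correct
Store these evaluation matrices for all images, together with each prediction's confidence and predicted class and the gt classes, for later use
From the stored information, compute the per-class precision, recall and F1 at the confidence that maximizes F1, plus the per-class AP at the 10 IoU thresholds; this is the information finally needed
Plot the related figures and print the results

Main code: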

@torch.no_grad()
def run(data,
        weights=None,  # model.pt path(s)
        batch_size=32,  # batch size
        imgsz=640,  # inference size (pixels)
        conf_thres=0.001,  # confidence threshold
        iou_thres=0.6,  # NMS IoU threshold
        task='val',  # train, val, test, speed or study
        device='0',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        single_cls=False,  # treat as single-class dataset
        augment=False,  # augmented inference
        verbose=False,  # verbose output
        save_txt=False,  # save results to *.txt
        save_hybrid=False,  # save label+prediction hybrid results to *.txt
        save_conf=False,  # save confidences in --save-txt labels
        save_json=True,  # save a COCO-JSON results file
        project=ROOT / 'runs/val',  # save to project/name
        name='exp',  # save to project/name
        exist_ok=False,  # existing project/name ok, do not increment
        half=True,  # use FP16 half-precision inference
        model=None,
        dataloader=None,
        save_dir=Path(''),
        plots=True,
        callbacks=Callbacks(),
        compute_loss=None,
        ):
    # Initialize/load model and set device
    training = model is not None
    # When called from train.py, only the device in use needs to be fetched
    if training:  # called by train.py
        device = next(model.parameters()).device  # get model device
    # When val.py is run directly
    else:  # called directly
        device = select_device(device, batch_size=batch_size)
        # Directories
        # Build the save_dir path, e.g. runs/val/exp
        save_dir = increment_path(Path(project) / name, exist_ok=exist_ok)  # increment run
        # Create runs/val/exp/labels (only when save_txt is set)
        (save_dir / 'labels' if save_txt else save_dir).mkdir(parents=True, exist_ok=True)  # make dir
        # Load model
        # Load the FP32 model; only needed when val.py is run directly
        check_suffix(weights, '.pt')
        model = attempt_load(weights, map_location=device)  # load FP32 model
        # gs: the model's largest downsampling stride; strides are usually [8, 16, 32], so gs is usually 32
        gs = max(int(model.stride.max()), 32)  # grid size (max stride)
        # Check that the input resolution imgsz is a multiple of gs; only needed when val.py is run directly
        imgsz = check_img_size(imgsz, s=gs)  # check image size
        # Multi-GPU disabled, incompatible with .half() https://github.com/ultralytics/yolov5/issues/99
        # if device.type != 'cpu' and torch.cuda.device_count() > 1:
        #     model = nn.DataParallel(model)
        # Data
        # When half precision is used, both the model and the input images must be cast to half
        data = check_dataset(data)  # check
    # Half
    half &= device.type != 'cpu'  # half precision only supported on CUDA
    model.half() if half else model.float()
    # Configure
    model.eval()
    # Whether the validation data is the COCO dataset
    is_coco = isinstance(data.get('val'), str) and data['val'].endswith('coco/val2017.txt')  # COCO dataset
    nc = 1 if single_cls else int(data['nc'])  # number of classes
    # Parameters for computing mAP
    # 10 IoU thresholds from 0.5 to 0.95 in steps of 0.05   iou vector for mAP@0.5:0.95
    # iouv: [0.50000, 0.55000, 0.60000, 0.65000, 0.70000, 0.75000, 0.80000, 0.85000, 0.90000, 0.95000]
    iouv = torch.linspace(0.5, 0.95, 10).to(device)  # iou vector for mAP@0.5:0.95
    # Number of IoU thresholds for mAP@0.5:0.95 (10)
    niou = iouv.numel()
    # Dataloader
    # When not training (run is called from val.py), build the dataloader with create_dataloader
    # When training (run is called from train.py), the dataloader is passed in as an argument
    if not training:
        if device.type != 'cpu':
            # Run one forward pass on an all-zero tensor to check that inference works
            model(torch.zeros(1, 3, imgsz, imgsz).to(device).type_as(next(model.parameters())))  # run once
        pad = 0.0 if task == 'speed' else 0.5
        task = task if task in ('train', 'val', 'test') else 'val'  # path to train/val/test images
        # Build the dataloader with rect=True: rectangular inference on the validation set greatly speeds up inference without affecting mAP
        # shuffle is not set, so the dataset is validated in order
        dataloader = create_dataloader(data[task], imgsz, batch_size, gs, single_cls, pad=pad, rect=True,
                                       prefix=colorstr(f'{task}: '))[0]
    # Number of images validated so far
    seen = 0
    # Initialize the confusion matrix
    confusion_matrix = ConfusionMatrix(nc=nc)
    # Class names of the dataset
    names = {k: v for k, v in enumerate(model.names if hasattr(model, 'names') else model.module.names)}
    class_map = coco80_to_coco91_class() if is_coco else list(range(1000))
    # Header line shown in the tqdm progress bar
    s = ('%20s' + '%11s' * 6) % ('Class', 'Images', 'Labels', 'P', 'R', 'mAP@.5', 'mAP@.5:.95')
    # Initialize the timings dt and the metrics p, r, f1, mp, mr, map50, map
    dt, p, r, f1, mp, mr, map50, map = [0.0, 0.0, 0.0], 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0
    # Initialize the validation loss
    loss = torch.zeros(3, device=device)
    # Initialize the JSON result list, the per-image statistics, ap, etc.
    jdict, stats, ap, ap_class = [], [], [], []
    # Start validating batch by batch
    for batch_i, (img, targets, paths, shapes) in enumerate(tqdm(dataloader, desc=s)):
        t1 = time_sync()
        # Cast the images to half precision when half is True  uint8 to fp16/32
        img = img.to(device, non_blocking=True)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        targets = targets.to(device)
        nb, _, height, width = img.shape  # batch size, channels, height, width
        t2 = time_sync()
        dt[0] += t2 - t1
        # Run model: when augment is True, test-time augmentation (TTA) is enabled
        # out:       inference output, 1 tensor  [bs, anchor_num*grid_w*grid_h, xywh+c+20classes] = [32, 19200+4800+1200, 25]
        # train_out: training output, 3 tensors  [bs, anchor_num, grid_w, grid_h, xywh+c+20classes]
        #                    e.g. [32, 3, 80, 80, 25] [32, 3, 40, 40, 25] [32, 3, 20, 20, 25]
        out, train_out = model(img, augment=augment)  # inference and training outputs
        dt[1] += time_sync() - t2
        # Compute loss
        # A non-empty compute_loss means train.py is running; use it to compute the validation loss
        if compute_loss:
            loss += compute_loss([x.float() for x in train_out], targets)[1]  # box, obj, cls
        # Run NMS
        # Scale the target boxes' xywh (normalized in the label files) to the pixel size of the input img
        targets[:, 2:] *= torch.Tensor([width, height, width, height]).to(device)  # to pixels
        # If save_hybrid is True, collect the labels of each image in a list (default False)
        # targets: [num_target, img_index+class_index+xywh] = [31, 6]
        # lb: {list: bs} e.g. image 1 has targets [17, 5], image 2 [1, 5], image 3 [7, 5], image 4 [6, 5]
        lb = [targets[targets[:, 0] == i, 1:] for i in range(nb)] if save_hybrid else []  # for autolabelling
        t3 = time_sync()
        # out: list{bs}  [300, 6] [42, 6] [300, 6] [300, 6]  [:, xyxy+conf+class]
        # One prediction matrix per image, holding all predicted objects
        out = non_max_suppression(out, conf_thres, iou_thres, labels=lb, multi_label=True, agnostic=single_cls)
        dt[2] += time_sync() - t3
        # Statistics per image: process each image in turn; move on to the next batch once the whole batch is done
        for si, pred in enumerate(out):
            # Ground-truth labels of image si: class, x, y, w, h
            # targets[:, 0] is the index of the image each label belongs to
            labels = targets[targets[:, 0] == si, 1:]   # [:, class + xywh]
            nl = len(labels)                            # number of gt boxes in image si
            tcls = labels[:, 0].tolist() if nl else []  # target class
            # Path and original shape of image si
            path, shape = Path(paths[si]), shapes[si][0]
            seen += 1
            # If there are no predictions for this image, append empty stats (when it has gt) and skip it
            if len(pred) == 0:
                if nl:
                    stats.append((torch.zeros(0, niou, dtype=torch.bool), torch.Tensor(), torch.Tensor(), tcls))
                continue
            # Predictions
            if single_cls:
                pred[:, 5] = 0
            predn = pred.clone()
            # Map the predicted boxes back to the original image img0, i.e. from img[si].shape[1:] to shape; the last argument carries the ratio/pad info
            scale_coords(img[si].shape[1:], predn[:, :4], shape, shapes[si][1])  # native-space pred
            # Evaluate
            if nl:
                # Convert to xyxy format
                tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
                # Map the gt boxes to the original image size
                scale_coords(img[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
                # Rebuild the labels in (cls, xyxy) format
                labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
                # Match predictions to gt one-to-one and record, for each matched prediction, which IoU thresholds it passes; unmatched predictions stay False
                correct = process_batch(predn, labelsn, iouv)
                if plots:
                    confusion_matrix.process_batch(predn, labelsn)
            else:
                correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool)
            # Collect the per-image results into stats
            # Append statistics(correct, conf, pcls, tcls)   one tuple per image
            # correct: [pred_num, 10] bool, whether each prediction in this image is a TP at each IoU threshold
            # pred[:, 4]: [pred_num] confidence of each prediction in this image
            # pred[:, 5]: [pred_num] class of each prediction in this image
            # tcls: [gt_num] class of each gt box in this image
            stats.append((correct.cpu(), pred[:, 4].cpu(), pred[:, 5].cpu(), tcls))  # (correct, conf, pcls, tcls)
            # Save/log
            if save_txt:    # save predictions to a txt file, e.g. runs/val/exp/labels/image_name.txt
                save_one_txt(predn, save_conf, shape, file=save_dir / 'labels' / (path.stem + '.txt'))
            if save_json:   # append predictions to the COCO-format JSON list (written to a json file later)
                save_one_json(predn, jdict, path, class_map)  # append to COCO-JSON dictionary
            callbacks.run('on_val_image_end', pred, predn, path, names, img[si])
        # Plot images
        # Plot the ground truth and the predictions of the first three batches (two images per batch) and save them
        if plots and batch_i < 3:
            # Save the gt image
            f = save_dir / f'val_batch{batch_i}_labels.jpg'  # labels
            # Thread: run the function in a separate thread so that plotting does not block validation
            # target: the function to run  args: its arguments  daemon: the thread ends automatically when the main thread exits
            # .start(): start the thread, which then runs plot_images
            # With a breakpoint inside plot_images, the child thread pauses while the main thread keeps running
            Thread(target=plot_images, args=(img, targets, paths, f, names), daemon=True).start()
            # Save the prediction image
            f = save_dir / f'val_batch{batch_i}_pred.jpg'  # predictions
            Thread(target=plot_images, args=(img, output_to_target(out), paths, f, names), daemon=True).start()
    # Compute statistics:
    # Each tuple in stats corresponds to one validated image; the four parts are concatenated across images
    # stats (after concat): list{4} correct, conf, pcls, tcls for the whole dataset
    # correct [pred_sum, 10] whether each prediction in the dataset is a TP at each IoU threshold  [5087, 10]
    # conf [pred_sum] confidence of every prediction in the dataset  [5087]
    # pcls [pred_sum] class of every prediction in the dataset   [5087]
    # tcls [gt_sum] class of every gt box in the dataset     [754]
    stats = [np.concatenate(x, 0) for x in zip(*stats)]  # to numpy
    # stats[0].any(): False if stats[0] is all False, True if at least one entry is True
    # If stats[0] is all False, no prediction reached even the lowest IoU threshold of 0.5
    if len(stats) and stats[0].any():
        # Compute p, r, ap, f1, ap_class from the statistics above (ap_per_class computes the per-class metrics)
        # p: [nc] per-class precision at the confidence that maximizes the mean F1
        # r: [nc] per-class recall at the confidence that maximizes the mean F1
        # ap: [nc, 10] per-class AP at the 10 IoU thresholds
        # f1 [nc] per-class F1 at the confidence that maximizes the mean F1
        # ap_class: [nc] indices of the classes present in the dataset
        p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
        # ap50: [nc] per-class AP@0.5   ap: [nc] per-class AP@0.5:0.95
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        # mp: [1] mean precision over classes (at max F1)
        # mr: [1] mean recall over classes (at max F1)
        # map50: [1] mean over classes of AP@0.5, i.e. mAP@0.5
        # map: [1] mean over classes of AP@0.5:0.95, i.e. mAP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        # nt: number of gt boxes per class over the whole dataset
        nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class
    else:
        nt = torch.zeros(1)
    # Print results
    pf = '%20s' + '%11i' * 2 + '%11.3g' * 4  # print format
    print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))
    # Print results per class
    if (verbose or (nc < 50 and not training)) and nc > 1 and len(stats):
        for i, c in enumerate(ap_class):
            print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))
    # Print speeds
    t = tuple(x / seen * 1E3 for x in dt)  # speeds per image
    if not training:
        shape = (batch_size, 3, imgsz, imgsz)
        print(f'Speed: %.1fms pre-process, %.1fms inference, %.1fms NMS per image at shape {shape}' % t)
    # Plots
    if plots:
        confusion_matrix.plot(save_dir=save_dir, names=list(names.values()))
        callbacks.run('on_val_end')
    # Save JSON
    if save_json and len(jdict):
        w = Path(weights[0] if isinstance(weights, list) else weights).stem if weights is not None else ''  # weights
        anno_json = str(Path(data.get('path', '../coco')) / 'annotations/instances_val2017.json')  # annotations json
        pred_json = str(save_dir / f"{w}_predictions.json")  # predictions json
        print(f'\nEvaluating pycocotools mAP... saving {pred_json}...')
        with open(pred_json, 'w') as f:
            json.dump(jdict, f, indent=4, ensure_ascii=False)
        try:  # https://github.com/cocodataset/cocoapi/blob/master/PythonAPI/pycocoEvalDemo.ipynb
            check_requirements(['pycocotools'])
            from pycocotools.coco import COCO
            from pycocotools.cocoeval import COCOeval
            anno = COCO(anno_json)  # init annotations api
            pred = anno.loadRes(pred_json)  # init predictions api
            eval = COCOeval(anno, pred, 'bbox')
            if is_coco:
                eval.params.imgIds = [int(Path(x).stem) for x in dataloader.dataset.img_files]  # image IDs to evaluate
            eval.evaluate()
            eval.accumulate()
            eval.summarize()
            map, map50 = eval.stats[:2]  # update results (mAP@0.5:0.95, mAP@0.5)
            print(eval.stats)
        except Exception as e:
            print(f'pycocotools unable to run: {e}')
    # Return results
    model.float()  # for training
    if not training:
        s = f"\n{len(list(save_dir.glob('labels/*.txt')))} labels saved to {save_dir / 'labels'}" if save_txt else ''
        print(f"Results saved to {colorstr('bold', save_dir)}{s}")
    maps = np.zeros(nc) + map
    for i, c in enumerate(ap_class):
        maps[c] = ap[i]
    return (mp, mr, map50, map, *(loss.cpu() / len(dataloader)).tolist()), maps, t
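
The step stats = [np.concatenate(x, 0) for x in zip(*stats)] is easier to follow on a toy input. The sketch below uses plain Python lists in place of tensors and only two IoU columns instead of ten (both are simplifications for illustration, and concat is a hypothetical stand-in for np.concatenate); the numbers are made up.

```python
# One tuple per image: (correct, conf, pcls, tcls)
stats = [
    ([[True, False], [False, False]], [0.9, 0.4], [0, 1], [0]),        # image 0: 2 predictions, 1 gt
    ([[True, True]],                  [0.8],      [1],    [1, 1, 0]),  # image 1: 1 prediction, 3 gts
]


def concat(parts):
    """Join the per-image pieces of one field into a single dataset-level list."""
    out = []
    for part in parts:
        out.extend(part)
    return out


# zip(*stats) regroups the tuples field by field: all correct, all conf, all pcls, all tcls
correct, conf, pcls, tcls = [concat(x) for x in zip(*stats)]
print(len(correct), len(conf), len(pcls), len(tcls))
```

The first three results have one entry per prediction in the dataset, while tcls has one entry per gt box, which is why its length can differ from the others.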

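2.2 Metric computation

Following this part requires a solid understanding of how object detection is evaluated; for details see reference 3: 目标检测中的评估指标：PR曲线、AP、mAP

Getting the IoU information of matched predictions

For the predicted boxes of each image, the ones that can be matched to a gt box are selected and compared against 10 IoU thresholds from 0.5 to 0.95. If the IoU of a matched prediction is at least the corresponding threshold, that position is set to True, otherwise False; predictions without a match are False everywhere.

Why filter? A gt box belongs to exactly one class and should be credited to at most one prediction, so only predictions of the same class with an IoU of at least 0.5 are candidates. A prediction may still overlap several gt boxes; it is assigned the gt with the highest IoU, and each gt in turn keeps a single prediction, which yields a one-to-one pairing.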

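# This function is the key part
# Purpose 1: match predictions to gt boxes one-to-one
# Purpose 2: for each matched prediction, mark with True the IoU thresholds it passes; the rows of unmatched predictions stay all False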
def process_batch(detections, labels, iouv):
    """
    Return correct predictions matrix. Both sets of boxes are in (x1, y1, x2, y2) format.
    Arguments:
        detections (Array[N, 6]), x1, y1, x2, y2, conf, class
        labels (Array[M, 5]), class, x1, y1, x2, y2
    Returns:
        correct (Array[N, 10]), for 10 IoU levels
    """
    # Build an all-False matrix of shape [pred_nums, 10]
    correct = torch.zeros(detections.shape[0], iouv.shape[0], dtype=torch.bool, device=iouv.device)
    # IoU between every gt and every prediction, shape: [gt_nums, pred_nums]
    iou = box_iou(labels[:, 1:], detections[:, :4])
    # iou >= iouv[0]: keep the pairs whose IoU is at least 0.5, shape: [gt_nums, pred_nums]
    # labels[:, 0:1] == detections[:, 5]: table of whether the predicted class equals the gt class, shape: [gt_nums, pred_nums]
    # Only positions satisfying both conditions are True; torch.where returns their row and column indices as a tuple of two tensors
    # Each (x[0][i], x[1][i]) is a (gt, prediction) pair that satisfies the conditions
    x = torch.where((iou >= iouv[0]) & (labels[:, 0:1] == detections[:, 5]))  # IoU above threshold and classes match
    # If any pair satisfies the conditions
    if x[0].shape[0]:
        # Stack them into a matrix: column 0 is the row index (gt index), column 1 the column index (prediction index), column 2 the IoU
        matches = torch.cat((torch.stack(x, 1), iou[x[0], x[1]][:, None]), 1).cpu().numpy()  # [label, detection, iou]
        if x[0].shape[0] > 1:
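            # argsort gives ascending order; [::-1] reverses it, so matches is sorted by IoU in descending order
            matches = matches[matches[:, 2].argsort()[::-1]]
            # With return_index=True, np.unique returns the unique values in [0] and the index of each first occurrence in [1]
            # matches[:, 1]: deduplicate by prediction index; because of the descending sort, the first occurrence is the highest-IoU row
            # Meaning: each prediction appears at most once; if it overlaps several gt boxes, only the one with the highest IoU is kept
            matches = matches[np.unique(matches[:, 1], return_index=True)[1]]
            # matches = matches[matches[:, 2].argsort()[::-1]]
            # matches[:, 0]: deduplicate by gt index, keeping the first occurrence for each gt
            # Meaning: each gt also appears at most once. Because the re-sort above is commented out, the rows are now ordered by
            # prediction index, so the kept row is the lowest-index prediction (usually the most confident one, since NMS output
            # is sorted by confidence), not necessarily the one with the highest IoU
            matches = matches[np.unique(matches[:, 0], return_index=True)[1]]
        # The steps above give a one-to-one assignment between gt boxes and same-class predictions
        matches = torch.Tensor(matches).to(iouv.device)
        # With the one-to-one pairs in hand, the IoU of each pair is compared against iouv to build the evaluation matrix
        # Note that matches[:, 1] indexes the matched predictions: each gets a row saying which IoU thresholds in iouv it reaches
        correct[matches[:, 1].long()] = matches[:, 2:3] >= iouv
    # In correct, only predictions matched to a gt can have True entries; the (usually many) unmatched predictions are all False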
    return correct
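
The matching can be traced on a toy case. The following is a pure-Python sketch written for this article, not the YOLOv5 implementation: box_iou_pair and match are hypothetical names, and the two dictionary passes imitate my reading of the two np.unique(..., return_index=True) calls (keep the first occurrence, result ordered by the key). The boxes are made up.

```python
def box_iou_pair(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match(detections, labels, iouv):
    """detections: (x1, y1, x2, y2, conf, cls) sorted by conf; labels: (cls, x1, y1, x2, y2)."""
    correct = [[False] * len(iouv) for _ in detections]
    # Candidate pairs: same class and IoU at least the lowest threshold
    matches = [(li, di, box_iou_pair(lab[1:], det[:4]))
               for li, lab in enumerate(labels)
               for di, det in enumerate(detections)
               if lab[0] == det[5] and box_iou_pair(lab[1:], det[:4]) >= iouv[0]]
    matches.sort(key=lambda m: m[2], reverse=True)       # highest IoU first
    per_det = {}
    for m in matches:                                    # first occurrence per detection
        per_det.setdefault(m[1], m)
    matches = [per_det[d] for d in sorted(per_det)]      # now ordered by detection index
    per_lab = {}
    for m in matches:                                    # first occurrence per gt
        per_lab.setdefault(m[0], m)
    for li, di, iou in per_lab.values():
        correct[di] = [iou >= t for t in iouv]
    return correct


iouv = [round(0.5 + 0.05 * i, 2) for i in range(10)]
labels = [(0, 0.0, 0.0, 10.0, 10.0)]                     # one gt of class 0
detections = [
    (0.0, 0.0, 10.0, 7.8, 0.9, 0),     # IoU 0.78, highest confidence
    (0.0, 0.0, 10.0, 10.0, 0.8, 0),    # IoU 1.00, lower confidence
    (20.0, 20.0, 30.0, 30.0, 0.7, 0),  # no overlap
]
correct = match(detections, labels, iouv)
print([sum(row) for row in correct])   # number of IoU thresholds passed per detection
```

In this toy case the gt is credited to the first, more confident detection (six thresholds passed, 0.50 to 0.75), while the second detection ends up as a false positive at every threshold despite its higher IoU.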

Call site:

            # Evaluate
            if nl:
                # Convert to xyxy format
                tbox = xywh2xyxy(labels[:, 1:5])  # target boxes
                # Map the gt boxes to the original image size
                scale_coords(img[si].shape[1:], tbox, shape, shapes[si][1])  # native-space labels
                # Rebuild the labels in (cls, xyxy) format
                labelsn = torch.cat((labels[:, 0:1], tbox), 1)  # native-space labels
                # Match predictions to gt one-to-one and record, for each matched prediction, which IoU thresholds it passes; unmatched predictions stay False
                correct = process_batch(predn, labelsn, iouv)
                if plots:
                    confusion_matrix.process_batch(predn, labelsn)
            else:
                correct = torch.zeros(pred.shape[0], niou, dtype=torch.bool)

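Computing mAP and the related metrics

This step uses the evaluation matrices of all predictions in all images, together with each prediction's confidence and predicted class and the gt classes, to compute mAP and the other metrics.

Call site:

    # stats[0].any(): False if stats[0] is all False, True if at least one entry is True
    # If stats[0] is all False, no prediction reached even the lowest IoU threshold of 0.5
    if len(stats) and stats[0].any():
        # Compute p, r, ap, f1, ap_class from the statistics above (ap_per_class computes the per-class metrics)
        # p: [nc] per-class precision at the confidence that maximizes the mean F1
        # r: [nc] per-class recall at the confidence that maximizes the mean F1
        # ap: [nc, 10] per-class AP at the 10 IoU thresholds
        # f1 [nc] per-class F1 at the confidence that maximizes the mean F1
        # ap_class: [nc] indices of the classes present in the dataset
        p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
        # ap50: [nc] per-class AP@0.5   ap: [nc] per-class AP@0.5:0.95
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        # mp: [1] mean precision over classes (at max F1)
        # mr: [1] mean recall over classes (at max F1)
        # map50: [1] mean over classes of AP@0.5, i.e. mAP@0.5
        # map: [1] mean over classes of AP@0.5:0.95, i.e. mAP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        # nt: number of gt boxes per class over the whole dataset
        nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class
    else:
        nt = torch.zeros(1)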

How mAP and the other metrics are actually computed

# Compute all the related metrics
def ap_per_class(tp, conf, pred_cls, target_cls, plot=False, save_dir='.', names=()):
    """ Compute the average precision, given the recall and precision curves.
    Source: https://github.com/rafaelpadilla/Object-Detection-Metrics.
    # Arguments
        tp:  True positives (nparray, nx1 or nx10).
        conf:  Objectness value from 0-1 (nparray).
        pred_cls:  Predicted object classes (nparray).
        target_cls:  True object classes (nparray).
        plot:  Plot precision-recall curve at mAP@0.5
        save_dir:  Plot save directory
    # Returns
        The average precision as computed in py-faster-rcnn.
    """
    # Sort by objectness
    i = np.argsort(-conf)   # indices that sort conf in descending order
    tp, conf, pred_cls = tp[i], conf[i], pred_cls[i]   # reorder tp, conf, pred_cls accordingly
    # Find unique classes: deduplicate the classes, since AP is computed per class
    unique_classes = np.unique(target_cls)
    nc = unique_classes.shape[0]  # number of classes, number of detections
    # Create Precision-Recall curve and compute AP for each class
    px, py = np.linspace(0, 1, 1000), []  # for plotting
    ap, p, r = np.zeros((nc, tp.shape[1])), np.zeros((nc, 1000)), np.zeros((nc, 1000))
    # Loop over every class
    for ci, c in enumerate(unique_classes):
        # i: boolean mask over all predictions, True where the predicted class is c
        i = pred_cls == c
        # n_l: number of gt boxes of class c
        n_l = (target_cls == c).sum()  # number of labels
        # n_p: number of predictions of class c
        n_p = i.sum()  # number of predictions
        # Skip class c if it has no predictions or no ground-truth labels
        if n_p == 0 or n_l == 0:
            continue
        else:
            # Accumulate FPs and TPs
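            # tp[i] keeps only the rows where i is True, i.e. the predictions of class c
            #       e.g. tp=[0,1,0,1] i=[True,False,False,True] b=tp[i]  => b=[0,1]
            # a.cumsum(0) accumulates along axis 0
            # 1-D example: a=[0,1,0,1]  b = a.cumsum(0) => b=[0,1,1,2]   for 2-D input each column is accumulated separately
            # fpc: for class c, in descending confidence order, the FP count up to each prediction at each IoU threshold; the last row is the total FP count of class c
            # tpc: for class c, in descending confidence order, the TP count up to each prediction at each IoU threshold; the last row is the total TP count of class c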
            fpc = (1 - tp[i]).cumsum(0)  # fp[i] = 1 - tp[i]
            tpc = tp[i].cumsum(0)
            # Recall
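            # Recall=TP/(TP+FN)  the 1e-16 guards against a zero denominator
            # n_l=TP+FN=num_gt: number of gt boxes of class c = gt boxes that were detected (TP) + gt boxes that were missed (FN)
            # recall: for class c, in descending confidence order, the recall up to each prediction at each IoU threshold
            recall = tpc / (n_l + 1e-16)  # recall curve
            # r holds, for every class, the recall sampled at the confidence values px (1000 points in [0, 1])  r=[nc, 1000]
            # r has shape [cls_nums, 1000]; the interpolation puts every class on the same grid. Each column is a confidence value (the IoU threshold is fixed at 0.5)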
            r[ci] = np.interp(-px, -conf[i], recall[:, 0], left=0)  # negative x, xp because xp decreases
            # Precision
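            # Precision=TP/(TP+FP)
            # precision: for class c, in descending confidence order, the precision up to each prediction at each IoU threshold
            precision = tpc / (tpc + fpc)  # precision curve
            # p holds, for every class, the precision sampled at the confidence values px (1000 points in [0, 1])  p=[nc, 1000]
            # p also has shape [cls_nums, 1000]; the interpolation puts every class on the same grid. Each column is a confidence value
            p[ci] = np.interp(-px, -conf[i], precision[:, 0], left=1)  # p at pr_score
            # These recall and precision curves are computed at the IoU threshold of 0.5, because the interpolation uses recall[:, 0] and precision[:, 0]
            # After interpolation r:[nc, 1000], p:[nc, 1000]
            # AP from recall-precision curve
            # For class c, compute the AP at each IoU threshold (0.5~0.95, 10 values)
            for j in range(tp.shape[1]):
                # Runs 10 times, once per IoU threshold, filling ap[ci, j]  (ap: [nc, 10])
                # For the current class, AP is the area under the precision-recall curve at that threshold (an irregular area, computed as a definite integral)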
                ap[ci, j], mpre, mrec = compute_ap(recall[:, j], precision[:, j])
                if plot and j == 0:
                    py.append(np.interp(px, mrec, mpre))  # precision at mAP@0.5
    # Compute F1 (harmonic mean of precision and recall)
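    # F1 score: the harmonic mean of P and R, a combined metric
    # We want both P and R to be high, but they usually trade off: a higher P tends to mean a lower R and vice versa, hence the combined F1 metric
    # Different tasks weigh them differently: some favor P, some favor R, some want both high, which is when F1 is used
    # f1 holds, for every class, the F1 sampled at the confidence values px (1000 points in [0, 1])  f1=[nc, 1000]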
    f1 = 2 * p * r / (p + r + 1e-16)
    if plot:
        plot_pr_curve(px, py, ap, Path(save_dir) / 'PR_curve.png', names)
        plot_mc_curve(px, f1, Path(save_dir) / 'F1_curve.png', names, ylabel='F1')
        plot_mc_curve(px, p, Path(save_dir) / 'P_curve.png', names, ylabel='Precision')
        plot_mc_curve(px, r, Path(save_dir) / 'R_curve.png', names, ylabel='Recall')
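    # f1=[nc, 1000]   f1.mean(0)=[1000] mean F1 over classes at each confidence point on the x axis
    # .argmax(): index of the confidence point with the highest mean F1
    i = f1.mean(0).argmax()  # max F1 index
    # p=[nc, 1000] precision of each class at each confidence value on the x axis
    # p[:, i]: [nc] per-class precision at the confidence that maximizes the mean F1
    # r[:, i]: [nc] per-class recall at the confidence that maximizes the mean F1
    # f1[:, i]: [nc] per-class F1 at the confidence that maximizes the mean F1
    # ap: [nc, 10] per-class AP at the 10 IoU thresholds
    # unique_classes.astype('int32'): [nc] indices of the classes present in the dataset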
    return p[:, i], r[:, i], ap, f1[:, i], unique_classes.astype('int32')
# Computes the AP metric
def compute_ap(recall, precision):
    """ Compute the average precision, given the recall and precision curves
    # Arguments
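        recall:    The recall curve (list): recall of all predictions of one class at one IoU threshold, ascending
                    (each value is the cumulative recall up to that prediction)
        precision: The precision curve (list): precision of all predictions of one class at one IoU threshold
                    (each value is the cumulative precision up to that prediction)
    # Returns
        Average precision, precision curve, recall curve
            ap: AP of one class at one IoU threshold
            mpre: precision curve with sentinel values added at both ends [1, ..., 0]
            mrec: recall curve with sentinel values added at both ends [0, ..., 1]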
    """
    # Append sentinel values to beginning and end
    # Add sentinel values at the beginning and end so the curve covers the full recall range from 0 to 1
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([1.0], precision, [0.0]))
    # Compute the precision envelope
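    # np.flip(mpre): reverse the 1-D array, so the first element becomes the last
    # np.maximum.accumulate(np.flip(mpre)): running maximum of the reversed array, which is non-decreasing
    # np.flip(np.maximum.accumulate(np.flip(mpre))): flipped back, so non-increasing
    # Goal: make mpre monotonically non-increasing (neighbors may be equal)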
    mpre = np.flip(np.maximum.accumulate(np.flip(mpre)))
    # Integrate area under curve
    method = 'interp'  # methods: 'continuous', 'interp'
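    if method == 'interp':     # sample the curve at fixed points to compute AP (numerical integration)
        x = np.linspace(0, 1, 101)  # 101-point interp (COCO)
        #  np.trapz(y, x) sums the areas of the trapezoids between consecutive points, estimating AP as a definite integral; the first argument is y, the second is x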
        ap = np.trapz(np.interp(x, mrec, mpre), x)  # integrate
    else:  # 'continuous'
        i = np.where(mrec[1:] != mrec[:-1])[0]  # points where x axis (recall) changes
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])  # area under curve
    return ap, mpre, mrec
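
To see the 101-point integration at work without NumPy, here is a pure-Python sketch. It is a simplified stand-in written for this article: interp and compute_ap_sketch are hypothetical names, interp only approximates what np.interp does for non-decreasing x values, and the trapezoid sum replaces np.trapz. The recall and precision values are made up.

```python
from bisect import bisect_right


def interp(x, xp, fp):
    """Piecewise-linear lookup on non-decreasing xp (simplified stand-in for np.interp)."""
    if x <= xp[0]:
        return fp[0]
    j = bisect_right(xp, x) - 1                 # last index with xp[j] <= x
    if j >= len(xp) - 1 or xp[j] == x:
        return fp[j]
    t = (x - xp[j]) / (xp[j + 1] - xp[j])
    return fp[j] + t * (fp[j + 1] - fp[j])


def compute_ap_sketch(recall, precision):
    mrec = [0.0] + list(recall) + [1.0]         # sentinels at both ends
    mpre = [1.0] + list(precision) + [0.0]
    for k in range(len(mpre) - 2, -1, -1):      # precision envelope, right to left
        mpre[k] = max(mpre[k], mpre[k + 1])
    xs = [i / 100 for i in range(101)]          # 101 recall points
    ys = [interp(x, mrec, mpre) for x in xs]
    # Trapezoidal rule over the sampled curve
    return sum((xs[k + 1] - xs[k]) * (ys[k] + ys[k + 1]) / 2 for k in range(100))


# TP, FP, TP for a class with 2 gt boxes
ap = compute_ap_sketch([0.5, 0.5, 1.0], [1.0, 0.5, 2 / 3])
# A detector that finds both gt boxes with no false positives
ap_perfect = compute_ap_sketch([0.5, 1.0], [1.0, 1.0])
print(round(ap, 4), round(ap_perfect, 4))
```

In this sketch the trailing sentinel pulls the sampled precision at recall 1.0 down to 0, so even the perfect curve integrates to slightly less than 1.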

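Brief analysis:

To follow this metric code you need to know what the detection metrics are. AP is the area under the PR curve, which can be obtained as a definite integral; mAP is the mean of AP over classes. Reference article: 目标检测中的评估指标：PR曲线、AP、mAP

With that in mind, take another look at ap_per_class. In essence it takes the false positives and true positives of one class, that is, it uses the predictions whose class is c together with the number of gt boxes of class c, to compute recall and precision. Recall and precision have to be accumulated, because each point of the PR curve covers all predictions down to a given confidence; this is what cumsum is used for. The area under that curve gives the AP of each class at each IoU threshold, which is what compute_ap does.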
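
The accumulation can be checked by hand on a toy class. The sketch below uses itertools.accumulate in place of np.cumsum and a single IoU threshold; the tp vector and the gt count are made-up numbers.

```python
from itertools import accumulate

# Toy class c: 3 predictions sorted by descending confidence, 2 gt boxes
tp = [1, 0, 1]   # 1 = matched a gt at IoU 0.5, 0 = false positive
n_l = 2          # number of gt boxes of class c

tpc = list(accumulate(tp))                      # cumulative TP count
fpc = list(accumulate(1 - t for t in tp))       # cumulative FP count
recall = [t / (n_l + 1e-16) for t in tpc]       # TP / (TP + FN)
precision = [t / (t + f) for t, f in zip(tpc, fpc)]  # TP / (TP + FP)

print(tpc, fpc)
print(recall, [round(v, 3) for v in precision])
```

Each position k describes the detector if only the k+1 most confident predictions were kept, which is how one sorted list of predictions produces a whole precision-recall curve.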

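The function also returns one precision value and one recall value per class. In yolov5 the F1 of each class is built from the curves at iou=0.5. The precision, recall and F1 curves of every class are linearly interpolated onto 1000 confidence values in [0, 1]; on this grid the index i with the highest mean F1 is found, and the entries at index i of the recall and precision matrices are the per-class recall and precision that get reported.

One point that may be confusing is why the mapping onto 1000 points is needed. My feeling is that each class can still have many predictions after nms, and a different number per class, each with its own confidences; resampling onto a common grid unifies and simplifies this, so that the precision and recall obtained from the current predictions and gt boxes can be turned into F1 curves that are comparable across classes. The best F1 is then used to pick the point at which the predictions are evaluated.

Why is there a best F1? In general precision and recall cannot both be maximal; when one goes up the other tends to go down, so F1 peaks at some confidence in between.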
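
Picking the operating point can be illustrated with two classes on a five-point confidence grid. This is a sketch with made-up curve values (the real code uses 1000 points and NumPy arrays); only the selection logic mirrors f1.mean(0).argmax().

```python
# Precision and recall of 2 classes, sampled at 5 shared confidence values
px = [0.0, 0.25, 0.5, 0.75, 1.0]
p = [[0.5, 0.6, 0.8, 0.9, 1.0],
     [0.4, 0.5, 0.7, 0.9, 1.0]]
r = [[1.0, 0.9, 0.8, 0.5, 0.0],
     [0.9, 0.8, 0.6, 0.3, 0.0]]

# Per-class F1 at every confidence point
f1 = [[2 * pi * ri / (pi + ri + 1e-16) for pi, ri in zip(pc, rc)] for pc, rc in zip(p, r)]
# Mean F1 over classes at each confidence point, then the index of its maximum
mean_f1 = [sum(col) / len(col) for col in zip(*f1)]
best = max(range(len(px)), key=mean_f1.__getitem__)

print(px[best])                  # confidence with the highest mean F1
print([pc[best] for pc in p])    # per-class precision reported at that point
print([rc[best] for rc in r])    # per-class recall reported at that point
```

Because all classes share the same grid, a single index selects one precision and one recall value per class at the same confidence.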

        p, r, ap, f1, ap_class = ap_per_class(*stats, plot=plots, save_dir=save_dir, names=names)
        # ap50: [nc] per-class AP@0.5   ap: [nc] per-class AP@0.5:0.95
        ap50, ap = ap[:, 0], ap.mean(1)  # AP@0.5, AP@0.5:0.95
        # mp: [1] mean precision over classes (at max F1)
        # mr: [1] mean recall over classes (at max F1)
        # map50: [1] mean over classes of AP@0.5, i.e. mAP@0.5
        # map: [1] mean over classes of AP@0.5:0.95, i.e. mAP@0.5:0.95
        mp, mr, map50, map = p.mean(), r.mean(), ap50.mean(), ap.mean()
        # nt: number of gt boxes per class over the whole dataset
        nt = np.bincount(stats[3].astype(np.int64), minlength=nc)  # number of targets per class

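The returned values make this clear:

    # p=[nc, 1000] precision of each class at each confidence value on the x axis
    # p[:, i]: [nc] per-class precision at the confidence that maximizes the mean F1
    # r[:, i]: [nc] per-class recall at the confidence that maximizes the mean F1
    # f1[:, i]: [nc] per-class F1 at the confidence that maximizes the mean F1
    # ap: [nc, 10] per-class AP at the 10 IoU thresholds
    # unique_classes.astype('int32'): [nc] indices of the classes present in the dataset
    return p[:, i], r[:, i], ap, f1[:, i], unique_classes.astype('int32')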

What remains is just printing the metrics:

# Print results
    pf = '%20s' + '%11i' * 2 + '%11.3g' * 4  # print format
    print(pf % ('all', seen, nt.sum(), mp, mr, map50, map))
    # Print results per class
    if (verbose or (nc < 50 and not training)) and nc > 1 and len(stats):
        for i, c in enumerate(ap_class):
            print(pf % (names[c], seen, nt[c], p[i], r[i], ap50[i], ap[i]))

2.3 Saving results

# Save predictions to a txt file
def save_one_txt(predn, save_conf, shape, file):
    # Save one txt result
    # gn = [w, h, w, h] of the image, used for normalization below
    gn = torch.tensor(shape)[[1, 0, 1, 0]]  # normalization gain whwh
    for *xyxy, conf, cls in predn.tolist():
        # xyxy -> xywh, then normalize
        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
        line = (cls, *xywh, conf) if save_conf else (cls, *xywh)  # label format
        # Append the predicted class and coordinates to this image's image_name.txt
        with open(file, 'a') as f:
            f.write(('%g ' * len(line)).rstrip() % line + '\n')
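# Append predictions to the COCO-format JSON list
def save_one_json(predn, jdict, path, class_map):
    # Save one JSON result {"image_id": 42, "category_id": 18, "bbox": [258.15, 41.29, 348.26, 243.78], "score": 0.236}
    # Get the image id
    image_id = int(path.stem) if path.stem.isnumeric() else path.stem
    # Get the predicted boxes and convert xyxy to xywh
    box = xyxy2xywh(predn[:, :4])  # xywh
    # The xyxy format is (top-left corner, bottom-right corner); xywh is (center x, center y, width, height)
    # COCO json boxes are xywh with (top-left x, top-left y, width, height)
    # so this line converts the center to the top-left corner
    box[:, :2] -= box[:, 2:] / 2  # xy center to top-left corner
    # image_id: id of the image the box belongs to
    # category_id: class; for COCO, coco80_to_coco91_class() maps indices 0~79 to the ids of the 91-class COCO scheme
    # bbox: predicted box coordinates
    # score: prediction score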
    for p, b in zip(predn.tolist(), box.tolist()):
        jdict.append({'image_id': image_id,
                      'category_id': class_map[int(p[5])],
                      'bbox': [round(x, 3) for x in b],
                      'score': round(p[4], 5)})
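
The two box formats written by these functions differ only in the anchor point and in normalization. The sketch below works through one made-up box in plain Python; xyxy_to_center_xywh is a hypothetical helper standing in for xyxy2xywh, and the image size is an assumption chosen to give round numbers.

```python
def xyxy_to_center_xywh(box):
    """(x1, y1, x2, y2) -> (center x, center y, width, height)."""
    x1, y1, x2, y2 = box
    return [(x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1]


box = [100.0, 50.0, 300.0, 250.0]   # predicted box in pixels, xyxy
h, w = 500, 1000                    # image shape (height, width)

cx, cy, bw, bh = xyxy_to_center_xywh(box)
txt_xywh = [cx / w, cy / h, bw / w, bh / h]       # normalized center xywh, as in save_one_txt
coco_xywh = [cx - bw / 2, cy - bh / 2, bw, bh]    # top-left xywh in pixels, as in save_one_json

print(txt_xywh)
print(coco_xywh)
```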

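The code also implements a confusion matrix and plots of the related curves. I rarely look at those plots and the implementation is not very complicated, so they are not covered here.

The main requirement is a solid understanding of the evaluation metrics for object detection; see 目标检测中的评估指标：PR曲线、AP、mAP, which is what the code implements.

References:

1. 【YOLOV5-5.x 源码解读】val.py (YOLOv5-5.x source code walkthrough: val.py)

2. 【YOLOV5-5.x 源码解读】metrics.py (YOLOv5-5.x source code walkthrough: metrics.py)

3. 目标检测中的评估指标：PR曲线、AP、mAP (Evaluation metrics in object detection: PR curve, AP, mAP)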
