RetinaNet network

RetinaNet

https://zhuanlan.zhihu.com/p/346198300

What is an anchor

https://zhuanlan.zhihu.com/p/55824651

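Head module

Path: mmdetection/mmdet/models/dense_heads/retina_head.py

The head contains two subnetworks: a classification branch and an anchor-box regression branch. The two branches do not share weights with each other, but within each branch the same weights are applied to all five FPN output feature maps. Each branch is a five-layer convolutional network. The first four layers have 256 channels each; the last layer has self.num_anchors * self.cls_out_channels output channels for classification and self.num_anchors * 4 for regression.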

if self.use_sigmoid_cls:
    self.cls_out_channels = num_classes
else:
    self.cls_out_channels = num_classes + 1

mmcv.cnn.ConvModule (conv_module) bundles three layers: convolution, norm, and activation.
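The head structure described above can be sketched in plain PyTorch. This is a minimal illustration, not mmdetection's RetinaHead: the class name ToyRetinaHead, the 3x3 kernels, the ReLU activation, the missing norm layer, and the default of 9 anchors per location are all assumptions, and nn.Conv2d plus nn.ReLU stand in for ConvModule.

```python
import torch
from torch import nn


class ToyRetinaHead(nn.Module):
    """Illustrative stand-in for the head in these notes (hypothetical class)."""

    def __init__(self, num_classes, in_channels=256, feat_channels=256,
                 num_anchors=9, stacked_convs=4, use_sigmoid_cls=True):
        super().__init__()
        # Sigmoid classification needs no background channel; softmax adds one.
        self.cls_out_channels = num_classes if use_sigmoid_cls else num_classes + 1
        # Two separate towers: classification and regression do not share weights.
        self.cls_convs = self._tower(in_channels, feat_channels, stacked_convs)
        self.reg_convs = self._tower(in_channels, feat_channels, stacked_convs)
        # Fifth layer of each branch: per-anchor class scores / box deltas.
        self.retina_cls = nn.Conv2d(
            feat_channels, num_anchors * self.cls_out_channels, 3, padding=1)
        self.retina_reg = nn.Conv2d(feat_channels, num_anchors * 4, 3, padding=1)

    @staticmethod
    def _tower(in_channels, feat_channels, depth):
        layers = []
        for i in range(depth):
            chn = in_channels if i == 0 else feat_channels
            layers.append(nn.Conv2d(chn, feat_channels, 3, padding=1))
            layers.append(nn.ReLU(inplace=True))
        return nn.Sequential(*layers)

    def forward(self, feats):
        # The same modules, hence the same weights, run on every FPN level.
        cls_scores = [self.retina_cls(self.cls_convs(x)) for x in feats]
        bbox_preds = [self.retina_reg(self.reg_convs(x)) for x in feats]
        return cls_scores, bbox_preds


head = ToyRetinaHead(num_classes=80)
# Five fake FPN levels with made-up spatial sizes.
feats = [torch.randn(1, 256, s, s) for s in (32, 16, 8, 4, 2)]
cls_scores, bbox_preds = head(feats)
print(cls_scores[0].shape, bbox_preds[0].shape)
```

With 80 classes and sigmoid classification, each classification map has 9 * 80 = 720 channels and each regression map has 9 * 4 = 36 channels, whatever the spatial size of the level.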

BBox Assigner

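RetinaNet is an anchor-based algorithm, so before bboxes can be assigned, the list of anchors at every feature-map location has to be generated.

Configuration parameters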

anchor_generator=dict(
    type='AnchorGenerator',
    # base scale of the anchors
    octave_base_scale=4,
    # each feature map has anchors at three scales: 2 ** 0, 2 ** (1 / 3) = 1.2599, 2 ** (2 / 3) = 1.5874
    scales_per_octave=3,
    # each feature map has anchors with three aspect ratios
    ratios=[0.5, 1.0, 2.0],
    # stride of each feature map relative to the input image
    strides=[8, 16, 32, 64, 128]),

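This builds the anchor_generator object. For the 9 anchors generated at each location, the anchor size is octave_base_scale * stride, multiplied by the corresponding scale factor. The x coordinate increases from left to right and the y coordinate increases from top to bottom; the top-left visible pixel is at (0, 0).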
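The size rule above can be checked with a few lines of arithmetic. This is a sketch that only reuses the values from the config; the variable names octave_scales and anchor_sizes are made up for illustration, and the sizes are the side lengths of the square (ratio 1.0) anchors.

```python
octave_base_scale = 4
scales_per_octave = 3
strides = [8, 16, 32, 64, 128]

# Scale factors within one octave: 2**0, 2**(1/3), 2**(2/3).
octave_scales = [2 ** (i / scales_per_octave) for i in range(scales_per_octave)]

# Side length of the square anchor for each feature level and scale factor.
anchor_sizes = {
    stride: [octave_base_scale * stride * s for s in octave_scales]
    for stride in strides
}

for stride, sizes in anchor_sizes.items():
    print(stride, [round(v, 2) for v in sizes])
```

For stride 8 the smallest anchor is 4 * 8 = 32 pixels on a side, and for stride 128 it is 512 pixels.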

Anchor Generator

Code location: mmdet/core/anchor/anchor_generator.py

Core function: gen_single_level_base_anchors

def gen_single_level_base_anchors(self,
                                  base_size,
                                  scales,  # scale factors; these already include the octave_base_scale coefficient
                                  ratios,  # height-to-width ratios
                                  center=None):

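This generates 9 anchors in three groups; the anchors within a group share the same height-to-width ratio.

The _meshgrid function quickly turns 1-D x and y coordinates into 2-D index coordinates.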
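A pure-Python sketch of how such base anchors could be computed follows. It is not the mmdetection source: the helper name toy_base_anchors is hypothetical, the anchors are assumed to be centered at (0, 0), and each ratio is assumed to be applied so that the anchor area stays constant within a scale.

```python
import math


def toy_base_anchors(base_size, scales, ratios, center=(0.0, 0.0)):
    """Return [x1, y1, x2, y2] base anchors, grouped by ratio (hypothetical helper)."""
    x_c, y_c = center
    anchors = []
    for ratio in ratios:                 # ratio = height / width
        h_ratio = math.sqrt(ratio)       # split the ratio so that area is preserved
        w_ratio = 1.0 / h_ratio
        for scale in scales:
            w = base_size * w_ratio * scale
            h = base_size * h_ratio * scale
            anchors.append([x_c - 0.5 * w, y_c - 0.5 * h,
                            x_c + 0.5 * w, y_c + 0.5 * h])
    return anchors


# Scales already include octave_base_scale = 4, as noted in the signature comment.
scales = [4 * 2 ** (i / 3) for i in range(3)]
base_anchors = toy_base_anchors(base_size=8, scales=scales, ratios=[0.5, 1.0, 2.0])
for anchor in base_anchors:
    print([round(v, 2) for v in anchor])
```

The first three anchors have ratio 0.5, the next three ratio 1.0, and the last three ratio 2.0, which matches the three groups described above.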

import torch

x = torch.arange(2)
y = x.clone()
xx = x.repeat(y.shape[0])
yy = y.view(-1, 1).repeat(1, x.shape[0]).view(-1)
# xx: tensor([0, 1, 0, 1])
# yy: tensor([0, 0, 1, 1])
# Stack along the last dimension: one (x, y, x, y) shift per location.
# Here xx and yy play the role of shift_xx and shift_yy.
shifts = torch.stack([xx, yy, xx, yy], dim=-1)
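To show what the shifts are for, here is a pure-Python sketch that adds one (x, y, x, y) shift per feature-map location to every base anchor. The helper name toy_grid_anchors and the two base anchors are made up for illustration, and the shift is assumed to be the grid index multiplied by the stride.

```python
def toy_grid_anchors(base_anchors, feat_h, feat_w, stride):
    """Shift every base anchor to every feature-map location (hypothetical helper)."""
    anchors = []
    for iy in range(feat_h):             # y grows downward
        for ix in range(feat_w):         # x grows rightward and varies fastest
            sx, sy = ix * stride, iy * stride
            for x1, y1, x2, y2 in base_anchors:
                anchors.append([x1 + sx, y1 + sy, x2 + sx, y2 + sy])
    return anchors


# Two made-up base anchors centered at (0, 0).
base = [[-16.0, -16.0, 16.0, 16.0], [-8.0, -24.0, 8.0, 24.0]]
grid = toy_grid_anchors(base, feat_h=2, feat_w=2, stride=8)
print(len(grid))
```

The location order follows the xx / yy pattern above: (0, 0), (1, 0), (0, 1), (1, 1). The total is feat_h * feat_w * len(base_anchors) anchors.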

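The RetinaNet class in mmdetection

mmdetection has no special implementation for RetinaNet; the class simply passes its arguments to the parent class SingleStageDetector, which is implemented in mmdet/models/detectors/single_stage.py.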

class RetinaNet(SingleStageDetector):
    def __init__(self,
                 backbone,
                 neck,
                 bbox_head,
                 train_cfg=None,
                 test_cfg=None,
                 pretrained=None,
                 init_cfg=None):
        super(RetinaNet, self).__init__(backbone, neck, bbox_head, train_cfg,
                                        test_cfg, pretrained, init_cfg)
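The pattern of a subclass that only forwards its components to a generic parent can be sketched without mmdetection. Everything below is hypothetical: ToySingleStageDetector and ToyRetinaNet are illustrative names, the backbone, neck, and head are placeholder callables, and the backbone to neck to head order is an assumption about how a single-stage detector chains its parts.

```python
class ToySingleStageDetector:
    """Hypothetical stand-in for the parent class: backbone -> neck -> head."""

    def __init__(self, backbone, neck, bbox_head):
        self.backbone = backbone
        self.neck = neck
        self.bbox_head = bbox_head

    def forward(self, img):
        feats = self.backbone(img)
        if self.neck is not None:        # the neck is optional in this sketch
            feats = self.neck(feats)
        return self.bbox_head(feats)


class ToyRetinaNet(ToySingleStageDetector):
    """Adds no behavior; it only forwards its components to the parent."""

    def __init__(self, backbone, neck, bbox_head):
        super().__init__(backbone, neck, bbox_head)


# Placeholder callables that record the call order instead of doing real work.
detector = ToyRetinaNet(
    backbone=lambda img: [img + " > backbone"],
    neck=lambda feats: [f + " > neck" for f in feats],
    bbox_head=lambda feats: [f + " > head" for f in feats],
)
result = detector.forward("img")
print(result)
```

All model-specific behavior then comes from which backbone, neck, and head are plugged in, which is why the subclass body can stay empty apart from the constructor.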