New Year Food Appreciation: Food-101 Classification with the CBAM Attention Mechanism

Spring Festival is the grandest, most ceremonious holiday of the year, and its best meal is probably the New Year's Eve dinner. Exciting just to think about! As the New Year bells ring, the annual food photo contest is about to begin, so come see what's on the table!


I. Data Preprocessing

Dataset used in this project: the 101-category food dataset (Food-101)

# Unpack the dataset
!unzip -oq /home/aistudio/data/data70204/images.zip -d food

1. Dataset Introduction

The dataset covers the full set of 101 food categories. To provide a simple training set for image analysis that is more exciting than CIFAR-10 or MNIST, the data also includes heavily downscaled versions of the images for quick testing.

The 101 categories are listed below (indices start at 0):

{'apple_pie': 0, 'baby_back_ribs': 1, 'baklava': 2, 'beef_carpaccio': 3, 'beef_tartare': 4, 'beet_salad': 5, 'beignets': 6, 'bibimbap': 7, 'bread_pudding': 8, 'breakfast_burrito': 9, 'bruschetta': 10,

'caesar_salad': 11, 'cannoli': 12, 'caprese_salad': 13, 'carrot_cake': 14, 'ceviche': 15, 'cheesecake': 16, 'cheese_plate': 17, 'chicken_curry': 18, 'chicken_quesadilla': 19, 'chicken_wings': 20,

'chocolate_cake': 21, 'chocolate_mousse': 22, 'churros': 23, 'clam_chowder': 24, 'club_sandwich': 25, 'crab_cakes': 26, 'creme_brulee': 27, 'croque_madame': 28, 'cup_cakes': 29, 'deviled_eggs': 30,

'donuts': 31, 'dumplings': 32, 'edamame': 33, 'eggs_benedict': 34, 'escargots': 35, 'falafel': 36, 'filet_mignon': 37, 'fish_and_chips': 38, 'foie_gras': 39, 'french_fries': 40,

'french_onion_soup': 41, 'french_toast': 42, 'fried_calamari': 43, 'fried_rice': 44, 'frozen_yogurt': 45, 'garlic_bread': 46, 'gnocchi': 47, 'greek_salad': 48, 'grilled_cheese_sandwich': 49, 'grilled_salmon': 50,

'guacamole': 51, 'gyoza': 52, 'hamburger': 53, 'hot_and_sour_soup': 54, 'hot_dog': 55, 'huevos_rancheros': 56, 'hummus': 57, 'ice_cream': 58, 'lasagna': 59, 'lobster_bisque': 60,

'lobster_roll_sandwich': 61, 'macaroni_and_cheese': 62, 'macarons': 63, 'miso_soup': 64, 'mussels': 65, 'nachos': 66, 'omelette': 67, 'onion_rings': 68, 'oysters': 69, 'pad_thai': 70,

'paella': 71, 'pancakes': 72, 'panna_cotta': 73, 'peking_duck': 74, 'pho': 75, 'pizza': 76, 'pork_chop': 77, 'poutine': 78, 'prime_rib': 79, 'pulled_pork_sandwich': 80,

'ramen': 81, 'ravioli': 82, 'red_velvet_cake': 83, 'risotto': 84, 'samosa': 85, 'sashimi': 86, 'scallops': 87, 'seaweed_salad': 88, 'shrimp_and_grits': 89, 'spaghetti_bolognese': 90,

'spaghetti_carbonara': 91, 'spring_rolls': 92, 'steak': 93, 'strawberry_shortcake': 94, 'sushi': 95, 'tacos': 96, 'takoyaki': 97, 'tiramisu': 98, 'tuna_tartare': 99, 'waffles': 100}

From left to right: baklava (2), pizza (76), club sandwich (25)


2. Reading the Labels

Before doing any classification, we need to know how many classes there are. Since the model works with numbers rather than strings, each class name must be mapped to a unique integer.

txtpath = r"classes.txt"
# Read the class names, one per line
with open(txtpath) as fp:
    arr = [line.rstrip("\n") for line in fp]

# Map each class name to a unique integer id (0-based, in file order)
categorys = dict(zip(arr, range(len(arr))))
print(categorys)
{'apple_pie': 0, 'baby_back_ribs': 1, 'baklava': 2, 'beef_carpaccio': 3, 'beef_tartare': 4, 'beet_salad': 5, 'beignets': 6, 'bibimbap': 7, 'bread_pudding': 8, 'breakfast_burrito': 9, 'bruschetta': 10, 'caesar_salad': 11, 'cannoli': 12, 'caprese_salad': 13, 'carrot_cake': 14, 'ceviche': 15, 'cheesecake': 16, 'cheese_plate': 17, 'chicken_curry': 18, 'chicken_quesadilla': 19, 'chicken_wings': 20, 'chocolate_cake': 21, 'chocolate_mousse': 22, 'churros': 23, 'clam_chowder': 24, 'club_sandwich': 25, 'crab_cakes': 26, 'creme_brulee': 27, 'croque_madame': 28, 'cup_cakes': 29, 'deviled_eggs': 30, 'donuts': 31, 'dumplings': 32, 'edamame': 33, 'eggs_benedict': 34, 'escargots': 35, 'falafel': 36, 'filet_mignon': 37, 'fish_and_chips': 38, 'foie_gras': 39, 'french_fries': 40, 'french_onion_soup': 41, 'french_toast': 42, 'fried_calamari': 43, 'fried_rice': 44, 'frozen_yogurt': 45, 'garlic_bread': 46, 'gnocchi': 47, 'greek_salad': 48, 'grilled_cheese_sandwich': 49, 'grilled_salmon': 50, 'guacamole': 51, 'gyoza': 52, 'hamburger': 53, 'hot_and_sour_soup': 54, 'hot_dog': 55, 'huevos_rancheros': 56, 'hummus': 57, 'ice_cream': 58, 'lasagna': 59, 'lobster_bisque': 60, 'lobster_roll_sandwich': 61, 'macaroni_and_cheese': 62, 'macarons': 63, 'miso_soup': 64, 'mussels': 65, 'nachos': 66, 'omelette': 67, 'onion_rings': 68, 'oysters': 69, 'pad_thai': 70, 'paella': 71, 'pancakes': 72, 'panna_cotta': 73, 'peking_duck': 74, 'pho': 75, 'pizza': 76, 'pork_chop': 77, 'poutine': 78, 'prime_rib': 79, 'pulled_pork_sandwich': 80, 'ramen': 81, 'ravioli': 82, 'red_velvet_cake': 83, 'risotto': 84, 'samosa': 85, 'sashimi': 86, 'scallops': 87, 'seaweed_salad': 88, 'shrimp_and_grits': 89, 'spaghetti_bolognese': 90, 'spaghetti_carbonara': 91, 'spring_rolls': 92, 'steak': 93, 'strawberry_shortcake': 94, 'sushi': 95, 'tacos': 96, 'takoyaki': 97, 'tiramisu': 98, 'tuna_tartare': 99, 'waffles': 100}
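Later, when displaying predictions, it helps to map a predicted index back to its name. A minimal sketch (the helper name id_to_name is our own, not from the original code):

# Inverse mapping: class id -> class name
id_to_name = {idx: name for name, idx in categorys.items()}
print(id_to_name[76])  # pizza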

3. Unified Naming

Giving the images uniform names makes the dataset easier to inspect.

# Collect the images into one folder and rename them consistently
import os
from PIL import Image

if not os.path.exists("temporary"):
    os.mkdir("temporary")

for category in arr:
    # Folder holding this category's images
    path = r"food/{}/".format(category)
    count = 0
    for filename in os.listdir(path):
        img = Image.open(path + filename)
        img = img.resize((512, 512), Image.ANTIALIAS)  # resize every image to 512x512
        img = img.convert('RGB')  # needed when saving as .jpg
        img.save(r"temporary/{}{}.jpg".format(category, str(count)))
        count += 1

4. Collecting Image Paths

Collecting the image paths makes it easy to feed the images into the neural network.

# Collect image paths together with their labels
import os
import string

train_list = open('train_list.txt', mode='w')
paths = r'temporary/'
# List the file names under the given directory
dirs = os.listdir(paths)
# Iterate over the images in that directory
for path in dirs:
    # Build the full image path
    imgPath = paths + path
    train_list.write(imgPath + '\t')
    for category in categorys:
        if category == path.replace(".jpg", "").rstrip(string.digits):
            train_list.write(str(categorys[category]) + '\n')
train_list.close()
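Each line of train_list.txt now holds an image path and its numeric label separated by a tab. Illustratively (the exact order depends on os.listdir), lines look like:

temporary/apple_pie0.jpg	0
temporary/baklava3.jpg	2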

5. Splitting Training and Validation Sets

The validation set is used to check whether the model overfits. Here one of every five images is held out for validation, giving a 4:1 train/validation split.

# Split into training and validation sets
import shutil

train_dir = '/home/aistudio/work/trainImages'
eval_dir = '/home/aistudio/work/evalImages'
train_list_path = '/home/aistudio/train_list.txt'
target_path = "/home/aistudio/"

if not os.path.exists(train_dir):
    os.mkdir(train_dir)
if not os.path.exists(eval_dir):
    os.mkdir(eval_dir) 

with open(train_list_path, 'r') as f:
    data = f.readlines()
    for i in range(len(data)):
        img_path = data[i].split('\t')[0]
        class_label = data[i].split('\t')[1][:-1]
        if i % 5 == 0:  # every fifth image goes to the validation set
            eval_target_dir = os.path.join(eval_dir, str(class_label)) 
            eval_img_path = os.path.join(target_path, img_path)
            if not os.path.exists(eval_target_dir):
                os.mkdir(eval_target_dir)  
            shutil.copy(eval_img_path, eval_target_dir)                         
        else:
            train_target_dir = os.path.join(train_dir, str(class_label)) 
            train_img_path = os.path.join(target_path, img_path)                     
            if not os.path.exists(train_target_dir):
                os.mkdir(train_target_dir)
            shutil.copy(train_img_path, train_target_dir) 

    print('Train/validation split finished!')
Train/validation split finished!

6. Defining the Food Dataset

A crucial step in classification tasks is normalization. Normalizing with mean 127.5 and std 127.5 maps pixel values from the range 0 to 255 into the range -1 to 1, which greatly helps the subsequent network. Without normalization, the network may fail to learn anything and produce identical outputs for every input.
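As a quick check of that mapping: the Normalize transform used below computes (x - mean) / std with mean = std = 127.5, so:

# endpoints of the pixel range before and after normalization
for x in (0, 127.5, 255):
    print(x, '->', (x - 127.5) / 127.5)  # 0 -> -1.0, 127.5 -> 0.0, 255 -> 1.0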

import os
import numpy as np
import paddle
from paddle.io import Dataset
from paddle.vision.datasets import DatasetFolder, ImageFolder
from paddle.vision.transforms import Compose, Resize, BrightnessTransform, ColorJitter, Normalize, Transpose

class FoodsDataset(Dataset):
    """
    Step 1: inherit from paddle.io.Dataset
    """
    def __init__(self, mode='train'):
        """
        Step 2: implement the constructor, defining how data is read and setting up the train/eval/test splits
        """
        super(FoodsDataset, self).__init__()
        train_image_dir = '/home/aistudio/work/trainImages'
        eval_image_dir = '/home/aistudio/work/evalImages'
        test_image_dir = '/home/aistudio/work/evalImages'

        transform_train = Compose([Normalize(mean=[127.5, 127.5, 127.5],std=[127.5, 127.5, 127.5],data_format='HWC'), Transpose()])
        transform_eval = Compose([Normalize(mean=[127.5, 127.5, 127.5],std=[127.5, 127.5, 127.5],data_format='HWC'), Transpose()])
        train_data_folder = DatasetFolder(train_image_dir, transform=transform_train)
        eval_data_folder = DatasetFolder(eval_image_dir, transform=transform_eval)
        test_data_folder = DatasetFolder(test_image_dir)
        self.mode = mode
        if self.mode  == 'train':
            self.data = train_data_folder
        elif self.mode  == 'eval':
            self.data = eval_data_folder
        elif self.mode  == 'test':
            self.data = test_data_folder

    def __getitem__(self, index):
        """
        Step 3: implement __getitem__, defining how to fetch the sample at a given index and returning a single (image, label) pair
        """
        data = np.array(self.data[index][0]).astype('float32')

        if self.mode  == 'test':
            return data
        else:
            label = np.array([self.data[index][1]]).astype('int64')

            return data, label

    def __len__(self):
        """
        Step 4: implement __len__, returning the total number of samples
        """
        return len(self.data)

train_dataset = FoodsDataset(mode='train')
val_dataset = FoodsDataset(mode='eval')
test_dataset = FoodsDataset(mode='test')
# Inspect the training data: 80,800 training samples in total
print(len(train_dataset))
80800

II. Attention Mechanisms

The attention mechanism (Attention Mechanism) has its roots in the study of human vision. In cognitive science, because of bottlenecks in information processing, humans selectively attend to a portion of the available information while ignoring the rest of what is visible; this is what we call attention. Different regions of the human retina process information with different acuity, and only the fovea at the center has the highest acuity. To make rational use of limited visual processing resources, humans select a specific part of the visual field and concentrate on it. For example, when reading, people usually attend to and process only a small number of the words to be read. In summary, attention involves two things: deciding which part of the input deserves attention, and allocating the limited processing resources to those important parts.

1. Overview

Human vision quickly scans the whole image to find the target region worth focusing on, commonly called the focus of attention, then devotes more attention to that region to extract finer detail about the target while suppressing irrelevant information.

This is how humans use limited attention to quickly filter high-value information out of a flood of input, a survival mechanism shaped by long-term evolution, and it greatly improves both the efficiency and the accuracy of visual processing.

A classic visualization of this shows how people allocate limited attention when viewing an image, with red regions marking what the visual system attends to most: in a typical scene, people direct attention to faces, to the title of a piece of text, and to the opening sentence of an article.

Attention in deep learning is essentially similar to human selective visual attention: the core goal is to pick out, from a mass of information, the pieces most relevant to the task at hand.

2. The Convolutional Block Attention Module (CBAM)

CBAM is short for Convolutional Block Attention Module. Paper: https://arxiv.org/abs/1807.06521

The CBAM block contains two sub-modules:

  • Channel Attention Module (CAM)
  • Spatial Attention Module (SAM)

CBAM is a simple yet effective attention module for feed-forward convolutional networks. Given an intermediate feature map, CBAM sequentially infers attention maps along two independent dimensions, channel and spatial, and multiplies them with the input feature map for adaptive feature refinement. Because CBAM is a lightweight, general-purpose module, it can be integrated seamlessly into any CNN architecture with negligible overhead and trained end to end together with the base CNN.
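In the paper's notation, given an input feature map F, CBAM refines it in two sequential steps:

F'  = Mc(F) ⊗ F
F'' = Ms(F') ⊗ F'

where Mc and Ms are the channel and spatial attention maps and ⊗ denotes element-wise multiplication (broadcast over the missing dimensions).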

The authors validated CBAM with extensive experiments on ImageNet-1K and on the MS COCO and VOC 2007 detection datasets, showing consistent improvements in both classification and detection across a variety of models and demonstrating CBAM's broad applicability.

Channel Attention Module (CAM)

The channel attention module squeezes the feature map along its spatial dimensions to obtain a one-dimensional vector, which is then processed further. Both average pooling and max pooling are used in this spatial squeeze.

Average pooling and max pooling aggregate the spatial information of the feature map; the two pooled descriptors are then sent through a shared network and merged by element-wise summation to produce the channel attention map.

For a single image, channel attention is about which content in the image actually matters. Average pooling gives every pixel of the feature map some feedback, while with max pooling only the location with the strongest response receives a gradient during backpropagation.
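Putting this together, the paper defines the channel attention map as

Mc(F) = σ(MLP(AvgPool(F)) + MLP(MaxPool(F)))

where σ is the sigmoid function and the MLP (fc1 -> ReLU -> fc2 in the code below) is shared between the two pooled descriptors.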

Implementing CAM in code:

import paddle
from paddle import nn

class CAM_Module(nn.Layer):
    def __init__(self, channels, reduction=16):
        super(CAM_Module, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2D(output_size=1)
        self.max_pool = nn.AdaptiveMaxPool2D(output_size=1)
        # Shared MLP, implemented as 1x1 convolutions with a channel reduction ratio
        self.fc1 = nn.Conv2D(in_channels=channels, out_channels=channels // reduction, kernel_size=1, padding=0)
        self.relu = nn.ReLU()
        self.fc2 = nn.Conv2D(in_channels=channels // reduction, out_channels=channels, kernel_size=1, padding=0)
        self.sigmoid_channel = nn.Sigmoid()

    def forward(self, x):
        # Channel Attention: shared MLP over avg- and max-pooled descriptors, summed, sigmoid
        avg = self.relu(self.fc1(self.avg_pool(x)))
        avg = self.fc2(avg)
        mx = self.relu(self.fc1(self.max_pool(x)))
        mx = self.fc2(mx)
        x = avg + mx
        x = self.sigmoid_channel(x)

        return x

Spatial Attention Module (SAM)

The spatial attention module squeezes the channel dimension, applying average pooling and max pooling along the channel axis:

  • MaxPool takes the maximum across channels at each spatial position, i.e. H × W times;
  • AvgPool likewise takes the channel-wise mean at each of the H × W positions.

The two resulting single-channel maps are then concatenated into a 2-channel feature map.
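In the paper's notation, the spatial attention map is

Ms(F) = σ(f^{7x7}([AvgPool(F); MaxPool(F)]))

where f^{7x7} is a 7×7 convolution and [·;·] is concatenation along the channel axis.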

Implementation:

import paddle
from paddle import nn

class SAM_Module(nn.Layer):
    def __init__(self):
        super(SAM_Module, self).__init__()
        self.conv_after_concat = nn.Conv2D(in_channels=2, out_channels=1, kernel_size=7, stride=1, padding=3)
        self.sigmoid_spatial = nn.Sigmoid()

    def forward(self, x):
        # Spatial Attention: channel-wise mean and max, concatenated and convolved
        module_input = x
        avg = paddle.mean(x, axis=1, keepdim=True)
        mx = paddle.max(x, axis=1, keepdim=True)  # channel-wise max pooling, as in the CBAM paper
        x = paddle.concat([avg, mx], axis=1)
        x = self.conv_after_concat(x)
        x = self.sigmoid_spatial(x)
        x = module_input * x

        return x

3. Combining the Channel and Spatial Attention Modules

Combining the channel and spatial attention modules gives the full convolutional block attention module. Below is a PaddlePaddle implementation of CBAM:

import paddle
from paddle import nn

class CBAM_Module(nn.Layer):  
    def __init__(self, channels, reduction=16):  
        super(CBAM_Module, self).__init__()  
        self.avg_pool = nn.AdaptiveAvgPool2D(output_size=1)  
        self.max_pool = nn.AdaptiveMaxPool2D(output_size=1)  
        self.fc1 = nn.Conv2D(in_channels=channels, out_channels=channels // reduction, kernel_size=1, padding=0)  
        self.relu = nn.ReLU()  
        self.fc2 = nn.Conv2D(in_channels=channels // reduction, out_channels=channels, kernel_size=1, padding=0)  

        self.sigmoid_channel = nn.Sigmoid()  
        self.conv_after_concat = nn.Conv2D(in_channels=2, out_channels=1, kernel_size=7, stride=1, padding=3)  
        self.sigmoid_spatial = nn.Sigmoid()  

    def forward(self, x):  
        # Channel Attention Module  
        module_input = x  
        avg = self.relu(self.fc1(self.avg_pool(x)))  
        avg = self.fc2(avg)  
        mx = self.relu(self.fc1(self.max_pool(x)))  
        mx = self.fc2(mx)  
        x = avg + mx  
        x = self.sigmoid_channel(x)  

        # Spatial Attention Module  
        x = module_input * x  
        module_input = x  
        avg = paddle.mean(x, axis=1, keepdim=True)  
        mx = paddle.max(x, axis=1, keepdim=True)
        x = paddle.concat([avg, mx], axis=1)
        x = self.conv_after_concat(x)  
        x = self.sigmoid_spatial(x)  
        x = module_input * x  

        return x  

III. ResNet-CBAM: ResNet with the Convolutional Block Attention Module

ResNet paper: https://arxiv.org/pdf/1512.03385.pdf

We add channel and spatial attention to ResNet to form ResNet-CBAM. Since the original ResNet topology must stay intact (inserting CBAM inside the residual blocks would change the architecture, making the pretrained parameters unusable), CBAM is added after the first convolution and after the last convolution stage. This leaves the backbone unchanged, so the pretrained weights can still be loaded.


1. Implementation

import paddle
import paddle.nn as nn
from paddle.utils.download import get_weights_path_from_url

__all__ = [
    'ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101', 'resnet152'
]

model_urls = {
    'resnet18': ('https://paddle-hapi.bj.bcebos.com/models/resnet18.pdparams',
                 'cf548f46534aa3560945be4b95cd11c4'),
    'resnet34': ('https://paddle-hapi.bj.bcebos.com/models/resnet34.pdparams',
                 '8d2275cf8706028345f78ac0e1d31969'),
    'resnet50': ('https://paddle-hapi.bj.bcebos.com/models/resnet50.pdparams',
                 'ca6f485ee1ab0492d38f323885b0ad80'),
    'resnet101': ('https://paddle-hapi.bj.bcebos.com/models/resnet101.pdparams',
                  '02f35f034ca3858e1e54d4036443c92d'),
    'resnet152': ('https://paddle-hapi.bj.bcebos.com/models/resnet152.pdparams',
                  '7ad16a2f1e7333859ff986138630fd7a'),
}


class BasicBlock(nn.Layer):
    expansion = 1

    def __init__(self,
                 inplanes,
                 planes,
                 stride=1,
                 downsample=None,
                 groups=1,
                 base_width=64,
                 dilation=1,
                 norm_layer=None):
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D

        if dilation > 1:
            raise NotImplementedError(
                "Dilation > 1 not supported in BasicBlock")

        self.conv1 = nn.Conv2D(
            inplanes, planes, 3, padding=1, stride=stride, bias_attr=False)
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2D(planes, planes, 3, padding=1, bias_attr=False)
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class BottleneckBlock(nn.Layer):

    expansion = 4

    def __init__(self,
                 inplanes,
                 planes,
                 stride=1,
                 downsample=None,
                 groups=1,
                 base_width=64,
                 dilation=1,
                 norm_layer=None):
        super(BottleneckBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2D
        width = int(planes * (base_width / 64.)) * groups

        self.conv1 = nn.Conv2D(inplanes, width, 1, bias_attr=False)
        self.bn1 = norm_layer(width)

        self.conv2 = nn.Conv2D(
            width,
            width,
            3,
            padding=dilation,
            stride=stride,
            groups=groups,
            dilation=dilation,
            bias_attr=False)
        self.bn2 = norm_layer(width)

        self.conv3 = nn.Conv2D(
            width, planes * self.expansion, 1, bias_attr=False)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU()
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class ResNet(nn.Layer):
    def __init__(self, block, depth, num_classes=1000, with_pool=True):
        super(ResNet, self).__init__()
        layer_cfg = {
            18: [2, 2, 2, 2],
            34: [3, 4, 6, 3],
            50: [3, 4, 6, 3],
            101: [3, 4, 23, 3],
            152: [3, 8, 36, 3]
        }
        layers = layer_cfg[depth]
        self.num_classes = num_classes
        self.with_pool = with_pool
        self._norm_layer = nn.BatchNorm2D

        self.inplanes = 64
        self.dilation = 1

        self.conv1 = nn.Conv2D(
            3,
            self.inplanes,
            kernel_size=7,
            stride=2,
            padding=3,
            bias_attr=False)
        self.bn1 = self._norm_layer(self.inplanes)
        self.relu = nn.ReLU()
        self.CBAM_Module1 = CBAM_Module(channels=self.inplanes)
        self.maxpool = nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.CBAM_Module2 = CBAM_Module(channels=512 * block.expansion)  # 2048 for Bottleneck-based ResNets
        if with_pool:
            self.avgpool = nn.AdaptiveAvgPool2D((1, 1))

        if num_classes > 0:
            self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, planes, blocks, stride=1, dilate=False):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
        if dilate:
            self.dilation *= stride
            stride = 1
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2D(
                    self.inplanes,
                    planes * block.expansion,
                    1,
                    stride=stride,
                    bias_attr=False),
                norm_layer(planes * block.expansion), )

        layers = []
        layers.append(
            block(self.inplanes, planes, stride, downsample, 1, 64,
                  previous_dilation, norm_layer))
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.inplanes, planes, norm_layer=norm_layer))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.CBAM_Module1(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.CBAM_Module2(x)

        if self.with_pool:
            x = self.avgpool(x)

        if self.num_classes > 0:
            x = paddle.flatten(x, 1)
            x = self.fc(x)

        return x


def _resnet(arch, Block, depth, pretrained, **kwargs):
    model = ResNet(Block, depth, **kwargs)
    if pretrained:
        assert arch in model_urls, "{} model does not have a pretrained model now, you should set pretrained=False".format(
            arch)
        weight_path = get_weights_path_from_url(model_urls[arch][0],
                                                model_urls[arch][1])

        param = paddle.load(weight_path)
        model.set_dict(param)

    return model


def resnet18(pretrained=False, **kwargs):
    return _resnet('resnet18', BasicBlock, 18, pretrained, **kwargs)


def resnet34(pretrained=False, **kwargs):
    return _resnet('resnet34', BasicBlock, 34, pretrained, **kwargs)


def resnet50(pretrained=False, **kwargs):
    return _resnet('resnet50', BottleneckBlock, 50, pretrained, **kwargs)


def resnet101(pretrained=False, **kwargs):
    return _resnet('resnet101', BottleneckBlock, 101, pretrained, **kwargs)


def resnet152(pretrained=False, **kwargs):
    return _resnet('resnet152', BottleneckBlock, 152, pretrained, **kwargs)


model = resnet50(pretrained=True, num_classes=101)

2. Model Test

Check that the model runs end to end: feed it an input and inspect the output (this project classifies 101 kinds of food, so the output shape is [1, 101]).

x = paddle.rand([1, 3, 512, 512])
out = model(x)

print(out)
Tensor(shape=[1, 101], dtype=float32, place=CUDAPlace(0), stop_gradient=False,
       [[ 0.23755740,  0.15992080, -0.16790576, -0.02421574,  0.23948371,  0.01540074, -0.05914545, -0.06471819, -0.15462203,  0.07059185, -0.09104817, -0.08824400,  0.16376866,  0.01088102, -0.01639843,  0.07510512, -0.25128710,  0.05310057, -0.05061390, -0.24302137,  0.24108808,  0.26871991,  0.11471137, -0.10154713,  0.16017962,  0.38808146,  0.39115551, -0.06520218, -0.06546519, -0.04215863, -0.39803913, -0.02926474, -0.21277788, -0.05047140,  0.20483626, -0.00560332,  0.00816562,  0.11082268, -0.02240067, -0.31493288, -0.34661019, -0.15874574, -0.04415106,  0.08496793, -0.14479199,  0.07015306,  0.03542121, -0.06248808, -0.36255446,  0.23171450,  0.01219252,  0.06549657,  0.05162504,  0.02651403,  0.28627244, -0.02422512,  0.09902165,  0.01188086, -0.05695777, -0.01429159,  0.10739808,  0.15823485,  0.08081408,  0.16685896, -0.03923680, -0.25720799,  0.18960142, -0.37058586, -0.15431085,  0.16415425, -0.13622791, -0.04410422,  0.08821643,  0.32092187, -0.00823142, -0.14378656,  0.17974210,  0.18032075,  0.16180043, -0.03393000,  0.01341872,  0.34255776,  0.29252559, -0.11773793, -0.12506239,  0.13361360, -0.41730911, -0.03966195,  0.03181494,  0.16027087,  0.11529364, -0.24660280, -0.11513865, -0.09760797, -0.00116460,  0.17974031, -0.00829839,  0.24515726, -0.09149191, -0.35889381,  0.19253115]])
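The output is a tensor of raw logits. To read off a predicted class you could, for example, apply softmax and take the argmax; a sketch using the id_to_name mapping built earlier (with the untrained head the prediction here is essentially random):

import paddle.nn.functional as F

probs = F.softmax(out, axis=1)                       # logits -> probabilities
pred = int(paddle.argmax(probs, axis=1).numpy()[0])  # most likely class id
print(pred, id_to_name[pred])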

3. Inspecting the Network Structure

import paddle

model = paddle.Model(model)
model.summary((16, 3, 512, 512))
--------------------------------------------------------------------------------
   Layer (type)          Input Shape          Output Shape         Param #    
================================================================================
     Conv2D-1        [[16, 3, 512, 512]]   [16, 64, 256, 256]       9,408     
   BatchNorm2D-1    [[16, 64, 256, 256]]   [16, 64, 256, 256]        256      
      ReLU-1        [[16, 64, 256, 256]]   [16, 64, 256, 256]         0       
AdaptiveAvgPool2D-1 [[16, 64, 256, 256]]     [16, 64, 1, 1]           0       
     Conv2D-2         [[16, 64, 1, 1]]       [16, 4, 1, 1]           260      
      ReLU-2           [[16, 4, 1, 1]]       [16, 4, 1, 1]            0       
     Conv2D-3          [[16, 4, 1, 1]]       [16, 64, 1, 1]          320      
AdaptiveMaxPool2D-1 [[16, 64, 256, 256]]     [16, 64, 1, 1]           0       
     Sigmoid-1        [[16, 64, 1, 1]]       [16, 64, 1, 1]           0       
     Conv2D-4        [[16, 2, 256, 256]]   [16, 1, 256, 256]         99       
     Sigmoid-2       [[16, 1, 256, 256]]   [16, 1, 256, 256]          0       
   CBAM_Module-1    [[16, 64, 256, 256]]   [16, 64, 256, 256]         0       
    MaxPool2D-1     [[16, 64, 256, 256]]   [16, 64, 128, 128]         0       
     Conv2D-6       [[16, 64, 128, 128]]   [16, 64, 128, 128]       4,096     
   BatchNorm2D-3    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
      ReLU-3        [[16, 256, 128, 128]] [16, 256, 128, 128]         0       
     Conv2D-7       [[16, 64, 128, 128]]   [16, 64, 128, 128]      36,864     
   BatchNorm2D-4    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
     Conv2D-8       [[16, 64, 128, 128]]  [16, 256, 128, 128]      16,384     
   BatchNorm2D-5    [[16, 256, 128, 128]] [16, 256, 128, 128]       1,024     
     Conv2D-5       [[16, 64, 128, 128]]  [16, 256, 128, 128]      16,384     
   BatchNorm2D-2    [[16, 256, 128, 128]] [16, 256, 128, 128]       1,024     
 BottleneckBlock-1  [[16, 64, 128, 128]]  [16, 256, 128, 128]         0       
     Conv2D-9       [[16, 256, 128, 128]]  [16, 64, 128, 128]      16,384     
   BatchNorm2D-6    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
      ReLU-4        [[16, 256, 128, 128]] [16, 256, 128, 128]         0       
     Conv2D-10      [[16, 64, 128, 128]]   [16, 64, 128, 128]      36,864     
   BatchNorm2D-7    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
     Conv2D-11      [[16, 64, 128, 128]]  [16, 256, 128, 128]      16,384     
   BatchNorm2D-8    [[16, 256, 128, 128]] [16, 256, 128, 128]       1,024     
 BottleneckBlock-2  [[16, 256, 128, 128]] [16, 256, 128, 128]         0       
     Conv2D-12      [[16, 256, 128, 128]]  [16, 64, 128, 128]      16,384     
   BatchNorm2D-9    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
      ReLU-5        [[16, 256, 128, 128]] [16, 256, 128, 128]         0       
     Conv2D-13      [[16, 64, 128, 128]]   [16, 64, 128, 128]      36,864     
  BatchNorm2D-10    [[16, 64, 128, 128]]   [16, 64, 128, 128]        256      
     Conv2D-14      [[16, 64, 128, 128]]  [16, 256, 128, 128]      16,384     
  BatchNorm2D-11    [[16, 256, 128, 128]] [16, 256, 128, 128]       1,024     
 BottleneckBlock-3  [[16, 256, 128, 128]] [16, 256, 128, 128]         0       
     Conv2D-16      [[16, 256, 128, 128]] [16, 128, 128, 128]      32,768     
  BatchNorm2D-13    [[16, 128, 128, 128]] [16, 128, 128, 128]        512      
      ReLU-6         [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-17      [[16, 128, 128, 128]]  [16, 128, 64, 64]       147,456    
  BatchNorm2D-14     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
     Conv2D-18       [[16, 128, 64, 64]]   [16, 512, 64, 64]       65,536     
  BatchNorm2D-15     [[16, 512, 64, 64]]   [16, 512, 64, 64]        2,048     
     Conv2D-15      [[16, 256, 128, 128]]  [16, 512, 64, 64]       131,072    
  BatchNorm2D-12     [[16, 512, 64, 64]]   [16, 512, 64, 64]        2,048     
 BottleneckBlock-4  [[16, 256, 128, 128]]  [16, 512, 64, 64]          0       
     Conv2D-19       [[16, 512, 64, 64]]   [16, 128, 64, 64]       65,536     
  BatchNorm2D-16     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
      ReLU-7         [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-20       [[16, 128, 64, 64]]   [16, 128, 64, 64]       147,456    
  BatchNorm2D-17     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
     Conv2D-21       [[16, 128, 64, 64]]   [16, 512, 64, 64]       65,536     
  BatchNorm2D-18     [[16, 512, 64, 64]]   [16, 512, 64, 64]        2,048     
 BottleneckBlock-5   [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-22       [[16, 512, 64, 64]]   [16, 128, 64, 64]       65,536     
  BatchNorm2D-19     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
      ReLU-8         [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-23       [[16, 128, 64, 64]]   [16, 128, 64, 64]       147,456    
  BatchNorm2D-20     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
     Conv2D-24       [[16, 128, 64, 64]]   [16, 512, 64, 64]       65,536     
  BatchNorm2D-21     [[16, 512, 64, 64]]   [16, 512, 64, 64]        2,048     
 BottleneckBlock-6   [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-25       [[16, 512, 64, 64]]   [16, 128, 64, 64]       65,536     
  BatchNorm2D-22     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
      ReLU-9         [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-26       [[16, 128, 64, 64]]   [16, 128, 64, 64]       147,456    
  BatchNorm2D-23     [[16, 128, 64, 64]]   [16, 128, 64, 64]         512      
     Conv2D-27       [[16, 128, 64, 64]]   [16, 512, 64, 64]       65,536     
  BatchNorm2D-24     [[16, 512, 64, 64]]   [16, 512, 64, 64]        2,048     
 BottleneckBlock-7   [[16, 512, 64, 64]]   [16, 512, 64, 64]          0       
     Conv2D-29       [[16, 512, 64, 64]]   [16, 256, 64, 64]       131,072    
  BatchNorm2D-26     [[16, 256, 64, 64]]   [16, 256, 64, 64]        1,024     
      ReLU-10       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-30       [[16, 256, 64, 64]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-27     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-31       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-28    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
     Conv2D-28       [[16, 512, 64, 64]]   [16, 1024, 32, 32]      524,288    
  BatchNorm2D-25    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
 BottleneckBlock-8   [[16, 512, 64, 64]]   [16, 1024, 32, 32]         0       
     Conv2D-32      [[16, 1024, 32, 32]]   [16, 256, 32, 32]       262,144    
  BatchNorm2D-29     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
      ReLU-11       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-33       [[16, 256, 32, 32]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-30     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-34       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-31    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
 BottleneckBlock-9  [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-35      [[16, 1024, 32, 32]]   [16, 256, 32, 32]       262,144    
  BatchNorm2D-32     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
      ReLU-12       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-36       [[16, 256, 32, 32]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-33     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-37       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-34    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
BottleneckBlock-10  [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-38      [[16, 1024, 32, 32]]   [16, 256, 32, 32]       262,144    
  BatchNorm2D-35     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
      ReLU-13       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-39       [[16, 256, 32, 32]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-36     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-40       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-37    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
BottleneckBlock-11  [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-41      [[16, 1024, 32, 32]]   [16, 256, 32, 32]       262,144    
  BatchNorm2D-38     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
      ReLU-14       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-42       [[16, 256, 32, 32]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-39     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-43       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-40    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
BottleneckBlock-12  [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-44      [[16, 1024, 32, 32]]   [16, 256, 32, 32]       262,144    
  BatchNorm2D-41     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
      ReLU-15       [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-45       [[16, 256, 32, 32]]   [16, 256, 32, 32]       589,824    
  BatchNorm2D-42     [[16, 256, 32, 32]]   [16, 256, 32, 32]        1,024     
     Conv2D-46       [[16, 256, 32, 32]]   [16, 1024, 32, 32]      262,144    
  BatchNorm2D-43    [[16, 1024, 32, 32]]   [16, 1024, 32, 32]       4,096     
BottleneckBlock-13  [[16, 1024, 32, 32]]   [16, 1024, 32, 32]         0       
     Conv2D-48      [[16, 1024, 32, 32]]   [16, 512, 32, 32]       524,288    
  BatchNorm2D-45     [[16, 512, 32, 32]]   [16, 512, 32, 32]        2,048     
      ReLU-16       [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
     Conv2D-49       [[16, 512, 32, 32]]   [16, 512, 16, 16]      2,359,296   
  BatchNorm2D-46     [[16, 512, 16, 16]]   [16, 512, 16, 16]        2,048     
     Conv2D-50       [[16, 512, 16, 16]]   [16, 2048, 16, 16]     1,048,576   
  BatchNorm2D-47    [[16, 2048, 16, 16]]   [16, 2048, 16, 16]       8,192     
     Conv2D-47      [[16, 1024, 32, 32]]   [16, 2048, 16, 16]     2,097,152   
  BatchNorm2D-44    [[16, 2048, 16, 16]]   [16, 2048, 16, 16]       8,192     
BottleneckBlock-14  [[16, 1024, 32, 32]]   [16, 2048, 16, 16]         0       
     Conv2D-51      [[16, 2048, 16, 16]]   [16, 512, 16, 16]      1,048,576   
  BatchNorm2D-48     [[16, 512, 16, 16]]   [16, 512, 16, 16]        2,048     
      ReLU-17       [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
     Conv2D-52       [[16, 512, 16, 16]]   [16, 512, 16, 16]      2,359,296   
  BatchNorm2D-49     [[16, 512, 16, 16]]   [16, 512, 16, 16]        2,048     
     Conv2D-53       [[16, 512, 16, 16]]   [16, 2048, 16, 16]     1,048,576   
  BatchNorm2D-50    [[16, 2048, 16, 16]]   [16, 2048, 16, 16]       8,192     
BottleneckBlock-15  [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
     Conv2D-54      [[16, 2048, 16, 16]]   [16, 512, 16, 16]      1,048,576   
  BatchNorm2D-51     [[16, 512, 16, 16]]   [16, 512, 16, 16]        2,048     
      ReLU-18       [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
     Conv2D-55       [[16, 512, 16, 16]]   [16, 512, 16, 16]      2,359,296   
  BatchNorm2D-52     [[16, 512, 16, 16]]   [16, 512, 16, 16]        2,048     
     Conv2D-56       [[16, 512, 16, 16]]   [16, 2048, 16, 16]     1,048,576   
  BatchNorm2D-53    [[16, 2048, 16, 16]]   [16, 2048, 16, 16]       8,192     
BottleneckBlock-16  [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
AdaptiveAvgPool2D-2 [[16, 2048, 16, 16]]    [16, 2048, 1, 1]          0       
     Conv2D-57       [[16, 2048, 1, 1]]     [16, 128, 1, 1]        262,272    
      ReLU-19         [[16, 128, 1, 1]]     [16, 128, 1, 1]           0       
     Conv2D-58        [[16, 128, 1, 1]]     [16, 2048, 1, 1]       264,192    
AdaptiveMaxPool2D-2 [[16, 2048, 16, 16]]    [16, 2048, 1, 1]          0       
     Sigmoid-3       [[16, 2048, 1, 1]]     [16, 2048, 1, 1]          0       
     Conv2D-59        [[16, 2, 16, 16]]     [16, 1, 16, 16]          99       
     Sigmoid-4        [[16, 1, 16, 16]]     [16, 1, 16, 16]           0       
   CBAM_Module-2    [[16, 2048, 16, 16]]   [16, 2048, 16, 16]         0       
AdaptiveAvgPool2D-3 [[16, 2048, 16, 16]]    [16, 2048, 1, 1]          0       
     Linear-1           [[16, 2048]]           [16, 101]           206,949    
================================================================================
Total params: 24,295,343
Trainable params: 24,189,103
Non-trainable params: 106,240
--------------------------------------------------------------------------------
Input size (MB): 48.00
Forward/backward pass size (MB): 22449.39
Params size (MB): 92.68
Estimated Total Size (MB): 22590.07
--------------------------------------------------------------------------------
{'total_params': 24295343, 'trainable_params': 24189103}

4. Model Training

# Use the VisualDL callback from Paddle to log training information to a directory
callback = paddle.callbacks.VisualDL(log_dir='visualdl_log_dir')

def create_optim(parameters):
    step_each_epoch = len(train_dataset) // 32
    lr = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=0.01,
                                                  T_max=step_each_epoch * 10)

    return paddle.optimizer.Momentum(learning_rate=lr,
                                     parameters=parameters,
                                     weight_decay=paddle.regularizer.L2Decay(0.002))

# Configure the model for training
model.prepare(create_optim(model.parameters()),   # optimizer
              paddle.nn.CrossEntropyLoss(),       # loss function
              paddle.metric.Accuracy(topk=(1, 5))) # metrics: top-1 and top-5 accuracy

model.fit(train_dataset,
          val_dataset,
          epochs=10,
          shuffle=True, 
          save_dir='./chk_points/',
          batch_size=32,
          callbacks=callback,
          verbose=1)
The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/10
step 2525/2525 [==============================] - loss: 1.6131 - acc_top1: 0.4604 - acc_top5: 0.7048 - 1s/step         
save checkpoint at /home/aistudio/chk_points/0
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 2.3558 - acc_top1: 0.4844 - acc_top5: 0.7572 - 983ms/step         
Eval samples: 20200
Epoch 2/10
step 2525/2525 [==============================] - loss: 1.4524 - acc_top1: 0.6161 - acc_top5: 0.8601 - 1s/step        
save checkpoint at /home/aistudio/chk_points/1
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 2.2584 - acc_top1: 0.5791 - acc_top5: 0.8365 - 1s/step        
Eval samples: 20200
Epoch 3/10
step 2525/2525 [==============================] - loss: 1.3926 - acc_top1: 0.6323 - acc_top5: 0.8714 - 1s/step        
save checkpoint at /home/aistudio/chk_points/2
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 2.0664 - acc_top1: 0.4394 - acc_top5: 0.7272 - 966ms/step         
Eval samples: 20200
Epoch 4/10
step 2525/2525 [==============================] - loss: 0.9868 - acc_top1: 0.6557 - acc_top5: 0.8828 - 1s/step         
save checkpoint at /home/aistudio/chk_points/3
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 3.4569 - acc_top1: 0.5897 - acc_top5: 0.8445 - 955ms/step         
Eval samples: 20200
Epoch 5/10
step 2525/2525 [==============================] - loss: 1.4642 - acc_top1: 0.6878 - acc_top5: 0.9018 - 1s/step         
save checkpoint at /home/aistudio/chk_points/4
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 1.6742 - acc_top1: 0.6254 - acc_top5: 0.8488 - 969ms/step        
Eval samples: 20200
Epoch 6/10
step 2525/2525 [==============================] - loss: 0.8651 - acc_top1: 0.7369 - acc_top5: 0.9241 - 1s/step        
save checkpoint at /home/aistudio/chk_points/5
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 1.4554 - acc_top1: 0.6617 - acc_top5: 0.8814 - 965ms/step         
Eval samples: 20200
Epoch 7/10
step 2525/2525 [==============================] - loss: 0.8263 - acc_top1: 0.7954 - acc_top5: 0.9487 - 1s/step         
save checkpoint at /home/aistudio/chk_points/6
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 632/632 [==============================] - loss: 1.9954 - acc_top1: 0.7577 - acc_top5: 0.9257 - 951ms/step         
Eval samples: 20200
Epoch 8/10
step 1730/2525 [===================>..........] - loss: 0.5766 - acc_top1: 0.8649 - acc_top5: 0.9735 - ETA: 17:04 - 1s/ste
model.save('infer/foods', training=False)
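The exported inference model can later be loaded back with paddle.jit.load; a minimal sketch, assuming the same 'infer/foods' prefix passed to model.save above:

import paddle

# Load the static-graph model exported by model.save('infer/foods', training=False)
loaded = paddle.jit.load('infer/foods')
loaded.eval()
x = paddle.rand([1, 3, 512, 512])
out = loaded(x)  # logits of shape [1, 101]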

5. Model Prediction

# Run prediction
result = model.predict(val_dataset)
Predict begin...
step 20200/20200 [==============================] - 42ms/step         
Predict samples: 20200
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt

# Sample indices from the validation set to display
indexs = [11, 15, 201, 999, 1000, 5778, 6656, 9199, 10384, 20030]

def get_label(predict):
    label = list(categorys.keys())[list(categorys.values()).index(predict)]
    return label

# Plotting helper: show an image with its predicted label
def show_img(img, predict):
    plt.figure()
    plt.title('predict: {}'.format(get_label(predict)))
    img = np.transpose(img, (1, 2, 0))  # CHW -> HWC for matplotlib
    img = img / 2 + 0.5                 # map [-1, 1] back to [0, 1] for display
    plt.imshow(img)
    plt.show()

for idx in indexs:
    show_img(val_dataset[idx][0], np.argmax(result[0][idx]))

(Prediction plots for the ten sampled images)

IV. Summary and Reflections

When I started this project I used VGG, trying both VGG-11 and VGG-16, but VGG has too many parameters, trains slowly, and performed poorly. I then tried other networks such as MobileNetV1, MobileNetV2, and ResNet, but during training the loss plateaued around 4.5 and would not drop. So I changed my approach and decided to improve on an existing network, which is how the residual network with a convolutional block attention module (ResNet-CBAM) came about.

After several rounds of tuning I settled on ResNet50-CBAM. In practice I found that deeper networks are not always better: extra depth brings extra computation, which hurts both training and inference. Weighing these trade-offs, I chose to add the attention mechanism on top of ResNet50, and it achieved solid results.

About the Author

Beijing Union University, College of Robotics, Automation major, Class of 2018 undergraduate: Zheng Bopei

Baidu PaddlePaddle Developer Expert (PPDE)

Member of the official PaddlePaddle support and Q&A teams

Certified member of Chaihuo Maker Space, Shenzhen

Baidu Brain intelligent dialogue trainer

I have reached the top level on AI Studio and earned 9 badges. Come follow me!

https://aistudio.baidu.com/aistudio/personalcenter/thirdview/147378
