当前位置：首页 > news >正文

Day 46 - 通道注意力机制

news 2026/7/5 8:37:30

一、引言：什么是注意力机制？

在深度学习中，注意力机制（Attention Mechanism）是一种让模型学会“选择性关注重要信息”的技术。

这就好比人类在看一张照片时，会自动忽略背景（如蓝天、草地），而聚焦于图片中的主体（如一只猫或一辆车）。

传统的卷积神经网络（CNN）对所有输入特征一视同仁，而注意力机制引入了“动态权重”的概念：

卷积：固定权重（训练好后卷积核参数不变），对局部特征进行加权求和。
注意力：动态权重（根据输入数据动态计算权重），输入不同，关注点不同。

为什么需要注意力？

不同任务对特征的需求不同：

识别物体：需要关注特定的纹理或形状（通道注意力）。
定位物体：需要关注物体所在的位置（空间注意力）。

二、特征图可视化 (Feature Map Visualization)

在深入注意力机制之前，我们需要先理解CNN“看”到了什么。通过可视化特征图（Feature Maps），我们可以直观地看到CNN在不同深度提取了哪些特征。

核心代码：特征图可视化函数

该函数利用 PyTorch 的hook机制捕获指定层的输出，并将其可视化。

import torch import numpy as np import matplotlib.pyplot as plt def visualize_feature_maps(model, test_loader, device, layer_names, num_images=5, num_channels=9): """ 可视化CNN各层的特征图 参数: model: 训练好的模型 test_loader: 测试数据加载器 device: 计算设备 layer_names: 需要可视化的层名称列表 (如 ['conv1', 'conv2']) num_images: 可视化的图片数量 num_channels: 每张图片显示的通道数 """ model.eval() images_list = [] labels_list = [] # 获取一批测试图像 for images, labels in test_loader: images_list.append(images) labels_list.append(labels) if len(images_list) * test_loader.batch_size >= num_images: break # 拼接并截取到目标数量 images = torch.cat(images_list, dim=0)[:num_images].to(device) labels = torch.cat(labels_list, dim=0)[:num_images].to(device) with torch.no_grad(): # 存储各层特征图 feature_maps = {} hooks = [] # 定义钩子函数 def hook(module, input, output, name): feature_maps[name] = output.cpu() # 注册钩子 for name in layer_names: module = getattr(model, name) hook_handle = module.register_forward_hook(lambda m, i, o, n=name: hook(m, i, o, n)) hooks.append(hook_handle) # 前向传播 _ = model(images) # 移除钩子 for hook_handle in hooks: hook_handle.remove() # 可视化绘图 for img_idx in range(num_images): # 还原原始图像用于对比 img = images[img_idx].cpu().permute(1, 2, 0).numpy() img = img * np.array([0.2023, 0.1994, 0.2010]).reshape(1, 1, 3) + np.array([0.4914, 0.4822, 0.4465]).reshape(1, 1, 3) img = np.clip(img, 0, 1) num_layers = len(layer_names) fig, axes = plt.subplots(1, num_layers + 1, figsize=(4 * (num_layers + 1), 4)) # 1. 显示原始图像 axes[0].imshow(img) axes[0].set_title(f'Original\\nLabel: {labels[img_idx].item()}') axes[0].axis('off') # 2. 显示各层特征图 for layer_idx, layer_name in enumerate(layer_names): fm = feature_maps[layer_name][img_idx] fm = fm[:num_channels] # 仅取前几个通道 # 计算子图网格布局 num_rows = int(np.sqrt(num_channels)) num_cols = num_channels // num_rows if num_rows != 0 else 1 layer_ax = axes[layer_idx + 1] layer_ax.set_title(f'{layer_name} Feature Maps') layer_ax.axis('off') # 在子图中绘制多个通道 for ch_idx, channel in enumerate(fm): ax = layer_ax.inset_axes([ch_idx % num_cols / num_cols, (num_rows - 1 - ch_idx // num_cols) / num_rows, 1/num_cols, 1/num_rows]) ax.imshow(channel.numpy(), cmap='viridis') ax.axis('off') plt.tight_layout() plt.show()

特征图解读

浅层卷积 (如 conv1): 保留较多细节纹理，类似边缘检测，能看清物体轮廓。
深层卷积 (如 conv3): 特征变得抽象，不再像原始图像。这些特征图代表了更高级的语义信息（如“有轮子”、“有翅膀”等概念），是模型分类的关键依据。

三、通道注意力 (Channel Attention) 详解

通道注意力旨在解决“关注什么”的问题。它通过显式地建模通道之间的依赖关系，自适应地重新校准通道的特征响应。

最经典的实现是SE 模块 (Squeeze-and-Excitation)。

SE 模块核心步骤

Squeeze (压缩): 使用全局平均池化（Global Average Pooling），将空间维度 ($H \times W$) 压缩为 $1 \times 1$。这相当于把每个通道的二维特征图浓缩成一个实数，代表该通道的全局分布。
Excitation (激发): 使用全连接层学习通道间的相关性，并通过 Sigmoid 生成权重（0~1之间）。
Scale (加权): 将生成的权重乘回原始特征图，增强重要通道，抑制无效通道。

代码实现：ChannelAttention 模块

import torch.nn as nn class ChannelAttention(nn.Module): """ 通道注意力模块 (SE Block) """ def __init__(self, in_channels, reduction_ratio=16): """ Args: in_channels: 输入通道数 reduction_ratio: 降维比率，用于减少全连接层参数量 """ super(ChannelAttention, self).__init__() # 1. Squeeze: 全局平均池化 self.avg_pool = nn.AdaptiveAvgPool2d(1) # 2. Excitation: 全连接层 -> ReLU -> 全连接层 -> Sigmoid self.fc = nn.Sequential( # 降维 nn.Linear(in_channels, in_channels // reduction_ratio, bias=False), nn.ReLU(inplace=True), # 升维回原通道数 nn.Linear(in_channels // reduction_ratio, in_channels, bias=False), # 输出权重 (0~1) nn.Sigmoid() ) def forward(self, x): batch_size, channels, height, width = x.size() # Step 1: 压缩空间维度 [B, C, H, W] -> [B, C, 1, 1] avg_pool_output = self.avg_pool(x) # Step 2: 展平并计算通道权重 [B, C] avg_pool_output = avg_pool_output.view(batch_size, channels) channel_weights = self.fc(avg_pool_output) # Step 3: 恢复维度以便广播 [B, C, 1, 1] channel_weights = channel_weights.view(batch_size, channels, 1, 1) # Step 4: 通道加权 return x * channel_weights

四、模型集成：在 CNN 中插入注意力模块

将ChannelAttention模块插入到卷积块之后、池化层之前，可以强化特征提取能力。

改进后的 CNN 模型结构

class CNN(nn.Module): def __init__(self): super(CNN, self).__init__() # --- Block 1 --- self.conv1 = nn.Conv2d(3, 32, 3, padding=1) self.bn1 = nn.BatchNorm2d(32) self.relu1 = nn.ReLU() # 插入注意力 self.ca1 = ChannelAttention(in_channels=32, reduction_ratio=16) self.pool1 = nn.MaxPool2d(2, 2) # --- Block 2 --- self.conv2 = nn.Conv2d(32, 64, 3, padding=1) self.bn2 = nn.BatchNorm2d(64) self.relu2 = nn.ReLU() # 插入注意力 self.ca2 = ChannelAttention(in_channels=64, reduction_ratio=16) self.pool2 = nn.MaxPool2d(2) # --- Block 3 --- self.conv3 = nn.Conv2d(64, 128, 3, padding=1) self.bn3 = nn.BatchNorm2d(128) self.relu3 = nn.ReLU() # 插入注意力 self.ca3 = ChannelAttention(in_channels=128, reduction_ratio=16) self.pool3 = nn.MaxPool2d(2) # --- Classifier --- self.fc1 = nn.Linear(128 * 4 * 4, 512) self.dropout = nn.Dropout(p=0.5) self.fc2 = nn.Linear(512, 10) def forward(self, x): # Block 1 x = self.conv1(x) x = self.bn1(x) x = self.relu1(x) x = self.ca1(x) # Apply Attention x = self.pool1(x) # Block 2 x = self.conv2(x) x = self.bn2(x) x = self.relu2(x) x = self.ca2(x) # Apply Attention x = self.pool2(x) # Block 3 x = self.conv3(x) x = self.bn3(x) x = self.relu3(x) x = self.ca3(x) # Apply Attention x = self.pool3(x) # Flatten & FC x = x.view(-1, 128 * 4 * 4) x = self.fc1(x) x = self.relu3(x) x = self.dropout(x) x = self.fc2(x) return x

训练策略：学习率调度

为了获得更好的收敛效果，使用了ReduceLROnPlateau调度器。

机制：当验证集指标（如 loss）不再下降时，自动减少学习率。
优势：训练初期使用较大 LR 快速下降，后期使用较小 LR 精细逼近最优解。

import torch.optim as optim optimizer = optim.Adam(model.parameters(), lr=0.001) scheduler = optim.lr_scheduler.ReduceLROnPlateau( optimizer, mode='min', patience=3, factor=0.5 ) # 在每个 epoch 结束时更新 # scheduler.step(val_loss)