# ComputerVisionHomeWork

**Repository Path**: karllzy/computer-vision-home-work

## Basic Information

- **Project Name**: ComputerVisionHomeWork
- **Description**: A repository for machine vision homework
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-01-17
- **Last Updated**: 2021-01-17

## Categories & Tags

**Categories**: Uncategorized
**Tags**: None

## README

# Machine Vision Course Materials

Source code: [link](https://gitee.com/karllzy/computer-vision-home-work)

[TOC]

## 1. Reading and Displaying Images

**Task:**

> 1. Use MATLAB to read and display any single image in a directory.
>
> 2. Use MATLAB to loop over all images in a directory and display each one.

**Key functions:**

> 1. `A = imread(filename)` reads the image from the file specified by `filename`, inferring its format from the file contents. If `filename` is a multi-image file, `imread` reads the first image.
>
> 2. `imshow(I)` displays the grayscale image `I` in a figure, using the default display range for the image's data type and optimizing figure, axes, and image object properties for display.

**Code:**

```matlab
file_path = 'train images/A';
images_list = dir(file_path);
num = size(images_list, 1) - 2;    % skip the '.' and '..' entries
while 1
    for i = 1:num
        path = fullfile(file_path, images_list(i+2).name);
        img_temp = imread(path);
        imshow(img_temp);
    end
end
```

## 2. Median and Mean Filtering

**Task:**

> 1. Apply mean filtering and median filtering to the image below.
>
> ![img](./pics/img.png)
>
> 2. Compare the results for different kernel sizes.

**Key functions:**

> `medfilt2` — 2-D median filtering
>
> `imfilter` — image filtering

**Code:**

Snippet 1

```matlab
raw_img = imread('img.png');
img = rgb2gray(raw_img);

% Median filtering with a 5x5 window
img_medfilt = medfilt2(img, [5 5]);
diff_med = img - img_medfilt;
histogram(diff_med, 10);
figure(); imshow(img); title('raw');
figure(); imshow(img_medfilt); title('med')

% Mean filtering with a 3x3 kernel; filter2 returns doubles,
% so rescale to [0, 1] for display
kernel = fspecial('average', 3);
img_avg = filter2(kernel, img) / 255;
figure(); imshow(img_avg); title('avg')
```

Snippet 2

```matlab
img = imread('img.png');
img_gray = rgb2gray(img);
max_row = 4;
for row = 1:max_row          % kernel sizes 3, 5, 7, 9
    kernel_size = 2*row + 1;
    img_med = medfilt2(img_gray, [kernel_size kernel_size]);
    difference = img_gray - img_med;
    img_path = strcat('中值滤波', num2str(kernel_size), ...
        'kernel_size.png');
    imwrite(img_med, img_path);
    subplot(max_row, 4, (row-1)*4+1); imshow(img_gray); title('raw img');
    str = strcat(num2str(kernel_size), ' kernel size med img');
    subplot(max_row, 4, (row-1)*4+2); imshow(img_med); title(str);
    subplot(max_row, 4, (row-1)*4+3); imshow(difference); title('difference');
    subplot(max_row, 4, (row-1)*4+4); histogram(difference); title('histogram')
end
savefig('中值滤波.fig')
```

(The original snippet reused `i` both for the image and as the loop variable, which clobbered the image on the first iteration; the loop variable is renamed `row` here.)

Snippet 3

```matlab
img = imread('img.png');
img_gray = rgb2gray(img);
max_row = 4;
edges = 1:255;   % histogram bin edges
for row = 1:max_row
    kernel_size = 2*row + 1;
    kernel = fspecial('average', kernel_size);
    img_avg = imfilter(img_gray, kernel);
    difference = img_gray - img_avg;
    img_path = strcat('均值滤波', num2str(kernel_size), ...
        'kernel_size.png');
    imwrite(img_avg, img_path);
    subplot(max_row, 4, (row-1)*4+1); imshow(img_gray); title('raw img');
    str = strcat(num2str(kernel_size), ' avg img');
    subplot(max_row, 4, (row-1)*4+2); imshow(img_avg); title(str);
    subplot(max_row, 4, (row-1)*4+3); imshow(difference); title('difference');
    subplot(max_row, 4, (row-1)*4+4); histogram(difference, edges); title('histogram')
end
savefig('均值滤波.fig')
```

## 3. Color Space Conversion: Recoloring Flower Petals

**Task:**

> Change the color of the petals in the image below.
>
> ![pictures](./pics/picture1.jpg)

**Key functions:**

> `rgb2hsv` — color space conversion
>
> `find` — locate array elements that meet a condition

**Code:**

```matlab
img = imread('picture1.jpg');
imshow(img);
img_hsi = rgb2hsv(img);
h_channel = img_hsi(:, :, 1);
histogram(h_channel)     % inspect the hue distribution to pick thresholds
imshow(h_channel)

% The petals fall in the hue range (0.55, 0.75)
idx1 = find(h_channel > 0.55);
idx2 = find(h_channel < 0.75);
idx = intersect(idx1, idx2);

% Cycle the petal hue to animate the color change
while 1
    for i = 0:0.02:1
        h_channel(idx) = i;
        img_hsi(:, :, 1) = h_channel;
        img_raw = hsv2rgb(img_hsi);
        imshow(img_raw)
        pause(0.01)
    end
end
```

## 4. Clustering Flower Colors

**Task:**

> Using the same image as the previous task, classify the colors in the image and separate the petals from everything else in the L\*a\*b\* color space.
>
> Segment the colors automatically by clustering.

**Code:**

```matlab
%% Step 1. Load the image
flower = imread("pic_flower.jpg");
imshow(flower)
title("Flower");

%% Step 2. Select a sample region for each color
nColors = 3;
sample_regions = load("region.mat", 'sample_regions').sample_regions;
% To select the regions interactively instead:
% sample_regions = false([size(flower, 1), size(flower, 2), nColors]);
% for count = 1:nColors
%     sample_regions(:, :, count) = roipoly(flower);
% end

% Show one of the selected regions
imshow(sample_regions(:,:,1))
title('Selected region for purple')
% Save the regions so they need not be re-selected every run:
% save('region.mat', 'sample_regions')

% Convert to the L*a*b* color space
lab_flower = rgb2lab(flower);

% Compute the mean a* and b* values of each color sample
a = lab_flower(:, :, 2);
b = lab_flower(:, :, 3);
color_markers = zeros([nColors, 2]);
for count = 1:nColors
    color_markers(count, 1) = mean2(a(sample_regions(:, :, count)));
    color_markers(count, 2) = mean2(b(sample_regions(:, :, count)));
end
% Sanity check: print the mean of color 2
fprintf('[%0.3f,%0.3f] \n', color_markers(2,1), color_markers(2,2));

%% Step 3. Assign each pixel to the nearest color marker
% 0 = purple petals, 1 = green leaves, 2 = yellow stamens
color_labels = 0:nColors-1;
a = double(a);
b = double(b);
distance = zeros([size(a), nColors]);
for count = 1:nColors
    distance(:,:,count) = ((a - color_markers(count,1)).^2 + ...
        (b - color_markers(count,2)).^2).^0.5;
end
[~, label] = min(distance, [], 3);
label = color_labels(label);
clear distance;

%% Step 4. Display the nearest-neighbor classification results
rgb_label = repmat(label, [1 1 3]);
segmented_images = zeros([size(flower), nColors], 'uint8');
for count = 1:nColors
    color = flower;
    color(rgb_label ~= color_labels(count)) = 0;
    segmented_images(:,:,:,count) = color;
end
montage({segmented_images(:,:,:,1), segmented_images(:,:,:,2), ...
    segmented_images(:,:,:,3)});
title("Purple, green, and yellow segments")

%% Step 5. Scatter plot of the classified a* and b* values
purple = [119/255 73/255 152/255];
plot_labels = {'k', 'r', 'g', purple, 'm', 'y'};
figure
for count = 1:nColors
    plot(a(label==count-1), b(label==count-1), '.', 'MarkerEdgeColor', ...
        plot_labels{count}, 'MarkerFaceColor', plot_labels{count});
    hold on;
end
title('Classified pixel colors in the a*b* plane');
xlabel('''a*'' values');
ylabel('''b*'' values');
```
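The task statement above also asks for automatic clustering, while the code uses hand-picked sample regions. A minimal unsupervised sketch, assuming the same `pic_flower.jpg` and the `kmeans` function from the Statistics and Machine Learning Toolbox (the cluster count and replicate count are illustrative choices, not from the original):

```matlab
% Unsupervised variant: cluster the a*b* values with k-means instead of
% hand-picked sample regions (sketch; assumes pic_flower.jpg exists and
% the Statistics and Machine Learning Toolbox is installed).
flower = imread("pic_flower.jpg");
lab_flower = rgb2lab(flower);
ab = reshape(lab_flower(:, :, 2:3), [], 2);      % one row per pixel
nColors = 3;
labels = kmeans(ab, nColors, 'Replicates', 3);   % cluster pixels by color
label_img = reshape(labels, size(flower, 1), size(flower, 2));
imshow(label2rgb(label_img));                    % visualize the clusters
```

Unlike the sample-region approach, k-means gives no control over which cluster index ends up on the petals, so the cluster-to-name mapping must be checked visually.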
## 5. Fruit Segmentation and Recognition

**Task:**

> Segment the fruit images in the `task 2_sementation` directory and recognize each segmented fruit.

**Code:**

```matlab
%% Fruit segmentation and recognition
% 2020-11-04, by l.z.y

% Read the image
img = imread("task 2_sementation\citrus_fruits_01.png");
imshow(img);

%% 1. Segmenting the fruit from the image
%% 1.1 Attempts in the L*a*b* color space and in grayscale
% We segment based on color, so first inspect the brightness
% distribution of the objects in the image to choose a threshold.
img_lab = rgb2lab(img);
histogram(img_lab(:, :, 1));

% From the histogram, try a threshold of about 20 and build a mask to
% inspect the result.
area = find(img_lab(:, :, 1) > 20);
mask = zeros(size(img, 1), size(img, 2));
mask(area) = 255;
imshow(mask);

% This threshold is not good: some darker regions of the fruit are
% treated as background. Try a smaller threshold.
area = find(img_lab(:, :, 1) > 15);
mask = zeros(size(img, 1), size(img, 2));
mask(area) = 255;
imshow(mask);

% After several tries the L* results remain unsatisfactory, so convert
% to grayscale to bring more of the image information into play.
img_gray = rgb2gray(img);
histogram(img_gray);

% The histogram suggests 48 as a threshold.
area = find(img_gray > 48);
mask = zeros(size(img, 1), size(img, 2));
mask(area) = 255;
imshow(mask);

% Still noisy, and not clearly better than before, but after an opening
% it should be usable, so take it as the segmentation basis for now.
% Remove the holes and small specks with an opening plus hole filling.
se = strel('disk', 5);    % flat disk-shaped structuring element, radius 5
img_erode = imerode(mask, se);
img_open = imdilate(img_erode, se);
img_filled = imfill(img_open, 'holes');
imshow(img_filled);

% The large structuring element leaves ragged edges, but the result is
% still countable, and color-based classification should not be much
% affected. Count the connected components to get the fruit count.
CC = bwconncomp(img_filled);
num = CC.NumObjects;
disp(num);

% Counting works, but the segmentation itself is still unsatisfactory,
% so next try the HSV color space.

%% 1.2 Attempt in the HSV color space
% The previous segmentation left many small regions that disturb the
% connected-component count, and the opening eroded the edges badly.
% The H channel turns out to work remarkably well: the foreground H
% values are clearly smaller than the background's.
img_hsv = rgb2hsv(img);
imshow(img_hsv(:, :, 1));
histogram(img_hsv(:, :, 1));

% Foreground and background separate cleanly at a threshold of 0.35,
% giving the following mask:
area = img_hsv(:, :, 1) < 0.35;
imshow(area);

% Counting the connected components gives the number of fruits, but a
% few small noise specks remain (obvious if the image is inverted) that
% would break the count, so apply an opening first to remove them.
SE = strel('disk', 10, 4);    % disk-shaped structuring element
area_o = imopen(area, SE);
area_o = imfill(area_o, 'holes');
imshow(area_o);
CC = bwconncomp(area_o);
num = CC.NumObjects;
disp(num)

% Counting and segmentation now both work well. Next, also try an
% edge-detection approach.

%% 1.3 Segmentation via edge detection
% Edge-based segmentation proceeds in three steps:
%   1. filter the image to remove high-frequency noise;
%   2. detect the edges;
%   3. build the mask.

% 1. Filtering: apply a Gaussian low-pass filter in the frequency
% domain so the later edge detection runs more smoothly.
d0 = 100;                          % cutoff
[M, N] = size(img_gray);
img_f = fft2(double(img_gray));    % Fourier transform to the spectrum
img_f = fftshift(img_f);           % shift DC to the center
m_mid = floor(M/2);                % center coordinates
n_mid = floor(N/2);
h = zeros(M, N);                   % build the Gaussian low-pass filter
for i = 1:M
    for j = 1:N
        d = (i - m_mid)^2 + (j - n_mid)^2;
        h(i, j) = exp(-d/(2*(d0^2)));
    end
end
img_lpf = h.*img_f;
img_lpf = ifftshift(img_lpf);              % shift the center back
img_lpf = uint8(real(ifft2(img_lpf)));     % inverse FFT, keep the real part

% 2. Edge detection on the filtered image.
img_edges = edge(img_lpf, 'Canny', 0.05);
imshow(img_edges);

% The detected edges are incomplete: some contours that should connect
% do not, so hole filling cannot fill the fruit, as shown below.
img_filled = imfill(img_edges, 'holes');
imshow(img_filled);

% A closing connects the small gaps.
SE = strel('disk', 4, 4);
img_edge_c = imclose(img_edges, SE);
imshow(img_edge_c);

% With the contours connected, fill the holes.
img_edge_c = imfill(img_edge_c, 'holes');
imshow(img_edge_c);

% Many thin edge fragments (noise) remain around the fruit; remove
% them with an opening.
SE = strel('disk', 10, 4);
img_edge_o = imopen(img_edge_c, SE);
imshow(img_edge_o);

% The boundaries are imperfect, but the result is still countable.
CC = bwconncomp(img_edge_o);
num = CC.NumObjects;
disp(num)

% Summary: we tried the L* brightness feature, the HSV H channel, and
% image edges. The H channel is clearly the most effective: the
% segmentation boundary is clean with few missing points. Brightness
% suffers from the fruit's own shadows, which darken the edges. Edge
% features also suffer from shadows (especially in image 5): shadow
% edges get detected and can connect two fruits, breaking the count.
% We therefore use the H channel as the segmentation feature.

%% 2. Fruit recognition
% Classify each fruit by which reference color its mean color is
% closest to. Two steps:
%   1. Training: compute the mean color of each fruit class.
%   2. Prediction: compute each segmented fruit's mean color and assign
%      it to the nearest class.
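% Side note (a sketch, not part of the original assignment): the
% manually tuned thresholds used in part 1 (20/15 on L*, 48 on gray,
% 0.35 on H) could instead be chosen automatically with Otsu's method:
% level = graythresh(img_hsv(:, :, 1));    % Otsu threshold on H
% area = img_hsv(:, :, 1) < level;
% Whether Otsu's bimodality assumption holds for these images has not
% been verified here, so the fixed thresholds above remain the default.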
%% 2.1 Training
% Select sample regions in the image and compute the mean color of
% each. Class 1 is lemon, class 2 is orange.
clear; clc;
img = imread("task 2_sementation\citrus_fruits_05.png");
nColors = 2;
sample_regions = load("region_20201104.mat", 'sample_regions').sample_regions;
% To select the regions interactively instead:
% sample_regions = false([size(img, 1) size(img, 2) nColors]);
% for count = 1:nColors
%     sample_regions(:,:,count) = roipoly(img);
% end
subplot(2, 2, [1, 2]); imshow(img); title("Picture")
subplot(2, 2, 3); imshow(sample_regions(:, :, 1));
title('Sample Region for Lemon');
subplot(2, 2, 4); imshow(sample_regions(:, :, 2));
title("Sample Region for Orange");

% The figure shows the regions with the most distinctive colors. Save
% them so they need not be re-selected:
% save('region_20201104.mat', 'sample_regions');

% To compare colors, convert to the L*a*b* color space; only the a*
% and b* channels matter. Build color_markers as the reference "color
% card" for later classification.
color_markers = zeros([nColors, 2]);
img_lab = rgb2lab(img);
a = img_lab(:,:,2);
b = img_lab(:,:,3);
for count = 1:nColors
    color_markers(count,1) = mean2(a(sample_regions(:,:,count)));
    color_markers(count,2) = mean2(b(sample_regions(:,:,count)));
end
save("color_markers_20201104.mat", "color_markers");

% Plot the two reference colors in the a*b* plane.
figure();
plot(color_markers(:, 1), color_markers(:, 2), 'o');

% The two colors sit at distinct positions in the a*b* plane, so a
% simple Euclidean distance to each marker is enough to decide the
% class of each fruit.

%% 2.2 Prediction
% Prediction reuses the segmentation from part 1: segment the fruits
% out of the image, then classify each one.
img_hsv = rgb2hsv(img);
area = img_hsv(:, :, 1) < 0.35;
SE = strel('disk', 10, 4);    % disk-shaped structuring element
area_o = imopen(area, SE);
area_o = imfill(area_o, 'holes');
CC = bwconncomp(area_o);
num = CC.NumObjects;

a = double(a);
b = double(b);
distance = zeros([num, nColors]);
for i = 1:num
    mean_a = mean(a(CC.PixelIdxList{i}));
    mean_b = mean(b(CC.PixelIdxList{i}));
    for count = 1:nColors
        distance(i, count) = ((mean_a - color_markers(count, 1)).^2 + ...
            (mean_b - color_markers(count, 2)).^2).^0.5;
    end
end
[value, label] = min(distance, [], 2);
disp(label)

%% 2.3 Displaying the predictions
% Draw a bounding box and class name at each fruit's position.
figure();
imshow(img); hold on;
label_trans = ["lemon", "orange"];    % map class index to name
for i = 1:num
    % bounding box from the pixel indices of this component
    [row, col] = ind2sub([size(img,1), size(img,2)], CC.PixelIdxList{i});
    x_sw = min(col); y_sw = min(row);
    x_ne = max(col); y_ne = max(row);
    w = abs(x_ne - x_sw);
    h = abs(y_ne - y_sw);
    rectangle("Position", [x_sw, y_sw, w, h], 'LineWidth', 3, "EdgeColor", 'r');
    % write the class name
    fruit_name = label_trans(label(i));
    text(x_sw+15, y_sw+15, fruit_name, 'Color', 'r', 'FontSize', 20)
end
% put summary statistics in the title
total_num = size(label, 1);
lemon_num = length(find(label==1));
orange_num = total_num - lemon_num;
title(strcat("Found ", num2str(total_num), " Fruits: ", ...
    num2str(lemon_num), " Lemon and ", num2str(orange_num), " Orange"));

%% 3. Wrapping it in a loop
% With single-image analysis working, repeat the process over every
% image in the folder. (Figure display is unreliable inside an .mlx
% file; put this code in a plain .m file to get the looping display.)
file_path = '.\task 2_sementation';
images_list = dir(file_path);          % list the folder contents
img_num = size(images_list, 1) - 2;    % drop the '.' and '..' entries
% load the saved color markers
color_markers = load("color_markers_20201104.mat", ...
    'color_markers').color_markers;
nColors = size(color_markers, 1);
label_trans = ["lemon", "orange"];     % map class index to name
figure();
while 1
    for image_count = 1:img_num
        path = fullfile(file_path, images_list(image_count+2).name);
        img = imread(path);
        % segmentation and counting
        img_hsv = rgb2hsv(img);
        area = img_hsv(:, :, 1) < 0.35;
        SE = strel('disk', 10, 4);    % disk-shaped structuring element
        area_o = imopen(area, SE);
        area_o = imfill(area_o, 'holes');
        CC = bwconncomp(area_o);
        num = CC.NumObjects;
        % recognition
        img_lab = rgb2lab(img);
        a = double(img_lab(:,:,2));
        b = double(img_lab(:,:,3));
        distance = zeros([num, nColors]);
        for fruit_count = 1:num
            mean_a = mean(a(CC.PixelIdxList{fruit_count}));
            mean_b = mean(b(CC.PixelIdxList{fruit_count}));
            for count = 1:nColors
                distance(fruit_count, count) = ...
                    ((mean_a - color_markers(count,1)).^2 + ...
                    (mean_b - color_markers(count, 2)).^2).^0.5;
            end
        end
        [value, label] = min(distance, [], 2);
        imshow(img); hold on;
        for i = 1:num
            % bounding box for this component
            [row, col] = ind2sub([size(img,1), size(img,2)], ...
                CC.PixelIdxList{i});
            x_sw = min(col); y_sw = min(row);
            x_ne = max(col); y_ne = max(row);
            w = abs(x_ne - x_sw);
            h = abs(y_ne - y_sw);
            rectangle("Position", [x_sw, y_sw, w, h], 'LineWidth', 3, ...
                "EdgeColor", 'r');
            % write the class name
            fruit_name = label_trans(label(i));
            text(x_sw+15, y_sw+15, fruit_name, 'Color', 'g', 'FontSize', 20)
        end
        % put summary statistics in the title
        total_num = size(label, 1);
        lemon_num = length(find(label==1));
        orange_num = total_num - lemon_num;
        title(strcat("Found ", num2str(total_num), " Fruits: ", ...
            num2str(lemon_num), " Lemon and ", num2str(orange_num), ...
            " Orange"));
        pause(5);    % give the viewer time to look
    end
end
```

## 6. Recognizing Face Orientation

**Task:**

> Given the face images in the Images directory, recognize the direction each face is turned.

**Code:**

```matlab
%% Face orientation recognition
% Approach 1: edge centroid. The edges of a face concentrate on its
% front, so extract the edges, find where they cluster, and read the
% face orientation from the distribution of edge pixels.
img = imread("./Images/1_1.bmp");
imshow(img);

%% 1. Extracting edge features
%% 1.1 Edge detection
% Use the Canny detector, comparing three thresholds.
figure();
subplot(1, 3, 1);
img_edges = edge(img, 'Canny', 0.05);
imshow(img_edges);
subplot(1, 3, 2);
img_edges = edge(img, "Canny", 0.1);
imshow(img_edges);
subplot(1, 3, 3);
img_edges = edge(img, "Canny", 0.2);
imshow(img_edges);

% Experiments show that a slightly larger threshold effectively
% removes weak edges such as hair and wrinkles, so we use 0.2 for all
% subsequent edge detection.

%% 1.2 Reducing the edge features
% 1. Collapse the row dimension
% With the edges in hand, count the edge pixels by summing along each
% column of the image. For comparison, image 2 is processed alongside
% image 1.
img2 = imread("./Images/1_5.bmp");
img2_edges = edge(img2, "Canny", 0.2);
figure();
subplot(3, 2, 1); imshow(img);
subplot(3, 2, 2); imshow(img2);
subplot(3, 2, 3); imshow(img_edges);
subplot(3, 2, 4); imshow(img2_edges);
subplot(3, 2, 5);
img_edge_feature = sum(img_edges, 1);
plot(img_edge_feature);
subplot(3, 2, 6);
img2_edge_feature = sum(img2_edges, 1);
plot(img2_edge_feature);
disp(size(img));
% 2. Bin the column dimension
% The edge distribution is clearly skewed to the right. Reduce the
% dimension by binning every 84 columns, turning the 420-wide feature
% into a 5-dimensional feature vector.
feature = sum(reshape(img_edge_feature, 84, 5), 1);
figure();
plot(feature);

% The feature is now very distinctive. Before building a classifier,
% first check whether the orientation can be read directly from the
% bin with the most edges, i.e. whether the largest of the 5 values
% already gives the direction.

%% 2. Classifying the face features
%% 2.1 A centroid-based conjecture
% First test a conjecture: the face feature is simple and clear enough
% that its centroid might give the orientation directly.
number = [1, 2, 3, 4, 5]';
feature_norm = feature./sum(feature);
weight_center = feature_norm * number;
disp(weight_center);

% Checking the conjecture needs a convenient way to display results,
% so instead of the live script we use a plain .m script named
% Untiled20201111.m. The final results are not ideal: there is a fair
% number of errors, so a neural network is needed after all. The trial
% did show that the feature dimension can be somewhat larger, though
% not by much; we settled on 10 dimensions, extracted as follows:
feature = sum(reshape(img_edge_feature, 42, 10), 1);
feature_norm = feature ./ sum(feature);
figure();
plot(feature_norm);

%% 2.2 Batch feature extraction and data preparation (lesson 2020-11-25)
% Adapted from the faceOrient.m file shared in the course group to
% generate the data in a loop:
listPng = dir('Images/*.bmp');   % image files in the folder
len = length(listPng);           % total number of images
b = zeros(len, 11);              % 10 features + 1 target per image
nClass = 5;                      % 5 output classes
for indx = 1:1:len
    fName = listPng(indx).name;
    class = split(fName, '.bmp');        % class comes from the file name
    fName = strcat('./Images/', fName);
    str = cell2mat(class(1));
    cls = str2double(str(end));          % classes 1-5
    % extract the feature
    img = imread(fName);
    img_edges = edge(img, "Canny", 0.2);          % edge detection
    img_edge_feature = sum(img_edges, 1);         % collapse rows
    feature = sum(reshape(img_edge_feature, 42, 10), 1);  % bin columns
    feature_norm = feature ./ sum(feature);       % normalize
    % store the result
    b(indx, 1:10) = feature_norm;
    b(indx, 11) = cls;
    % pause(0.5)
    % plot(feature_norm)
    % title(cls)
end

% Encode the network targets as one-hot vectors:
X = b(:, 1:10);
I = eye(nClass);
target = I(b(:, 11), :);

%% 3. Training and testing the neural network
%% 3.1 Training
% The Neural Network toolbox handles the network construction:
nnstart
% Train with input X and output target. The trained network reaches
% high accuracy on the test set; deploying it to code generates
% lesson20201111_neural.m, which we can call directly. Let's try it.

%% 3.2 Testing
img = imread("./Images/5_2.bmp");
feature = lesson20201111_feature(img);
result = lesson20201111_neural(feature);
[val, cls] = max(result);
imshow(img);
title(num2str(cls));

%% Approach 2: locate the eyes and infer orientation from their position
img = imread("./Images/1_1.bmp");
imshow(img)

%% 1. Locating the eyes
% The eyes are dark, i.e. their gray values are small, so first
% inspect the histogram.
histogram(img);
% The dark pixels are a small fraction of the image, roughly below 50;
% show only that part.
img_eye = zeros(size(img));
img_eye(img < 50) = 1;
imshow(img_eye);
% It turns out the nostrils are dark too, so for now we simply ignore
% that part.
```

## 7. Gesture Recognition with Transfer Learning

**Task:**

> Classify the gesture images under the `train images\` directory.

**Code:**

```matlab
%% Get Started with Transfer Learning
% This example retrains ResNet-18, a pretrained convolutional neural
% network, to classify a new set of images. Transfer learning is
% common in deep learning: a pretrained network serves as the starting
% point for a new task, and fine-tuning it is usually much faster and
% easier than training from scratch with randomly initialized weights,
% requiring far fewer training images.

%% Load Data
% Load the images as an image datastore and split them 70% / 30% into
% training and validation sets.
% unzip('MerchData.zip');
imds = imageDatastore('train images\', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');
[imdsTrain, imdsValidation] = splitEachLabel(imds, 0.7, 'randomized');

%% Load Pretrained Network
% Load the pretrained ResNet-18. If Deep Learning Toolbox Model for
% ResNet-18 Network is not installed, the software provides a
% download link.
% ResNet-18 was trained on over a million images and classifies images
% into 1000 object categories (keyboard, coffee mug, pencil, many
% animals, and so on), so it has learned rich feature representations
% for a wide range of images. To perform transfer learning with other
% pretrained networks, see "Train Deep Learning Network to Classify
% New Images".
net = resnet18;

%% Replace Final Layers
% To retrain ResNet-18 on new images, replace its last fully connected
% layer ('fc1000') and final classification layer
% ('ClassificationLayer_predictions'). The new fully connected layer
% matches the number of classes in the new data set (5 here); raising
% its learning rate factors makes the new layers learn faster than the
% transferred ones.
numClasses = numel(categories(imdsTrain.Labels));
lgraph = layerGraph(net);
newFCLayer = fullyConnectedLayer(numClasses, 'Name', 'new_fc', ...
    'WeightLearnRateFactor', 10, 'BiasLearnRateFactor', 10);
lgraph = replaceLayer(lgraph, 'fc1000', newFCLayer);
newClassLayer = classificationLayer('Name', 'new_classoutput');
lgraph = replaceLayer(lgraph, 'ClassificationLayer_predictions', newClassLayer);

%% Train Network
% The network requires 224-by-224-by-3 input images, but the images in
% the datastore have different sizes; an augmented image datastore
% resizes them automatically (an imageDataAugmenter could also apply
% extra augmentation to help prevent overfitting).
inputSize = net.Layers(1).InputSize;
augimdsTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);
augimdsValidation = augmentedImageDatastore(inputSize(1:2), imdsValidation);

% A small InitialLearnRate slows learning in the transferred layers;
% combined with the raised learning rate factors above, only the new
% final layers learn quickly.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 10, ...
    'MaxEpochs', 8, ...
    'InitialLearnRate', 1e-4, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', augimdsValidation, ...
    'ValidationFrequency', 5, ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% By default, trainNetwork uses a GPU if one is available (requires
% Parallel Computing Toolbox and a CUDA-enabled GPU with compute
% capability 3.0 or higher); otherwise it uses the CPU.
trainedNet = trainNetwork(augimdsTrain, lgraph, options);

%% Classify Validation Images
% Classify the validation images with the fine-tuned network and
% compute the classification accuracy.
YPred = classify(trainedNet, augimdsValidation);
accuracy = mean(YPred == imdsValidation.Labels)

%% Learn More
% See "Train Deep Learning Network to Classify New Images", "Transfer
% Learning with Deep Network Designer", and "Pretrained Deep Neural
% Networks".
% Copyright 2018 The MathWorks, Inc.
```
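Once training finishes, the fine-tuned network can also be applied to a single image. A minimal sketch, assuming the `trainedNet` and `inputSize` variables from the transfer-learning script above; the file name `gesture_sample.png` is a hypothetical placeholder:

```matlab
% Classify one new gesture image with the fine-tuned network (sketch;
% 'gesture_sample.png' is a hypothetical file name).
img = imread('gesture_sample.png');
img = imresize(img, inputSize(1:2));      % match the 224-by-224 input size
[label, scores] = classify(trainedNet, img);
imshow(img);
title(sprintf('%s (%.1f%%)', string(label), max(scores)*100));
```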
## 8. Gesture Recognition with a CNN Trained from Scratch

**Task:**

> Without using a pretrained network, try classifying the gestures with a simple convolutional neural network (the final recognition rate is low).

**Code:**

```matlab
%% Create Simple Deep Learning Network for Classification
% Create and train a simple convolutional neural network for deep
% learning classification: load and explore the image data, define the
% network architecture, specify training options, train the network,
% then predict the labels of new data and compute the accuracy. (The
% commentary comes from the MATLAB digit-classification example; the
% datastore below points at the gesture images instead.)

%% Load and Explore Image Data
% imageDatastore labels the images automatically from the folder names
% and can store large image data, including data that does not fit in
% memory, reading batches efficiently during training.
% The digit data set from the original example:
% digitDatasetPath = fullfile(matlabroot, 'toolbox', 'nnet', 'nndemos', ...
%     'nndatasets', 'DigitDataset');
% imds = imageDatastore(digitDatasetPath, ...
%     'IncludeSubfolders', true, 'LabelSource', 'foldernames');
imds = imageDatastore('train images\', 'IncludeSubfolders', true, ...
    'LabelSource', 'foldernames');

% Display some of the images in the datastore.
figure;
perm = randperm(1000, 20);
for i = 1:20
    subplot(4, 5, i);
    imshow(imds.Files{perm(i)});
end

% Count the images per category; labelCount is a table of labels and
% image counts. The number of classes becomes the OutputSize of the
% last fully connected layer.
labelCount = countEachLabel(imds)

% The input layer needs the image size; check the first image.
img = readimage(imds, 1);
size(img)

%% Specify Training and Validation Sets
% splitEachLabel puts numTrainFiles images per label into the training
% set and the remaining images of each label into the validation set.
numTrainFiles = 35;
[imdsTrain, imdsValidation] = splitEachLabel(imds, numTrainFiles, 'randomize');

%% Define Network Architecture
layers = [
    imageInputLayer([480 640 3])
    convolution2dLayer(3, 8, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 16, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 32, 'Padding', 'same')
    batchNormalizationLayer
    reluLayer
    fullyConnectedLayer(26)
    softmaxLayer
    classificationLayer];
% Image input layer: specifies the image size (height, width, channel
% count), here 480-by-640-by-3 for the RGB gesture images. No manual
% shuffling is needed: trainNetwork shuffles the data at the start of
% training and can reshuffle every epoch.
% Convolutional layer: the first argument is the filter size (3-by-3
% here; height and width may differ), the second the number of
% filters, i.e. the number of neurons connecting to the same input
% region, which determines the number of feature maps.
% 'Padding','same' with the default stride of 1 keeps the spatial
% output size equal to the input size; stride and learning rates can
% also be set via name-value pairs of convolution2dLayer.
% Batch normalization layer: normalizes the activations and gradients
% propagating through the network, easing optimization; placed between
% convolutions and nonlinearities (ReLU) to speed up training and
% reduce sensitivity to initialization.
% ReLU layer: the most common nonlinear activation function.
% Max pooling layer: down-samples the feature map, removing redundant
% spatial information so deeper layers can use more filters without
% increasing the computation per layer; returns the maximum over
% rectangular regions of size poolSize ([2,2] here) with the given
% 'Stride'.
% Fully connected layer: its neurons connect to all neurons in the
% preceding layer, combining the learned features across the image;
% the last one has OutputSize equal to the number of classes (26
% gesture classes here).
% Softmax layer: normalizes the fully connected output into positive
% numbers that sum to one, usable as class probabilities.
% Classification layer: uses those probabilities to assign each input
% to one of the mutually exclusive classes and computes the loss.

%% Specify Training Options
% Train with stochastic gradient descent with momentum (SGDM) at an
% initial learning rate of 0.01. Monitor accuracy during training by
% specifying validation data and frequency (the validation data is not
% used to update the weights), shuffle every epoch, turn on the
% training progress plot, and turn off command-window output.
options = trainingOptions('sgdm', ...
    'InitialLearnRate', 0.01, ...
    'MaxEpochs', 3, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', imdsValidation, ...
    'ValidationFrequency', 30, ...
    'Verbose', false, ...
    'MiniBatchSize', 10, ...
    'Plots', 'training-progress');

%% Train Network Using Training Data
% The training progress plot shows the mini-batch and validation loss
% (cross entropy) and accuracy (the percentage of images classified
% correctly). By default, trainNetwork uses a GPU if one is available
% (requires Parallel Computing Toolbox and a CUDA-enabled GPU with
% compute capability 3.0 or higher); the 'ExecutionEnvironment' option
% of trainingOptions overrides this.
net = trainNetwork(imdsTrain, layers, options);

%% Classify Validation Images and Compute Accuracy
% Predict the labels of the validation data with the trained network;
% accuracy is the fraction of labels predicted correctly.
YPred = classify(net, imdsValidation);
YValidation = imdsValidation.Labels;
accuracy = sum(YPred == YValidation)/numel(YValidation)
% Copyright 2012 The MathWorks, Inc.
```
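Since the task notes that the scratch-trained network's recognition rate is low, a per-class view helps show which gestures are confused. A minimal sketch, assuming the `net` and `imdsValidation` variables from the script above and that Deep Learning Toolbox's `confusionchart` is available:

```matlab
% Per-class error analysis for the scratch-trained network (sketch;
% assumes net and imdsValidation exist from the previous script).
YPred = classify(net, imdsValidation);
YValidation = imdsValidation.Labels;
confusionchart(YValidation, YPred);    % rows: true class, columns: predicted
```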