[Original] Cats vs. Dogs Classification: A Hands-On TensorFlow Walkthrough

新用戶8173JS52 2021-06-05

Cats vs. Dogs is a task from a past Kaggle competition: given a dataset, build an algorithm that tells cats from dogs. The dataset can be downloaded from the Kaggle website, at https://www./c/dogs-vs-cats. It consists of training data and test data. The training set contains 12,500 images each of cats and dogs, and the test set contains 12,500 images of cats and dogs.
First, create a new project named Cats_vs_Dogs in PyCharm, with the following directory structure:

The data folder contains two subfolders, test and train, which hold the test data and the training data respectively.
The logs folder stores the model structure and the trained parameters saved during training.
input_data.py reads the data and generates batches.
model.py implements our neural network model.
training.py implements model training and evaluation.

The rest of this post is split into four parts: reading the data, building the model, training the model, and testing the model.

Reading the training data (input_data.py)

First, import the following modules:

import tensorflow as tf
import numpy as np
import os

因?yàn)槲覀冃枰@取test目錄下的文件，所以要導(dǎo)入os模塊。

# Collect file paths and labels. file_dir is the folder path; the function returns the shuffled image paths and labels.
def get_files(file_dir):
    cats = []
    label_cats = []
    dogs = []
    label_dogs = []

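    for file in os.listdir(file_dir):  # collect each file path and record its label
        name = file.split(sep='.')

        if name[0] == 'cat':
            cats.append(os.path.join(file_dir, file))
            label_cats.append(0)
        elif name[0] == 'dog':  # skip anything that is neither a cat nor a dog file
            dogs.append(os.path.join(file_dir, file))
            label_dogs.append(1)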

    print("There are %d cats\nThere are %d dogs" % (len(cats), len(dogs)))

    # Shuffle the file order, keeping each path paired with its label
    image_list = np.hstack((cats, dogs))
    label_list = np.hstack((label_cats, label_dogs))
    temp = np.array([image_list, label_list])
    temp = temp.transpose()  # transpose to one (path, label) pair per row
    np.random.shuffle(temp)

    image_list = list(temp[:, 0])
    label_list = list(temp[:, 1])
    label_list = [int(i) for i in label_list]

    return image_list, label_list

The get_files function collects all the training data (images and labels) under the given path file_dir and returns them as lists. After concatenation, the first 12,500 entries are cats and the last 12,500 are dogs. Training in that order would probably hurt the result (this is a guess), so the order has to be shuffled. Because images and labels correspond one to one, they must be combined before shuffling.
np.hstack is first used to merge the cat and dog paths and labels into image_list and label_list. hstack((a, b)) concatenates a and b horizontally: cats and dogs are vectors of length 12,500, so after hstack((cats, dogs)) image_list has length 25,000, and label_list likewise. The paired image_list and label_list are then merged once more. temp has size 2 * 25000; it is transposed (to 25000 * 2) and then shuffled with np.random.shuffle, which reorders the rows.
Finally, the shuffled image_list and label_list columns are taken out of temp and returned. Note that the entries of label_list are strings at this point, because a NumPy array that mixes paths and integers stores everything as strings. The line label_list = [int(i) for i in label_list] converts them back to int.
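The pairing idea can also be sketched without NumPy, which avoids the string conversion altogether. This is a minimal illustration under stated assumptions, not the code used in this post: the helper name shuffle_pairs and the sample file names are hypothetical.

```python
import random

def shuffle_pairs(paths, labels, seed=None):
    """Shuffle paths and labels together so each path keeps its label."""
    pairs = list(zip(paths, labels))        # [(path, label), ...]
    random.Random(seed).shuffle(pairs)      # shuffle the pairs in place
    if not pairs:
        return [], []
    shuffled_paths, shuffled_labels = zip(*pairs)
    return list(shuffled_paths), list(shuffled_labels)

# Hypothetical file names following the "cat.N.jpg" / "dog.N.jpg" pattern.
paths = ["cat.0.jpg", "cat.1.jpg", "dog.0.jpg", "dog.1.jpg"]
labels = [0, 0, 1, 1]
shuffled_paths, shuffled_labels = shuffle_pairs(paths, labels, seed=0)
for path, label in zip(shuffled_paths, shuffled_labels):
    print(path, label)
```

Because whole (path, label) tuples are shuffled, the labels stay integers and no conversion step is needed afterwards.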

# Generate batches of equal size. capacity is the queue capacity; returns a batch of images and a batch of labels.
def get_batch(image, label, image_W, image_H, batch_size, capacity):
    # Convert the Python lists into a format TensorFlow can work with
    image = tf.cast(image, tf.string)
    label = tf.cast(label, tf.int32)

    input_queue = tf.train.slice_input_producer([image, label])  # build the input queue
    image_contents = tf.read_file(input_queue[0])
    label = input_queue[1]
    image = tf.image.decode_jpeg(image_contents, channels=3)
    # Bring all images to the same size
    image = tf.image.resize_images(image, [image_H, image_W], \
                                   method=tf.image.ResizeMethod.NEAREST_NEIGHBOR)
    image = tf.cast(image, tf.float32)
    image_batch, label_batch = tf.train.batch([image, label], batch_size=batch_size, \
                                              num_threads=64, capacity=capacity)

    # label_batch = tf.reshape(label_batch, [batch_size])
    return image_batch, label_batch

The get_batch function splits the images into batches. Loading all 25,000 images into memory at once is neither practical nor necessary, so the images are fed to training batch by batch. A common explanation for training on batches is the following: if the loss function is non-convex, training on the full dataset at every step, even when it is computationally feasible, may get stuck in a local optimum. A batch is a sample drawn from the full dataset, so batching effectively injects sampling noise into the gradient. Each step then follows a slightly different path, which makes the search more likely to leave a poor local optimum and move towards a better one, although reaching the global optimum is not guaranteed. The image and label arguments passed in here are the image_list and label_list returned by get_files. They are Python lists, so they have to be converted into tensors that TensorFlow can work with.
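The basic idea of cutting a dataset into batches can be sketched in a few lines of plain Python. This is only an illustration; iter_batches is a hypothetical helper and is not part of input_data.py.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

samples = list(range(10))                       # stand-in for 10 training examples
batches = list(iter_batches(samples, batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note that this sketch makes a single pass and lets the last batch be smaller. The queue-based pipeline in get_batch behaves differently: with its default settings the producer keeps cycling through the data, so every batch has exactly batch_size elements.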
Data is fetched through a queue here. Queue operations involve threads, which the original post illustrated with a figure.

Roughly speaking, it can be understood like this: at each training step, one batch is taken from the queue and sent to the network, and new images from the training set are then pushed into the queue, over and over. The queue acts as a data pipeline between the training set and the network model, and the training data reaches the network through it.
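The pipeline idea can be imitated with Python's standard queue and threading modules. The sketch below is an analogy only and does not show how TensorFlow implements its queues; the function names and the sample names are hypothetical.

```python
import queue
import threading

def producer(items, q, n_items):
    """Feed `n_items` elements into the queue, cycling over `items`."""
    for i in range(n_items):
        q.put(items[i % len(items)])   # blocks while the queue is full

def take_batch(q, batch_size):
    """Remove `batch_size` elements from the queue, waiting if necessary."""
    return [q.get() for _ in range(batch_size)]

files = ["img%d" % i for i in range(5)]    # hypothetical sample names
q = queue.Queue(maxsize=4)                  # plays the role of `capacity`
worker = threading.Thread(target=producer, args=(files, q, 8), daemon=True)
worker.start()

first = take_batch(q, 4)    # the "training loop" consumes one batch...
second = take_batch(q, 4)   # ...while the producer refills the queue
worker.join()
print(first)
print(second)
```

The bounded queue means the producer can run ahead of the consumer by at most four items here, which is the role the capacity argument plays in get_batch.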
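We use slice_input_producer to build the queue: image and label are put in a list and passed to it as the argument, and each element taken from the queue then provides one image path and its label. Note that after a file is read with read_file, its contents must be decoded according to the image format. The training data in this example is in JPEG format, so decode_jpeg is used; other formats need their own decoders. Also note that the decoded data has type uint8, whereas the conv2d calls in the model's convolutional layers work with float32 weights and need float32 input, so a type conversion is required.
Because the images in the training set come in different sizes, they also have to be brought to the same size (image_W by image_H). Some programmers use resize_image_with_crop_or_pad for this. That method crops (or pads) around the image centre, so when an image is larger than the target size only the central region survives. A dog may be left with just its torso and no head, and training on such images will certainly suffer. So a small change is made here: resize_images scales the image instead of cropping it, using NEAREST_NEIGHBOR interpolation.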
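The difference between cropping and scaling can be seen on a tiny example. The sketch below works on nested lists rather than tensors, and the index mapping in resize_nearest is a simple floor-based assumption that may differ in detail from TensorFlow's implementation; both helper names are hypothetical.

```python
def center_crop(image, out_h, out_w):
    """Keep the central out_h x out_w window; everything outside is discarded."""
    top = (len(image) - out_h) // 2
    left = (len(image[0]) - out_w) // 2
    return [row[left:left + out_w] for row in image[top:top + out_h]]

def resize_nearest(image, out_h, out_w):
    """Scale the whole image by sampling one source pixel per output pixel."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]

# A 4x4 "image" whose pixel values are simply their row-major index.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
cropped = center_crop(image, 2, 2)
resized = resize_nearest(image, 2, 2)
print(cropped)  # [[5, 6], [9, 10]]
print(resized)  # [[0, 2], [8, 10]]
```

The cropped result contains only the four central pixels, while the resized result samples from the whole image, including its borders.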
tf.train.batch is then used to assemble the batches. An alternative is tf.train.shuffle_batch, but since the data has already been shuffled, the plain batch function is used here.
Finally, the resulting image_batch and label_batch are returned. image_batch is a 4-D tensor of shape [batch, height, width, channels], and label_batch is a 1-D tensor of shape [batch].
The code below can be used to check that images are fetched correctly. Because the images were converted to float32 earlier, the colours shown by imshow will look odd: imshow expects floating-point RGB data to lie between 0 and 1, whereas the values here still range from 0 to 255 as they did in uint8. This does not affect the model training later on:

import matplotlib.pyplot as plt

BATCH_SIZE = 2
CAPACITY = 256
IMG_W = 208
IMG_H = 208

train_dir = "data\\train\\"
image_list, label_list = get_files(train_dir)
image_batch, label_batch = get_batch(image_list, label_list, IMG_W, IMG_H, BATCH_SIZE, CAPACITY)

with tf.Session() as sess:
    i = 0
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(coord=coord)

    try:
        while not coord.should_stop() and i < 1:
            img, label = sess.run([image_batch, label_batch])

            for j in np.arange(BATCH_SIZE):
                print("label: %d" % label[j])
                plt.imshow(img[j, :, :, :])
                plt.show()

            i += 1
    except tf.errors.OutOfRangeError:
        print("done!")
    finally:
        coord.request_stop()

    coord.join(threads)
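If correct colours are wanted in the check above, the float values can be rescaled before display. The helper below is a hypothetical, list-based sketch of that rescaling; with NumPy arrays the same effect would come from dividing the image by 255.0 or casting it back with astype(np.uint8) before calling imshow.

```python
def to_unit_range(pixels, max_value=255.0):
    """Map pixel values from [0, max_value] to [0.0, 1.0], clamping outliers."""
    return [min(max(p / max_value, 0.0), 1.0) for p in pixels]

row = [0.0, 127.5, 255.0]      # float pixel values still on the 0-255 scale
print(to_unit_range(row))      # [0.0, 0.5, 1.0]
```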

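Building the convolutional neural network model (model.py)

The network below follows the structure of TensorFlow's official cifar-10 example: two convolutional layers (each followed by a pooling layer), two fully connected layers, and a final softmax_linear layer that produces the classification output (the softmax itself is applied inside the loss function):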

import tensorflow as tf

def inference(images, batch_size, n_classes):
    # conv1, shape = [kernel_size, kernel_size, channels, kernel_numbers]
    with tf.variable_scope("conv1") as scope:
        weights = tf.get_variable("weights", shape=[3, 3, 3, 16], dtype=tf.float32, \
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        biases = tf.get_variable("biases", shape=[16], dtype=tf.float32, initializer=tf.constant_initializer(0.1))
        conv = tf.nn.conv2d(images, weights, strides=[1, 1, 1, 1], padding="SAME")
        pre_activation = tf.nn.bias_add(conv, biases)
        conv1 = tf.nn.relu(pre_activation, name="conv1")

    with tf.variable_scope("pooling1_lrn") as scope:  # pool1 && norm1
        pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME", name="pooling1")
        norm1 = tf.nn.lrn(pool1, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm1')

    with tf.variable_scope("conv2") as scope:  # conv2
        weights = tf.get_variable("weights", shape=[3, 3, 16, 16], dtype=tf.float32, \
                                  initializer=tf.truncated_normal_initializer(stddev=0.1, dtype=tf.float32))
        biases = tf.get_variable("biases", shape=[16], dtype=tf.float32, initializer=tf.constant_initializer(0.1))
        conv = tf.nn.conv2d(norm1, weights, strides=[1, 1, 1, 1], padding="SAME")
        pre_activation = tf.nn.bias_add(conv, biases)
        conv2 = tf.nn.relu(pre_activation, name="conv2")

    with tf.variable_scope("pooling2_lrn") as scope:  # pool2 && norm2
        pool2 = tf.nn.max_pool(conv2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding="SAME", name="pooling2")
        norm2 = tf.nn.lrn(pool2, depth_radius=4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm2')

    with tf.variable_scope("fc1") as scope:  # full-connect1
        reshape = tf.reshape(norm2, shape=[batch_size, -1])
        dim = reshape.get_shape()[1].value
        weights = tf.get_variable("weights", shape=[dim, 128], dtype=tf.float32, \
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable("biases", shape=[128], dtype=tf.float32, initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name="fc1")

    with tf.variable_scope("fc2") as scope:  # full_connect2
        weights = tf.get_variable("weights", shape=[128, 128], dtype=tf.float32, \
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable("biases", shape=[128], dtype=tf.float32, initializer=tf.constant_initializer(0.1))
        fc2 = tf.nn.relu(tf.matmul(fc1, weights) + biases, name="fc2")

    with tf.variable_scope("softmax_linear") as scope:  # softmax
        weights = tf.get_variable("weights", shape=[128, n_classes], dtype=tf.float32, \
                                  initializer=tf.truncated_normal_initializer(stddev=0.005, dtype=tf.float32))
        biases = tf.get_variable("biases", shape=[n_classes], dtype=tf.float32, initializer=tf.constant_initializer(0.1))
        softmax_linear = tf.add(tf.matmul(fc2, weights), biases, name="softmax_linear")

    return softmax_linear
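The value of dim in the fc1 layer can be worked out by hand. The sketch below assumes 208 x 208 input images, as in the test snippet earlier, and uses the rule that a SAME-padded convolution or pooling layer produces ceil(size / stride) outputs along each spatial dimension; same_out is a hypothetical helper name.

```python
import math

def same_out(size, stride):
    """Spatial output size of a SAME-padded conv or pool layer."""
    return math.ceil(size / stride)

size = 208                   # assumed input width and height
size = same_out(size, 1)     # conv1, stride 1 -> 208
size = same_out(size, 2)     # pool1, stride 2 -> 104
size = same_out(size, 1)     # conv2, stride 1 -> 104
size = same_out(size, 2)     # pool2, stride 2 -> 52
dim = size * size * 16       # conv2 has 16 output channels; lrn keeps the shape
print(size, dim)             # 52 43264
```

Under these assumptions the fc1 weight matrix has shape [43264, 128], which makes it by far the largest set of parameters in the network.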

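The program contains many with tf.variable_scope("name") statements. This is TensorFlow's variable scope mechanism, whose purpose is to manage the variables you need effectively and conveniently. The mechanism consists mainly of two parts:

tf.get_variable(<name>, <shape>, <initializer>): creates a variable.
tf.variable_scope(<scope_name>): specifies the namespace.

If variables need to be shared, this has to be requested explicitly through the reuse_variables method.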
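The interplay of scopes, names and reuse can be mimicked with a small registry. The class below is a toy model written for illustration and is not TensorFlow code; ToyVariableStore and its method signature are hypothetical, and only the broad behaviour (scoped names, an error on accidental re-creation, sharing when reuse is requested) is meant to match.

```python
class ToyVariableStore:
    """Toy stand-in for scoped variable creation and sharing (not TensorFlow)."""

    def __init__(self):
        self.variables = {}

    def get_variable(self, scope, name, init, reuse=False):
        full_name = scope + "/" + name           # e.g. "conv1/weights"
        exists = full_name in self.variables
        if reuse and not exists:
            raise ValueError(full_name + " does not exist, so it cannot be reused")
        if not reuse and exists:
            raise ValueError(full_name + " already exists; request reuse to share it")
        if not exists:
            self.variables[full_name] = init
        return self.variables[full_name]

store = ToyVariableStore()
w1 = store.get_variable("conv1", "weights", init=[0.1, 0.2])
w2 = store.get_variable("conv2", "weights", init=[0.3, 0.4])  # same short name, other scope
shared = store.get_variable("conv1", "weights", init=None, reuse=True)
print(sorted(store.variables))   # ['conv1/weights', 'conv2/weights']
print(shared is w1)              # True
```

This is why inference can call every layer's parameters "weights" and "biases": the enclosing scope keeps the full names distinct.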

def losses(logits, labels):
    with tf.variable_scope("loss") as scope:
        cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
                            logits=logits, labels=labels, name="xentropy_per_example")
        loss = tf.reduce_mean(cross_entropy, name="loss")
        tf.summary.scalar(scope.name + "/loss", loss)  # summary tag "loss/loss"

    return loss

def trainning(loss, learning_rate):
    with tf.name_scope("optimizer"):
        optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
        global_step = tf.Variable(0, name="global_step", trainable=False)
        train_op = optimizer.minimize(loss, global_step=global_step)

    return train_op

def evaluation(logits, labels):
    with tf.variable_scope("accuracy") as scope:
        correct = tf.nn.in_top_k(logits, labels, 1)
        correct = tf.cast(correct, tf.float16)
        accuracy = tf.reduce_mean(correct)
        tf.summary.scalar(scope.name + "/accuracy", accuracy)  # summary tag "accuracy/accuracy"

    return accuracy

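The losses function computes the loss during training. Its input logits is the output of inference and holds the model's scores for the cat and dog classes of each image; labels holds the corresponding ground-truth labels.
Inspecting logits at a breakpoint shows two values per image, one for the cat class and one for the dog class. As returned by inference these are unnormalized scores; after softmax they become probabilities that sum to 1.

tf.nn.sparse_softmax_cross_entropy_with_logits compares the sparsely encoded labels (class indices rather than one-hot vectors) with the output of the final layer. Since training uses batches of 16 images, tf.reduce_mean then averages the per-example values to give the mean loss of the batch. In trainning(loss, learning_rate) (the function name is spelled this way in the code), loss is the training loss and learning_rate is the learning rate; AdamOptimizer is used to drive the loss down. evaluation(logits, labels) monitors the accuracy on the validation data in real time during training, which shows how well training is going.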
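What losses and evaluation compute can be reproduced by hand on a tiny batch. The sketch below uses plain Python instead of TensorFlow; the function names and the three example logit pairs are made up for illustration.

```python
import math

def sparse_softmax_cross_entropy(logits, label):
    """Cross-entropy of one example: -log(softmax(logits)[label])."""
    m = max(logits)                      # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def batch_loss_and_accuracy(batch_logits, labels):
    """Mean cross-entropy and top-1 accuracy over a batch."""
    losses = [sparse_softmax_cross_entropy(l, y)
              for l, y in zip(batch_logits, labels)]
    predictions = [max(range(len(l)), key=l.__getitem__) for l in batch_logits]
    correct = [p == y for p, y in zip(predictions, labels)]
    return sum(losses) / len(losses), sum(correct) / len(correct)

# Made-up logits for three images; label 0 is cat and label 1 is dog.
batch_logits = [[2.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]
labels = [0, 1, 0]
loss, accuracy = batch_loss_and_accuracy(batch_logits, labels)
print(round(loss, 4), round(accuracy, 4))
```

The third image is scored as a dog although its label is cat, so it contributes a large loss term and pulls the accuracy down to two out of three.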