研究CNTK(六):ResNet

发挥图像识别的威力还是需要ResNet结构,最早是微软亚洲研究院提出的,这是可以达到90%以上识别率的网络结构,比如resnet cifar10,需要使用高达21个卷积层,并且每一步都要进行重新的批量正则化与归一化。

谷歌2016年8月份开源了Inception-ResNet-v2,是基于TensorFlow的,识别效果更为强大,能够将阿拉斯加雪橇犬(左)和西伯利亚雪橇犬给准确分类出来,另外还有V3与V4,网络结构也变得更深,唯一值的一提的是Inception-v4没有residual连接,但是效果与V2一样。

不过这都是些是些个人计算机运行起来很吃力的东西,在CNTK里这个同样有ResNet的各种实现,默认的ResNet20_CIFAR10.cntk将会进行160次迭代,这个运算量非常大也非常缓慢,不过识别精度非常高,但是不是妨尝试运行下的。

道家阴符派博客--研究CNTK(六):ResNet--CNTK 1

在第16次迭代时可以达到准度度已经可以达到是百分之86.1%以上的正确率了,每次迭代花费时间约在25s左右,所以时间也就花费了几分钟而已,它提高识别率的速度甚至比简单的卷积网络还快。在达到28次左右时,实际上就达到了89~90%左右的识别率,真正突破90%大概是到了36次左右,即大概15分钟便可以做到90%的识别,这个完全可以接受。不过按文档说,训练到最后错误率最高也只能达到8.2%左右,而人类的估计是6%左右,这个网络结构还是不足以超过人类,而叠加到了n=18的网络,则可以达到6.2-6.5%的效果。简单测试了一下,网络的一代迭代大概153秒,要迭代160次,约要跑6.8个小时,才可以达到6.2-6.5%的效果。

 

根据有人做过的研究来看,ResNet也不是层次越多越好。

道家阴符派博客--研究CNTK(六):ResNet--CNTK 2

到了1202层的时候,反而效果比110层更差了。

 

比较了一下:

ResNet20_CIFAR10,numLayers =3,learningRatesPerMB = 1.0*80:0.1*40:0.01

ResNet18_CIFAR10,numLayers = 18,learningRatesPerMB = 0.1*1:1.0*80:0.1*40:0.01

除此之外并无区别了。

所以分析ResNet20_CIFAR10.cntk源文件就好

# ConvNet applied on CIFAR-10 dataset, with data augmentation (translation and flipping).

command = TrainConvNet:Eval

precision = "float"; traceLevel = 1 ; deviceId = "auto"

rootDir = "../.." ; configDir = "./" ; dataDir = "$rootDir$/DataSets/CIFAR-10" ;
outputDir = "./Output" ;

modelPath = "$outputDir$/Models/ResNet20_CIFAR10_DataAug"
#stderr = "$outputDir$/ResNet20_CIFAR10_DataAug_bs_out"

TrainConvNet = {
    action = "train"

    BrainScriptNetworkBuilder = {
        include "$configDir$/Macros.bs" 

        imageShape = 32:32:3 #图像都重整到32
        labelDim = 10 #分类只有十种

        featScale = 1/256
        Normalize{f} = x => f .* x

        cMap = 16:32:64
        bnTimeConst = 4096
        numLayers = 3
        
        model = Sequential (
            Normalize {featScale} :

            ConvBNReLULayer {cMap[0], (3:3), (1:1), bnTimeConst} :
            ResNetBasicStack {numLayers, cMap[0], bnTimeConst} :

            ResNetBasicInc {cMap[1], (2:2), bnTimeConst} :
            ResNetBasicStack {numLayers-1, cMap[1], bnTimeConst} :
        
            ResNetBasicInc {cMap[2], (2:2), bnTimeConst} :
            ResNetBasicStack {numLayers-1, cMap[2], bnTimeConst} :
            
            # avg pooling 平均池化
            AveragePoolingLayer {(8: 8), stride = 1} :
            LinearLayer {labelDim}
        )

        # inputs
        features = Input {imageShape}
        labels   = Input {labelDim}

        # apply model to features
        z = model (features)

        # connect to system
        ce       = CrossEntropyWithSoftmax     (labels, z)
        errs     = ClassificationError         (labels, z)
        top5Errs = ClassificationError         (labels, z, topN=5)  # only used in Eval action

        featureNodes    = (features)
        labelNodes      = (labels)
        criterionNodes  = (ce)
        evaluationNodes = (errs)  # top5Errs only used in Eval
        outputNodes     = (z)
    }

    SGD = {
        epochSize = 0
        minibatchSize = 128

        # Note that learning rates are 10x more than in the paper due to a different
        # momentum update rule in CNTK: v{t + 1} = lr*(1 – momentum)*g{t + 1} + momentum*v{t}

        #这里的学习率是Paper上的10倍,这里有所不同。而学习自动变化则momentum的方式,Nesterov Momentum基于凸优化理论,它的收敛更好。
        learningRatesPerMB = 1.0*80:0.1*40:0.01
        momentumPerMB = 0.9

       #迭代次数
        maxEpochs = 160

        #L2正则化权重,关于这个详细,应该参考:https://msdn.microsoft.com/zh-cn/dn904675.aspx
        L2RegWeight = 0.0001

        numMBsToShowResult = 100
    }

    reader = {
        verbosity = 0 ; randomize = true
        deserializers = ({
            type = "ImageDeserializer" ; module = "ImageReader"
            file = "$dataDir$/train_map.txt"
            input = {
                features = { transforms = (
                    { type = "Crop" ; cropType = "random" ; cropRatio = 0.8 ; jitterType = "uniRatio" } :
                    { type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
                    { type = "Mean" ; meanFile = "$dataDir$/CIFAR-10_mean.xml" } :
                    { type = "Transpose" }
                )}
                labels = { labelDim = 10 }
            }
        })
    }
}

# Eval action
Eval = {
    action = "eval"
    evalNodeNames = errs:top5Errs  # also test top-5 error rate
    # Set minibatch size for testing.
    minibatchSize = 128

    reader = {
        verbosity = 0 ; randomize = false
        deserializers = ({
            type = "ImageDeserializer" ; module = "ImageReader"
            file = "$dataDir$/test_map.txt"
            input = {
                features = { transforms = (
                   { type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
                   { type = "Mean"; meanFile = "$dataDir$/CIFAR-10_mean.xml" } :
                   { type = "Transpose" }
                )}
                labels = { labelDim = 10 }
            }
        })
    }
}

 

发表评论

电子邮件地址不会被公开。 必填项已用*标注