To really unleash the power of image recognition you need the ResNet architecture, first proposed by Microsoft Research Asia. It is a network structure that can reach over 90% recognition accuracy; the CIFAR-10 ResNet, for example, uses as many as 21 convolutional layers, with batch normalization applied at every step.
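The core idea behind ResNet is the residual (shortcut) connection: each block learns a residual F(x) and outputs F(x) + x, so gradients can flow past the block through the identity path. A minimal NumPy sketch of the idea (the "conv + BN + ReLU" step is stood in for by a plain linear map and ReLU; all names and shapes here are illustrative, not taken from the CNTK config):

```python
import numpy as np

def conv_bn_relu(x, w):
    """Illustrative stand-in for a conv + batch-norm + ReLU step:
    here simply a linear map followed by ReLU."""
    return np.maximum(0.0, x @ w)

def residual_block(x, w1, w2):
    """A basic residual block: y = ReLU(F(x) + x), where F is two
    transforms. The identity shortcut lets the block default to a
    near-identity mapping when the residual branch contributes little."""
    f = conv_bn_relu(x, w1) @ w2   # residual branch F(x)
    return np.maximum(0.0, f + x)  # add the shortcut, then ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 16))
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1
y = residual_block(x, w1, w2)
print(y.shape)  # (1, 16): the output keeps the input's shape
```

Note that if the residual branch's weights are all zero, the block reduces to ReLU(x), which is exactly the "easy to learn the identity" property that lets these networks go so deep.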
In August 2016 Google open-sourced Inception-ResNet-v2, built on TensorFlow, with even stronger recognition performance: it can correctly tell an Alaskan Malamute from a Siberian Husky. There are also Inception-v3 and Inception-v4, whose network structures go even deeper; the one thing worth noting is that Inception-v4 has no residual connections, yet performs on par with Inception-ResNet-v2.
These, however, are all models that an ordinary personal computer struggles to run. CNTK likewise ships various ResNet implementations; the default ResNet20_CIFAR10.cntk runs 160 epochs, which is a very large and slow computation, but the recognition accuracy is very high, so it is still worth trying.
By the 16th epoch the accuracy already exceeds 86.1%, and each epoch takes about 25 s, so only a few minutes have passed; it pushes accuracy up even faster than a simple convolutional network does. By around epoch 28 it reaches roughly 89–90%, and it truly breaks 90% at around epoch 36, i.e. about 15 minutes to reach 90% recognition, which is entirely acceptable.

According to the documentation, though, the error rate at the end of training only gets down to about 8.2%, while human error is estimated at about 6%, so this network structure is still not enough to beat humans; stacking up to an n = 18 network reaches a 6.2–6.5% error rate. In a quick test, one epoch of that network took about 153 s, so at 160 epochs it would run for roughly 6.8 hours to reach the 6.2–6.5% result.
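The relation between numLayers (n) and the network's name follows the paper's CIFAR-10 recipe visible in the config below: three stages of n residual blocks, two convolutions per block, plus the initial convolution and the final linear layer, giving 6n + 2 weight layers. A quick arithmetic check:

```python
def resnet_cifar_depth(n):
    """Depth of the CIFAR-10 ResNet: 3 stages * n blocks * 2 convs
    per block, plus the initial conv and the final linear layer,
    i.e. 6n + 2 weight layers."""
    return 6 * n + 2

print(resnet_cifar_depth(3))   # 20  -> ResNet20  (numLayers = 3)
print(resnet_cifar_depth(18))  # 110 -> ResNet110 (numLayers = 18)
```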
According to studies others have done, more layers is not always better for ResNet: at 1,202 layers the results are actually worse than at 110 layers.
Comparing the two configurations:
ResNet20_CIFAR10,numLayers =3,learningRatesPerMB = 1.0*80:0.1*40:0.01
ResNet110_CIFAR10,numLayers = 18,learningRatesPerMB = 0.1*1:1.0*80:0.1*40:0.01
Other than that, there is no difference between the two.
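The learningRatesPerMB strings above encode a per-epoch schedule: each rate*count segment holds that rate for count epochs, and a trailing rate with no count covers all remaining epochs (so the ResNet110 config adds a one-epoch warm-up at 0.1 before jumping to 1.0). A small parser to make this concrete — expand_schedule is my own illustrative helper, not a CNTK API:

```python
def expand_schedule(spec, max_epochs):
    """Expand a CNTK-style schedule string such as '1.0*80:0.1*40:0.01'
    into a per-epoch list: each 'rate*count' segment repeats `count`
    times, and the final segment (no count) fills the remaining epochs."""
    rates = []
    for seg in spec.split(":"):
        if "*" in seg:
            rate, count = seg.split("*")
            rates += [float(rate)] * int(count)
        else:
            rates += [float(seg)] * (max_epochs - len(rates))
    return rates[:max_epochs]

lrs = expand_schedule("1.0*80:0.1*40:0.01", 160)
# epochs 1-80 train at 1.0, 81-120 at 0.1, 121-160 at 0.01
print(lrs[0], lrs[79], lrs[80], lrs[119], lrs[120])  # 1.0 1.0 0.1 0.1 0.01
```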
So it is enough to analyze the ResNet20_CIFAR10.cntk source file:
# ConvNet applied on CIFAR-10 dataset, with data augmentation (translation and flipping).
command = TrainConvNet:Eval
precision = "float"; traceLevel = 1 ; deviceId = "auto"
rootDir = "../.." ; configDir = "./" ; dataDir = "$rootDir$/DataSets/CIFAR-10" ;
outputDir = "./Output" ;
modelPath = "$outputDir$/Models/ResNet20_CIFAR10_DataAug"
#stderr = "$outputDir$/ResNet20_CIFAR10_DataAug_bs_out"
TrainConvNet = {
action = "train"
BrainScriptNetworkBuilder = {
include "$configDir$/Macros.bs"
imageShape = 32:32:3 # images are all resized to 32x32, 3 channels
labelDim = 10 # only ten classes
featScale = 1/256
Normalize{f} = x => f .* x
cMap = 16:32:64
bnTimeConst = 4096
numLayers = 3
model = Sequential (
Normalize {featScale} :
ConvBNReLULayer {cMap[0], (3:3), (1:1), bnTimeConst} :
ResNetBasicStack {numLayers, cMap[0], bnTimeConst} :
ResNetBasicInc {cMap[1], (2:2), bnTimeConst} :
ResNetBasicStack {numLayers-1, cMap[1], bnTimeConst} :
ResNetBasicInc {cMap[2], (2:2), bnTimeConst} :
ResNetBasicStack {numLayers-1, cMap[2], bnTimeConst} :
# average pooling
AveragePoolingLayer {(8:8), stride = 1} :
LinearLayer {labelDim}
)
# inputs
features = Input {imageShape}
labels = Input {labelDim}
# apply model to features
z = model (features)
# connect to system
ce = CrossEntropyWithSoftmax (labels, z)
errs = ClassificationError (labels, z)
top5Errs = ClassificationError (labels, z, topN=5) # only used in Eval action
featureNodes = (features)
labelNodes = (labels)
criterionNodes = (ce)
evaluationNodes = (errs) # top5Errs only used in Eval
outputNodes = (z)
}
SGD = {
epochSize = 0
minibatchSize = 128
# Note that learning rates are 10x more than in the paper due to a different
# momentum update rule in CNTK: v{t + 1} = lr*(1 – momentum)*g{t + 1} + momentum*v{t}
# As noted above, the learning rates here are 10x those in the paper because
# CNTK uses a different momentum update rule; Nesterov momentum, grounded in
# convex optimization theory, converges better.
learningRatesPerMB = 1.0*80:0.1*40:0.01
momentumPerMB = 0.9
# number of epochs
maxEpochs = 160
# L2 regularization weight; for details see: https://msdn.microsoft.com/zh-cn/dn904675.aspx
L2RegWeight = 0.0001
numMBsToShowResult = 100
}
reader = {
verbosity = 0 ; randomize = true
deserializers = ({
type = "ImageDeserializer" ; module = "ImageReader"
file = "$dataDir$/train_map.txt"
input = {
features = { transforms = (
{ type = "Crop" ; cropType = "random" ; cropRatio = 0.8 ; jitterType = "uniRatio" } :
{ type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
{ type = "Mean" ; meanFile = "$dataDir$/CIFAR-10_mean.xml" } :
{ type = "Transpose" }
)}
labels = { labelDim = 10 }
}
})
}
}
# Eval action
Eval = {
action = "eval"
evalNodeNames = errs:top5Errs # also test top-5 error rate
# Set minibatch size for testing.
minibatchSize = 128
reader = {
verbosity = 0 ; randomize = false
deserializers = ({
type = "ImageDeserializer" ; module = "ImageReader"
file = "$dataDir$/test_map.txt"
input = {
features = { transforms = (
{ type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
{ type = "Mean"; meanFile = "$dataDir$/CIFAR-10_mean.xml" } :
{ type = "Transpose" }
)}
labels = { labelDim = 10 }
}
})
}
}
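The comment in the SGD section quotes CNTK's momentum update rule, v{t+1} = lr*(1 − momentum)*g{t+1} + momentum*v{t}, versus the classic rule v{t+1} = momentum*v{t} + lr*g{t+1}. With momentum = 0.9 the extra (1 − momentum) factor scales the gradient term by 0.1, which is exactly why the config's rates are 10x the paper's. A sketch of the two rules (plain Python, illustrative function names):

```python
def cntk_momentum_step(v, g, lr, momentum):
    """Momentum update as stated in the config comment:
    v_{t+1} = lr * (1 - momentum) * g_{t+1} + momentum * v_t"""
    return lr * (1.0 - momentum) * g + momentum * v

def classic_momentum_step(v, g, lr, momentum):
    """Classic SGD-with-momentum update:
    v_{t+1} = momentum * v_t + lr * g_{t+1}"""
    return momentum * v + lr * g

# With momentum = 0.9, CNTK's rule at lr = 1.0 produces the same step
# as the classic rule at lr = 0.1, i.e. a 10x learning-rate factor.
v_cntk = cntk_momentum_step(v=0.0, g=2.0, lr=1.0, momentum=0.9)
v_classic = classic_momentum_step(v=0.0, g=2.0, lr=0.1, momentum=0.9)
print(round(v_cntk, 10), round(v_classic, 10))  # 0.2 0.2
```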