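Caffe INT8 Quantized Inference
Preface
8-bit (INT8) inference, also known as low-precision inference, speeds up inference at the cost of a small accuracy loss. Compared with FP32, it offers higher throughput and lower memory requirements. Intel has released an accuracy calibration tool (Calibrator). The tool first generates initial quantization parameters and then tunes them until the accuracy requirement is met. Its final output is a quantized prototxt.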
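To make the idea of trading precision for speed concrete, here is a minimal sketch of symmetric linear quantization in plain Python. It is a generic textbook scheme chosen for illustration, and is an assumption on my part: the Calibrator may pick its scale factors differently. The names quantize_int8 and dequantize are hypothetical helpers, not part of Caffe.

```python
def quantize_int8(values):
    """Map floats to signed 8-bit integers using one shared scale factor."""
    max_abs = max(abs(v) for v in values)
    # Assumed scheme: symmetric, with the largest |value| mapped to 127.
    scale = 127.0 / max_abs if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, int(round(v * scale)))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 8-bit integers."""
    return [q / scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# The rounding error per value is at most half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each value is stored in 8 bits instead of 32, and the reconstruction error stays within half a quantization step (0.5 / scale), which is the source of the small accuracy loss mentioned above.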
Main text
Environment setup
Build and install Caffe
Install the dependencies
$ sudo yum install protobuf-devel leveldb-devel snappy-devel opencv-devel boost-devel hdf5-devel
$ sudo yum install gflags-devel glog-devel lmdb-devel
$ sudo yum install openblas-devel
Download the source code
$ git clone https://github.com/intel/caffe.git
Enter the caffe directory and run the build commands
$ mkdir build && cd build && cmake -DCPU_ONLY=1 -DUSE_MLSL=0 -DCMAKE_BUILD_TYPE=Release ..
$ make all -j$(nproc)
Build errors
If errors like the following appear during the build, rebuilding as the root user is enough to fix them.
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFIsTiled@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFOpen@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFReadEncodedStrip@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFSetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFWriteScanline@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFGetField@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFScanlineSize@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFSetWarningHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFSetErrorHandler@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFReadEncodedTile@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFReadRGBATile@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFClose@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFRGBAImageOK@LIBTIFF_4.0'
/usr/local/lib/libopencv_highgui.so: undefined reference to `TIFFReadRGBAStrip@LIBTIFF_4.0'
Set the environment variables
Enter the caffe root directory
Install the required Python packages
$ CAFFE_ROOT=$PWD
$ for req in $(cat python/requirements.txt) pydot; do pip install $req; done
Add the environment variables to ~/.bashrc
$ vim ~/.bashrc
CAFFE_ROOT=/path/to/your/caffe   # replace with your Caffe folder
export PYCAFFE_ROOT=$CAFFE_ROOT/python
export PYTHONPATH=$PYCAFFE_ROOT:$PYTHONPATH
export PATH=$CAFFE_ROOT/build/tools:$PYCAFFE_ROOT:$PATH
$ source ~/.bashrc
Test: if import caffe runs without an error, the installation succeeded.
$ python
Python 2.7.14 |Anaconda, Inc.| (default, Dec 7 2017, 17:05:42)
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import caffe
>>>
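The same check can be scripted so that it reports missing modules instead of stopping at the first traceback. This is a minimal sketch; module_available is a hypothetical helper name, and the list of modules to probe (caffe and pydot, both installed above) is my choice.

```python
def module_available(name):
    """Return True if the module can be imported in this interpreter."""
    try:
        __import__(name)
    except ImportError:
        return False
    return True


# Probe the modules installed in the steps above.
for name in ("caffe", "pydot"):
    status = "found" if module_available(name) else "missing"
    print("%s: %s" % (name, status))
```

If caffe is reported as missing, the usual suspect is PYTHONPATH not including $CAFFE_ROOT/python in the current shell.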
Baseline FP32 test
Dataset preprocessing
MNIST is used for this test
$ cd $CAFFE_ROOT/data/mnist
Download the dataset
$ ./get_mnist.sh
-rw-rw-r--. 1 dyb dyb 7840016 Jul 21 2000 t10k-images-idx3-ubyte
-rw-rw-r--. 1 dyb dyb 10008 Jul 21 2000 t10k-labels-idx1-ubyte
-rw-rw-r--. 1 dyb dyb 47040016 Jul 21 2000 train-images-idx3-ubyte
-rw-rw-r--. 1 dyb dyb 60008 Jul 21 2000 train-labels-idx1-ubyte
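The file sizes in this listing can be used as a quick sanity check on the download. The sketch below relies on the IDX layout used by MNIST: image files carry a 16-byte header followed by one byte per pixel, and label files carry an 8-byte header followed by one byte per label. The helper names are hypothetical.

```python
def idx_images_size(count, rows=28, cols=28):
    # 16-byte header: magic number, image count, rows, cols (4 bytes each).
    return 16 + count * rows * cols


def idx_labels_size(count):
    # 8-byte header: magic number and label count (4 bytes each).
    return 8 + count


expected = {
    "t10k-images-idx3-ubyte": idx_images_size(10000),
    "t10k-labels-idx1-ubyte": idx_labels_size(10000),
    "train-images-idx3-ubyte": idx_images_size(60000),
    "train-labels-idx1-ubyte": idx_labels_size(60000),
}
for name in sorted(expected):
    print("%s: %d bytes" % (name, expected[name]))
```

A file whose size differs from these values was most likely truncated during download or decompression.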
Convert the data format
$ cd $CAFFE_ROOT
$ examples/mnist/create_mnist.sh
Training
Train with the commands below; the accuracy reaches about 99.1%.
$ cd $CAFFE_ROOT
$ ./examples/mnist/train_lenet.sh
_iter_10000.solverstate
I0920 10:07:37.475653 209726 solver.cpp:735] Snapshot end
I0920 10:07:37.476914 209726 solver.cpp:436] Iteration 10000, loss = 0.0025977
I0920 10:07:37.476943 209726 solver.cpp:474] Iteration 10000, Testing net (#0)
I0920 10:07:37.663137 209726 solver.cpp:562] Test net output #0: accuracy = 0.991
I0920 10:07:37.663183 209726 solver.cpp:562] Test net output #1: loss = 0.0275588 (* 1 = 0.0275588 loss)
I0920 10:07:37.663195 209726 solver.cpp:443] Optimization Done.
I0920 10:07:37.663203 209726 caffe.cpp:345] Optimization Done.
Test
Use Caffe's built-in caffe test command to measure accuracy
$ caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
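# Output below: accuracy is 0.9868, averaged over the 50 batches (0 to 49) that caffe test runs by default, versus 0.991 over the solver's 100 test iterations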
I0920 10:11:32.973040 210045 caffe.cpp:500] Batch 47, loss = 0.0123037
I0920 10:11:32.974900 210045 caffe.cpp:500] Batch 48, accuracy = 0.98
I0920 10:11:32.974917 210045 caffe.cpp:500] Batch 48, loss = 0.0551675
I0920 10:11:32.976784 210045 caffe.cpp:500] Batch 49, accuracy = 0.99
I0920 10:11:32.976802 210045 caffe.cpp:500] Batch 49, loss = 0.0154025
I0920 10:11:32.976811 210045 caffe.cpp:505] Loss: 0.040997
I0920 10:11:32.976828 210045 caffe.cpp:517] accuracy = 0.9868
I0920 10:11:32.976861 210045 caffe.cpp:517] loss = 0.040997 (* 1 = 0.040997 loss)
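When several models are compared, reading accuracy out of the log by eye gets tedious. Below is a minimal sketch that extracts the per-batch and final values from log text in the format shown above. The regular expressions are my own and are only assumed to match this log layout; parse_test_log is a hypothetical helper, not a Caffe API.

```python
import re

LOG = """\
I0920 10:11:32.974900 210045 caffe.cpp:500] Batch 48, accuracy = 0.98
I0920 10:11:32.974917 210045 caffe.cpp:500] Batch 48, loss = 0.0551675
I0920 10:11:32.976784 210045 caffe.cpp:500] Batch 49, accuracy = 0.99
I0920 10:11:32.976802 210045 caffe.cpp:500] Batch 49, loss = 0.0154025
I0920 10:11:32.976811 210045 caffe.cpp:505] Loss: 0.040997
I0920 10:11:32.976828 210045 caffe.cpp:517] accuracy = 0.9868
I0920 10:11:32.976861 210045 caffe.cpp:517] loss = 0.040997 (* 1 = 0.040997 loss)
"""

BATCH_RE = re.compile(r"\] Batch (\d+), (\w+) = ([-+.\deE]+)")
FINAL_RE = re.compile(r"\] (\w+) = ([-+.\deE]+)")


def parse_test_log(text):
    """Return (per_batch, final) dictionaries keyed by output blob name."""
    per_batch, final = {}, {}
    for line in text.splitlines():
        match = BATCH_RE.search(line)
        if match:
            batch, blob, value = match.groups()
            per_batch.setdefault(blob, {})[int(batch)] = float(value)
            continue
        match = FINAL_RE.search(line)
        if match:
            final[match.group(1)] = float(match.group(2))
    return per_batch, final


per_batch, final = parse_test_log(LOG)
print(final)
```

In practice the text would come from the captured output of caffe test rather than from an embedded string.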
Use Caffe's built-in caffe time command to measure performance
$ caffe time -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
# The output shows the time spent in each layer, the average times, and the total time
I0920 10:14:25.767056 210246 caffe.cpp:626] mnist forward: 0.40748 ms.
I0920 10:14:25.767081 210246 caffe.cpp:630] mnist backward: 0.00026 ms.
I0920 10:14:25.767089 210246 caffe.cpp:626] conv1 forward: 0.10298 ms.
I0920 10:14:25.767097 210246 caffe.cpp:630] conv1 backward: 1.40416 ms.
I0920 10:14:25.767105 210246 caffe.cpp:626] pool1 forward: 0.28126 ms.
I0920 10:14:25.767127 210246 caffe.cpp:630] pool1 backward: 0.06454 ms.
I0920 10:14:25.767134 210246 caffe.cpp:626] conv2 forward: 0.19562 ms.
I0920 10:14:25.767145 210246 caffe.cpp:630] conv2 backward: 0.86186 ms.
I0920 10:14:25.767168 210246 caffe.cpp:626] pool2 forward: 0.09786 ms.
I0920 10:14:25.767189 210246 caffe.cpp:630] pool2 backward: 0.0285 ms.
I0920 10:14:25.767211 210246 caffe.cpp:626] ip1 forward: 0.063 ms.
I0920 10:14:25.767221 210246 caffe.cpp:630] ip1 backward: 0.07934 ms.
I0920 10:14:25.767230 210246 caffe.cpp:626] relu1 forward: 0.007 ms.
I0920 10:14:25.767241 210246 caffe.cpp:630] relu1 backward: 0.01172 ms.
I0920 10:14:25.767251 210246 caffe.cpp:626] ip2 forward: 0.02298 ms.
I0920 10:14:25.767261 210246 caffe.cpp:630] ip2 backward: 0.03466 ms.
I0920 10:14:25.767269 210246 caffe.cpp:626] loss forward: 0.03128 ms.
I0920 10:14:25.767279 210246 caffe.cpp:630] loss backward: 0.0022 ms.
I0920 10:14:25.767288 210246 caffe.cpp:636] Average Forward pass: 1.21234 ms.
I0920 10:14:25.767295 210246 caffe.cpp:639] Average Backward pass: 2.49054 ms.
I0920 10:14:25.767303 210246 caffe.cpp:641] Average Forward-Backward: 3.72 ms.
I0920 10:14:25.767308 210246 caffe.cpp:644] Total Time: 186 ms.
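To see where the forward time goes, the per-layer lines can be parsed and ranked. This is a minimal sketch over the log text shown above; the regular expression is my own and is only assumed to match this layout, and parse_layer_times is a hypothetical helper.

```python
import re

LOG = """\
I0920 10:14:25.767056 210246 caffe.cpp:626] mnist forward: 0.40748 ms.
I0920 10:14:25.767081 210246 caffe.cpp:630] mnist backward: 0.00026 ms.
I0920 10:14:25.767089 210246 caffe.cpp:626] conv1 forward: 0.10298 ms.
I0920 10:14:25.767097 210246 caffe.cpp:630] conv1 backward: 1.40416 ms.
I0920 10:14:25.767105 210246 caffe.cpp:626] pool1 forward: 0.28126 ms.
I0920 10:14:25.767127 210246 caffe.cpp:630] pool1 backward: 0.06454 ms.
I0920 10:14:25.767134 210246 caffe.cpp:626] conv2 forward: 0.19562 ms.
I0920 10:14:25.767145 210246 caffe.cpp:630] conv2 backward: 0.86186 ms.
I0920 10:14:25.767168 210246 caffe.cpp:626] pool2 forward: 0.09786 ms.
I0920 10:14:25.767189 210246 caffe.cpp:630] pool2 backward: 0.0285 ms.
I0920 10:14:25.767211 210246 caffe.cpp:626] ip1 forward: 0.063 ms.
I0920 10:14:25.767221 210246 caffe.cpp:630] ip1 backward: 0.07934 ms.
I0920 10:14:25.767230 210246 caffe.cpp:626] relu1 forward: 0.007 ms.
I0920 10:14:25.767241 210246 caffe.cpp:630] relu1 backward: 0.01172 ms.
I0920 10:14:25.767251 210246 caffe.cpp:626] ip2 forward: 0.02298 ms.
I0920 10:14:25.767261 210246 caffe.cpp:630] ip2 backward: 0.03466 ms.
I0920 10:14:25.767269 210246 caffe.cpp:626] loss forward: 0.03128 ms.
I0920 10:14:25.767279 210246 caffe.cpp:630] loss backward: 0.0022 ms.
I0920 10:14:25.767288 210246 caffe.cpp:636] Average Forward pass: 1.21234 ms.
"""

# Lowercase "forward"/"backward" selects per-layer lines and skips the averages.
LAYER_RE = re.compile(r"\] (\w+)\s+(forward|backward): ([\d.]+) ms\.")


def parse_layer_times(text):
    """Return {"forward": {layer: ms}, "backward": {layer: ms}}."""
    times = {"forward": {}, "backward": {}}
    for line in text.splitlines():
        match = LAYER_RE.search(line)
        if match:
            layer, direction, value = match.groups()
            times[direction][layer] = float(value)
    return times


times = parse_layer_times(LOG)
forward = times["forward"]
backward = times["backward"]
# Rank layers from slowest to fastest forward time.
for layer in sorted(forward, key=forward.get, reverse=True):
    print("%-6s %.5f ms" % (layer, forward[layer]))
```

For this log the data layer (mnist) and pool1 dominate the forward pass, while the two convolutions together account for roughly a quarter of it, which is worth keeping in mind when judging how much any convolution speedup can matter here.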
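This completes the baseline FP32 test.
INT8 quantization test
Generate the quantized file
First, convert the model definition prototxt into a quantized prototxt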
$ cd $CAFFE_ROOT/scripts
$ python calibrator.py -r ../build/ -m ../examples/mnist/lenet_train_test.prototxt -w ../examples/mnist/lenet_iter_10000.caffemodel -i 100 -n accuracy -l 0.001 -d 0
# Output:
I0920 10:21:36.894049 211048 caffe.cpp:500] Batch 99, loss = 0.0121913
I0920 10:21:36.894055 211048 caffe.cpp:505] Loss: 0.0275588
I0920 10:21:36.894075 211048 caffe.cpp:517] accuracy = 0.991
I0920 10:21:36.894104 211048 caffe.cpp:517] loss = 0.0275588 (* 1 = 0.0275588 loss)
Updated prototxt /home/dyb/hub/caffe/examples/mnist/lenet_train_test_quantized.prototxt is generated.
The following file has been generated; it is the one used for quantized inference.
examples/mnist/lenet_train_test_quantized.prototxt
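To calibrate several models in a row, it helps to assemble the calibrator command in a script. The sketch below only builds the argument list and the expected output name; it does not run anything. The flag meanings in the comments are my reading of the command above and of Intel's wiki page and should be treated as assumptions, the _quantized suffix is inferred from the single output shown above, and /opt/caffe is a hypothetical install path.

```python
import os


def calibrator_command(caffe_root, model, weights, iterations=100,
                       blob_name="accuracy", loss_tolerance=0.001,
                       detection=0):
    """Build the argv list for scripts/calibrator.py (not executed here)."""
    return [
        "python", "calibrator.py",
        "-r", os.path.join(caffe_root, "build"),  # Caffe build directory
        "-m", os.path.join(caffe_root, model),    # model prototxt
        "-w", os.path.join(caffe_root, weights),  # trained weights
        "-i", str(iterations),                    # test iterations
        "-n", blob_name,                          # blob reporting accuracy
        "-l", str(loss_tolerance),                # tolerated accuracy loss
        "-d", str(detection),                     # 0 as in the command above
    ]


def quantized_name(model):
    """Output name as observed above: <name>_quantized.prototxt."""
    stem, ext = os.path.splitext(model)
    return stem + "_quantized" + ext


cmd = calibrator_command(
    "/opt/caffe",  # hypothetical CAFFE_ROOT
    "examples/mnist/lenet_train_test.prototxt",
    "examples/mnist/lenet_iter_10000.caffemodel",
)
print(" ".join(cmd))
print(quantized_name("examples/mnist/lenet_train_test.prototxt"))
```

The resulting list can be passed to subprocess.check_call with cwd set to $CAFFE_ROOT/scripts, mirroring the manual steps.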
Test the quantized result
Accuracy
$ caffe test -model examples/mnist/lenet_train_test_quantized.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
I0920 10:24:29.601625 211250 caffe.cpp:500] Batch 49, loss = 0.0154025
I0920 10:24:29.601634 211250 caffe.cpp:505] Loss: 0.040997
I0920 10:24:29.601650 211250 caffe.cpp:517] accuracy = 0.9868
I0920 10:24:29.601675 211250 caffe.cpp:517] loss = 0.040997 (* 1 = 0.040997 loss)
Performance
$ caffe time -model examples/mnist/lenet_train_test_quantized.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
I0920 10:25:37.735370 211347 caffe.cpp:626] mnist forward: 0.39882 ms.
I0920 10:25:37.735394 211347 caffe.cpp:630] mnist backward: 0.00038 ms.
I0920 10:25:37.735404 211347 caffe.cpp:626] conv1 forward: 0.10124 ms.
I0920 10:25:37.735424 211347 caffe.cpp:630] conv1 backward: 1.40078 ms.
I0920 10:25:37.735432 211347 caffe.cpp:626] pool1 forward: 0.27842 ms.
I0920 10:25:37.735469 211347 caffe.cpp:630] pool1 backward: 0.06578 ms.
I0920 10:25:37.735477 211347 caffe.cpp:626] conv2 forward: 0.19304 ms.
I0920 10:25:37.735500 211347 caffe.cpp:630] conv2 backward: 0.86114 ms.
I0920 10:25:37.735509 211347 caffe.cpp:626] pool2 forward: 0.09626 ms.
I0920 10:25:37.735519 211347 caffe.cpp:630] pool2 backward: 0.02854 ms.
I0920 10:25:37.735532 211347 caffe.cpp:626] ip1 forward: 0.06208 ms.
I0920 10:25:37.735544 211347 caffe.cpp:630] ip1 backward: 0.08062 ms.
I0920 10:25:37.735554 211347 caffe.cpp:626] relu1 forward: 0.00688 ms.
I0920 10:25:37.735565 211347 caffe.cpp:630] relu1 backward: 0.01098 ms.
I0920 10:25:37.735576 211347 caffe.cpp:626] ip2 forward: 0.0222 ms.
I0920 10:25:37.735585 211347 caffe.cpp:630] ip2 backward: 0.0346 ms.
I0920 10:25:37.735596 211347 caffe.cpp:626] loss forward: 0.02954 ms.
I0920 10:25:37.735605 211347 caffe.cpp:630] loss backward: 0.00234 ms.
I0920 10:25:37.735616 211347 caffe.cpp:636] Average Forward pass: 1.1921 ms.
I0920 10:25:37.735622 211347 caffe.cpp:639] Average Backward pass: 2.4883 ms.
I0920 10:25:37.735627 211347 caffe.cpp:641] Average Forward-Backward: 3.7 ms.
I0920 10:25:37.735635 211347 caffe.cpp:644] Total Time: 185 ms.
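The difference from FP32 is small: the average forward pass is 1.1921 ms versus 1.21234 ms, and the accuracy is identical at 0.9868. This may be because MNIST and LeNet are too simple to benefit; try more complex models such as GoogLeNet or VGG.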
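The size of the gap can be put into numbers. The sketch below copies forward times from the two caffe time logs above and computes the fraction of FP32 time saved by the quantized run; relative_gain is a hypothetical helper, and the selection of layers is mine.

```python
# Forward times in milliseconds, copied from the two caffe time logs above.
FP32 = {"conv1": 0.10298, "conv2": 0.19562, "ip1": 0.063, "ip2": 0.02298,
        "average_forward": 1.21234}
INT8 = {"conv1": 0.10124, "conv2": 0.19304, "ip1": 0.06208, "ip2": 0.0222,
        "average_forward": 1.1921}


def relative_gain(fp32_ms, int8_ms):
    """Fraction of FP32 time saved by the quantized run (0.0 means no gain)."""
    return (fp32_ms - int8_ms) / fp32_ms


gains = {name: relative_gain(FP32[name], INT8[name]) for name in FP32}
for name in sorted(gains):
    print("%-16s %.2f%%" % (name, 100.0 * gains[name]))
```

Every entry comes out at a few percent at most. A gap that small could plausibly be run-to-run noise, so repeating both measurements several times before drawing conclusions would be a reasonable next step.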
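Conclusion
Readers are encouraged to try more models and compare the performance difference.
For an explanation of the principles behind quantization, see this blog post: https://blog.csdn.net/yiran103/article/details/81912690#commentBox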
Intel Github: https://github.com/intel/caffe/wiki/Introduction-of-Accuracy-Calibration-Tool-for-8-Bit-Inference