Detecting objects using Rolling Convolution and Recurrent Neural Network

Authors

  • Wenqing Huang, School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou
  • YaMing Wang, School of Information Science and Technology, Zhejiang Sci-Tech University, Hangzhou

Abstract

At present, most object detection algorithms locate objects in an image by searching over region proposals. The most effective region proposal methods usually need thousands of candidate regions to reach a high recall rate, which lowers detection efficiency. Recent region proposal network approaches achieve good results with only hundreds of proposals, but they still struggle with small objects and precise localization, mainly because they rely on coarse features. We therefore propose a method that extracts more effective global and multi-scale features to improve object detection performance. Successive convolutions acquire deeper semantic information at the cost of the resolution needed to detect small objects. To counter this, we use rolling convolution (RC) to preserve the high resolution of low-level feature maps so that objects can be examined in greater detail, even without a structure dedicated to combining the features of multiple convolutional layers. Furthermore, we place a recurrent neural network built from multiple gated recurrent units (GRUs) on top of the convolutional layers to highlight useful global context locations that assist in detecting objects. On benchmark datasets, the proposed method achieves 78.2% mAP on PASCAL VOC 2007 and 72.3% mAP on PASCAL VOC 2012, and extensive experiments verify that it improves detection performance.
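
The abstract places a recurrent layer of GRUs on top of the convolutional feature map to gather global context, but it does not give the exact configuration. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: it borrows the four-direction sweep of the Inside-Outside Net (which used ReLU RNNs) and swaps in GRU cells, so one GRU scans a C×H×W feature map left to right, right to left, top to bottom and bottom to top, and the four hidden-state maps are concatenated. After concatenation, every location carries context from its entire row and column. The function names (init_gru, gru_row_sweep, four_direction_context), the hidden size and the random weights are hypothetical, and the update uses one common GRU convention, h_t = (1 - z) * h_{t-1} + z * h_candidate.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_gru(in_ch, hidden, rng, scale=0.1):
    """Hypothetical random GRU weights: (W, U, b) for the update gate,
    the reset gate and the candidate state, in that order."""
    params = []
    for _ in range(3):
        params += [rng.normal(0.0, scale, (in_ch, hidden)),
                   rng.normal(0.0, scale, (hidden, hidden)),
                   np.zeros(hidden)]
    return params


def gru_row_sweep(feat, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """Scan a GRU left to right along every row of a (C, H, W) map.

    All H rows are processed in parallel; the loop runs over the W axis.
    Returns a (D, H, W) map of hidden states, D being the hidden size.
    """
    _, H, W = feat.shape
    D = Uz.shape[0]
    h = np.zeros((H, D))                # one hidden state per row, starts at 0
    out = np.zeros((D, H, W))
    for x in range(W):
        v = feat[:, :, x].T             # (H, C) inputs at column x
        z = sigmoid(v @ Wz + h @ Uz + bz)               # update gate
        r = sigmoid(v @ Wr + h @ Ur + br)               # reset gate
        h_cand = np.tanh(v @ Wh + (r * h) @ Uh + bh)    # candidate state
        h = (1.0 - z) * h + z * h_cand                  # blend old and new state
        out[:, :, x] = h.T
    return out


def four_direction_context(feat, params):
    """Concatenate GRU sweeps in four directions into a (4D, H, W) map.

    Flips and transposes let the single row sweep cover every direction.
    """
    right = gru_row_sweep(feat, *params["right"])
    left = gru_row_sweep(feat[:, :, ::-1], *params["left"])[:, :, ::-1]
    cols = feat.transpose(0, 2, 1)      # (C, W, H): columns become rows
    down = gru_row_sweep(cols, *params["down"]).transpose(0, 2, 1)
    up = gru_row_sweep(cols[:, :, ::-1], *params["up"])[:, :, ::-1].transpose(0, 2, 1)
    return np.concatenate([right, left, down, up], axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.normal(size=(8, 5, 7))   # stand-in for a conv feature map
    params = {d: init_gru(8, 4, rng) for d in ("right", "left", "down", "up")}
    ctx = four_direction_context(feat, params)
    print(ctx.shape)  # (16, 5, 7)
```

Because each direction reuses the same left-to-right scan on a flipped or transposed view, only one recurrent routine has to be correct. In a trained detector these weights would be learned jointly with the convolutional layers rather than drawn at random.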

Published

2024-04-19

Section

Image Processing