• Computer vision and deep learning:
    • Object detection, recognition and tracking using deep learning.
    • Semantic segmentation and action recognition using deep learning.
    • Depth map estimation and processing using deep learning.
    • Multi-cue pedestrian detection and tracking.
  • Image/video processing and communication:
    • Low-complexity, error-resilient video coding.
    • Compressed video sensing.
    • Multi-view video coding, processing and transmission.
  • Computer vision for robotics:
    • Real-time 3D reconstruction using RGB-D.
    • RGB-D simultaneous localization and mapping.
    • Large-scale 3D scene modeling.


  • Korea Electronics Technology Institute (KETI), "Hardware/software co-design of low-complexity video coding algorithms for wireless video surveillance," 2011-2012
  • Memsoft, "Power-optimized distributed video codec design for wireless video sensor networks," 2012-2014
  • Korea Electronics Technology Institute (KETI), "Stereo-based pedestrian detection and tracking algorithms for advanced driver assistant systems," 2012-2017
  • Argonne National Laboratory, "Enhancement of 3D model reconstruction technology," 2012-2017
  • Korea Electronics Technology Institute (KETI), "Camera-based artificial intelligence system for autonomous cars," 2017-2021


  • Deep learning based object detection for autonomous driving
  • The task of this research is multi-class, multi-scale object detection of traffic participants (cars, cyclists, and pedestrians) using deep neural networks. Traffic participants constantly interact with the autonomous car and must be detected in real time to avoid accidents. Multi-class detection of these participants is challenging because of high scale variation, heavy occlusion, and cluttered backgrounds. Building on the popular two-stage detector (Faster R-CNN) and one-stage detector (SSD), our research focuses on improving detection accuracy and processing speed using spatial and temporal context information. Spatial context is extracted with a spatial recurrent neural network and location-aware deformable convolution; short-term and long-term temporal context is modeled with optical flow and a convolutional LSTM.
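Both detector families above (Faster R-CNN and SSD) prune overlapping candidate boxes with non-maximum suppression before reporting detections. A minimal pure-Python sketch of IoU-based NMS (function names and the overlap threshold are illustrative, not our production code):

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    # Keep the highest-scoring box, drop remaining boxes that overlap
    # it by more than thresh, then repeat on what is left.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep
```

Per-class NMS (running this once per object class) is what allows a pedestrian box and an overlapping cyclist box to coexist in the multi-class setting.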

  • Semantic segmentation using deep learning
  • Semantic segmentation is an important but challenging task in computer vision. Applications such as autonomous driving, path navigation, image search engines, and augmented reality require accurate semantic analysis and efficient segmentation mechanisms. For semantic segmentation of still images, we developed a method that exploits multi-scale contextual features together with features from shallow stages. For video semantic segmentation, we focus on designing deep neural networks that achieve low computational complexity while maintaining segmentation accuracy.
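Segmentation accuracy in this setting is commonly reported as mean intersection-over-union (mIoU) across classes. A small illustrative sketch of the metric on flattened label maps (not our evaluation code):

```python
def mean_iou(pred, gt, num_classes):
    # Per-class intersection-over-union between predicted and ground-truth
    # label maps (flattened to 1-D), averaged over classes that appear.
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        union = sum(1 for p, g in zip(pred, gt) if p == c or g == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```

Averaging per class rather than per pixel keeps rare classes (e.g., pedestrians) from being drowned out by large ones (e.g., road).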

  • Stereo-based pedestrian detection
  • Vision-based pedestrian detection plays an important role in advanced driver-assistance systems (ADAS) and autonomous driving. The task is to detect and locate pedestrians in a driving scene both accurately and quickly; speed is crucial to avoid hitting a pedestrian. Traditional exhaustive-search methods waste time generating regions of interest (ROIs) at locations where no pedestrian could appear. Our research speeds up detection by using a stereo camera and depth information to detect the ground plane on which pedestrians would normally stand, greatly reducing the ROI search range so that detection runs at real-time speed. Our research also features depth map estimation (stereo matching) based on a modified Census transform and mutual information, which is fast and robust against brightness inconsistencies in outdoor environments. The pedestrian detection system is implemented on a low-profile mini-PC and was tested in the Chicago urban area.
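The modified Census transform used here builds on the classic Census transform, which encodes each pixel as a bit string of brightness comparisons against its neighbors; the stereo matching cost is then a Hamming distance between signatures. Because only orderings matter, not absolute intensities, the cost is robust to brightness offsets between the two cameras. A toy 3x3 version (our system uses a modified variant; window size and details differ):

```python
def census_3x3(img, y, x):
    # 8-bit Census signature of pixel (y, x): one bit per 3x3 neighbour,
    # set when the neighbour is darker than the centre pixel.
    c = img[y][x]
    bits = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            bits = (bits << 1) | (1 if img[y + dy][x + dx] < c else 0)
    return bits

def census_cost(sig_left, sig_right):
    # Matching cost between two Census signatures: Hamming distance.
    return bin(sig_left ^ sig_right).count("1")
```

Adding a constant to every pixel of one image leaves all comparisons, and hence the cost, unchanged.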

  • Real-time 3D reconstruction using RGB-D
  • 3D reconstruction, exemplified by KinectFusion, is a well-established area of study in robotics and computer vision. The objective is to recreate a real-world scene, with applications in augmented reality (AR), robotic teleoperation, medical analysis, video games, and more. We focus on improving the performance of 3D reconstruction on challenging tasks such as moving-object segmentation, tracking under fast camera motion, and modeling complex environments. We are planning to extend our work to track and reconstruct moving objects simultaneously.
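KinectFusion-style pipelines fuse each incoming depth frame into a voxel grid holding a truncated signed distance function (TSDF) via a weighted running average, so noisy per-frame measurements converge to a smooth surface. A single-voxel sketch of that update (the truncation band and weight cap are illustrative values):

```python
def tsdf_update(tsdf, weight, sdf_obs, trunc=0.1, max_weight=100.0):
    # Fuse one observed signed distance (camera-to-surface, metres) into a
    # voxel's running TSDF average. Observations are clamped to the
    # truncation band; the weight is capped so the model can still adapt.
    d = max(-trunc, min(trunc, sdf_obs))
    new_tsdf = (tsdf * weight + d) / (weight + 1.0)
    new_weight = min(weight + 1.0, max_weight)
    return new_tsdf, new_weight
```

The reconstructed surface is then extracted as the zero crossing of the fused TSDF (e.g., by ray casting or marching cubes).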


  • Distributed video coding (DVC)
  • DVC is commonly referred to as the reverse paradigm of conventional video coding: the complexity is transferred from the encoder to the decoder. The target applications are video surveillance cameras, video conferencing, and remote video sensors, which must perform video encoding on resource-constrained devices.

  • Multiple description coding in DVC
  • Multiple description coding in DVC is an effort to reduce the effects of channel losses. By intentionally introducing redundancy into the encoded bitstreams, the codec's ability to withstand channel losses is increased.
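The simplest way to see the idea: split the frame sequence into two descriptions (even and odd frames), each sent over an independent channel. If both arrive, the decoder interleaves them; if one is lost, the other still decodes to a usable half-rate video, concealed here by frame repetition. An illustrative sketch only; practical MDC schemes introduce redundancy in the coded domain:

```python
def split_descriptions(frames):
    # Odd/even temporal splitting into two independently decodable
    # descriptions, each at half the original frame rate.
    return frames[0::2], frames[1::2]

def reconstruct(even, odd=None):
    # With both descriptions, interleave back to full rate.
    # With only one, repeat each frame (simple temporal concealment
    # of the lost description).
    if odd is None:
        out = []
        for f in even:
            out += [f, f]
        return out
    out = []
    for e, o in zip(even, odd):
        out += [e, o]
    return out
```

The redundancy/quality trade-off shows up directly: losing one description halves temporal resolution instead of breaking the stream.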

  • Multiview video coding
  • Multiview video coding (MVC) is a hot research topic these days owing to advances in processing power and in algorithms for handling multiview video content. The lab focuses on all aspects of MVC, including:

    1. Depth extraction from a pair of stereoscopic images.
    2. Multiview video coding focusing on compression of depth maps, layered depth images and layered depth videos.
    3. View synthesis at arbitrary camera locations using concepts of depth image based rendering.
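The view-synthesis step (item 3) can be illustrated on a single scanline: each pixel is forward-warped by its disparity, conflicts are resolved in favor of nearer surfaces, and positions no source pixel maps to become disocclusion holes. A toy sketch (real depth-image-based rendering operates on full depth maps with hole filling and blending):

```python
def synthesize_scanline(row, disparity, width):
    # Forward-warp one scanline to a virtual view: shift each pixel left
    # by its integer disparity. Larger disparity means a nearer surface,
    # so it wins conflicts (z-buffering). Unfilled targets stay None:
    # disocclusion holes for a later inpainting stage.
    out = [None] * width
    depth = [-1] * width
    for x, (pix, d) in enumerate(zip(row, disparity)):
        tx = x - d
        if 0 <= tx < width and d > depth[tx]:
            out[tx] = pix
            depth[tx] = d
    return out
```

In the example below, the near pixel (disparity 2) lands on top of the far pixel at position 0, and position 2 is left as a hole.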

  • Video anomaly detection
  • Anomaly detection in video is an important topic that the lab is actively researching. Using concepts from machine learning, pattern recognition, and stochastic processes, we work on extracting useful features from video. Such research will be very useful to law enforcement authorities for the automatic detection of anomalies across surveillance cameras distributed around a city.
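As a baseline illustration of the statistical viewpoint, a frame-level feature (e.g., total motion magnitude) can be modeled as Gaussian over the sequence and frames flagged when their z-score is extreme. The methods under study are considerably richer than this sketch; it only shows the shape of the problem:

```python
import statistics

def anomalous_frames(feature, z_thresh=3.0):
    # Flag frames whose scalar feature deviates more than z_thresh
    # standard deviations from the sequence mean.
    mu = statistics.mean(feature)
    sigma = statistics.pstdev(feature)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(feature) if abs(v - mu) / sigma > z_thresh]
```

A long stretch of normal activity followed by a sudden motion burst yields exactly one flagged frame here; learned models replace the Gaussian with far more expressive notions of "normal."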

  • Network-aware and error-resilient video coding
  • Research is being conducted on improvements to H.264/AVC, joint source-channel coding, rate control and error control, error concealment, video distortion estimation, and quality assessment.
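The simplest decoder-side error concealment, a building block for the techniques above, replaces each lost block with the co-located block from the previous decoded frame. A toy sketch on frames stored as lists of rows (block size and indexing are illustrative; practical concealment is motion-compensated):

```python
def conceal(current, previous, lost_blocks, block=2):
    # Temporal error concealment: copy the co-located pixels of each
    # lost block (given as (block_row, block_col)) from the previous
    # decoded frame into the current frame.
    out = [row[:] for row in current]
    for by, bx in lost_blocks:
        for y in range(by * block, (by + 1) * block):
            for x in range(bx * block, (bx + 1) * block):
                out[y][x] = previous[y][x]
    return out
```

This works well for static regions but smears moving content, which is why motion-vector recovery and distortion estimation matter.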

  • Wireless multimedia communication
  • Quality of Service provision for video transmission in wireless networks, multi-path transport over wireless ad-hoc networks, multimedia transport protocols.

  • Cross-layer design for video transmission over wireless ad-hoc networks
  • Application-layer techniques for adaptive video transmission over wireless networks, adaptive cross-layer error protection, adaptive link-layer techniques, power-distortion optimized routing and scheduling.