Automatic Generation of Video Navigation from Google
Street View Data with Car Detection and Inpainting

Yuan-Bang Cheng, Chuan-Kai Yang, Guan-Chung Chang, and Teng-Wen Chang

Department of Information Management
National Taiwan University of Science and Technology
No. 43, Sec. 4, Keelung Road
Taipei, 106, Taiwan
davinban@gmail.com, ckyang@cs.ntust.edu.tw, M10009113@mail.ntust.edu.tw

Department of Digital Media Design
National Yunlin University of Science and Technology
No. 123, Sec. 3, Daxue Road
Douliou City, 640, Taiwan
tengwen@yuntech.edu.tw

Abstract

Despite the existence of numerous navigation tools and systems, Google Street View is still sometimes preferred because it provides realistic scenes, even though it offers only a single static image at a time. For navigation purposes, however, given the starting and ending locations, a navigation video composed of images obtained from the Google Street View service is desirable. Several papers have partially addressed this issue, but there is still much room for improvement. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in abrupt changes from one scene to another. Second, the generated video often contains many undesired vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people, and we also report preliminary trials of using Faster R-CNN with Caffe to speed up the detection of vehicles and people. The results demonstrate the effectiveness of our approaches and, where applicable, are compared with similar approaches to show our improvement. In addition, we developed a post-processing tool for interactively refining the results when the automatic object detection is imperfect.
Keywords: Google Earth, Google Street View, HOG and Exemplar-SVMs, HAAR and Adaboost, Region Growing, Image Inpainting, Caffe, Faster R-CNN
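The keywords list region growing and image inpainting for removing detected vehicles and people, but this page does not describe the exact procedure. As a minimal sketch under our own assumptions, the Python below grows a region from a seed pixel (for example, a point inside a detection box) by intensity similarity, then fills that region with a simple diffusion inpainting that repeatedly replaces each masked pixel with the mean of its neighbours. The function names, the toy 5x5 grayscale image, and the tolerance value are hypothetical; practical systems typically use stronger methods (such as exemplar-based inpainting) on real color images.

```python
from collections import deque

# Hypothetical helpers for illustration only; they are not the authors' code.

def region_grow(img, seed, tol):
    """Return the set of (row, col) pixels that are 4-connected to `seed`
    and whose intensity differs from the seed pixel by at most `tol`."""
    h, w = len(img), len(img[0])
    ref = img[seed[0]][seed[1]]
    region = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < h and 0 <= nc < w and (nr, nc) not in region
                    and abs(img[nr][nc] - ref) <= tol):
                region.add((nr, nc))
                queue.append((nr, nc))
    return region


def diffusion_inpaint(img, mask, iterations=500):
    """Fill the pixels in `mask` by repeatedly replacing each one with the
    mean of its in-bounds 4-neighbours (simple diffusion inpainting).
    Pixels outside the mask are left unchanged."""
    h, w = len(img), len(img[0])
    out = [[float(v) for v in row] for row in img]
    for _ in range(iterations):
        for r, c in sorted(mask):
            nbrs = [out[nr][nc]
                    for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1))
                    if 0 <= nr < h and 0 <= nc < w]
            out[r][c] = sum(nbrs) / len(nbrs)
    return out


# Toy 5x5 grayscale "road" (value 100) with a bright 2x2 "vehicle" (value 250).
road = [[100] * 5 for _ in range(5)]
for r in (1, 2):
    for c in (1, 2):
        road[r][c] = 250

vehicle = region_grow(road, seed=(1, 1), tol=20)
filled = diffusion_inpaint(road, vehicle)
print(sorted(vehicle))         # [(1, 1), (1, 2), (2, 1), (2, 2)]
print(round(filled[1][1], 6))  # 100.0, the surrounding road value
```

Because the masked pixels are solved only from their surroundings, the filled region converges to a smooth interpolation of the border; on a uniform road this simply reproduces the road value.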

Results: Output1 and Output2

 

Navigation Location: Sydney Opera House

 

Output1
video format: .avi

Output2
video format: .mov

The lower right video region in Output2
video format: .avi

The lower right video region in Output2
video format: .mp4

Please note that the “Please wait” period (the system’s computing time) has been cut from these clips for better presentation.
The Output2 area appears as a “black box” in the screen recording because our screen-recording software and/or computer could not capture a screen region that is playing an AVI/MP4 file.
We also used a camera to record our computer’s screen, and that video has been shortened to 60 seconds.
   

 

Navigation Location: Taroko National Park

 

Output1
video format: .avi

Output2
video format: .mov

The lower right video region in Output2
video format: .avi

The lower right video region in Output2
video format: .mp4

Please note that the “Please wait” period (the system’s computing time) has been cut from these clips for better presentation.
The Output2 area appears as a “black box” in the screen recording because our screen-recording software and/or computer could not capture a screen region that is playing an AVI/MP4 file.
We also used a camera to record our computer’s screen, and that video has been shortened to 60 seconds.
   

 

Navigation Location: Pingtung Kenting Coast

 

Output1
video format: .avi

Output2
video format: .mov

The lower right video region in Output2
video format: .avi

The lower right video region in Output2
video format: .mp4

Please note that the “Please wait” period (the system’s computing time) has been cut from these clips for better presentation.
The Output2 area appears as a “black box” in the screen recording because our screen-recording software and/or computer could not capture a screen region that is playing an AVI/MP4 file.
We also used a camera to record our computer’s screen, and that video has been shortened to 60 seconds.
   

 

Navigation Location: Tainan Gold Coast

 

Output1
video format: .avi

Output2
video format: .mov

The lower right video region in Output2
video format: .avi

The lower right video region in Output2
video format: .mp4

Please note that the “Please wait” period (the system’s computing time) has been cut from these clips for better presentation.
The Output2 area appears as a “black box” in the screen recording because our screen-recording software and/or computer could not capture a screen region that is playing an AVI/MP4 file.
We also used a camera to record our computer’s screen, and that video has been shortened to 60 seconds.
   

 

Navigation Location: Taipei City

 

Output1
video format: .avi

Output2
video format: .mov

The lower right video region in Output2
video format: .avi

The lower right video region in Output2
video format: .mp4

Please note that the “Please wait” period (the system’s computing time) has been cut from these clips for better presentation.
The Output2 area appears as a “black box” in the screen recording because our screen-recording software and/or computer could not capture a screen region that is playing an AVI/MP4 file.
We also used a camera to record our computer’s screen, and that video has been shortened to 60 seconds.
   

 

Results: Comparisons with other similar Apps

Note that the results of our proposed system were produced with the old version of the Google Street View API, because this is a continuing study that started years ago, when the old API was still supported. The Apps we compare against, however, are all currently sold in the App Store, so they use the new version of the Google Street View API and the new Street View image dataset. As a result, these Apps may sometimes be able to fetch more and better key frames.
Nevertheless, as the following comparison results show, our system still generates better and smoother videos than those Apps, because it performs transitional transformations among key frames, together with object detection and inpainting.

Comparison 1

Name | Our system | Route Player
Icon | NTUSTCGM_AGVN | Route Player (Route Player and Course Preview are produced by the same company)
Platform | PC (Google Chrome) | iPhone iOS
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 28 (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | NO
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | bad

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode  
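The blending effect defined in the comparison tables can be illustrated with a linear cross-fade. The sketch below assumes that each frame is a flat list of grayscale pixel values and that the last `overlap` frames of one key frame’s clip are mixed with the first `overlap` frames of the next, with the weight of the incoming clip rising linearly. The function name and data layout are hypothetical, and this page does not specify the weighting our system actually uses.

```python
def crossfade(clip_a, clip_b, overlap):
    """Join two clips, blending the last `overlap` frames of clip_a with the
    first `overlap` frames of clip_b. Each frame is a list of pixel values."""
    if not 0 < overlap <= min(len(clip_a), len(clip_b)):
        raise ValueError("overlap must be between 1 and the shorter clip length")
    blended = []
    for i, (fa, fb) in enumerate(zip(clip_a[-overlap:], clip_b[:overlap])):
        alpha = (i + 1) / (overlap + 1)  # weight of clip_b, rising toward 1
        blended.append([(1 - alpha) * pa + alpha * pb for pa, pb in zip(fa, fb)])
    return clip_a[:-overlap] + blended + clip_b[overlap:]


# Two toy clips of four 2-pixel frames: one dark key frame, one bright key frame.
clip_a = [[0.0, 0.0] for _ in range(4)]
clip_b = [[90.0, 90.0] for _ in range(4)]
video = crossfade(clip_a, clip_b, overlap=2)
print(len(video))                       # 6 frames: 2 + 2 blended + 2
print([round(f[0], 6) for f in video])  # [0.0, 0.0, 30.0, 60.0, 90.0, 90.0]
```

Since the overlapping frames are merged rather than appended, the joined clip is shorter than the two inputs combined by `overlap` frames; a system without blending would instead cut directly from the last frame of one clip to the first frame of the next.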


Comparison 2

Name | Our system | Course Preview
Icon | NTUSTCGM_AGVN | Course Preview (Route Player and Course Preview are produced by the same company)
Platform | PC (Google Chrome) | iPhone iOS
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 28 (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | NO
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | bad

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode  
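The transition effect is described only as moving between frames by means of transformation matrices. One plausible reading, assumed here purely for illustration, is a forward zoom about the image center: each intermediate frame gets a 3x3 homogeneous matrix implementing x' = c + s(x - c), with the scale s growing geometrically from 1 to a chosen maximum so that consecutive frames enlarge by the same ratio. The helper names and parameters are hypothetical; a renderer would then warp the key frame with each matrix, for example through an image-warping routine such as OpenCV’s `cv2.warpPerspective`.

```python
def zoom_about_center(scale, cx, cy):
    """3x3 homogeneous matrix that scales by `scale` about (cx, cy),
    i.e. translate(cx, cy) @ scale(s) @ translate(-cx, -cy)."""
    return [[scale, 0.0, cx * (1 - scale)],
            [0.0, scale, cy * (1 - scale)],
            [0.0, 0.0, 1.0]]


def transition_matrices(width, height, n_frames, max_scale):
    """One zoom matrix per intermediate frame; the scale grows geometrically
    from 1 to max_scale so every step enlarges the view by the same ratio."""
    if n_frames < 2:
        raise ValueError("need at least two frames")
    cx, cy = width / 2, height / 2
    return [zoom_about_center(max_scale ** (k / (n_frames - 1)), cx, cy)
            for k in range(n_frames)]


def apply(m, x, y):
    """Map pixel coordinate (x, y) through homogeneous matrix m."""
    xh = m[0][0] * x + m[0][1] * y + m[0][2]
    yh = m[1][0] * x + m[1][1] * y + m[1][2]
    w = m[2][0] * x + m[2][1] * y + m[2][2]
    return xh / w, yh / w


mats = transition_matrices(640, 480, n_frames=5, max_scale=2.0)
print(apply(mats[-1], 320, 240))  # the image center stays fixed: (320.0, 240.0)
print(apply(mats[-1], 0, 0))      # a corner moves outward: (-320.0, -240.0)
```

A geometric rather than linear scale schedule is chosen here because equal zoom ratios per frame tend to look like constant forward speed; without any such intermediate matrices, the video jumps directly from one key frame to the next.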


Comparison 3

Name | Our system | StreetsPlayer
Icon | NTUSTCGM_AGVN | StreetsPlayer
Platform | PC (Google Chrome) | iPhone iOS
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 4 (vertical play/use mode) to 8 (horizontal play/use mode) (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | YES
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | very bad

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode  


Comparison 4

Name | Our system | StreetWatcher
Icon | NTUSTCGM_AGVN | StreetWatcher
Platform | PC (Google Chrome) | iPhone iOS
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 41 (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | YES
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | good

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode  


Comparison 5

Name | Our system | Drive around & Go around WH
Icon | NTUSTCGM_AGVN | Drive around & Go around WH (produced by the same company; the outputs and UIs of both Apps are also the same)
Platform | PC (Google Chrome) | iPhone iOS
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 9 (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | YES
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | very bad

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode  


Comparison 6

Name | Our system | Streetview Player
Icon | NTUSTCGM_AGVN | Streetview Player
Platform | PC (Google Chrome) | Android or PC (Google Chrome)
Starting and ending locations (navigation location) | Taroko National Park, from No. 135, Fushi, Xiulin Township, Hualien County, to No. 154, Fushi, Xiulin Township, Hualien County
Number of key frames fetched | 35 | about 11 (using the new version of the Google Street View API and the new Street View image dataset)
Is there a blending effect in the output video? | YES | NO
Is there a transition effect in the output video? | YES | NO
How smooth does the video look? (very good, good, bad, very bad) | very good | very bad

Note: the blending effect refers to blending/overlapping the ending frames of one key frame with the starting frames of the next key frame.
Note: the transition effect refers to moving from one frame to the next by means of transformation matrices, so that the video is generated and played smoothly.
Vertical play/use mode  
Horizontal play/use mode

 

Acknowledgment

This work was supported in part by the Ministry of Science and Technology of Taiwan under grants MOST 104-2221-E-011-083-MY2, MOST 105-2218-E-011-005, MOST 105-2218-E-001-001, MOST 106-3114-E-011-003, and MOST 106-2221-E-011-148-MY3. The authors thank Google for providing, through its APIs, the map and Street View data used in this work.

 

On-line Video Watching
Powered by Google Chrome