Despite the many navigation tools and systems available, Google Street View, which offers only a single static image at a time, is still sometimes preferred because it provides a realistic scene. For navigation, however, what is desired is a video, assembled from images obtained from the Google Street View service, that covers the route between a given starting location and ending location. Several papers have addressed this problem to some extent, but much room for improvement remains. First, the generated navigation video is not very smooth: the transition from one frame to the next is not properly controlled, which can result in an abrupt change from one scene to another. Second, the generated video often contains many unwanted vehicles and people, and removing these distracting objects would greatly enhance the quality of the navigation video. In this paper, we first use HOG and/or Haar features to detect vehicles and people, and we also report preliminary trials that use Faster R-CNN and Caffe to speed up this detection. We present results to demonstrate the effectiveness of our approaches and, where applicable, compare them with similar approaches to show the improvement. In addition, we have developed a post-processing tool for interactively refining the results when the automatic object detection is imperfect.
Keywords: Google Earth, Google Street View, HOG and Exemplar-SVMs, Haar and AdaBoost, Region Growing, Image Inpainting, Caffe, Faster R-CNN
Navigation Location: Sydney Opera House |
|||
Output1 |
Output2 |
The lower right video region in Output2 |
The lower right video region in Output2 |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that the Output2 area appears as a black box, because the screen-recording software and/or our computer cannot capture a region in which an AVI/MP4 file is playing. |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that this clip was recorded by pointing a camera at our computer’s screen, and that the video has been shortened to 60 seconds. |
Navigation Location: Taroko National Park |
|||
Output1 |
Output2 |
The lower right video region in Output2 |
The lower right video region in Output2 |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that the Output2 area appears as a black box, because the screen-recording software and/or our computer cannot capture a region in which an AVI/MP4 file is playing. |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that this clip was recorded by pointing a camera at our computer’s screen, and that the video has been shortened to 60 seconds. |
Navigation Location: Pingtung Kenting Coast |
|||
Output1 |
Output2 |
The lower right video region in Output2 |
The lower right video region in Output2 |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that the Output2 area appears as a black box, because the screen-recording software and/or our computer cannot capture a region in which an AVI/MP4 file is playing. |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that this clip was recorded by pointing a camera at our computer’s screen, and that the video has been shortened to 60 seconds. |
Navigation Location: Tainan Gold Coast |
|||
Output1 |
Output2 |
The lower right video region in Output2 |
The lower right video region in Output2 |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that the Output2 area appears as a black box, because the screen-recording software and/or our computer cannot capture a region in which an AVI/MP4 file is playing. |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that this clip was recorded by pointing a camera at our computer’s screen, and that the video has been shortened to 60 seconds. |
Navigation Location: Taipei City |
|||
Output1 |
Output2 |
The lower right video region in Output2 |
The lower right video region in Output2 |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that the Output2 area appears as a black box, because the screen-recording software and/or our computer cannot capture a region in which an AVI/MP4 file is playing. |
Please note that the “Please wait” portion of this clip has been trimmed, because it only shows the system’s waiting (computing) time. Please also note that this clip was recorded by pointing a camera at our computer’s screen, and that the video has been shortened to 60 seconds. |
Note that the results of our proposed system were produced with the old version
of the Google Street View API, because this is a continuing study that began
years ago, when the old API was still supported. The apps we compare against,
however, are all currently available in the online App Store, so they use the
new version of the Google Street View API and the new image dataset. As a
result, these apps may sometimes be able to fetch better and more key frames.
Nevertheless, as the comparison results below show, our system can still
generate better and smoother videos than those apps, because it performs
transitional transformations among key frames together with object detection
and inpainting.
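The transitional transformations among key frames are not specified in detail on this page. As a rough illustration of the blending effect alone, the sketch below applies a plain linear cross-fade, one common way to blend the ending frames of one key frame into the starting frames of the next. The function names `cross_fade` and `build_video`, the toy frame sizes, and the number of in-between frames are assumptions made for illustration; this is not the system's actual implementation.

```python
import numpy as np


def cross_fade(frame_a, frame_b, n_steps):
    """Return n_steps in-between frames that linearly blend frame_a into frame_b.

    Hypothetical helper for illustration; frames are H x W x 3 uint8 arrays.
    """
    if frame_a.shape != frame_b.shape:
        raise ValueError("key frames must have the same shape")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    frames = []
    for i in range(1, n_steps + 1):
        # alpha rises from just above 0 to just below 1, so the in-between
        # frames never duplicate either key frame exactly.
        alpha = i / (n_steps + 1)
        blended = (1.0 - alpha) * a + alpha * b
        frames.append(np.clip(np.rint(blended), 0, 255).astype(np.uint8))
    return frames


def build_video(key_frames, n_steps):
    """Interleave key frames with cross-faded transitions between neighbours."""
    if not key_frames:
        return []
    video = [key_frames[0]]
    for prev, nxt in zip(key_frames, key_frames[1:]):
        video.extend(cross_fade(prev, nxt, n_steps))
        video.append(nxt)
    return video


# Toy stand-ins for Street View key frames: three flat grey images.
key_frames = [np.full((4, 6, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
video = build_video(key_frames, n_steps=3)
```

With three key frames and `n_steps=3`, the sketch produces nine frames: the three key frames plus three blended frames in each of the two gaps. A video generated without such in-between frames jumps directly from one key frame to the next, which is the abrupt scene change that the blending comparison in the tables is meant to capture.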
Name | Our system | Route Player |
Icon | ||
Platform | PC (Google Chrome) | iPhone iOS |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 28 (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | NO |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | bad |
Vertical play/use mode | ||
Horizontal play/use mode |
Name | Our system | Course Preview |
Icon | ||
Platform | PC (Google Chrome) | iPhone iOS |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 28 (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | NO |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | bad |
Vertical play/use mode | ||
Horizontal play/use mode |
Name | Our system | StreetsPlayer |
Icon | ||
Platform | PC (Google Chrome) | iPhone iOS |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 4 (vertical play/use mode) to 8 (horizontal play/use mode) (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | YES |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | very bad |
Vertical play/use mode | ||
Horizontal play/use mode |
Name | Our system | StreetWatcher |
Icon | ||
Platform | PC (Google Chrome) | iPhone iOS |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 41 (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | YES |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | good |
Vertical play/use mode | ||
Horizontal play/use mode |
Name | Our system | Drive around & Go around WH |
Icon | & |
|
Platform | PC (Google Chrome) | iPhone iOS |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 9 (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | YES |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | very bad |
Vertical play/use mode | ||
Horizontal play/use mode |
Name | Our system | Streetview Player |
Icon | ||
Platform | PC (Google Chrome) | Android or PC (Google Chrome) |
Starting Location and Ending Location (Navigation Location) | Taroko National Park |
# of Key Frames Fetched | 35 | about 11 (using the new version of the Google Street View API and the new Street View image dataset) |
Is there a blending effect in the system’s output video? Note: the blending effect refers to blending (overlapping) the ending frames of one key frame with the starting frames of the next key frame. | YES | NO |
Is there a transition effect in the system’s output video? | YES | NO |
How smooth does the video look? | very good | very bad |
Vertical play/use mode | ||
Horizontal play/use mode |
This work was supported in part by the Ministry of Science and Technology
of Taiwan under grants MOST 104-2221-E-011-083-MY2, MOST 105-2218-E-011-005, MOST 105-2218-E-001-001, MOST 106-3114-E-011-003, and MOST 106-2221-E-011-148-MY3. The authors thank Google and its APIs for providing the map and street view data used in this work.
On-line Video Watching |