Video surveillance usually requires a human operator to interpret the complex information in video streams. With the increasing number of surveillance cameras installed all over the world, the research community has started to develop advanced algorithms to detect and classify objects in the images. Wildlife Security has initiated a couple of projects to develop algorithms that detect moving objects in video data and classify them as humans, rhinos, or other animals.
Smartphone as Camera Traps
Video surveillance usually requires a power supply and an Ethernet connection to the Internet or to a local computer. Image classification also requires a powerful processor in the camera. The absence of power and Internet infrastructure on the savannah makes the standard solution challenging. A smartphone with an external solar panel, however, has everything in one unit. Our goal is to implement object classification as an app, and to report the detected classes with time stamps and positions over the cellular network, falling back to SMS as a last resort.
For the development, 13 smartphones were placed on the savannah of Kolmården Wildlife Park for one day, generating 35 hours of video of the animals. The system we have developed consists of trap devices (Android phones) that can identify rhinos and humans in images, and a backend server that collects all information sent from the trap devices.
The implementation exploits the fact that the phones are stationary, which makes it possible to model the background and separate moving objects from it. These moving objects are then classified in two steps: first, features are extracted from the image with a Histogram of Oriented Gradients (HOG); then a linear Support Vector Machine (SVM) uses those features to decide which class the object belongs to.
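The pipeline above can be sketched in a few lines of numpy. This is a minimal illustration, not the app's actual code: the background model here is a simple running average, the HOG is reduced to a single orientation histogram (a real HOG uses many cells with block normalisation), and the SVM weights `w`, `b` would come from training.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model for a stationary camera."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, thresh=25.0):
    """Pixels that differ enough from the background are foreground."""
    return np.abs(frame - bg) > thresh

def hog_features(patch, n_bins=9):
    """Simplified HOG: one gradient-orientation histogram over the patch."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                       # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0 # unsigned orientation
    hist, _ = np.histogram(ang, bins=n_bins, range=(0, 180), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

def svm_score(features, w, b):
    """Linear SVM decision value; the sign picks the class."""
    return float(features @ w + b)
```

In the real system each detected foreground region is cropped from the frame, its HOG vector computed, and the SVM score thresholded to decide "rhino" versus "human" versus other.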
The operators do not want a report for every frame containing the same rhino, so tracking is needed so that only one report is sent per object. This is done with a Kalman filter that estimates each object's position, width, and height. If an object stands still for too long, it blends into the background model; when it starts moving again, it is reported once more.
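A tracker of this kind can be sketched as a constant-velocity Kalman filter over the bounding box. The state, noise values, and matrix layout below are illustrative assumptions, not the project's tuned parameters.

```python
import numpy as np

class BoxKalman:
    """Kalman filter tracking a bounding box.
    State: [x, y, w, h, vx, vy] (constant-velocity motion model)."""
    def __init__(self, x, y, w, h, dt=1.0):
        self.s = np.array([x, y, w, h, 0.0, 0.0])
        self.P = np.eye(6) * 10.0                 # initial uncertainty
        self.F = np.eye(6)
        self.F[0, 4] = self.F[1, 5] = dt          # x += vx*dt, y += vy*dt
        self.H = np.zeros((4, 6)); self.H[:4, :4] = np.eye(4)
        self.Q = np.eye(6) * 0.01                 # process noise (assumed)
        self.R = np.eye(4) * 1.0                  # measurement noise (assumed)

    def predict(self):
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[:4]

    def update(self, z):
        z = np.asarray(z, float)                  # measured [x, y, w, h]
        y = z - self.H @ self.s                   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
        return self.s[:4]
```

Because the filter keeps an identity per object, a new report is triggered only when a detection cannot be associated with any existing track.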
UL: original image, UR: background model, LL: bounding box on the background model, LR: bounding box on the original image. The bounding box is green if the object is classified as a rhino.
These reports are then sent to the backend server. A report contains the class of the detected object, the trap device's ID and position, a timestamp, and an image of the object. The backend server stores all information sent from the trap devices. Additionally, it has functionality for sending a warning SMS if a human has been detected.
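A report with the fields listed above might look like the following. The field names and encoding are illustrative assumptions; the project's actual wire format is not published here.

```python
import base64
from datetime import datetime, timezone

def build_report(trap_id, lat, lon, label, image_bytes):
    """Assemble one detection report (field names are illustrative)."""
    return {
        "trap_id": trap_id,
        "position": {"lat": lat, "lon": lon},
        "class": label,                           # e.g. "rhino" or "human"
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }

def needs_sms_warning(report):
    """The backend alerts rangers by SMS only for human detections."""
    return report["class"] == "human"
```

On the backend, each incoming report would be stored as-is, and `needs_sms_warning` decides whether the SMS path is triggered.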
Below is a slideshow of some images of animals detected in the videos captured at Kolmården zoo (see previous update). These images (and many more) are used to train the computer to recognise rhinos.
Airborne Camera Systems