Image learning

Video surveillance usually requires a human operator to interpret the complex information in video streams. With the increasing number of surveillance cameras installed all over the world, the research community has started to develop advanced algorithms to detect objects in images and classify them. Wildlife Security has initiated a couple of projects to develop algorithms that detect moving objects in video data and classify them as humans, rhinos or other animals.

Smartphones as Camera Traps

Video surveillance usually requires a power supply and an Ethernet connection to the internet or a local computer. Image classification also requires a powerful processor in the camera. The absence of power and internet infrastructure in the savannah makes the standard solution challenging. Smartphones with an external solar panel, however, have everything in one unit. Our goal is to implement object classification as an app and to report the detected classes, with time stamps and positions, through the cellular network, falling back to SMS as a last resort.

For the development, 13 smartphones were placed on the savannah of Kolmården Wildlife Park during one day, generating 35 hours of video of the animals. The system we have developed consists of trap devices (Android phones) that can identify rhinos and humans in images, and a backend server that collects all information sent from the trap devices.

The implementation exploits the fact that the phones are stationary, which makes it possible to model the background and separate moving objects from it. These moving objects are then classified in two steps: first, features are extracted from the image with a Histogram of Oriented Gradients (HOG); then a linear Support Vector Machine (SVM) uses those features to decide which class the object belongs to.
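The detection pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the background model is a simple running average, the HOG is reduced to a single orientation histogram over the whole patch (a real HOG uses a grid of cells with block normalisation), and the SVM weights are assumed to come from training elsewhere.

```python
import numpy as np

def update_background(bg, frame, alpha=0.05):
    """Running-average background model; alpha sets how fast it adapts."""
    return (1 - alpha) * bg + alpha * frame

def foreground_mask(bg, frame, threshold=25.0):
    """Pixels that differ enough from the background are moving objects."""
    return np.abs(frame - bg) > threshold

def hog_features(patch, n_bins=9):
    """Toy HOG: one unsigned-orientation histogram over the whole patch."""
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]   # horizontal gradient
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]   # vertical gradient
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)      # unsigned orientation
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-9)  # L2-normalised descriptor

def svm_is_rhino(features, w, b):
    """Linear SVM decision: sign of w.x + b (weights w, b are assumed
    to come from training on labelled rhino / non-rhino patches)."""
    return float(np.dot(w, features) + b) > 0.0
```

In a real deployment the HOG would be computed with a library such as OpenCV, but the structure is the same: subtract the background, crop each moving region, extract features, classify.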

The operators do not want one report per frame for the same rhino, so tracking is needed to send only one report per object. This is done with a Kalman filter that estimates the position, width and height of each object. If an object stands still too long it blends into the background model, and when it starts moving again it is reported once more.
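A tracker like the one described might look roughly like the sketch below. It is an illustrative sketch, not the project's implementation: the state holds position with an assumed constant-velocity model plus width and height, and the noise covariances are placeholder values.

```python
import numpy as np

class BoxKalman:
    """Kalman filter over a bounding box: state [x, y, vx, vy, w, h]."""

    def __init__(self, x, y, w, h):
        self.s = np.array([x, y, 0.0, 0.0, w, h])
        self.P = np.eye(6) * 10.0                  # state uncertainty
        self.F = np.eye(6)                         # constant-velocity model
        self.F[0, 2] = self.F[1, 3] = 1.0          # dt = 1 frame
        self.H = np.zeros((4, 6))                  # we measure x, y, w, h
        self.H[0, 0] = self.H[1, 1] = 1.0
        self.H[2, 4] = self.H[3, 5] = 1.0
        self.Q = np.eye(6) * 0.1                   # process noise (assumed)
        self.R = np.eye(4) * 1.0                   # measurement noise (assumed)

    def predict(self):
        """Propagate the state one frame ahead; returns [x, y, w, h]."""
        self.s = self.F @ self.s
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.s[[0, 1, 4, 5]]

    def update(self, z):
        """Correct with a measured box z = [x, y, w, h] from the detector."""
        y = z - self.H @ self.s
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.s = self.s + K @ y
        self.P = (np.eye(6) - K @ self.H) @ self.P
```

Associating each detection with an existing track (rather than starting a new one) is what lets the system send a single report per object.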


UL: original image, UR: background model, LL: bounding box on the background model, LR: bounding box on the original image. The bounding box is green if the object is classified as a rhino.

Such reports are then sent to the backend server. A report contains the class of the detected object, the trap device's id and position, a timestamp and an image of the object. The backend server saves all information sent from the trap devices. Additionally, it can send a warning SMS if a human has been seen.
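The report format and the backend's alert rule could be sketched as below. The field names and structure here are illustrative assumptions, not the project's actual schema; only the contents (class, device id and position, timestamp, image) come from the description above.

```python
import base64
import json
import time

def make_report(device_id, lat, lon, object_class, image_bytes):
    """Build a hypothetical JSON report for the backend.

    Fields mirror the text: object class, device id and position,
    timestamp and an image of the detected object.
    """
    return json.dumps({
        "device_id": device_id,
        "class": object_class,                     # e.g. "rhino" or "human"
        "timestamp": int(time.time()),
        "position": {"lat": lat, "lon": lon},
        "image": base64.b64encode(image_bytes).decode("ascii"),
    })

def needs_sms_alert(report_json):
    """Backend rule from the text: warn by SMS when a human is seen."""
    return json.loads(report_json)["class"] == "human"
```

A payload this small also suits the SMS fallback: everything except the image fits in a short text message.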

Below is a slideshow of some images of animals that have been detected from the videos captured at Kolmården zoo (see previous update). These images (and many more) are used to train the computer to recognise rhinos.


Airborne Camera Systems

Implementing the object classification algorithm on a moving camera platform is even more challenging than on a stationary camera, in particular on a flying platform observing the animals from above. We have initiated a project to adapt the camera-trap algorithm to an aerial sensor platform with both video and thermal cameras.
The result is presented in the MSc thesis by Carl Karlsson Schmidt, with the title 'Rhino and Human Detection in Overlapping RGB and LWIR Images'.