Rapid Object Detection in .NET - CodeProject
The most popular and the fastest implementation of Viola-Jones object detection algorithm is undoubtedly the implementation of OpenCV. But OpenCV requires wrapper classes to be usable with .NET languages and
Bitmap objects of .NET have to be converted to
IplImage format before it is used with OpenCV. On the other hand, programs that use OpenCV are dependent on all OpenCV libraries and their wrappers. These are not problems when functions of OpenCV are used. But if we need only to detect objects on a Bitmap, it isn't worth to make our programs dependent on all OpenCV libraries and wrappers...
I have written a library (HaarCascadeClassifier.dll) that makes object detection possible in .NET without any other library requirement. It is an Open Source project that contains implementation of the Viola-Jones object detection algorithm. The library uses haar cascades generated by OpenCV (XML files) to detect particular objects such as faces.
It can be used for object detection purposes or only to understand the algorithm and how parameters affect the result or the speed.
In fact, my purpose is to share HaarCascadeClassifier.dll and its usage. So I will try to summarize the algorithm.
Algorithm of Viola and Jones doesn't use pixels directly to detect objects. It uses rectangular features that are called "haar-like features". These features can be represented using 2, 3, or 4 rectangles.
The value of a haar-like feature is calculated as the difference between the average of the pixels under the rectangle that covers the feature and the average of the pixels under black rectangles. Each rectangle is represented with 5 values; x, y, width, height, and weight. x and y are coordinates where the rectangle will be placed on the searching window. Width and height are the size of the rectangle. Weight is multiplied by the sum of all the pixels insinde the rectangle. (Therefore we can say that weight values of black rectangles are negative.)
Cascades and Stages
A cascade of weak classifiers are used to determine if a subwindow on an image contains the searched object. The structure of a cascade can be represented programmatically as follows:
' Feature rectangle Public Structure FeatureRect Public Rectangle As Rectangle Public Weight As Single End Structure ' Binary tree nodes Public Structure Node ' Feature rectangles Public FeatureRects As List(Of FeatureRect) ' Threshold for determining what to select (left value/right value) or where to go on ' binary tree (left or right) Public Threshold As Single ' Left value Public LeftVal As Single ' Right value Public RightVal As Single ' Does this node have a left node? (Checking a boolean takes less time then to control ' if left node is null or not.) Public HasLNode As Boolean ' Left node. If the current node doesn't have a left node, this will be null. Public LeftNode As Integer ' Does this node have a right node? Public HasRNode As Boolean ' Right node. If the current node doesn't have a right node, this will be null. Public RightNode As Integer End Structure ' Will be used as a binary tree Public Structure Tree Public Nodes As List(Of Node) End Structure ' Stages Public Structure Stage ' Trees in the stage. Public Trees As List(Of Tree) ' Threshold of the stage. Public Threshold As Single End Structure ' Stages of the cascade Public Stages As List(Of Stage)
Features are applied to a subimage until the subimage is rejected by one of them or passed from all.
The value of a node is calculated as the sum of all the features inside the node and it is used to determine the next node or the value that will be added to the stage value. If the current node is a leaf (it means it has no nodes) and the calculated node value is less than the threshold, then the left value (else right value) is added to the stage value (
LeftVal). If the current node is not a leaf, features on the next node (if the feature value is smaller than the threshold
RightNode) that is selected according to the value of the current node have to be applied to the same subimage. After it is done for each tree in the current stage, the calculated stage value is compared with the stage threshold. If it is less, it means the subimage doesn't contain the searched object and it is eliminated directly without applying the remaining stages.
Integral image is an approach that speeds up the algorithm. Integral image at a location is the sum of all pixels above and on the left of the location. (
CalculateCumSums8bpp()). It is used to calculate the sum of pixels within a rectangle very rapidly.
For example, the sum of pixels under the hatched rectangle can be calculated as IntegralImage(D) - IntegralImage(B) - IntegralImage(C) + IntegralImage(A).
During the training process, variance of each image is normalized to reduce the effect of different lighting conditions. Therefore, normalization must be done for all subwindows during detection. Also thresholds can be normalized instead of normalizing image variance (this is faster). Thresholds are multiplied by standard deviation of pixels within the current subwindow for normalization.
The scanning process is an important part of the algorithm. Allmost all parameters required by the
Detect() function of the library are used to optimize this process.
During the scanning process, classifiers are applied to all subimages using a scaled searching window. (Features are also scaled.)
While Scaler < MaxScale .... Scaler = Scaler * ScaleMult End While
The step size of the searching window is calculated by multiplying
SlidingRatio and the current searching window width. It means each step of the searching window is equal to
For i = 0 To ImageWidth - SearchingWinWidth - 1 Step StepSize For j = 0 To ImageWidth - SearchingWinHeight - 1 Step StepSize ... Next Next
Average of Nested Rectangles
Objects can be detected at different scales more than once. This causes the rectangles that represent locations of objects to be nested.
The average of these rectangles must be calculated and all the nested rectangles must be eliminated from results that will be returned.
If distances between edges of rectangles are short enough, they are considered to be nested. The destination threshold is obtained by multiplying
sizeMultForNesRectCon by the size of one of these rectangles (
Using the Library
The library is easy to use in all .NET languages. An example is given below in C#. More examples can be downloaded with compiled executables from here.
||The detector stops searching when number of detected objects reaches this value.|
||Minimum neighbour areas that have to be passed from the detector to verify existence of searched object.|
||First scaler of searching window.|
||Maximum scaler of searching window.|
||The ratio of step size to scaled searching window width. (
Preparing the required parameters:
int maxDetCount = Int32.MaxValue; int minNRectCount = 0; float firstScale = detector.Size2Scale(100); float maxScale = detector.Size2Scale(400); float scaleMult = 1.1f; float sizeMultForNesRectCon = 0.3f; float slidingRatio = 0.2f; Pen pen = new Pen(Brushes.Red, 4); HaarCascadeClassifer.HaarDetector.DetectionParams detectorParameters; detectorParameters = new HaarCascadeClassifer.HaarDetector.DetectionParams(maxDetCount, minNRectCount, firstScale, maxScale, scaleMult, sizeMultForNesRectCon, slidingRatio, pen);
Loading an embedded XML haar cascade:
XmlDocument xmlDoc=new XmlDocument(); xmlDoc.LoadXml(HaarCascadeClassifer.My.Resources.Resources.haarcascade_frontalface_alt); HaarDetector detector = new HaarDetector(xmlDoc);
Loading a haar cascade from an external XML file:
XmlDocument xmlDoc=new XmlDocument(); xmlDoc.Load("haarcascade_frontalface_alt.xml"); HaarDetector detector = new HaarDetector(xmlDoc);
Bitmap bmp = new Bitmap("C:\bitmap.bmp"); HaarCascadeClassifer.HaarDetector.DResults results = detector.Detect(ref bmp, detectorParameters);
results" contains the following:
results.NOfObjects // Number of detected objects results.DetectedOLocs // Locations of detected objects (Rectangles) results.SearchedSubRegionCount // Number of searched subregions