The proposed tracking framework is designed as follows. The tracker estimates the motion of the vehicle or vehicles between consecutive frames. The detector processes each frame independently and localises the target vehicle or vehicles using the trained classifier, which is updated continuously by the learning process. The learning component also estimates the errors of the detector, which are of two types: false positives and false negatives. In addition, the learning component generates positive and negative training samples from the error estimates so that future detections avoid the same errors. Because both the detector and the tracker are assumed to make errors, Forward and Backward Tracking (FBT) is proposed to monitor the performance of the tracker. With the proposed method, more training samples can be generated from the current input video, so the classifier becomes more accurate as it is updated.
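The idea of monitoring the tracker by tracking forward and then backward can be illustrated with a minimal sketch. The details of FBT are not given in this section, so the code below only shows the general forward-backward consistency check; the function name `forward_backward_error` and the callables `step_forward` and `step_backward` are hypothetical stand-ins for a real tracker.

```python
import numpy as np

def forward_backward_error(step_forward, step_backward, point, n_frames):
    """Track `point` forward through n_frames steps, then backward again.

    step_forward(p, t) and step_backward(p, t) are hypothetical callables
    that move a 2-D point across frame t. The returned value is the
    distance between the starting point and the back-tracked point; a
    large value suggests that the tracker has drifted.
    """
    start = np.asarray(point, dtype=float)
    p = start.copy()
    for t in range(n_frames):                 # forward pass
        p = step_forward(p, t)
    for t in reversed(range(n_frames)):       # backward pass
        p = step_backward(p, t)
    return float(np.linalg.norm(p - start))

# Toy trackers: a consistent one and one that drifts on the way back.
motion = np.array([2.0, 1.0])
consistent = forward_backward_error(lambda p, t: p + motion,
                                    lambda p, t: p - motion,
                                    (10.0, 10.0), n_frames=4)
drifting = forward_backward_error(lambda p, t: p + motion,
                                  lambda p, t: p - np.array([2.0, 0.5]),
                                  (10.0, 10.0), n_frames=4)
```

In such a scheme, a track whose error exceeds a chosen threshold would be treated as unreliable and handed back to the detector; the threshold itself is a design choice not specified here.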
Vehicle tracking methods can use various features, such as points, models, shapes, and motions. This paper focuses on the points and the motions of the targets. Window tracking is widely used in object tracking, and there are two approaches to it: the static template model and the adaptive model. The main difference between them is that the adaptive model can update the template during tracking, whereas the static model cannot. However, the disadvantage of window tracking is that the templates are limited for appearance modelling. In this work, an adaptive discriminative tracking model is proposed, in which the model templates of the targets are updated continually, both offline and during tracking. The positive results from the tracking process in neighbouring frames are used as positive training samples in the subsequent detection and tracking; similarly, the negative results are used as negative training samples. This update strategy can handle changes in the appearance of the target as well as short-term occlusion, another common problem, since tracking is affected by lost frames and by background regions that happen to resemble the target. The TLD algorithm builds an online feature detector for a single target in the first frame, which can then search for the target continuously during the entire tracking process. Positive and negative samples are generated to update the detector's classification model. This approach addresses the problem of recovering the target after a tracking failure, but it can only track the area selected by the operator in the first frame.
Vehicle detection methods are either appearance-based or motion-based. Appearance-based methods recognise vehicles directly from a single image, whereas motion-based methods require a sequence of images or video frames in order to recognise vehicles.
Most of the literature uses appearance-based methods because they can detect vehicles from a single image rather than from a sequence of frames. In this paper, vehicle detection is performed from flying UAVs, so stationary vehicles cannot be separated from the background by motion-based methods. Thus, appearance-based detection is used in the detection process. One of the most commonly used appearance-based detection approaches is the HoG, which is extracted by evaluating edge operators over the whole image and then discretising and binning the orientations of the edge intensities into histogram descriptors that are used for creating classification models. Su et al. proposed a vehicle detection approach using the HoG feature with the sliding window method. The primary gradient direction is calculated in order to estimate the orientation of the vehicle. One weakness of the HoG is that it is not a rotationally invariant feature, so it is sensitive to the direction of the targets. They tackle this problem by rotating the sliding window to obtain the integral histogram values. Gleason et al. compared HoG features and Histogram of Gabor Coefficients (HGC) features as vehicle descriptors, obtaining an average detection rate of 80%. According to the reported detection rates, the HoG performed better. They also applied Harris corner detectors to identify the areas of interest for detection, as they assumed that vehicles usually contain a large number of edges and corners. Besides the HoG, which acts as an area descriptor, point descriptors are also used for classification. Sahli et al.
proposed a local feature-based approach based on the Scale-Invariant Feature Transform (SIFT). They used SIFT features of vehicles and background to train an SVM classifier, creating a model that was used to classify vehicles and background in query images. They obtained an accuracy of 95.2%. Comparing the reported detection results, the SIFT feature appears to perform better than the HoG feature. However, for real-time detection the SIFT feature requires more computational resources, especially when the whole image is processed for small targets. In this paper, the proposed approach integrates the feature-based method and the sliding window method by using the HoG feature with the corner detection algorithm FAST (Features from Accelerated Segment Test), which is faster to compute than the SIFT feature. Furthermore, SIFT features are applied in the tracking stage because of their high matching accuracy, and the problem of long processing time is tackled by narrowing the search area to the region in which the targets are most likely to appear during tracking.
2015 Seventh International Conference of Soft Computing and Pattern Recognition (SoCPaR 2015)
A. FAST-HoG Detection Method
In this detection method, we integrate FAST corner detection with the HoG descriptor, because FAST detection can narrow down the Region of Interest (RoI) for the HoG detection and thereby reduce the long processing time of the sliding window process. The FAST detector classifies a pixel p as a corner by performing a simple brightness test on a discretised circle of sixteen pixels around p. A corner is detected at p if there are twelve contiguous pixels in the circle whose intensities are all brighter or all darker than the centre pixel p by a threshold t. A score function is evaluated for each candidate corner in order to perform non-maximal suppression for the final detection, where S_bright is the subset of pixels in the circle that are brighter than p by the threshold t, and S_dark is the subset of pixels that are darker than p by t.
The HoG feature was originally developed for detecting humans. The idea behind the HoG descriptor is that the shape of an object can be identified from the distribution of its edges, even without precise information about the edges themselves. However, a weakness of the HoG descriptor is that it is not rotationally invariant. To solve this problem, each training sample was used at four different orientations (0, 45, 90 and 135 degrees) in the proposed method. Each group of oriented training samples has its own classification model, and the final classification model is calculated from all four of the oriented classification models. The extraction of a HoG feature vector starts with colour and gamma normalisation; edges are then detected by convolving the image patch with the simple mask [-1, 0, 1], both horizontally and vertically. The image patch is then subdivided into rectangular regions called cells, and within each cell the gradient of each pixel is computed. In the next step, each pixel casts a vote for the orientation of the cell, weighted by its gradient magnitude.
These votes are accumulated into orientation bins covering gradient angles from 0 to 180 degrees, and the bins are stored as a histogram. Local contrast normalisation is used to suppress the effects of changes in illumination and in contrast with the background on the gradient magnitude. This step was found to be essential for good performance; it is achieved by grouping cells into larger blocks and normalising within these blocks, which ensures that low-contrast regions are stretched. The HoG feature vectors extracted from the regions of interest are passed to a binary classifier that determines whether a vehicle is present in the image patch. The method uses separate SVMs trained on sample vehicle images that are categorised into four angular offsets (0, 45, 90 and 135 degrees). These four SVM models are then integrated into a single classifier model that evaluates a rotationally invariant response for a single HoG feature vector. Support Vector Machines were chosen as the learning algorithm for classification because they demonstrated very high accuracy in previous vehicle detection research.
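As an illustration of the FAST segment test described at the start of this subsection, the following sketch tests a single pixel. The score expression is not reproduced in the text, so the code assumes the form commonly given for FAST: the larger of the two sums of absolute intensity differences (each reduced by t) over S_bright and S_dark. The names `CIRCLE`, `_has_run` and `fast_corner` are illustrative only.

```python
import numpy as np

# (dx, dy) offsets of the 16-pixel discretised circle of radius 3, in order.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def _has_run(flags, n):
    """Return True if the circular boolean sequence has n contiguous True values."""
    run = 0
    for f in list(flags) + list(flags):   # doubling handles wrap-around
        run = run + 1 if f else 0
        if run >= n:
            return True
    return False

def fast_corner(img, x, y, t, n=12):
    """Segment test at pixel (x, y); returns (is_corner, score)."""
    ip = int(img[y, x])
    ring = np.array([int(img[y + dy, x + dx]) for dx, dy in CIRCLE])
    brighter = ring > ip + t              # membership of S_bright
    darker = ring < ip - t                # membership of S_dark
    is_corner = _has_run(brighter, n) or _has_run(darker, n)
    # Assumed score: max of the two summed differences, each reduced by t.
    score = max(np.sum(np.abs(ring[brighter] - ip) - t),
                np.sum(np.abs(ip - ring[darker]) - t))
    return is_corner, int(score)

# Toy example: twelve contiguous ring pixels much brighter than the centre.
img = np.full((7, 7), 50, dtype=np.uint8)
for dx, dy in CIRCLE[:12]:
    img[3 + dy, 3 + dx] = 200
is_corner, score = fast_corner(img, 3, 3, t=20)
```

In a full detector, the score would be compared between neighbouring candidate corners so that only local maxima are kept, and the surviving corners would define the regions passed to the HoG stage.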
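The HoG extraction steps described above (gradient computation with the [-1, 0, 1] mask, magnitude-weighted votes into 0 to 180 degree bins, and block normalisation) can be sketched as follows. The cell size of 8 pixels, the 9 orientation bins, the 2 × 2 cell blocks, and the L2 normalisation are assumptions chosen for illustration and are not stated in the text; votes are assigned to a single bin without interpolation for simplicity.

```python
import numpy as np

def hog_cell_histograms(patch, cell=8, bins=9):
    """Per-cell orientation histograms of a grey-level patch."""
    patch = patch.astype(float)
    gx = np.zeros_like(patch)
    gy = np.zeros_like(patch)
    gx[:, 1:-1] = patch[:, 2:] - patch[:, :-2]    # [-1, 0, 1] horizontally
    gy[1:-1, :] = patch[2:, :] - patch[:-2, :]    # [-1, 0, 1] vertically
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    ny, nx = patch.shape[0] // cell, patch.shape[1] // cell
    hist = np.zeros((ny, nx, bins))
    for cy in range(ny):
        for cx in range(nx):
            sl = (slice(cy * cell, (cy + 1) * cell),
                  slice(cx * cell, (cx + 1) * cell))
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[cy, cx] = np.bincount(bin_idx[sl].ravel(),
                                       weights=mag[sl].ravel(),
                                       minlength=bins)
    return hist

def normalise_blocks(hist, block=2, eps=1e-6):
    """L2-normalise overlapping blocks of cells and concatenate them."""
    ny, nx, _ = hist.shape
    feats = []
    for by in range(ny - block + 1):
        for bx in range(nx - block + 1):
            v = hist[by:by + block, bx:bx + block].ravel()
            feats.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(feats)

# Toy patch with a single vertical edge: all votes fall into the first bin.
patch = np.zeros((16, 16))
patch[:, 8:] = 1.0
hist = hog_cell_histograms(patch)
feature = normalise_blocks(hist)
```

The resulting vector is what would be passed to each of the four orientation-specific SVMs; how their responses are combined into one model is not detailed in the text and is therefore left out of the sketch.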
B. HSV-GLCM Detection Method
The second detection method uses the HSV colour feature together with the GLCM (Grey-Level Co-occurrence Matrix) feature. The GLCM is a tabulation of how often different combinations of pixel brightness values occur in an image. The idea of using the GLCM for detection is to calculate GLCM values over the input image using the sliding window method. These values are treated as the descriptors of the GLCM feature. The GLCM texture measures can be classified into three groups: contrast (CON), homogeneity (HOM) and entropy (ENT). Before computing the GLCM, several of its parameters need to be set. First, the number of grey levels has to be set. A grey image has 256 grey levels, so there would be 256 × 256 (65,536) combinations in the GLCM; analysing it would require considerable computing power and time. Therefore, to save time and computing power, the number of grey levels can be reduced. Usually, the choice is between 16, 32 and 64. The greater the value, the better the result, but also the longer the time required. Following previous research, this value was set to 32 in our process. Secondly, the offset was taken in four different orientations: 0°, 45°, 90° and 135°. This is because the GLCM is not direction invariant. Vehicles come from different directions in the video, so we define four main directions for detecting vehicles, which addresses this lack of invariance. Furthermore, the offset distance between the pixels has to be set. Usually, a value of 1 is chosen for the distance between the two pixels. F. Zhou et al. proposed the idea that there are relationships between the distance and the calculated values (CON, HOM and ENT). In the view of the authors of that paper, a Markov Random Field (MRF) can be used to show that a calculation is correct only when the distance is greater than the value of a GLCM feature. Conversely, when the distance is small, the results of the GLCM calculations are random or unstable. A. Chaddad et al.
also proposed the same idea; in their opinion, when the distance is small, that is, when the two chosen pixels are close together, the results of the GLCM calculations change rapidly with any increase in distance, but when the distance becomes large, the results are more stable. The conclusions of these two papers are the same. As a result, it is essential to find a suitable value for the distance. The offset distance was set to 3 in the proposed detection. Once the GLCM parameters were set, the GLCM values of the training samples were calculated and passed to the SVM classification model. Each GLCM descriptor contains six values, which are the mean and standard deviation of CON, HOM and ENT.
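The GLCM settings above (32 grey levels, four orientations, offset distance 3) can be sketched as follows. The formulas for CON, HOM and ENT are not given in the text, so the usual co-occurrence definitions are assumed; it is also assumed that the mean and standard deviation are taken across the four orientations. The function names `glcm` and `glcm_descriptor` are illustrative only.

```python
import numpy as np

def glcm(gray, levels=32, offset=(0, 3)):
    """Normalised co-occurrence matrix of an 8-bit image for one (dy, dx) offset."""
    q = (gray.astype(int) * levels) // 256        # quantise 0..255 to 0..levels-1
    dy, dx = offset
    h, w = q.shape
    # Crop so that both the reference pixel and its neighbour lie in the image.
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    ref = q[y0:y1, x0:x1]
    nbr = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]
    m = np.zeros((levels, levels))
    np.add.at(m, (ref.ravel(), nbr.ravel()), 1)   # count each grey-level pair
    return m / m.sum()

def glcm_descriptor(gray, levels=32, d=3):
    """Six values: mean and std of CON, HOM, ENT over 0, 45, 90, 135 degrees."""
    offsets = [(0, d), (-d, d), (-d, 0), (-d, -d)]
    i, j = np.indices((levels, levels))
    rows = []
    for off in offsets:
        p = glcm(gray, levels, off)
        con = np.sum(p * (i - j) ** 2)            # contrast
        hom = np.sum(p / (1.0 + (i - j) ** 2))    # homogeneity
        nz = p[p > 0]
        ent = -np.sum(nz * np.log(nz))            # entropy
        rows.append((con, hom, ent))
    rows = np.array(rows)
    return np.concatenate([rows.mean(axis=0), rows.std(axis=0)])

# A uniform window has zero contrast, homogeneity 1 and zero entropy.
uniform = np.full((16, 16), 128, dtype=np.uint8)
desc_uniform = glcm_descriptor(uniform)

# A random window gives non-zero contrast and entropy.
rng = np.random.default_rng(0)
noisy = rng.integers(0, 256, size=(32, 32)).astype(np.uint8)
desc_noisy = glcm_descriptor(noisy)
```

In a sliding window setting, `glcm_descriptor` would be evaluated on each window and the six values passed to the SVM; reducing `levels` shrinks the matrix from 256 × 256 to 32 × 32 entries, which is the saving discussed above.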
This paper proposed a self-learning tracking method for vehicle detection and tracking from aerial videos. The proposed method tackles the problem that arises when the training samples are initially taken from vehicles other than the tracking target, which can easily cause errors because the feature descriptors differ. The proposed method learns the vehicle features and creates a unique detection model for each vehicle during the tracking process. A Forward and Backward Tracking mechanism was proposed to check for errors in the tracking and detection process. The proposed method demonstrated reasonably high accuracy and can successfully detect and track a variety of vehicle types under varying rotation, shearing and blurring conditions. This paper also proposed two detection methods, using FAST-HoG and HSV-GLCM. Five aerial videos presenting different challenges were used in the testing. The results show that the proposed method can improve detection performance by using the self-learning tracking mechanism. This paper also compared the proposed approach with other tracking approaches, and the results show that the proposed approach performs slightly better. In future work, using more training samples could improve the accuracy of the classifier in the initial detection. The learning component could also be improved by revising the error checking between the detector and the tracker.