



German Traffic Signs Detection using Yolov3

The German Traffic Sign Detection Benchmark is an object detection problem where the task is to detect traffic signs. Traffic sign detection is still a challenging real-world problem of high industrial relevance. Participating algorithms need to pinpoint the location of given categories of traffic signs (prohibitory, mandatory, danger, or other).

The dataset information is as follows:

● A single-image detection problem

● 900 images (divided into 600 training images and 300 evaluation images)

The YOLOv3 algorithm uses deep learning concepts such as fully convolutional networks, strides, and upsampling to learn the objects in a given image. The term fully convolutional network simply means that the model uses only convolutions all the way from input to output.

YOLO achieves high accuracy while also being able to run in real time. The algorithm looks only once at the image, in the sense that it requires only one forward propagation pass through the neural network to make predictions.

The dataset is publicly available at the following link.

Image format

Annotation format

Annotations contain the following information:

Performance Metric: mAP (Mean Average Precision)

The computer vision community has converged on the metric mAP to compare the performance of object detection systems. To understand mean average precision, we must spend some meaningful time with the precision-recall curve.

The Precision Recall Curve

Precision is a measure of “when your model guesses, how often does it guess correctly?” Recall is a measure of “has your model guessed every time that it should have guessed?”

Models that involve an element of confidence can trade off precision for recall by adjusting the level of confidence they need to make a prediction. In other words, if the model is in a situation where avoiding false positives is more important than avoiding false negatives, it can set its confidence threshold higher to encourage the model to only produce high-precision predictions at the expense of lowering its amount of coverage.

Plotting the model’s precision and recall as a function of the model’s confidence threshold gives the precision-recall curve. It is downward sloping because as the confidence threshold is decreased, more predictions are made (helping recall) and less precise predictions are made (hurting precision).

As the model is getting less confident, the curve is sloping downwards. If the model has an upward sloping precision and recall curve, the model likely has problems with its confidence estimation.
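To make the trade-off concrete, here is a minimal sketch that computes precision and recall at a few confidence thresholds. The scores, the correctness flags, and the count of ground-truth signs are made-up illustrative values, and `precision_recall` is a hypothetical helper, not part of any library.

```python
def precision_recall(scores, is_correct, num_ground_truth, threshold):
    """Precision and recall when only predictions scoring >= threshold are kept."""
    kept = [ok for score, ok in zip(scores, is_correct) if score >= threshold]
    true_positives = sum(kept)
    # With no predictions at all there are no false positives, so report 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth
    return precision, recall


# Hypothetical detections: confidence score and whether each one matched a real sign.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30]
is_correct = [True, True, False, True, False, True]
num_ground_truth = 5  # assumed number of real signs in the evaluated images

# One (precision, recall) point per threshold, from strict to lenient.
curve = [precision_recall(scores, is_correct, num_ground_truth, t)
         for t in (0.9, 0.7, 0.5, 0.2)]
for threshold, (precision, recall) in zip((0.9, 0.7, 0.5, 0.2), curve):
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

On real data the curve is rarely perfectly monotone, but recall can only grow as the threshold is lowered, while precision tends to fall.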

Object detection systems make predictions in terms of a bounding box and a class label.

A sketch of object detection by yours truly

In practice, the bounding boxes predicted in the X1, X2, Y1, Y2 coordinates are sure to be off (even if slightly) from the ground truth label. We know that we should count a bounding box prediction as incorrect if it is the wrong class, but where should we draw the line on bounding box overlap?

The Intersection over Union (IoU) provides a metric to set this boundary. It is measured as the area where the predicted bounding box overlaps the ground truth bounding box (the intersection) divided by the total area covered by the two boxes together (the union).
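As a rough sketch, IoU for two axis-aligned boxes can be computed as follows. The `(x1, y1, x2, y2)` corner layout and the function name `iou` are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Corners of the overlapping rectangle (it may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    intersection = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    # The overlap is counted once in the union, not twice.
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# A prediction shifted by half its width from the ground truth.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```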

A graphical depiction of the IoU metric by yours truly.

Picking the right single threshold for the IoU metric seems arbitrary. One researcher might justify a 60 percent overlap, and another is convinced that 75 percent seems more reasonable. So why not have all of the thresholds considered in a single metric? Enter mAP.

In order to calculate mAP, we draw a series of precision recall curves with the IoU threshold set at varying levels of difficulty.

A sketch of mAP precision-recall curves by yours truly.

In my sketch, red is drawn with the highest requirement for IoU (perhaps 90 percent) and the orange line is drawn with the most lenient requirement for IoU (perhaps 10 percent). The number of lines to draw is typically set by challenge. The COCO challenge, for example, sets ten different IoU thresholds starting at 0.5 and increasing to 0.95 in steps of .05.

Almost there!

Finally, we draw these precision-recall curves for the dataset split out by class type.

A sketch of mAP by object class by yours truly

The metric calculates the average precision (AP) for each class individually across all of the IoU thresholds. Then the metric averages the AP over all classes to arrive at the final mAP estimate. 🤯
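The averaging can be sketched in a few lines. This is a simplified illustration, not the official COCO evaluation: it assumes every detection has already been matched to a ground-truth box and carries the IoU of that match, and it uses a non-interpolated AP. The function names and input layout are hypothetical.

```python
# Ten COCO-style IoU thresholds: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def average_precision(detections, num_ground_truth, iou_threshold):
    """Non-interpolated AP: mean of the precision measured at each true positive.

    detections: list of (confidence, iou_with_matched_ground_truth) pairs.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    true_positives = 0
    precision_sum = 0.0
    for rank, (_, overlap) in enumerate(ranked, start=1):
        if overlap >= iou_threshold:
            true_positives += 1
            precision_sum += true_positives / rank
    return precision_sum / num_ground_truth if num_ground_truth else 0.0


def mean_average_precision(per_class):
    """per_class maps a class name to (detections, num_ground_truth)."""
    class_aps = []
    for detections, num_ground_truth in per_class.values():
        aps = [average_precision(detections, num_ground_truth, t)
               for t in IOU_THRESHOLDS]
        class_aps.append(sum(aps) / len(aps))  # AP averaged over IoU thresholds
    return sum(class_aps) / len(class_aps)     # then averaged over classes


# Made-up detections for two of the four sign categories.
example = {
    "prohibitory": ([(0.9, 0.92), (0.8, 0.65), (0.4, 0.30)], 2),
    "danger": ([(0.7, 0.80)], 1),
}
print(mean_average_precision(example))
```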

First, there are 43 traffic signs in total. Let’s see their names. We have a CSV file containing the names of all the traffic signs, and a pandas DataFrame helps us view them.

According to the information supplied with the dataset, traffic signs that are circular with a white ground and a red border belong to the Prohibitory class. Traffic signs that are circular with a blue ground belong to the Mandatory class. Triangular traffic signs with a red border belong to the Danger class, and the remaining signs belong to the Other class.

Our Annotations are provided in a file called gt.txt.

Let’s summarize with a small snippet of code segregating traffic signs according to the above four classes
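A minimal sketch of such a mapping is shown below. The grouping of the 43 class IDs into the four categories is an assumption based on the ReadMe that ships with the dataset, so check it against your copy. The semicolon-separated layout of gt.txt (filename, left, top, right, bottom, class ID) is likewise assumed, and the sample rows are illustrative, not copied from the dataset.

```python
import csv
import io

# Assumed grouping of the 43 class IDs; verify against the dataset ReadMe.
PROHIBITORY = {0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 15, 16}
MANDATORY = set(range(33, 41))
DANGER = {11} | set(range(18, 32))
# Every other ID falls into the "other" category.


def final_class_id(class_id):
    """Map an original class ID (0..42) to one of four final class IDs."""
    if class_id in PROHIBITORY:
        return 0
    if class_id in MANDATORY:
        return 1
    if class_id in DANGER:
        return 2
    return 3


def load_annotations(text):
    """Parse gt.txt-style text into dictionaries with a FinalClassID field."""
    rows = []
    for filename, x1, y1, x2, y2, class_id in csv.reader(io.StringIO(text), delimiter=";"):
        class_id = int(class_id)
        rows.append({
            "filename": filename,
            "box": (int(x1), int(y1), int(x2), int(y2)),
            "ClassID": class_id,
            "FinalClassID": final_class_id(class_id),
        })
    return rows


# Illustrative rows in the assumed format.
SAMPLE_GT = (
    "00000.ppm;100;200;140;240;11\n"
    "00001.ppm;300;150;330;180;40\n"
    "00002.ppm;500;400;560;460;6\n"
)
rows = load_annotations(SAMPLE_GT)
for row in rows:
    print(row["filename"], row["ClassID"], row["FinalClassID"])
```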

The FinalClassID column in the above image shows our four final class IDs.

The next step is to create XML files for our annotations.

To create XML files, we can use Python’s ElementTree module (xml.etree.ElementTree). The following snippet of code creates the XML files.
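As an illustration of how ElementTree can be used for this, here is a minimal sketch that builds one annotation in a Pascal VOC-like layout. The tag names, the image size, and the box values are assumptions for the example and may differ from the files used in the project.

```python
import xml.etree.ElementTree as ET


def build_annotation(filename, width, height, objects):
    """Build a VOC-style annotation tree.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename

    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)

    for class_name, corners in objects:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = class_name
        box = ET.SubElement(obj, "bndbox")
        for tag, value in zip(("xmin", "ymin", "xmax", "ymax"), corners):
            ET.SubElement(box, tag).text = str(value)
    return ET.ElementTree(root)


# Hypothetical image and box values.
tree = build_annotation("00000.jpg", 1360, 800, [("danger", (100, 200, 140, 240))])
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
print(xml_text)
# To save to disk instead: tree.write("00000.xml")
```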

YOLO Algorithm

YOLO is a clever convolutional neural network (CNN) for doing object detection in real-time. The algorithm applies a single neural network to the full image, and then divides the image into regions and predicts bounding boxes and probabilities for each region. These bounding boxes are weighted by the predicted probabilities.

The theory of YOLOv3 is nicely explained in the blog below. It is better to read it there than for me to replicate it here.

Now let’s define some simple terms

Conventionally, one of the biggest challenges in object detection is to find multiple objects of various shapes within the same neighborhood. For example, the picture below shows a person standing on a boat, so the two objects are in close vicinity.

YOLO uses the idea of an “anchor box” to detect multiple objects lying in a close neighborhood. YOLO’s anchor box requires users to predefine two hyperparameters:

so that multiple objects lying in a close neighborhood can be assigned to different anchor boxes.

The more anchor boxes there are, the more objects YOLO can detect in a close neighborhood, at the cost of more parameters in the deep learning model.

Here we will use predefined anchors that were derived using K-Means clustering on the Pascal VOC dataset.

Non-max Suppression

NMS is used to make sure that, in object detection, a particular object is identified only once. Consider a 100×100 image with a 9×9 grid and a car that we want to detect. If this car lies in multiple cells of the grid, NMS ensures that we identify the optimal cell among all the candidates where this car belongs.

The way NMS works is:
→ It first discards all those cells where the probability of an object being present (calculated in the final softmax layer) is <= 0.6.
→ Then it takes the cell with the largest probability among the candidates as a prediction.
→ Finally, it discards any remaining cell with an Intersection over Union value >= 0.5 with the prediction cell.
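The three steps above can be sketched as follows, repeating the pick-and-discard step until no candidates remain so that several distinct objects can survive. The function names and the `(x1, y1, x2, y2)` box layout are assumptions for illustration, and the thresholds mirror the 0.6 and 0.5 values quoted above.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_threshold=0.6, iou_threshold=0.5):
    """Return the indices of the boxes that survive NMS, best score first."""
    # Step 1: discard low-confidence candidates (score <= threshold).
    candidates = sorted(
        (i for i, score in enumerate(scores) if score > score_threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    kept = []
    while candidates:
        # Step 2: the most confident remaining candidate becomes a prediction.
        best = candidates.pop(0)
        kept.append(best)
        # Step 3: drop candidates that overlap the prediction too much.
        candidates = [i for i in candidates
                      if iou(boxes[best], boxes[i]) < iou_threshold]
    return kept


# Two overlapping detections of one sign, one separate sign, one weak detection.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.3]
print(non_max_suppression(boxes, scores))
```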

Feature extractor

A new 53-layer network, Darknet-53, replaces Darknet-19 as the feature extractor. Darknet-53 is mainly composed of 3 × 3 and 1 × 1 filters with skip connections like the residual connections in ResNet. Darknet-53 has fewer BFLOPs (billion floating point operations) than ResNet-152, but achieves similar classification accuracy while running about 2x faster.

Yolov3 Model Architecture

The following snippet of code implements the above architecture in TensorFlow.

Keras’s Batch Generator

I will combine all the input and output encoding discussed above and create a Keras batch generator. The idea behind using a Keras generator is to get batches of input and corresponding output on the fly during the training process, e.g. reading in 100 images, getting the corresponding 100 label vectors, and then feeding this set to the GPU for a training step. I need to use a generator because the entire input data (N images, height, width, N channels) is pretty large and does not fit in memory.

During the training,

is called at the beginning of every batch calculation to prepare input batch and output batch.
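To illustrate the batching idea without depending on Keras, here is a plain-Python stand-in that mirrors the `__len__` and `__getitem__` interface of `keras.utils.Sequence`. The class name, the `load_image` and `encode_labels` callables, and the sample layout are hypothetical. A real generator would subclass `keras.utils.Sequence` and return arrays rather than lists.

```python
import math


class BatchGenerator:
    """Serves (inputs, targets) batches on demand instead of loading everything."""

    def __init__(self, samples, batch_size, load_image, encode_labels):
        self.samples = samples            # list of (image_path, boxes) pairs
        self.batch_size = batch_size
        self.load_image = load_image      # callable: path -> image data
        self.encode_labels = encode_labels  # callable: boxes -> target encoding

    def __len__(self):
        # Number of batches per epoch; the last one may be smaller.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, index):
        start = index * self.batch_size
        chunk = self.samples[start:start + self.batch_size]
        # Only the images of this batch are read into memory.
        inputs = [self.load_image(path) for path, _ in chunk]
        targets = [self.encode_labels(boxes) for _, boxes in chunk]
        return inputs, targets


# Dummy loaders stand in for real image reading and YOLO target encoding.
samples = [(f"img_{i}.jpg", [(0, 0, 10, 10, i % 4)]) for i in range(5)]
generator = BatchGenerator(
    samples,
    batch_size=2,
    load_image=lambda path: path.upper(),
    encode_labels=lambda boxes: len(boxes),
)
for batch_index in range(len(generator)):
    inputs, targets = generator[batch_index]
    print(batch_index, inputs, targets)
```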

The following snippet of code achieves the above.

Yolov3 Loss

YOLO predicts multiple bounding boxes per grid cell. To compute the loss for the true positive, we only want one of them to be responsible for the object. For this purpose, we select the one with the highest IoU (intersection over union) with the ground truth. This strategy leads to specialization among the bounding box predictions. Each prediction gets better at predicting certain sizes and aspect ratios.

YOLO uses the sum-squared error between the predictions and the ground truth to calculate the loss. The loss function is composed of:

Extraction of the ground truth output is simpler than the extraction of the prediction output because the ground truth is already encoded in the correct scales.

The localization loss measures the errors in the predicted boundary box locations and sizes. We only count the box responsible for detecting the object.

We do not want to weight absolute errors in large boxes and small boxes equally: a 2-pixel error matters much more in a small box than in a large one. To partially address this, YOLO predicts the square root of the bounding box width and height instead of the width and height. In addition, to put more emphasis on bounding box accuracy, we multiply the loss by λcoord (default: 5).
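A small sketch shows the effect of the square root. It assumes boxes given as `(x, y, w, h)` with non-negative sizes and a single responsible box. The function name is hypothetical, and `lambda_coord=5.0` follows the default from the original YOLO paper.

```python
import math


def localization_loss(pred, truth, lambda_coord=5.0):
    """Sum-squared localization error for one responsible box, (x, y, w, h)."""
    px, py, pw, ph = pred
    tx, ty, tw, th = truth
    center_error = (px - tx) ** 2 + (py - ty) ** 2
    # Square roots shrink the penalty for the same absolute error on big boxes.
    size_error = ((math.sqrt(pw) - math.sqrt(tw)) ** 2
                  + (math.sqrt(ph) - math.sqrt(th)) ** 2)
    return lambda_coord * (center_error + size_error)


# The same 2-pixel width error on a small box and on a large box.
small = localization_loss((0, 0, 12, 10), (0, 0, 10, 10))
large = localization_loss((0, 0, 102, 100), (0, 0, 100, 100))
print(small, large)  # the small box is penalized more
```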

Confidence loss

If an object is detected in the box, the confidence loss (measuring the objectness of the box) is:

Most boxes do not contain any objects. This causes a class imbalance problem, i.e. we train the model to detect background more frequently than detecting objects. To remedy this, we weight this loss down by a factor λnoobj (default: 0.5).
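The down-weighting can be sketched as a weighted sum of squared errors. This is a simplified illustration over a flat list of boxes. The function name and argument layout are assumptions, and `lambda_noobj=0.5` is the default mentioned above.

```python
def confidence_loss(pred_conf, target_conf, has_object, lambda_noobj=0.5):
    """Squared objectness error, with background boxes weighted down."""
    loss = 0.0
    for predicted, target, is_object in zip(pred_conf, target_conf, has_object):
        weight = 1.0 if is_object else lambda_noobj
        loss += weight * (predicted - target) ** 2
    return loss


# One box containing a sign and three background boxes (made-up values).
loss = confidence_loss(
    pred_conf=[0.5, 0.5, 0.5, 0.5],
    target_conf=[1.0, 0.0, 0.0, 0.0],
    has_object=[True, False, False, False],
)
print(loss)
```

Without the 0.5 factor, the three background boxes in this example would contribute three times as much as the single box that contains a sign.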

Classification loss

If an object is detected, the classification loss at each cell is the squared error of the class conditional probabilities for each class:

Most classifiers assume output labels are mutually exclusive, which holds when the outputs are mutually exclusive object classes. For that reason, earlier versions of YOLO apply a softmax function to convert scores into probabilities that sum to one. YOLOv3 instead uses multi-label classification. For example, the output labels may be “pedestrian” and “child”, which are not mutually exclusive, so the sum of the outputs can be greater than 1. YOLOv3 replaces the softmax function with independent logistic classifiers to calculate the likelihood that the input belongs to a specific label. Instead of using mean squared error to calculate the classification loss, YOLOv3 uses binary cross-entropy loss for each label. This also reduces the computational complexity by avoiding the softmax function.
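The difference between the two heads can be seen in a short sketch using only the standard library. The logits and labels are made-up values, and the helper names are chosen for illustration.

```python
import math


def softmax(logits):
    """Mutually exclusive class probabilities that sum to one."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Independent probability for a single label."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Sum of per-label binary cross-entropy terms."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


# Hypothetical scores for the labels "pedestrian", "child", and "car".
logits = [2.0, 1.5, -3.0]
labels = [1, 1, 0]  # both "pedestrian" and "child" apply

softmax_probs = softmax(logits)
sigmoid_probs = [sigmoid(z) for z in logits]
print(sum(softmax_probs))  # forced to compete, so the total is one
print(sum(sigmoid_probs))  # independent labels, so the total can exceed one
print(binary_cross_entropy(sigmoid_probs, labels))
```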

Bounding box prediction & cost function calculation

YOLOv3 predicts an objectness score for each bounding box using logistic regression, and it changes the way the cost function is calculated. If a bounding box prior (anchor) overlaps a ground truth object more than any other prior, the corresponding objectness score should be 1. Other priors whose overlap is greater than a predefined threshold (default 0.5) incur no cost. Each ground truth object is associated with only one bounding box prior. If a bounding box prior is not assigned, it incurs no classification or localization loss, just confidence loss on objectness. We use tx and ty (instead of bx and by) to compute the loss.
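For reference, the relation between the raw outputs (tx, ty, tw, th) and the final box (bx, by, bw, bh) described in the YOLOv3 paper can be sketched as below. Coordinates are in grid-cell units here, and the function and argument names are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_box(raw, cell, prior):
    """Turn raw network outputs into a box.

    raw:   (tx, ty, tw, th) predicted by the network
    cell:  (cx, cy) offset of the grid cell from the top-left corner
    prior: (pw, ph) width and height of the anchor box
    """
    tx, ty, tw, th = raw
    cx, cy = cell
    pw, ph = prior
    bx = sigmoid(tx) + cx      # the center stays inside its grid cell
    by = sigmoid(ty) + cy
    bw = pw * math.exp(tw)     # the size is a scaling of the prior
    bh = ph * math.exp(th)
    return bx, by, bw, bh


# All-zero outputs give a box centered in the cell with the prior's size.
print(decode_box((0.0, 0.0, 0.0, 0.0), cell=(3, 4), prior=(116, 90)))
```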

The final loss adds localization, confidence and classification losses together.

Feature Pyramid Network (FPN)-like Feature Pyramid

YOLOv3 makes 3 predictions per location. Each prediction is composed of a bounding box, an objectness score, and 80 class scores (for COCO), i.e. N × N × [3 × (4 + 1 + 80)] predictions.
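The arithmetic is easy to check with a short sketch. The helper names are hypothetical, and the grid size of 13 in the example is an assumed value.

```python
def head_channels(num_classes, boxes_per_cell=3):
    """Values predicted per grid cell: boxes x (4 coords + 1 objectness + classes)."""
    return boxes_per_cell * (4 + 1 + num_classes)


def prediction_count(grid_size, num_classes, boxes_per_cell=3):
    """Total number of predicted values for one N x N output grid."""
    return grid_size * grid_size * head_channels(num_classes, boxes_per_cell)


print(head_channels(80))         # COCO: 3 x (4 + 1 + 80)
print(head_channels(4))          # the four traffic sign categories used here
print(prediction_count(13, 80))  # an assumed 13 x 13 grid with COCO classes
```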

YOLOv3 makes predictions at 3 different scales (similar to the FPN):

To determine the priors, YOLOv3 applies k-means clustering and pre-selects 9 clusters. For COCO, the widths and heights of the anchors are (10 × 13), (16 × 30), (33 × 23), (30 × 61), (62 × 45), (59 × 119), (116 × 90), (156 × 198), (373 × 326). These 9 priors are split into 3 groups according to their scale. Each group is assigned to one of the feature maps above for detecting objects.
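A minimal sketch of the grouping, and of picking the prior that best fits a ground truth box, is given below. It uses the COCO anchors listed above. Sorting by area and comparing shapes by IoU of width and height (with both boxes aligned at a common corner) are assumptions about one reasonable way to do this, and the function names are hypothetical.

```python
ANCHORS = [(10, 13), (16, 30), (33, 23), (30, 61), (62, 45),
           (59, 119), (116, 90), (156, 198), (373, 326)]


def group_by_scale(anchors, groups=3):
    """Split the anchors into equally sized groups, smallest area first."""
    ordered = sorted(anchors, key=lambda wh: wh[0] * wh[1])
    size = len(ordered) // groups
    return [ordered[i * size:(i + 1) * size] for i in range(groups)]


def shape_iou(wh_a, wh_b):
    """IoU of two boxes that share the same top-left corner."""
    intersection = min(wh_a[0], wh_b[0]) * min(wh_a[1], wh_b[1])
    union = wh_a[0] * wh_a[1] + wh_b[0] * wh_b[1] - intersection
    return intersection / union


def best_anchor(box_wh, anchors):
    """Index of the anchor whose shape overlaps the box the most."""
    return max(range(len(anchors)), key=lambda i: shape_iou(box_wh, anchors[i]))


small, medium, large = group_by_scale(ANCHORS)
print(small, medium, large)

# A hypothetical 32 x 32 pixel traffic sign.
index = best_anchor((32, 32), ANCHORS)
print(index, ANCHORS[index])
```

In YOLOv3 the group of smallest anchors is paired with the highest-resolution feature map, which is the one best suited to small objects such as distant traffic signs.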

Training

Training is done on batches of images using the Keras image batch generator discussed above. The training code is given below. We also need to pass the necessary parameters.

Final Predictions

Our model does a very good job, achieving a mean Average Precision of 0.7994.

Predicted Images

Deployment of the model remains to be done.

The whole code for the project can be accessed from the following link.
