Deep Learning for End-to-End Automatic Target Recognition from Synthetic Aperture Radar Imagery
2018 Feb 06Deep Learning for End-to-End Automatic Target Recognition from Synthetic Aperture Radar Imagery
Abstract: The standard architecture of synthetic aperture radar (SAR) automatic target recognition (ATR) consists of three stages: detection, discrimination, and classification. In recent years, convolutional neural networks (CNNs) for SAR ATR have been proposed, but most of them classify target classes from a target chip extracted from SAR imagery, as a classification for the third stage of SAR ATR. In this report, we propose a novel CNN for end-to-end ATR from SAR imagery. The CNN named verification support network (VersNet) performs all three stages of SAR ATR end-to-end. VersNet inputs a SAR image of arbitrary sizes with multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target. This report describes the evaluation results of VersNet which trained to output scores of all 12 classes: 10 target classes, a target front class, and a background class, for each pixel using the moving and stationary target acquisition and recognition (MSTAR) public dataset.
Conclusion: By applying CNN to the third stage classification in the standard architecture of SAR ATR, the performance has been improved. In order to improve the overall performance of SAR ATR, it is important not only to improve the performance of the third stage classification but also to improve the performance of the first stage detection and the second stage discrimination. In this report, we proposed a CNN based on a new architecture of SAR ATR that consists of a single stage, i.e. endto-end, not the standard architecture of SAR ATR. Unlike conventional CNNs for target classification, the CNN named VersNet inputs a SAR image of arbitrary sizes with multiple classes and multiple targets, and outputs a SAR ATR image representing the position, class, and pose of each detected target. We trained the VersNet to output scores include ten target classes on MSTAR dataset and evaluated its performance. The average IoU for all the pixels of testing (2420 target chips) is over 0.9. Also, the classification accuracy is about 99.5%, if we select the majority class of maximum probability for each pixel as the predicted class.