Analysis of dropout learning regarded as ensemble learning

Abstract: Deep learning is the state of the art in fields such as visual object recognition and speech recognition. Deep networks use a large number of layers and a huge number of units and connections, so overfitting is a serious problem. Dropout learning was proposed to avoid this problem: it neglects some inputs and hidden units during the learning process with a probability p, and afterwards the neglected inputs and hidden units are combined with the learned network to express the final output. We find that the process of combining the neglected hidden units with the learned network can be regarded as ensemble learning, so we analyze dropout learning from this point of view.
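To make the dropout procedure concrete, here is a minimal NumPy sketch of a single dropout layer. The helper name dropout_forward and the test-time rescaling by the retention probability (1 - p) are illustrative assumptions following standard dropout practice, not details taken from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p, training=True):
    """Drop each activation in h independently with probability p.

    Training: multiply by a random 0/1 mask, so dropped units output 0.
    Test: keep every unit and rescale by (1 - p) so the expected input
    to the next layer matches training. (Hypothetical helper; the
    rescaling convention is assumed, not stated in the paper.)
    """
    if training:
        mask = rng.random(h.shape) >= p  # True keeps the unit, False drops it
        return h * mask
    return h * (1.0 - p)  # combined network used for the final output

# Example: five hidden activations with drop probability p = 0.5
h = np.array([0.2, -1.3, 0.7, 0.05, 2.1])
print(dropout_forward(h, p=0.5, training=True))   # some units zeroed
print(dropout_forward(h, p=0.5, training=False))  # all units, halved
```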

Results: After learning, the ensemble output is calculated as the average of the sub-network outputs. We show that dropout learning can be regarded as ensemble learning, except that dropout uses a different set of hidden units in every learning iteration; using a different set of hidden units in this way outperforms ensemble learning. We also show that dropout learning achieves the same performance as the L2 regularizer. Our future work is a theoretical analysis of dropout learning with the ReLU activation function.
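Written out, the ensemble combination referred to above is just the mean of the sub-network outputs; with K sub-networks it reads as follows (notation ours, the paper's exact symbols may differ):

```latex
% Ensemble output: the average of K sub-network outputs y_k(x)
\[
  \bar{y}(x) = \frac{1}{K} \sum_{k=1}^{K} y_k(x)
\]
```

Dropout differs in that the sub-network is redrawn at every learning iteration rather than being fixed in advance.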