Deep learning models have driven rapid progress in Automatic Modulation Classification (AMC). However, existing AMC models frequently fail to generalize to test data drawn from a different distribution. In this letter, we propose a simple and efficient baseline: incorporating a self-distillation (SD) training strategy into an advanced backbone network. SD constructs a series of training tasks that repeatedly retrain on the source dataset to produce a generalized backbone, which then serves as the foundation for fine-tuning on the target dataset. The backbone network employs multi-stream inputs and multi-scale convolutional kernels to increase feature diversity, and further incorporates the Convolutional Block Attention Module (CBAM) and residual connections. Experiments on RadioML2016.10a and RadioML2018.01a demonstrate the superiority of the proposed backbone network and show that SD further boosts the generalization ability of the model.
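A common way to realize self-distillation is to train each generation of the model against a weighted sum of a hard-label cross-entropy term and a temperature-softened KL term computed against the previous generation (the "teacher"). The sketch below illustrates that combined objective; the `alpha` weight and temperature `T` are illustrative hyperparameters and are not taken from the letter:

```python
import numpy as np

def softmax(logits, T=1.0):
    """Numerically stable softmax with optional temperature T."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def self_distillation_loss(student_logits, teacher_logits, labels,
                           alpha=0.5, T=4.0):
    """Weighted sum of hard-label CE and softened KL vs. the teacher.

    student_logits, teacher_logits: (batch, num_classes) arrays;
    labels: (batch,) integer class indices.
    """
    # Hard-label cross-entropy on the student's predictions.
    probs = softmax(student_logits)
    n = labels.shape[0]
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()

    # Softened KL divergence against the previous-generation teacher;
    # the T^2 factor keeps gradient magnitudes comparable across temperatures.
    p_student = softmax(student_logits, T)
    p_teacher = softmax(teacher_logits, T)
    kl = (p_teacher * (np.log(p_teacher + 1e-12)
                       - np.log(p_student + 1e-12))).sum(axis=-1).mean() * T * T

    return alpha * ce + (1.0 - alpha) * kl
```

When the student matches the teacher exactly, the KL term vanishes and only the cross-entropy term remains, so successive generations are pulled toward the teacher's softened output distribution while still fitting the source-domain labels.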