Crater detection is a fundamental task in planetary exploration. However, complex backgrounds can confuse crater detectors, and many small craters lose their features during training. To address these problems, we propose Crater-DETR, a new DEtection TRansformer (DETR) variant network for crater detection. First, we design the Correspond Regional Attention Upsample (CRAU) and Pooling (CRAP) operators, which compute cross-attention between local features at different scales and thereby tackle the foreground-background confusion caused by the loss of small-crater features after repeated downsampling. Then, because some two-stage DETR variants suffer from weak supervision in the Transformer Encoder, we propose Dense Auxiliary Head Supervise (DAHS) training, which enhances the Encoder's feature learning ability. Next, Automatic DeNoising (ADN) training is proposed to address the sparsity of positive queries in the Decoder and improve its decoding capability. Finally, we present a Small Object Stable IoU (SOSIoU) Loss to stabilize training, since the matching process is more unstable for small craters than for craters of other sizes. Extensive experiments on the DACD and AI-TOD datasets show that Crater-DETR achieves state-of-the-art performance, especially for small crater detection.
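As a rough illustration of the cross-scale attention idea behind CRAU, the minimal PyTorch sketch below lets a higher-resolution feature map query a lower-resolution one through standard multi-head cross-attention. The class name CrossScaleAttentionUpsample, the channel and head sizes, and the use of global (rather than region-restricted) attention are illustrative assumptions, not the paper's actual operator.

```python
# Minimal sketch of cross-scale attention upsampling in the spirit of CRAU.
# Assumption: the real operator restricts attention to corresponding local
# regions; here we use plain global cross-attention for brevity.
import torch
import torch.nn as nn


class CrossScaleAttentionUpsample(nn.Module):
    """Refine a fine-scale feature map by attending to a coarser-scale map."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # fine:   (B, C, H, W)      higher-resolution features -> queries
        # coarse: (B, C, H/2, W/2)  lower-resolution features  -> keys/values
        b, c, h, w = fine.shape
        q = fine.flatten(2).transpose(1, 2)     # (B, H*W, C)
        kv = coarse.flatten(2).transpose(1, 2)  # (B, H*W/4, C)
        out, _ = self.attn(q, kv, kv)           # cross-attention across scales
        out = self.norm(out + q)                # residual connection + norm
        return out.transpose(1, 2).reshape(b, c, h, w)


# Example: fuse a 1/16-scale map into a 1/8-scale map (shapes are placeholders).
fine = torch.randn(2, 256, 32, 32)
coarse = torch.randn(2, 256, 16, 16)
fused = CrossScaleAttentionUpsample(256)(fine, coarse)
print(fused.shape)  # torch.Size([2, 256, 32, 32])
```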