Craniomaxillofacial (CMF) fractures, which often result from traffic accidents, falls, and head trauma, require prompt diagnosis and analysis from CT images. This study builds an automated detection system for these fractures on a segmentation model, 3D Swin UNETR. The key finding is that the quality of CMF fracture detection improves significantly with three additions: an extra input channel containing skull-region labels, an additional loss function named Proximity loss, and ensemble inference over models trained under different settings. In a clinical evaluation performed manually by experts, the best-performing model achieved a positive predictive value (PPV) of 82.49%, a true positive rate (TPR) of 96.03%, a false detection rate (FDR) of 17.51%, a false negative rate (FNR) of 3.97%, and an F1-score (F1) of 88.23%.
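For readers less familiar with the reported metrics, the following is a minimal sketch of their standard definitions in terms of true positives, false positives, and false negatives. It assumes counts pooled over all detections; the names `DetectionCounts` and `detection_metrics` are hypothetical, the counts in the usage example are made up, and the abstract does not state how the study aggregated its counts, so an F1 computed this way need not match a value averaged per scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionCounts:
    """Pooled detection counts (hypothetical container, not from the study)."""
    tp: int  # predicted fractures confirmed by the experts
    fp: int  # predicted fractures rejected by the experts
    fn: int  # true fractures the model missed


def detection_metrics(c: DetectionCounts) -> dict[str, float]:
    """Return PPV, TPR, FDR, FNR and F1 as fractions in [0, 1].

    Convention assumed here: a ratio with a zero denominator is set to 0.0.
    """
    predicted = c.tp + c.fp  # everything the model flagged
    actual = c.tp + c.fn     # everything that is really a fracture
    ppv = c.tp / predicted if predicted else 0.0
    tpr = c.tp / actual if actual else 0.0
    # F1 is the harmonic mean of PPV (precision) and TPR (recall).
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return {
        "PPV": ppv,
        "TPR": tpr,
        "FDR": 1.0 - ppv,  # false detection rate is the complement of PPV
        "FNR": 1.0 - tpr,  # false negative rate is the complement of TPR
        "F1": f1,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    example = DetectionCounts(tp=8, fp=2, fn=2)
    for name, value in detection_metrics(example).items():
        print(f"{name}: {value:.2%}")
```

Under these definitions FDR and FNR are the complements of PPV and TPR, which agrees with the reported pairs (82.49% with 17.51%, and 96.03% with 3.97%).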