In situations where visible light is inadequate for sensing, infrared sensors are commonly employed. However, they often yield blurry images lacking clear textures and terrain/object boundaries. Although infrared sensors provide more visual information than visible-light sensors, especially in nighttime aerial imagery, human visibility remains limited. To enhance the visibility of aerial infrared images, we propose adopting semantic segmentation, which assigns pixel-wise class labels to input images and thereby clarifies important boundaries. However, training an accurate semantic segmentation model requires extensive pixel-wise annotations corresponding to the input images, and no such ground-truth datasets exist for aerial infrared images. To address this challenge, we introduce a novel method that automatically generates pixel-wise class labels using only infrared images and metadata such as GPS coordinates. Our method comprises two key functions: coarse alignment with metadata in geographic information system (GIS) space, and fine alignment based on multimodal image registration between aerial images. We created aerial image datasets spanning three domains (day, twilight, and night) from shortwave infrared (SWIR) and midwave infrared (MWIR) images captured by optical sensors mounted on helicopters. Experimental results demonstrate that training on GIS data as label images enables high-precision semantic segmentation under both daytime and nighttime conditions.
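For illustration only, the following is a minimal sketch of such a two-stage alignment, not the implementation used here: it assumes the GIS labels are available as a georeferenced raster, that the image footprint on the ground can be derived from GPS and attitude metadata, and it substitutes OpenCV's ECC maximization for the multimodal registration step. The names `coarse_align`, `fine_align`, `geo_to_raster`, and `footprint_geo` are hypothetical.

```python
import cv2
import numpy as np

def coarse_align(gis_labels, footprint_geo, geo_to_raster, img_size):
    """Coarse alignment: warp the GIS label raster into the camera frame
    using only metadata (the image footprint estimated from GPS/attitude)."""
    w, h = img_size
    # Footprint corners expressed in the label raster's pixel coordinates.
    src = np.float32([geo_to_raster(pt) for pt in footprint_geo])
    # The corresponding corners of the aerial image.
    dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
    H = cv2.getPerspectiveTransform(src, dst)
    # Nearest-neighbor interpolation avoids blending discrete class IDs.
    return cv2.warpPerspective(gis_labels, H, (w, h), flags=cv2.INTER_NEAREST)

def fine_align(ir_image, reference, coarse_labels):
    """Fine alignment: estimate the residual affine warp between the infrared
    image and a coarsely aligned reference image, then apply the same warp
    to the coarse labels. Both images must be single-channel (grayscale)."""
    warp = np.eye(2, 3, dtype=np.float32)
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 200, 1e-6)
    _, warp = cv2.findTransformECC(ir_image, reference, warp,
                                   cv2.MOTION_AFFINE, criteria, None, 5)
    h, w = ir_image.shape[:2]
    # WARP_INVERSE_MAP maps the reference frame onto the infrared frame,
    # so the labels end up registered to the infrared image.
    return cv2.warpAffine(coarse_labels, warp, (w, h),
                          flags=cv2.INTER_NEAREST + cv2.WARP_INVERSE_MAP)
```

In this sketch, the metadata-driven homography removes the gross geometric offset, so the subsequent intensity-based registration only needs to recover a small residual transform, which is where direct methods such as ECC are reliable.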