Landslide dams can cause substantial flood hazards through dam formation, overtopping, and dam failure. Previous studies have established datasets on a regional or global scale and identified indices to estimate the probability of landslide dam formation. These datasets are collections of landslide dam records from multiple data sources. However, the limited precision and accuracy of the spatial information in these records hinder data completeness and prevent linking with other relevant datasets, which in turn hinders the exploration of factors affecting landslide dam formation. We established a new global-scale landslide dam dataset, named River Augmented Global Landslide Dams (RAGLAD), in which we geolocated records whose locations were vaguely known or completely unknown and combined them with additional data from global fluvial datasets to make the records more comprehensive. We use RAGLAD to study the processes of landslide dam formation. The spatial distribution of landslide dam records, data distribution, triggering processes and preconditions, and the relationships between geomorphological parameters directly derived from RAGLAD help to identify areas prone to landslide dam formation and to delineate potential thresholds for it. The results are compared with relationships obtained from general landslide studies to identify the conditions specific to landslide dam formation. These conditions can further be applied to filter areas of potential hazard occurrence and to calculate landslide dam formation susceptibility.