Data collection from a large-scale Wireless Sensor Network (WSN) with the aid of multiple autonomous unmanned aerial vehicles (UAVs) is a highly efficient but challenging solution. In this paper, to minimize the total energy consumption of the UAVs and to use the sink nodes' power effectively, we jointly optimize the number of sink nodes and the trajectories of multiple UAVs in the WSN. Each UAV starts from a docking station and returns to the same docking station after completing its mission. To extend the lifetime of the WSN, we use a Genetic Algorithm to choose the number of sink nodes. We then use a Deep Q-network-based method, formulated as a Markov decision process, to establish the trajectories of all UAVs. Simulations comparing our method with traditional heuristic benchmark techniques show that it speeds up the search for the best solutions.
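To illustrate the sink-selection step, the following is a minimal sketch of a genetic algorithm over binary masks, where bit i marks node i as a sink. It is not the formulation used in this paper: the cost function (total distance from each node to its nearest sink plus a fixed penalty per sink), the function names `cost` and `select_sinks`, and all parameter values are assumptions chosen only for illustration.

```python
import math
import random


def cost(mask, nodes, sink_penalty):
    """Hypothetical cost: distance from every node to its nearest sink,
    plus a fixed penalty for each selected sink."""
    sinks = [nodes[i] for i, bit in enumerate(mask) if bit]
    if not sinks:
        return math.inf  # a network with no sink cannot deliver data
    travel = sum(min(math.dist(n, s) for s in sinks) for n in nodes)
    return travel + sink_penalty * len(sinks)


def select_sinks(nodes, sink_penalty=5.0, pop_size=30, generations=60,
                 mutation_rate=0.05, seed=0):
    """Return (best_mask, best_cost); needs at least two nodes."""
    rng = random.Random(seed)
    n = len(nodes)

    def score(mask):
        return cost(mask, nodes, sink_penalty)

    # Random masks plus the all-sinks mask as a feasible baseline.
    population = [[rng.randint(0, 1) for _ in range(n)]
                  for _ in range(pop_size - 1)]
    population.append([1] * n)

    for _ in range(generations):
        population.sort(key=score)
        elite = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = elite + children

    best = min(population, key=score)
    return best, score(best)


if __name__ == "__main__":
    # Twelve sensor nodes on a 4 x 3 grid with 10 m spacing (made-up layout).
    grid = [(x * 10.0, y * 10.0) for x in range(4) for y in range(3)]
    mask, value = select_sinks(grid)
    print("sinks selected:", sum(mask), "cost:", round(value, 2))
```

Because the fittest half of the population is carried over unchanged in every generation, the returned cost can never be worse than that of the all-sinks baseline seeded into the initial population. The penalty weight controls the trade-off between fewer sinks and shorter sensor-to-sink distances.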