The age of information (AoI) measures the freshness of data. In IoT networks, traditional resource management schemes rely on message exchange between the devices and the base station (BS) prior to communication, which causes high AoI, high energy consumption, and low reliability. Unmanned aerial vehicles (UAVs) deployed as flying BSs offer clear advantages in minimizing the AoI, saving energy, and improving throughput. In this paper, we present a novel learning-based framework that estimates the traffic arrivals of the IoT devices and optimizes the trajectories of multiple UAVs along with their scheduling policy. First, the BS predicts the future traffic of the devices. We compare two traffic predictors: 1) the forward algorithm (FA) and 2) the long short-term memory (LSTM) network. Afterwards, we propose a deep reinforcement learning (DRL) approach to learn the optimal policy of each UAV. Finally, we design the reward function of the proposed DRL approach. Simulation results show that the proposed algorithm outperforms the random-walk (RW) baseline in terms of AoI, scheduling accuracy, and transmission power.