TSP is one of the most famous problems in graph theory, as well as one of the typical NP-hard problems in combinatorial optimization. Its applications range from how to plan the most reasonable and efficient road traffic to how to better set up nodes in the Internet environment to facilitate information flow, among others. Reinforcement learning has been widely regarded as an effective tool for solving combinatorial optimization problems. This paper attempts to solve the TSP problem using different reinforcement learning algorithms and evaluated the performance of three RL algorithms (Q-learning, Sarsa, and Double Q-Learning) under different reward functions, ε-greedy decay strategies, and running times. The results show that the Double Q-Learning algorithm is the best algorithm, as it could produce results closest to the optimal solutions, and by analyzing the results, better reward strategies and epsilon-greedy decay strategies are obtained.