Remote electrical tilt (RET) optimization is critical for achieving efficient coverage and capacity in wireless communication networks. This paper presents a comprehensive comparison of reinforcement learning (RL)-based agents for RET optimization, exploring various flavors of agents and combinations thereof. We investigate single-step and multi-step training approaches, employing either deep Q-network (DQN) or proximal policy optimization (PPO) algorithms, with agents that change the network state either by dictating a single action at a single antenna or by dictating actions for multiple cells simultaneously. Additionally, we evaluate the performance of single-agent and collaborative multi-agent architectures in the context of RET optimization. In other words, we compare a single agent that learns from 'input-action-reward' cycles across all antennas with its myopic yet collaborative antenna-specific counterparts, which acquire knowledge (i.e., input-to-action mappings) from the 'input-action-reward' cycles of their respective antennas. To this end, we train the agents through either a series of single-step episodes or multi-step episodes, shaping their reward functions accordingly. Through extensive experimentation and evaluation, we analyze key metrics such as coverage, capacity, and interference reduction to assess the effectiveness of different agent configurations. The findings provide valuable insights into the strengths and weaknesses of each approach, aiding the selection and design of RL-based agents for RET optimization in wireless communication networks.
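To make the episode structure concrete, the sketch below illustrates how a single-step-episode, antenna-specific (collaborative multi-agent) training loop could look. It is purely illustrative and not the paper's implementation: the toy environment, the coverage/capacity proxies, the reward weights, and the tabular epsilon-greedy Q-learner that stands in for the DQN/PPO agents are all hypothetical assumptions.

```python
# Illustrative sketch only: environment, reward proxies, and the tabular
# Q-learner are hypothetical stand-ins for the agents described in the paper.
import numpy as np

N_CELLS = 3
TILTS = np.arange(0, 16)        # discrete tilt angles (degrees)
ACTIONS = (-1, 0, +1)           # down-tilt, keep, up-tilt for one cell

def reward(tilts, target=8.0, w_cov=1.0, w_cap=0.5):
    """Toy proxy: coverage favors tilts near a target angle, while a crude
    interference term penalizes near-identical tilts on neighboring cells."""
    cov = -w_cov * np.mean((tilts - target) ** 2)
    cap = -w_cap * np.mean(np.abs(np.diff(tilts)) < 1)
    return cov + cap

def train_single_step(episodes=5000, eps=0.1, lr=0.2, seed=0):
    """One Q-table per cell ('collaborative multi-agent'): each episode is a
    single tilt change at one antenna followed by an immediate shaped reward."""
    q = np.zeros((N_CELLS, len(TILTS), len(ACTIONS)))
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        tilts = rng.choice(TILTS, size=N_CELLS).astype(float)
        cell = rng.integers(N_CELLS)          # antenna acted upon this episode
        s = int(tilts[cell])                  # local state: own tilt angle
        a = rng.integers(len(ACTIONS)) if rng.random() < eps else np.argmax(q[cell, s])
        new = tilts.copy()
        new[cell] = np.clip(new[cell] + ACTIONS[a], TILTS[0], TILTS[-1])
        r = reward(new) - reward(tilts)       # shaped, single-step reward
        q[cell, s, a] += lr * (r - q[cell, s, a])  # no bootstrapping: episode ends here
    return q

q_tables = train_single_step()
print("Greedy action per (cell, tilt):\n", np.argmax(q_tables, axis=-1))
```

In a multi-step variant, consecutive tilt changes within an episode would be credited through bootstrapped returns rather than the immediate shaped reward used here, and a single shared agent would replace the per-cell tables with one policy trained on all antennas' 'input-action-reward' cycles.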