The Routing Protocol for Low-Power and Lossy Networks (RPL), the de facto routing protocol for IoT networks, does not exploit the full capacity of IoT devices to tune their transmission power. One reason is that optimizing the transmission power in parallel with the routing strategy is challenging, given the dynamic nature of wireless links and the constrained resources of IoT devices. Optimizing the transmission power requires evaluating the probability of packet collisions, energy consumption, the number of hops, and interference. We propose Adaptive Control of Transmission pOwer for RPL (ACTOR) for dynamic optimization of transmission power. ACTOR aims to improve throughput in dense networks by passively exploring different transmission power levels. The amount of resources spent on this exploration significantly affects the network throughput, so the exploration needs to adapt to dynamism in the environment. We formulate this exploration strategy using the Multi-Armed Bandit framework. Classic solutions from bandit theory, including Upper Confidence Bound and Discounted Upper Confidence Bound, accelerate the convergence of the exploration and guarantee its optimality. We also enhance ACTOR with mechanisms from RPL to blacklist undesirable transmission power levels and stabilize the topology. Experiments on our 40-node testbed and simulations show that ACTOR achieves higher throughput (increasing the packet delivery ratio by 20%) and significantly improves energy consumption, end-to-end delay, and the number of retransmissions compared with standard RPL and the selected benchmark.
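
To make the bandit formulation concrete, the following is a minimal sketch of a Discounted Upper Confidence Bound learner in which each arm stands for one transmission power level. It is an illustration under stated assumptions, not ACTOR's implementation: the class name `DiscountedUCB`, the helper `simulate`, the discount factor `gamma`, the exploration constant `xi`, and the binary reward (1 for a delivered packet, 0 otherwise) are all hypothetical choices made for this example.

```python
import math
import random


class DiscountedUCB:
    """Discounted UCB over a fixed set of arms (here: hypothetical power levels)."""

    def __init__(self, n_arms, gamma=0.99, xi=0.5):
        self.gamma = gamma              # discount factor (assumed value)
        self.xi = xi                    # exploration constant (assumed value)
        self.counts = [0.0] * n_arms    # discounted number of pulls per arm
        self.sums = [0.0] * n_arms      # discounted sum of rewards per arm

    def select(self):
        # Try every arm once before using the confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0.0:
                return arm
        total = sum(self.counts)

        def index(arm):
            mean = self.sums[arm] / self.counts[arm]
            bonus = math.sqrt(self.xi * math.log(total) / self.counts[arm])
            return mean + bonus

        return max(range(len(self.counts)), key=index)

    def update(self, arm, reward):
        # Discount all past observations, then record the new one.
        for a in range(len(self.counts)):
            self.counts[a] *= self.gamma
            self.sums[a] *= self.gamma
        self.counts[arm] += 1.0
        self.sums[arm] += reward


def simulate(delivery_prob, rounds=500, seed=0):
    """Run the learner against made-up per-level delivery probabilities."""
    rng = random.Random(seed)
    bandit = DiscountedUCB(len(delivery_prob))
    pulls = [0] * len(delivery_prob)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if rng.random() < delivery_prob[arm] else 0.0
        bandit.update(arm, reward)
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    # Hypothetical delivery probabilities for three power levels.
    print(simulate([0.2, 0.5, 0.9]))
```

Because old observations decay by `gamma` at every step, the exploration bonus of a neglected power level grows again over time, which lets the learner revisit it when link conditions change. Setting `gamma` to 1 removes the discounting and leaves an undiscounted UCB-style index.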