One of the core problems in millimeter wave (mmWave) massive multiple-input-single-output (MISO) communication systems, which significantly affects the data rate, is the misalignment of the beam direction of the transmitter towards the receiver. In this paper, we investigate strategies that identify the best beam within a fixed duration of time. To this end, we develop an algorithm, named Unimodal Bandit for Best Beam (UB3), that exploits the unimodal structure of the mean received signal strength as a function of the available beams and identifies the best beam within a fixed time duration using pure exploration strategies. We derive an upper bound on the probability of misidentifying the best beam, and we prove that the upper bound is of the order O (log 2 K exp {−αnA}), where K is the number of beams, A is a problem-dependent constant, and αn is the number of pilots used in the channel estimation phase. In contrast, when the unimodal structure is not exploited, the error probability is of order O (log 2 K exp {−αnA/(K log K)}). Thus, by exploiting the unimodal structure, we achieve a much better error probability, which depends only logarithmically on K. We demonstrate that UB3 outperforms the state-of-the-art algorithms through extensive simulations.