Heating, Ventilation, and Air Conditioning (HVAC) systems contribute significantly to a building's energy consumption. In recent years, there has been growing interest in developing transactive approaches that enable automated and flexible scheduling of HVAC systems based on customer demand and the electricity prices set by suppliers. Such flexible and automated scheduling makes HVAC systems a prime candidate for participation in residential demand response or transactive energy systems. It is therefore of significant interest to identify an optimal strategy for controlling HVAC systems. In this paper, we argue that such a control strategy should consider both energy cost and user comfort simultaneously, reducing the energy cost while keeping the comfort level acceptable to the users. Accordingly, we develop the control strategy as the solution of an optimization problem that balances energy cost against the consumer's dissatisfaction. This formulation lets us solve a decision-making problem by first predicting the electricity price and then choosing HVAC temperature settings throughout the day based on the predicted price, the history of prices and HVAC settings, and the outside temperature. More specifically, we formulate the control design as a Markov decision process (MDP) and use a Deep Deterministic Policy Gradient (DDPG)-based deep reinforcement learning algorithm with deep neural networks to find the optimal control strategy for HVAC systems that balances electricity cost and user comfort.
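To make the cost-comfort trade-off concrete, the sketch below shows one possible per-step reward for such an MDP: the negative sum of the electricity cost and a weighted discomfort penalty. It is a minimal illustration under assumed names and values (the weighting factor lam, the comfort band, the time step, and the example state vector are all hypothetical, not the paper's exact formulation).

```python
import numpy as np

def reward(price, hvac_power_kw, indoor_temp,
           comfort_band=(20.0, 24.0), lam=0.5, dt_hours=0.25):
    """Negative of (energy cost + weighted discomfort) for one control step.

    price         : electricity price in $/kWh (predicted or realized)
    hvac_power_kw : HVAC electrical power during the step, in kW
    indoor_temp   : indoor temperature at the end of the step, in degrees C
    comfort_band  : assumed acceptable temperature range for the occupant
    lam           : assumed weight trading off cost against dissatisfaction
    """
    energy_cost = price * hvac_power_kw * dt_hours  # $/kWh * kW * h = $
    low, high = comfort_band
    # Discomfort grows quadratically with deviation outside the comfort band.
    deviation = max(low - indoor_temp, indoor_temp - high, 0.0)
    discomfort = deviation ** 2
    return -(energy_cost + lam * discomfort)

# Hypothetical state vector for the MDP: predicted price, recent price
# history, recent HVAC setpoints, and outside temperature.
state = np.array([
    0.12,                 # predicted price ($/kWh)
    0.10, 0.11, 0.13,     # recent price history ($/kWh)
    22.0, 22.5, 23.0,     # recent HVAC setpoints (degrees C)
    30.0,                 # outside temperature (degrees C)
])

# Example: one 15-minute step at $0.12/kWh, 3 kW draw, indoor temp 25 C.
print(reward(price=0.12, hvac_power_kw=3.0, indoor_temp=25.0))
```

In a DDPG-based setup, an actor network would map a state like this one to a continuous temperature setpoint, and a critic network would estimate the expected cumulative value of this reward; the weight lam is the knob that shifts the learned policy between cost savings and comfort.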