In safe reinforcement learning (RL), satisfying cost constraints while sacrificing as little reward as possible during agent training remains a significant challenge. This paper proposes a simple yet efficient penalty function-based safe RL algorithm called stretchable penalty-based safe policy optimization (S2P2O). S2P2O uses the Swish function to shape the penalty loss, stretching the region of cost-constraint violation so as to improve reward outcomes under tightened cost thresholds. A Kullback-Leibler (KL) divergence stretching mechanism is further designed within the Swish-penalty paradigm to enhance sample efficiency and boost performance. The theoretical error bound between the optimal values of the Swish-shaped objective and the original objective is analyzed. Comprehensive experiments benchmark S2P2O against several state-of-the-art (SOTA) safe RL algorithms. The results demonstrate that S2P2O achieves enhanced cost-constraint satisfaction, superior reward acquisition, and faster cost convergence.
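
To illustrate the penalty-shaping idea, the sketch below states the standard Swish function and one plausible way a Swish-shaped penalty could enter a constrained policy-optimization objective; the penalty coefficient $\lambda$, the reward return $J_R(\pi)$, the cost return $J_C(\pi)$, and the threshold $d$ are assumed notation for exposition, not the paper's exact formulation.

\[
\mathrm{Swish}_{\beta}(x) = x\,\sigma(\beta x) = \frac{x}{1 + e^{-\beta x}},
\qquad
\max_{\pi}\; J_R(\pi) - \lambda\,\mathrm{Swish}_{\beta}\!\big(J_C(\pi) - d\big).
\]

Because $\mathrm{Swish}_{\beta}$ is smooth and nearly zero for negative arguments while growing approximately linearly for positive ones, such a term penalizes constraint violations ($J_C(\pi) > d$) while leaving feasible policies largely unaffected, which is consistent with the stretching behavior described above.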