Graph Neural Networks (GNNs) that integrate contrastive learning have attracted growing attention in urban traffic flow forecasting. However, most existing graph contrastive learning methods do not perform well at capturing local-global spatial dependencies or at designing contrastive learning schemes in both the spatial and temporal dimensions. We argue that these methods cannot adequately extract spatial-temporal features and are easily affected by data noise. To address these challenges, this paper proposes an innovative Urban Spatial-Temporal Graph Contrastive Learning framework (UrbanGCL) to improve the accuracy of urban traffic flow forecasting. Specifically, UrbanGCL applies feature-level and topology-level data augmentation to mitigate data noise and incompleteness and to learn both local and global topology features. The augmented traffic feature matrices and adjacency matrices are then fed into a simple yet effective dual-branch network with shared parameters to capture spatial-temporal correlations within traffic sequences. Moreover, we propose spatial and temporal contrastive learning auxiliary tasks to alleviate the sparsity of the supervision signal and to extract the most critical spatial-temporal information. Extensive experimental results on four real-world urban datasets demonstrate that UrbanGCL significantly outperforms other state-of-the-art methods. The source code of our model is available at https://github.com/panlinnn/UrbanGCL.