Peers’ conversation is a rich source of emotional information. Apart from facial and gestural expressions, emotion is also naturally conveyed via peers’ speech, contributing to the establishment of a dynamic emotion climate (EC) during their conversational interaction. Recognition of the EC could provide an additional source for understanding peers’ social interaction and behavior on top of their actual conversational content. Here, we propose a novel approach to speech-based EC recognition, namely AffECt, which combines peers’ complex affect dynamics (AD) with deep features extracted from speech signals using Temporal Convolutional Neural Networks (TCNNs). AffECt was tested and cross-validated, in terms of EC arousal/valence level classification, on data drawn from three open datasets, i.e., K-EmoCon, IEMOCAP, and SEWA. The experimental results show that AffECt achieves EC classification accuracy of up to 83.3% for arousal and 80.2% for valence, clearly surpassing the results reported in the literature and exhibiting robust performance across different languages. Moreover, combining the AD with the TCNN yields a distinct improvement over baseline deep learning approaches. These results demonstrate the effectiveness of AffECt in speech-based EC recognition, paving the way for many applications, e.g., in patients’ group therapy, negotiations, and emotion-aware mobile applications.
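To make the described fusion concrete, the following is a minimal PyTorch sketch of the kind of pipeline the abstract outlines: a TCNN extracts deep features from a speech frame sequence, which are concatenated with a hand-crafted affect-dynamics (AD) vector before binary arousal/valence level classification. All names, layer sizes, and feature dimensions (`AffECtSketch`, `n_mels`, `ad_dim`) are illustrative assumptions, not the authors’ actual implementation.

```python
# Hypothetical sketch of an AffECt-style pipeline: TCNN deep speech features
# fused with affect-dynamics (AD) features for EC arousal/valence classification.
# Architecture details are assumptions for illustration only.
import torch
import torch.nn as nn


class TemporalBlock(nn.Module):
    """One dilated causal convolution block, as used in generic TCNs."""

    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=self.pad, dilation=dilation)
        self.relu = nn.ReLU()

    def forward(self, x):
        y = self.conv(x)[..., :-self.pad]  # trim right padding -> causal output
        return self.relu(y)


class AffECtSketch(nn.Module):
    def __init__(self, n_mels=40, ad_dim=12, hidden=64):
        super().__init__()
        # Stacked dilated convolutions give an exponentially growing
        # temporal receptive field over the speech frame sequence.
        self.tcnn = nn.Sequential(
            TemporalBlock(n_mels, hidden, dilation=1),
            TemporalBlock(hidden, hidden, dilation=2),
            TemporalBlock(hidden, hidden, dilation=4),
        )
        # Fused deep + affect-dynamics features -> high/low level logit.
        self.classifier = nn.Sequential(
            nn.Linear(hidden + ad_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, mel_frames, ad_features):
        # mel_frames: (batch, n_mels, time); ad_features: (batch, ad_dim)
        deep = self.tcnn(mel_frames).mean(dim=-1)  # temporal average pooling
        fused = torch.cat([deep, ad_features], dim=-1)
        return self.classifier(fused)  # logit for arousal or valence level


if __name__ == "__main__":
    model = AffECtSketch()
    logit = model(torch.randn(8, 40, 300), torch.randn(8, 12))
    print(logit.shape)  # torch.Size([8, 1])
```

Trained with a standard binary cross-entropy loss, one such model per dimension (arousal, valence) would yield the high/low EC level predictions evaluated in the abstract; the actual AffECt feature set and training protocol are specified in the paper itself.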