The popularity of WiFi devices and the development of WiFi sensing have alerted people to the threat of WiFi sensing-based privacy leakage, especially the leakage of human poses. Existing work on human pose estimation is deployed in indoor scenarios or simple occlusion scenarios (e.g., behind a wooden screen), which pose only a limited privacy threat as attack scenarios. To reveal the risk that commodity WiFi devices leak users' pose privacy, we propose CSIPose, a privacy-acquisition attack that passively estimates dynamic and static human poses in through-the-wall scenarios. We design a three-branch network based on knowledge distillation, an autoencoder, and self-attention mechanisms, in which video frames supervise the generation of human pose skeleton frames from channel state information (CSI) frames. Notably, we design AveCSI, a unified framework for preprocessing and feature extraction of CSI data corresponding to both dynamic and static poses. This framework averages CSI sequences to generate CSI frames, which mitigates the instability of passively collected CSI data, and uses a self-attention mechanism to enhance key features. We evaluate CSIPose across different room layouts, subjects, devices, subject locations, and device locations, and the results demonstrate the generalizability of the system. Finally, we discuss measures to mitigate this attack.
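The abstract states only that AveCSI averages CSI sequences to form CSI frames. The sketch below illustrates one plausible reading of that step and is not the authors' implementation. It assumes each packet is a flat list of complex CSI values (one per subcarrier/antenna pair), that averaging is done on amplitudes, and that frames come from fixed-size windows with a configurable stride. The function name `average_csi_frames` and its parameters are hypothetical.

```python
def average_csi_frames(packets, window, stride=None):
    """Hypothetical sketch: turn a CSI packet sequence into averaged frames.

    Assumptions (not from the source): each packet is a list of complex CSI
    values with a fixed layout, averaging uses amplitudes (abs of each value),
    and frames are built from fixed-size windows of consecutive packets.

    packets: list of packets, each a list of complex CSI values
    window:  number of consecutive packets averaged into one frame
    stride:  step between window starts (defaults to window, i.e. no overlap)
    Returns a list of frames, each a list of mean amplitudes per entry.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if stride is None:
        stride = window
    if stride <= 0:
        raise ValueError("stride must be positive")

    frames = []
    # Slide over the sequence; a trailing partial window is dropped.
    for start in range(0, len(packets) - window + 1, stride):
        chunk = packets[start:start + window]
        n_entries = len(chunk[0])
        # Mean amplitude per subcarrier/antenna entry smooths packet-level jitter.
        frame = [sum(abs(p[k]) for p in chunk) / window for k in range(n_entries)]
        frames.append(frame)
    return frames


if __name__ == "__main__":
    packets = [[3 + 4j, 1], [0, 3j], [6 + 8j, 1], [0, 1]]
    print(average_csi_frames(packets, window=2))
```

With non-overlapping windows of two packets, the four packets above collapse into two frames, `[[2.5, 2.0], [5.0, 1.0]]`. Averaging trades temporal resolution for stability: a larger window suppresses more noise in passively captured CSI but blurs fast pose changes, and a stride smaller than the window recovers some temporal density at the cost of correlated frames.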