360° images and videos have become an economic and popular way to provide VR experiences using real-world content. However, the manipulation of the stereo panoramic content remains less explored. In this paper, we focus on the 360° image composition problem, and develop a solution that can take an object from a stereo image pair and insert it at a given 3D position in a target stereo panorama, with well-preserved geometry information. Our method uses recovered 3D point clouds to guide the composited image generation. More specifically, we observe that using only a one-off operation to insert objects into equirectangular images will never produce satisfactory depth perception and generate ghost artifacts when users are watching the result from different view directions. Therefore, we propose a novel view-dependent projection method that segments the object in 3D spherical space with the stereo camera pair facing in that direction. A deep depth densification network is proposed to generate depth guidance for the stereo image generation of each view segment according to the desired position and pose of the inserted object. We finally merge the synthesized view segments and blend the objects into the target stereo 360° scene. A user study demonstrates that our method can provide good depth perception and removes ghost artifacts. The view-dependent solution is a potential paradigm for other content manipulation methods for 360° images and videos.