Cooperative multi-agent manipulation systems extend the capabilities of individual agents beyond their manipulative limitations, increasing the complexity of the manipulation tasks the ensemble can handle. Controlling such a system requires careful planning of successive subtasks, dispatched to the individual agents, in order to execute the master task successfully. Real-time planning is essential to ensure the task can still be achieved when subtask execution suffers from uncertainty, or when the master task changes intermittently and the plan must be reconfigured on the fly. In this work we develop a supervisory control architecture tailored to the cooperation of two robotic manipulators equipped with standard pick-and-place facilities in the plane. We consider a toy task in which the planar position and orientation of an object are controlled. A time-invariant policy function is trained using deep reinforcement learning to determine a finite sequence of pick-and-place maneuvers that manipulate the object into its desired configuration. We compare two strategies that differ in how the final step is treated. The more information the policy is given, the easier it is to train; in return, it becomes less adaptable and loses some of its generalizability.
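To make the toy task concrete, the following is a minimal sketch of a planar pose-control environment of the kind such a policy could be trained in: the object's pose (x, y, theta) must be driven to a goal pose through a finite sequence of discrete pick-and-place maneuvers whose execution is perturbed by placement uncertainty. This is an illustration only, not the paper's implementation; the class and names (`PlanarPickPlaceEnv`, `MANEUVERS`), the maneuver set, and the reward shaping are all hypothetical assumptions.

```python
# Hypothetical sketch of the toy planar pick-and-place task (not the paper's code).
import numpy as np

# Each maneuver displaces or rotates the object by a fixed amount; a real system
# would realize these through pick-and-place actions of the two manipulators.
MANEUVERS = [
    np.array([0.05, 0.0, 0.0]),   # shift +x
    np.array([-0.05, 0.0, 0.0]),  # shift -x
    np.array([0.0, 0.05, 0.0]),   # shift +y
    np.array([0.0, -0.05, 0.0]),  # shift -y
    np.array([0.0, 0.0, 0.1]),    # rotate +theta
    np.array([0.0, 0.0, -0.1]),   # rotate -theta
]

class PlanarPickPlaceEnv:
    """Toy planar pose-control task: observation = (current pose, goal pose)."""

    def __init__(self, tol=0.02, noise=0.005, rng=None):
        self.tol = tol      # success tolerance on the pose error
        self.noise = noise  # placement uncertainty added per maneuver
        self.rng = rng or np.random.default_rng()

    def reset(self):
        self.pose = self.rng.uniform(-0.5, 0.5, size=3)
        self.goal = self.rng.uniform(-0.5, 0.5, size=3)
        return np.concatenate([self.pose, self.goal])

    def step(self, action):
        # Apply the selected maneuver, perturbed to model subtask execution uncertainty.
        self.pose = self.pose + MANEUVERS[action] + self.rng.normal(0.0, self.noise, 3)
        err = np.linalg.norm(self.pose - self.goal)
        done = err < self.tol
        reward = 1.0 if done else -0.01  # sparse success bonus, small per-step cost
        return np.concatenate([self.pose, self.goal]), reward, done
```

Because the goal pose is part of the observation, a policy trained on this interface can be time-invariant: it maps the current state to the next maneuver without reference to a step index, so the same policy remains valid when uncertainty disturbs a subtask or the goal changes mid-execution.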