George Stamatelis

We consider active hypothesis testing with multiple heterogeneous agents. Each agent has access to its own set of experiments, incurs its own action costs, and forms its own beliefs. Additionally, each experiment carries a global cost, and the agents must keep the expected cumulative cost below a given threshold. We study a centralized and a decentralized scenario. Under the centralized scenario, the agents are instructed how to act by a central controller. Under the decentralized scenario, the agents communicate and exchange information over a directed graph. For each scenario we propose separate deep reinforcement learning algorithms based on proximal policy optimization. Our solutions to the decentralized problem start from fully decentralized training and progressively introduce two levels of centralization during training. We assess the proposed algorithms on an example of anomaly detection over sensor networks, considering three different decentralized communication settings. We find that all algorithms achieve the required accuracy level considerably faster than a single deep reinforcement learning agent, while satisfying the expected cost constraints when required. When the communication graph is fully connected, the decentralized agents perform as well as, and sometimes better than, the centralized controller. Among the decentralized algorithms, the one that uses a global critic performs best by far and remains competitive with the central controller even when the communication graph is not complete.
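To illustrate the kind of architecture the abstract refers to, the following is a minimal sketch (not the authors' code) of per-agent PPO actors paired with a shared global critic, in the spirit of the best-performing decentralized variant. All class names, network sizes, and the toy data are illustrative assumptions; in the actual setting the advantages would come from the testing rewards and cost constraints rather than random placeholders.

```python
# Minimal sketch: per-agent PPO actors with one global critic (illustrative only).
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Per-agent policy over that agent's own set of experiments (actions)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class GlobalCritic(nn.Module):
    """Single critic that sees the concatenated observations/beliefs of all agents."""
    def __init__(self, joint_obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


def ppo_actor_loss(dist, actions, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, applied separately to each agent's policy."""
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


# Toy usage with random data: 3 agents, each with its own observation and action space.
n_agents, obs_dim, n_actions, batch = 3, 8, 4, 32
actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
critic = GlobalCritic(n_agents * obs_dim)

obs = torch.randn(batch, n_agents, obs_dim)
values = critic(obs.reshape(batch, -1))           # one value estimate for the joint state
for i, actor in enumerate(actors):
    dist = actor(obs[:, i])
    actions = dist.sample()
    old_log_probs = dist.log_prob(actions).detach()
    advantages = torch.randn(batch)               # placeholder; normally GAE from rewards/costs
    loss = ppo_actor_loss(dist, actions, old_log_probs, advantages)
    print(f"agent {i} PPO loss: {loss.item():.3f}")
```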