George Stamatelis

We consider active hypothesis testing with multiple heterogeneous agents. Each agent has access to its own set of experiments, incurs its own action costs, and forms its own beliefs. Additionally, each experiment carries a global cost, and the agents must keep the expected cumulative cost below a given threshold. We study a centralized and a decentralized scenario. Under the centralized scenario, the agents are instructed how to act by a central controller. Under the decentralized scenario, the agents communicate and exchange information over a directed graph. For each scenario we propose separate deep reinforcement learning algorithms based on proximal policy optimization. Our solutions to the decentralized problem start from fully decentralized training and progressively introduce two levels of centralization during training. We assess the proposed algorithms on an example of anomaly detection over sensor networks, considering three different decentralized communication settings. We find that all algorithms achieve the required accuracy level considerably faster than a single deep reinforcement learning agent, while satisfying the expected cost constraints when required. When the communication graph is fully connected, the decentralized agents perform as well as, and sometimes better than, the centralized controller. Among the decentralized algorithms, the one that uses a global critic performs best by far and remains competitive with the central controller even when the communication graph is not complete.
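To illustrate the kind of architecture the abstract refers to, the following is a minimal sketch (not the authors' code) of per-agent PPO actors paired with a shared global critic, in the spirit of the best-performing decentralized variant. All class names, network sizes, and the toy data are illustrative assumptions; in the actual setting the advantages would come from the testing rewards and cost constraints rather than random placeholders.

```python
# Minimal sketch: per-agent PPO actors with one global critic (illustrative only).
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Per-agent policy over that agent's own set of experiments (actions)."""
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))


class GlobalCritic(nn.Module):
    """Single critic that sees the concatenated observations/beliefs of all agents."""
    def __init__(self, joint_obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(joint_obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, joint_obs):
        return self.net(joint_obs).squeeze(-1)


def ppo_actor_loss(dist, actions, old_log_probs, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate, applied separately to each agent's policy."""
    log_probs = dist.log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()


# Toy usage with random data: 3 agents, each with its own observation and action space.
n_agents, obs_dim, n_actions, batch = 3, 8, 4, 32
actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
critic = GlobalCritic(n_agents * obs_dim)

obs = torch.randn(batch, n_agents, obs_dim)
values = critic(obs.reshape(batch, -1))           # one value estimate for the joint state
for i, actor in enumerate(actors):
    dist = actor(obs[:, i])
    actions = dist.sample()
    old_log_probs = dist.log_prob(actions).detach()
    advantages = torch.randn(batch)               # placeholder; normally GAE from rewards/costs
    loss = ppo_actor_loss(dist, actions, old_log_probs, advantages)
    print(f"agent {i} PPO loss: {loss.item():.3f}")
```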