Abstract
Multi-Agent Reinforcement Learning (MARL) has emerged as a key
paradigm for solving complex real-world problems involving multiple
agents interacting in dynamic environments. However, training MARL
models, especially for cooperative reasoning tasks, remains
computationally intensive and sample-inefficient due to nonstationarity, credit assignment, and policy coupling issues.
Conventional policy gradient methods struggle with convergence and
scalability in multi-agent settings. Centralized training frameworks
suffer from bottlenecks and synchronization overheads. Evolutionary
algorithms, while more robust to non-differentiable objectives, are
often too slow when applied in single-node environments. To address
these challenges, we propose Distributed Co-evolutionary Policy
Optimization (DCPO), a hybrid learning framework that distributes
evolutionary computation across multiple nodes. DCPO decomposes the global policy search into parallel, sub-population-based explorations, with each node evolving a subset of agent policies through fitness-driven mutation, crossover, and local policy-gradient updates. A global coordinator periodically aggregates the top-performing policies to ensure cooperative learning convergence. DCPO was evaluated on
standard cooperative MARL benchmarks such as StarCraft II
Micromanagement and Multi-Agent Particle Environments (MPE).
Compared to traditional baselines such as MADDPG, QMIX, MAPPO,
COMA, and EPOpt, DCPO showed up to 37% faster convergence, 25% higher final cumulative rewards, and improved generalization to unseen environments.
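Since the abstract describes the training loop only at a high level, the following is a minimal, single-process Python sketch of how such a distributed co-evolutionary cycle could be organized: per-node sub-populations evolved with fitness-driven selection, crossover, mutation, and a local gradient refinement, plus a periodic elite exchange standing in for the global coordinator. All names (evolve_subpopulation, fitness_fn, local_gradient_step), the toy fitness function, and the hyperparameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal single-process sketch of the co-evolutionary loop described above.
# All function names, the toy fitness function, and the hyperparameters are
# illustrative assumptions, not the DCPO implementation from the paper.
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 4     # agents whose policies are co-evolved
POLICY_DIM = 8   # parameters per agent policy (toy linear policy)
POP_SIZE = 16    # individuals per node's sub-population
N_NODES = 3      # "nodes": simulated here as sequential loops
ELITE_K = 2      # top policies shared through the global coordinator

def fitness_fn(joint_policy):
    """Toy cooperative objective: higher when agents' parameters agree and
    stay close to a hidden target vector (stands in for episode return)."""
    target = np.ones(POLICY_DIM)
    closeness = -np.mean((joint_policy - target) ** 2)
    agreement = -np.var(joint_policy, axis=0).mean()
    return closeness + agreement

def mutate(policy, sigma=0.1):
    return policy + sigma * rng.standard_normal(policy.shape)

def crossover(p1, p2):
    mask = rng.random(p1.shape) < 0.5
    return np.where(mask, p1, p2)

def local_gradient_step(policy, lr=0.05, eps=1e-3):
    """Finite-difference surrogate for the local policy-gradient update."""
    grad = np.zeros_like(policy)
    base = fitness_fn(policy)
    flat, g = policy.ravel(), grad.ravel()
    for i in range(flat.size):
        bumped = flat.copy()
        bumped[i] += eps
        g[i] = (fitness_fn(bumped.reshape(policy.shape)) - base) / eps
    return policy + lr * grad

def evolve_subpopulation(pop):
    """One generation on one node: fitness-driven selection, crossover,
    mutation, and a local gradient refinement of each offspring."""
    scored = sorted(pop, key=fitness_fn, reverse=True)
    elites = scored[:ELITE_K]
    offspring = list(elites)
    while len(offspring) < POP_SIZE:
        i, j = rng.choice(len(elites), 2)
        child = mutate(crossover(elites[i], elites[j]))
        offspring.append(local_gradient_step(child))
    return offspring, elites

# Each node holds a sub-population of joint policies (N_AGENTS x POLICY_DIM).
nodes = [[rng.standard_normal((N_AGENTS, POLICY_DIM)) for _ in range(POP_SIZE)]
         for _ in range(N_NODES)]

for generation in range(20):
    all_elites = []
    for n in range(N_NODES):
        nodes[n], elites = evolve_subpopulation(nodes[n])
        all_elites.extend(elites)
    # Global coordinator: broadcast the overall best policies back to every node.
    all_elites.sort(key=fitness_fn, reverse=True)
    for n in range(N_NODES):
        nodes[n][-ELITE_K:] = [e.copy() for e in all_elites[:ELITE_K]]

best = max((p for pop in nodes for p in pop), key=fitness_fn)
print("best fitness after 20 generations:", round(fitness_fn(best), 4))
```

In this sketch the "nodes" run sequentially for simplicity; in a distributed deployment each sub-population loop would run on its own worker, with only the elite policies exchanged through the coordinator.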
Authors
A. Rajavel
Kamaraj College of Engineering and Technology, India
Keywords
Multi-Agent Reinforcement Learning, Evolutionary Algorithms, Distributed Learning, Policy Optimization, Cooperative Reasoning