DISTRIBUTED EVOLUTIONARY POLICY OPTIMIZATION FOR EFFICIENT TRAINING OF MULTI-AGENT REASONING MODELS

ICTACT Journal on Soft Computing (Volume 16, Issue 2)

Abstract

Multi-Agent Reinforcement Learning (MARL) has emerged as a key paradigm for solving complex real-world problems in which multiple agents interact in dynamic environments. However, training MARL models, especially for cooperative reasoning tasks, remains computationally intensive and sample-inefficient because of non-stationarity, credit assignment, and policy coupling. Conventional policy gradient methods struggle to converge and scale in multi-agent settings, and centralized training frameworks suffer from bottlenecks and synchronization overhead. Evolutionary algorithms are more robust to non-differentiable objectives but are often too slow when run on a single node. To address these challenges, we propose Distributed Co-evolutionary Policy Optimization (DCPO), a hybrid learning framework that distributes evolutionary computation across multiple nodes. DCPO decomposes the global policy search into parallel sub-population explorations: each node evolves a subset of agent policies using fitness-driven mutation, crossover, and local policy gradient updates, and a global coordinator periodically aggregates the top-performing policies to ensure that cooperative learning converges. DCPO was evaluated on standard cooperative MARL benchmarks such as StarCraft II Micromanagement and the Multi-Agent Particle Environments (MPE). Compared with established baselines such as MADDPG, QMIX, MAPPO, COMA, and EPOpt, DCPO achieved up to 37% faster convergence, 25% higher final cumulative reward, and better generalization to unseen environments.
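The abstract describes DCPO only at a high level. The following is a minimal sketch of the loop it outlines (per-node selection, crossover, mutation, and a local gradient step, plus periodic coordinator aggregation), built on several assumptions that do not come from the paper: each policy is a flat list of floats; a toy fitness (negative squared distance to a hidden target vector) stands in for a cooperative episode return; one analytic gradient-ascent step on that toy fitness stands in for the local policy gradient update; and the "nodes" are simulated one after another in a single process. The function names (`evolve_subpopulation`, `coordinator_sync`, `run_dcpo_sketch`) and all hyperparameters are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Policy = List[float]  # assumption: a policy is a flat parameter vector
Fitness = Callable[[Policy], float]
Gradient = Callable[[Policy], Policy]


def evolve_subpopulation(pop: List[Policy], fitness: Fitness, grad: Gradient,
                         rng: random.Random, n_elite: int = 2,
                         sigma: float = 0.1, lr: float = 0.05) -> List[Policy]:
    """One generation on one simulated node: selection, crossover, mutation, local gradient step."""
    ranked = sorted(pop, key=fitness, reverse=True)
    children = [list(p) for p in ranked[:n_elite]]  # elitism: copy the best policies unchanged
    parents = ranked[:max(n_elite, len(pop) // 2)]  # truncation selection: breed from the top half
    while len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
        child = [x + rng.gauss(0.0, sigma) for x in child]               # Gaussian mutation
        # Hypothetical stand-in for a local policy-gradient update: one ascent step on the toy fitness.
        child = [x + lr * g for x, g in zip(child, grad(child))]
        children.append(child)
    return children


def coordinator_sync(populations: List[List[Policy]], fitness: Fitness, n_share: int = 1) -> None:
    """Collect each node's top policies and copy the global best into every node, replacing its worst."""
    candidates = [p for pop in populations
                  for p in sorted(pop, key=fitness, reverse=True)[:n_share]]
    global_best = sorted(candidates, key=fitness, reverse=True)[:n_share]
    for pop in populations:
        pop.sort(key=fitness)  # ascending, so the worst policies come first
        for i, best in enumerate(global_best):
            pop[i] = list(best)


def run_dcpo_sketch(n_nodes: int = 4, pop_size: int = 8, dim: int = 5,
                    generations: int = 60, sync_every: int = 10,
                    seed: int = 0) -> Tuple[Policy, float]:
    rng = random.Random(seed)
    # Hypothetical target vector standing in for a well-coordinated joint policy.
    target = [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    def fitness(p: Policy) -> float:
        return -sum((x - t) ** 2 for x, t in zip(p, target))

    def grad(p: Policy) -> Policy:
        return [-2.0 * (x - t) for x, t in zip(p, target)]

    populations = [[[rng.uniform(-3.0, 3.0) for _ in range(dim)] for _ in range(pop_size)]
                   for _ in range(n_nodes)]
    for gen in range(1, generations + 1):
        # A real deployment would run the nodes in parallel; here they run in turn.
        populations = [evolve_subpopulation(pop, fitness, grad, rng) for pop in populations]
        if gen % sync_every == 0:
            coordinator_sync(populations, fitness)
    best = max((p for pop in populations for p in pop), key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    best, score = run_dcpo_sketch()
    print(f"best toy fitness: {score:.4f}")
```

Keeping elites unchanged makes each node's best fitness non-decreasing, and because the coordinator overwrites only the worst members of each sub-population, a synchronization step never discards a node's current best. How often to synchronize (`sync_every` here) trades communication cost against how quickly good policies spread across nodes; the paper's actual schedule is not given in the abstract.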

Authors

A. Rajavel
Kamaraj College of Engineering and Technology, India

Keywords

Multi-Agent Reinforcement Learning, Evolutionary Algorithms, Distributed Learning, Policy Optimization, Cooperative Reasoning

Published By
ICTACT
Date of Publication
July 2025
Pages
3893 - 3898