On the performance of different Deep Reinforcement Learning based controllers for the path-following of a ship

Date Issued
15-10-2023
Author(s)
Sivaraj, Sivaraman
Dubey, Awanish
Rajendran, Suresh
Indian Institute of Technology, Madras
DOI
10.1016/j.oceaneng.2023.115607
Abstract
A set of continuous state-action-space deep reinforcement learning algorithms is used for the path following of a ship in calm water and in waves. The ship dynamics are represented by a mathematical model of the KVLCC2 tanker, which includes the hull force, rudder force, propulsion force, and external wave forces. A look-ahead distance-based guidance algorithm, Line of Sight (LOS), is used to compute the Cross Track Error (CTE) and Heading Error (HE), and the reward function is designed from HE and CTE. The environment is trained with four different Deep Reinforcement Learning (DRL) agents: Proximal Policy Optimization (PPO), Deep Deterministic Policy Gradient (DDPG), Twin-Delayed Deep Deterministic Policy Gradient (TD3), and Soft Actor-Critic (SAC). A common neural network architecture is used for all four agents: yaw rate, HE, and CTE serve as inputs, and the rudder deflection rate (δ°) forms the action space (output). Computation time, average cross-track error, and rudder actuation are computed and compared for the path-following scenarios. DDPG achieves the lowest average CTE in all simulated cases, whereas SAC requires the least rudder control effort to accomplish the tasks. Finally, the trained agents are validated through Hardware In-Loop (HIL) simulation.
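
For illustration only, the sketch below shows how a look-ahead-distance LOS guidance law can compute the CTE and HE described in the abstract, together with a hypothetical CTE/HE-based reward. The function names, look-ahead distance, and reward weights are assumptions for this sketch, not the authors' implementation.

# Illustrative sketch only (not the paper's code): a minimal Line-of-Sight (LOS)
# guidance computation of Cross Track Error (CTE) and Heading Error (HE) for a
# straight path segment, plus a hypothetical reward that penalises both errors.
# The look-ahead distance and reward weights are assumed values for illustration.
import math

def los_guidance(x, y, psi, wp_prev, wp_next, lookahead=100.0):
    """Return (CTE, HE) for a ship at (x, y) with heading psi [rad],
    tracking the segment from waypoint wp_prev to wp_next."""
    x0, y0 = wp_prev
    x1, y1 = wp_next
    path_angle = math.atan2(y1 - y0, x1 - x0)
    # Signed lateral offset from the path segment (cross-track error).
    cte = -(x - x0) * math.sin(path_angle) + (y - y0) * math.cos(path_angle)
    # LOS law: steer toward a point one look-ahead distance along the path.
    psi_des = path_angle + math.atan2(-cte, lookahead)
    # Heading error wrapped to [-pi, pi].
    he = math.atan2(math.sin(psi_des - psi), math.cos(psi_des - psi))
    return cte, he

def reward(cte, he, w_cte=1.0, w_he=1.0):
    # Hypothetical shaping: the closer both errors are to zero, the higher the reward.
    return -(w_cte * abs(cte) + w_he * abs(he))

if __name__ == "__main__":
    cte, he = los_guidance(x=50.0, y=20.0, psi=0.1,
                           wp_prev=(0.0, 0.0), wp_next=(500.0, 0.0))
    print(f"CTE = {cte:.1f} m, HE = {math.degrees(he):.1f} deg, "
          f"reward = {reward(cte, he):.3f}")
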
Volume
286
Subjects
  • Deep Reinforcement Learning
  • Hardware In-Loop
  • Heading control
  • KVLCC2
  • Path following
  • Waves