Cyberphysical Smart Cities Infrastructures: Optimal Operation and Intelligent Decision Making
To benchmark Habitat, the authors employed a few naive algorithmic baselines, proximal policy optimization (PPO) [81] as the representative of learning-based agents, and ORB‐SLAM2 [82, 83] as the chosen candidate for non‐learning agents, and tested them on the PointGoal navigation task on Gibson and Matterport3D. They used Success weighted by Path Length (SPL) [84] as the performance metric. The PPO agent was tested with different sensor configurations (no visual sensor, depth only, RGB only, and RGBD) as an ablation study to measure how much each sensor contributes. The SLAM agents were given RGBD sensors in all episodes.
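The SPL metric of Anderson et al. [84] weights each successful episode by how close the agent's path length came to the shortest possible (geodesic) path. A minimal sketch of the computation, with purely illustrative episode data:

```python
def spl(episodes):
    """Success weighted by Path Length over a batch of episodes.

    episodes: list of (success, shortest_path, agent_path) tuples, where
      success       -- 1 if the agent reached the goal, else 0
      shortest_path -- geodesic distance from start position to goal
      agent_path    -- length of the path the agent actually traversed
    """
    total = 0.0
    for success, shortest, taken in episodes:
        # A success counts fully only if the agent's path was as short
        # as the geodesic path; detours shrink its contribution.
        total += success * shortest / max(taken, shortest)
    return total / len(episodes)

# Illustrative episodes: an optimal success, a wasteful success, a failure.
episodes = [(1, 5.0, 5.0), (1, 5.0, 10.0), (0, 5.0, 7.0)]
print(spl(episodes))  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

Note that an agent that always succeeds but takes long detours can still score well below 1.0, which is why SPL is preferred over raw success rate for navigation benchmarks.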
The authors found that, first, PPO agents with only RGB sensors perform as poorly as agents with no visual sensors at all. Second, all agents perform better and generalize more on Gibson than on Matterport3D, since the environments in the latter are larger. Third, agents with only depth sensors generalize best across datasets and achieve the highest SPL. Most importantly, however, they found that, contrary to what had been reported in previous work, if the PPO agent is trained long enough, it eventually outperforms the traditional SLAM pipeline. This finding was possible only because the Habitat simulator was fast enough to train PPO agents for 75 million time steps, as opposed to only 5 million time steps in the previous investigations.