Cyberphysical Smart Cities Infrastructures. Optimal Operation and Intelligent Decision Making
3.4.1 MINOS
By default, the MINOS simulator provides access to two datasets: 45 000 three-dimensional models of furnished houses with more than 750 K rooms of different types from the SUNCG dataset [76], and 90 multi-floor residences with approximately 2000 annotated room regions from the Matterport3D dataset [77]. The Matterport3D environments look more realistic than those in SUNCG. On a normal workstation, MINOS can reach hundreds of frames per second.
To benchmark the system, the authors studied four navigation algorithms. Three were based on the asynchronous advantage actor-critic (A3C) approach [78], and the fourth was direct future prediction (DFP) [79].
The most basic of these algorithms was feedforward A3C. Here a feedforward convolutional neural network (CNN) serves as the function approximator that learns the policy together with the value function, that is, the expected sum of rewards from the current time step until the end of the episode. The second was LSTM A3C, which adds an LSTM to the feedforward A3C model to act as a simple memory. The third was UNREAL, an LSTM A3C model augmented with auxiliary tasks such as value function replay and reward prediction. Finally, the DFP algorithm can be regarded as Monte Carlo RL [80] with a decomposed reward.
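To make the value function and the "advantage" in A3C more concrete, the following minimal Python sketch computes the sum of rewards from each time step to the end of one episode, and then subtracts a critic's value estimates to obtain advantages. It is an illustration under stated assumptions rather than the method used in the MINOS benchmark: the function names, the discount factor `gamma`, and the sample numbers are all hypothetical, and full-episode returns are used for simplicity, whereas A3C in practice typically relies on bootstrapped n-step returns.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = sum over k >= t of gamma^(k - t) * r_k for each step t.

    With gamma = 1.0 this is the plain sum of rewards from step t until
    the end of the episode. `gamma` is an assumption of this sketch.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each return reuses the one computed after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def advantages(rewards: List[float], values: List[float],
               gamma: float = 1.0) -> List[float]:
    """Advantage estimate A_t = G_t - V(s_t).

    `values` stands in for the critic's predictions V(s_t); in an
    actor-critic method these would come from the learned network.
    """
    returns = discounted_returns(rewards, gamma)
    return [g - v for g, v in zip(returns, values)]


if __name__ == "__main__":
    episode_rewards = [1.0, 0.0, 2.0]   # hypothetical rewards
    critic_values = [2.5, 2.0, 1.0]     # hypothetical value estimates
    print(discounted_returns(episode_rewards))
    print(advantages(episode_rewards, critic_values))
```

A positive advantage means the episode turned out better than the critic predicted from that state, so the policy update makes the chosen action more likely; a negative advantage does the opposite.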