Cyberphysical Smart Cities Infrastructures: Optimal Operation and Intelligent Decision Making
An extensive amount of time has been devoted in recent years to gathering and building large datasets for the imaging and language communities. Notable examples include ImageNet [8], MS COCO [9], SUN [10], Caltech-256 [11], and Places [12], created for vision tasks; SQuAD [13], GLUE [14], and SWAG [15], built for language objectives; and the Visual Genome [16] and VQA [17] datasets, created for joint vision-and-language purposes.
Apart from playing a pivotal role in the recent advances of their respective fields, these datasets have also proved useful when combined with transfer learning methods to support adjacent disciplines such as biomedical imaging [18, 19]. The aforementioned datasets are, however, prone to restrictions. First, it can be extremely costly, in both time and money, to gather and label all the required data. Second, the collection process has to be monitored constantly to ensure that it follows certain rules, both to avoid introducing biases that could lead to erroneous results in future work [20] and to keep the collected data normalized and uniform in attributes such as background, size, object position, and lighting conditions. In real-world scenarios, by contrast, this cannot be the case: robots have to deal with a mixture of unnormalized, noisy, and irrelevant data alongside the relevant, well-curated examples. Additionally, an embodied agent can interact with objects in the wild (e.g. picking an object up and looking at it from another angle) and can use its other senses, such as smell and hearing, to collect information.
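The transfer learning pattern referred to above, in which a model pretrained on a large dataset (e.g. ImageNet) is reused as a frozen feature extractor and only a small task-specific head is trained on the new domain, can be sketched as follows. This is a minimal NumPy illustration of the general pattern under stated assumptions, not the method of any cited work: the "pretrained" backbone here is a stand-in random projection, and the downstream labels are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a backbone pretrained on a large source dataset:
# its weights are frozen and reused as-is on the new task.
W_backbone = rng.normal(size=(64, 16))  # maps 64-dim inputs to 16-dim features

def extract_features(x):
    """Frozen feature extractor: no gradient updates touch W_backbone."""
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

# Small labeled dataset standing in for the downstream task
# (e.g. a biomedical imaging collection with few labeled samples).
X = rng.normal(size=(200, 64))
true_w = rng.normal(size=16)                     # hypothetical ground-truth head
y = (extract_features(X) @ true_w > 0).astype(float)

# Train only the lightweight task-specific head (logistic regression).
F = extract_features(X)          # features computed once, backbone stays fixed
w_head = np.zeros(16)
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-F @ w_head))        # sigmoid predictions
    w_head -= lr * F.T @ (p - y) / len(y)        # gradient step on the head only

accuracy = ((F @ w_head > 0).astype(float) == y).mean()
print(f"head-only training accuracy: {accuracy:.2f}")
```

Because only the small head is optimized, the labeled-data and compute requirements on the target domain stay low, which is precisely why large curated source datasets remain valuable even for unrelated downstream disciplines.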