From the time we are small until the end of our lives, feeling is, and always will be, a kind of language to us. The ability to touch something and understand what we're touching is something we've taken for granted since birth. We rarely think about it, yet a robot programmed to touch or to see can usually do one or the other, not both together. To bridge that gap, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL for short) have come up with an AI system that can learn to see by touching and to feel by seeing.
GelSight to Help with Seeing by Touching and Touching by Seeing:
The system at CSAIL works in both directions: it creates tactile signals from visual inputs, and it predicts which object, and which part of that object, is being touched. The team did this using a tactile sensor called GelSight, courtesy of another group at MIT.
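Because GelSight is a camera-based sensor that films a soft gel from the inside, a "touch" reaches the system as an ordinary image. The snippet below is a minimal sketch (not CSAIL's code) of how such a tactile frame could be compared against a no-contact frame to expose where the gel is deformed; the file names and processing choices are assumptions for illustration.

```python
# Minimal sketch, not CSAIL's code: a GelSight-style tactile frame is just an
# image of the gel surface, so one crude way to expose the contact signal is
# to diff the current frame against a frame taken while nothing is touching.
# The file names below are hypothetical.
import cv2
import numpy as np

def contact_map(gel_frame: np.ndarray, no_contact_frame: np.ndarray) -> np.ndarray:
    """Return a rough per-pixel contact intensity in [0, 1]."""
    diff = cv2.absdiff(gel_frame, no_contact_frame)   # deformation shows up as change
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)     # collapse the color channels
    blur = cv2.GaussianBlur(gray, (9, 9), 0)          # suppress sensor noise
    return blur.astype(np.float32) / 255.0

reference = cv2.imread("gel_no_contact.png")          # hypothetical capture files
current = cv2.imread("gel_touching_object.png")
touch = contact_map(current, reference)
print("peak contact response:", float(touch.max()))
```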
How GelSight Works:
The team at MIT used a web camera to record nearly 200 objects being touched, close to 12,000 times in total. The resulting 12,000 video clips were then broken down into static frames, and those frames became part of a dataset known as "VisGel". VisGel includes more than 3 million visual/tactile images.
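The sketch below shows one way such a collection could be assembled: decode each recorded interaction into frames and save time-aligned webcam and GelSight frames side by side. The directory layout, file names, and index-based alignment are assumptions for illustration, not the authors' pipeline.

```python
# Sketch of how a VisGel-style dataset could be assembled (the directory
# layout and pairing-by-frame-index scheme are assumptions): each touch
# interaction yields a webcam video and a GelSight video, and time-aligned
# frames from the two become one visual/tactile example.
import cv2
from pathlib import Path

def video_to_frames(video_path: str) -> list:
    """Decode a video file into a list of BGR frames."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

def build_pairs(vision_video: str, touch_video: str, out_dir: str) -> int:
    """Save time-aligned (visual, tactile) frame pairs to disk."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    vision, touch = video_to_frames(vision_video), video_to_frames(touch_video)
    n = min(len(vision), len(touch))              # simple index-based alignment
    for i in range(n):
        cv2.imwrite(str(out / f"vision_{i:05d}.png"), vision[i])
        cv2.imwrite(str(out / f"touch_{i:05d}.png"), touch[i])
    return n

# Hypothetical file names for one recorded interaction.
build_pairs("webcam_touch_0001.mp4", "gelsight_touch_0001.mp4", "visgel_pairs/0001")
```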
Using AI, the GelSight-equipped robot learns what it means to touch various objects, as well as different parts of those objects. When the robot touches things blindly, it draws on the dataset it has been given to understand what it is touching and to identify the object. According to the researchers, this will greatly reduce the data needed for manipulating and grasping objects.
The Work to Equip Robots with More Human-Like Attributes:
MIT's 2016 projects used deep learning to visually indicate sounds and to enable a robot to predict responses to physical forces. Both of those projects, however, were built on datasets that don't help in guiding seeing by touching and vice versa.
As mentioned earlier, the team came up with the VisGel dataset, and they paired it with one more ingredient: Generative Adversarial Networks, or GANs for short.
GANs use the visual or tactile images to generate possible images of the same thing in the other modality. A GAN is built from two parts, a "generator" and a "discriminator", that compete with each other: the generator produces images of real-life objects to fool the discriminator, and the discriminator has to call the bluff. Whenever the discriminator catches a fake, the generator learns from it and tries to up its game.
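The toy PyTorch loop below sketches that generator-versus-discriminator game for the vision-to-touch direction. The architecture, image sizes, and random stand-in tensors are invented for the example and are not the paper's model.

```python
# Toy PyTorch sketch of the generator-vs-discriminator game (not the paper's
# architecture): the generator turns a visual image into a fake tactile image,
# and the discriminator tries to tell fake tactile images from real ones.
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                       # visual (3 ch) -> tactile (3 ch)
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh(),
        )
    def forward(self, visual):
        return self.net(visual)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(                       # tactile image -> real/fake score
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
        )
    def forward(self, tactile):
        return self.net(tactile)

G, D = Generator(), Discriminator()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

visual = torch.randn(8, 3, 64, 64)                      # random stand-ins for VisGel pairs
real_tactile = torch.randn(8, 3, 64, 64)
real, fake = torch.ones(8, 1), torch.zeros(8, 1)

for step in range(100):
    # Discriminator turn: call the bluff on generated tactile images.
    fake_tactile = G(visual).detach()
    loss_d = bce(D(real_tactile), real) + bce(D(fake_tactile), fake)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator turn: try to fool the discriminator into scoring fakes as real.
    loss_g = bce(D(G(visual)), real)
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```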
Learning to See by Touching:
As humans, we can look at things and know roughly how they would feel if we touched them. To get machines to do the same, the system first had to locate the position of the likely touch, and then work out how that exact location would feel.
To do this, reference images were used, which let the system see objects and their surroundings without any contact. Then the robot arm with its GelSight sensor came into play, recording how various areas felt when touched. Thanks to VisGel, the robot knew, while touching things, exactly what it was touching and where.
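As a rough illustration of that localization step, the sketch below stacks a no-contact reference view with the current view and has a small convolutional network predict a heatmap of where contact is likely. The architecture is invented for this example and is not the model described by the researchers.

```python
# Illustrative sketch only (architecture invented for this example): stack the
# no-contact reference view with the current view and predict a heatmap of
# where the arm is likely making contact.
import torch
import torch.nn as nn

class TouchLocalizer(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(6, 32, 3, padding=1), nn.ReLU(),  # 6 = reference (3) + current (3)
            nn.Conv2d(32, 1, 3, padding=1),             # single-channel touch heatmap
        )

    def forward(self, reference, current):
        x = torch.cat([reference, current], dim=1)      # let the net compare the two views
        return torch.sigmoid(self.net(x))               # per-pixel contact probability

model = TouchLocalizer()
reference = torch.randn(1, 3, 128, 128)                 # scene with nothing being touched
current = torch.randn(1, 3, 128, 128)                   # scene while the arm is in contact
heatmap = model(reference, current)
row, col = divmod(int(torch.argmax(heatmap)), heatmap.shape[-1])
print("most likely touch location (row, col):", (row, col))
```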