
Lorenzo Mur and María Santos, two young researchers from the I3A, won this year's first prize at the Second Joint Egocentric Vision Workshop, held recently in the United States as part of the Conference on Computer Vision and Pattern Recognition (CVPR), currently the most prestigious conference in the field of computer vision. With their work, they beat more than 40 international teams.
They describe the award as “very important for us; it motivates us to keep researching with even more strength in the field of egocentric vision, or vision from a first-person point of view, to help people.” They also underline that sometimes “it is not necessary to have a lot of computational capacity: by introducing intelligent strategies, even with academic resources, we can make a contribution to our field.”
Both are members of the Robotics, Computer Vision and Artificial Intelligence research group (RoPeRT). Lorenzo Mur had already received an award at this same conference last year.
A paper on egocentric vision
Their proposal starts from two cameras: one that moves, which could belong to a robot or to a person wearing the new camera glasses, and another that is fixed, such as a surveillance camera, another robot or an assistant. From there, they are able to identify an object seen by one camera in the images of the other. "For example, if a person is looking at a particular object, we are able to identify it in the fixed camera, and vice versa. This might seem easy, but it is quite complicated, because an object can be partially occluded or deformed in one of the cameras. There may also be several similar objects, and it is not always easy to know which one is which. In addition, the camera worn by the person is constantly moving, so many objects appear blurred."
Their approach identifies the object at pixel level, that is, it determines exactly which pixels of the image belong to the object and which do not. "Our strategy, like many of those used by other competitors, is based on artificial intelligence systems (neural networks) that learn by being shown thousands of videos. Unlike others, however, we used additional neural networks to help our system learn and then deduce the correct answer. We have also devised a technique to help distinguish objects that are close to each other," explain María Santos and Lorenzo Mur.
More than 1000 hours of video
Last year saw the release of Ego-Exo4D, the largest and most comprehensive egocentric vision video dataset in existence, with more than 1,000 hours of footage. "It contains a lot of useful information, and because it is so relevant and has so much potential, we decided to make the most of it to advance our research. When the challenge was announced, we had already been working on this task for some time, so we decided to participate," they say.
In total, some 40 teams from different countries took part. The challenge, they say, "was very exciting: until the last day you didn't know what position you were in, or whether there was a much better team somewhere else in the world. Every day our results improved, but we didn't know if they would be enough to win, since we were competing with some of the best research laboratories in the world, at universities and leading technology centers that are also working on this type of technology."
Lorenzo Mur and María Santos both studied Industrial Technologies Engineering at the University of Zaragoza; he then completed the master's degree in Robotics, Graphics and Computer Vision, and she the master's degree in Industrial Engineering. They are currently pursuing their PhDs in the Systems Engineering and Computer Science program, supervised by José J. Guerrero and Rubén Martínez-Cantín. Alejandro Pérez-Yus and Jesús Bermúdez-Cameo also contributed to the work they presented at CVPR.
Looking ahead, both intend to pursue careers in research. “Right now, our main goal is to finish our theses and keep developing our work along this line.”
They describe their experience in the RoPeRT research group and at the I3A as "very enriching. It is an environment of highly committed people who are eager to keep learning and are always aware of new advances in our field. We are supervised by researchers who, in addition to having a lot of experience, are close to us and value people, which allows us to work in a safe and motivating environment. This is helping us grow a lot, both technically and personally," they emphasize.
This is the second year in a row that the Robotics, Computer Vision and Artificial Intelligence research group (RoPeRT) has received this international recognition at CVPR, the most important conference in the world in this field.