Computer vision and deep learning to sharpen our view

Computer vision research underway at the University of Melbourne is working to emulate the cognitive speed and efficiency of human vision, such as the capacity to instantly focus on and assess a specific object or view.

This is a complex process and an ambitious goal for sensors being developed for applications such as autonomous vehicles. These sensors collect large amounts of image data, and they must process this information quickly – as fast as or faster than a human – to detect and avoid hazards.

“We are talking about algorithms that enable an efficient recognition of objects and surroundings,” explains Dr Kris Ehinger, a vision systems researcher at the University’s School of Computing and Information Systems.


“For example, an advanced vision system could work with a vehicle driver by monitoring the driver’s peripheral vision and alerting him or her to hazards.

“For applications like this, the computer vision models we are developing need to be more strategic and more cognitive. We are thinking about the structure of the world and how a visual system should act in that world to accomplish a task,” she says.

“So, in the vehicle example, the computer knows what the driver can see well, and instead it concentrates on what is in, or outside, a person’s peripheral vision.”

Dr Ehinger points to the increasing use of drones as another example where vision systems with improved recognition and information processing stand to offer much greater functionality. She says the programming of vision systems for drones needs to be extremely strategic about what data they take in and use, because drones don’t have the processing power to look at and understand everything.

Human-computer intersection

Dr Ehinger’s research is at the intersection of human and computer vision for tasks such as object recognition, visual search and depth perception in natural scenes.

It builds on her PhD, completed in 2013 in the Department of Brain and Cognitive Sciences at the Massachusetts Institute of Technology (MIT). There she studied with Professor Ruth Rosenholtz, a leader in the field of visual encoding, particularly in peripheral vision and its implications for visual performance and theories of attention.


Dr Ehinger joined the Faculty of Engineering and Information Technology at the University of Melbourne in 2019, drawn by the chance to work collaboratively with people in other disciplines and industry.

The COVID-19 pandemic did create some unexpected challenges: “Our research has a human experimental component that we haven’t been able to implement, although that is soon to change.”

Dr Ehinger is keen to further develop attention models in computer vision. Attention modelling gives a system a selective capability – the ability to weight the importance of different elements of an image or view – to ‘know’ what to focus on and what to disregard.

“This sets up the ability for computers to undertake increasingly complex visual tasks,” she explains.

“In the same way that humans rapidly process what they are seeing, computer vision and deep learning could eventually see computers or autonomous robots being able to quickly sort all the incoming data and focus solely on what they are looking for. In other words, they will ‘look’ strategically at the parts of a scene that are most informative for the task they are undertaking.”
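To make the idea of weighting concrete, here is a minimal sketch – not Dr Ehinger’s actual model, and with all names and feature values invented for illustration – of softmax-style attention weighting in Python. Each image region receives a relevance score against a task ‘query’, and the scores are normalised into weights that sum to one, indicating where the system should focus:

```python
import math

def attention_weights(scores):
    """Turn raw relevance scores into softmax weights that sum to 1,
    so the system 'knows' how strongly to attend to each region."""
    m = max(scores)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relevance(region, query):
    """Dot-product similarity between a region descriptor and the task query."""
    return sum(r * q for r, q in zip(region, query))

# Toy scene: four image regions, each described by a 3-D feature vector.
regions = [
    [0.9, 0.1, 0.0],   # region resembling the target
    [0.1, 0.8, 0.1],
    [0.0, 0.2, 0.9],
    [0.3, 0.3, 0.3],
]
query = [1.0, 0.0, 0.0]  # feature profile of 'what we are looking for'

weights = attention_weights([relevance(r, query) for r in regions])
focus = weights.index(max(weights))  # index of the most-attended region
```

In this sketch the first region scores highest against the query, so it receives the largest weight; the same mechanism, scaled up, is what lets a vision model ignore uninformative parts of a scene.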

For Dr Ehinger it’s all about enhancing human endeavour and capability, marrying our human, holistic view of the world around us with the programmed objectivity of computer vision.
