Advancing Computer Vision to New Heights

Dr. Kaiyang Zhou

As an emerging field in artificial intelligence (AI), computer vision helps machines derive meaningful information from digital images or videos and make suitable recommendations based on them. While computer vision has promising potential to enrich the functionality of robotics in service delivery, it is currently challenging to deploy it in varying environments. For instance, the autopilot of a car in Hong Kong may recognise the scenery and work optimally here but not in the streets of Shenzhen or Macau. Dr. Kaiyang Zhou, Assistant Professor at the Department of Computer Science, has dedicated his work to overcoming these issues and optimising computer vision for future applications.

When Dr. Zhou began studying computer science as an undergraduate, he concentrated on software engineering and game development. But he soon felt a desire to contribute to more impactful applications. At a close friend’s suggestion, Dr. Zhou began exploring the vast potential of AI, and this eventually became the research focus of his Master’s degree at the University of Bristol, UK, and his PhD at the University of Surrey, also in the UK.

Overcoming Hurdles to Unleash Full Potential

Having earned his reputation as a forerunner in the field, Dr. Zhou has identified several limitations in computer vision that need to be overcome for it to achieve the same phenomenal growth experienced by ChatGPT.

The first of these obstacles is generalisation. Typically, deep learning models are trained on data from both internet sources and actual fieldwork, which means the training data carries its own biases and distribution. “When the AI is deployed in a different setting, the distribution changes hamper its performance. For example, a model that has only been exposed to daylight will not function if it is deployed at night,” Dr. Zhou explains.

Meanwhile, powerful machine learning has enabled computer vision to identify people through facial recognition and detect objects in a photo. “The current technology is sufficient to determine whether there is a chair or a table in a photo. However, when the model is developed, it cannot change and integrate new functions. This is another hindrance,” Dr. Zhou elaborates, adding that he is keenly interested in taking the technology to the next level through model-user interaction. “If we can make this model interact with a user in real time, then potentially, the user can ask the computer to advise if there is a chair inside a house. This will have vast potential for different kinds of applications. I am exploring ways to create a natural language processing (NLP) module to turn this into reality.”

According to Dr. Zhou, the challenge in making real progress in this direction is a lack of available training data. While there is abundant video footage on the internet, it is challenging to collect images and sounds that correspond to each other to facilitate the machine learning process. “I am working to resolve this issue, either by producing more data through research or by innovating a model that is capable of learning quickly from limited data,” Dr. Zhou says.

“When these issues are resolved, I am confident that computer vision will enjoy the phenomenal growth in popularity witnessed by ChatGPT during the past few years.”

Robots to Deliver Better Service

Dr. Zhou’s ambition reaches beyond merely gaining popularity with the general public. He has set his sights on applying the computer vision and NLP modules to robots, helping AI to truly interact with people.

He is keen to understand how these technologies can be applied to robots. Generally speaking, robots rely on mechanical systems to function properly and to move around. Dr. Zhou would like to explore how to equip robots with computer vision and highly effective language processing abilities. This would make interaction with users smoother, give robots basic investigative capabilities, and ultimately deliver better service.

“For example, if you ask a robot to get a glass of water for you, it must first understand what a glass is, where to obtain it, and how to fill it with water. All of these steps require computer vision. At the same time, the robot needs to understand the spoken command of ‘get a glass of water’ and this involves understanding natural language. This is a complicated process.”

“AI has been developing rapidly in recent years with the frequent discovery of new technologies. I believe that we will witness a truly interactive robot with computer vision and natural language processing in about five years,” Dr. Zhou proclaims confidently.

Drawn by HKBU’s Interdisciplinary Approach

Dr. Zhou began work at HKBU in August 2023 following a post-doctoral stint at Nanyang Technological University in Singapore. He says Hong Kong seemed to be a logical location as it offered him the best scenario both personally and professionally. He adds that HKBU has a strong culture of interdisciplinary research, and this has also made a big impression on him.

Words of Advice for Research Students

While he has enjoyed success as a researcher, Dr. Zhou’s career has also met challenges. In particular, he laments that during his PhD work, he focused solely on his studies and often neglected the people around him. Hence, his first piece of advice to up-and-coming research students is to maintain a good work-life balance that includes a healthy diet and suitable physical activity.

He also urges students to maintain high levels of determination and persistence. “The research journey is not an easy one. You must have a big heart and determination, and not be afraid of failing. The most important aspect for me is whether you are genuinely interested in your work and whether you think it is bringing about a positive impact,” he shares.