The emergence of artificial intelligence (AI) has opened up new ways to simplify human lives. By efficiently collecting and analysing data, AI generates insights and predictions that empower individuals to make more informed decisions. Dr. Yifan Chen, Assistant Professor at the Department of Computer Science, discusses how his statistical background optimises AI performance and benefits a diverse range of industries.
While pursuing his PhD in Statistics at the University of Illinois Urbana-Champaign (UIUC) in the US, Dr. Chen always enjoyed exploring new topics of interest. When he noticed the similarities between the frameworks of prevailing machine learning (ML) models and statistical methods, he was intrigued and decided to investigate the area further. He explains: “I was surprised to find many prevailing ML methods have specific statistical structures that have not been fully recognised or utilised. This gave me confidence in applying the statistical theory to improving the processing speed.”
According to Dr. Chen, existing deep learning models produce good-quality results, but they could work better. His research has identified new ways to boost performance by simplifying the design of the transformer architecture, rather than by improving the efficiency of the large models themselves.
In general, the transformer’s core attention matrix captures the relationship between each pair of words in a sentence and evaluates the relevance of each word. When a large volume of information is involved, this calculation becomes highly complex and time-consuming. By examining the statistical structure of the transformer, Dr. Chen was able to dissect these complex models into two smaller portions, thereby saving processing time while maintaining the quality of the output.
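For readers who want to see what an attention matrix looks like, the following is a minimal sketch of the standard scaled dot-product form, not a reproduction of Dr. Chen’s own method. The sizes, the random stand-in word embeddings and the names `attention_matrix`, `W_q` and `W_k` are all assumptions chosen for illustration.

```python
import numpy as np


def attention_matrix(Q, K):
    """Return softmax(Q K^T / sqrt(d)), one row of weights per word."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                 # pairwise word-to-word scores
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
n_words, d_model = 6, 4                      # toy sizes (assumed for illustration)
X = rng.normal(size=(n_words, d_model))      # stand-in word embeddings (random, not real data)
W_q = rng.normal(size=(d_model, d_model))    # hypothetical query projection
W_k = rng.normal(size=(d_model, d_model))    # hypothetical key projection

A = attention_matrix(X @ W_q, X @ W_k)
print(A.shape)  # one row and one column per word
```

Each row of `A` holds the relevance weights one word assigns to every word in the sentence, so the matrix has one entry per pair of words. Doubling the sentence length quadruples the number of entries, which is why long inputs become so costly to process.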
Dr. Chen reveals that he is currently collaborating with a biotech company to predict T-cell receptor antigen binding, work that could support the development of new vaccines, diagnostics and treatment options. He is also working with business school partners to optimise strategic decision-making in the supply chain.
Summing up his innovation, Dr. Chen says: “I have managed to improve the accuracy-efficiency trade-off in deep learning models. Rather than proposing general methods that can be applied to any model, I carefully consider the statistical structure of each model, so as to tailor-make the best possible solution.”
His hands-on approach of reviewing transformer architectures one by one is backed by a principled understanding of their statistical structure. This contrasts with industry practice, where, he observes, model design is typically driven by trial and error. Dr. Chen elaborates: “If I look at the programming codes, I am not able to imagine the model’s nature and its pain points. Once it becomes a formula, it is easier to visualise the areas of improvement. In the case of matrix multiplication, I may search for more mature calculation methods that might be applicable to the project at hand.”
With his commitment to theoretically guaranteed solutions, Dr. Chen places himself among the estimated 6% of AI scientists who apply theory to real-world problems. Approximately 4% work solely on theory development, while the vast majority, about 90%, are involved only in application, such as research on how to better analyse particular kinds of graphics or how to improve the translation of text.
Recalling the time when he started working on ML projects, Dr. Chen says he faced his fair share of challenges. He was only beginning to grasp large language models then, which made life difficult. “I had to overcome this, otherwise my work would not be completed. By manually typing out the codes, I gained a better understanding of how the different parts are interrelated and how one session impacts another,” he proudly shares.
Dr. Chen graduated at a time when ChatGPT was rising to prominence, with US-based technology companies and research centres focusing their resources on developing such tools and riding the latest trend. When considering his career path, Dr. Chen found that he enjoys the sense of achievement that comes from studying the entire workflow, from theory development to application. He therefore concluded that academia might be the better option.
Since joining HKBU in August 2023, Dr. Chen has taught classes on applying core ML techniques to AI problems, sharing his expertise with students. He also stresses to his students the importance of understanding the theoretical framework, urging them to try to simplify the process and to identify what is not working.
For students who are considering doing research, he likes to share some advice based on his experience. First and foremost, students should try to publish as early as possible, given the impact that publishing has on career development. Not only does this look impressive on a curriculum vitae (CV) – especially if papers are published in top-tier journals – it also increases the chances of securing scholarships.
Another important aspect of PhD studies is to learn as much as possible from one’s advisor, who has stronger research capabilities and a wealth of resources, including a professional network. Students should first absorb the advisor’s knowledge, then seek guidance on how to pursue an academic career by learning from the advisor’s experience.
“Furthermore, peer influence is unavoidable. In general, a student will spend more time communicating with lab-mates than with their undergraduate classmates. As different people think differently, they may also hold views that are outside the box. It is great to learn from different minds, while it is important to prioritise what makes you comfortable,” he says with a smile.
This advice is summed up by his motto: “Engage in work you love and partner with people you admire.” Having begun his career at HKBU only a year ago, Dr. Chen finds that his love for the statistical structure in machine learning continues to grow, as he creates new applications that benefit society.