AI’s Music Revolution

Dr. Wei Xue

Last year, a K-pop virtual girl band created with AI (Artificial Intelligence) technology made its debut and released its first single. Since then, the group has received millions of views online. “Maybe in 5 or 10 years, we’ll see that some of the top 10 singers might be created by AI,” predicted Dr. Wei Xue, Assistant Professor in the Department of Computer Science at HKBU.

Over the last few decades, technology has been changing the way music is created, performed and perceived. Now, with the advancement of AI, the music world is undergoing a revolution that might be beyond your imagination.




At the Annual Gala Concert presented by the Hong Kong Baptist University Symphony Orchestra in July 2022, the first human-machine collaborative performance featuring an AI virtual choir, AI virtual dancers and an AI media artist was showcased. The first of its kind in the world, the performance was developed by a research team from HKBU’s Augmented Creativity Lab led by Professor Johnny Poon, Associate Vice-President (Interdisciplinary Research) of HKBU.

A member of the research team, Dr. Xue said AI music is an important part of the project, as music making is a representation of human creativity. Entitled Building Platform Technologies for Symbiotic Creativity in Hong Kong, the project is funded by the Theme-based Research Scheme under the Research Grants Council (RGC) for a period of five years.


The First Human-Machine Collaborative Performance

Fig 1: The HKBU Symphony Orchestra and an AI virtual choir perform a newly arranged choral-orchestral version of the song Pearl of the Orient, accompanied by a cross-media visual narrative created by an AI media artist based on the lyrics and music.

In the performance, the AI choir performed a choral piece with Chinese lyrics; the AI media artist created a cross-media visual narrative based on the lyrics and music; and the AI virtual dancers performed a ballet dance. “The project focuses on how to build the creativity of AI,” said Dr. Xue. “We built a generative model that enables the AI to create music. We can also make the machine sing better, and at higher pitches, than humans do.”

Dr. Xue received his PhD in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2015. From November 2018 to December 2021, he was a Senior Research Scientist at JD AI Research in Beijing, where he led R&D on front-end speech signal processing and acoustic modelling for robust speech recognition.

In another project funded by the RGC, Dr. Xue, as Principal Investigator, is working on enabling machines to create new music just by listening to music in its audio form.

“Most of the time, music exists in audio form, not in symbolic form. We want the machine to learn to create music by just listening to music in audio form. We can make that possible by having the machine learn from data at a very large scale,” he said.
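To make the idea concrete, a common recipe behind recent audio-generation systems is to turn raw audio into a sequence of discrete tokens and train a model to predict the next token, so that new audio can be sampled token by token. The sketch below is a deliberately tiny, hypothetical illustration of that general recipe in Python (PyTorch); it is not the project’s actual architecture, and the model size, token vocabulary and training data are all placeholders.

# Toy sketch of the "learn music from raw audio" idea: quantise audio into
# discrete tokens and train a next-token predictor. This stands in for the
# much larger models used in practice; every size here is arbitrary.
import torch
import torch.nn as nn

NUM_TOKENS = 256  # e.g. 8-bit quantisation levels or codec codebook entries


class TinyAudioLM(nn.Module):
    def __init__(self, dim: int = 128):
        super().__init__()
        self.embed = nn.Embedding(NUM_TOKENS, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stands in for a large Transformer
        self.head = nn.Linear(dim, NUM_TOKENS)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # logits for the next token at each position


model = TinyAudioLM()
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a random placeholder "audio" batch:
# predict token t+1 from the tokens up to t.
tokens = torch.randint(0, NUM_TOKENS, (8, 400))
optimiser.zero_grad()
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, NUM_TOKENS), tokens[:, 1:].reshape(-1)
)
loss.backward()
optimiser.step()

Trained on enough real audio instead of random tokens, the same loop is, in spirit, how a machine can “listen” its way to generating new music.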

 

A New Stage of AI Technology

Over the past couple of decades, AI technology mainly focused on perceiving the environment, in tasks such as face recognition and voice recognition. “Now, we have entered a new stage of making AI create something, rather than just perceive something. What we are looking at is how to make the machine create music,” said Dr. Xue.

Dr. Xue’s research focuses not only on AI music generation, but also on the next generation of audio content creation, perception, and interaction among humans, machines and the environment. “Because of the development of these technologies, it is now possible for us to develop new forms of interaction, perception, and content generation in audio, acoustics and music,” he said.

Fig 2: Content generation, perception and interaction in the form of audio, acoustics and music.

In the area of audio perception, he has been exploring how to make machines perceive the environment through audio, much as we use our ears to perceive the environment through sound. “This includes how to make the machine detect the locations of different people and then extract the speech of the target speaker,” he said. “For example, if we are sitting in a cafeteria, you can extract the speech of the speaker in front of you and suppress all the other noise. That is how to make the machine perceive the acoustic environment.”
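One classic signal-processing approach to this cafeteria scenario is beamforming: once the target speaker’s direction is known, the microphone channels are time-aligned toward that direction so the target adds up coherently while sound from other directions partially cancels. The Python sketch below is a minimal delay-and-sum illustration under an assumed array geometry and sample rate; it is not Dr. Xue’s front-end system, only a simple example of the idea.

# Minimal delay-and-sum beamformer: steer a small linear microphone array
# toward a chosen direction and sum the aligned channels. Geometry, sample
# rate and the example data below are all assumed values.
import numpy as np

C = 343.0    # speed of sound, m/s
FS = 16000   # sample rate, Hz (assumed)


def delay_and_sum(mic_signals, mic_positions, angle_deg):
    """mic_signals: (num_mics, num_samples); mic_positions: metres along the array axis."""
    num_mics, num_samples = mic_signals.shape
    freqs = np.fft.rfftfreq(num_samples, d=1.0 / FS)
    theta = np.deg2rad(angle_deg)

    # Relative arrival delay of a plane wave from the target direction at each mic.
    delays = mic_positions * np.cos(theta) / C

    output = np.zeros(num_samples)
    for m in range(num_mics):
        spectrum = np.fft.rfft(mic_signals[m])
        # Undo the delay so the target's wavefronts line up across channels;
        # sounds from other directions stay misaligned and partially cancel.
        spectrum *= np.exp(2j * np.pi * freqs * delays[m])
        output += np.fft.irfft(spectrum, n=num_samples)

    return output / num_mics


# Example: a 4-microphone array with 5 cm spacing, steered toward 60 degrees.
rng = np.random.default_rng(0)
mics = rng.standard_normal((4, FS))   # placeholder one-second recordings
positions = np.arange(4) * 0.05       # 0, 5, 10, 15 cm
enhanced = delay_and_sum(mics, positions, angle_deg=60.0)

Modern systems typically replace or augment such fixed beamformers with learned, neural enhancement, but the goal is the same: keep the voice you want and suppress the rest.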

Dr. Xue has also been working on the production of 3D immersive audio. “We have generated so much good content: the music, the virtual singer. How can we present that content to users in a different way, and build better, more immersive and realistic listening experiences with generative audio?” he said.


Challenges Ahead

While audio databases dedicated to AI music research still take time to build, Dr. Xue said that creating new melodies and imbuing generative AI music with emotions and concepts remains a big challenge.

“Generating new music content is not simply replicating existing content. We want AI music to create new value, new thinking and new concepts, so it is not just sampling from something. That is the fundamental logic of music creation: creating new forms of music,” he said.

We can now edit an image easily with software and apps such as Photoshop. In sound, however, acoustic environment editing is a yet-to-be-developed area that Dr. Xue would like to explore in the future.

“For example, if you want to move a person’s voice from left to right, or from far to near, or you want to mute someone and place a virtual sound in front of you, how to achieve this in sound, and how to present the new acoustic environment and create a new immersive experience through AirPods or other hearing devices, would be a very interesting topic in the future,” he said.
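As a rough illustration of what such editing could involve, the Python sketch below takes a mono source and re-renders it in a simple stereo scene, changing its apparent direction and distance over time. Real immersive playback on earbuds would rely on head-related transfer functions, room modelling and head tracking; this toy example, with made-up parameters, only shows the basic idea of “moving” a sound.

# Toy acoustic-scene editing: re-place a mono source in a stereo scene by
# changing its apparent direction (constant-power panning) and distance
# (inverse-distance attenuation). All parameters are illustrative.
import numpy as np


def place_source(mono, azimuth_deg, distance_m):
    """Return (num_samples, 2) stereo audio with the source at the given position.

    azimuth_deg: -90 (hard left) .. +90 (hard right); distance_m: apparent distance.
    """
    pan = np.deg2rad((azimuth_deg + 90.0) / 2.0)        # 0 .. pi/2
    left_gain, right_gain = np.cos(pan), np.sin(pan)    # constant-power panning
    distance_gain = 1.0 / max(distance_m, 0.1)          # farther sounds are quieter
    left = mono * left_gain * distance_gain
    right = mono * right_gain * distance_gain
    return np.stack([left, right], axis=-1)


# "Grab" a voice from the far left and bring it close on the right:
# render the same recording at two positions and cross-fade between them.
fs = 16000
voice = np.sin(2 * np.pi * 220 * np.arange(fs) / fs)    # placeholder source
start = place_source(voice, azimuth_deg=-80.0, distance_m=4.0)
end = place_source(voice, azimuth_deg=40.0, distance_m=0.5)
fade = np.linspace(0.0, 1.0, len(voice))[:, None]
moved = (1.0 - fade) * start + fade * end               # source glides across the scene

Applying this kind of re-rendering to sources separated from a real recording, and doing it convincingly on everyday hearing devices, is the harder, still-open problem Dr. Xue describes.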