Project Team: Y.M. Cheung (In-charge)
External Collaboration: Zhongzheng Zhao (SCM, HKBU)
Background and Research Problems:
Traditional Chinese medicine (TCM) has recently been receiving wide attention in the medical field because of its unique effects on chronic and consumptive diseases, with few side effects. The effectiveness of TCM depends largely on the Chinese medicinal herbs it uses. Counterfeit herbs not only seriously degrade the effectiveness of TCM but may also endanger a patient's life. Unfortunately, such counterfeits, particularly of rare medicinal herbs such as Cordyceps sinensis, are widespread in the market. The most popular way to detect a counterfeit herb is to manually inspect its physical properties, such as color, texture, and shape. This process usually requires expert knowledge and is therefore impractical for the general public. In this project, we therefore aim to develop a system that automatically detects herbal counterfeits from images taken on mobile platforms. To this end, the following key issues must be addressed:
- Pre-processing and Normalization of Input Herbal Images
Since the input herbal images are captured by ordinary mobile devices such as mobile phones, they may differ in size, resolution, brightness, contrast, and background. Hence, we have to pre-process and normalize them before extracting features from a herbal image.
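As a minimal sketch of this step, assuming each photo arrives as an H x W x 3 RGB array, the NumPy function below resizes it to a fixed grid by nearest-neighbour sampling and standardizes each colour channel to zero mean and unit variance, which cancels global brightness and contrast differences. The name `normalize_herb_image` and the 128 x 128 target size are illustrative assumptions; background removal and higher-quality resampling are not covered.

```python
import numpy as np


def normalize_herb_image(img: np.ndarray, size: int = 128) -> np.ndarray:
    """Resize an H x W x 3 image to size x size and z-score each channel.

    Hypothetical helper: nearest-neighbour resizing and per-channel
    standardization are illustrative choices, not a prescribed pipeline.
    """
    if img.ndim != 3 or img.shape[2] != 3:
        raise ValueError("expected an H x W x 3 colour image")
    h, w, _ = img.shape
    # Nearest-neighbour sampling: map each output row/column to a source index.
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows[:, None], cols[None, :], :].astype(np.float64)
    # Per-channel standardization removes global brightness (mean) and
    # contrast (spread) differences between devices and lighting conditions.
    mean = resized.mean(axis=(0, 1), keepdims=True)
    std = resized.std(axis=(0, 1), keepdims=True)
    return (resized - mean) / np.maximum(std, 1e-8)
```

Because z-scoring is invariant to a positive affine change of intensities, a darker, lower-contrast copy of the same photo maps to the same normalized array, as long as no pixel values were clipped when the photo was taken.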
- How to represent and extract the features?
Different kinds of herbs generally have different distinctive features. One feasible approach is therefore to select a different feature representation for each kind of herb. Although the feature vectors extracted in this way may be low-dimensional, vectors from different kinds of herbs may then differ in type and dimension. This makes the classification task difficult, because it is hard to measure the difference between two such feature vectors. Alternatively, we will explore a unified feature representation for different kinds of herbs that is better suited to the classification task.
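One hypothetical way to make the representation unified is to map every herb, whatever its kind, to a vector of the same length. The NumPy sketch below combines normalized per-channel colour histograms of the herb region with its area fraction and bounding-box aspect ratio, given a boolean foreground mask assumed to come from pre-processing. The name `unified_features`, the 8-bin histograms and the two shape statistics are illustrative assumptions, not the project's chosen features.

```python
import numpy as np


def unified_features(img: np.ndarray, mask: np.ndarray, bins: int = 8) -> np.ndarray:
    """Map any herb image to a vector of fixed length 3 * bins + 2.

    img:  H x W x 3 uint8 RGB image.
    mask: H x W boolean array marking the herb (foreground) pixels.
    """
    if not mask.any():
        raise ValueError("mask contains no foreground pixels")
    pixels = img[mask]  # (n_foreground, 3) colour values inside the herb region
    hists = []
    for c in range(3):
        h, _ = np.histogram(pixels[:, c], bins=bins, range=(0, 256))
        hists.append(h / len(pixels))  # normalize so image size does not matter
    # Simple shape statistics from the mask's bounding box.
    ys, xs = np.nonzero(mask)
    area_fraction = mask.mean()
    height = ys.max() - ys.min() + 1
    width = xs.max() - xs.min() + 1
    aspect_ratio = height / width
    return np.concatenate(hists + [np.array([area_fraction, aspect_ratio])])
```

Every image, regardless of its resolution or the kind of herb, yields a 26-dimensional vector here, so any two herbs can be compared with the same distance function.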
- How to measure the difference between two feature vectors?
A unified feature representation may consist of both categorical and numerical attributes. To make categorical data measurable in a distance sense, a straightforward way is to transform the categorical values into numerical ones, e.g., binary strings. However, such a transformation generally ignores the similarity information embedded in the categorical values and cannot faithfully reveal the similarity structure of the data set. For example, under one-hot encoding any two distinct values of an attribute are equally far apart, regardless of how similar they actually are. We have developed a new unified metric that is applicable to mixtures of categorical and numerical data. Nevertheless, further study of this metric, together with extensions for measuring the difference between herbal features, is still needed.
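For reference, a common baseline for mixed data is Gower's distance, sketched below in NumPy (the function name and argument layout are illustrative). It scales each numerical difference by the attribute's range and scores a categorical mismatch as 1, so it treats every pair of distinct categorical values as equally dissimilar: exactly the limitation discussed above. It is not the project's unified metric, which is not reproduced here; the optional `weights` argument only hints at how per-attribute importance could enter such a measure.

```python
import numpy as np


def gower_distance(x_num, y_num, x_cat, y_cat, num_range, weights=None) -> float:
    """Gower-style distance between two mixed records, in [0, 1].

    x_num, y_num: numerical attribute values of the two records.
    x_cat, y_cat: categorical attribute values of the two records.
    num_range:    range (max - min) of each numerical attribute over the data set.
    weights:      optional per-attribute weights, numerical attributes first.
    """
    x_num = np.asarray(x_num, dtype=float)
    y_num = np.asarray(y_num, dtype=float)
    num_range = np.asarray(num_range, dtype=float)
    # Numerical attributes: absolute difference scaled by the attribute's range.
    d_num = np.abs(x_num - y_num) / np.where(num_range > 0, num_range, 1.0)
    # Categorical attributes: 0 on a match, 1 on any mismatch.
    d_cat = np.array([0.0 if a == b else 1.0 for a, b in zip(x_cat, y_cat)])
    d = np.concatenate([d_num, d_cat])
    w = np.ones_like(d) if weights is None else np.asarray(weights, dtype=float)
    return float((w * d).sum() / w.sum())
```

For instance, two records whose numerical attributes differ by half a range in one of two attributes, and whose categorical colour attribute differs while the texture attribute matches, are at distance (0.5 + 0 + 1 + 0) / 4 = 0.375.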
- Dimension Reduction of Features
The dimension of a herbal feature vector may be high. To mitigate the curse of dimensionality, on the one hand, we will evaluate the suitability of existing dimension-reduction methods, e.g., Kernel PCA, LLE, and Laplacian Eigenmaps. On the other hand, we will perform two closely related tasks, (1) dimension reduction of features and (2) classification, simultaneously by optimizing a single objective function. We expect this single learning paradigm to yield better classification results than performing the two tasks separately.
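As a point of comparison for the existing methods listed above, a minimal Kernel PCA with a Gaussian (RBF) kernel can be written directly in NumPy. The kernel choice, the function name and the `gamma` value are assumptions that would need tuning on real herbal features; the sketch only embeds the training samples and does not project new images.

```python
import numpy as np


def rbf_kernel_pca(X: np.ndarray, n_components: int = 2, gamma: float = 1.0) -> np.ndarray:
    """Project the rows of X (n_samples x n_features) onto the top kernel PCs."""
    # Pairwise squared Euclidean distances and the RBF (Gaussian) kernel matrix.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    K = np.exp(-gamma * d2)
    # Centre the kernel matrix in feature space: Kc = K - 1K - K1 + 1K1.
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one
    # eigh returns eigenvalues of the symmetric matrix in ascending order;
    # keep the n_components largest ones.
    vals, vecs = np.linalg.eigh(Kc)
    idx = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[idx], vecs[:, idx]
    # For unit-norm eigenvectors v_k, the projection of training sample i
    # on component k is sqrt(lambda_k) * v_ik.
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

The columns of the result are centred and ordered by decreasing eigenvalue, so keeping the first few columns gives the reduced herbal feature vectors that a classifier would then consume.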
- Classification of Herbal Feature Vectors
We need to classify the herbal feature vectors so that counterfeit herbs can be detected. To this end, we will first investigate existing classifiers, e.g., SVM, the Bayes classifier, and their variants, applied after the dimension of the feature vectors has been reduced by an existing method. In addition, we will propose a model with a new objective function for classification, in which the importance of each attribute is measured by a weight. The model parameters and the attribute weights are then learned together in a single learning paradigm, with the aim of achieving a better classification result.
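The idea of learning attribute weights together with the model parameters can be illustrated with a small hypothetical model, which is not the project's proposed objective: a nearest-centroid classifier whose score for class k is s_k(x) = -sum_d w_d (x_d - mu_kd)^2, with weights w_d = exp(v_d) > 0. The centroids mu and log-weights v are fitted jointly by gradient descent on the softmax cross-entropy loss. The function names, learning rate and step count are illustrative assumptions.

```python
import numpy as np


def train_weighted_centroids(X, y, n_classes, lr=0.1, steps=300):
    """Jointly learn class centroids mu (K x d) and attribute weights w (d,).

    Score of class k for sample x: s_k(x) = -sum_d w_d * (x_d - mu_kd)^2,
    with w_d = exp(v_d) > 0; trained by gradient descent on cross-entropy.
    Returns (mu, w, losses), where losses[t] is the loss before step t.
    """
    n, d = X.shape
    Y = np.eye(n_classes)[y]                        # one-hot labels, (n, K)
    mu = np.stack([X[y == k].mean(axis=0) for k in range(n_classes)])
    v = np.zeros(d)                                 # log-weights: every w_d starts at 1
    losses = []
    for _ in range(steps):
        w = np.exp(v)
        diff = X[:, None, :] - mu[None, :, :]       # (n, K, d)
        sq = diff ** 2
        s = -(sq * w).sum(axis=2)                   # class scores, (n, K)
        s -= s.max(axis=1, keepdims=True)           # numerical stability
        p = np.exp(s) / np.exp(s).sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.log(p[np.arange(n), y] + 1e-12)))
        G = (p - Y) / n                             # dLoss/ds for mean cross-entropy
        grad_mu = 2.0 * w * np.einsum("nk,nkd->kd", G, diff)
        grad_w = -np.einsum("nk,nkd->d", G, sq)
        mu -= lr * grad_mu
        v -= lr * grad_w * w                        # chain rule through w = exp(v)
    return mu, np.exp(v), losses


def predict(X, mu, w):
    """Assign each row of X to the class with the highest weighted score."""
    scores = -(((X[:, None, :] - mu[None, :, :]) ** 2) * w).sum(axis=2)
    return np.argmax(scores, axis=1)
```

On synthetic data where one attribute separates the classes and the other is pure noise, the learned weight of the informative attribute should end up larger than that of the noisy one, which is the behaviour an attribute-weighted objective is meant to capture.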