In recent years, cervical cancer has attracted much attention as the fourth most common cause of cancer death in women . Because developing countries lack the effective screening programs that lower overall mortality in developed countries, seventy percent of cases and ninety percent of deaths occur in developing countries . Hence, an auxiliary screening scheme based on easily accessible factors is urgently needed. Researchers have shown that risk factors such as sexual history, smoking history, and symptoms of potential complications provide useful information for cervical cancer screening and diagnosis . However, due to privacy concerns and other reasons, not all of this information can be collected easily, which poses serious difficulties for existing computer-aided diagnosis approaches. To address this problem, Fernandes K. et al. in  adopted mean imputation to estimate the missing attributes; however, this imputation approach is too crude to yield satisfactory performance. In addition, the work in  is mainly a study of knowledge sharing among regression models rather than of improving screening performance. A discussion of recent advances in missing value estimation is provided in . Techniques for imputing missing nominal data have also been developed recently, such as in .
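To illustrate why mean imputation is regarded as crude, the minimal sketch below assumes plain column-mean imputation (the exact variant used by Fernandes et al. is not specified here). The function name `mean_impute` and the toy values are hypothetical and only for illustration. Every missing entry in a column receives the same value, regardless of the patient's other attributes.

```python
import math


def mean_impute(rows):
    """Replace each missing entry (None or NaN) with its column mean.

    Hypothetical helper: the column mean is computed over the observed
    (non-missing) values only.
    """
    def missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not missing(r[j])]
        # If a column is entirely missing there is nothing to average; keep None.
        means.append(sum(observed) / len(observed) if observed else None)
    # Build a new table; observed values are copied, missing ones get the column mean.
    return [[means[j] if missing(r[j]) else r[j] for j in range(n_cols)] for r in rows]


# Toy, made-up records with one missing value in each column.
data = [
    [18.0, 1.0],
    [None, 0.0],
    [42.0, None],
]
print(mean_impute(data))
```

Because the filled-in value ignores correlations with the other attributes, methods that exploit patterns in the observed data (such as the clustering-based imputation proposed here) can be expected to do better.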
Some pioneering works [45,46] have demonstrated that granular computing (GrC) can neatly handle the uncertainty and vagueness in various data mining tasks. Granular computing involves processing collections of entities that are formed by similarity or indistinguishability. GrC and fuzzy methodologies are now widely used in machine learning, ranging from fundamental theory [2,4] and classifier aggregation [38,40] to frontier applications, especially in cybernetics, expert systems, and biomedical environments [8,42,43,47,48]. These approaches usually generalize classical machine learning models to their fuzzy counterparts. For instance, Jin et al.  proposed a fuzzy version of the classical Support Vector Machine (SVM) to fit specific data, while [42,47] adopted fuzzy logic in decision-making. Generally speaking, fuzzy-based approaches have shown advantages over their deterministic counterparts in both classification accuracy and robustness. Inspired by the GrC philosophy, this paper proposes a new algorithm that provides accurate and robust cervical cancer screening results. To handle severe data incompleteness, a Bayesian version of the Possibilistic C-Means (PCM) clustering algorithm, termed BPCM, is proposed; it detects valuable patterns robustly and thereby improves data imputation. After data completion, a bagging scheme and an ensemble module are designed to classify the class-imbalanced data. The major contributions of the proposed algorithm are threefold:
1. The proposed BPCM adopts the Bayesian inference framework rather than the maximum likelihood criterion in the expectation step; it is therefore more robust against outliers and better suited to the limited data available.
2. The proposed BPCM provides a flexible membership assignment space where data points that belong to multiple classes (i.e., points that cannot be clearly assigned to one class only) and outliers can be well separated.
3. A fuzzy ensemble learning scheme is proposed, which can deal with the class-imbalance problem and handle the uncertainties in data collection, missing attribute completion, etc.
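The Bayesian BPCM itself is not specified in this section, so the sketch below shows only the classical (non-Bayesian) Possibilistic C-Means of Krishnapuram and Keller on which it builds. Each point receives a typicality t_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))) for every cluster independently, and centers are updated as t^m-weighted means. Assumptions: the bandwidths `eta`, the initial centers, and the fixed iteration count are chosen by hand, and all function names are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def typicality(point, center, eta, m=2.0):
    """Classical PCM typicality: 1 / (1 + (d^2 / eta)^(1 / (m - 1))), requires m > 1."""
    return 1.0 / (1.0 + (sq_dist(point, center) / eta) ** (1.0 / (m - 1.0)))


def pcm(points, centers, etas, m=2.0, n_iter=50):
    """Hypothetical sketch of classical PCM with fixed bandwidths `etas`.

    Returns the final centers and the typicality matrix T, where T[i][j]
    is the typicality of point j in cluster i.
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    for _ in range(n_iter):
        # Typicalities are computed per cluster; unlike FCM they need not sum to 1.
        T = [[typicality(x, c, eta, m) for x in points] for c, eta in zip(centers, etas)]
        new_centers = []
        for row in T:
            w = [t ** m for t in row]
            total = sum(w)  # strictly positive, since every typicality is > 0
            new_centers.append([sum(wj * x[k] for wj, x in zip(w, points)) / total
                                for k in range(dim)])
        centers = new_centers
    T = [[typicality(x, c, eta, m) for x in points] for c, eta in zip(centers, etas)]
    return centers, T


# Toy 1-D data: two groups, one point between them (2.0), and one outlier (40.0).
points = [(-0.5,), (0.0,), (0.5,), (3.5,), (4.0,), (4.5,), (2.0,), (40.0,)]
centers, T = pcm(points, centers=[(0.0,), (4.0,)], etas=[2.0, 2.0])
for x, t0, t1 in zip(points, T[0], T[1]):
    print(f"x={x[0]:5.1f}  t0={t0:.3f}  t1={t1:.3f}")
```

Even in this classical form, the point lying between the two groups receives moderate typicality in both clusters, while the outlier receives near-zero typicality in both. This is the kind of separation between ambiguous points and outliers that contribution 2 attributes to BPCM.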
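The paper's bagging scheme and fuzzy ensemble are not detailed in this section, so the following is not that method. It is a generic sketch of class-balanced (under-sampling) bagging with soft voting, assumed here only to show how bagging can counter class imbalance. Each bag keeps every minority sample plus an equal-size random draw from the majority class; a simple nearest-centroid scorer (a swapped-in, hypothetical base learner) is trained per bag, and the ensemble averages the scores.

```python
import math
import random


def balanced_bags(X, y, n_bags, seed=0):
    """Hypothetical helper: index sets with all minority samples plus an equal number of majority samples."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Majority samples are drawn without replacement within each bag.
    return [minority + rng.sample(majority, len(minority)) for _ in range(n_bags)]


def centroid(points):
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def fit_nearest_centroid(X, y, idx):
    """Base learner: one centroid per class, fitted on the samples in one bag."""
    pos = [X[i] for i in idx if y[i] == 1]
    neg = [X[i] for i in idx if y[i] == 0]
    return centroid(pos), centroid(neg)


def score(model, x):
    """Score in [0, 1]; values near 1 mean x is much closer to the positive centroid."""
    c_pos, c_neg = model
    d_pos, d_neg = math.dist(x, c_pos), math.dist(x, c_neg)
    total = d_pos + d_neg
    return 0.5 if total == 0 else d_neg / total


def ensemble_predict(models, x, threshold=0.5):
    """Soft voting: average the per-bag scores, then threshold."""
    s = sum(score(m, x) for m in models) / len(models)
    return int(s >= threshold), s


# Toy, made-up data: 20 majority (negative) points near the origin, 4 minority (positive) points near (5, 5).
negatives = [((i % 5) * 0.2, (i // 5) * 0.2) for i in range(20)]
positives = [(5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2)]
X = negatives + positives
y = [0] * 20 + [1] * 4

bags = balanced_bags(X, y, n_bags=5)
models = [fit_nearest_centroid(X, y, b) for b in bags]
print(ensemble_predict(models, (4.8, 4.9)))
print(ensemble_predict(models, (0.3, 0.1)))
```

Balancing each bag keeps the minority class from being swamped during training, while drawing a different majority subset per bag lets the ensemble still see most of the majority data overall.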
As a result, reliable cervical cancer screening results can be obtained. On a data set of 858 patients, our framework achieves a screening accuracy of 76% and a sensitivity of 79%, which are superior to the results obtained by existing methods .
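For reference, accuracy is the fraction of all patients classified correctly, and sensitivity is TP / (TP + FN), the fraction of truly positive patients that the screen flags. Sensitivity matters for screening because a false negative means a missed case. The sketch below computes both from label lists; the function name and the toy labels are hypothetical and are not taken from the 858-patient data set.

```python
def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary labels; sensitivity = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    # Sensitivity is undefined when there are no positive cases.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return accuracy, sensitivity


# Toy labels: 4 positive and 6 negative patients, one missed positive and one false alarm.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
print(accuracy_and_sensitivity(y_true, y_pred))
```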