Comparison of Performances of Different Machine Learning Classifiers
Programming simplifies tasks throughout the human world and evolves with every new technology. It is reshaping, redefining, and realigning the world with every new byte. Without computer science, countless devices that we rely on today would not exist, from space shuttles and medical devices to cell phones. Software is indeed changing the world in unimagined ways. It can document our routine tasks and make them unexpectedly convenient.
We developed and simulated the proposed model in the Python programming language. The model is evaluated in a comparative study of five state-of-the-art machine learning algorithms, namely logistic regression (LR), random forest (RF), naive Bayes (NB), support vector machine (SVM), and decision tree (DT), together with the proposed method, Improved RF (IRF). Among the five state-of-the-art techniques, some show good accuracy, whereas others perform noticeably worse.
To boost the accuracy and performance of the weak classifiers, the authors used an advanced ensemble machine learning technique built on an Adaptive Boosting algorithm. The proposed Improved RF (IRF) algorithm is an ensemble meta-algorithm for machine learning in which the authors changed the default base estimator of the boosting algorithm, so that a low-performing classifier is combined with Adaptive Boosting.
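The idea of changing the base estimator of a boosting algorithm can be sketched as follows. This is a minimal sketch, assuming scikit-learn is the library in use (the text does not name one). The helper name `build_irf`, the hyperparameters, and the synthetic data are illustrative assumptions, not the authors' configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score


def build_irf(n_trees=20, n_rounds=10, seed=0):
    """Hypothetical helper: AdaBoost with a random forest as base estimator."""
    base = RandomForestClassifier(
        n_estimators=n_trees, max_depth=3, random_state=seed
    )
    # The base estimator is passed positionally because its keyword name
    # differs between scikit-learn versions (base_estimator vs. estimator).
    return AdaBoostClassifier(base, n_estimators=n_rounds, random_state=seed)


# Synthetic stand-in data (assumption); the study itself uses the
# Framingham dataset, which is not loaded here.
X, y = make_classification(n_samples=300, n_features=15, random_state=0)

model = build_irf()
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
mean_accuracy = scores.mean()
print(f"mean cross-validation accuracy: {mean_accuracy:.3f}")
```

By default, AdaBoost in scikit-learn boosts depth-1 decision trees; passing a different estimator as the first argument is what replaces that default.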
Along with the other five models, the proposed Improved RF ensemble classification and accuracy prediction model was applied to a dataset collected from https://www.kaggle.com/amanajmera1/framingham-study-dataset. The dataset contains information linked to hospital patients. The subjects in this CHD study were systematically selected as a sample of 4240 patients who went for medical examinations, of which 644 instances have heart disease.
As can be seen in Table #, the classifiers differ widely in the percentage of patients for whom the presence of coronary heart disease is correctly predicted, from the lowest at 72.48% for DT to the highest at 100% for Improved RF (IRF) on this dataset.
A confusion matrix summarizes the actual and predicted classifications produced by a classification system. The performance of such systems is generally assessed using the data in this matrix. Table # shows the results obtained from the confusion matrices of the different machine learning algorithms.
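For a two-class problem, the confusion matrix reduces to four counts: true positives (TRPOS), true negatives (TRNEG), false positives (FLPOS), and false negatives (FLNEG). The sketch below shows one way to tally them. The function name `confusion_counts` and the labels are made up for illustration and are not taken from the study.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally the four cells of a binary confusion matrix."""
    counts = {"TRPOS": 0, "TRNEG": 0, "FLPOS": 0, "FLNEG": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            # Predicted sick: correct only if the patient is actually sick.
            counts["TRPOS" if a == positive else "FLPOS"] += 1
        else:
            # Predicted healthy: a miss if the patient is actually sick.
            counts["FLNEG" if a == positive else "TRNEG"] += 1
    return counts


# Made-up labels: 1 = heart disease, 0 = no heart disease.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
counts = confusion_counts(actual, predicted)
print(counts)
```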
The performance of the proposed system and the other methods was evaluated using sensitivity, specificity, and accuracy, which are computed from the true positive (TRPOS), true negative (TRNEG), false negative (FLNEG), and false positive (FLPOS) counts; the results are shown in Table #. Sensitivity is the proportion of patients with heart disease who are correctly classified as sick, whereas specificity is the proportion of healthy persons who are correctly classified as healthy.
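The three measures follow directly from the four confusion-matrix counts under their standard definitions, in which sensitivity is the true positive rate and specificity is the true negative rate. The sketch below uses those definitions; the counts are invented for illustration and are not results from the study.

```python
def sensitivity(trpos, flneg):
    """True positive rate: share of sick patients classified as sick."""
    return trpos / (trpos + flneg)


def specificity(trneg, flpos):
    """True negative rate: share of healthy persons classified as healthy."""
    return trneg / (trneg + flpos)


def accuracy(trpos, trneg, flpos, flneg):
    """Share of all patients that are classified correctly."""
    return (trpos + trneg) / (trpos + trneg + flpos + flneg)


# Invented counts; each class is assumed to have at least one member,
# otherwise the divisions above would fail.
trpos, trneg, flpos, flneg = 40, 50, 6, 4

sens = sensitivity(trpos, flneg)
spec = specificity(trneg, flpos)
acc = accuracy(trpos, trneg, flpos, flneg)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

A classifier reaches sensitivity = 1 and specificity = 1 only when both FLNEG and FLPOS are zero, in which case accuracy is also 1.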
The comparison of the proposed ensemble model (IRF) with other widely used individual classifiers is shown in Figure #. The comparison shows that the proposed ensemble technique has the highest accuracy, sensitivity, and specificity values (sensitivity = 1, specificity = 1) on the heart disease dataset. ROC charts were also produced for these experiments with the individual machine learning techniques.
The six ROC charts, drawn in separate panels for 10-fold cross-validation, are shown in blue. Experimental results show that the proposed method (IRF) outperformed all previously used methods discussed in the literature study in terms of cross-validation accuracy. With the proposed model, the AUC value reaches 1.
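To make the meaning of an AUC of 1 concrete, the area under the ROC curve can be computed as the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The sketch below implements this pairwise definition; the function name `roc_auc` and the scores are illustrative assumptions, and the study's own AUC values were presumably produced by a library routine.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]
# One negative (0.4) outscores one positive (0.35): 3 of 4 pairs correct.
auc_imperfect = roc_auc(labels, [0.1, 0.4, 0.35, 0.8])
# Every positive outscores every negative: perfect separation.
auc_perfect = roc_auc(labels, [0.1, 0.2, 0.8, 0.9])
print(auc_imperfect, auc_perfect)
```

An AUC of 1 therefore means that, on the evaluated folds, every patient with heart disease received a higher score than every patient without it.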
The Cohen's kappa statistic (CKS) values of the five popular machine learning techniques and the proposed model show that IRF (value = 1) performed much better than the other classifiers.
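Assuming CKS denotes Cohen's kappa statistic, it measures agreement between predicted and actual labels after correcting for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the chance agreement. The sketch below computes it from binary confusion-matrix counts; the function name `cohen_kappa` and the counts are invented for illustration.

```python
def cohen_kappa(trpos, trneg, flpos, flneg):
    """Cohen's kappa for a binary confusion matrix."""
    total = trpos + trneg + flpos + flneg
    # Observed agreement: share of samples where prediction equals truth.
    p_observed = (trpos + trneg) / total
    # Chance agreement: product of the marginal rates, summed over classes.
    p_chance = (
        (trpos + flneg) * (trpos + flpos) + (trneg + flpos) * (trneg + flneg)
    ) / (total * total)
    # Undefined (division by zero) if all samples fall in a single class.
    return (p_observed - p_chance) / (1 - p_chance)


# Invented counts, not results from the study.
kappa_imperfect = cohen_kappa(trpos=40, trneg=50, flpos=6, flneg=4)
# No false positives or false negatives: perfect agreement.
kappa_perfect = cohen_kappa(trpos=44, trneg=56, flpos=0, flneg=0)
print(f"{kappa_imperfect:.3f} {kappa_perfect:.3f}")
```

A kappa of 1 corresponds to a classifier with no misclassified samples, which is consistent with the sensitivity and specificity of 1 reported for IRF.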