Improving Feature Selection on Heart Disease Dataset With Boruta Approach

Muhammad Arzanul Manhar
Indah Soesanti
Noor Akhmad Setiawan

Abstract

Coronary artery disease (CAD) is one of the deadliest diseases worldwide, including in Indonesia. CAD occurs when the coronary arteries narrow or become blocked, usually as a result of atherosclerosis. Various studies have sought to predict the nature and characteristics of this disease. Several of them use the Z-Alizadeh Sani dataset, which consists of 54 attributes and two classification outcomes, CAD and Normal. Feature selection reduces the number of attributes by retaining only those that have a strong effect on the dataset. In this study, the Boruta method is used for feature selection to reduce the attribute set and retain the attributes that are highly relevant to the dataset. The feature selection process yields sets of 17 and 18 attributes that are highly relevant to the dataset. These attributes are then used to measure classification accuracy with several classification methods, and an accuracy of 90.3% is obtained.
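
The core idea behind Boruta can be illustrated with a minimal sketch. It is not the implementation used in this study. Boruta is normally built on random forest importance scores and includes a multiple-testing correction and an explicit rejection step. The sketch below swaps in absolute Pearson correlation as a stand-in importance score and omits both of those parts, so that it runs with the Python standard library alone. The function names (`abs_corr`, `binom_tail`, `boruta_like`) and the synthetic data are hypothetical and are not taken from the Z-Alizadeh Sani dataset.

```python
import random
from math import comb, sqrt


def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score.

    Assumption: real Boruta uses random forest importance instead.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / sqrt(sxx * syy)


def binom_tail(hits, n, p=0.5):
    """Return P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


def boruta_like(columns, y, n_iter=50, alpha=0.05, seed=0):
    """Return indices of features that beat the best shadow feature
    significantly more often than chance (simplified, no rejection step)."""
    rng = random.Random(seed)
    hits = [0] * len(columns)
    for _ in range(n_iter):
        # Shadow features: shuffled copies of each column, which breaks
        # any relationship with the label while keeping the distribution.
        shadow_max = 0.0
        for col in columns:
            shadow = col[:]
            rng.shuffle(shadow)
            shadow_max = max(shadow_max, abs_corr(shadow, y))
        # A real feature scores a "hit" when it beats the best shadow.
        for j, col in enumerate(columns):
            if abs_corr(col, y) > shadow_max:
                hits[j] += 1
    # Keep features whose hit count is unlikely under a fair coin flip.
    return [j for j, h in enumerate(hits) if binom_tail(h, n_iter) < alpha]


# Synthetic illustration: one informative attribute and two noise attributes.
data_rng = random.Random(42)
n_samples = 300
y = [data_rng.randint(0, 1) for _ in range(n_samples)]
informative = [label + data_rng.gauss(0, 0.5) for label in y]
noise_a = [data_rng.gauss(0, 1) for _ in range(n_samples)]
noise_b = [data_rng.gauss(0, 1) for _ in range(n_samples)]

selected = boruta_like([informative, noise_a, noise_b], y)
print("Selected feature indices:", selected)
```

In this sketch the informative attribute (index 0) is retained because it consistently outscores every shuffled shadow copy. A noise attribute can occasionally pass by chance, which is one reason full Boruta implementations apply a multiple-testing correction.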

Article Details

Section
Computer Engineering