64th ISI World Statistics Congress - Ottawa, Canada

Evaluation of Feature Selection Algorithms based on Synthetic Data


Format: CPS Abstract

Keywords: outlier

Session: CPS 51 - Statistical methodology III

Tuesday 18 July 4 p.m. - 5:25 p.m. (Canada/Eastern)


The primary goal of this paper is to propose a collection of synthetic datasets, inspired by real-life scenarios, that can serve as benchmarks for evaluating feature selection methods. Several fundamental feature selection algorithms are studied, and their performance is evaluated in a controlled experimental setting. The complexity of the generated synthetic datasets is measured, and the results are used to classify the datasets into three distinct categories. The degree of matching between the features selected by a given algorithm and the correct features is determined and linked to the complexity of the datasets. The results of this study can help practitioners standardize the evaluation of feature selection and choose the techniques most relevant to their specific area of application.
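
The kind of evaluation described above can be illustrated with a minimal sketch. It is not the paper's benchmark or any of its algorithms: the dataset generator, the correlation-based selector, and the Jaccard matching score are all assumptions chosen for illustration, and the names `make_dataset`, `select_top_k`, and `jaccard_match` are hypothetical. The idea is simply that, with synthetic data, the correct features are known in advance, so the selected set can be compared against them directly.

```python
import math
import random


def make_dataset(n_samples, n_features, relevant, seed=0):
    """Hypothetical generator: Gaussian features, binary label that
    depends only on the columns listed in `relevant`."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(n_samples)]
    y = [1 if sum(row[j] for j in relevant) > 0 else 0 for row in X]
    return X, y


def abs_correlation(xs, ys):
    """Absolute Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def select_top_k(X, y, k):
    """Assumed stand-in selector: keep the k features most correlated with y."""
    n_features = len(X[0])
    scores = [abs_correlation([row[j] for row in X], y)
              for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard_match(selected, relevant):
    """Degree of matching as |selected & relevant| / |selected | relevant|."""
    selected, relevant = set(selected), set(relevant)
    union = selected | relevant
    return len(selected & relevant) / len(union) if union else 1.0


relevant = (0, 1, 2)  # ground truth, known because the data is synthetic
X, y = make_dataset(n_samples=500, n_features=10, relevant=relevant)
selected = select_top_k(X, y, k=len(relevant))
print("selected:", sorted(selected), "match:", jaccard_match(selected, relevant))
```

A score of 1 means the selector recovered exactly the correct features, and 0 means it recovered none of them. Harder datasets (for example, labels that depend on feature interactions rather than a sum) would be expected to lower the score for a simple univariate selector like this one.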