Towards building a unified framework for feature selection with ranking functions

Published on Apr 7, 2021

During the feature engineering stage of any data-related project, we often have to manually filter the columns used to train our model. This filtering usually relies on our insights into the data, and on criteria that help us distinguish valuable attributes from redundant or meaningless ones. This process is known as feature selection.

There are many ways of selecting attributes. The one we look at today is known as ranking feature selection, and it is the most basic approach: it evaluates each attribute with a scoring function, ranks the attributes by their scores, and keeps the best ones.
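The procedure just described fits in a few lines. This is a minimal sketch. It assumes the dataset is stored as a dict mapping column names to numeric lists, and it uses the absolute Pearson correlation with the target as a stand-in scoring function. `rank_features` and `abs_pearson` are hypothetical names, not part of any library.

```python
from typing import Callable, Dict, List, Sequence

# A scoring function maps (attribute values, target values) to a number; higher is better.
ScoreFn = Callable[[Sequence[float], Sequence[float]], float]


def abs_pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Hypothetical stand-in score: absolute Pearson correlation (0.0 if x or y is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov) / (var_x * var_y) ** 0.5


def rank_features(
    data: Dict[str, Sequence[float]],
    target: Sequence[float],
    score_fn: ScoreFn,
    k: int,
) -> List[str]:
    """Score every column, sort by descending score and keep the names of the k best."""
    scores = {name: score_fn(values, target) for name, values in data.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


data = {
    "a": [1.0, 2.0, 3.0, 4.0],  # perfectly correlated with the target
    "b": [4.0, 1.0, 3.0, 2.0],  # weakly related (|r| = 0.4)
    "c": [5.0, 5.0, 5.0, 5.0],  # constant, carries no information
}
target = [10.0, 20.0, 30.0, 40.0]
print(rank_features(data, target, abs_pearson, k=2))  # → ['a', 'b']
```

Swapping `abs_pearson` for another function changes the measure without touching the ranking logic. That separation is what makes it natural to gather many measures into a common framework.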

This ranking can be performed with many different measures. We will explore a classification of these measures, see how it helps us organize them into a reliable and extensible framework, and use this framework to test and compare their behaviours.
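One way to make such a framework extensible is to put every measure behind a common interface. New measures can then be added without changing the code that ranks and compares them. The sketch below assumes an abstract base class with a single `score` method and a binary target labelled 0/1. `RankingMeasure`, `ClassMeanDistance`, `Variance` and `compare_rankings` are hypothetical names chosen for illustration, and the two measures are toy placeholders rather than the ones discussed later.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Sequence


class RankingMeasure(ABC):
    """Hypothetical common interface: a measure scores one attribute; higher is better."""

    name = "base"

    @abstractmethod
    def score(self, x: Sequence[float], y: Sequence[int]) -> float:
        ...

    def rank(self, data: Dict[str, Sequence[float]], y: Sequence[int]) -> List[str]:
        """Sort every column from best to worst according to this measure."""
        scores = {col: self.score(values, y) for col, values in data.items()}
        return sorted(scores, key=scores.get, reverse=True)


class ClassMeanDistance(RankingMeasure):
    """Toy measure for a 0/1 target: distance between the two class means of x."""

    name = "class_mean_distance"

    def score(self, x, y):
        pos = [v for v, label in zip(x, y) if label == 1]
        neg = [v for v, label in zip(x, y) if label == 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))


class Variance(RankingMeasure):
    """Toy unsupervised measure: population variance of x (the target is ignored)."""

    name = "variance"

    def score(self, x, y):
        mean = sum(x) / len(x)
        return sum((v - mean) ** 2 for v in x) / len(x)


def compare_rankings(
    measures: List[RankingMeasure], data: Dict[str, Sequence[float]], y: Sequence[int]
) -> Dict[str, List[str]]:
    """Run several measures on the same data and collect their rankings by name."""
    return {m.name: m.rank(data, y) for m in measures}


data = {"a": [1, 1, 5, 5], "b": [0, 10, 0, 10], "c": [2, 2, 3, 3]}
y = [0, 0, 1, 1]
print(compare_rankings([ClassMeanDistance(), Variance()], data, y))
# → {'class_mean_distance': ['a', 'c', 'b'], 'variance': ['b', 'a', 'c']}
```

Every measure exposes the same `rank` method, so adding one only requires a subclass with a `score` method. The toy output also shows two measures disagreeing on the same data. A comparison like this is meant to surface exactly that kind of disagreement.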

Feature selection ranking methods

Following the classification proposed by Dash and Liu, the measures used to assess the quality of an attribute fall into five categories: distance, information, dependence, consistency and classifier error rate measures.

I will, of course, come back to each of these classes when presenting the measures included in the framework.
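The Dash and Liu classification can also become part of the framework itself. If each measure carries a tag naming its category, results can later be grouped and compared class by class. This is a minimal sketch assuming Dash and Liu's five categories (distance, information, dependence, consistency and classifier error rate measures). `MeasureCategory`, `register` and `REGISTRY` are hypothetical helpers, and the two registered classes are empty placeholders that only show where real measures would go.

```python
from collections import defaultdict
from enum import Enum
from typing import Callable, Dict, List


class MeasureCategory(Enum):
    """The five classes of attribute-quality measures in Dash and Liu's classification."""
    DISTANCE = "distance"
    INFORMATION = "information"
    DEPENDENCE = "dependence"
    CONSISTENCY = "consistency"
    CLASSIFIER_ERROR_RATE = "classifier error rate"


# Hypothetical registry: maps each category to the names of the measures filed under it.
REGISTRY: Dict[MeasureCategory, List[str]] = defaultdict(list)


def register(category: MeasureCategory) -> Callable[[type], type]:
    """Class decorator that tags a measure with its category and records it."""
    def wrap(cls: type) -> type:
        REGISTRY[category].append(cls.__name__)
        cls.category = category
        return cls
    return wrap


@register(MeasureCategory.DEPENDENCE)
class AbsCorrelation:
    """Placeholder: a correlation-based measure would be implemented here."""


@register(MeasureCategory.INFORMATION)
class InformationGain:
    """Placeholder: an entropy-based measure would be implemented here."""


for category in MeasureCategory:
    # .get avoids creating empty entries in the defaultdict
    print(category.value, REGISTRY.get(category, []))
```

In this sketch the category is stored as data on each class rather than encoded in the inheritance tree. Measures can therefore be queried or grouped by class without a fixed hierarchy. That is a design choice of the sketch, not something the classification itself requires.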