Table 1 Profile of research data sets

From: Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications

| Name of dataset | Sample size | Number of attributes | Missing values? | Task | Area |
| --- | --- | --- | --- | --- | --- |
| Iris | 150 | 4 | No | Multi-class | Life |
| Adult^a | 32,561 | 13 | Yes | Binary-class | Social |
| Wine | 178 | 13 | No | Multi-class | Physical |
| Car evaluation | 1728 | 6 | No | Multi-class | |
| Breast cancer Wisconsin^a | 699 | 9 | Yes | Binary-class | Life |
| Wdbc^a | 569 | 30 | No | Binary-class | Life |
| Wpbc^a | 198 | 31 | Yes | Binary-class | Life |
| Abalone | 4177 | 8 | No | Multi-class | Life |
| Wine quality_red^a | 1599 | 11 | No | Multi-class | Business |
| Wine quality_white^a | 4898 | 11 | No | Multi-class | Business |
| Heart disease^a | 303 | 13 | Yes | Multi-class | Life |
| Poker hand^a | 25,010 | 10 | No | Multi-class | Game |
^a The dataset ‘Adult’ is a subset of the database ‘Adult Data Set’. The datasets ‘Breast cancer Wisconsin’, ‘Wdbc’ and ‘Wpbc’ are three subsets of the same database, ‘Breast Cancer Wisconsin (diagnostic) data set’. The datasets ‘Wine quality_red’ and ‘Wine quality_white’ both come from the database ‘Wine Quality Data Set’. Because of data-quality limitations, the two subsets ‘processed.cleveland’ and ‘poker-hand-training-true’ were selected to represent the databases ‘Heart Disease Data Set’ and ‘Poker hand data set’, respectively.