Table 1 Profile of research data sets

From: Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications

| Name of dataset | Sample size | Number of attributes | Missing values? | Task | Area |
|---|---|---|---|---|---|
| Iris | 150 | 4 | No | Multi-class | Life |
| Adultᵃ | 32,561 | 13 | Yes | Binary-class | Social |
| Wine | 178 | 13 | No | Multi-class | Physical |
| Car evaluation | 1,728 | 6 | No | Multi-class | |
| Breast cancer Wisconsinᵃ | 699 | 9 | Yes | Binary-class | Life |
| Wdbcᵃ | 569 | 30 | No | Binary-class | Life |
| Wpbcᵃ | 198 | 31 | Yes | Binary-class | Life |
| Abalone | 4,177 | 8 | No | Multi-class | Life |
| Wine quality_redᵃ | 1,599 | 11 | No | Multi-class | Business |
| Wine quality_whiteᵃ | 4,898 | 11 | No | Multi-class | Business |
| Heart diseaseᵃ | 303 | 13 | Yes | Multi-class | Life |
| Poker handᵃ | 25,010 | 10 | No | Multi-class | Game |

ᵃ The dataset ‘Adult’ is a subset of the database ‘Adult Data Set’. The datasets ‘Breast cancer Wisconsin’, ‘Wdbc’ and ‘Wpbc’ are three subsets of the same database, ‘Breast Cancer Wisconsin (diagnostic) data set’. The datasets ‘Wine quality_red’ and ‘Wine quality_white’ both belong to the database ‘Wine Quality Data Set’. Because of data quality limitations, the subsets ‘processed.cleveland’ and ‘poker-hand-training-true’ were selected to represent the databases ‘Heart Disease Data Set’ and ‘Poker hand data set’, respectively.