Decision Trees
- Goal: predict an attribute of objects based on data collected for these objects
- a Decision Tree is trained on a training data set and then applied to a test data set
- Learning: the Decision Tree is constructed from data with known attributes, called the training set
- Prediction: the Decision Tree is used to predict the unknown attributes of the objects in the test set
- Training set:
  - must be a data set in which the attributes are known
  - is used to construct the Decision Tree
- Test set:
  - contains only the data, not the corresponding attributes
  - the attributes are inferred from the data with the help of the previously constructed Decision Tree
- we use the R package RWeka here
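The learning and prediction steps above can be sketched in R roughly as follows. This is a minimal sketch, assuming RWeka (and the Java runtime it needs) is installed; it uses RWeka's `J48` learner (Weka's C4.5-style tree) on R's built-in iris data. The 100/50 split and the variable names (`train_set`, `test_data`, `true_species`) are hypothetical, chosen only for illustration: the species column is removed from the test rows to mimic a test set whose attributes are unknown.

```r
# Sketch of the train/predict workflow; assumes RWeka + Java are available
library(RWeka)

set.seed(42)  # make the random split reproducible
# Hypothetical split: 100 labelled rows for training, the remaining 50 as test data
train_idx <- sample(nrow(iris), 100)
train_set <- iris[train_idx, ]
test_set  <- iris[-train_idx, ]

# In a real test set the attribute is unknown, so set it aside and drop it
true_species <- test_set$Species
test_data    <- test_set[, setdiff(names(test_set), "Species")]

# Learning: construct the Decision Tree from data with known attributes
tree <- J48(Species ~ ., data = train_set)
print(tree)

# Prediction: infer the unknown attribute (species) for the test data
predicted <- predict(tree, newdata = test_data)

# Only possible here because the true labels were kept aside
accuracy <- mean(predicted == true_species)
print(accuracy)
```

Keeping `true_species` aside is just a convenience of this demo: with a genuine test set there would be nothing to compare the predictions against.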
Training set: Edgar Anderson’s Iris Data
- measurements (in cm) of sepal length, sepal width, petal length and petal width for 50 flowers from each of 3 iris species
- the species are Iris setosa, Iris versicolor, and Iris virginica
- Iris dataset: Wikipedia
- this is a training set, i.e. each record (consisting of the 4 measures) comes with its known attribute, the species
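The iris data ships with base R (package `datasets`), so its structure as a training set can be checked directly. A minimal sketch; the names `measures` and `attribute` are hypothetical labels for the two parts of each record (the 4 measures and the known species).

```r
# iris is built into R; no download or extra package needed
data(iris)

# 150 rows: 4 numeric measures plus the factor Species
str(iris)

# 50 flowers per species
print(table(iris$Species))

# Split each record into the data (4 measures) and the known attribute (species)
measures  <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]
attribute <- iris$Species
```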