Given Python code to train a shallow decision tree for the
classification of flowers. The input features are the flowers’ sepal length, sepal width, petal
length, and petal width, and the label corresponds to the species, i.e. “iris setosa”, “iris ver-
sicolor”, or “iris virginica”. The data is provided in the file iris.csv.1 The code computes the
training accuracy, i.e. the accuracy obtained when classifying all instances from the training
dataset that was used to build the classifier in the first place.
(a) Add a few lines of code to compute the accuracy in a 10-fold cross-validation set up.
Use functions or methods from sklearn for k-fold cross-validation instead of implementing
your own. This will save you time and help you to get more familiar with sklearn. Include
your code in your hard copy homework solution, either handwritten (it’s just a few lines
of code) or a printout. You shouldn’t make changes to the code that was provided, so
there is no need to include any other code in your homework solution than the few lines
that you added. I will not run your code for this homework problem, so you do not
need to upload your code.
Below is the .csv file:
(b) What is the training accuracy? What is the accuracy obtained using 10-fold cross-
validation? Briefly comment on which one is the lowest, and why that does (or does
not) agree with your expectations.