Home / Expert Answers / Computer Science / given-python-code-to-train-a-shallow-decision-tree-for-the-classification-of-flowers-the-input-feat-pa991

(Solved): Given Python code to train a shallow decision tree for the classification of flowers. The input feat ...



Given Python code to train a shallow decision tree for the
classification of flowers. The input features are the flowers’ sepal length, sepal width, petal
length, and petal width, and the label corresponds to the species, i.e. “iris setosa”, “iris ver-
sicolor”, or “iris virginica”. The data is provided in the file iris.csv.1 The code computes the
training accuracy, i.e. the accuracy obtained when classifying all instances from the training
dataset that was used to build the classifier in the first place.


(a) Add a few lines of code to compute the accuracy in a 10-fold cross-validation set up.
Use functions or methods from sklearn for k-fold cross-validation instead of implementing
your own. This will save you time and help you to get more familiar with sklearn. Include
your code in your hard copy homework solution, either handwritten (it’s just a few lines
of code) or a printout. You shouldn’t make changes to the code that was provided, so
there is no need to include any other code in your homework solution than the few lines
that you added. I will not run your code for this homework problem, so you do not
need to upload your code.

import pandas as pd
from sklearn import tree
from sklearn import metrics
\# Read the dataset into a dataframe and map the lab

Below is the .csv file:


(b) What is the training accuracy? What is the accuracy obtained using 10-fold cross-
validation? Briefly comment on which one is the lowest, and why that does (or does
not) agree with your expectations.

import pandas as pd from sklearn import tree from sklearn import metrics \# Read the dataset into a dataframe and map the labels to numbers df = pd.read_csv('iris.csv') map_to_int setosa':0, 'versicolor':1, 'virginica':2\} df["label"] = df["species"].replace(map_to_int) print(df) \# Separate the input features from the label features = list(df.columns [:4]) X = df[features] y=df["label"] \# Train a decision tree and compute its training accuracy clf = tree.DecisionTreeclassifier(max_depth=2, criterion='entropy') clf.fit(X, y) print(metrics.accuracy_score(y, clf.predict(X)))


We have an Answer from Expert

View Expert Answer

Expert Answer


The training accuracy can be computed using the metrics.accuracy_score function from the sklearn library, by passing in the actual labels (y) and the
We have an Answer from Expert

Buy This Answer $5

Place Order

We Provide Services Across The Globe