Classification of Dry Beans

For this assignment, we stay with the dry beans classification problem. However, for the purposes of this assignment, make use of the original dataset as uploaded to the post-block 3 assignment folder. Note that the version of the dataset that you used for post-block assignment 2 had a number of data quality issues injected into the original dataset. Please make sure that you download the original dataset for the purposes of post-block assignment 3.

For the purposes of this assignment you will develop an ensemble learning model of your choice to produce a dry beans classification model. You will then compare the performance of the chosen ensemble learning model to that of a single instance of the machine learning approach used for the individual members of the ensemble. As an example, if you decide to implement a random forest, then the performance of the random forest will be compared to that of an individual classification tree.

You have to write a report wherein you provide responses in clear narrative on the aspects enumerated below, under appropriate section headings. Note that code will not be evaluated. Tables and figures will also not be considered if they are not accompanied by your own explanation of what they portray.

Complete the assignment in the following steps:

1. Download the DryBeanPBA3.xlsx dataset. The dataset contains 13611 instances, 16 descriptive features, and the class feature Class in column Q. You now have to very carefully explore the dataset to identify any issues in this dataset. Identify the issues and explain how you have addressed them.
2. Decide on the ensemble learning model that you will use. Give justifications for why you have selected this ensemble learning model.
3. Discuss the data pre-processing steps that you have implemented to optimally transform the dataset for the ensemble learning approach. Note: do not do unnecessary data transformations. Think carefully about the data transformations needed for the selected ensemble learning approach. Provide justifications for each of these pre-processing steps. Should you decide not to address a data quality issue, justify this decision.
4. Make sure to tune all of the hyperparameters of the selected ensemble learning model. Describe each of the hyperparameters and the process followed to find the best value for each, and then list the best values obtained. Now do the same for the individual machine learning model.
5. Discuss the empirical process that you have followed to evaluate the performance of each model, and to compare these two models.
6. Now present and discuss the results of the two models and conclude on which of the two approaches is best. Provide your opinions on why the one model will perform better than the other.

Please provide guidance on how to perform this using the Orange data mining software. Please also give some guidance on how to answer the questions.
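For context, the tuning and comparison described in steps 4 to 6 can be sketched in Python with scikit-learn. This is only an illustrative sketch, not an Orange workflow and not a prescribed solution. It assumes a random forest as the ensemble and a single classification tree as the baseline (the example given in the brief), uses a synthetic stand-in dataset instead of DryBeanPBA3.xlsx, and the hyperparameter grids, the macro-F1 metric, and the fold counts are arbitrary choices made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the dry beans data (16 features, 7 classes).
# Replace with the real dataset once it has been loaded and cleaned.
X, y = make_classification(
    n_samples=700, n_features=16, n_informative=10, n_redundant=2,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Inner folds tune hyperparameters; outer folds estimate performance.
# Fixed seeds mean both models are evaluated on identical outer folds.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# ASSUMPTION: small example grids, not recommended values.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [4, 8, None], "min_samples_leaf": [1, 5]},
    cv=inner_cv, scoring="f1_macro",
)
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [25, 50], "max_features": ["sqrt", 0.5]},
    cv=inner_cv, scoring="f1_macro",
)

# Nested cross-validation: each outer training split is tuned independently,
# so the outer test folds never influence the chosen hyperparameters.
tree_scores = cross_val_score(tree_search, X, y, cv=outer_cv, scoring="f1_macro")
forest_scores = cross_val_score(forest_search, X, y, cv=outer_cv, scoring="f1_macro")

print(f"single tree   macro-F1: {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"random forest macro-F1: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
print("per-fold difference (forest - tree):", forest_scores - tree_scores)

# Refit on all data only to report the selected hyperparameter values.
tree_search.fit(X, y)
forest_search.fit(X, y)
print("best tree params:  ", tree_search.best_params_)
print("best forest params:", forest_search.best_params_)
```

Macro-F1 is used here because it weights every class equally, which may matter if the bean classes are imbalanced; accuracy or another metric could be substituted. In Orange, the closest counterparts appear to be the File, Tree, Random Forest and Test and Score widgets; how to carry out the hyperparameter tuning there is part of what this question asks.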
