PHASE 3 (30%)

The Phase 2 program, which implements the k-means algorithm, produces two clusters: one containing benign cells (predicted class `=2`) and one containing malign cells (predicted class `=4`). However, a malign cell may be clustered into the benign cluster and vice versa. In Phase 3 you will analyze the quality of the clustering. To check how well your clustering worked, you will calculate the error rate for your clusters. Assume that the column "Class" of the initial data set contains the correct clustering of the data points.

INSTRUCTIONS

There are two parts in Phase 3:

- Write code to calculate the individual and total error rates of the predicted clusters.
- Prepare and submit the final report.

a) Write code to calculate the individual and total error rates of the predicted clusters

Your Phase 3 program will calculate the error rates based on two arguments:

- The predicted clusters, calculated by your Phase 2 program.
- The correct clusters, specified by the column "Class" of the initial data set.

Let's have a look at the example of the cluster assignment with the first 20 data points.

[Table: first 20 data points with columns "Class" and "Predicted_Class"; not shown]

Column "Class" represents the correct clusters and column "Predicted_Class" represents the clusters calculated by the `k`-means algorithm. Marked data points represent the errors of the k-means clustering:

- Yellow data points are predicted as class 4 (malign cells), while the correct class is 2 (benign cells).
- Gray data points are predicted as class 2 (benign cells), while the correct class is 4 (malign cells).

Let's define the following notation:

[Notation definitions not shown]

Use the following formulae to calculate and print error rates for each cluster:

```
error_B = \frac{error_{24}}{pclass_2} \times 100\%
error_M = \frac{error_{42}}{pclass_4} \times 100\%
error_T = \frac{error_{all}}{class_{all}}
```
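The formulae above can be sketched in Python as follows. The notation definitions did not survive in the text, so the meanings used here are assumptions: `error_24` is taken to be the number of points predicted as class 2 whose correct class is 4, `error_42` the reverse, `pclass_2` and `pclass_4` the number of points predicted as each class, `error_all` the sum of both error counts, and `class_all` the number of all data points. The total is also multiplied by 100 so that it prints as a percentage like the sample output; the function name `error_rates` is hypothetical.

```python
def error_rates(correct, predicted):
    """Return (error_B, error_M, error_T) as percentages.

    Assumed notation (not confirmed by the assignment text):
      error_24 = predicted class 2, correct class 4
      error_42 = predicted class 4, correct class 2
      pclass_2 / pclass_4 = number of points predicted as 2 / 4
    """
    if len(correct) != len(predicted):
        raise ValueError("correct and predicted must have the same length")

    pairs = list(zip(correct, predicted))
    error_24 = sum(1 for c, p in pairs if p == 2 and c == 4)
    error_42 = sum(1 for c, p in pairs if p == 4 and c == 2)
    pclass_2 = sum(1 for _, p in pairs if p == 2)
    pclass_4 = sum(1 for _, p in pairs if p == 4)
    error_all = error_24 + error_42
    class_all = len(pairs)

    # Guard against empty clusters to avoid division by zero.
    error_b = 100.0 * error_24 / pclass_2 if pclass_2 else 0.0
    error_m = 100.0 * error_42 / pclass_4 if pclass_4 else 0.0
    error_t = 100.0 * error_all / class_all if class_all else 0.0
    return error_b, error_m, error_t


# Made-up labels for illustration only (not taken from the data set).
correct = [2, 2, 2, 4, 4, 2, 4, 2, 2, 4]
predicted = [2, 2, 4, 4, 2, 2, 4, 2, 2, 4]
error_b, error_m, error_t = error_rates(correct, predicted)
print(f"Error rate for class 2: {error_b:.1f}%")
print(f"Error rate for class 4: {error_m:.1f}%")
print(f"Total error rate: {error_t:.1f}%")
```

If the assignment's notation turns out to define the denominators by correct class rather than predicted class, only the two `pclass_` counts need to change.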

A total error rate of more than `50%` indicates that your program swapped the predicted clusters. Your program has to detect this situation, swap the predicted clusters by replacing 2 with 4 and 4 with 2 in column "Predicted_Class", and recalculate the error rates.

b) Prepare the final report that incorporates all the results and your conclusions for phases 1 to 3.

SAMPLE OUTPUT

This is the output in case the clusters are swapped and the program swapped the predicted class.

Error data points, Predicted Class 4:

Number of all data points: 699

Number of error data points: 28

Error rate for class 2: `3.7%`

Error rate for class 4: `4.7%`

Total error rate: `4.0%`
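The swap check described above could look roughly like the following sketch. It assumes the labels are plain Python lists containing only 2 and 4, so that counting mismatches between "Class" and "Predicted_Class" gives the total number of errors; the function names `total_error_rate` and `fix_swapped_clusters` are hypothetical.

```python
def total_error_rate(correct, predicted):
    """Percentage of points whose predicted class differs from the correct class."""
    if not correct:
        return 0.0
    errors = sum(1 for c, p in zip(correct, predicted) if c != p)
    return 100.0 * errors / len(correct)


def fix_swapped_clusters(correct, predicted):
    """Return (labels, swapped).

    If the total error rate exceeds 50%, the predicted labels are assumed
    to be swapped, so 2 is replaced with 4 and 4 with 2.
    """
    if total_error_rate(correct, predicted) > 50.0:
        swap = {2: 4, 4: 2}
        return [swap.get(p, p) for p in predicted], True
    return list(predicted), False


# Made-up labels for illustration only (not taken from the data set).
correct = [2, 2, 2, 4, 4]
predicted = [4, 4, 4, 2, 4]

labels, swapped = fix_swapped_clusters(correct, predicted)
errors = sum(1 for c, p in zip(correct, labels) if c != p)

print(f"Clusters swapped: {swapped}")
print(f"Number of all data points: {len(correct)}")
print(f"Number of error data points: {errors}")
print(f"Total error rate: {total_error_rate(correct, labels):.1f}%")
```

Returning a new list rather than modifying `predicted` in place keeps the original Phase 2 output available for the report; the error rates are then recalculated on the returned labels.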

SUBMISSION GUIDELINES

Prepare and submit a PDF with the final report that includes:

- Project statement
- Short description of the phase 1, 2 and 3 programs (algorithm, description of input data, structure of the programs and description of results)
- Phase 1, 2 and 3 results
- Conclusion

Submit the phase 1, 2 and 3 programs together with any data files that may be needed to run your programs. Provide a 'readme.txt' file that explains how to execute your code.