R code
Question 1: Note: set the seed using set.seed(5533).

(a) Perform k-means clustering on the dailykos.csv data set with the number of clusters equal to 7. Choose the best clustering from 10 repetitions of the algorithm, each with a different choice of initial centers. (Here, the best clustering is the one with the smallest total within-cluster sum of squares. You may want to learn about the nstart parameter of the kmeans function.) For the best clustering you found, identify the five most frequently used words in each cluster.

(b) For the clustering you found in (a), list all the cluster centers. For this data set, what is the interpretation of the centers?

(c) Compare these clusters with those obtained from hierarchical clustering (with Euclidean distance, method = "complete", and 7 clusters). In particular, make a "confusion matrix" style table, where the rows represent the cluster numbers from k-means clustering, the columns represent the cluster numbers from hierarchical clustering, and cell (i, j) holds the number of articles that are in cluster i under k-means clustering and in cluster j under hierarchical clustering. Are there clusters that are similar between the two clusterings?
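A minimal R sketch of all three parts follows. It assumes dailykos.csv has one row per article and one numeric word-count column per word (drop any ID or text column first). So that the sketch runs without the file, it builds a hypothetical stand-in matrix of random counts; replace that block with dk <- read.csv("dailykos.csv") for the real data. The object names (dk, km, top5, hc_clusters, conf_mat) are illustrative only.

```r
# Hypothetical stand-in data: 200 "articles" x 30 "words" of random counts.
# For the real assignment use:  dk <- read.csv("dailykos.csv")
set.seed(1)  # seed only for generating the stand-in data
n_docs <- 200
n_words <- 30
dk <- as.data.frame(matrix(rpois(n_docs * n_words, lambda = 1), nrow = n_docs,
                           dimnames = list(NULL, paste0("word", seq_len(n_words)))))

# (a) k-means with 7 clusters. nstart = 10 runs the algorithm from 10 random
# sets of initial centers and keeps the run with the smallest tot.withinss.
set.seed(5533)
km <- kmeans(dk, centers = 7, nstart = 10)
km$tot.withinss

# Five most frequent words per cluster: sort each cluster center (the mean
# count of every word in that cluster) and keep the first five names.
# Result is a 5 x 7 matrix; column j lists the top words of cluster j.
top5 <- apply(km$centers, 1, function(ctr) names(sort(ctr, decreasing = TRUE))[1:5])
top5

# (b) Cluster centers: row j is the average count of each word across the
# articles assigned to k-means cluster j.
km$centers

# (c) Hierarchical clustering: Euclidean distance, complete linkage, cut into 7.
hc <- hclust(dist(dk, method = "euclidean"), method = "complete")
hc_clusters <- cutree(hc, k = 7)

# Rows = k-means cluster, columns = hierarchical cluster,
# cell (i, j) = number of articles in both cluster i and cluster j.
conf_mat <- table(kmeans = km$cluster, hclust = hc_clusters)
conf_mat
```

Because kmeans reports each center as the mean of the rows assigned to it, a center can be read as the average word-frequency profile of that cluster's articles, which is also why sorting a center yields the cluster's most frequent words. Cluster numbers from the two methods are arbitrary labels, so a pair of similar clusters shows up as a row of conf_mat whose counts fall mostly in one column, not necessarily on the diagonal.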