Disc 1: This week we discuss association analysis and the advanced concepts (in Chapter six). After reviewing the material, answer the following questions:
What are the techniques in handling categorical attributes?
Categorical attributes are variables that take one of a finite number of distinct groups or categories. Ways of handling categorical data include dummy creation, ordinal encoding, and count encoding (Tan, Steinbach, Karpatne & Kumar, 2019). Dummies are binary columns, one per category. When a row's value in a dummy column is 1, the category is present in that row; when the value is 0, the category is absent. Ordinal encoding assigns an ordered number to each category. Count encoding replaces each category with its frequency in the data. Other techniques include target encoding, mean encoding, and probability ratio encoding.
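The three encodings named above can be sketched in plain Python. This is only an illustration, not a method from the textbook: the `sizes` column, the function names, and the chosen category order are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical toy column; the values are illustrative only.
sizes = ["small", "large", "medium", "small", "small", "large"]

def dummy_encode(values):
    """One binary column per category: 1 if the row has that category, else 0."""
    categories = sorted(set(values))
    return [{cat: int(v == cat) for cat in categories} for v in values]

def ordinal_encode(values, order):
    """Map each category to its position in a caller-supplied ordering."""
    rank = {cat: i for i, cat in enumerate(order)}
    return [rank[v] for v in values]

def count_encode(values):
    """Replace each category with how often it occurs in the column."""
    counts = Counter(values)
    return [counts[v] for v in values]

dummies = dummy_encode(sizes)
ordinals = ordinal_encode(sizes, order=["small", "medium", "large"])
counts = count_encode(sizes)

print(dummies[0])   # dummy columns for the first row
print(ordinals)     # ordinal codes per row
print(counts)       # frequency of each row's category
```

Ordinal encoding only makes sense when the categories have a natural order, which is why the ordering is passed in explicitly here rather than inferred.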
How do continuous attributes differ from categorical attributes?
Continuous attributes hold numeric data that can take infinitely many values between any two given values, whereas categorical attributes take only a finite set of distinct values.
What is a concept hierarchy?
A concept hierarchy is an arrangement of concepts from lower-level categories up to more general, higher-level concepts (Tan et al., 2019). One can apply a concept hierarchy to generalize database records in data mining. The general approach is attribute-oriented induction using crisp concept hierarchies, in which each concept belongs to exactly one parent. Fuzzy concept hierarchies can also be applied in attribute-oriented induction. They can be more expressive than crisp ones because they reflect the degree to which a concept belongs to more than one of its direct abstractions. With fuzzy hierarchies, one can employ fuzzy rules in data summarization.
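As a rough illustration of generalizing records with a crisp concept hierarchy, the sketch below rolls city-level records up to country and then continent level. The `PARENT` mapping, the `generalize` helper, and the sample records are hypothetical and chosen only for the example.

```python
from collections import Counter

# Hypothetical crisp hierarchy: each lower-level concept has exactly one parent.
PARENT = {
    "Chicago": "USA",
    "Boston": "USA",
    "Toronto": "Canada",
    "USA": "North America",
    "Canada": "North America",
}

def generalize(concept, levels=1):
    """Climb the hierarchy by the given number of levels (stops at the top)."""
    for _ in range(levels):
        concept = PARENT.get(concept, concept)
    return concept

# Illustrative city-level records.
records = ["Chicago", "Boston", "Toronto", "Chicago"]

# Summarize the same records at two higher levels of abstraction.
by_country = Counter(generalize(city, 1) for city in records)
by_continent = Counter(generalize(city, 2) for city in records)

print(by_country)
print(by_continent)
```

A fuzzy hierarchy would differ by letting a concept map to several parents with membership degrees, which this crisp sketch does not attempt.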
Note the major patterns of data and how they work.
Data mining works with several major pattern types, including associations, predictions and clusters (Delen, 2021). Associations group things that frequently co-occur; they also capture sequential relationships, such as time-ordered events. Predictions estimate the future behaviour of specific variables based on their past behaviour. Finally, clusters group things based on their identified features or behaviours, for instance, assigning buyers to distinct segments based on how they behaved when purchasing in the past.
What is K-means from a basic standpoint?
K-means is a widely used machine learning algorithm. It is an unsupervised algorithm that puts similar items into distinct groups called clusters (Tan, Steinbach, Karpatne & Kumar, 2019). K represents the number of clusters that the process produces. The K-means process has three core steps: selecting the value of k, initializing the centroids, and then repeatedly assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points until the assignments stop changing. Two common methods of choosing the value of k are the Silhouette method and the Elbow method.
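The steps described above can be sketched as a small pure-Python routine. It is a minimal sketch under stated assumptions, not a production implementation: the function name `kmeans`, the sample `points`, and the choice to initialize centroids with the first k points (for reproducibility, where random initialization is more usual) are all assumptions of the example.

```python
import math

def kmeans(points, k, max_iter=100):
    """Basic K-means on a list of equal-length numeric tuples."""
    # Assumption: use the first k points as initial centroids so the
    # example is deterministic; random initialization is more common.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels

# Illustrative data: two visibly separated groups of 2-D points.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

Because the result depends on the initial centroids, practical implementations typically run the algorithm several times from different starting points and keep the best result.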
What are the strengths and weaknesses of K-means?
Some advantages of K-means clustering include simplicity of implementation, scalability and speed on large data sets, easy adaptation to new data, and, when generalized, the ability to handle clusters of different sizes and shapes. On the other hand, the drawbacks of K-means include sensitivity to outliers, the difficulty of choosing the value of k manually, and reduced effectiveness as the number of dimensions increases.
What are the various types of clusters and why is the distinction important?
There are several types of clusters in machine learning. They include connectivity-based, centroid-based, fuzzy, distribution-based, constraint-based and density-based clusters (Tan et al., 2019). Distinguishing among these cluster types is essential in data mining. Real-world data is often multidimensional, so different data sets call for different clustering techniques, and choosing a suitable technique helps in solving a variety of business problems. The distinction between one cluster type and another is therefore important.
What is a cluster evaluation?
Cluster evaluation is the assessment of the clustering tendency of the data, the number of clusters, and the quality of the resulting clusters (Weili, Hui & Shekhar, 2013). In this analysis, we check that objects within a group are similar to one another and that they are distinct from the objects in other groups. When assessing clustering tendency, before committing to a specific clustering method, we establish that the data set is not uniformly distributed, since uniformly distributed points contain no meaningful clusters. The optimal number of clusters is also vital in models like K-means clustering. Lastly, we need to gauge the quality of the clustering after the actual process. A good clustering has minimum intra-cluster distance (high cohesion) and maximum inter-cluster separation.
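One common way to quantify the cohesion and separation described above is the silhouette coefficient, which is also the basis of the Silhouette method for choosing k. The sketch below is a hedged illustration: the `silhouette` function, the sample points, and the two labelings are assumptions of the example, and it assumes at least two clusters and not all points identical.

```python
import math

def silhouette(points, labels):
    """Mean silhouette coefficient over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Illustrative data with one sensible and one deliberately poor labeling.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5), (8.5, 9.0)]
good_labels = [0, 1, 0, 1, 0, 1]
bad_labels = [0, 0, 0, 1, 1, 1]

good_score = silhouette(points, good_labels)
bad_score = silhouette(points, bad_labels)
print(good_score, bad_score)
```

Scores near 1 indicate tight, well-separated clusters, while scores near 0 or below suggest overlapping or misassigned points, so the sensible labeling should score clearly higher than the poor one.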