Activity one
Exercise 1 Use your preferred editor (Colab is recommended) to fill in the snippet gaps.
The following is a simple demonstration of using the within-cluster sum of squares (WSS)
to choose the number of clusters for the k-means clustering algorithm and to plot the result.
#%% Import the necessary packages
#%
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import make_blobs  # sklearn.datasets.samples_generator was removed from recent scikit-learn releases
from sklearn.cluster import KMeans
#%% Generate 6 artificial clusters for illustration purposes
#%% Hint: you may need to use the make_blobs and scatter functions; check the
#%% official scikit-learn and Matplotlib documentation for more information on their usage
#%
# Insert your code block here
#%% Implement the WSS method for the number of clusters from 1
#%% to 12, and plot WSS against the number of clusters.
#%% Hint: refer to the plots in the lecture slides.
#%% You may need the KMeans class and its inertia_ attribute, which holds the WCSS.
#%
wcss = []
for i in range(1, 13):  # range excludes the upper bound, so 13 covers k = 1..12
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10,
                    random_state=0)
    # Insert your code block here
#%% Categorize the data using the optimum number of clusters (6)
#%% determined in the last step. Plot the fitting results.
#%% Hint: you may need to call fit_predict on kmeans, then scatter
#%
kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
# Insert your code block here
plt.scatter(X[:,0], X[:,1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='red')
plt.show()
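One possible way to fill the gaps in Exercise 1 is sketched below. It is not a reference solution: the sample count (300) and cluster spread (0.60) passed to make_blobs are assumptions, since the exercise does not specify them, and the helper names plot_elbow and plot_clusters are made up for this illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed data settings (not given in the exercise): 300 points, spread 0.60.
X, y_true = make_blobs(n_samples=300, centers=6, cluster_std=0.60, random_state=0)

# Fit k-means for k = 1..12 and record the within-cluster sum of squares.
ks = list(range(1, 13))
wcss = []
for k in ks:
    kmeans = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10,
                    random_state=0)
    kmeans.fit(X)
    wcss.append(kmeans.inertia_)  # inertia_ is the WCSS of the fitted model


def plot_elbow(ks, wcss):
    """Plot WCSS against the number of clusters (hypothetical helper)."""
    from matplotlib import pyplot as plt
    plt.plot(ks, wcss, marker='o')
    plt.xlabel('Number of clusters')
    plt.ylabel('WCSS')
    plt.title('Elbow method')
    plt.show()


def plot_clusters(X, n_clusters=6):
    """Fit k-means, colour points by cluster, mark centres (hypothetical helper)."""
    from matplotlib import pyplot as plt
    kmeans = KMeans(n_clusters=n_clusters, init='k-means++', max_iter=300,
                    n_init=10, random_state=0)
    labels = kmeans.fit_predict(X)
    plt.scatter(X[:, 0], X[:, 1], c=labels, s=20, cmap='viridis')
    plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1],
                s=300, c='red')
    plt.show()
```

Calling plot_elbow(ks, wcss) and then plot_clusters(X) in a notebook cell draws the two figures. The number of clusters is usually read off where the WCSS curve stops dropping sharply.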
Exercise 2 For the following code blocks and plots, run the code first; then provide your
interpretation/explanation for the required parts.
k-means on digits
We will use k-means to try to identify similar digits without using the original
label information; this is similar to a first step in extracting meaning from a new
dataset for which you have no a priori label information.
We will start by loading the digits and then finding the k-means clusters. The digits
consist of 1,797 samples with 64 features, where each of the 64 features is the
brightness of one pixel in an 8×8 image.
import seaborn as sns; sns.set() # for plot styling
from sklearn.datasets import load_digits
digits = load_digits()
digits.data.shape
## Provide your interpretation/explanation for the following block
#
kmeans = KMeans(n_clusters=10, random_state=0)
clusters = kmeans.fit_predict(digits.data)
kmeans.cluster_centers_.shape
## Provide your interpretation/explanation for the following block
#
fig, ax = plt.subplots(2, 5, figsize=(8, 3))
centers = kmeans.cluster_centers_.reshape(10, 8, 8)
for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center, interpolation='nearest', cmap=plt.cm.binary)
from scipy.stats import mode
labels = np.zeros_like(clusters)
for i in range(10):
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]
from sklearn.metrics import accuracy_score
accuracy_score(digits.target, labels)
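The label-matching loop above assigns to every cluster the most common true digit among its members, because k-means numbers its clusters arbitrarily. A minimal sketch of the same idea on made-up toy arrays follows. It uses np.bincount(...).argmax() in place of scipy.stats.mode, and the function name map_clusters_to_labels is hypothetical.

```python
import numpy as np


def map_clusters_to_labels(clusters, targets):
    """Relabel each cluster with the most common true label among its members."""
    labels = np.zeros_like(targets)
    for cluster_id in np.unique(clusters):
        mask = clusters == cluster_id
        # bincount counts how often each non-negative integer label occurs;
        # argmax picks the majority label for this cluster.
        labels[mask] = np.bincount(targets[mask]).argmax()
    return labels


# Toy data, made up for illustration: 3 clusters, true labels 7, 3 and 1.
clusters = np.array([0, 0, 0, 1, 1, 1, 2, 2])
targets = np.array([7, 7, 3, 3, 3, 3, 1, 1])

mapped = map_clusters_to_labels(clusters, targets)
accuracy = float(np.mean(mapped == targets))
print(mapped, accuracy)
```

In the toy data one point with true label 3 sits in the cluster dominated by 7, so it is the only point counted as misclassified.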
## Provide your interpretation/explanation for the following block
#
from sklearn.metrics import confusion_matrix
mat = confusion_matrix(digits.target, labels)
sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=digits.target_names,
            yticklabels=digits.target_names)
plt.xlabel('true label')
plt.ylabel('predicted label');
______________________________________________________
Activity two
Exercise 1 What is the Apriori property? Answer in one or two sentences and provide a
simple example.
Exercise 2 The following is a list of five transactions that include items A, B, C, and D:
T1: {A, B, C}
T2: {A, C}
T3: {B, C}
T4: {A, D}
T5: {A, C, D}
Which itemsets satisfy the minimum support of 0.5? Include your deduction
process.
Hint: Given an itemset L, the “support” of L is the fraction of transactions containing
L. To meet the support criterion of 0.5, you need to find the itemsets that appear in
at least 50% of the transactions.
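The deduction for Exercise 2 can be checked with a short level-wise search. The sketch below is one way to do it, not a full Apriori implementation: it counts support for single items, then only builds larger candidates from itemsets that were already frequent, which is the pruning the Apriori property allows. The function names support and frequent_itemsets are chosen for this illustration.

```python
from itertools import combinations

# The five transactions from the exercise.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"B", "C"},
    {"A", "D"},
    {"A", "C", "D"},
]
MIN_SUPPORT = 0.5


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-k candidates come only from frequent (k-1)-itemsets."""
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while current:
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent itemsets into size-k candidates, then keep a candidate
        # only if all of its (k-1)-subsets are frequent (Apriori pruning).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent


result = frequent_itemsets(transactions, MIN_SUPPORT)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

For these transactions the search reports {A} and {C} with support 0.8 and {A, C} with support 0.6. B and D each appear in only two of the five transactions, so no itemset containing them is considered further.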
_________________________________________________________________________________
Activity three
Exercise 1 In the Income linear regression example, consider the distribution of the
outcome variable Income. The income values tend to be highly skewed to
the right (the distribution has a long right tail).
Does such a non-normally distributed outcome variable violate the general assumptions
of a linear regression model? Provide your supporting arguments.
Exercise 2: Describe how logistic regression can be used as a classifier.
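As a starting point for Exercise 2, the sketch below shows the usual mechanism: a linear score is passed through the logistic (sigmoid) function to give a probability, and a threshold turns that probability into a class label. The intercept, slope, and the 0.5 threshold are hypothetical values chosen for illustration, not fitted to any data.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, intercept, slope):
    """Estimated probability of the positive class for a single feature x."""
    return sigmoid(intercept + slope * x)


def classify(x, intercept, slope, threshold=0.5):
    """Assign class 1 when the estimated probability reaches the threshold."""
    return 1 if predict_proba(x, intercept, slope) >= threshold else 0


# Hypothetical coefficients, for illustration only.
INTERCEPT, SLOPE = -4.0, 0.1

predictions = [classify(x, INTERCEPT, SLOPE) for x in (20, 50, 60)]
print(predictions)
```

Raising or lowering the threshold trades false positives against false negatives, which is why the threshold is often chosen with the application in mind instead of being fixed at 0.5.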
Exercise 3: If the probability of an event occurring is 0.4, then
a. What is the odds ratio?
b. What is the log odds ratio?
_________________________________________________________________________________
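For Activity three, Exercise 3, the arithmetic can be checked in a few lines. This sketch assumes that "odds ratio" in the question means the odds of the event, p / (1 - p), and that the log is the natural logarithm.

```python
import math

p = 0.4                   # probability of the event, from the exercise
odds = p / (1 - p)        # odds in favour of the event
log_odds = math.log(odds) # natural log of the odds (the logit of p)

print(f"odds = {odds:.4f}, log odds = {log_odds:.4f}")
# prints: odds = 0.6667, log odds = -0.4055
```

Odds below 1 (and therefore negative log odds) indicate that the event is less likely to occur than not.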
Activity four
Exercise 1 We have three observed points (23, 41), (67, 84), (78, 100).
Question: fit them to a linear model: Y = β0 + β1X.
For ease of computation, we use the residual sum of squares (RSS) as the loss function to
estimate the parameters:
RSS(β0, β1) = Σ (Y_i − (β0 + β1 X_i))², summed over the three observed points (X_i, Y_i).
Set the learning rate λ = 0.00001 and the initial guess for the parameters to β0 = 7, β1 = 1.
You only need to provide the first three iterations using Gradient Descent.
Calculate the results manually.
Hint:
1. Compose the loss function based on the RSS formula
2. Take the partial derivatives of the RSS function with respect to β0 and β1
3. Substitute the initial parameters to check the RSS
Iteration 1:
a) Plug β0 = 7, β1 = 1 into the partial derivative equations for RSS
b) Compute the step size, using the provided learning rate
c) Update the parameters (check the lecture slides for the equation)
d) Re-compute the RSS to check the loss change
Repeat the steps of Iteration 1 for two more iterations to see the trend in the parameters and the loss
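The manual results can be cross-checked with a short script. The sketch below assumes the plain RSS loss with no 1/2 or 1/n scaling factor and the standard update "parameter = parameter − learning rate × partial derivative"; if the lecture slides use a different form, the numbers will differ accordingly.

```python
# Observed points and settings from the exercise.
points = [(23, 41), (67, 84), (78, 100)]
LEARNING_RATE = 0.00001


def rss(b0, b1):
    """Residual sum of squares for the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


def gradient(b0, b1):
    """Partial derivatives of RSS with respect to b0 and b1."""
    d_b0 = sum(-2 * (y - (b0 + b1 * x)) for x, y in points)
    d_b1 = sum(-2 * x * (y - (b0 + b1 * x)) for x, y in points)
    return d_b0, d_b1


b0, b1 = 7.0, 1.0  # initial guess
history = [(0, b0, b1, rss(b0, b1))]

for iteration in range(1, 4):
    d_b0, d_b1 = gradient(b0, b1)
    # Step size = learning rate * partial derivative; step against the gradient.
    b0 -= LEARNING_RATE * d_b0
    b1 -= LEARNING_RATE * d_b1
    history.append((iteration, b0, b1, rss(b0, b1)))

for it, b0_i, b1_i, loss in history:
    print(f"iteration {it}: b0 = {b0_i:.5f}, b1 = {b1_i:.5f}, RSS = {loss:.3f}")
```

With the initial guess the residuals are 11, 10 and 15, so the starting RSS is 446 and the partial derivatives are −72 and −4186. The first update therefore gives β0 = 7.00072 and β1 = 1.04186, and the RSS falls with each of the three iterations.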