
Activity one Exercise 1 Using your preferred editor (colab is recommended) to fill the snippet gaps

Computer Science


Activity one

Exercise 1 Use your preferred editor (Colab is recommended) to fill in the gaps in the snippet.

The following is a simple demonstration of using the within-cluster sum of squares (WSS) to decide on the number of clusters, and of plotting the clusters found by the k-means clustering algorithm.

# Import the necessary packages
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Generate 6 artificial clusters for illustration purposes
# Hint: you may need to use the make_blobs and scatter functions; check the
# official Python resources for more information on their usage
# Insert your code block here

# Implement the WSS method for every number of clusters from 1 to 12,
# and plot WSS against the number of clusters.
# Hint: refer to the plots in the lecture slides.
# You may need the inertia_ attribute of the fitted KMeans object, which
# holds the WCSS.
wcss = []
for i in range(1, 13):
    kmeans = KMeans(n_clusters=i, init='k-means++', max_iter=300, n_init=10,
                    random_state=0)
    # Insert your code block here

# Categorize the data using the optimum number of clusters (6)
# determined in the last step, and plot the fitting results.
# Hint: you may need to call fit_predict on kmeans, and scatter
kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
# Insert your code block here
plt.scatter(X[:,0], X[:,1])
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300,
            c='red')
plt.show()
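
One possible way to fill the three gaps in Exercise 1 is sketched below. It is only an illustration under stated assumptions: the sample count (600) and `cluster_std=0.6` passed to `make_blobs` are not given in the exercise, and `plot_results` is a hypothetical helper name introduced here. The KMeans settings are the ones from the snippet.

```python
from matplotlib import pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed data settings (not given in the exercise): 600 points, spread 0.6.
X, _ = make_blobs(n_samples=600, centers=6, cluster_std=0.6, random_state=0)

# WSS for k = 1..12: inertia_ is the within-cluster sum of squares of a fitted model.
k_values = list(range(1, 13))
wcss = []
for k in k_values:
    km = KMeans(n_clusters=k, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

# Final model with the number of clusters read off the elbow plot (6 here).
kmeans = KMeans(n_clusters=6, init='k-means++', max_iter=300, n_init=10,
                random_state=0)
pred_y = kmeans.fit_predict(X)


def plot_results():
    """Draw the elbow curve and the clustered points side by side."""
    fig, (ax_elbow, ax_clusters) = plt.subplots(1, 2, figsize=(11, 4))
    ax_elbow.plot(k_values, wcss, marker='o')
    ax_elbow.set(xlabel='Number of clusters', ylabel='WSS', title='Elbow method')
    ax_clusters.scatter(X[:, 0], X[:, 1], c=pred_y, s=15, cmap='viridis')
    ax_clusters.scatter(kmeans.cluster_centers_[:, 0],
                        kmeans.cluster_centers_[:, 1], s=300, c='red')
    ax_clusters.set(title='k-means with 6 clusters')
    plt.show()
```

Calling `plot_results()` in a notebook cell shows both figures. The elbow is the value of k after which WSS stops dropping sharply.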

Exercise 2 For the following code blocks and plots, run the code first; then provide your

interpretation/explanation for the required parts.

k-means on digits

We will attempt to use k-means to try to identify similar digits without using the original

label information; this might be similar to a first step in extracting meaning from a new

dataset about which you don't have any a priori label information.

We will start by loading the digits and then finding the k-Means clusters. The digits

consist of 1,797 samples with 64 features, where each of the 64 features is the

brightness of one pixel in an 8×8 image.

import seaborn as sns; sns.set() # for plot styling

from sklearn.datasets import load_digits

digits = load_digits()

digits.data.shape

## Provide your interpretation/explanation for the following block

kmeans = KMeans(n_clusters=10, random_state=0)

clusters = kmeans.fit_predict(digits.data)

kmeans.cluster_centers_.shape

## Provide your interpretation/explanation for the following block

fig, ax = plt.subplots(2, 5, figsize=(8, 3))

centers = kmeans.cluster_centers_.reshape(10, 8, 8)

for axi, center in zip(ax.flat, centers):
    axi.set(xticks=[], yticks=[])
    axi.imshow(center, interpolation='nearest', cmap=plt.cm.binary)

from scipy.stats import mode

labels = np.zeros_like(clusters)

for i in range(10):
    mask = (clusters == i)
    labels[mask] = mode(digits.target[mask])[0]

from sklearn.metrics import accuracy_score

accuracy_score(digits.target, labels)

## Provide your interpretation/explanation for the following block

from sklearn.metrics import confusion_matrix

mat = confusion_matrix(digits.target, labels)

sns.heatmap(mat.T, square=True, annot=True, fmt='d', cbar=False,
            xticklabels=digits.target_names,
            yticklabels=digits.target_names)

plt.xlabel('true label')

plt.ylabel('predicted label');

______________________________________________________

Activity two

Exercise 1 What is the Apriori property? Answer in one or two sentences and provide a simple example.

Exercise 2 Following is a list of five transactions that include items A, B, C, and D:

T1: {A, B, C}

T2: {A, C}

T3: {B, C}

T4: {A, D}

T5: {A, C, D}

Which itemsets satisfy the minimum support of 0.5? Include your deduction process.

Hint: Given an itemset L, the “support” of L is the percentage of transactions containing L. To meet the support criterion of 0.5, you need to find the itemsets that appear in at least 50% of the transactions.
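
The deduction for Activity two, Exercise 2 can be checked with a small level-wise search in the spirit of Apriori. This is a minimal sketch, not a full Apriori implementation: the function names `support` and `frequent_itemsets` are made up for this illustration, and candidates of size k+1 are only built from frequent itemsets of size k, which is where the Apriori property is used.

```python
from itertools import combinations

# The five transactions from the exercise.
transactions = [
    {"A", "B", "C"},  # T1
    {"A", "C"},       # T2
    {"B", "C"},       # T3
    {"A", "D"},       # T4
    {"A", "C", "D"},  # T5
]
MIN_SUPPORT = 0.5


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-1 itemsets first, then larger candidates."""
    items = sorted(set().union(*transactions))
    current = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while current:
        # Keep only the candidates that meet the minimum support.
        level = {c: support(c, transactions) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        result.update(level)
        # Apriori property: a (k+1)-itemset can only be frequent if all of
        # its k-item subsets are frequent, so prune every other candidate.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result


result = frequent_itemsets(transactions, MIN_SUPPORT)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
# ['A'] 0.8
# ['C'] 0.8
# ['A', 'C'] 0.6
```

B and D each appear in only 2 of 5 transactions (support 0.4), so they are dropped at the first level and no itemset containing them needs to be counted.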

_________________________________________________________________________________

Activity three

Exercise 1 In the Income linear regression example, consider the distribution of the outcome variable Income. Income values tend to be highly skewed to the right (the distribution has a long right tail).

Does such a non-normally distributed outcome variable violate the general assumption

of a linear regression model? Provide your supporting arguments.
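
One argument often made here is that the normality assumption in linear regression concerns the error term, not the outcome variable itself. The sketch below illustrates that point on synthetic data only; it does not use the Income dataset, and the lognormal predictor, the coefficients 2 and 3, and the helper `skewness` are all assumptions chosen for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def skewness(values):
    """Sample skewness: mean cubed deviation divided by std cubed."""
    centered = values - values.mean()
    return float(np.mean(centered ** 3) / values.std() ** 3)


# Synthetic data: a right-skewed predictor and normally distributed errors.
n = 5000
x = rng.lognormal(mean=0.0, sigma=1.0, size=n)
noise = rng.normal(0.0, 1.0, size=n)
y = 2.0 + 3.0 * x + noise  # the outcome inherits the skew of x

# Ordinary least squares fit of y on x (np.polyfit returns slope first).
slope, intercept = np.polyfit(x, y, deg=1)
residuals = y - (intercept + slope * x)

outcome_skew = skewness(y)
residual_skew = skewness(residuals)
print(f"skewness of outcome:   {outcome_skew:.2f}")
print(f"skewness of residuals: {residual_skew:.2f}")
```

In this constructed case the outcome is strongly right-skewed while the residuals are roughly symmetric. For real income data the residuals should still be inspected, since skewed outcomes often come with non-constant error variance, and a log transform of the outcome is a common remedy.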

Exercise 2: Describe how logistic regression can be used as a classifier.
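
As an illustration for Exercise 2, logistic regression produces a probability for the positive class, and a classifier is obtained by comparing that probability with a threshold. The sketch below uses scikit-learn's `LogisticRegression` on hypothetical toy data (hours studied versus pass/fail); the data and the 0.5 threshold are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: hours studied -> fail (0) or pass (1).
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5],
              [3.0], [3.5], [4.0], [4.5], [5.0]])
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# Step 1: the model outputs P(y = 1 | x) through the logistic function.
proba = model.predict_proba(X)[:, 1]

# Step 2: thresholding the probability turns it into a class label.
threshold = 0.5
pred = (proba >= threshold).astype(int)

print(np.round(proba, 2))
print(pred)
```

Raising or lowering `threshold` trades false positives against false negatives; `model.predict` corresponds to the 0.5 threshold used here.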

Exercise 3: If the probability of an event occurring is 0.4, then

a. What is the odds ratio?

b. What is the log odds ratio?

__________________________________________________________________________
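
For Exercise 3 of Activity three, the arithmetic can be checked in a few lines. This sketch assumes the question is asking for the odds p / (1 - p) of the event and for the natural logarithm of those odds.

```python
import math

p = 0.4

# Odds: probability of the event divided by probability of its complement.
odds = p / (1 - p)

# Log odds (logit): natural logarithm of the odds.
log_odds = math.log(odds)

# Sanity check: the logistic function maps the log odds back to p.
p_back = 1 / (1 + math.exp(-log_odds))

print(f"odds     = {odds:.4f}")      # odds     = 0.6667
print(f"log odds = {log_odds:.4f}")  # log odds = -0.4055
```

A negative log odds value means the event is less likely than not, which matches p < 0.5.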

Activity four

Exercise 1 We have three observed points (23, 41), (67, 84), (78, 100).

Question: fit them to a linear model: Y = β0 + β1X.

For ease of computation, we use the residual sum of squares (RSS) as the loss function to estimate the parameters:

RSS(β0, β1) = Σ (y_i − (β0 + β1 x_i))², summed over the three observed points.

Set the learning rate λ = 0.00001 and the initial guess for the parameters to β0 = 7, β1 = 1.

You only need to provide the first three iterations using Gradient Descent.

Calculate the results manually.

Hint:

1. Compose the loss function based on the RSS formula

2. Take the partial derivatives of the RSS function

3. Substitute the initialized parameters to check the RSS
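
The three hint steps can be written out as follows, with the intercept and slope written as \beta_0 and \beta_1 and the sums running over the three observed points. The last line evaluates the loss at the initial guess (7, 1), where the residuals are 11, 10 and 15.

```latex
\begin{aligned}
\mathrm{RSS}(\beta_0,\beta_1) &= \sum_{i=1}^{3}\bigl(y_i-\beta_0-\beta_1 x_i\bigr)^2 \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_0} &= -2\sum_{i=1}^{3}\bigl(y_i-\beta_0-\beta_1 x_i\bigr) \\
\frac{\partial\,\mathrm{RSS}}{\partial \beta_1} &= -2\sum_{i=1}^{3} x_i\bigl(y_i-\beta_0-\beta_1 x_i\bigr) \\
\mathrm{RSS}(7,1) &= 11^2+10^2+15^2 = 446
\end{aligned}
```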

Iteration 1:

a) Plug β0 = 7, β1 = 1 into the partial derivative equations for RSS

b) Compute the step size, use the provided learning rate

c) Update the parameters (check the lecture slides for the equation)

d) Re-compute the RSS to check the loss change

Repeat the steps of Iteration 1 for two more iterations to see the trend in the parameters and the loss
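
The manual calculation can be cross-checked with a short script. This is a sketch under one assumption: the update rule from the lecture slides is not reproduced here, so the standard rule "new parameter = old parameter − learning rate × partial derivative" is used, with the step size defined as learning rate × partial derivative. The names `b0`, `b1`, `rss`, `gradient` and `history` are chosen for this illustration.

```python
points = [(23, 41), (67, 84), (78, 100)]
learning_rate = 0.00001
b0, b1 = 7.0, 1.0  # initial guess for intercept and slope


def rss(b0, b1):
    """Residual sum of squares of the line y = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in points)


def gradient(b0, b1):
    """Partial derivatives of RSS with respect to b0 and b1."""
    d_b0 = -2 * sum(y - (b0 + b1 * x) for x, y in points)
    d_b1 = -2 * sum(x * (y - (b0 + b1 * x)) for x, y in points)
    return d_b0, d_b1


# Row 0 holds the starting point; rows 1..3 hold the three iterations.
history = [(0, b0, b1, rss(b0, b1))]
for iteration in range(1, 4):
    d_b0, d_b1 = gradient(b0, b1)
    step_b0 = learning_rate * d_b0  # step size for the intercept
    step_b1 = learning_rate * d_b1  # step size for the slope
    b0, b1 = b0 - step_b0, b1 - step_b1  # assumed update rule
    history.append((iteration, b0, b1, rss(b0, b1)))

for it, p0, p1, loss in history:
    print(f"iteration {it}: b0={p0:.5f}, b1={p1:.5f}, RSS={loss:.3f}")
```

At the starting point the partial derivatives are −72 and −4186, so the first update gives β0 = 7.00072 and β1 = 1.04186, and the RSS falls from 446 to about 290.19. The loss should keep decreasing over the next two iterations.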
