NPTEL Introduction to Machine Learning | Week 2 Assignment Answer

Welcome to this comprehensive guide where I present the answers to the Week 2 assessment of NPTEL's Introduction to Machine Learning course. This course is designed to lay the groundwork for understanding the core concepts and methodologies that drive modern machine learning applications.

Machine learning, a cornerstone of artificial intelligence, empowers computers to learn from data and make informed decisions without explicit programming. Week 2 of this course focuses on linear regression and related topics: least-squares estimation, subset selection, shrinkage methods such as ridge and lasso, and dimensionality-reduction approaches such as principal component regression (PCR) and partial least squares (PLS).

Throughout this post, I address each multiple-choice question (MCQ) from the Week 2 assignment. The answers are based on a careful study of the course materials, with the aim of accuracy and clarity in understanding fundamental ML concepts.

Whether you are a student aspiring to enter the field of data science, a professional seeking to enhance your skills, or an enthusiast curious about the intricacies of machine learning, this blog aims to serve as a valuable resource. By elucidating the rationale behind each answer, I aim to facilitate a deeper comprehension of machine learning fundamentals and their practical applications.

Here are the Introduction to Machine Learning Week 2 Assignment answers.

Q1. Which of the following is/are unsupervised learning problem(s)?

  • Grouping documents into different categories based on their topics
  • Forecasting the hourly temperature in a city based on historical temperature patterns
  • Identifying close-knit communities of people in a social network
  • Training an autonomous agent to drive a vehicle
  • Identifying different species of animals from images

Answer: A, C
Grouping documents into different categories based on their topics
Identifying close-knit communities of people in a social network

Q1. The parameters obtained in linear regression

  • can take any value in the real space
  • are strictly integers
  • always lie in the range [0,1]
  • can take only non-zero values

Answer: can take any value in the real space
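
The answer reflects that least-squares coefficients are unconstrained real numbers. As a quick illustration (the numbers below are made up, not from the assignment), a fit can return negative and fractional parameters:

```python
import numpy as np

# Hypothetical data, used only to show that fitted parameters are real-valued.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([5.1, 3.9, 3.2, 1.8, 1.1])   # roughly decreasing trend

a, b = np.polyfit(x, y, deg=1)             # least-squares fit of y = a*x + b
print(a, b)                                # e.g. a ≈ -1.0, b ≈ 5.0
```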

Q2. Suppose that we have N independent variables (X1, X2, …, Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation of X1 with Y is -0.005.

  • Regressing Y on X1 mostly does not explain away Y
  • Regressing Y on X1 explains away Y
  • The given data is insufficient to determine if regressing Y on X1 explains away Y or not
  • None of the above

Answer: Regressing Y on X1 mostly does not explain away Y
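
For context on how little a correlation of -0.005 explains: in a simple regression of Y on X1 alone, the coefficient of determination R² equals the squared correlation coefficient, which the one-liner below evaluates.

```python
# R^2 of a simple regression of Y on X1 equals the squared correlation.
r = -0.005
print(r ** 2)   # 2.5e-05, i.e. X1 would explain about 0.0025% of the variance in Y
```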

Q3. Consider the following 4 training examples. We want to learn a function f(x) = ax + b which is parametrized by (a, b). Using squared error as the loss function, which of the following parameters would you use to model this function?

  • (1,1)
  • (1,2)
  • (2,1)
  • (2,2)

Answer: (1,1)
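
Since the four training examples are not reproduced above, the sketch below uses hypothetical data (the x and y arrays are illustrative only) to show how each candidate (a, b) pair would be scored under the squared-error loss.

```python
import numpy as np

# Hypothetical training examples, standing in for the omitted table.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.1, 2.9, 4.2])

def sse(a, b):
    """Sum of squared errors of f(x) = a*x + b on the training examples."""
    return float(np.sum((a * x + b - y) ** 2))

for a, b in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print((a, b), sse(a, b))
# The pair with the smallest squared error is the one to report.
```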

Q4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning class was found to be: Grade = 30.5 + 15.2(h). How will a student's grade be affected if she studies for four hours, compared to not studying?

  • It will go down by 30.4 points
  • It will go up by 60.8 points
  • The grade will remain unchanged
  • It cannot be determined from the information given

Answer: It will go up by 60.8 points
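
The arithmetic behind this answer: only the slope matters for a change in study time, so four extra hours of study add 15.2 × 4 = 60.8 points to the predicted grade.

```python
# Predicted grade change for h = 4 versus h = 0 under Grade = 30.5 + 15.2*h.
slope, hours = 15.2, 4
print(round(slope * hours, 1))   # 60.8
```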

Q5. Which of the statements is/are True?

  • Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
  • Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
  • Ridge regression may reduce the number of variables.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Answer: If there are two or more highly collinear variables, Lasso will select one of them randomly.
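
To see how the two penalties behave, here is a small sketch (synthetic data, assumed purely for illustration) contrasting Ridge and Lasso on collinear features: Lasso can set coefficients exactly to zero, while Ridge only shrinks them.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.01 * rng.normal(size=200)      # highly collinear with x1
x3 = rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
y = 3 * x1 + 0.5 * x3 + rng.normal(scale=0.1, size=200)

print(Ridge(alpha=1.0).fit(X, y).coef_)    # spreads weight over x1 and x2, none exactly zero
print(Lasso(alpha=0.1).fit(X, y).coef_)    # typically keeps one of x1/x2 and zeroes the other
```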

Q6. Consider the following statements:

  • Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.
  • Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then.

Answer: Both the statements are False
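
For readers who want to see why the two statements are swapped, below is a rough sketch (synthetic data and simplified selection rules, both my assumptions rather than the course's exact algorithms). Forward stagewise repeatedly regresses the residual on the single variable most correlated with it, while forward stepwise adds variables one at a time and refits to get the best fit so far.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = 2 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.5, size=100)

def forward_stagewise(X, y, steps=50):
    """Pick the variable most correlated with the residual, regress the residual on it, repeat."""
    coef = np.zeros(X.shape[1])
    residual = y - y.mean()
    for _ in range(steps):
        corrs = np.array([np.corrcoef(X[:, j], residual)[0, 1] for j in range(X.shape[1])])
        j = int(np.argmax(np.abs(corrs)))                  # most correlated with the residual
        delta = X[:, j] @ residual / (X[:, j] @ X[:, j])   # simple regression coefficient
        coef[j] += delta
        residual = residual - delta * X[:, j]
    return coef

def forward_stepwise(X, y, k=2):
    """Add variables one by one, refitting to give the best fit so far."""
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = float(np.sum((y - X[:, cols] @ beta) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

print(forward_stagewise(X, y))   # coefficients built up by small residual regressions
print(forward_stepwise(X, y))    # indices of the variables added, e.g. [0, 3]
```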

Q7. The linear regression model y=a0+a1x1+a2x2+…+apxp is to be fitted to a set of N training data points having p attributes each. Let X be the N×(p+1) matrix of input values (augmented by 1's), Y be the N×1 vector of target values, and θ be the (p+1)×1 vector of parameter values (a0, a1, a2, …, ap). If the sum of squared errors is minimized for obtaining the optimal regression model, which of the following equations holds?

  • XᵀX = XY
  • Xθ = XᵀY
  • XᵀXθ = Y
  • XᵀXθ = XᵀY

Answer: XᵀXθ = XᵀY
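
The answer is the normal equation: minimizing the sum of squared errors ||Y - Xθ||² sets the gradient -2Xᵀ(Y - Xθ) to zero, which rearranges to XᵀXθ = XᵀY. A quick numerical check on synthetic data (assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(50, 3))
X = np.column_stack([np.ones(50), A])          # augment inputs with a column of 1's
Y = X @ np.array([1.0, 2.0, -0.5, 0.3]) + rng.normal(scale=0.1, size=50)

theta, *_ = np.linalg.lstsq(X, Y, rcond=None)  # least-squares parameter estimate
print(np.allclose(X.T @ X @ theta, X.T @ Y))   # True: the normal equation holds
```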

Q8. Choose the correct option(s) from the following:

  • When working with a small dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers
  • When working with a small dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers
  • When working with a large dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers
  • When working with a large dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers

Answer:
  • When working with a small dataset, one should prefer high bias/low variance classifiers over low bias/high variance classifiers
  • When working with a large dataset, one should prefer low bias/high variance classifiers over high bias/low variance classifiers

Session: JULY-DEC 2023

Q1. The parameters obtained in linear regression

  • can take any value in the real space
  • are strictly integers
  • always lie in the range [0,1]
  • can take only non-zero values

Answer: can take any value in the real space

Q2. Suppose that we have N independent variables (X1,X2,…Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (Say X1) with Y is -0.005.

  • Regressing Y on X1 mostly does not explain away Y
  • Regressing Y on X1 explains away Y
  • The given data is insufficient to determine if regressing Y on X1 explains away Y or not

Answer: Regressing Y on X1 mostly does not explain away Y

Q3. Which of the following is a limitation of subset selection methods in regression?

  • They tend to produce biased estimates of the regression coefficients.
  • They cannot handle datasets with missing values.
  • They are computationally expensive for large datasets.
  • They assume a linear relationship between the independent and dependent variables.
  • They are not suitable for datasets with categorical predictors.

Answer: They are computationally expensive for large datasets.
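
The cost concern is easy to quantify: exhaustive best-subset selection fits one model for every subset of the p predictors, i.e. 2^p candidate models, which grows very quickly with p.

```python
# Number of candidate models in exhaustive best-subset selection.
for p in (10, 20, 30):
    print(p, 2 ** p)   # 1024, 1048576, 1073741824
```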

Q4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning class was found to be: Grade = 30.5 + 15.2(h). How will a student's grade be affected if she studies for four hours?

  • It will go down by 30.4 points.
  • It will go up by 60.8 points.
  • The grade will remain unchanged.
  • It cannot be determined from the information given.

Answer: It will go up by 60.8 points.

Q5. Which of the statements is/are True?

  • Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
  • Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
  • Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Answer:
  • Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Q6. Find the mean of squared error for the given predictions:

  • 1
  • 2
  • 1.5
  • 0

Answer: 1
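
The table of target and predicted values is not reproduced in this post, so the sketch below uses hypothetical arrays purely to show how the mean squared error would be computed for such a table.

```python
import numpy as np

# Hypothetical targets and predictions, standing in for the omitted table.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 1.0, 4.0, 3.0])
print(np.mean((y_true - y_pred) ** 2))   # 1.0 for this made-up example
```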

Q7. Consider the following statements:

  • Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.
  • Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then.

Answer: Both the statements are False.

Q8. The linear regression model y=a0+a1x1+a2x2+…+apxp is to be fitted to a set of N training data points having p attributes each. Let X be the N×(p+1) matrix of input values (augmented by 1's), Y be the N×1 vector of target values, and θ be the (p+1)×1 vector of parameter values (a0, a1, a2, …, ap). If the sum of squared errors is minimized for obtaining the optimal regression model, which of the following equations holds?

  • XᵀX = XY
  • Xθ = XᵀY
  • XᵀXθ = Y
  • XᵀXθ = XᵀY

Answer: XᵀXθ = XᵀY

Q9. Which of the following statements is true regarding Partial Least Squares (PLS) regression?

  • PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
  • PLS is only applicable when there is no multicollinearity among the independent variables.
  • PLS can handle situations where the number of predictors is larger than the number of observations.
  • PLS estimates the regression coefficients by minimizing the residual sum of squares.
  • PLS is based on the assumption of normally distributed residuals.
  • All of the above.
  • None of the above.

Answer: PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
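
A minimal sketch of this behaviour with synthetic data (assumed for illustration): scikit-learn's PLSRegression extracts components that have high covariance with the response, and it still fits when the predictors outnumber the observations.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(30, 100))                 # 30 observations, 100 predictors
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=30)

pls = PLSRegression(n_components=2).fit(X, y)  # works even though p > n
print(pls.predict(X).shape)                    # (30, 1)
```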

Q10. Which of the following statements about principal components in Principal Component Regression (PCR) is true?

  • Principal components are calculated based on the correlation matrix of the original predictors.
  • The first principal component explains the largest proportion of the variation in the dependent variable.
  • Principal components are linear combinations of the original predictors that are uncorrelated with each other.
  • PCR selects the principal components with the highest p-values for inclusion in the regression model.
  • PCR always results in a lower model complexity compared to ordinary least squares regression.

Answer: Principal components are linear combinations of the original predictors that are uncorrelated with each other.
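
A short sketch of principal component regression on synthetic data (assumed for illustration): the component scores are uncorrelated linear combinations of the original predictors, and the regression is then fitted on those scores.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 5))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)         # correlated predictors
y = X @ np.array([1.0, 0.5, 0.0, -1.0, 2.0]) + rng.normal(scale=0.5, size=200)

scores = PCA(n_components=3).fit_transform(X)           # principal component scores
print(np.round(np.corrcoef(scores, rowvar=False), 3))   # ~identity matrix: uncorrelated

pcr = make_pipeline(PCA(n_components=3), LinearRegression()).fit(X, y)
print(pcr.predict(X[:2]))                               # predictions from the PCR pipeline
```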

Session: JAN-APR 2023

Q1. The parameters obtained in linear regression

  • can take any value in the real space
  • are strictly integers
  • always lie in the range [0,1]
  • can take only non-zero values

Answer: can take any value in the real space

Q2. Suppose that we have N independent variables (X1,X2,…Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (Say X1) with Y is -0.005.

  • Regressing Y on X1 mostly does not explain away Y.
  • Regressing Y on X1 explains away Y.
  • The given data is insufficient to determine if regressing Y on X1 explains away Y or not.

Answer: Regressing Y on X1 mostly does not explain away Y.

Q3. Which of the following is a limitation of subset selection methods in regression?

  • They tend to produce biased estimates of the regression coefficients.
  • They cannot handle datasets with missing values.
  • They are computationally expensive for large datasets.
  • They assume a linear relationship between the independent and dependent variables.
  • They are not suitable for datasets with categorical predictors.

Answer: They are computationally expensive for large datasets.

Q4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning Class was found to be: Grade = 30.5 + 15.2 (h). How will a student’s grade be affected if she studies for four hours?

  • It will go down by 30.4 points.
  • It will go up by 60.8 points.
  • The grade will remain unchanged.
  • It cannot be determined from the information given.

Answer: It will go up by 60.8 points.

Q5. Which of the statements is/are True?

  • Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
  • Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
  • Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Answer:
  • Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Q6. Find the mean of squared error for the given predictions:

  • 1
  • 2
  • 1.5
  • 0

Answer: 1

Q7. Consider the following statements:

  • Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.
  • Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then.

Answer: Both the statements are False.

Q8. The linear regression model y=a0+a1x1+a2x2+…+apxp is to be fitted to a set of N training data points having p attributes each. Let X be the N×(p+1) matrix of input values (augmented by 1's), Y be the N×1 vector of target values, and θ be the (p+1)×1 vector of parameter values (a0, a1, a2, …, ap). If the sum of squared errors is minimized for obtaining the optimal regression model, which of the following equations holds?

  • XᵀX = XY
  • Xθ = XᵀY
  • XᵀXθ = Y
  • XᵀXθ = XᵀY

Answer: XᵀXθ = XᵀY

Q9. Which of the following statements is true regarding Partial Least Squares (PLS) regression?

  • PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.
  • PLS is only applicable when there is no multicollinearity among the independent variables.
  • PLS can handle situations where the number of predictors is larger than the number of observations.
  • PLS estimates the regression coefficients by minimizing the residual sum of squares.
  • PLS is based on the assumption of normally distributed residuals.
  • All of the above.
  • None of the above.

Answer: PLS is a dimensionality reduction technique that maximizes the covariance between the predictors and the dependent variable.

Q10. Which of the following statements about principal components in Principal Component Regression (PCR) is true?

  • Principal components are calculated based on the correlation matrix of the original predictors.
  • The first principal component explains the largest proportion of the variation in the dependent variable.
  • Principal components are linear combinations of the original predictors that are uncorrelated with each other.
  • PCR selects the principal components with the highest p-values for inclusion in the regression model.
  • PCR always results in a lower model complexity compared to ordinary least squares regression.

Answer: Principal components are linear combinations of the original predictors that are uncorrelated with each other.

Session: JUL-DEC 2022

Q1. The parameters obtained in linear regression

  • can take any value in the real space
  • are strictly integers
  • always lie in the range [0,1]
  • can take only non-zero values

Answer: can take any value in the real space

Q2. Suppose that we have N independent variables (X1,X2,…Xn) and the dependent variable is Y. Now imagine that you are applying linear regression by fitting the best fit line using the least square error on this data. You found that the correlation coefficient for one of its variables (Say X1) with Y is -0.005.

  • Regressing Y on X1 mostly does not explain away Y.
  • Regressing Y on X1 explains away Y.
  • The given data is insufficient to determine if regressing Y on X1 explains away Y or not.

Answer: Regressing Y on X1 mostly does not explain away Y.

Q3. Consider the following five training examples
We want to learn a function f(x) of the form f(x)=ax+b which is parameterised by (a,b). Using mean squared error as the loss function, which of the following parameters would you use to model this function to get a solution with the minimum loss?

  • (4, 3)
  • (1, 4)
  • (4, 1)
  • (3, 4)

Answer: (3, 4)

Q4. The relation between studying time (in hours) and grade on the final examination (0-100) in a random sample of students in the Introduction to Machine Learning Class was found to be: Grade = 30.5 + 15.2 (h)
How will a student’s grade be affected if she studies for four hours?

  • It will go down by 30.4 points.
  • It will go up by 60.8 points.
  • The grade will remain unchanged.
  • It cannot be determined from the information given.

Answer: It will go up by 60.8 points.

Q5. Which of the statements is/are True?

  • Ridge has sparsity constraint, and it will drive coefficients with low values to 0.
  • Lasso has a closed form solution for the optimization problem, but this is not the case for Ridge.
  • Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.
  • If there are two or more highly collinear variables, Lasso will select one of them randomly.

Answer: Ridge regression does not reduce the number of variables since it never leads a coefficient to zero but only minimizes it.

Q6. Consider the following statements:
Assertion(A): Orthogonalization is applied to the dimensions in linear regression.
Reason(R): Orthogonalization makes univariate regression possible in each orthogonal dimension separately to produce the coefficients.

  • Both A and R are true, and R is the correct explanation of A.
  • Both A and R are true, but R is not the correct explanation of A.
  • A is true, but R is false.
  • A is false, but R is true.

Answer: A is false, but R is true
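
Statement R can be checked directly: once the predictors are orthogonalized, the joint least-squares fit coincides with separate univariate regressions on each orthogonal direction. A small sketch with synthetic data (assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=100)

Q, R = np.linalg.qr(X)                                # columns of Q are orthogonal
gamma_uni = Q.T @ y / np.sum(Q * Q, axis=0)           # one univariate regression per column
gamma_joint, *_ = np.linalg.lstsq(Q, y, rcond=None)   # joint least squares on the orthogonal basis
print(np.allclose(gamma_uni, gamma_joint))            # True
```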

Q7. Consider the following statements:
Statement A: In Forward stepwise selection, in each step, that variable is chosen which has the maximum correlation with the residual, then the residual is regressed on that variable, and it is added to the predictor.
Statement B: In Forward stagewise selection, the variables are added one by one to the previously selected variables to produce the best fit till then

  • Both the statements are True.
  • Statement A is True, and Statement B is False.
  • Statement A is False and Statement B is True.
  • Both the statements are False.

Answer: Both the statements are False.

Q8. The linear regression model y=a0+a1x1+a2x2+…+apxp is to be fitted to a set of N training data points having p attributes each. Let X be the N×(p+1) matrix of input values (augmented by 1's), Y be the N×1 vector of target values, and θ be the (p+1)×1 vector of parameter values (a0, a1, a2, …, ap). If the sum of squared errors is minimized for obtaining the optimal regression model, which of the following equations holds?

Answer: XᵀXθ = XᵀY

Shubhadip Bhowmik

I am a B.C.A. student at Chandigarh University with a passion for technology. I am an Android and front-end web developer. I write blogs, design UI mockups, and contribute to open-source projects.
