In this post I present the answers to the Week 1 assessment of NPTEL's Introduction to Machine Learning course, which lays the groundwork for the core concepts and methods behind modern machine learning applications.
Machine learning, a cornerstone of artificial intelligence, lets computers learn from data and make decisions without being explicitly programmed for each case. The Week 1 questions below cover the types of learning (supervised, unsupervised, and reinforcement), classification versus regression, linear and k-NN regression, and the bias-variance trade-off.
I address each multiple-choice question (MCQ) from the Week 1 assignment in turn. The answers are based on a careful study of the course materials.
Whether you are a student entering data science, a professional building your skills, or simply curious about machine learning, I hope this post serves as a useful reference. By explaining the rationale behind the answers, I aim to support a deeper understanding of machine learning fundamentals and their practical applications.
Here are the Introduction to Machine Learning Week 1 Assignment Answers
Q1. Which of the following is/are unsupervised learning problem(s)?
- Grouping documents into different categories based on their topics
- Forecasting the hourly temperature in a city based on historical temperature patterns
- Identifying close-knit communities of people in a social network
- Training an autonomous agent to drive a vehicle
- Identifying different species of animals from images
Answer: A, C
Grouping documents into different categories based on their topics
Identifying close-knit communities of people in a social network
Q2. Which of the following statement(s) about Reinforcement Learning (RL) is/are true?
- While learning a policy, the goal is to maximize the long-term reward.
- During training, the agent is explicitly provided the most optimal action to be taken in each state.
- The state of the environment changes based on the action taken by the agent.
- RL is used for building agents to play chess.
- RL is used for predicting the prices of apartments from their features.
Answer: A, C, D
While learning a policy, the goal is to maximize the long-term reward.
The state of the environment changes based on the action taken by the agent.
RL is used for building agents to play chess.
Q3. Which of the following is/are classification task(s)?
- Predicting whether an email is spam or not spam
- Predicting the number of COVID cases over a given period
- Predicting the score of a cricket team
- Identifying the language of a text document
Answer: A, D
Predicting whether an email is spam or not spam
Identifying the language of a text document
Q4. Which of the following is/are regression task(s)?
- Predicting whether or not a customer will repay a loan based on their credit history
- Forecasting the amount of rainfall in a given place
- Identifying the types of crops from aerial images of farms
- Predicting the future price of a stock
Answer: B, D
Forecasting the amount of rainfall in a given place
Predicting the future price of a stock
Q5. Consider the following dataset. Fit a linear regression model of the form y = β0 + β1x1 + β2x2 using the mean-squared error loss. Using this model, the predicted value of y at the point (x1,x2)=(0.5,−1.0) is:
- −0.651
- −0.737
- 0.245
- −0.872
Answer: −0.737
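The dataset table for this question is not reproduced in this post, so the answer above cannot be re-derived here. As a rough sketch of the procedure only, the snippet below fits the same model form by least squares (which minimises the mean-squared error) on made-up placeholder data; the arrays `X` and `y` and the helper names are assumptions for illustration, not the assignment's values.

```python
import numpy as np

# Hypothetical placeholder data, NOT the assignment's table.
# The targets were generated from y = 1 + 2*x1 - x2 with no noise.
X = np.array([[0.0, 1.0], [1.0, 0.0], [2.0, 1.0], [1.0, 2.0], [3.0, 3.0]])
y = np.array([0.0, 3.0, 4.0, 1.0, 4.0])


def fit_linear_regression(X, y):
    """Return [b0, b1, b2] minimising the mean-squared error."""
    # Prepend a column of ones so that b0 acts as the intercept.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, point):
    """Evaluate b0 + b1*x1 + b2*x2 at a single point."""
    return beta[0] + np.dot(beta[1:], point)


beta = fit_linear_regression(X, y)
prediction = predict(beta, np.array([0.5, -1.0]))
print(beta, prediction)
```

Swapping in the assignment's actual table for `X` and `y` should give the prediction to compare against the listed options.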
Q6. Consider the following dataset. Using a k-nearest neighbour (k-NN) regression model with k=3, predict the value of y at (x1,x2)=(0.5,−1.0). Use the Euclidean distance to find the nearest neighbours.
- −1.762
- −2.061
- −1.930
- −1.529
Answer: −1.930
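As with the previous question, the dataset is not shown in this post. The sketch below only illustrates the k-NN regression procedure (find the k closest training points by Euclidean distance, then average their targets); `X_train`, `y_train`, and `knn_regress` are hypothetical names and values chosen for illustration.

```python
import numpy as np


def knn_regress(X_train, y_train, query, k=3):
    """Average the targets of the k training points closest to query."""
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest
    return y_train[nearest].mean()


# Hypothetical placeholder data, NOT the assignment's table.
X_train = np.array([[0.0, 0.0], [1.0, -1.0], [0.0, -1.0], [3.0, 3.0], [-2.0, 2.0]])
y_train = np.array([1.0, 2.0, 3.0, 10.0, -5.0])

pred = knn_regress(X_train, y_train, np.array([0.5, -1.0]), k=3)
print(pred)
```

Note that the training data must be kept around at prediction time, which is the point made in Q7 below.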
Q7. Consider the following statements regarding linear regression and k-NN regression models. Select the true statements.
- A linear regressor requires the training data points during inference.
- A k-NN regressor requires the training data points during inference.
- A k-NN regressor with a higher value of k is less prone to overfitting.
- A linear regressor partitions the input space into multiple regions such that the prediction over a given region is constant.
Answer: B, C
A k-NN regressor requires the training data points during inference.
A k-NN regressor with a higher value of k is less prone to overfitting.
Q8. Consider a binary classification problem where we are given certain measurements from a blood test and need to predict whether the patient does not have a particular disease (class 0) or has the disease (class 1). In this problem, false negatives (incorrectly predicting that the patient is healthy) have more serious consequences as compared to false positives (incorrectly predicting that the patient has the disease). Which of the following is an appropriate cost matrix for this classification problem? The row denotes the true class and the column denotes the predicted class.
- [0 0 100 0]
- [0 1 100 0]
- [0 1 1 0]
- [0 100 1 0]
- [0 100 0 0]
Answer: B. [0 1 100 0]
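Reading [0 1 100 0] row by row as a 2×2 matrix (row = true class, column = predicted class), a false positive costs 1 and a false negative costs 100. A small sketch, with made-up label lists and a hypothetical `total_cost` helper, shows how such a matrix penalises the two kinds of mistake:

```python
import numpy as np

# cost[true_class, predicted_class], read row by row from [0 1 100 0]
cost = np.array([[0, 1],
                 [100, 0]])


def total_cost(y_true, y_pred, cost):
    """Sum the cost-matrix entries over all (true, predicted) pairs."""
    return int(sum(cost[t, p] for t, p in zip(y_true, y_pred)))


# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1]
one_false_positive = [0, 1, 1, 1]  # a healthy patient flagged as diseased
one_false_negative = [0, 0, 0, 1]  # a diseased patient flagged as healthy

fp_cost = total_cost(y_true, one_false_positive, cost)
fn_cost = total_cost(y_true, one_false_negative, cost)
print(fp_cost, fn_cost)
```

A single missed diagnosis is charged one hundred times more than a single false alarm, which matches the requirement stated in the question.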
Q9. Consider the following dataset with three classes: 0, 1 and 2. x1 and x2 are the independent variables whereas y is the class label. Using a k-NN classifier with k = 3, predict the class label at the point (x1,x2)=(0.7,−0.8). Use the Euclidean distance to find the nearest neighbours.
- 0
- 1
- 2
- Cannot be predicted
Answer: 1
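The dataset for this question is also not included in the post. The following sketch shows the k-NN classification rule (majority vote among the k nearest neighbours) on placeholder data; the arrays and the `knn_classify` helper are assumptions for illustration.

```python
from collections import Counter

import numpy as np


def knn_classify(X_train, y_train, query, k=3):
    """Return the most common label among the k nearest training points."""
    dists = np.linalg.norm(X_train - query, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical placeholder data, NOT the assignment's table.
X_train = np.array([[0.5, -1.0], [1.0, -0.5], [0.8, -0.7], [-1.0, 1.0], [2.0, 2.0]])
y_train = np.array([1, 1, 0, 2, 2])

label = knn_classify(X_train, y_train, np.array([0.7, -0.8]), k=3)
print(label)
```

With three classes and k = 3, a three-way tie is possible in general; this sketch simply returns whichever tied label `Counter` lists first, which is one of several reasonable tie-breaking choices.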
Q10. Suppose that we train two kinds of regression models corresponding to the following equations.
(i) y = β0 + β1x1 + β2x2
(ii) y = β0 + β1x1 + β2x2 + β3x1x2
Which of the following statement(s) is/are correct?
- On a given training dataset, the mean-squared error of (i) is always greater than or equal to that of (ii).
- (i) is likely to have a higher variance than (ii).
- (ii) is likely to have a higher variance than (i).
- If (ii) overfits the data, then (i) will definitely overfit.
- If (ii) underfits the data, then (i) will definitely underfit.
Answer: A, C, E
On a given training dataset, the mean-squared error of (i) is always greater than or equal to that of (ii).
(ii) is likely to have a higher variance than (i).
If (ii) underfits the data, then (i) will definitely underfit.
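The first statement holds because model (i) is the special case of model (ii) with β3 = 0, so the least-squares fit of (ii) can never do worse on the training set. A quick numerical check on synthetic data (the data-generating process and the `training_mse` helper are assumptions for illustration) could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for illustration only.
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
y = 1.0 + 2.0 * x1 - x2 + 0.5 * x1 * x2 + rng.normal(scale=0.3, size=50)


def training_mse(features, y):
    """Fit least squares with an intercept and return the training MSE."""
    A = np.column_stack([np.ones(len(y))] + features)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ beta) ** 2))


mse_i = training_mse([x1, x2], y)            # model (i)
mse_ii = training_mse([x1, x2, x1 * x2], y)  # model (ii), adds the x1*x2 term
print(mse_i, mse_ii)
```

This only compares training error. The extra term gives (ii) more flexibility, which is also why it is the one more likely to have higher variance.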
Introduction to Machine Learning - July-Dec 2023 Assignment Answers
Q1. Which of the following is a supervised learning problem?
- Grouping related documents from an unannotated corpus.
- Predicting credit approval based on historical data.
- Predicting if a new image contains a cat or a dog based on the historical data of other images of cats and dogs, where you are supplied the information about which image is a cat or a dog.
- Fingerprint recognition of a particular person used in biometric attendance from the fingerprint data of various other people and that particular person.
Answer: B, C, D
Q2. Which of the following are classification problems?
- Predict the runs a cricketer will score in a particular match.
- Predict which team will win a tournament.
- Predict whether it will rain today.
- Predict your mood tomorrow.
Answer: B, C, D
Q3. Which of the following is a regression task?
- Predicting the monthly sales of a cloth store in rupees.
- Predicting if a user would like to listen to a newly released song or not based on historical data.
- Predicting the confirmation probability (in fraction) of your train ticket whose current status is waiting list based on historical data.
- Predicting if a patient has diabetes or not based on historical medical records.
- Predicting if a customer is satisfied or unsatisfied from the product purchased from an ecommerce website using the reviews he/she wrote for the purchased product.
Answer: A, C
Q4. Which of the following is an unsupervised learning task?
- Group audio files based on the language of the speakers.
- Group applicants to a university based on their nationality.
- Predict a student’s performance in the final exams.
- Predict the trajectory of a meteorite.
Answer: A, B
Q5. Which of the following is a categorical feature?
- Number of rooms in a hostel.
- Gender of a person.
- Your weekly expenditure in rupees.
- Ethnicity of a person.
- Area (in sq. centimeter) of your laptop screen.
- The color of the curtains in your room.
- Number of legs an animal has.
- Minimum RAM requirement (in GB) of a system to play a game like FIFA, DOTA.
Answer: B, D, F
Q6. Which of the following is a reinforcement learning task?
- Learning to drive a cycle
- Learning to predict stock prices
- Learning to play chess
- Learning to predict spam labels for e-mails
Answer: A, C
Q7. Let X and Y be uniformly distributed random variables over the intervals [0,4] and [0,6] respectively. If X and Y are independent, compute the probability P(max(X,Y)>3).
- 1/6
- 5/6
- 2/3
- 1/2
- 2/6
- 5/8
- None of the above
Answer: 5/8
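The reasoning is that max(X,Y) ≤ 3 happens exactly when both X ≤ 3 and Y ≤ 3, and independence lets the two probabilities be multiplied: P(max(X,Y) > 3) = 1 − P(X ≤ 3)·P(Y ≤ 3) = 1 − (3/4)(3/6). A short exact check with Python's `fractions` module (the helper name `cdf_uniform` is hypothetical):

```python
from fractions import Fraction


def cdf_uniform(t, upper):
    """P(U <= t) for U uniform on [0, upper], with integer t and upper."""
    t = min(max(t, 0), upper)  # clamp t into [0, upper]
    return Fraction(t, upper)


# P(max(X, Y) > 3) = 1 - P(X <= 3) * P(Y <= 3), using independence
p = 1 - cdf_uniform(3, 4) * cdf_uniform(3, 6)
print(p)
```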
Q8. Find the mean of 0-1 loss for the given predictions:
- 1
- 0
- 1.5
- 0.5
Answer: 0.5
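The prediction table for this question is not reproduced in the post. For reference, the mean 0-1 loss is simply the fraction of predictions that differ from the true label; the sketch below uses made-up labels and a hypothetical helper name to show the computation.

```python
def mean_zero_one_loss(y_true, y_pred):
    """Fraction of positions where the prediction differs from the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mistakes = sum(t != p for t, p in zip(y_true, y_pred))
    return mistakes / len(y_true)


# Hypothetical labels for illustration only (one mistake out of five).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]

loss = mean_zero_one_loss(y_true, y_pred)
print(loss)
```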
Q9. Which of the following statements are true? Check all that apply.
- A model with more parameters is more prone to overfitting and typically has higher variance.
- If a learning algorithm is suffering from high bias, only adding more training examples may not improve the test error significantly.
- When debugging learning algorithms, it is useful to plot a learning curve to understand if there is a high bias or high variance problem.
- If a neural network has much lower training error than test error, then adding more layers will help bring the test error down because we can fit the test set better.
Answer: A, B, C
Q10. Bias and variance are given by:
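- $E[\hat{f}(x)] - f(x)$, $E[(E[\hat{f}(x)] - \hat{f}(x))^2]$
- $E[\hat{f}(x)] - f(x)$, $E[(E[\hat{f}(x)] - \hat{f}(x))]^2$
- $(E[\hat{f}(x)] - f(x))^2$, $E[(E[\hat{f}(x)] - \hat{f}(x))^2]$
- $(E[\hat{f}(x)] - f(x))^2$, $E[(E[\hat{f}(x)] - \hat{f}(x))]^2$
Answer: $E[\hat{f}(x)] - f(x)$, $E[(E[\hat{f}(x)] - \hat{f}(x))^2]$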
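In words, bias is the gap between the average prediction and the true value, and variance is the average squared deviation of a prediction from that average prediction. Both can be estimated by simulation. The sketch below is only an illustration under stated assumptions: the "model" is a deliberately shrunken sample mean (0.5 times the mean of 10 Gaussian draws), and all names and constants are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

true_value = 2.0        # plays the role of f(x)
n, trials = 10, 20000   # sample size per dataset, number of simulated datasets

# Each row is one simulated training set; each estimate plays the role of f_hat(x).
samples = rng.normal(loc=true_value, scale=1.0, size=(trials, n))
estimates = 0.5 * samples.mean(axis=1)  # deliberately biased estimator

mean_estimate = estimates.mean()                       # approximates E[f_hat(x)]
bias = mean_estimate - true_value                      # E[f_hat(x)] - f(x)
variance = np.mean((mean_estimate - estimates) ** 2)   # E[(E[f_hat(x)] - f_hat(x))^2]
print(bias, variance)
```

For this particular estimator the theoretical values are a bias of −1.0 and a variance of 0.25/10 = 0.025, so the simulated numbers should land close to those.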
Session: JAN-APR 2023
Q1) Which of the following is a supervised learning problem?
- Grouping related documents from an unannotated corpus.
- Predicting credit approval based on historical data.
- Predicting rainfall based on historical data.
- Predicting if a customer is going to return or keep a particular product he/she purchased from an e-commerce website based on the historical data about the customer's purchases and the particular product.
- Fingerprint recognition of a particular person used in biometric attendance from the fingerprint data of various other people and that particular person.
Answer: b, c, d, e
Q2) Which of the following is not a classification problem?
- Predicting the temperature (in Celsius) of a room from other environmental features (such as atmospheric pressure, humidity, etc).
- Predicting if a cricket player is a batsman or bowler given his playing records.
- Predicting the price of a house (in INR) based on data consisting of the prices of other houses (in INR) and their features such as area, number of rooms, location, etc.
- Filtering of spam messages.
- Predicting the weather for tomorrow as “hot,” “cold,” or “rainy” based on historical data on wind speed, humidity, temperature, and precipitation.
Answer: a, c
Q3) Which of the following is a regression task?
- Predicting the monthly sales of a cloth store in rupees.
- Predicting if a user would like to listen to a newly released song or not based on historical data.
- Predicting the confirmation probability (in fraction) of your train ticket whose current status is the waiting list based on historical data.
- Predicting if a patient has diabetes or not based on historical medical records.
- Predicting if a customer is satisfied or unsatisfied with the product purchased from an e-commerce website using the reviews he/she wrote for the purchased product.
Answer: a, c
Q4) Which of the following is an unsupervised task?
- Predicting if a new edible item is sweet or spicy based on the information of the ingredients, their quantities, and labels (sweet or spicy) for many other similar dishes.
- Grouping related documents from an unannotated corpus.
- Grouping of hand-written digits from their image.
- Predicting the time (in days) a PhD student will take to complete his/her thesis to earn a degree based on the historical data such as qualifications, department, institute, research area, and time taken by other scholars to earn the degree.
- All of the above.
Answer: b, c
Q5) Which of the following is a categorical feature?
- Number of rooms in a hostel.
- Minimum RAM requirement (in GB) of a system to play a game like FIFA, DOTA.
- Your weekly expenditure in rupees.
- Ethnicity of a person.
- Area (in sq. centimeters) of your laptop screen.
- The color of the curtains in your room.
Answer: d, f
Q6) Let X and Y be uniformly distributed random variables over the intervals [0, 4] and [0, 6] respectively. If X and Y are independent, compute the probability P(max(X,Y)>3).
- 1/6
- 5/6
- 2/3
- 1/2
- 2/6
- 5/8
- None of the above
Answer: f. 5/8
Q7) Let the trace and determinant of a 2×2 matrix A with entries a, b, c, d be 6 and 16 respectively. The eigenvalues of A are
- 3+i√7/2, 3−i√7/2, where i=√−1
- 1, 3
- 3+i√7/4, 3−i√7/4, where i=√−1
- 1/2, 3/2
- 3+i√7, 3−i√7, where i=√−1
- 2, 8
- None of the above
- Can be computed only if A is a symmetric matrix.
- Cannot be computed as the entries of the matrix A are not given.
Answer: e. 3+i√7, 3−i√7, where i=√−1
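For a 2×2 matrix the entries are not needed: the eigenvalues are the roots of λ² − (trace)λ + (determinant) = 0, here λ² − 6λ + 16 = 0, which gives λ = 3 ± √(9 − 16) = 3 ± i√7. A small check using Python's `cmath` (the function name `eigenvalues_2x2` is hypothetical):

```python
import cmath


def eigenvalues_2x2(trace, det):
    """Roots of lambda^2 - trace*lambda + det = 0 for a 2x2 matrix."""
    disc = cmath.sqrt(trace ** 2 - 4 * det)  # complex square root handles disc < 0
    return (trace + disc) / 2, (trace - disc) / 2


lam1, lam2 = eigenvalues_2x2(6, 16)
print(lam1, lam2)
```

The two roots sum to the trace and multiply to the determinant, which is a handy sanity check for this kind of question.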
Q8) What happens when your model complexity increases?
- Model Bias decreases
- Model Bias increases
- Variance of the model decreases
- Variance of the model increases
Answer: a, d
Q9) A new phone, E-Corp X1 has been announced and it is what you’ve been waiting for, all along. You decide to read the reviews before buying it. From past experiences, you’ve figured out that good reviews mean that the product is good 90% of the time and bad reviews mean that it is bad 70% of the time. Upon glancing through the reviews section, you find out that the X1 has been reviewed 1269 times and only 172 of them were bad reviews. What is the probability that, if you order the X1, it is a bad phone?
- 0.136
- 0.160
- 0.360
- 0.840
- 0.773
- 0.573
- 0.181
Answer: g. 0.181
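One way to reach this number is the law of total probability, assuming the review you happen to rely on is equally likely to be any of the 1269 reviews: P(bad) = P(bad | good review)·P(good review) + P(bad | bad review)·P(bad review) = 0.10·(1097/1269) + 0.70·(172/1269). The variable names below are chosen for illustration:

```python
total_reviews = 1269
bad_reviews = 172
good_reviews = total_reviews - bad_reviews  # 1097

p_bad_given_good_review = 1 - 0.90  # good reviews are right 90% of the time
p_bad_given_bad_review = 0.70       # bad reviews are right 70% of the time

# Law of total probability over the two kinds of review
p_bad = (
    p_bad_given_good_review * good_reviews
    + p_bad_given_bad_review * bad_reviews
) / total_reviews
print(round(p_bad, 3))
```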
Q10) Which of the following are false about bias and variance of overfitted and underfitted models?
- Underfitted models have high bias.
- Underfitted models have low bias.
- Overfitted models have low variance.
- Overfitted models have high variance.
Answer: b, c
Session: JUL-DEC 2022
1. Which of the following are supervised learning problems? (multiple may be correct)
- a. Learning to drive using a reward signal.
- b. Predicting disease from blood sample.
- c. Grouping students in the same class based on similar features.
- d. Face recognition to unlock your phone.
Answer: b, d
2. Which of the following are classification problems? (multiple may be correct)
- a. Predict the runs a cricketer will score in a particular match.
- b. Predict which team will win a tournament.
- c. Predict whether it will rain today.
- d. Predict your mood tomorrow.
Answer: b, c
3. Which of the following is a regression task? (multiple options may be correct)
- a. Predict the price of a house 10 years after it is constructed.
- b. Predict if a house will be standing 50 years after it is constructed.
- c. Predict the weight of food wasted in a restaurant during next month.
- d. Predict the sales of a new Apple product.
Answer: a, c, d
4. Which of the following is an unsupervised learning task? (multiple options may be correct)
- a. Group audio files based on language of the speakers.
- b. Group applicants to a university based on their nationality.
- c. Predict a student’s performance in the final exams.
- d. Predict the trajectory of a meteorite.
Answer: a, b
5. Given below is your dataset. You are using KNN regression with K=3. What is the prediction for a new input value (3, 2)?
Answer: 2.50
6. Which of the following is a reinforcement learning task? (multiple options may be correct)
Answer: a, b, c
7. Find the mean of squared error for the given predictions:
Answer: a
8. Find the mean of 0-1 loss for the given predictions:
Answer: d
9. Bias and variance are given by:
Answer: b
10. Which of the following are true about bias and variance? (multiple options may be correct)
Answer: b, d