Exam (elaborations)

Data Mining I PYQ Solved

Uploaded on

10-04-2025

Written in

2024/2025

Data Mining I PYQ Solved, Second year 4th sem, questions and answers

Content preview

### **Solutions to the Data Mining II Question Paper (4986)**

#### **Section A**

1. **(a) How does the number of clusters affect anomaly detection in k-means clustering? (2 marks)**
- In k-means clustering, the number of clusters ($ k $) directly influences anomaly detection, because anomalies are usually flagged either as points far from their nearest centroid or as members of very small clusters.
- **Fewer Clusters ($ k $ is small)**: Anomalies are absorbed into large, broad clusters, which makes them harder to distinguish from normal points. The clustering underfits the data, so anomalies may not be identified effectively.
- **More Clusters ($ k $ is large)**: Anomalies are more likely to form their own small cluster or be isolated as outliers. This can improve anomaly detection, but it can also overfit, with noise receiving its own clusters and being mistaken for meaningful patterns.
- **Optimal $ k $**: Choosing an appropriate $ k $ is crucial. Techniques like the elbow method or silhouette analysis can help determine a suitable number of clusters for effective anomaly detection [[5]].
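
The effect of $ k $ can be sketched on a toy example. The block below is a minimal illustration, not part of the original answer: the data values, the evenly spaced centroid initialization, and the helper names `kmeans_1d` and `anomaly_view` are all assumptions chosen to keep the run deterministic.

```python
def kmeans_1d(values, k, max_iters=50):
    """Lloyd's algorithm on 1-D data with evenly spaced starting centroids."""
    lo, hi = min(values), max(values)
    if k == 1:
        centroids = [sum(values) / len(values)]
    else:
        centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]

    def assign(cs):
        # Index of the nearest centroid for every value.
        return [min(range(k), key=lambda j: abs(v - cs[j])) for v in values]

    for _ in range(max_iters):
        labels = assign(centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep an empty cluster's centroid where it was.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(centroids)


def anomaly_view(values, k, suspect):
    """Distance of `suspect` to its centroid and the size of its cluster."""
    centroids, labels = kmeans_1d(values, k)
    cluster = labels[values.index(suspect)]
    return abs(suspect - centroids[cluster]), labels.count(cluster)


# Hypothetical data: two tight groups plus one unusual value (7.5).
values = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9, 7.5]

for k in (2, 3):
    distance, size = anomaly_view(values, k, suspect=7.5)
    print(f"k={k}: distance to centroid = {distance:.2f}, cluster size = {size}")
```

On this data, with $ k = 2 $ the unusual value shares a cluster with five normal points and stands out only by its distance to the centroid. With $ k = 3 $ it becomes a cluster of its own, so its distance drops to zero and it has to be flagged by cluster size instead.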

2. **(b) In a dataset of monthly sales figures for a retail store, the mean monthly sales are Rs. 50,000 with a standard deviation of Rs. 5,000. In a
certain month, the store recorded sales of Rs. 65,000. Calculate the z-score for this month's sales. (2 marks)**
- The formula for the z-score is:
\[
z = \frac{x - \mu}{\sigma}
\]
where:
- $ x $ = observed value (Rs. 65,000)
- $ \mu $ = mean (Rs. 50,000)
- $ \sigma $ = standard deviation (Rs. 5,000)
- Substituting the values:
\[
z = \frac{65{,}000 - 50{,}000}{5{,}000} = \frac{15{,}000}{5{,}000} = 3
\]
- **Answer**: The z-score for this month's sales is $ \boxed{3} $.
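
The same calculation can be checked in a few lines of Python. This is a minimal sketch; the function name `z_score` is an illustrative choice, not something from the question paper.

```python
def z_score(x, mean, std):
    """Number of standard deviations by which x lies above the mean."""
    if std <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std


# Values from the question: sales of Rs. 65,000, mean Rs. 50,000, std Rs. 5,000.
z = z_score(65_000, 50_000, 5_000)
print(z)  # 3.0
```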

3. **(c) Consider a dataset with binary labels. The dataset is trained using Adaboost method. The decision boundary obtained after one
iteration is shown in Figure II.**
- **(i) Which points shall have higher weights? Justify your answer. (2 marks)**
- In AdaBoost, misclassified points are given higher weights after each iteration so that the next weak learner focuses on the difficult cases, while the weights of correctly classified points shrink after normalization. In Figure II, the points that lie on the wrong side of the decision boundary will therefore have higher weights. Specifically, the points marked with triangles ($ \Delta $) that are misclassified by the current weak learner will receive higher weights.
- **Answer**: Points marked with triangles ($ \Delta $) will have higher weights because they are misclassified by the current weak learner.
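
The reweighting step can be sketched as follows. Figure II is not reproduced here, so the labels and predictions below are hypothetical, and the update shown is the common discrete AdaBoost rule with labels in {-1, +1}; other variants scale the weights slightly differently.

```python
import math


def adaboost_reweight(weights, y_true, y_pred):
    """One discrete-AdaBoost weight update; returns (alpha, new_weights)."""
    # Weighted error of the current weak learner.
    err = sum(w for w, t, p in zip(weights, y_true, y_pred) if t != p)
    if not 0 < err < 0.5:
        raise ValueError("weak learner error must lie strictly between 0 and 0.5")
    alpha = 0.5 * math.log((1 - err) / err)
    # Misclassified points are scaled up, correctly classified points scaled down.
    scaled = [w * math.exp(alpha if t != p else -alpha)
              for w, t, p in zip(weights, y_true, y_pred)]
    total = sum(scaled)
    return alpha, [w / total for w in scaled]


# Hypothetical example: five points with equal weights, the last one misclassified.
y_true = [+1, +1, -1, -1, +1]
y_pred = [+1, +1, -1, -1, -1]
alpha, new_weights = adaboost_reweight([0.2] * 5, y_true, y_pred)
print([round(w, 3) for w in new_weights])
```

Here the single misclassified point ends up with weight 0.5, while each correctly classified point drops to 0.125.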

- **(ii) What is overfitting in the context of classification? Name two methods to prevent it. (3 marks)**
- **Overfitting**: Overfitting occurs when a model learns the training data too well, capturing noise and irrelevant details, leading to poor
generalization on unseen data.
- **Methods to Prevent Overfitting**:
1. **Regularization**: Adds a penalty term to the loss function to constrain model complexity (e.g., L1 or L2 regularization).
2. **Cross-Validation**: Trains and validates the model on multiple splits of the data, which exposes overfitting and guides the choice of model complexity so that the model generalizes well.
3. **Early Stopping**: Stops training when the performance on a validation set starts to degrade.
4. **Feature Selection**: Reduces the number of input features to avoid learning noise.
- **Answer**: Overfitting is when a model performs well on training data but poorly on unseen data. Two methods to prevent it are
**regularization** and **cross-validation**.
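
How an L2 penalty constrains a model can be seen in the simplest case, a one-feature linear fit without intercept. The sketch below assumes the loss $ \sum (y - wx)^2 + \lambda w^2 $, whose minimizer is $ w = \sum xy / (\sum x^2 + \lambda) $; the data and the name `ridge_slope` are illustrative only.

```python
def ridge_slope(xs, ys, lam):
    """Slope w minimizing sum((y - w*x)**2) + lam * w**2 (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Hypothetical data lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

print(ridge_slope(xs, ys, lam=0.0))   # 2.0, the unregularized least-squares slope
print(ridge_slope(xs, ys, lam=14.0))  # 1.0, the penalty shrinks the slope toward zero
```

A larger `lam` pulls the coefficient closer to zero, which is the sense in which regularization limits model complexity.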

- **(iii) Can clustering be used for dimensionality reduction? Justify your answer. (3 marks)**
- **Yes**, clustering can be used for dimensionality reduction in certain contexts. For example:
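- **Prototype-based Clustering**: Algorithms like K-means represent each cluster with a centroid. Each point can then be described by its cluster label or by its distances to the $ k $ centroids, which gives a compact representation when $ k $ is smaller than the number of original features.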
- **Hierarchical Clustering**: Agglomerative clustering can be used to identify groups of similar features, which can then be aggregated or
reduced.
- However, clustering is not typically designed for explicit dimensionality reduction like PCA or t-SNE. It focuses on grouping similar data
points rather than reducing feature space.
- **Answer**: Yes, clustering can be used for dimensionality reduction by representing clusters with prototypes (e.g., centroids) or aggregating
similar features.
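
One way to read the prototype idea is to re-encode every point by its distances to the cluster centroids. The sketch below assumes the centroids have already been found (the two 4-dimensional centroids are made up for illustration), so each 4-feature point becomes a 2-feature vector.

```python
import math

# Hypothetical centroids from a clustering run with k = 2 on 4-D data.
CENTROIDS = [
    (0.0, 0.0, 0.0, 0.0),
    (3.0, 4.0, 0.0, 0.0),
]


def centroid_features(point, centroids=CENTROIDS):
    """Represent a point by its Euclidean distance to each centroid."""
    return [math.dist(point, c) for c in centroids]


print(centroid_features((0.0, 0.0, 0.0, 0.0)))  # [0.0, 5.0]
print(centroid_features((3.0, 4.0, 0.0, 0.0)))  # [5.0, 0.0]
```

The new representation has one dimension per cluster, so it is only a reduction when the number of clusters is smaller than the number of original features.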

Written for

Institution: University Of Delhi
Course: Data Mining

Document information

Uploaded on: April 10, 2025
Number of pages: 10
Written in: 2024/2025
Type: Exam (elaborations)
Contains: Questions & answers

Get to know the seller

sumonabiswas
