Summary

Summary Data Science and Machine Learning

Pages: 8
Uploaded on: 01-01-2025
Written in: 2022/2023

This document provides a comprehensive summary of the Data Science and Machine Learning syllabus for the Master of Computer Applications (MCA) program under APJ Abdul Kalam Technical University. It covers key concepts and methodologies, including data preprocessing, statistical analysis, supervised and unsupervised learning techniques, neural networks, natural language processing, and deep learning. The document also emphasizes practical applications, algorithmic implementation, and the integration of tools like Python, R, and SQL for data analysis and machine learning. Designed to serve as a quick reference guide, it ensures a thorough understanding of the theoretical and practical aspects outlined in the syllabus, equipping students with the essential knowledge and skills to excel in the field.


Content preview

DATA SCIENCE & MACHINE LEARNING (Part 2)
techworldthink • March 06, 2022




4. Explain how to choose the value of k in k-NN algorithm.
K-Nearest Neighbors (k-NN) is a supervised machine learning algorithm used for
classification and regression. It stores the training data and classifies new
test data using a distance metric: it finds the k nearest neighbors to the test
point, and the classification is decided by a majority vote of their class labels.
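The classification step described above can be sketched in plain Python. Euclidean distance and an unweighted majority vote are assumptions here; k-NN admits other metrics and weightings, and the toy data is purely illustrative:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Classify point x by a majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point
    dists = [(math.dist(p, x), label) for p, label in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])                 # nearest first
    k_labels = [label for _, label in dists[:k]]
    return Counter(k_labels).most_common(1)[0][0]  # majority class

# Two well-separated toy clusters
X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(X, y, (0.5, 0.5), k=3))  # -> a
print(knn_predict(X, y, (5.5, 5.5), k=3))  # -> b
```
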

Selecting the optimal K value to achieve the maximum accuracy of the model is
always challenging for a data scientist.

• There are no pre-defined statistical methods to find the most favorable value of K.

• Initialize a random K value and start computing.

• Choosing a small value of K leads to unstable decision boundaries.

• A larger K value is better for classification, as it smooths the decision
boundaries.

• Plot the error rate against K over a defined range of values, then choose the
K with the minimum error rate.
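The last bullet above can be sketched as follows: evaluate the error rate on a held-out validation set for each K in a range and keep the K with the lowest error. The train/validation split and the Euclidean majority-vote k-NN used here are assumptions for illustration:

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k):
    """Euclidean-distance k-NN with an unweighted majority vote."""
    dists = sorted((math.dist(p, x), lbl) for p, lbl in zip(train_X, train_y))
    return Counter(lbl for _, lbl in dists[:k]).most_common(1)[0][0]

def error_rate(train_X, train_y, val_X, val_y, k):
    """Fraction of validation points the k-NN classifier gets wrong."""
    wrong = sum(knn_predict(train_X, train_y, x, k) != y
                for x, y in zip(val_X, val_y))
    return wrong / len(val_y)

# Toy train/validation split
train_X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
train_y = ["a"] * 4 + ["b"] * 4
val_X   = [(0.5, 0.5), (5.5, 5.5), (1, 2), (6, 4)]
val_y   = ["a", "b", "a", "b"]

# Error rate for each odd K in range; pick the K with the minimum error
errors = {k: error_rate(train_X, train_y, val_X, val_y, k) for k in range(1, 8, 2)}
best_k = min(errors, key=errors.get)
print(errors, best_k)  # on this toy split every K tested reaches zero error
```

On real data the error-vs-K curve is rarely flat like this; the point of the sketch is the selection loop, not the toy numbers.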

In KNN, finding the value of k is not easy. A small value of k means that noise
will have a higher influence on the result, and a large value makes it
computationally expensive.



There is no straightforward method to calculate the value of K in KNN. You have
to experiment with different values to choose the optimal one; this process is
called hyperparameter tuning.

The optimum K depends entirely on the dataset you are using: the best value of K
for KNN is highly data-dependent, and in different scenarios the optimum K may
vary. It is more or less a trial-and-error method.

You need to maintain a balance while choosing the value of K in KNN: K should
not be too small or too large. A small value of K means that noise will have a
higher influence on the result.

Larger values of K can improve accuracy, but only up to a point: if K is too
large, you are under-fitting your model and the error goes up again. At the same
time, K must not be so small that the model loses its generalization capability;
otherwise there is a fair chance it performs well on the training data but fails
drastically on real data. A larger K also increases the computational expense of
the algorithm.

There is no single proper method for estimating the value of K in KNN. None of
the following is a hard rule, but you should consider these suggestions:

1. Square Root Method: Take square root of the number of samples in the training
dataset.

2. Cross-Validation Method: Use cross-validation to find the optimal value of K
in KNN. Start with K=1, run cross-validation (5- to 10-fold), measure the
accuracy, and keep repeating until the results become consistent.

As K increases (K=1, 2, 3, ...), the error usually goes down, then stabilizes,
and then rises again. Pick the optimum K at the beginning of the stable zone.
This is also called the Elbow Method.

3. Domain Knowledge also plays a vital role while choosing the optimum value of K.

4. K should be an odd number (for binary classification, an odd K avoids tied votes).
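Suggestions 1 and 4 combine into a simple starting heuristic (a rule of thumb only, not a guarantee of the optimal K): take the square root of the training-set size and round to the nearest odd integer.

```python
import math

def initial_k(n_samples):
    """Square-root rule of thumb for K, adjusted to the nearest odd integer."""
    k = max(1, round(math.sqrt(n_samples)))
    return k if k % 2 == 1 else k + 1  # force odd to avoid voting ties

print(initial_k(100))  # sqrt(100) = 10 -> 11
print(initial_k(50))   # sqrt(50) ~ 7.07 -> 7
```

The result is only a starting point; it should still be refined with cross-validation as described above.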




5. Explain entropy and information gain.

