BADM 211 Exam 2
Unsupervised learning - answer - Pattern or relationship between a set of variables,
where all of them have equal status
- Goal: Segment data into meaningful segments; detect patterns
- There is no target (outcome) variable to predict or classify
- Uses machine learning algorithms to analyze and cluster unlabeled datasets
Supervised learning - answer Pattern or relationship between an outcome (target)
variable and a set of predictor variables.
Features:
- Goal: predict a single "target" or "outcome" variable
- Training data, where the target value is known
- Test data, where the target value is treated as unknown and predicted
- Methods: classification and regression
Reinforcement learning - answer An agent learns by interacting with an environment and receiving rewards or penalties for its actions
Supervised: Regression - answer- Goal: predict numerical target (outcome) variable
- Examples: sales, revenue, performance
- Each row is a case (customer, tax return, applicant)
- Each column is a variable
Supervised: Classification - answer- Goal: predict categorical target (outcome) variable
- Examples: purchase/no purchase, fraud/no fraud
- Each row is a case (customer, tax return, applicant)
- Each column is a variable
- Target variable is often binary (yes/no)
- Classification and regression constitute "predictive analysis"
Unsupervised learning - answer- Goal: Segment data into meaningful segments; detect
patterns
- There is no target (outcome) variable to predict or classify
- Methods: association rules, collaborative filters, data reduction, exploration, and
visualization
data_df.shape - answer Display the number of samples and variables (rows and columns)
data_df.head(10) - answer Display the first 10 rows of the df
data_df.tail(10) - answer Display the last 10 rows of the df
data_df.columns - answer Display variable names
data_df.dtypes - answer Display the variable data types
data_df.info() - answer Display variable sequence numbers, names, non-null counts (which
reveal missing values), and datatypes
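The inspection calls above can be tried on a small frame. This is only a sketch: the column names and values below are made up for illustration and are not the course dataset.

```python
import pandas as pd

# Hypothetical stand-in for data_df (values are invented for this example)
data_df = pd.DataFrame({
    "Personal_Income": [52000, 61000, None, 48000],
    "State": ["IL", "FL", "IL", "NY"],
})

n_rows, n_cols = data_df.shape        # tuple of (rows, columns)
first_two = data_df.head(2)           # first 2 rows
last_two = data_df.tail(2)            # last 2 rows
column_names = list(data_df.columns)  # variable names
column_types = data_df.dtypes         # one dtype per column
data_df.info()                        # prints names, non-null counts, dtypes
```

Note that shape, columns, and dtypes are attributes (no parentheses), while head(), tail(), and info() are methods.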
Strip leading and trailing spaces and replace any remaining spaces with an underscore
(_) - answer data_df.columns = [s.strip().replace(' ', '_') for s in data_df.columns]
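A short sketch of the column clean-up, assuming some hypothetical column names that contain stray and embedded spaces:

```python
import pandas as pd

# Hypothetical column names (invented for this example)
data_df = pd.DataFrame(columns=[" Personal Income ", "City of Residence", "Age"])

# strip() removes leading/trailing spaces; replace() turns the rest into underscores
data_df.columns = [s.strip().replace(" ", "_") for s in data_df.columns]
cleaned = list(data_df.columns)
```

The order matters: stripping first keeps leading and trailing spaces from becoming underscores.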
Create a dummy dataframe - answer
import pandas as pd
d = {'animal type ': ['dog', 'cat', 'bird'], 'age in years': [1, 2, 3],
     'size': ['6', '8', '10'], 'city of residence': ['miami', 'chicago', 'london']}
df = pd.DataFrame(data=d)
df
iloc vs. loc - answer iloc = integer position-based; the end of a slice is exclusive
loc = label-based (labels can be integers); the end of a slice is inclusive
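The difference is easiest to see when the index labels are integers that do not match the row positions. A minimal sketch with invented data:

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, ...) differ from positions (0, 1, ...)
df = pd.DataFrame({"animal": ["dog", "cat", "bird", "fish"]},
                  index=[10, 20, 30, 40])

by_position = df.iloc[0:2]  # positions 0 and 1; the end position is excluded
by_label = df.loc[10:30]    # labels 10, 20, and 30; the end label is included
```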
What is dummy coding - answer - Converts a categorical variable into binary indicator
columns (ones and zeros only) so it can be used as an independent variable/predictor in a model.
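One common way to dummy code in pandas is pd.get_dummies. The sketch below uses invented data, and the notes do not say which function the course uses, so treat the choice as an assumption.

```python
import pandas as pd

# Hypothetical data: one categorical column and one numeric column
df = pd.DataFrame({"animal": ["dog", "cat", "bird"], "age": [1, 2, 3]})

# Each category of "animal" becomes its own 0/1 column; "age" is left as-is
dummies = pd.get_dummies(df, columns=["animal"], dtype=int)
dummy_columns = list(dummies.columns)
```

Passing drop_first=True drops one category per variable, which avoids a redundant column in regression models.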
Predictor and outcome variables - answer X = predictors
y = outcome
Train vs. test data - answer Train: fit the model
Test: evaluate the model's predictions and accuracy
train_X, train_y, test_X, test_y
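A sketch of how the four pieces might be produced with scikit-learn's train_test_split. The data, column names, and the 60/40 split are assumptions made for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one predictor and one numerical outcome
data_df = pd.DataFrame({
    "ad_spend": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "sales": [5, 8, 11, 14, 17, 20, 23, 26, 29, 32],
})

X = data_df[["ad_spend"]]  # predictors (a DataFrame)
y = data_df["sales"]       # outcome (a Series)

# train_test_split returns the X pieces first, then the y pieces
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.4, random_state=1
)
```

The return order is train_X, test_X, train_y, test_y, which differs from the order in which the names are listed above.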
Creating a linear regression model - answer 1. Load the linear regression algorithm into
"model_1m"
2. Use the "fit" method to fit the linear regression
3. Print the coefficients
4. Check the performance results
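The four steps above can be sketched with scikit-learn. The data is invented (sales = 3 * ad_spend + 2), the variable name model is a stand-in for the name used in the notes, and R-squared is only one possible performance measure.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data following an exact line: sales = 3 * ad_spend + 2
data_df = pd.DataFrame({"ad_spend": list(range(1, 11))})
data_df["sales"] = 3 * data_df["ad_spend"] + 2

X = data_df[["ad_spend"]]
y = data_df["sales"]
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.3, random_state=1
)

model = LinearRegression()              # 1. load the algorithm
model.fit(train_X, train_y)             # 2. fit on the training data
print(model.intercept_, model.coef_)    # 3. print the coefficients
test_r2 = r2_score(test_y, model.predict(test_X))  # 4. check performance on test data
```

Because the made-up data lies exactly on a line, the fitted slope and intercept recover 3 and 2; real data would not fit this cleanly.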
Turning lists to columns - answer
df = pd.DataFrame({"animal type ": animal_type,
                   "age in years": age,
                   "size": size,
                   "city of residence": city})
data_df.iloc[0:4] - answer Display the first four rows
data_df['Personal_Income'].iloc[0:10]
data_df.iloc[0:10]['Personal_Income']
data_df.iloc[0:10].Personal_Income - answer All display the first 10 rows of the variable
data_df["Personal_Income"].head() - answer Display the first 5 rows of the variable