QUESTIONS WITH ANSWERS GRADED A+
◍ Who should be included as stakeholders in an analytics project?
- Anyone who will benefit from the project
- Anyone who has relevant skills
- Anyone who is available to participate
- Anyone who is a manager in the organization
Answer: Anyone who will benefit from the project.
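◍ Naive Bayes analysis.
Answer: A supervised machine learning algorithm used for classification tasks, such as text classification. It applies Bayes' theorem with the "naive" assumption that the features are conditionally independent given the class.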
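To make the idea concrete, here is a minimal sketch of a multinomial naive Bayes text classifier. It assumes word-count features, add-one (Laplace) smoothing, and a made-up four-sentence training set. The function names `train_nb` and `predict_nb` are hypothetical helpers, not a library API. Each class is scored by its log prior plus the sum of per-word log likelihoods, which is where the independence assumption enters.

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, label). Returns class priors, per-class word counts, vocabulary."""
    class_docs = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    priors = {label: n / len(docs) for label, n in class_docs.items()}
    return priors, word_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log posterior (up to a constant)."""
    priors, word_counts, vocab = model
    scores = {}
    for label, prior in priors.items():
        total = sum(word_counts[label].values())
        score = math.log(prior)
        for w in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out the product
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Toy training data (hypothetical)
training = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
]
model = train_nb(training)
print(predict_nb(model, "claim your free prize"))   # spam
print(predict_nb(model, "team meeting on monday"))  # ham
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.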
◍ RAID.
Answer: RAID (redundant array of independent disks) is a data storage virtualization technology that combines multiple physical disk drive components into a single logical unit for the purposes of data redundancy, performance improvement, or both.
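Parity-based RAID levels such as RAID 5 get their redundancy from XOR parity. The parity block is the XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XOR-ing the survivors. The sketch below shows only that arithmetic on toy byte strings. It assumes equal-length blocks and ignores the parity rotation across disks that real RAID 5 performs.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data "disks", each holding one block of the same stripe (toy data)
data = [b"ABCD", b"EFGH", b"IJKL"]
parity = xor_blocks(data)  # stored on a fourth disk

# Simulate losing disk 1 and rebuilding it from the surviving disks plus parity
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'EFGH'
```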
◍ Which task is the data analyst responsible for within a data analysis project?
- Developing and implementing software applications
- Conducting statistical analyses and generating reports
- Creating the project's overall goals and objectives
- Collecting, cleaning, and loading customer data into a data warehouse
Answer: Conducting statistical analyses and generating reports
◍ Intrusion Prevention.
Answer: An intrusion prevention system (IPS) raises alarms and takes action, such as blocking the offending traffic, when malicious events occur.
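The difference between merely alarming and also acting can be sketched with a toy rule. The rule, the event format, and the threshold `FAILED_LOGIN_LIMIT` are all hypothetical. After too many failed logins from one source, the sketch raises an alert (detection) and then blocks that source (prevention). Real IPS products use far richer signatures and anomaly models.

```python
from collections import defaultdict

FAILED_LOGIN_LIMIT = 3  # hypothetical threshold

def process_events(events, limit=FAILED_LOGIN_LIMIT):
    """events: iterable of (source_ip, event_type). Returns (alerts, blocked_ips)."""
    failures = defaultdict(int)
    alerts, blocked = [], set()
    for ip, event in events:
        if ip in blocked:
            continue  # prevention: traffic from blocked sources is dropped
        if event == "login_failed":
            failures[ip] += 1
            if failures[ip] >= limit:
                alerts.append(f"ALERT: {failures[ip]} failed logins from {ip}")  # detection
                blocked.add(ip)  # prevention: act, don't just alarm

    return alerts, blocked

events = [("10.0.0.5", "login_failed")] * 4 + [
    ("10.0.0.7", "login_failed"),
    ("10.0.0.7", "login_ok"),
]
alerts, blocked = process_events(events)
print(alerts)   # one alert for 10.0.0.5
print(blocked)  # {'10.0.0.5'}
```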
◍ Input Validation Attacks.
Answer: Attacks in which an attacker purposefully sends unexpected or malformed input to confuse a web application. Input validation routines serve as the first line of defence against such attacks. Examples of input validation attacks include buffer overflow, directory traversal, cross-site scripting, and SQL injection.
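SQL injection, one of the attacks listed above, is easy to reproduce with Python's built-in `sqlite3` module and an in-memory database. The sketch contrasts two queries. The first pastes user input into the SQL text and is vulnerable. The second is a parameterized query, which treats the input strictly as data. It also adds a simple allowlist check as an example of an input validation routine. The table, the users, and the username rule in `is_valid_username` are hypothetical.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", "admin"), ("bob", "user")])

malicious = "nobody' OR '1'='1"

def is_valid_username(name):
    """Hypothetical allowlist rule: 1-32 lowercase letters, digits, or underscores."""
    return re.fullmatch(r"[a-z0-9_]{1,32}", name) is not None

# Vulnerable: user input is pasted directly into the SQL text
unsafe_sql = f"SELECT name FROM users WHERE name = '{malicious}'"
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# Safer: a parameterized query binds the input as a value, never as SQL
safe_rows = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(unsafe_rows)                    # [('alice',), ('bob',)]: every user leaks
print(safe_rows)                      # []
print(is_valid_username(malicious))   # False: rejected before reaching the database
```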
◍ What function can be used to fit a nonlinear curve to the data?
Answer: The nls() function in R fits a nonlinear model to the data by nonlinear least squares, using a formula that defines the relationship between the dependent and independent variables.
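By default, R's nls() uses a Gauss-Newton algorithm and needs starting values. For a model y = a * exp(b * x), the R call would look like nls(y ~ a * exp(b * x), start = list(a = 1.9, b = 0.29)). The sketch below is a standard-library Python illustration of the same Gauss-Newton idea for that one model; it is not nls itself. The synthetic data are generated from a = 2 and b = 0.3, and the starting values are assumed to be close to the truth, because Gauss-Newton can diverge from a poor start.

```python
import math

def gauss_newton_exp(xs, ys, a, b, iters=50, tol=1e-12):
    """Fit y ≈ a * exp(b * x) by nonlinear least squares (Gauss-Newton)."""
    for _ in range(iters):
        # Accumulate J^T J (2x2) and J^T r, where r = y - f(x) and J holds df/da, df/db
        jtj = [[0.0, 0.0], [0.0, 0.0]]
        jtr = [0.0, 0.0]
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e
            j = (e, a * x * e)
            for i in range(2):
                jtr[i] += j[i] * r
                for k in range(2):
                    jtj[i][k] += j[i] * j[k]
        # Solve the 2x2 normal equations (J^T J) delta = J^T r
        det = jtj[0][0] * jtj[1][1] - jtj[0][1] * jtj[1][0]
        da = (jtj[1][1] * jtr[0] - jtj[0][1] * jtr[1]) / det
        db = (jtj[0][0] * jtr[1] - jtj[1][0] * jtr[0]) / det
        a, b = a + da, b + db
        if abs(da) < tol and abs(db) < tol:
            break
    return a, b

xs = list(range(10))
ys = [2.0 * math.exp(0.3 * x) for x in xs]  # synthetic, noise-free data
a_hat, b_hat = gauss_newton_exp(xs, ys, a=1.9, b=0.29)
print(round(a_hat, 4), round(b_hat, 4))  # 2.0 0.3
```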
◍ Which programming language is primarily used for statistical analysis and data manipulation in the model planning phase?
- Ruby
- R
- Swift
- MATLAB
Answer: R
◍ Nonrepudiation.
Answer: A situation in which sufficient evidence exists to prevent an individual from successfully denying that he or she has made a statement or taken an action.
◍ Server-side Attacks.
Answer: Attacks that target vulnerabilities on the server itself. Common causes include:
- Lack of input validation
- Improper or inadequate permissions
- Extraneous files
◍ How many levels does fdata have in the following R code?
data = c(1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1)
fdata = factor(data)
Answer: Three. factor() creates one level for each distinct value in data, so fdata has the levels 1, 2, and 3. Printing fdata ends with "Levels: 1 2 3", and nlevels(fdata) returns 3. The levels can also be given labels that indicate which group each data point belongs to, for example factor(data, levels = c(1, 2, 3), labels = c("small", "medium", "large")).
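Internally, an R factor is stored as integer codes plus a vector of levels. The sketch below is a rough Python analogue of that encoding, not R's implementation; the helper name `make_factor` is hypothetical. It shows why the data above yield exactly three levels: there are three distinct values.

```python
def make_factor(values):
    """Rough analogue of R's factor(): sorted distinct levels plus 1-based integer codes."""
    levels = sorted(set(values))
    index = {level: i + 1 for i, level in enumerate(levels)}
    codes = [index[v] for v in values]
    return levels, codes

data = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2, 3, 3, 1]
levels, codes = make_factor(data)
print(levels, len(levels))  # [1, 2, 3] 3
```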
◍ Kismet.
Answer: A tool used to detect unauthorized wireless access points; a sniffer that specializes in detecting wireless devices.
◍ Authenticity.
Answer: Attribution as to the owner or creator of the data in question. Authenticity can be enforced through the use of digital signatures.
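How a digital signature ties data to its creator can be shown with textbook RSA over tiny numbers (p = 61, q = 53, so N = 3233, e = 17, d = 2753). This is a toy for illustration only and is completely insecure. Real systems use large keys, padding schemes, and a vetted cryptographic library. Only the private-key holder can produce a signature, and anyone holding the public key (N, E) can check it.

```python
import hashlib

# Textbook RSA key from tiny primes p = 61, q = 53: illustration only, NOT secure
N, E, D = 3233, 17, 2753

def digest(message: bytes) -> int:
    # Reduce a SHA-256 hash into the tiny modulus range (toy only)
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N

def sign(message: bytes) -> int:
    return pow(digest(message), D, N)  # requires the private exponent D

def verify(message: bytes, signature: int) -> bool:
    return pow(signature, E, N) == digest(message)  # needs only the public key (N, E)

msg = b"quarterly-report.csv"
sig = sign(msg)
print(verify(msg, sig))            # True
print(verify(msg, (sig + 1) % N))  # False: an altered signature does not verify
```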
◍ A company wants to predict the likelihood of a customer responding to a marketing campaign. The data set contains both numerical and categorical variables. Which analytics technique should the company use?
- Logistic regression
- K-means clustering
- Random forest
- Principal component analysis (PCA)
Answer: Logistic regression. Logistic regression is a suitable technique for binary classification problems, such as predicting the likelihood of a customer responding to a marketing campaign. It can use both numerical and categorical variables once the categorical ones are encoded (for example, as dummy variables).
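The sketch below shows logistic regression on a mix of numerical and categorical inputs. It uses hypothetical data: past purchases (numeric), contact channel (categorical, dummy-encoded with "email" as the reference level), and whether the customer responded. Training is plain batch gradient ascent on the log-likelihood. Real libraries use more robust optimizers and usually regularization.

```python
import math

def features(purchases, channel):
    # Intercept, numeric feature, and a dummy variable for the categorical "channel"
    return [1.0, float(purchases), 1.0 if channel == "sms" else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, lr=0.1, epochs=5000):
    """Batch gradient ascent on the logistic log-likelihood."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for purchases, channel, responded in rows:
            x = features(purchases, channel)
            error = responded - sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                grad[i] += error * x[i]
        w = [wi + lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def predict_proba(w, purchases, channel):
    """Estimated probability that the customer responds."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(purchases, channel))))

# Hypothetical rows: (past purchases, channel, responded 0/1)
rows = [(0, "email", 0), (1, "sms", 0), (1, "email", 0),
        (4, "sms", 1), (5, "email", 1), (4, "email", 1)]
w = train(rows)
print(round(predict_proba(w, 5, "email"), 3))  # high probability of responding
print(round(predict_proba(w, 0, "email"), 3))  # low probability of responding
```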
◍ Possession or Control.
Answer: Refers to the physical disposition of the media on which the data is stored. This lets us discuss the loss of the data in its physical medium without involving other factors such as availability. For example, data may be stored on multiple devices, and there could be numerous versions.
◍ Cryptographic Attacks.
Answer: Exploiting the security of a cryptographic system by finding a weakness in a code, cipher, cryptographic protocol, or key management scheme.
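A classic weakness to exploit is a tiny key space. The toy sketch below attacks a Caesar cipher, which has only 26 possible keys, by trying every key. A known word in the plaintext (a "crib", here assumed to be "dawn") is used to recognize the correct decryption. The functions are hypothetical helpers written for illustration.

```python
import string

ALPHABET = string.ascii_lowercase

def caesar_shift(text, key):
    """Shift each lowercase letter by key positions; other characters pass through."""
    return "".join(
        ALPHABET[(ALPHABET.index(c) + key) % 26] if c in ALPHABET else c
        for c in text
    )

def brute_force(ciphertext, known_word):
    """Try all 26 keys; the weakness exploited is the tiny key space."""
    for key in range(26):
        candidate = caesar_shift(ciphertext, -key)
        if known_word in candidate:
            return key, candidate
    return None

ciphertext = caesar_shift("attack at dawn", 7)
print(brute_force(ciphertext, "dawn"))  # (7, 'attack at dawn')
```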
◍ Describe the challenges of the current analytical architecture for data scientists.
Answer:
- New data sources accumulate slowly in the enterprise data warehouse (EDW) because of the rigorous validation and data-structuring process, so data is slow to move into the EDW and the data schema is slow to change.
- High-value data is hard to reach and leverage.
- Data moves in batches from the EDW to local analytical tools, so data scientists are limited to in-memory analytics (R, SAS, SPSS), which restricts the size of the datasets that can be used.
- Data science projects remain isolated and ad hoc rather than centrally managed, so the organisation can never harness the power of advanced analytics in a scalable way.
◍ Which data sources would be most relevant for analyzing factors affecting patient satisfaction in a healthcare company?
- Web log data, call-center records, and survey responses
- Printing press run records, noise levels, and census data
- Credit card charge records, telephone call detail records, and point-of-sale data
- Warranty claims, weather data, and economic data
Answer: Web log data, call-center records, and survey responses
◍ p-value.
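Answer: The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. It forms the basis for deciding whether results are statistically significant: if the p-value falls below the chosen significance level (commonly 0.05), the result is considered unlikely to be due to chance alone.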
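As a worked example, assume a coin lands heads 9 times in 10 flips and the null hypothesis is that the coin is fair. The one-sided p-value is P(X ≥ 9) for X ~ Binomial(10, 0.5) = (C(10, 9) + C(10, 10)) / 2^10 = 11/1024 ≈ 0.0107. The sketch below computes this exact binomial tail with the standard library. The 0.05 significance level is a conventional choice, not a rule.

```python
from math import comb

def binomial_p_value_one_sided(successes, trials, p=0.5):
    """P(X >= successes) under the null hypothesis X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p**k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))

p_value = binomial_p_value_one_sided(9, 10)
alpha = 0.05
print(round(p_value, 4))  # 0.0107
print(p_value < alpha)    # True: reject the null hypothesis of a fair coin
```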
◍ What is the advantage of using a decision tree over a linear regression model in a data analytics project?
- Decision trees are faster and require fewer computational resources.
- Decision trees can produce more accurate predictions.
- Decision trees can handle missing data more effectively.
- Decision trees can handle nonlinear relationships between variables.
Answer: Decision trees can handle nonlinear relationships between variables. Decision trees can model complex, nonlinear relationships between variables, while linear regression models are limited to linear relationships.
◍ Which question of interest is appropriate for a data analytics project to increase a store's sales?
- What are the store's best-selling products?
- Should the store expand to a new location?
- Which customer segments will most likely respond to a marketing campaign?
- How can the store's social media presence be improved?
Answer: Which customer segments will most likely respond to a marketing
campaign?
◍ Which regression model is commonly used for predicting a continuous numerical outcome based on a set of input features?
- Polynomial regression
- Random forest regression
- Logistic regression
- Linear regression
Answer: Linear regression
◍ Attribute-based Access Control (ABAC).
Answer: A model of access control in which access decisions are based, logically, on attributes of a particular person (the subject), of a resource, or of the environment. Example: a VPN connection is set to time out after a certain time.
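An ABAC decision can be sketched as a policy function over three attribute sets: subject, resource, and environment. The policy below is entirely hypothetical. Access is granted only when the subject's department owns the resource, the resource is not marked "restricted", and the VPN session (an environment attribute) is younger than an assumed 8-hour timeout, echoing the VPN example above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Request:
    subject: dict      # attributes of the person, e.g. {"department": "finance"}
    resource: dict     # attributes of the resource, e.g. {"owner_department": "finance"}
    environment: dict  # attributes of the context, e.g. {"vpn_session_start": datetime}

MAX_SESSION = timedelta(hours=8)  # hypothetical VPN session timeout

def is_allowed(req: Request, now: datetime) -> bool:
    """Hypothetical policy combining subject, resource, and environment attributes."""
    session_age = now - req.environment["vpn_session_start"]
    return (
        req.subject.get("department") == req.resource.get("owner_department")
        and req.resource.get("classification") != "restricted"
        and session_age < MAX_SESSION
    )

start = datetime(2024, 1, 1, 9, 0)
req = Request(
    subject={"department": "finance"},
    resource={"owner_department": "finance", "classification": "internal"},
    environment={"vpn_session_start": start},
)
print(is_allowed(req, start + timedelta(hours=1)))  # True
print(is_allowed(req, start + timedelta(hours=9)))  # False: session has timed out
```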