Exam (elaborations)

Data Analytics for Accounting

Pages: 18
Grade: A+
Uploaded on: 06-09-2024
Written in: 2024/2025
Contains: Questions & answers

Content preview

TEST BANK and SOLUTION MANUAL for Data Analytics
for Accounting, 3rd Edition by Vernon Richardson

A financial services company needs to aggregate daily stock trade data from the
exchanges into a data store. The company requires that data be streamed directly
into the data store, but also occasionally allows data to be modified using SQL. The
solution should integrate complex, analytic queries running with minimal latency.
The solution must provide a business intelligence dashboard that enables viewing of
the top contributors to anomalies in stock prices. Which solution meets the
company's requirements? - ANSWER: Use Amazon Kinesis Data Firehose to stream
data to Amazon Redshift. Use Amazon Redshift as a data source for Amazon
QuickSight to create a business intelligence dashboard.

Key points to arrive at this answer:
• Data streamed DIRECTLY into the data store = Kinesis Data Firehose does this.
• Complex, analytic queries with minimal latency = Amazon Redshift, the OLAP use case and a supported Firehose destination.
• Business intelligence dashboard = Amazon QuickSight.
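As a concrete illustration of the wiring in this answer, here is a minimal sketch of the payload boto3's `firehose.create_delivery_stream` expects for a Redshift destination (Firehose stages records in an S3 bucket, then issues a COPY into the cluster). Every resource name, ARN, and credential below is a placeholder, not a real resource:

```python
# Sketch only: all names/ARNs/credentials are placeholders.

def firehose_redshift_request(stream_name, jdbc_url, table, bucket_arn, role_arn):
    """Build the create_delivery_stream payload for a Redshift destination.
    Firehose writes batches to the S3Configuration bucket, then COPYs them
    into DataTableName on the cluster."""
    return {
        "DeliveryStreamName": stream_name,
        "RedshiftDestinationConfiguration": {
            "RoleARN": role_arn,
            "ClusterJDBCURL": jdbc_url,
            "CopyCommand": {"DataTableName": table},
            "Username": "firehose_user",       # placeholder credentials
            "Password": "example-password",
            "S3Configuration": {               # intermediate staging bucket
                "RoleARN": role_arn,
                "BucketARN": bucket_arn,
            },
        },
    }

req = firehose_redshift_request(
    "trades-stream",
    "jdbc:redshift://example-cluster.abc123.us-east-1.redshift.amazonaws.com:5439/trades",
    "daily_trades",
    "arn:aws:s3:::example-staging-bucket",
    "arn:aws:iam::123456789012:role/example-firehose-role",
)
# With boto3: boto3.client("firehose").create_delivery_stream(**req)
```

Redshift then serves as the QuickSight data source directly; no extra code is needed on that side beyond granting QuickSight access to the cluster.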

A financial company hosts a data lake in Amazon S3 and a data warehouse on an
Amazon Redshift cluster. The company uses Amazon QuickSight to build dashboards
and wants to secure access from its on-premises Active Directory to Amazon
QuickSight. How should the data be secured? - ANSWER: Use an Active Directory
connector and single sign-on (SSO) in a corporate network environment.

A real estate company has a mission-critical application using Apache HBase in
Amazon EMR. Amazon EMR is configured with a single master node. The company
has over 5 TB of data stored on a Hadoop Distributed File System (HDFS). The
company wants a cost-effective solution to make its HBase data highly available.
Which architectural pattern meets the company's requirements? - ANSWER: Store the
data on an EMR File System (EMRFS) instead of HDFS and enable EMRFS consistent
view. Create a primary EMR HBase cluster with multiple master nodes. Create a
secondary EMR HBase read-replica cluster in a separate Availability Zone. Point both
clusters to the same HBase root directory in the same Amazon S3 bucket.
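The heart of this pattern is the EMR configuration JSON that stores HBase data in S3 (EMRFS) and points both clusters at the same root directory. A hedged sketch follows: the bucket and path are placeholders, and the classification and property names are as recalled from the EMR HBase-on-S3 documentation, so verify them there before use:

```python
# Sketch: EMR configuration classifications for HBase on S3 (EMRFS).
# Bucket/path are placeholders; verify property names against EMR docs.

HBASE_ROOT = "s3://example-bucket/hbase-root"   # shared by both clusters

def hbase_on_s3_configurations(root_dir):
    return [
        {"Classification": "hbase",
         "Properties": {"hbase.emr.storageMode": "s3"}},   # EMRFS, not HDFS
        {"Classification": "hbase-site",
         "Properties": {"hbase.rootdir": root_dir}},
    ]

primary = hbase_on_s3_configurations(HBASE_ROOT)

# The secondary cluster points at the SAME root directory, in read-replica mode:
replica = hbase_on_s3_configurations(HBASE_ROOT)
replica[1]["Properties"]["hbase.emr.readreplica.enabled"] = "true"
```

These lists would be passed as the `Configurations` parameter when launching each cluster (e.g. via `emr.run_job_flow` in boto3).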

A software company hosts an application on AWS, and new features are released
weekly. As part of the application testing process, a solution must be developed that
analyzes logs from each Amazon EC2 instance to ensure that the application is
working as expected after each deployment. The collection and analysis solution
should be highly available with the ability to display new information with minimal
delays. Which method should the company use to collect and analyze the logs? -
ANSWER: Use the Amazon Kinesis Producer Library (KPL) agent on Amazon EC2 to
collect and send data to Kinesis Data Firehose to further push the data to Amazon
Elasticsearch Service and Kibana.
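The answer names the KPL agent for the collection step; one common concrete form of this is the standalone Kinesis Agent, whose configuration file (normally `/etc/aws-kinesis/agent.json` on each instance) tails log files and ships them to a Firehose delivery stream. A sketch with placeholder paths and stream names:

```python
import json

# Placeholder values throughout: adjust the file pattern and delivery
# stream name to the application's actual log location and Firehose stream.
AGENT_CONFIG = {
    "firehose.endpoint": "firehose.us-east-1.amazonaws.com",
    "flows": [
        {
            "filePattern": "/var/log/app/*.log",   # assumed log location
            "deliveryStream": "app-logs-to-es",    # assumed stream name
        }
    ],
}

print(json.dumps(AGENT_CONFIG, indent=2))
```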

KDF data sources: Kinesis SDK, CloudWatch Logs & Events, Kinesis Agent, KPL, and Kinesis
Data Streams. KDF outputs to S3, Redshift, Elasticsearch, and Kinesis Data Analytics.

Kinesis Data Streams is always a polling service: consumers poll from KDS. Consumers
include the KCL, Lambda, Kinesis Data Firehose, and Kinesis Data Analytics.
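To make "consumers poll from KDS" concrete, here is a bare-bones sketch of the low-level polling loop against a single shard. The stream name and shard ID are placeholders; in practice the KCL handles shard discovery, checkpointing, and resharding for you:

```python
import time

def poll_shard(kinesis, stream_name, shard_id, max_batches=5):
    """Poll one shard via the low-level API and yield record payloads (bytes)."""
    it = kinesis.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",   # start from the oldest record
    )["ShardIterator"]
    for _ in range(max_batches):
        resp = kinesis.get_records(ShardIterator=it, Limit=100)
        for record in resp["Records"]:
            yield record["Data"]
        it = resp["NextShardIterator"]
        time.sleep(1)   # stay under the per-shard read rate limit

# Usage (requires boto3 and AWS credentials):
# import boto3
# for data in poll_shard(boto3.client("kinesis"), "example-stream",
#                        "shardId-000000000000"):
#     print(data)
```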

A data analyst is using AWS Glue to organize, cleanse, validate, and format a 200 GB
dataset. The data analyst triggered the job to run with the Standard worker type.
After 3 hours, the AWS Glue job status is still RUNNING. Logs from the job run show
no error codes. The data analyst wants to improve the job execution time without
overprovisioning. Which actions should the data analyst take? - ANSWER: Enable job
metrics in AWS Glue to estimate the number of data processing units (DPUs). Based
on the profiled metrics, increase the value of the maximum capacity job parameter.
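A sketch of those two steps with boto3's Glue API. The job name and capacity values are placeholders, and a real `update_job` call also requires the job's existing `Role` and `Command` fields, omitted here for brevity:

```python
# Build the JobUpdate fragment that (a) turns on job metrics and
# (b) raises MaxCapacity (DPUs) once the profiled metrics show
# the job is underprovisioned.

def tuned_job_update(current_args, new_max_capacity):
    args = dict(current_args)
    args["--enable-metrics"] = "true"   # Glue special parameter: emit job metrics
    return {
        "DefaultArguments": args,
        "MaxCapacity": new_max_capacity,   # DPUs for a Standard-worker job
    }

update = tuned_job_update({"--job-language": "python"}, new_max_capacity=20.0)
# With boto3 (Role/Command omitted):
# boto3.client("glue").update_job(JobName="example-etl-job", JobUpdate=update)
```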

A company has a business unit uploading .csv files to an Amazon S3 bucket. The
company's data platform team has set up an AWS Glue crawler to do discovery, and
create tables and schemas. An AWS Glue job writes processed data from the created
tables to an Amazon Redshift database. The AWS Glue job handles column mapping
and creating the Amazon Redshift table appropriately. When the AWS Glue job is
rerun for any reason in a day, duplicate records are introduced into the Amazon
Redshift table. Which solution will update the Redshift table without duplicates
when jobs are rerun? - ANSWER: Modify the AWS Glue job to copy the rows into a
staging table. Add SQL commands to replace the existing rows in the main table as
postactions in the DynamicFrameWriter class.
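The staging-table merge can be sketched as the SQL that goes into the writer's `postactions` connection option. All table and key names below are placeholders:

```python
# SQL that Redshift runs after Glue has loaded the staging table:
# delete rows about to be replaced, insert the fresh copies, drop staging.

def upsert_postactions(main_table, staging_table, key):
    return (
        f"BEGIN; "
        f"DELETE FROM {main_table} USING {staging_table} "
        f"WHERE {main_table}.{key} = {staging_table}.{key}; "
        f"INSERT INTO {main_table} SELECT * FROM {staging_table}; "
        f"DROP TABLE {staging_table}; END;"
    )

post = upsert_postactions("public.orders", "public.orders_staging", "order_id")
# In the Glue job (PySpark), roughly:
# glueContext.write_dynamic_frame.from_jdbc_conf(
#     frame=dyf, catalog_connection="redshift-conn",
#     connection_options={"dbtable": "public.orders_staging",
#                         "database": "dev", "postactions": post},
#     redshift_tmp_dir="s3://example-tmp/")
```

Because the delete and insert run in one transaction after each load, rerunning the job replaces rows instead of duplicating them.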

A streaming application is reading data from Amazon Kinesis Data Streams and
immediately writing the data to an Amazon S3 bucket every 10 seconds. The
application is reading data from hundreds of shards. The batch interval cannot be
changed due to a separate requirement. The data is being accessed by Amazon
Athena. Users are seeing degradation in query performance as time progresses.
Which action can help improve query performance? - ANSWER: Merge the files in
Amazon S3 to form larger files.
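One hedged way to do that merge without extra infrastructure is an Athena CTAS statement that rewrites the many small per-batch objects into a few large, compressed files. The table names, the bucketing column `event_id`, and the output location are all assumptions:

```python
# Build an Athena CTAS that compacts a table into `buckets` larger
# Parquet files at a new S3 location.

def compact_ctas(src_table, dst_table, output_location, buckets=8):
    return (
        f"CREATE TABLE {dst_table} "
        f"WITH (format = 'PARQUET', "
        f"external_location = '{output_location}', "
        f"bucketed_by = ARRAY['event_id'], bucket_count = {buckets}) "
        f"AS SELECT * FROM {src_table}"
    )

sql = compact_ctas("logs.raw_events", "logs.compacted_events",
                   "s3://example-bucket/compacted/")
```

Fewer, larger files mean Athena spends less time on S3 LIST/GET overhead per query, which is exactly the degradation the question describes.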

A company uses Amazon Elasticsearch Service (Amazon ES) to store and analyze its
website clickstream data. The company ingests 1 TB of data daily using Amazon
Kinesis Data Firehose and stores one day's worth of data in an Amazon ES cluster.
The company has very slow query performance on the Amazon ES index and
occasionally sees errors from Kinesis Data Firehose when attempting to write to the
index. The Amazon ES cluster has 10 nodes running a single index and 3 dedicated
master nodes. Each data node has 1.5 TB of Amazon EBS storage attached and the
cluster is configured with 1,000 shards. Occasionally, JVMMemoryPressure errors are
found in the cluster logs. Which solution will improve the performance of Amazon
ES? - ANSWER: Decrease the number of Amazon ES shards for the index.
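The arithmetic behind this answer: common Elasticsearch guidance is very roughly 10-50 GB per shard, and every shard carries fixed JVM heap overhead, so 1,000 shards over about 1 TB of data (roughly 1 GB each) is far too many and produces exactly the JVMMemoryPressure errors described. A quick sizing sketch, with the 30 GB target as an assumption inside that guidance range:

```python
def recommended_shards(index_size_gb, target_shard_gb=30):
    """Ceiling of index size over a target shard size (30 GB assumed)."""
    return -(-index_size_gb // target_shard_gb)   # ceiling division

daily_gb = 1024                       # one day's retained clickstream data
print(recommended_shards(daily_gb))   # a few dozen shards, vs. 1,000 configured
```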

A manufacturing company has been collecting IoT sensor data from devices on its
factory floor for a year and is storing the data in Amazon Redshift for daily analysis. A
data analyst has determined that, at an expected ingestion rate of about 2 TB per
day, the cluster will be undersized in less than 4 months. A long-term solution is
needed. The data analyst has indicated that most queries only reference the most
recent 13 months of data, yet there are also quarterly reports that need to query all
the data generated from the past 7 years. The chief technology officer (CTO) is
concerned about the costs, administrative effort, and performance of a long-term
solution. Which solution should the data analyst use to meet these requirements? -
ANSWER: Create a daily job in AWS Glue to UNLOAD records older than 13 months
to Amazon S3 and delete those records from Amazon Redshift. Create an external
table in Amazon Redshift to point to the S3 location. Use Amazon Redshift Spectrum
to join to data that is older than 13 months.
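Sketched as the SQL the daily Glue job would issue: UNLOAD archives the old rows to S3 as Parquet, DELETE trims the cluster, and an external schema lets Redshift Spectrum query the archive alongside the hot 13 months. Table, bucket, database, and role names are placeholders:

```python
CUTOFF = "DATEADD(month, -13, CURRENT_DATE)"   # 13-month boundary

UNLOAD_OLD = f"""
UNLOAD ('SELECT * FROM sensor_data WHERE reading_ts < {CUTOFF}')
TO 's3://example-archive/sensor_data/'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
FORMAT AS PARQUET;
"""

DELETE_OLD = f"DELETE FROM sensor_data WHERE reading_ts < {CUTOFF};"

# External schema over the Glue Data Catalog; quarterly reports can then
# join archive.sensor_data (S3 Parquet) with the in-cluster table.
SPECTRUM_SCHEMA = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS archive
FROM DATA CATALOG DATABASE 'sensor_archive'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role';
"""
```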

An insurance company has raw data in JSON format that is sent without a predefined
schedule through an Amazon Kinesis Data Firehose delivery stream to an Amazon S3
bucket. An AWS Glue crawler is scheduled to run every 8 hours to update the
schema in the data catalog of the tables stored in the S3 bucket. Data analysts
analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue
Data Catalog as the metastore. Data analysts say that, occasionally, the data they
receive is stale. A data engineer needs to provide access to the most up-to-date
data. Which solution meets these requirements? - ANSWER: Run the AWS Glue
crawler from an AWS Lambda function triggered by an S3:ObjectCreated:* event
notification on the S3 bucket.
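A hedged sketch of the Lambda handler that answer describes: every `s3:ObjectCreated:*` event starts the crawler, and an already-running crawler is treated as success since it will pick up the new objects. The crawler name is a placeholder:

```python
CRAWLER_NAME = "example-firehose-crawler"   # assumed crawler name

def lambda_handler(event, context, glue=None):
    if glue is None:              # real Lambda path
        import boto3              # provided by the Lambda runtime
        glue = boto3.client("glue")
    try:
        glue.start_crawler(Name=CRAWLER_NAME)
    except glue.exceptions.CrawlerRunningException:
        pass                      # a run is in flight; it will see the new data
    return {"started": CRAWLER_NAME}
```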

A company that produces network devices has millions of users. Data is collected
from the devices on an hourly basis and stored in an Amazon S3 data lake. The
company runs analyses on the last 24 hours of data flow logs for abnormality
detection and to troubleshoot and resolve user issues. The company also analyzes
historical logs dating back 2 years to discover patterns and look for improvement
opportunities. The data flow logs contain many metrics, such as date, timestamp,
source IP, and target IP. There are about 10 billion events every day. How should this
data be stored for optimal performance? - ANSWER: In Apache ORC partitioned by
date and sorted by source IP
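As a sketch, the storage layout this answer describes in hypothetical Hive-style DDL; column names beyond those listed in the question are assumptions:

```python
FLOW_LOGS_DDL = """
CREATE TABLE flow_logs (
    event_ts   TIMESTAMP,
    source_ip  STRING,
    target_ip  STRING
)
PARTITIONED BY (dt DATE)   -- prune to the last 24 hours cheaply
STORED AS ORC;
-- Sorting by source_ip at write time clusters related rows, so ORC's
-- per-stripe min/max indexes can skip stripes when filtering on source IP.
"""
```

Date partitioning serves the daily 24-hour analyses, while the columnar ORC format keeps the 2-year historical scans efficient.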

A banking company is currently using an Amazon Redshift cluster with dense storage
(DS) nodes to store sensitive data. An audit found that the cluster is unencrypted.
Compliance requirements state that a database with sensitive data must be
encrypted through a hardware security module (HSM) with automated key rotation.
Which combination of steps is required to achieve compliance? (Choose two.) -
ANSWER: Set up a trusted connection with HSM using a client and server certificate
with automatic key rotation and Create a new HSM-encrypted Amazon Redshift
cluster and migrate the data to the new cluster.

A company is planning to do a proof of concept for a machine learning (ML) project
using Amazon SageMaker with a subset of existing on-premises data hosted in the
company's 3 TB data warehouse. For part of the project, AWS Direct Connect is
established and tested. To prepare the data for ML, data analysts are performing
data curation. The data analysts want to perform multiple steps, including mapping,
