Exam (elaborations)

BUAL5660 EXAM 2 QUESTIONS AND ANSWERS WITH COMPLETE SOLUTIONS

Pages
6
Grade
A+
Uploaded on
16-04-2025
Written in
2024/2025


Terms in this set (67)


Primary data: researchers collect their own data; driven by theory or hypothesis; examples: surveys, interviews, experiments, and direct observations; complete control over data collection; can control for external conditions; mostly confirmatory analysis; traditional statistics methods

Secondary data: somebody else already collected the data; driven by a broader topic of study; examples: customer data, census, etc.; not in control of collection; cannot control additional variables; mostly exploratory analysis; data analytics techniques

API: an application programming interface (API) is a set of programming instructions and standards for accessing a web-based software application or web tool. A software company releases its API to the public so that other software developers can design products that are powered by its service. It is a software-to-software interface, not a user interface. Example: payment for a movie ticket.

REST API: REST (Representational State Transfer) is a software architectural style for implementing web services. Used to request historical data; requires authorization from the company.
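
As a rough illustration of the REST card above, a request for historical data is usually an HTTP GET with query parameters plus an authorization credential. The sketch below only builds such a request with Python's standard library and never sends it. The endpoint `https://api.example.com/v1/reviews`, the parameter names, and the token are hypothetical placeholders, not any real company's API.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint and token, for illustration only.
BASE_URL = "https://api.example.com/v1/reviews"
API_TOKEN = "YOUR-TOKEN-HERE"


def build_request(product_id: str, since: str) -> Request:
    """Build (but do not send) a GET request for historical records."""
    query = urlencode({"product_id": product_id, "since": since})
    return Request(
        f"{BASE_URL}?{query}",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        method="GET",
    )


req = build_request("B00123", "2024-01-01")
print(req.full_url)
print(req.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would actually send it; that step is left out here because the endpoint is made up.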

Streaming API: access to live inputs; new information is received automatically without requesting it again.

Public REST API: Google Maps, Flickr, YouTube, Amazon Product Advertising, Wikipedia, LinkedIn, Facebook

Web Crawling or Spidering: associated tool: "Beautiful Soup". A search engine employs special software robots, called spiders, to build lists of the words found on websites. When a spider is building its lists, the process is called web crawling. The spider begins with a popular site, indexing the words on its pages and following every link found within the site. The spidering system quickly begins to travel, spreading out across the most widely used portions of the Web. Words occurring in the title, subtitles, meta tags, and other positions of relative importance are noted for special consideration during a subsequent user search.
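
The crawl loop described above (index a page's words, then follow its links) can be sketched in a few lines. This is a minimal sketch, assuming a tiny in-memory "web" so that it runs without network access; the page paths and contents are made up, and it uses the standard-library `html.parser` rather than Beautiful Soup.

```python
from collections import deque
from html.parser import HTMLParser

# A made-up, in-memory set of pages standing in for real websites.
PAGES = {
    "/home": '<h1>Home</h1><a href="/news">news</a> <a href="/about">about</a>',
    "/news": '<h1>News</h1>text mining news <a href="/home">home</a>',
    "/about": "<h1>About</h1>about this site",
}


class LinkAndWordParser(HTMLParser):
    """Collect outgoing links and visible words from one page."""

    def __init__(self):
        super().__init__()
        self.links, self.words = [], []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start):
    """Breadth-first crawl: index each page's words, then follow its links."""
    index, seen, queue = {}, {start}, deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkAndWordParser()
        parser.feed(PAGES.get(url, ""))
        parser.close()  # flush any buffered trailing text
        index[url] = parser.words
        for link in parser.links:
            if link not in seen:  # never visit the same page twice
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/home")
print(sorted(index))
```

The `seen` set is what keeps the spider from looping forever when pages link back to each other, as `/news` and `/home` do here.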

Problems with web scraping: your IP may get banned by the website; aggressive scraping can resemble a denial-of-service attack; data behind a login wall cannot be reached.

RSS Feed: in KNIME, Excel Reader --> RSS Feed Reader

Supervised learning: the target variable (dependent variable) is known and present.

Unsupervised learning: the target variable (dependent variable) is NOT known; the goal is to extract relationships between variables; clusters.

Text Analytics: information retrieval + information extraction + data mining + web mining

Text Mining: "Knowledge discovery in textual data"; a semi-automated process of extracting knowledge from unstructured data sources, also known as text data mining or knowledge discovery in textual databases.
Motivation: 85-90% of all corporate data is in some kind of unstructured form (e.g., text); unstructured corporate data is doubling in size every 18 months; tapping into these information sources is not an option but a necessity for staying competitive.
Benefits of text mining are most obvious in text-rich data environments: law (court orders), academic research (research articles), finance (quarterly reports), medicine (discharge summaries), biology (molecular interactions), technology (patent files), marketing (customer comments).
Electronic communication records: spam filtering, email prioritization and categorization, automatic response generation.
Typical tasks: information extraction, topic tracking, summarization, categorization, clustering, concept linking, question answering.
Data Mining vs. Text Mining: both seek novel and useful patterns; both are semi-automated processes.

Structured data: in databases

Unstructured data: Word documents, PDF files, text excerpts, XML files, and so on

Document: the unit of analysis

Corpus: the collection of documents required for text analysis

Terms: words that you analyze in the document

Concepts: the collection of words that you analyze
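
To make the document, corpus, and terms cards concrete, here is a minimal sketch, assuming whitespace-separated lowercase text and two made-up documents. It counts how often each term occurs in each document, which gives a simple term-by-document table.

```python
from collections import Counter

# A corpus is a collection of documents; each document is one unit of analysis.
corpus = [
    "text mining finds patterns in text",
    "data mining finds patterns in tables",
]

# One Counter of term counts per document.
term_counts = [Counter(doc.split()) for doc in corpus]

# The vocabulary is every distinct term seen anywhere in the corpus.
vocabulary = sorted(set().union(*term_counts))

# Print one row per term: its count in document 0, then in document 1.
for term in vocabulary:
    print(term, [counts[term] for counts in term_counts])
```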

Stemming: in keyword searching, word endings are automatically removed ("lines" becomes "line").

Stop words: in database searching, small and frequently occurring words such as "and", "or", "in", and "of" that are often ignored when entered as search terms. Putting them in quotation marks sometimes allows you to search for them. More generally, words that you do not need for your analysis.
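
Stemming and stop-word removal, as described in the two cards above, can be sketched together. The stop-word list and the suffix rules below are toy assumptions chosen for illustration; production systems use curated stop-word lists and established stemming algorithms instead.

```python
# Toy stop-word list and suffix rules, for illustration only.
STOP_WORDS = {"and", "or", "in", "of", "the", "a"}
SUFFIXES = ("ing", "s")


def naive_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, drop stop words, then stem what remains."""
    return [naive_stem(w) for w in text.lower().split() if w not in STOP_WORDS]


print(preprocess("Lines of text and the meaning of words"))
```

Crude suffix stripping like this will mangle some words, which is one reason lemmatization exists as a separate technique.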

Synonyms: words that have similar meanings

Polysemes: words with the same spelling and related meanings, e.g. the "foot" at the bottom of your leg and the "foot" of a mountain

Tokenization: the process of breaking up a given text into units called tokens
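
A minimal tokenizer sketch, assuming English text and treating a token as a run of letters or digits that may contain internal apostrophes. The regular expression is one reasonable choice among many, not a standard.

```python
import re


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, keeping internal apostrophes."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


tokens = tokenize("Text mining isn't magic: it's counting, mostly.")
print(tokens)
```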

Lemmatization: removes inflectional endings only and returns the base or dictionary form of a word
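
Unlike suffix stripping, lemmatization maps a word to its dictionary form. The sketch below fakes this with a tiny hand-written lookup table, which is an assumption made purely for illustration; real lemmatizers rely on a full vocabulary and morphological analysis.

```python
# Tiny hand-written lemma table, for illustration only.
LEMMAS = {
    "was": "be",
    "are": "be",
    "mice": "mouse",
    "running": "run",
    "better": "good",
}


def lemmatize(word: str) -> str:
    """Return the dictionary form if known, otherwise the lowercased word."""
    key = word.lower()
    return LEMMAS.get(key, key)


print([lemmatize(w) for w in ["Mice", "are", "running", "fast"]])
```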

Term dictionary: a collection of terms specific to a narrow field that can be used to restrict the extracted terms within a corpus
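
Restricting extracted terms with a term dictionary amounts to a membership filter. In this sketch the dictionary contents and the sample sentence are hypothetical, standing in for a finance-specific vocabulary.

```python
# Hypothetical domain dictionary for a finance corpus.
TERM_DICTIONARY = {"revenue", "margin", "dividend", "earnings"}


def restrict_terms(tokens, dictionary=TERM_DICTIONARY):
    """Keep only the tokens that appear in the domain dictionary."""
    return [t for t in tokens if t in dictionary]


tokens = "quarterly revenue rose while the margin and dividend held steady".split()
print(restrict_terms(tokens))
```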

Word frequency: the frequency with which a word appears in a language
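
In practice, word frequency is usually estimated from a sample of text. A minimal sketch with `collections.Counter`, assuming a made-up sentence and whitespace tokenization:

```python
from collections import Counter

text = "the cat and the dog and the bird"

# Absolute counts per word.
counts = Counter(text.split())

# Relative frequency: each count divided by the total number of words.
total = sum(counts.values())
relative = {word: count / total for word, count in counts.items()}

print(counts.most_common(2))
print(relative["the"])
```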

Part-of-speech tagging: the process of marking up the words in a text as corresponding to a particular part of speech, based on a word's definition and the context of its use




Seller: NurseAdvocate (Chamberlain College of Nursing)