BIOL 2600 Midterm Exam Latest Fall-Spring Verified
Exam With Answers 100% Complete
Why do we need computation - ANSWER: - Massive increase in data volume in the
last decades - to develop more accurate models of biological systems
- Reproducibility and replicability - open access to data and its code
- Data driven hypothesis development - need good questions
Bioinformatics - ANSWER: is concerned with the acquisition, storage, analysis, and
dissemination of biological data, most often DNA and AA sequences (methods for
handling data)
- NOT the same as computational biology
Computational biology - ANSWER: is the science of using biological data to
develop algorithms or models in order to understand biological systems and
relationships
- Goal is biological insight by computational methods - using math to understand
biology (evolution, epidemiology)
System - ANSWER: a set of interacting or interdependent components forming an
integrated whole
- Described by their function or by their structure
- Systems have emergent properties arising from the interaction of components
- Biological systems range in scale from cells to tissues to organs
- Biological systems are tricky to study because they have many parts and many
interactions/ processes occurring on different scales (spatial and temporal)
Biology - ANSWER: is the study of emergent properties of systems
- In the past, many scientists had a reductionist view
- When a complex system is observed to have properties that its parts do not have
on their own, this is described as an emergent property of the system
- We use computational biology to try to model emergent features of biological
systems
One such task in biology is to predict function from structure - which usually starts
with a "parts list" - ANSWER: Going from parts to structure to function is HARD - but
the opposite way is easier
- What would make going from a list of parts to a structure easier? --> organization
and grouping
When is a puzzle easy/ hard? - ANSWER: - Knowing what the pieces are part of
- Number of components (fewer is easier)
- Distinct classes of components
- Knowing the rules governing interactions
- Having landmarks or a chassis
- Completeness of the set
- Contamination of the set
- Stability of the set (RNA degrades)
Problem faced by biology - ANSWER: biological systems are made up of tens of thousands
of components - acting on different scales, interacting in a multiplicity of ways, to
generate a diversity of states
What is computation? - ANSWER: - Computation is any type of calculation that
includes both arithmetical and non-arithmetical steps and which follows a well-
defined model (Ex. an algorithm)
Algorithm - ANSWER: is an ordered and finite set of operations that must be
followed in order to solve a problem
- Chain of instructions - ordered
- Defined objective
- Finite (it will stop when problem is solved)
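Ex. A minimal sketch of the three properties above, using Euclid's algorithm for the greatest common divisor. The choice of algorithm and the function name gcd are illustrative assumptions, not taken from the course material.

```python
def gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers (Euclid's algorithm)."""
    # Ordered: the same step is applied in a fixed sequence.
    # Finite: b shrinks toward 0 on every pass, so the loop must stop.
    while b != 0:
        a, b = b, a % b
    # Defined objective: the largest integer that divides both inputs.
    return a


result = gcd(48, 18)
print(result)
```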
Flowcharts - ANSWER: Useful way of representing algorithms
>Several conventions or rules for using flowcharts:
= Terminal - start and end (ovals)
= Input/ output - variables (parallelogram)
= Processing - math of algorithm (rectangle)
= Decision - equal to, greater than, less than, TRUE/FALSE (diamond)
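Ex. A rough sketch of how the flowchart elements could map onto code. The task (GC fraction of a DNA sequence) and the name gc_fraction are hypothetical choices for illustration only.

```python
def gc_fraction(sequence: str) -> float:
    # Terminal (start): entering the function.
    # Input: the DNA sequence passed in as `sequence`.
    seq = sequence.upper()
    gc_count = 0
    for base in seq:
        # Decision: is this base G or C? (TRUE/FALSE)
        if base in "GC":
            # Processing: update the running count.
            gc_count += 1
    # Decision: is the sequence empty? Avoids dividing by zero.
    if len(seq) == 0:
        return 0.0
    # Processing, then output, then terminal (end).
    return gc_count / len(seq)


print(gc_fraction("ATGCGC"))
```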
Computational biology (second definition) - ANSWER: - Computational biology involves the
development and application of data-analytical and theoretical methods,
mathematical modelling and computational simulation techniques to the study of
biological, ecological, behavioural, and social systems
- If we aim to understand how a system works, we must collect comprehensive data
Data (singular, datum) - ANSWER: - Data are units of information
- Data are a set of values of qualitative or quantitative variables about the attributes
of one or more persons or objects
- Attributes and variables
Purpose of data collection - ANSWER: Support the testing of a hypothesis
Attribute - ANSWER: characteristic of an object
Variable - ANSWER: is a logical set of attributes
Review of variable types - ANSWER: - Continuous variables
- Discrete variables
- Categorical variables
Continuous variables - ANSWER: - Numeric variables that can have an infinite set of
values within a given range
Discrete variables - ANSWER: - Numeric variables that have a countable number of
values within a given range (Ex. age)
Categorical variables - ANSWER: - Contain a finite number of categories or distinct
groups. Do not have to be numeric, though they may be represented with numbers
(Ex. live and dead)
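Ex. One way the three variable types might appear in a single record. The field names, the values, and the 1/0 coding are assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    body_mass_g: float  # continuous: any value within a range
    age_years: int      # discrete: countable values
    status: str         # categorical: "live" or "dead"


# Categories do not have to be numeric, but can be represented with numbers.
STATUS_CODES = {"live": 1, "dead": 0}

obs = Observation(body_mass_g=23.47, age_years=2, status="live")
status_code = STATUS_CODES[obs.status]
print(obs, status_code)
```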
Data collection introduces - ANSWER: bias into experimental data
Bias - ANSWER: any tendency which prevents unprejudiced consideration of a
question
- Quantitative term describing the difference between the average of measurements
made on the same object and its true value (relates to accuracy and precision)
--> SYSTEMATIC ERROR
Random errors - ANSWER: - Always present in a measurement
- Caused by inherently unpredictable fluctuations in the readings of a measurement
apparatus or in interpretation of the instrumental reading
- Strategy: make lots of measurements, calculate error
Ex. generate different results for the same repeated measurement
GOAL = study system to reduce random errors
Systematic errors - ANSWER: - Contribute to bias
- If present, always affect the results of an experiment in a predictable direction
- Caused by Ex. imperfect calibration of instruments, methods of observation,
interference of the environment with the measurement process
- Strategy: improve methods and instruments to minimize occurrence
Ex. Incorrect zeroing of an instrument leading to a zero error
Accuracy - ANSWER: proximity of measurement results to the true value
- Low accuracy is the result of systematic error
Precision - ANSWER: degree to which repeated measurements show the same
results
- Low precision may be the result of a random error
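Ex. A small numerical sketch of accuracy versus precision. Bias is computed as the mean of repeated measurements minus the true value, and spread as the sample standard deviation. The true value and both sets of measurements are invented for illustration.

```python
from statistics import mean, stdev

TRUE_VALUE = 10.0  # hypothetical true mass in grams

# Hypothetical repeated measurements of the same object.
noisy_balance = [9.8, 10.1, 10.3, 9.9, 9.9]      # scattered around the true value
badly_zeroed = [10.5, 10.5, 10.6, 10.4, 10.5]    # tightly grouped but shifted


def bias(measurements, true_value):
    """Average measurement minus true value (relates to accuracy)."""
    return mean(measurements) - true_value


def spread(measurements):
    """Sample standard deviation of repeated measurements (relates to precision)."""
    return stdev(measurements)


for name, data in [("noisy_balance", noisy_balance), ("badly_zeroed", badly_zeroed)]:
    print(name, round(bias(data, TRUE_VALUE), 3), round(spread(data), 3))
```

In this made-up data the noisy balance has almost no bias but a larger spread (random error), while the badly zeroed balance has a small spread but a consistent offset (systematic error).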
Intentional sources of bias - ANSWER: - Purposeful selection of datapoints or
experimental subjects that tend to confirm pre-determined narratives
- Scientific fraud
- Fabricating data (rare), deleting data (common), intentionally misrepresenting data
(common)
Unintentional sources of bias - ANSWER: - Recall bias: where the outcome of a
process colors participants' evaluation of all parts of the process
- Instrument bias: where the measurements made on an instrument drift
systematically from a true value
- Confirmation bias: the tendency to value or recall data that confirms a prior belief
about how a system works
Reproducibility - ANSWER: refers to instances in which a researcher collects new
data to arrive at the same scientific findings as a previous study
- reproducibility is key to establishing scientific fact - new data, same finding
Codes - ANSWER: are tags or labels for assigning units of meaning to qualitative data
Why do we code data - ANSWER: - To make it machine readable
- To simplify: codes might represent complex concepts that would be too long to
write in each cell of a spreadsheet
- To standardize information: similar information might be given in a number of
different ways
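Ex. A minimal sketch of coding free-text entries into short, standardized codes. The raw entries, the code book, and the name encode are hypothetical.

```python
# Hypothetical free-text entries for one attribute, recorded inconsistently.
raw_status = ["Alive", "alive ", "LIVE", "dead", "Deceased"]

# Hypothetical code book: several spellings map to one short code.
CODE_BOOK = {"alive": "L", "live": "L", "dead": "D", "deceased": "D"}


def encode(value: str) -> str:
    """Return the code for one entry, or fail loudly if no code exists."""
    key = value.strip().lower()
    if key not in CODE_BOOK:
        raise ValueError(f"no code for {value!r}")
    return CODE_BOOK[key]


coded = [encode(v) for v in raw_status]
print(coded)
```

Raising an error on an unknown entry, rather than guessing, is one way to keep coding errors from passing silently into the analysis.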
Data coding errors pose real and serious problems - ANSWER: - Time wasting ==> data
wrangling, develop research on wrong information
- Clinical consequences
- False conclusions
Data standards - ANSWER: Are documented agreements on representation, format,
definition, structuring, tagging, transmission, manipulation, use, and management of
data
Ex. nomenclature
Why use data standards? - ANSWER: - Standards are difficult to establish
- They enable access because the same well understood terms, codes, and data
structures can be used for data retrieval
- They encourage and enable reuse of data for multiple purposes
- They provide consistent results during data retrieval
Controlled vocabularies - ANSWER: Controlled vocabulary = a prescribed list of
terms, each representing a concept
- Are designed for applications in which it is useful to identify each concept with one
consistent label
Ex. Protein vs gene vs mutant naming
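Ex. A sketch of checking labels against a controlled vocabulary, so that each concept keeps one consistent label. The term list and the sample labels are made up for illustration and do not come from any real nomenclature standard.

```python
# Hypothetical controlled vocabulary: the only labels allowed for this field.
CONTROLLED_TERMS = {"wild type", "knockout", "overexpression"}


def invalid_terms(labels):
    """Return the labels that are not in the controlled vocabulary."""
    return [label for label in labels if label not in CONTROLLED_TERMS]


labels = ["wild type", "WT", "knockout", "KO"]
print(invalid_terms(labels))
```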
Article: "What to do when you don't trust your data anymore" - ANSWER: - 1. What
happened? (Key events, issues, actions, etc.)
- 2. What was wrong with the data? (Why did it go undetected at first? How was it
figured out?)
o Repeated numbers in blocks (with .00 decimal, how could it be that accurate with
animals in blocks?), duplication of numbers, units with 100 added at the front
o Undetected - author put the data straight into the chart without inspecting the raw
data --> trust with collaborator!
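Ex. A rough sketch of two checks on raw data that could flag the kinds of problems listed above: duplicated values and a suspiciously high share of whole numbers. The function names and the sample measurements are hypothetical and are not the data from the article.

```python
from collections import Counter


def duplicated_values(values):
    """Return the values that appear more than once, in sorted order."""
    counts = Counter(values)
    return sorted(v for v, n in counts.items() if n > 1)


def fraction_whole_numbers(values):
    """Share of values with nothing after the decimal point (e.g. 12.00)."""
    if not values:
        return 0.0
    whole = sum(1 for v in values if float(v).is_integer())
    return whole / len(values)


# Hypothetical raw measurements, invented for illustration.
measurements = [12.0, 15.0, 112.0, 115.0, 12.0, 9.37]

print(duplicated_values(measurements))
print(fraction_whole_numbers(measurements))
```

Checks like these only raise questions. Deciding whether a pattern is a real problem still means inspecting the raw data directly.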