California Research Data Specialist (RDS) Exam
ACTUAL QUESTIONS AND ANSWERS LATEST
UPDATE THIS YEAR
SUMMARIZED EXAM COVERAGE (point form)
• Research Design & Methodology – Formulating research questions, study designs
(cross-sectional, longitudinal, case-control, cohort), sampling methods (simple random,
stratified, cluster, systematic), survey design (questionnaire development, bias reduction,
response rates), experimental vs. quasi-experimental design.
• Data Collection & Management – Primary vs. secondary data sources, administrative data, data
extraction (SQL, APIs), data cleaning (handling missing values, outliers, inconsistencies), data
linkage (deterministic, probabilistic), data governance (confidentiality, data security, PII
handling), metadata management.
• Statistical Analysis & Interpretation – Descriptive statistics (mean, median, mode, variance,
standard deviation, percentiles, skewness, kurtosis), inferential statistics (hypothesis testing –
t-test, chi-square, ANOVA, correlation, regression – linear, logistic), confidence intervals,
p-values, Type I/II errors, power analysis, non-parametric tests (Mann-Whitney, Kruskal-Wallis),
time series analysis.
• Data Programming & Tools – SAS (PROC SQL, PROC MEANS, PROC FREQ, DATA steps), R (dplyr,
ggplot2, lm, glm), Python (pandas, numpy, matplotlib, seaborn, scikit-learn), SQL (joins,
subqueries, aggregations, window functions), Excel (pivot tables, formulas, charts),
Tableau/Power BI (dashboards, calculated fields, data blending).
• Data Visualization & Reporting – Principles of effective visualization (Tufte, chart junk,
appropriate chart types), creating tables, graphs (bar, line, scatter, histogram, boxplot), heat
maps, geographic mapping (GIS basics), writing research reports (executive summaries,
methodology, results, limitations, recommendations), presenting to stakeholders.
• Ethical & Legal Considerations – Confidentiality, privacy laws (California Confidentiality of
Medical Information Act, Public Records Act), IRB, data use agreements, ethics of data
manipulation and interpretation, avoiding p-hacking, HARKing.
• Project Management & Collaboration – Workflow design, documentation, version control (Git
basics), team communication, meeting deadlines, leading small projects, supervising junior
analysts (RDS II/III).
• California Context – Common state datasets (vital statistics, employment, environmental,
health, housing, criminal justice), understanding of California’s diverse populations, policy
applications (e.g., CalEnviroScreen, unemployment insurance data, Medi-Cal data).
QUESTION 1: A Research Data Specialist is asked to estimate the unemployment rate for a small county
using a survey of 500 residents. The sample was selected by dividing the county into geographic blocks
and randomly selecting 10 households from each block. This sampling method is called:
A) Simple random sampling
B) Systematic sampling
C) Stratified random sampling
D) Cluster sampling
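Answer: C – Households are randomly selected from every block, so each block acts as a stratum and the
design is stratified random sampling. Cluster sampling would instead randomly select a subset of blocks
(first stage) and then survey households only within the selected blocks (second stage).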
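The difference between drawing households from every block and drawing them only from randomly selected blocks can be sketched in Python. This is a minimal illustration using only the standard library; the block names, block sizes, and function names are hypothetical and not part of the exam material.

```python
import random

def stratified_sample(blocks, per_block, rng):
    """Draw `per_block` households from EVERY block (each block is a stratum)."""
    return {name: rng.sample(households, per_block)
            for name, households in blocks.items()}

def two_stage_cluster_sample(blocks, n_blocks, per_block, rng):
    """Randomly pick `n_blocks` blocks, then draw households only within them."""
    chosen = rng.sample(sorted(blocks), n_blocks)
    return {name: rng.sample(blocks[name], per_block) for name in chosen}

# Hypothetical county: 50 blocks of 40 households each (made-up sizes).
blocks = {
    f"block_{b:02d}": [f"hh_{b:02d}_{h:02d}" for h in range(40)]
    for b in range(50)
}
rng = random.Random(2024)  # fixed seed so the sketch is reproducible

strat = stratified_sample(blocks, per_block=10, rng=rng)
clust = two_stage_cluster_sample(blocks, n_blocks=20, per_block=10, rng=rng)

# Every block contributes to the stratified sample; only 20 do in the cluster sample.
print(len(strat), sum(len(v) for v in strat.values()))
print(len(clust), sum(len(v) for v in clust.values()))
```

In the stratified draw all 50 blocks are represented, while the two-stage cluster draw leaves 30 blocks with no sampled households at all.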
QUESTION 2: An RDS is analyzing hospital discharge data to study readmission rates. She notices that
one hospital has a disproportionately low readmission rate because it transfers many high-risk patients
to other facilities before discharge. This data quality issue is best described as:
A) Measurement error
B) Selection bias
C) Missing data
D) Outlier influence
Answer: B – Systematic differences in how subjects are assigned or excluded (transfers) create selection
bias that distorts the true readmission rate.
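A small arithmetic sketch can show how transferring high-risk patients before discharge lowers the observed rate. All counts below are invented for illustration and do not come from any real hospital data.

```python
def readmission_rate(discharges, readmissions):
    """Share of discharged patients who were readmitted."""
    return readmissions / discharges

# Hypothetical patient mix if the hospital discharged everyone itself:
# 200 high-risk patients (60 readmitted) and 800 low-risk patients (80 readmitted).
high_n, high_readmit = 200, 60
low_n, low_readmit = 800, 80
full_rate = readmission_rate(high_n + low_n, high_readmit + low_readmit)

# Assume 150 of the high-risk patients are transferred out before discharge,
# so only 50 high-risk patients (15 readmitted) remain in this hospital's data.
kept_high_n, kept_high_readmit = 50, 15
observed_rate = readmission_rate(kept_high_n + low_n,
                                 kept_high_readmit + low_readmit)

print(round(full_rate, 3), round(observed_rate, 3))
```

The hospital's care is identical in both scenarios; only the set of patients counted in the denominator changes, which is what makes this a selection problem rather than a measurement problem.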
QUESTION 3: A state program needs to link three years of Medi-Cal claims data to a survey of patient
satisfaction. The only common identifier is a scrambled version of the patient’s Social Security number,
which contains some mismatches due to data entry errors. Which technique is most appropriate for
linking the two datasets?
A) Deterministic matching on the scrambled SSN
B) Probabilistic matching using multiple identifiers (e.g., date of birth, gender, ZIP code)
C) Manual review of all mismatched records
D) Dropping all records without an exact match
Answer: B – Probabilistic matching uses multiple variables to estimate the likelihood that two records
belong to the same person, accommodating errors in any single identifier.
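The idea of scoring record pairs across several identifiers can be sketched as follows. This is a simplified illustration in the spirit of Fellegi-Sunter agreement weights, not a production linkage method; the field names, the m/u probabilities, the threshold, and the sample records are all assumptions made up for the example.

```python
import math

# Hypothetical parameters per field:
#   m = P(field agrees | records belong to the same person)
#   u = P(field agrees | records belong to different people)
FIELD_PARAMS = {
    "ssn_scrambled": (0.90, 0.0001),
    "dob": (0.97, 0.003),
    "gender": (0.99, 0.5),
    "zip": (0.90, 0.01),
}

def match_weight(rec_a, rec_b):
    """Sum log2 agreement/disagreement weights over all comparison fields."""
    total = 0.0
    for field, (m, u) in FIELD_PARAMS.items():
        if rec_a[field] == rec_b[field]:
            total += math.log2(m / u)              # agreement adds evidence
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement subtracts it
    return total

THRESHOLD = 5.0  # hypothetical cut-off; real projects tune this and review borderline pairs

claim = {"ssn_scrambled": "A91X77", "dob": "1980-04-12", "gender": "F", "zip": "95814"}
# Same person, but the scrambled SSN contains a data entry error.
survey_typo = {"ssn_scrambled": "A91X17", "dob": "1980-04-12", "gender": "F", "zip": "95814"}
# A different person who happens to share only gender.
survey_other = {"ssn_scrambled": "Q20B55", "dob": "1975-11-02", "gender": "F", "zip": "90012"}

for label, rec in [("typo in SSN", survey_typo), ("different person", survey_other)]:
    weight = match_weight(claim, rec)
    print(label, round(weight, 2), "link" if weight >= THRESHOLD else "no link")
```

An exact match on the scrambled SSN alone would reject the first pair, whereas the combined score still links it because date of birth, gender, and ZIP code all agree.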
QUESTION 4: An RDS is asked to produce a quarterly report on average wait times for unemployment
insurance appeals. The database contains an “appeal resolution date” but no “appeal filing date”. What
is the best first step?
A) Calculate the average time between the resolution date and the end of the quarter
B) Impute the filing date as the first day of the quarter
C) Work with program staff to obtain or reconstruct the missing filing date from other sources
D) Omit the variable from the report
Answer: C – Before improvising or omitting, the analyst should attempt to retrieve or derive the missing
date using existing data (e.g., date of initial determination) or by consulting subject matter experts.
QUESTION 5: A data specialist is using SAS to merge two tables: Table A (claims, 10 million rows) and
Table B (member demographics, 500,000 rows). The merge should keep only claims that have a
matching member record. The most efficient SAS code would use:
A) DATA step with MERGE and IN= option
B) PROC SQL with LEFT JOIN
C) DATA step with SET and BY statements
D) PROC SORT followed by MERGE with IN= and the IF condition
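Answer: D – Sorting both tables by the key variable and then using a DATA step MERGE with IN= flags and
a subsetting IF (e.g., IF a AND b;) keeps only claims with a matching member record, which is an efficient
inner join. A PROC SQL inner join is also valid but may be less efficient for very large datasets; the LEFT
JOIN in option B would additionally retain claims with no matching member.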
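The sort-then-merge logic can be illustrated with a Python analogue. This is not SAS code; it is a minimal sketch of the same idea (sort both inputs by the key, walk them together, keep a claim only when both sides contribute), and the column names and sample rows are hypothetical. It assumes each member ID appears at most once in the demographics table.

```python
def inner_merge(claims, members, key="member_id"):
    """Sort both tables by `key`, then keep only claims with a matching member."""
    claims_sorted = sorted(claims, key=lambda row: row[key])
    members_sorted = sorted(members, key=lambda row: row[key])

    merged = []
    j = 0  # pointer into the sorted member table
    for claim in claims_sorted:
        # Skip member rows whose key is smaller than the current claim's key.
        while j < len(members_sorted) and members_sorted[j][key] < claim[key]:
            j += 1
        # Keep the claim only if a member row with the same key exists.
        if j < len(members_sorted) and members_sorted[j][key] == claim[key]:
            merged.append({**members_sorted[j], **claim})
    return merged

# Hypothetical sample rows.
claims = [
    {"member_id": 3, "claim_id": "C1", "amount": 120.0},
    {"member_id": 1, "claim_id": "C2", "amount": 75.5},
    {"member_id": 9, "claim_id": "C3", "amount": 40.0},   # no matching member
    {"member_id": 1, "claim_id": "C4", "amount": 310.0},
]
members = [
    {"member_id": 1, "county": "Yolo"},
    {"member_id": 2, "county": "Kern"},
    {"member_id": 3, "county": "Marin"},
]

result = inner_merge(claims, members)
print([row["claim_id"] for row in result])
```

Claim C3 is dropped because member 9 has no demographics row, and member 2 contributes nothing because it has no claims, which mirrors the effect of requiring both IN= flags to be true.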
QUESTION 6: A researcher is testing whether a new job training program reduces time on welfare. She
randomly assigns participants to treatment and control groups. After six months, the treatment group
shows a 2-week lower average time on welfare, but the p-value is 0.12. At the conventional α=0.05