[HCE5920] HCE 5920 Hitachi Vantara Certified
Specialist Pentaho Data Integration
Implementation Certification Review Guide
**Question 1.** Which PDI component is primarily used for designing transformations and
jobs?
A) Kitchen
B) Spoon
C) Carte
D) Pan
Answer: B
Explanation: Spoon provides the graphical user interface for creating and editing
transformations and jobs.
**Question 2.** In PDI terminology, what does the “Pan” tool execute?
A) Remote job scheduling
B) Transformation execution on the local machine
C) Repository management
D) Server monitoring
Answer: B
Explanation: Pan runs transformations locally, reading the transformation XML and processing
the steps.
**Question 3.** Which step is best suited for row‑level data cleansing that removes leading and
trailing spaces?
A) Select Values
B) Replace in String
C) String Operations
D) Filter Rows
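Answer: C
Explanation: String Operations has a Trim type setting (left, right, or both) built for exactly
this kind of row‑level cleanup. Replace in String can reach the same result with a regex such as
^\s+|\s+$, but it is a more roundabout way to trim spaces.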
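As a rough illustration of what trimming does to each row, the following Python sketch applies the regex quoted in the explanation and compares it with a plain two‑sided trim. It is not PDI code; the function names and sample rows are hypothetical and only model the behavior outside of PDI.

```python
import re

# Pattern quoted in the explanation: whitespace at the start or end of a value.
EDGE_WHITESPACE = re.compile(r"^\s+|\s+$")

def trim_with_regex(value: str) -> str:
    """Remove leading and trailing whitespace by regex replacement."""
    return EDGE_WHITESPACE.sub("", value)

def trim_both(value: str) -> str:
    """Remove leading and trailing whitespace with a built-in trim."""
    return value.strip()

# Hypothetical field values flowing through a stream.
rows = ["  Alice ", "Bob\t", "  Carol  Ann  "]
cleaned = [trim_with_regex(r) for r in rows]
```

Both approaches leave inner whitespace untouched (the two spaces inside the third value survive); only the edges are cleaned.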
**Question 4.** When would you prefer a “Database Lookup” step over a “Stream Lookup”?
A) When the lookup data is small and fits in memory
B) When the lookup data resides in a relational database and you need SQL filtering
C) When you need a real‑time API call
D) When you need to join two streams without sorting
Answer: B
Explanation: Database Lookup queries the database directly, allowing SQL filters and leveraging
indexes.
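The trade‑off can be sketched outside PDI with Python's built‑in sqlite3 module. This is only an analogy under stated assumptions: the customers table, its columns, and both function names are hypothetical, and the dictionary stands in for the in‑memory cache that a stream‑style lookup builds.

```python
import sqlite3

# Hypothetical lookup table held in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", 1), (2, "Globex", 0), (3, "Initech", 1)],
)

def database_lookup(customer_id):
    """Query the database per row; the SQL filter and primary-key index do the work."""
    row = conn.execute(
        "SELECT name FROM customers WHERE id = ? AND active = 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None

# Stream-style lookup: read the whole lookup set once and keep it in memory.
stream_cache = {cid: name for cid, name in conn.execute("SELECT id, name FROM customers")}

def stream_lookup(customer_id):
    """Look the key up in the preloaded in-memory cache."""
    return stream_cache.get(customer_id)
```

The database version can filter on any column without loading the table, while the cached version is fast only as long as the whole lookup set fits in memory.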
**Question 5.** Which variable scope is visible to all jobs and transformations launched from a
Pentaho Server session?
A) Environment Variable
B) Root Job Variable
C) System Variable
D) Local Variable
Answer: A
Explanation: In PDI, environment variables are set at the Java Virtual Machine level (for
example with -D options or in kettle.properties), so they are visible to every job and
transformation running in that JVM.
**Question 6.** What is the purpose of the “Merge Join” step?
A) To concatenate two streams without sorting
B) To perform a SQL‑style join on two sorted input streams
C) To merge rows from a single stream into a single row
D) To join rows based on a regular expression
Answer: B
Explanation: Merge Join requires both input streams to be sorted on the join keys. It then walks
the two streams in key order and pairs matching rows, so unsorted input produces incorrect
results.
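The reason sorted input matters is easier to see in a small sketch of a sort‑merge join. The Python below is an illustration of the general technique, not PDI's implementation; it covers only the inner‑join case, and the field names and sample rows are hypothetical.

```python
def merge_join(left, right, key):
    """Inner join of two lists of dicts that are both already sorted by `key`."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key < right_key:
            i += 1  # left row has no partner yet; advance left
        elif left_key > right_key:
            j += 1  # right row has no partner yet; advance right
        else:
            # Find the run of right rows that share this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == left_key:
                j_end += 1
            # Pair every left row with that key against the whole run.
            while i < len(left) and left[i][key] == left_key:
                for right_row in right[j:j_end]:
                    result.append({**left[i], **right_row})
                i += 1
            j = j_end
    return result

# Hypothetical streams, both sorted on the join key "cid".
orders = [
    {"cid": 1, "order": "A"},
    {"cid": 2, "order": "B"},
    {"cid": 2, "order": "C"},
    {"cid": 4, "order": "D"},
]
customers = [
    {"cid": 1, "name": "Acme"},
    {"cid": 2, "name": "Globex"},
    {"cid": 3, "name": "Initech"},
]
joined = merge_join(orders, customers, "cid")
```

Because each pointer only moves forward, a key that appears out of order is skipped rather than matched, which is why a sort on the join keys is placed before the join.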
**Question 7.** Which repository type stores transformation metadata in XML files on the file
system?
A) Database Repository
B) File‑based Repository
C) JCR Repository
D) Remote Repository
Answer: B
Explanation: The file‑based repository saves objects as .ktr and .kjb files in a directory structure.
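Because .ktr files are plain XML, they can be inspected with ordinary tooling. The sketch below parses a heavily trimmed, hypothetical transformation document with Python's standard library; real .ktr files contain many more elements, and the transformation name and step names here are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-in for the contents of a .ktr file.
KTR_SAMPLE = """<transformation>
  <info><name>load_customers</name></info>
  <step><name>Read CSV</name><type>CsvInput</type></step>
  <step><name>Write table</name><type>TableOutput</type></step>
  <order>
    <hop><from>Read CSV</from><to>Write table</to></hop>
  </order>
</transformation>"""

root = ET.fromstring(KTR_SAMPLE)

# Transformation name, then each step's display name and type.
transformation_name = root.findtext("info/name")
steps = [(s.findtext("name"), s.findtext("type")) for s in root.findall("step")]

# Hops connect steps; collect them as (from, to) pairs.
hops = [(h.findtext("from"), h.findtext("to")) for h in root.findall("order/hop")]
```

Storing objects as text files like this is also what makes a file‑based repository easy to keep under version control.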
**Question 8.** Which step can be used to calculate a new field based on arithmetic operations
on existing numeric fields?
A) Calculator
B) Modified Java Script Value
C) Filter Rows
D) Table Output
Answer: A
Explanation: The Calculator step provides a set of predefined calculations, including arithmetic
on numeric fields, that write their result to a new field.
**Question 9.** Which job entry would you use to verify that a file exists before proceeding to
the next job entry?
A) Check Files
B) FTP Transfer
C) Shell
D) Evaluation
Answer: A
Explanation: The “Check Files” entry tests the presence (or absence) of a file and can set
success/failure conditions.
**Question 10.** In a Pentaho job, what does the “Dummy” entry primarily do?
A) Executes a shell script
B) Acts as a placeholder for branching logic without performing actions
C) Sends an email notification
D) Copies files between directories
Answer: B
Explanation: Dummy entries are used to create logical flow points without executing any
operation.
**Question 11.** Which of the following best describes the “Set Variables” step?
A) It reads variables from a properties file
B) It defines new variables that can be accessed by downstream steps or jobs
C) It encrypts variable values for security
D) It deletes variables from the environment