Questions and Answers | 100% Correct |
A+ Verified | 2026
• Data transformation . Answer: scalable build system for data that leverages multimodal
compute to produce output datasets
• Pipeline Management . Answer: - combines change management, data quality, and
data loading capabilities.
- enables fast, flexible, and scalable delivery of data pipelines while providing
robustness and security
- Data engineers can define health checks that guarantee only fully compliant data will
be deployed to production. Where issues are found, the platform provides diagnostics
on the discrepancies detected.
• Hyper Auto . Answer: support for Software-Defined Data Integration (SDDI) to not only
connect to ERP and CRM systems, but also quickly generate data pipelines that can then
feed into the Ontology, translating data into operational use
• External Transformations . Answer: perform scheduled syncs and exports to external
systems using REST APIs. It is recommended to use Code Repositories in Foundry to
write external Python transforms
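A scheduled export like this can be sketched in plain Python. This is a minimal, generic sketch using only the standard library; the endpoint URL, the `{"records": ...}` payload shape, and the bearer-token header are hypothetical, not a Foundry API.

```python
import json
import urllib.request

def build_payload(rows):
    # Wrap the rows in a JSON body; the {"records": ...} shape is hypothetical.
    return json.dumps({"records": rows}).encode("utf-8")

def export_rows(rows, endpoint, token):
    # POST one batch of rows to the external system over REST.
    # The Authorization header shape is illustrative only.
    req = urllib.request.Request(
        endpoint,
        data=build_payload(rows),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

A scheduler (in Foundry, a schedule on the transform) would call `export_rows` on each batch.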
• Dataset . Answer: - the most essential representation of data; fundamentally a wrapper
around a collection of files stored in a backing file system that allows for permissions,
schema management, version control, and updates
- structured (tabular - Parquet, CSV)
- unstructured (images, video, PDFs)
- semi-structured (XML, JSON)
- transactions - git-like commits for the datasets (open, committed, updating)
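The transaction model above (open a transaction, add files, commit a new version) can be illustrated with a toy class. This is a sketch of the pattern only; the class and method names are illustrative, not Foundry's API.

```python
class Dataset:
    """Toy model of a dataset with git-like transactions."""

    def __init__(self):
        self.versions = []   # committed snapshots, one per transaction
        self._open = None    # files staged in the currently open transaction

    def open_transaction(self):
        self._open = []

    def add_file(self, name, contents):
        self._open.append((name, contents))

    def commit(self):
        # A commit produces a new immutable dataset version built on the last one.
        base = dict(self.versions[-1]) if self.versions else {}
        base.update(dict(self._open))
        self.versions.append(base)
        self._open = None

ds = Dataset()
ds.open_transaction()
ds.add_file("part-0.csv", "id,name\n1,a")
ds.commit()
ds.open_transaction()
ds.add_file("part-1.csv", "id,name\n2,b")
ds.commit()
# Two versions now exist; the latest version sees both files.
```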
• Streams . Answer: similar to a dataset, but a representation of data wrapped around a
collection of tabular rows
- provides a lower-latency view of the data
- hot buffer - low-latency storage that recent data is pulled from
- cold buffer - data is transferred here every few minutes for archiving
- supports high-throughput and compressed stream types
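The hot/cold buffer split can be sketched as follows: new rows land in a low-latency hot buffer and are periodically flushed to a cold archive, while a read sees both. Names and the manual `flush` are illustrative; in practice the transfer happens automatically every few minutes.

```python
class Stream:
    """Toy model of a stream with hot and cold buffers."""

    def __init__(self):
        self.hot = []    # low-latency buffer of recent rows
        self.cold = []   # archived rows

    def push(self, row):
        self.hot.append(row)

    def flush(self):
        # Move everything from the hot buffer into the cold archive.
        self.cold.extend(self.hot)
        self.hot.clear()

    def read(self):
        # A read sees the archive plus anything still in the hot buffer.
        return self.cold + self.hot

s = Stream()
s.push({"id": 1})
s.flush()            # {"id": 1} is archived
s.push({"id": 2})    # {"id": 2} is still hot
```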
• Media Set . Answer: Multiple files with common schema (file format), used to work with
high-scale, unstructured data (multiple pdfs)
• Jobs . Answer: run on datasets to compute output after changes
- jobspec - the definition of how a job should be constructed; it is encapsulated by a job
- job types: data connection sync, code repository, health checks, analytical
applications, exports
• Schedules . Answer: used to run builds off of a trigger, which could be a time or
action/event
• Health checks . Answer: scheduled checks used to validate data quality
- job-level, build-level, and freshness checks
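A freshness check is the simplest of these to sketch: fail when the dataset's last update is older than a threshold. The function name and the one-hour default are illustrative assumptions, not Foundry's configuration.

```python
from datetime import datetime, timedelta

def freshness_check(last_updated, now, max_age=timedelta(hours=1)):
    """Return True when the data was updated within max_age, False otherwise."""
    return (now - last_updated) <= max_age
```

A scheduler would run this periodically and raise an alert when it returns False.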
• Virtual Tables . Answer: allows you to query tables in supported data platforms without
storing it in a dataset (so data coming from other places)
• Change Data Capture (CDC) . Answer: an enterprise data integration pattern often used
to stream real-time updates from a relational database to other consumers, supporting
syncs, processing, and storage from source systems that produce capture feeds
- must have one or more primary key columns, one or more ordering columns, and a
deletion column
- common sources: Microsoft SQL Server, PostgreSQL, Oracle, Db2
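The three required columns make it possible to resolve a change feed into current state: keep the latest change per primary key (by the ordering column) and drop rows whose deletion flag is set. A minimal sketch, assuming column names `id`, `version`, and `is_deleted` (these names are illustrative):

```python
def apply_cdc(changes):
    """Resolve a CDC change feed into the current state of the table."""
    latest = {}
    for row in changes:
        key = row["id"]  # primary key column
        # Keep only the change with the highest ordering value per key.
        if key not in latest or row["version"] > latest[key]["version"]:
            latest[key] = row
    # Drop tombstones: rows whose deletion column is set.
    return {k: v for k, v in latest.items() if not v["is_deleted"]}

changes = [
    {"id": 1, "version": 1, "is_deleted": False, "name": "a"},
    {"id": 1, "version": 2, "is_deleted": False, "name": "b"},  # update to id 1
    {"id": 2, "version": 1, "is_deleted": True,  "name": "c"},  # id 2 deleted
]
state = apply_cdc(changes)
```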
• Views . Answer: behave similarly to a dataset view, but do not hold any files
containing data -- composed of the union of other datasets (backing datasets) when
read -- can be thought of as pointing to backing datasets
- can automatically perform deduplication of data if primary keys exist
- can be used like regular datasets, but views cannot be specified as valid transform
outputs -- only as valid transform inputs
- can only be used with datasets that have a schema
- used for automatic updates, folder organization, data uniqueness
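Reading a view can be sketched as a union of the backing datasets computed at read time, with optional primary-key deduplication. This is a sketch of the concept only; the "later rows win" tie-break is an assumption, not documented behavior.

```python
def read_view(backing_datasets, primary_key=None):
    """Union the backing datasets; dedupe by primary key when one exists."""
    rows = [row for ds in backing_datasets for row in ds]  # union at read time
    if primary_key is None:
        return rows
    deduped = {}
    for row in rows:
        deduped[row[primary_key]] = row  # assumption: later rows win
    return list(deduped.values())

old = [{"id": 1, "v": "old"}]
new = [{"id": 1, "v": "new"}, {"id": 2, "v": "x"}]
result = read_view([old, new], primary_key="id")
```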
• Code Repositories . Answer: web-based integrated IDE
- transforms repository type - repositories support authoring data transformation logic
and include features for previewing and debugging transformations (Python, Java, SQL)
- functions repository type - enables writing business logic that can be executed with low
latency in an operational context (TypeScript, Python)
- model development repository type - train models
• Contour . Answer: provides a user interface to perform data analysis on tables at
scale, creating dashboards that allow others to explore the data in a structured way
- Features:
- visualize, filter, transform data without code
- organize complex analysis into analytical paths
- parameterize analyses to switch between different views of the data and results
- create interactive dashboards
- save analysis results as a new dataset for use
- Uses:
- some or all of the data you want is not mapped in the Ontology