Complete Interview Preparation Guide
CI/CD | SDLC & TDLC | Agile | Snowflake SQL | Matillion ETL
Comprehensive guide covering all key concepts for data engineering interviews — from DevOps
pipelines and delivery lifecycles to hands-on Snowflake SQL and Matillion ETL.
Section 1: CI/CD (Continuous Integration & Continuous Delivery/Deployment)
1.1 What is CI/CD?
CI/CD is a set of practices and automated pipelines that enable development teams to deliver
code changes more frequently, reliably, and safely. It bridges the gap between development
and operations (DevOps) by automating the building, testing, and deployment of applications.
Term                          Full Form & Meaning
CI (Continuous Integration)   Developers frequently merge code changes into a shared repository.
                              Every merge triggers an automated build and test run to catch
                              issues early.
CD (Continuous Delivery)      Extends CI by automatically preparing every validated build for
                              release to production. A human approval step is still required
                              before production deployment.
CD (Continuous Deployment)    Fully automated: every change that passes all tests is deployed
                              to production automatically, with no manual step.
1.2 CI/CD Pipeline Stages
• Source / Code: Developer pushes code to a version control system (Git). A trigger fires
the pipeline.
• Build: Source code is compiled, dependencies are installed, and a build artifact (JAR,
Docker image, package) is created.
• Test: Automated tests run — unit tests, integration tests, regression tests, code quality
checks (linting, SAST).
• Staging / Pre-Prod Deploy: The artifact is deployed to a staging environment that
mirrors production for final validation.
• Acceptance Testing: Automated or manual UAT, performance tests, and smoke tests
run on staging.
• Production Deploy: On success, the artifact is deployed to production — manually
(Continuous Delivery) or automatically (Continuous Deployment).
• Monitor & Observe: Post-deployment monitoring, alerting, and rollback capability.
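The stages above can be sketched as a small runner that executes steps in order and stops at the first failure. This is only an illustration, not how any particular CI/CD tool works: the Stage class, run_pipeline, demo_stages, and the stage names are all hypothetical, and the monitoring stage is left out for brevity. The auto_deploy flag models the difference between Continuous Deployment (no manual step) and Continuous Delivery (production waits for approval).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    run: Callable[[], bool]  # returns True on success


def run_pipeline(stages: List[Stage], auto_deploy: bool, approved: bool = False) -> List[str]:
    """Run stages in order and stop at the first failure.

    auto_deploy=True models Continuous Deployment; auto_deploy=False models
    Continuous Delivery, where the production step waits for `approved`.
    Returns one log line per stage that was reached.
    """
    log: List[str] = []
    for stage in stages:
        needs_approval = stage.name == "production_deploy" and not auto_deploy
        if needs_approval and not approved:
            log.append(f"{stage.name}: waiting for manual approval")
            break
        if stage.run():
            log.append(f"{stage.name}: ok")
        else:
            log.append(f"{stage.name}: failed")
            break
    return log


def demo_stages(test_passes: bool = True) -> List[Stage]:
    # Hypothetical stand-ins; a real stage would call build and test tools.
    return [
        Stage("source", lambda: True),
        Stage("build", lambda: True),
        Stage("test", lambda: test_passes),
        Stage("staging_deploy", lambda: True),
        Stage("acceptance_test", lambda: True),
        Stage("production_deploy", lambda: True),
    ]


if __name__ == "__main__":
    for line in run_pipeline(demo_stages(), auto_deploy=False):
        print(line)
```

With auto_deploy=False and no approval, the run stops at the production step. A failing test stage stops the run before anything is deployed.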
1.3 Popular CI/CD Tools
Tool                   Description
Jenkins                Open-source, highly extensible. One of the most widely used CI/CD
                       servers. Plugin ecosystem of 1800+ plugins.
GitHub Actions         Native CI/CD within GitHub. YAML-based workflows triggered by
                       GitHub events (push, PR, schedule).
GitLab CI/CD           Built into GitLab. Powerful YAML-based pipelines with built-in
                       container registry and artifact storage.
Azure DevOps           Microsoft's full DevOps platform: Boards, Repos, Pipelines, Test
                       Plans, Artifacts.
CircleCI               Cloud-native CI/CD platform. Fast, Docker-first, strong
                       parallelism.
AWS CodePipeline       Managed CI/CD service on AWS. Integrates natively with
                       CodeBuild, CodeDeploy, S3, Lambda.
Bitbucket Pipelines    CI/CD built into Atlassian's Bitbucket. Tight integration with Jira.
ArgoCD                 GitOps-based CD tool for Kubernetes. Syncs cluster state with
                       a Git repository.
Terraform / Ansible    Infrastructure-as-Code tools used alongside CI/CD: Terraform for
                       environment provisioning, Ansible for configuration management.
1.4 Version Control & Branching Strategies
Git Branching Strategies
• GitFlow: Structured model with long-lived branches: main, develop, feature/*, release/*,
hotfix/*. Best for versioned releases.
• GitHub Flow: Simplified — only main branch + short-lived feature branches. PR →
review → merge → deploy. Best for continuous deployment.
• Trunk-Based Development: Developers integrate into the main branch (trunk) at least
daily, either by committing directly or through very short-lived feature branches
(< 1 day). Requires strong automated testing.
• GitLab Flow: Adds environment branches (production, staging) on top of GitHub Flow
for clearer deployment tracking.
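Teams that adopt GitFlow often enforce its branch names in CI. The sketch below classifies a branch name against the GitFlow patterns listed above. The function name, the labels, and the exact naming rules (lowercase names, release/X.Y.Z) are assumptions made for illustration, not part of GitFlow itself.

```python
import re

# Patterns follow the GitFlow branch names listed above.
# The labels and the character rules are illustrative assumptions.
GITFLOW_PATTERNS = [
    (re.compile(r"^main$"), "production"),
    (re.compile(r"^develop$"), "integration"),
    (re.compile(r"^feature/[a-z0-9._-]+$"), "feature"),
    (re.compile(r"^release/\d+\.\d+\.\d+$"), "release"),
    (re.compile(r"^hotfix/[a-z0-9._-]+$"), "hotfix"),
]


def classify_branch(name: str) -> str:
    """Return the GitFlow role of a branch name, or "unknown"."""
    for pattern, kind in GITFLOW_PATTERNS:
        if pattern.match(name):
            return kind
    return "unknown"


if __name__ == "__main__":
    for branch in ["main", "feature/my-feature", "release/1.2.0", "bugfix"]:
        print(branch, "->", classify_branch(branch))
```

A CI job could fail the build when the result is "unknown", which keeps branch naming consistent without manual review.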
Key Git Concepts
• Branch: Independent line of development. Isolates work without affecting the main
codebase.
• Pull Request (PR) / Merge Request (MR): A formal request to merge a branch.
Enables code review before merging.
• Merge: Combines changes from one branch into another.
• Rebase: Re-applies commits on top of another branch, creating a linear history.
• Tag: Marks a specific commit as a release version (e.g., v1.0.0).
• Cherry-pick: Applies a specific commit from one branch to another without merging the
whole branch.
git checkout -b feature/my-feature # create feature branch
git add . && git commit -m "feat: add transformation logic"
git push origin feature/my-feature # push to remote
# Open Pull Request → CI runs on the PR → Code Review → Merge to develop → CI runs on develop
1.5 Semantic Versioning (SemVer)
The standard version format used in software releases: MAJOR.MINOR.PATCH (e.g., 2.4.1)
• MAJOR (2.x.x): Breaking/incompatible API changes. Requires migration.
• MINOR (x.4.x): New features added in a backward-compatible manner.
• PATCH (x.x.1): Backward-compatible bug fixes and small patches.
• Pre-release: Suffixes like -alpha, -beta, -rc.1 indicate pre-release versions.
Note: Data pipeline releases and Snowflake deployment scripts are often tagged the same
way, using SemVer or date-based tags (e.g., v20240115), in dbt projects, Matillion exports,
and deployment scripts.
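The MAJOR.MINOR.PATCH rules above can be sketched as a small parser and bump helper. This is a minimal sketch, not a full SemVer implementation: the names Version, parse, and bump are hypothetical, the optional leading "v" is an assumption to match tags like v1.0.0, and build metadata and pre-release ordering are not handled.

```python
import re
from typing import NamedTuple, Optional

# Optional leading "v" is accepted for Git-style tags (an assumption;
# the SemVer format itself has no prefix).
SEMVER_RE = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)(?:-([0-9A-Za-z.-]+))?$")


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    prerelease: Optional[str] = None

    def __str__(self) -> str:
        core = f"{self.major}.{self.minor}.{self.patch}"
        return f"{core}-{self.prerelease}" if self.prerelease else core


def parse(text: str) -> Version:
    """Parse 'MAJOR.MINOR.PATCH' with an optional '-prerelease' suffix."""
    match = SEMVER_RE.match(text)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    major, minor, patch, pre = match.groups()
    return Version(int(major), int(minor), int(patch), pre)


def bump(version: Version, part: str) -> Version:
    """Return the next version; lower-order parts reset to zero."""
    if part == "major":
        return Version(version.major + 1, 0, 0)
    if part == "minor":
        return Version(version.major, version.minor + 1, 0)
    if part == "patch":
        return Version(version.major, version.minor, version.patch + 1)
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    current = parse("2.4.1")
    print(bump(current, "patch"))  # bug fix
    print(bump(current, "minor"))  # backward-compatible feature
    print(bump(current, "major"))  # breaking change
```

Resetting the lower-order parts on a bump is the detail interviewers tend to probe: a new MINOR release starts again at PATCH 0.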
1.6 CI/CD for Data Engineering (DataOps)
CI/CD principles applied to data pipelines — sometimes called DataOps — ensure data
transformations, schemas, and pipelines are tested and deployed as rigorously as application
code.
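• dbt + GitHub Actions: dbt models are committed to Git. On every PR, a CI job can run dbt
build --select state:modified+ (with --state pointing at artifacts from a production run) to
build and test only the changed models and their downstream dependents.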
• Snowflake + CI/CD: Schema migrations (CREATE TABLE, ALTER TABLE) are
versioned using tools like Flyway or Liquibase and applied in CI pipelines.
• Matillion + Git: Matillion jobs are exported and version-controlled in Git. Environment
promotion (Dev → QA → Prod) is automated via Matillion's REST API or CLI.
• Automated testing: Data quality tests (row counts, null checks, uniqueness) run in CI
before data reaches production.
• Infrastructure as Code: Snowflake warehouses, roles, and databases defined in
Terraform — applied via CI/CD pipeline.
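The automated data quality tests mentioned above (row counts, null checks, uniqueness) can be sketched as plain SQL checks. This example uses an in-memory SQLite database as a stand-in for Snowflake so it runs anywhere. The function name, the orders table, and its columns are hypothetical, and a real pipeline would more likely use dbt tests or a dedicated data quality tool.

```python
import sqlite3
from typing import Dict, List


def run_quality_checks(
    conn: sqlite3.Connection,
    table: str,
    key: str,
    not_null: List[str],
    min_rows: int = 1,
) -> Dict[str, bool]:
    """Return {check_name: passed} for row-count, null, and uniqueness checks.

    Table and column names are interpolated into SQL, so they must come
    from trusted configuration, never from user input.
    """
    cur = conn.cursor()
    results: Dict[str, bool] = {}

    row_count = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    results["row_count"] = row_count >= min_rows

    for column in not_null:
        nulls = cur.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
        ).fetchone()[0]
        results[f"not_null:{column}"] = nulls == 0

    duplicates = cur.execute(
        f"SELECT COUNT(*) FROM ("
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1)"
    ).fetchone()[0]
    results[f"unique:{key}"] = duplicates == 0
    return results


def demo_connection() -> sqlite3.Connection:
    # Hypothetical sample data with one duplicate key and one NULL.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10), (2, 11), (2, None)],
    )
    return conn


if __name__ == "__main__":
    checks = run_quality_checks(
        demo_connection(), "orders", "order_id", ["order_id", "customer_id"]
    )
    for name, passed in checks.items():
        print(name, "PASS" if passed else "FAIL")
```

In CI, the job would exit with a non-zero status if any check fails, which blocks the deployment before bad data reaches production.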
1.7 CI/CD Interview Questions & Answers
Q1. What is the difference between Continuous Delivery and Continuous Deployment?
Continuous Delivery means every change that passes automated tests is ready for
production deployment, but a human approval step is required before it goes live.