Data Architect Certification Program

Course Overview:

This six-week intensive prepares working technologists to design, govern, and operate modern data platforms end-to-end. Participants move from conceptual data modeling through cloud-native lakehouse architectures, streaming pipelines, and enterprise governance, finishing with a capstone where each student delivers a production-ready architecture blueprint for a real-world scenario.

The curriculum blends concise theory with hands-on labs, weekly quizzes, take-home assignments, and a graded capstone project. Tools and platforms covered include AWS (S3, Glue, Redshift, Lake Formation), Snowflake, Apache Kafka, dbt, Apache Airflow, Terraform, and modern observability stacks. Every week includes a live lab session in a sandboxed cloud environment so concepts are reinforced through building, not just reading.

Course Content

 

🔹 Week 1: Foundations and data modeling

Topics:

  • Role of the data architect, stakeholder map, and architecture deliverables (C4, ADRs, reference diagrams).

  • Conceptual, logical, and physical data modeling — when to use each.

  • Normalization (1NF–3NF, BCNF) versus denormalization trade-offs.

  • Dimensional modeling: star, snowflake, conformed dimensions, slowly changing dimensions (Type 1, 2, 3).

  • Data Vault 2.0 introduction: hubs, links, satellites.
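
To make the slowly changing dimension item above concrete, here is a minimal Python sketch of a Type 2 update, assuming the dimension is held as an in-memory list of rows. The names `DimRow` and `apply_scd2` and the column names are hypothetical illustrations, not part of the course materials; a real warehouse would typically do this with a SQL MERGE or a dbt snapshot.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    customer_id: int          # business (natural) key
    city: str                 # attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date]  # None means the row is still current
    is_current: bool


def apply_scd2(rows: List[DimRow], customer_id: int,
               new_city: str, as_of: date) -> List[DimRow]:
    """Return a new row list with a Type 2 change applied for one customer."""
    result = []
    changed = False
    found_current = False
    for row in rows:
        if row.customer_id == customer_id and row.is_current:
            found_current = True
            if row.city != new_city:
                # Type 2 closes the old version; Type 1 would overwrite it.
                result.append(replace(row, valid_to=as_of, is_current=False))
                changed = True
                continue
        result.append(row)
    if changed or not found_current:
        # Open a new current version (also covers a brand-new customer).
        result.append(DimRow(customer_id, new_city, as_of, None, True))
    return result


history = [DimRow(1, "Chicago", date(2023, 1, 1), None, True)]
history = apply_scd2(history, 1, "Schaumburg", date(2024, 6, 1))
```

After the call, the Chicago row is kept but closed, and a new current row carries the changed value, so both versions remain available for point-in-time queries.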

 

🔹 Week 2: Storage systems and data warehousing

Topics:

  • OLTP versus OLAP workload characteristics; row-store versus columnar storage.

  • RDBMS internals: indexing strategies, partitioning, query planning.

  • NoSQL family deep dive: key-value, document, wide-column, graph — selection criteria.

  • Cloud data warehouses: Snowflake, Redshift, BigQuery — architecture and pricing models.

  • ETL versus ELT; ingestion patterns and orchestration with Airflow and dbt.

 

🔹 Week 3: Data lakes and lakehouse architecture

Topics:

  • Data lake fundamentals: zones (raw, curated, consumption), file formats, and partition design.

  • Open table formats: Delta Lake, Apache Iceberg, Apache Hudi — feature comparison.

  • Medallion architecture: bronze, silver, gold layering and SLAs per layer.

  • Query engines: Athena, Trino, Spark SQL, Databricks SQL — trade-offs.

  • Cost optimization: tiered storage, lifecycle policies, file compaction, Z-ordering.

 

🔹 Week 4: Streaming and real-time data pipelines

Topics:

  • Batch versus streaming versus micro-batch; latency budgets and use case mapping.

  • Apache Kafka deep dive: brokers, topics, partitions, consumer groups, exactly-once semantics.

  • Stream processing engines: Kafka Streams, Flink, Spark Structured Streaming.

  • Change data capture (CDC) patterns with Debezium; outbox pattern.

  • Event-driven architecture, schema registry, and contract testing for events.
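
As a rough illustration of the partitions item above, the Python sketch below shows why records that share a key keep their relative order: a keyed record is always hashed to the same partition. It is a toy model with no Kafka client involved; `partition_for` and `route` are hypothetical names, and CRC32 is a stand-in hash (Kafka's default partitioner uses murmur2).

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, str]  # (key, value)


def partition_for(key: str, num_partitions: int) -> int:
    # Stand-in hash for illustration only; not Kafka's actual algorithm.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    """Append each event to the partition chosen by its key."""
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return dict(partitions)


events = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-1", "shipped"),
]
routed = route(events, num_partitions=3)
```

Ordering is only guaranteed within a partition, which is why the choice of key (here an order ID) is a design decision and not an implementation detail.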

🔹 Week 5: Data governance, security, and quality

Topics:

  • Data governance frameworks: DAMA-DMBOK overview, RACI for data ownership.

  • Catalog and lineage tooling: AWS Glue Data Catalog, Unity Catalog, OpenMetadata, Collibra.

  • Data quality frameworks: Great Expectations, dbt tests, Soda; SLA and SLO definition.

  • Security: IAM, row- and column-level security, tokenization, encryption at rest and in transit.

  • PII handling, GDPR / CCPA / HIPAA constraints, data residency, and audit logging.
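
The tokenization item above can be sketched in a few lines of Python. This is a minimal illustration that uses keyed hashing (HMAC-SHA256) as a stand-in for a token vault, so the tokens are deterministic and not reversible; `tokenize`, `mask_row`, and the sample column names are hypothetical, and real key management is out of scope here.

```python
import hashlib
import hmac
from typing import Any, Dict, Set


def tokenize(value: str, secret_key: bytes) -> str:
    """Return a deterministic token for a PII value using HMAC-SHA256."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:32]


def mask_row(row: Dict[str, Any], pii_columns: Set[str],
             secret_key: bytes) -> Dict[str, Any]:
    """Replace the PII columns of one record with tokens; keep the rest as-is."""
    return {
        col: tokenize(str(val), secret_key) if col in pii_columns else val
        for col, val in row.items()
    }


key = b"demo-key-do-not-use-in-production"  # assumption: loaded from a secret store
row = {"customer_id": 42, "email": "pat@example.com", "plan": "pro"}
masked = mask_row(row, {"email"}, key)
```

Because the same input and key always give the same token, masked tables can still be joined on the tokenized column without exposing the original value.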

​

🔹 Week 6: Capstone project and architecture review

Deliverables:

  • Executive summary (1 page) covering business context, success criteria, and proposed approach.

  • Reference architecture diagram (C4 levels 1–3) plus a deployment view with cloud services labeled.

  • Data model artifacts: conceptual ERD, dimensional model for the analytics layer, and key DDL.

  • Pipeline design: ingestion sources, batch versus streaming routing, orchestration, and SLAs per layer.

  • Governance plan: catalog choice, lineage strategy, quality checks, security model, and compliance mapping.

  • Cost model with three load scenarios (low, expected, peak) and a sensitivity analysis on the top two drivers.

  • Three to five architecture decision records (ADRs) documenting major trade-offs.

  • Risk register listing top ten risks with mitigations and owners.
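
For the cost model deliverable above, a small Python sketch can show the shape of the calculation: one cost function, three load scenarios, and a one-at-a-time sensitivity check on two price drivers. All prices and volumes below are made-up placeholders, not vendor pricing, and the names `Prices`, `monthly_cost`, and `sensitivity` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Prices:
    storage_per_tb_month: float  # placeholder value, not a real price
    compute_per_hour: float      # placeholder value, not a real price


def monthly_cost(stored_tb: float, compute_hours: float, prices: Prices) -> float:
    return (stored_tb * prices.storage_per_tb_month
            + compute_hours * prices.compute_per_hour)


def sensitivity(stored_tb: float, compute_hours: float,
                prices: Prices, bump: float = 0.20) -> Dict[str, float]:
    """Cost increase when each price driver alone rises by `bump`."""
    base = monthly_cost(stored_tb, compute_hours, prices)
    storage_up = Prices(prices.storage_per_tb_month * (1 + bump),
                        prices.compute_per_hour)
    compute_up = Prices(prices.storage_per_tb_month,
                        prices.compute_per_hour * (1 + bump))
    return {
        "storage_price": monthly_cost(stored_tb, compute_hours, storage_up) - base,
        "compute_price": monthly_cost(stored_tb, compute_hours, compute_up) - base,
    }


PRICES = Prices(storage_per_tb_month=20.0, compute_per_hour=4.0)
# scenario name -> (stored TB, compute hours per month); illustrative numbers
SCENARIOS = {"low": (10, 100), "expected": (50, 400), "peak": (120, 1200)}

costs = {name: monthly_cost(tb, hours, PRICES)
         for name, (tb, hours) in SCENARIOS.items()}
deltas = sensitivity(*SCENARIOS["expected"], PRICES)
```

Comparing the two entries in `deltas` shows which driver dominates in the expected scenario, which is the kind of finding the sensitivity analysis is meant to surface.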

Staffing Support
  • Resume Preparation

  • Mock Interview Preparation

  • Phone Interview Preparation

  • Face-to-Face Interview Preparation

  • Project/Technology Preparation

  • Internship with internal project work

  • Externship with client project work

Our Salient Features:
  • Hands-on Labs and Homework

  • Group discussions and case studies

  • Course project work

  • Regular quizzes and exams

  • Regular support beyond the classroom

  • Students can re-take the class at no cost

  • Dedicated conference rooms for group project work

  • Live streaming for remote students

  • Class video recordings for catching up on missed sessions

Disclaimer: At this time, we are not offering any training programs for Nebraska residents.
 

'PMP' and 'PMI' are registered marks of the Project Management Institute, Inc.

InfoTekGuide is an independent training provider and is not affiliated with, endorsed by, or sponsored by Salesforce, Google, YouTube, Amazon, Microsoft, Azure, Cisco, Snowflake, or Atlassian. All trademarks, logos, and brand names are the property of their respective owners. Any references are used for educational and descriptive purposes only.

InfoTekGuide - A Leading IT Training Provider in Schaumburg.
