Summary

Data Engineering

Topic	Main Points
Processing Techniques vs. Traditional Roles	New processing techniques are challenging traditional technology roles. They are changing the day-to-day work of data professionals.
Types of Data	There are two broad types of data: structured and unstructured.
Relational Databases	In relational database systems like Microsoft SQL Server, Azure SQL Database, and Azure SQL Data Warehouse, data is defined in tables.
Non-Relational Systems	Unstructured data is stored in non-relational or NoSQL systems.
Data Engineers and Data Types	Data engineers work with unstructured data and various new data types, including streaming data.
Data Extraction and Transformation	Data engineers extract raw data from structured or unstructured data pools. They transform data from the source schema to the destination schema.
Transformation Processes	ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are two approaches to data transformation.
Disadvantages of ETL	ETL transformation can take a long time and tie up source system resources.
Migration to Cloud	Factors to consider when migrating from on-premises to cloud-based solutions.
Roles in Data Industry	Development of data engineer, data scientist, and artificial intelligence engineer roles.
Data Project Phases	High-level architecture example following five data project phases: sourcing, ingest, preparation, analysis, and consumption.
Azure Storage Accounts	Base storage type within Azure with four configuration options: blob, files, queue, and table.
Data Ingestion Tools	Introduction to tools for ingesting data into Azure storage.
Considerations for Optimal Storage	Key factors to consider when choosing the optimal storage solution.
Features of Azure Storage Accounts	Scalability, security, durability, high availability, and Azure's management of maintenance and critical issues.
Data Encryption in Azure Storage	How Azure storage encrypts data and provides access control.
Azure Data Lake Storage	Overview of Data Lake Storage with Hadoop-compatible data repository and compute capabilities.
Data Ingestion and Querying	Methods for data ingestion and querying using tools like Azure Data Factory, Apache Sqoop, and U-SQL.
Azure Cosmos DB	Globally distributed multi-model database with strengths in uptime, replication, and consistency.
Usage and Deployment of Cosmos DB	Deployment options, data ingestion, and query capabilities of Azure Cosmos DB.
Security Features in Cosmos DB	Data encryption, IP firewall configurations, and access control from virtual networks.
Azure SQL Database (PaaS)	Managed relational database service supporting structured and unstructured data.
Features of Azure SQL Database	Comprehensive security, scalability, and support for OLTP systems.
Data Ingestion and Querying in SQL DB	Ingestion through application integration, querying using T-SQL, and security features.
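
The ETL and ELT approaches in the table above differ mainly in where the transformation runs. The following is a minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for the destination system. The table names, column names, and sample rows are hypothetical and are not taken from any Azure service.

```python
import sqlite3

# Hypothetical raw source rows: (customer name, amount as text).
RAW_ROWS = [(" alice ", "10.50"), ("BOB", "3"), ("alice", "4.50")]


def run_etl(conn, rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [(name.strip().lower(), float(amount)) for name, amount in rows]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)


def run_elt(conn, rows):
    """ELT: load raw rows as-is, then transform inside the destination engine."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales_elt AS "
        "SELECT lower(trim(customer)) AS customer, CAST(amount AS REAL) AS amount "
        "FROM sales_raw"
    )


def totals(conn, table):
    """Return per-customer totals from the given table, ordered by customer."""
    query = (
        f"SELECT customer, SUM(amount) FROM {table} "
        "GROUP BY customer ORDER BY customer"
    )
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
run_etl(conn, RAW_ROWS)
run_elt(conn, RAW_ROWS)
etl_totals = totals(conn, "sales_etl")
elt_totals = totals(conn, "sales_elt")
print(etl_totals)
print(elt_totals)
```

Both paths end with the same cleaned data. In the ELT path the raw table is also kept in the destination, and the transformation work is done by the destination engine rather than by the pipeline.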

Data Storage

Topic	Main Points
Choosing the Right Storage Solution	Considerations for selecting the optimal storage solution for various datasets.
Data Classification	Data classification into structured, semi-structured, and unstructured categories.
Data Serialization Languages	Use of serialization languages like JSON, XML, and YAML for semi-structured data exchange.
Factors for Choosing Storage Solutions	Consideration of data type, operations, latency, and transactional support.
ACID Guarantees	Explanation of ACID guarantees (Atomicity, Consistency, Isolation, Durability) in transactions.
Azure Services for Data Storage	Introduction to Azure SQL Database, Azure Analysis Services, and Azure Cosmos DB.
Azure Blob Storage	Use cases for Azure Blob Storage, including tiered storage and integration with CDN.
Other Azure NoSQL Storage Options	Mention of Azure Table Storage, Azure HBase, and Azure Cache for Redis for NoSQL data storage.
Azure Storage Accounts	Overview of storage accounts, their settings, and considerations for their usage.
Creating Storage Accounts	Explanation of tools for creating storage accounts, including Azure Portal and Azure CLI.
Microsoft Azure Storage Overview	Features of Microsoft Azure storage, including durability, security, and scalability.
Types of Azure Data Services	Description of Azure Blob Storage, Azure Files, Azure Queue Service, and Azure Table Storage.
Azure Storage REST API	Use of the Azure Storage REST API for operating on containers and data in storage accounts.
Client Libraries for Azure Storage	Availability of client libraries in various languages and frameworks for faster development.
Accessing Azure Storage	Connecting apps to Azure storage accounts using access keys, REST API endpoints, and connection strings.
Secure Access Key Management	Best practices for managing secure access keys, including rotation and shared access signatures.
Encryption in Azure Storage	Explanation of Storage Service Encryption (SSE) and transport level security for data encryption.
Access Control and Role-Based Access	Use of role-based access control (RBAC) with Azure Active Directory for resource and data operations.
Storage Analytics and Logging	Introduction to Storage Analytics, real-time logs, and their filtering and search capabilities.
Threat Protection in Azure Storage	Overview of Azure Defender for Storage, security alerts, and integrated threat mitigation.
Security Features in Azure Data Lake	Security features in Azure Data Lake Storage, including authentication schemes and management tools.
Blob Storage API and Usage	Overview of Blob Storage API, supported operations for blobs and containers, and data organization.
Types of Blobs and Their Usage	Explanation of block blobs, append blobs, and page blobs, and their appropriate use cases.
Designing Data Storage in Azure	Considerations for organizing data across storage accounts, containers, and blobs.
Preparing for Certification	Encouragement to take a practice exam for certification preparation.
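
The ACID guarantees listed in the table above can be illustrated with a small transaction. This is a sketch of atomicity only, using Python's built-in sqlite3 module rather than an Azure service. The accounts table and the transfer function are hypothetical examples.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

ok = transfer(conn, "a", "b", 30)
rejected = transfer(conn, "a", "b", 500)
print(ok, rejected, balances(conn))
```

The second transfer fails after both updates have run, and the rollback undoes both of them, so the balances never show a half-applied transfer.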

Data Integration with Microsoft Azure Data Factory

Topic	Main Points
Evolving World of Data	Changes in the data world, including new technologies and rule adjustments.
Adaptation as a Data Engineer	The importance of data engineers understanding and adapting to these changes.
ELT and ETL Processes	Explanation of ELT and ETL processes in data platforms.
Shift towards ELT Approach	How developments in data platform technologies favor an ELT approach.
Holistic Data Project Approach	The shift towards predictive and preemptive analytics and the need for a holistic data project view.
Healthcare IoT Use Case	An example of an IoT deployment in healthcare and its impact on data engineers.
Azure Data Factory Tools	Overview of Azure Data Factory and its data transformation and cleansing capabilities.
Data Transformation in ADF	Common data transformation and cleansing activities within Azure Data Factory.
Orchestration in Azure Data Factory	How Azure Data Factory orchestrates data movements and transformations.
Data Factory Control Flow	Exploration of Data Factory control flow, pipelines, debugging, and parameters.
SSIS in Azure Data Factory	The integration of SQL Server Integration Services (SSIS) with Azure Data Factory.
Azure DevOps and GitHub Integration	How Azure Data Factory integrates with Azure DevOps and GitHub for source control and CI/CD.
Data Integration with Azure Data Share	Simplifying data integration from multiple sources using Azure Data Share and Data Factory.
Completing Data Integration	Overview of data integration at scale with Azure Data Factory and Azure Synapse pipelines.

Azure Synapse Analytics

Topic	Main Points
Azure Synapse Analytics Overview	Introduction to Azure Synapse Analytics as a unified platform for various data-related tasks.
Capabilities of Azure Synapse Analytics	Explanation of the capabilities, including SQL pools, Spark pools, data integration, and visualization.
Use Cases for Azure Synapse Analytics	Common use cases for Azure Synapse Analytics, such as data warehousing, analytics, and integration.
Components of Azure Synapse Analytics	Introduction to components like Azure Synapse Workspace, data warehouse, and data virtualization.
Data Warehousing in Azure Synapse	Overview of data warehousing, its role, and key points related to data extraction and transformation.
Apache Spark in Azure Synapse	Explanation of Apache Spark usage within Azure Synapse Analytics via Spark pools.
Azure Synapse Pipelines	Introduction to Azure Synapse Pipelines for cloud-based ETL and data integration workflows.
Azure Synapse Studio Overview	Overview of Azure Synapse Studio as a web UI for data exploration, development, and management.
Hubs in Azure Synapse Studio	Explanation of different hubs within Azure Synapse Studio for various data-related tasks.
Analytical Processes in Azure Synapse	Overview of the analytical processes, data ingestion, preparation, and data shaping in Synapse.
Building a Modern Data Warehouse	Explanation of modern data warehouse architecture, including staging areas and data formats.
Staging Area in Data Warehousing	Reasons for adding a staging area, including reducing dependencies and handling source systems.
Data Storage in a Data Warehouse	Recommendations for data format (Parquet) and best practices for Azure Data Lake Storage usage.

Azure Synapse Apache Spark Pools

Topic	Main Points
Building a Modern Data Warehouse	Explanation of the process, including data ingestion, preparation, and data accessibility.
Best Practices for Azure Data Lake Storage	Considerations when working with Azure Data Lake Storage, including data structure and file sizes.
Star Schema Design	Designing a star schema and distinguishing dimension and fact tables.
Data Loading in Azure Synapse Analytics	Importance of data loading, minimizing impact, and tools for loading data into Synapse Analytics.
Managing Data Workloads	Managing resource availability, workload importance, and optimizing performance in Synapse Analytics.
Table Distribution and Indexing	Impact of table distribution on data load and query performance, and indexing strategies.
Materialized Views	Improving query performance with materialized views and read committed snapshot isolation levels.
Query Optimization Techniques	Techniques like result set caching, approximate execution, and stored procedures for optimization.
Compute Resource Management	Pausing and resuming compute resources to reduce costs and utilizing Azure Advisor recommendations.
Columnstore Index and Materialized Views	Exploring the benefits of columnstore indexes and materialized views in Synapse SQL pools.
Logged Operations and Efficiency	Differentiating fully logged and minimally logged operations for performance and efficiency.
Security and Authentication	Configuring authentication, network security, and securing keys using Azure Key Vault.
Authorization and Row-Level Security	Managing authorization through column and row-level security in Azure Synapse Analytics.
Encryption and Transparent Data Encryption	Implementing encryption with Transparent Data Encryption (TDE) to protect Synapse Analytics.
Big Data Engineering with Spark	Introduction to Apache Spark and its role in Azure Synapse Analytics. What Apache Spark pools are, their purpose, and benefits. How Apache Spark applications work within Azure Synapse Analytics. Creation and management of Spark pools in Azure Synapse Analytics Studio. Use cases for Spark pools in various Azure services.
Query Pools and Workload Management	Integration of SQL and Apache Spark pools in Azure Synapse Analytics. The Azure Synapse Apache Spark to Synapse SQL connector and its capabilities. Benefits of interoperability between Apache Spark and SQL in data exploration, loading, and sharing.
Workload Monitoring	Monitoring Spark pools and Azure Synapse Analytics using the Monitor Hub. Identifying poorly performing Spark pool runs and areas for optimization. Optimization strategies, including data format, caching, memory efficiency, bucketing, and job execution.
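
The effect of table distribution on load and query performance, noted in the table above, comes down to how evenly rows spread across distributions. Below is a rough sketch that compares hash and round-robin distribution on skewed data. The CRC32 hash is a stand-in chosen for this example and is not the function Synapse uses. The customer keys are hypothetical.

```python
import zlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # dedicated SQL pools shard a table into 60 distributions


def hash_distribute(keys, n=NUM_DISTRIBUTIONS):
    """Assign each row to a distribution from a hash of its distribution column."""
    return Counter(zlib.crc32(str(k).encode()) % n for k in keys)


def round_robin_distribute(keys, n=NUM_DISTRIBUTIONS):
    """Assign rows to distributions in turn, ignoring their values."""
    return Counter(i % n for i, _ in enumerate(keys))


def largest_share(counts):
    """Fraction of all rows held by the fullest distribution."""
    return max(counts.values()) / sum(counts.values())


# Hypothetical data: 6,000 rows, 90% of which share one customer id.
skewed_keys = ["customer-1"] * 5400 + [f"customer-{i}" for i in range(2, 602)]

hash_share = largest_share(hash_distribute(skewed_keys))
rr_share = largest_share(round_robin_distribute(skewed_keys))
print(f"hash: {hash_share:.2%}, round-robin: {rr_share:.2%}")
```

Hashing on a skewed column sends most rows to a single distribution, which is why a distribution column with many evenly spread values is usually preferred. Round-robin spreads rows evenly but gives up the co-location of equal keys that hash distribution provides for joins.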

Operational Analytics with Azure Synapse Analytics

Explanation of how Azure Cosmos DB uses Hybrid Transactional Analytical Processing (HTAP). Azure Synapse Analytics features for querying the analytical store with SQL and Apache Spark. Introduction to Azure Synapse Link for Azure Cosmos DB as a cloud-native HTAP capability.
Benefits of Azure Synapse Link for Azure Cosmos DB, including use cases in supply chain analytics, IoT, etc.

Topic	Main Points
Data Partitioning and Querying	Separation of transactional and analytical stores in Azure Cosmos DB. Partitioning data based on partition keys for efficient querying. Embedding entities within an array to optimize transactional data models.
Configuring and Enabling Azure Synapse Link	Steps to configure and enable Azure Synapse Link for Azure Cosmos DB.
Querying Azure Cosmos DB	Performing analytics and queries using Azure Synapse Link and serverless SQL pools. Aggregations, cross-container queries, complex JSON queries, windowing functions, and data visualization.
Writing Data Back to Cosmos DB	Writing data back to the Azure Cosmos DB transactional store. Reading data from the transactional store.
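
The partitioning and embedding points in the table above can be sketched with plain Python dictionaries standing in for Cosmos DB documents. This is an illustration of the data model only and makes no calls to the Cosmos DB SDK. The field names (customerId, items, sku) and the choice of customerId as partition key are assumptions made for the example.

```python
import json
from collections import defaultdict

# Hypothetical order documents; all field names are illustrative.
orders = [
    {"id": "o1", "customerId": "c1",
     "items": [{"sku": "A", "qty": 2, "price": 5.0},
               {"sku": "B", "qty": 1, "price": 3.0}]},
    {"id": "o2", "customerId": "c2",
     "items": [{"sku": "A", "qty": 1, "price": 5.0}]},
    {"id": "o3", "customerId": "c1",
     "items": [{"sku": "C", "qty": 4, "price": 1.5}]},
]


def group_by_partition_key(docs, key):
    """Group documents by partition key value, as logical partitions would."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[doc[key]].append(doc)
    return dict(partitions)


def order_total(order):
    """Total an order from its embedded line items, with no join or second read."""
    return sum(item["qty"] * item["price"] for item in order["items"])


partitions = group_by_partition_key(orders, "customerId")
c1_totals = {order["id"]: order_total(order) for order in partitions["c1"]}
print(json.dumps(c1_totals))
```

Because the line items are embedded in each order, one read returns everything needed for the total, and a query filtered on the partition key only has to look at one group of documents.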

Azure Databricks

Azure Databricks is a notebook-oriented Apache Spark service in Azure. It provides a single platform for cluster management and interactive data exploration. Databricks improves Spark performance and reduces costs when running on Azure.

Topic	Main Points
Azure Databricks Architecture	Apache Spark functions through parallelism and clusters managed by a cluster manager.
	Azure Databricks supports data handling tasks like reading, writing, and querying.
Data Formats and Parquet Files	Reading and writing data in Databricks requires knowledge of various file formats.
	Importance of working with Parquet files in the Databricks file system.
Data Processing with DataFrames	DataFrames in Azure Databricks simplify data exploration and transformations.
Performance Features in Azure Databricks	Catalyst optimizer stages: logical plan analysis, optimization, physical planning, and code generation.
Security and Data Protection in Databricks	Databricks integrates with Azure services and offers high-level security.
	Access management using Azure role-based access control.
Delta Lake for Large Data	Delta Lake usage for managing large volumes of data.
	Capabilities such as table creation, appending, upserting, and optimizations.
Azure Databricks for Streaming Data	Analyzing and processing streaming data using Apache Spark Structured Streaming.
Integration with Azure Services	Azure Data Factory for executing workloads from Databricks.
	Creating data architecture to integrate multiple services.
Azure DevOps for CI/CD	Using Azure DevOps for Continuous Integration (CI) and Continuous Deployment (CD).
	Building release pipelines and source code repository for Databricks notebooks.
Integration with Azure Synapse Analytics	Integration of Databricks with Azure Synapse Analytics using SQL Data Warehouse Connector.
Best Practices for Databricks	Following best practices for workspace administration, security, tools, integration, and more.
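
An upsert, one of the Delta Lake capabilities in the table above, updates rows that match on a key and inserts the rest. Delta Lake exposes this through its merge operation. The sketch below shows only the semantics in plain Python and is not the Delta Lake API. The upsert function and the sample rows are hypothetical.

```python
def upsert(table, updates, key="id"):
    """Return a new table: rows matching on `key` are updated, others inserted."""
    merged = {row[key]: dict(row) for row in table}
    for row in updates:
        # Existing columns are kept unless the update supplies a new value.
        merged[row[key]] = {**merged.get(row[key], {}), **row}
    return [merged[k] for k in sorted(merged)]


# Hypothetical rows; column names are illustrative.
customers = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Linus", "city": "Helsinki"},
]
changes = [
    {"id": 2, "city": "Portland"},
    {"id": 3, "name": "Grace", "city": "New York"},
]

result = upsert(customers, changes)
for row in result:
    print(row)
```

Row 2 is updated in place, row 3 is inserted, and row 1 is left untouched. The input list is not modified, so the call can be repeated safely.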

Azure Data Lake Storage Gen2 and Data Streaming Solution

Azure Data Lake Storage is a scalable and cost-effective data lake solution for big data analytics in Azure. It provides a repository for storing large amounts of unstructured data for high-performance analytics.

Topic	Main Points
Benefits of Data Lake Storage Gen2	Benefits include Hadoop-compatible access, security, performance, and data redundancy.
Creating Azure Storage Account	Creating an Azure storage account using the Azure portal.
Azure Blob Storage vs. Data Lake Storage Gen2	Comparing Azure Blob Storage and Azure Data Lake Storage Gen2 and how the two are compatible.
Common Stages in Big Data Processing	Overview of the four common stages: ingest, store, prep and train, and model and serve.
Supported Open Source Platforms	Supported platforms using Azure Data Lake Storage Gen2 for real-time analytical solutions.
Security Features of Data Lake Storage Gen2	Exploring security features, including enterprise-class security, access control, and encryption.
Data Streams and Event Processing	Explanation of data streams, event processing, and the need for time-based computations.
Processing Events with Azure Stream Analytics	Learning how to process events using Azure Stream Analytics.
Azure Event Hubs	Understanding the role of Azure Event Hubs in managing event streams.
Configuring and Evaluating Azure Event Hubs	Creating and configuring event hubs and evaluating performance using the Azure portal.
Stream Processing	Overview of stream processing and its role in analyzing and deriving insights from data streams.
Azure Stream Analytics	Learning to process streaming data with Azure Stream Analytics, including operational aspects.
Windowing Functionality	Introduction to the five main types of windowing functionality for data aggregation: tumbling, hopping, sliding, session, and snapshot windows.
Visualizing Aggregated Data in Power BI	How to visualize the results of aggregated data in Power BI at the end of the stream processing pipeline.
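
The time-based aggregation behind the windowing functionality in the table above can be sketched in a few lines. The example below counts events per tumbling window, meaning fixed-size windows that do not overlap. It is plain Python, not Azure Stream Analytics query syntax. The event tuples are hypothetical, and the choice to include a window's start and exclude its end is an assumption of this sketch, so boundary handling in Stream Analytics itself may differ.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping window.

    Returns {window_start: count}; in this sketch a window covers
    [window_start, window_start + window_seconds).
    """
    counts = Counter()
    for timestamp, _payload in events:
        window_start = (timestamp // window_seconds) * window_seconds
        counts[window_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical events as (seconds since stream start, payload).
events = [(1, "a"), (4, "b"), (9, "c"), (10, "d"), (12, "e"), (27, "f")]

counts = tumbling_window_counts(events, window_seconds=10)
print(counts)
```

Each event falls into exactly one window, which is what distinguishes a tumbling window from hopping or sliding windows, where windows overlap and an event can be counted more than once.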