DP-900 Objective 2.1: Describe Relational Concepts

 • 35 min read • Microsoft Azure Data Fundamentals

Share:

DP-900 Exam Focus: This objective covers foundational relational database concepts including tables, keys, relationships, and constraints; normalization principles preventing data anomalies through 1NF, 2NF, and 3NF; SQL statement categories including DDL, DML, and DQL with common statements like SELECT, INSERT, UPDATE, DELETE, CREATE, and ALTER; and database objects including tables, views, indexes, stored procedures, and triggers. Understanding these concepts is essential for relational data sections of the exam.

Understanding Relational Databases

Relational databases organize data in tables with rows and columns, establishing relationships between tables through keys. This model, invented by Edgar Codd at IBM in 1970, revolutionized data management by providing mathematical foundation for data organization and querying. The relational approach dominates transactional systems powering e-commerce, banking, healthcare, and countless business applications. Understanding relational concepts enables working effectively with databases like Azure SQL Database, Azure Database for PostgreSQL, Azure Database for MySQL, and on-premises SQL Server. These concepts form the foundation for data management across industries.

Relational databases excel at scenarios requiring data consistency, complex relationships between entities, enforcement of business rules, and support for concurrent transactions with ACID guarantees. The structured nature enables powerful querying through SQL (Structured Query Language), providing declarative interface for data operations. Unlike file-based storage requiring procedural code for data manipulation, SQL allows describing what data you want without specifying how to retrieve it. The database engine optimizes query execution automatically. This abstraction combined with strong consistency guarantees makes relational databases ideal for mission-critical applications where data accuracy and integrity cannot be compromised. Modern cloud platforms like Azure provide managed relational database services handling infrastructure, backups, scaling, and high availability while preserving familiar relational concepts and SQL interfaces.

Features of Relational Data

Tables, Rows, and Columns

Tables are fundamental building blocks of relational databases, storing data in two-dimensional structures. Each table represents an entity type—customers, products, orders, or employees. Tables consist of rows and columns forming a grid. Rows (also called records or tuples) represent individual instances of the entity—one row per customer, product, or order. Columns (also called attributes or fields) represent properties of the entity—customer name, product price, order date. The intersection of a row and column contains a single value. This structure provides intuitive way to organize and visualize data similar to spreadsheets, though with much more sophistication in relationships and constraints.

Each table has a schema defining its structure before any data insertion. The schema specifies column names, data types (integer, text, date, decimal), constraints (NOT NULL, UNIQUE), and other properties. This predefined structure distinguishes relational databases from schema-less NoSQL databases. All rows in a table conform to the same schema, ensuring consistency and enabling efficient storage and querying. Data types enforce appropriate values—numeric columns only accept numbers, date columns only accept dates. This type safety prevents errors and enables optimizations. The schema acts as a contract between database and applications, ensuring predictable data structure. Changes to schemas through ALTER TABLE statements require careful planning as they affect all existing data and dependent applications.

Primary Keys

Primary keys uniquely identify each row in a table, providing fundamental requirement for relational databases. Every table should have a primary key ensuring each row is distinguishable from others. Primary key constraints enforce two rules: uniqueness (no two rows can have the same primary key value) and NOT NULL (primary key columns must always have values). These constraints ensure entity integrity—the guarantee that each row represents a distinct entity. Without primary keys, tables become unordered collections making it impossible to reliably reference specific rows. Database systems typically create indexes on primary keys automatically, enabling fast lookups by primary key values.

Primary keys can be single columns or composite keys combining multiple columns. Simple primary keys use one column like CustomerID—an integer uniquely identifying customers. Composite primary keys use multiple columns when no single column uniquely identifies rows. For example, OrderDetails table linking orders to products might use composite primary key of (OrderID, ProductID) since one order can contain multiple products and one product can appear in multiple orders, but the combination uniquely identifies each line item. Primary keys can be natural (business-meaningful values like ISBN for books or VIN for vehicles) or surrogate (system-generated values without business meaning like auto-increment integers or GUIDs). Surrogate keys often provide more stability since business values might change, while identity remains constant.

Foreign Keys and Relationships

Foreign keys establish relationships between tables, enabling complex data modeling across related entities. A foreign key in one table references the primary key of another table, creating parent-child relationship. For example, Orders table contains CustomerID foreign key referencing Customers table CustomerID primary key. This relationship ensures every order belongs to a valid customer. Foreign keys enforce referential integrity—the guarantee that foreign key values must exist as primary key values in referenced tables (or be NULL if allowed). Attempting to insert order with non-existent CustomerID fails with referential integrity violation. Similarly, attempting to delete customer with existing orders fails unless cascading deletes are configured.

Relationships enable normalization by storing data in separate tables while maintaining connections. One-to-many relationships are most common—one customer has many orders, one category contains many products. The foreign key resides in the "many" side (Orders table contains CustomerID). Many-to-many relationships require junction tables breaking the relationship into two one-to-many relationships. For example, students enroll in multiple courses and courses have multiple students. StudentCourses junction table with (StudentID, CourseID) composite primary key and foreign keys to both Students and Courses tables implements this relationship. One-to-one relationships split data across tables, often for organization or security—storing sensitive employee information separately from basic employee data. Understanding relationships is crucial for database design and writing queries joining related tables.

Constraints and Data Integrity

Constraints enforce data integrity by defining rules that data must satisfy. Primary key constraints ensure uniqueness and non-null values as discussed. Foreign key constraints enforce referential integrity. Additional constraints include UNIQUE constraints preventing duplicate values in columns other than primary keys—email addresses must be unique across users. NOT NULL constraints require columns to always have values—customer names cannot be blank. CHECK constraints validate data against conditions—age must be positive, order quantity must be greater than zero, state codes must match specific values. DEFAULT constraints provide automatic values when none specified—order date defaults to current date, status defaults to "Pending".

Constraints can be column-level or table-level. Column-level constraints apply to individual columns defined inline with column definitions. Table-level constraints can involve multiple columns—composite unique constraint requiring combination of columns to be unique, or check constraint validating relationships between columns like EndDate must be after StartDate. Constraints are enforced at database level, providing centralized data validation rather than relying on application code. This ensures data integrity regardless of which application accesses the database. Constraint violations cause operations to fail with error messages rather than allowing invalid data. While constraints add overhead during data modifications, they prevent data corruption and simplify application development by centralizing business rules in database schema.

Database Normalization

Purpose of Normalization

Database normalization organizes data to minimize redundancy and prevent data anomalies. Redundancy occurs when the same data is stored in multiple locations—customer addresses repeated in every order record. This wastes storage and more critically creates consistency problems. If customer changes address, must all order records be updated? Missed updates create inconsistencies where different records show conflicting information. Normalization eliminates redundancy by storing each fact once in appropriate table with relationships linking related data. This single source of truth ensures consistency—updating customer address in Customers table automatically reflects in all related orders through foreign key relationships.

Normalization prevents three types of data anomalies. Update anomalies occur when changing data requires modifications in multiple locations with risk of missing some updates, leaving inconsistent data. Storing product descriptions in OrderDetails means changing description requires updating every order containing that product. Normalized design stores descriptions in Products table updated once. Insertion anomalies prevent adding data without unrelated information. Storing customer addresses only in Orders means new customers without orders cannot be added. Normalized design with separate Customers table allows adding customers independently of orders. Deletion anomalies occur when removing data loses unrelated information. Deleting the only order for a customer removes customer information entirely. Normalized design preserves customer data after order deletion. These anomalies cause data integrity problems and operational difficulties that normalization prevents through proper table decomposition.

First Normal Form (1NF)

First Normal Form establishes baseline requirements for relational tables. A table is in 1NF if it meets three criteria: atomic values where each column contains single indivisible values rather than lists or arrays; no repeating groups avoiding multiple columns for same attribute type like Phone1, Phone2, Phone3; and each row is uniquely identifiable through primary key. Consider an unnormalized Orders table with CustomerName, ProductList storing comma-separated product names, and Quantities storing corresponding quantities. This violates 1NF because ProductList and Quantities contain multiple values rather than atomic values. Querying specific products requires parsing strings. Adding or removing products requires string manipulation.

Converting to 1NF decomposes this into proper rows. Either create multiple order rows (one per product) or better, create separate OrderDetails table with one row per order line item. Now ProductName and Quantity columns contain single values. Repeating groups like Phone1, Phone2, Phone3 in Customers table violate 1NF. The solution creates PhoneNumbers table with CustomerID foreign key and PhoneNumber column, allowing unlimited phone numbers per customer without schema changes. 1NF provides foundation for higher normal forms and enables proper relational operations. While the principle seems obvious, many systems violate 1NF by storing delimited values in text columns for convenience, sacrificing query capabilities and data integrity. Properly normalized tables always satisfy 1NF as prerequisite for further normalization.

Second Normal Form (2NF)

Second Normal Form builds on 1NF by eliminating partial dependencies. A table is in 2NF if it meets 1NF and every non-key column depends on the entire primary key, not just part of it. This requirement only matters for tables with composite primary keys. Consider OrderDetails table with composite primary key (OrderID, ProductID) storing OrderID, ProductID, Quantity, ProductName, ProductPrice, and OrderDate. Quantity depends on both OrderID and ProductID (quantity ordered of specific product in specific order). However, ProductName and ProductPrice depend only on ProductID—they're product attributes independent of orders. OrderDate depends only on OrderID—it's an order attribute. These partial dependencies violate 2NF.

Converting to 2NF separates partially dependent columns into appropriate tables. Products table with ProductID primary key stores ProductName and ProductPrice. Orders table with OrderID primary key stores OrderDate. OrderDetails retains only columns depending on complete composite key—OrderID, ProductID, and Quantity. Now each table stores attributes logically related to its primary key. This eliminates redundancy—ProductName and ProductPrice stored once per product in Products table rather than repeated in every OrderDetails row for that product. Changes to product information update one location. 2NF ensures tables represent single entity types with attributes genuinely dependent on primary keys. Tables with single-column primary keys automatically satisfy 2NF since partial dependency is impossible without composite keys.

Third Normal Form (3NF)

Third Normal Form extends 2NF by eliminating transitive dependencies where non-key columns depend on other non-key columns. A table is in 3NF if it meets 2NF and every non-key column depends only on the primary key, not on other non-key columns. Consider Customers table with CustomerID primary key, CustomerName, CustomerZipCode, City, and State. City and State depend on ZipCode rather than CustomerID directly—zip codes determine cities and states. This transitive dependency (CustomerID → ZipCode → City/State) violates 3NF. Storing City and State in Customers creates redundancy—every customer in same zip code repeats city and state. Changes to city names require updating multiple customer records.

Converting to 3NF creates separate table for transitively dependent data. ZipCodes table with ZipCode primary key stores City and State. Customers table retains CustomerID, CustomerName, and ZipCode foreign key. Now city and state information is stored once per zip code. Updates to city names affect single ZipCodes row. This eliminates redundancy and update anomalies. Another example: Employees table with DepartmentName and DepartmentLocation where location depends on department. Solution: separate Departments table with DepartmentName primary key and DepartmentLocation, with Employees table containing DepartmentName foreign key. 3NF ensures each table represents single entity with direct dependencies only. Most practical databases target 3NF balancing data integrity with query complexity. Higher normal forms exist (BCNF, 4NF, 5NF) but 3NF satisfies requirements for most applications.

SQL Statements

Data Query Language (DQL)

Data Query Language retrieves data from databases primarily through SELECT statements. SELECT is the most frequently used SQL statement, enabling data retrieval for applications, reports, and analysis. Basic SELECT syntax specifies columns to retrieve, tables to query, and optional filtering, sorting, and grouping. SELECT FirstName, LastName FROM Customers retrieves customer names. SELECT * FROM Products retrieves all columns from Products. The WHERE clause filters results: SELECT * FROM Orders WHERE OrderDate >= '2024-01-01' retrieves orders from 2024. Comparison operators (=, <, >, <=, >=, <>), logical operators (AND, OR, NOT), and pattern matching (LIKE) enable complex filtering.

ORDER BY sorts results: SELECT ProductName, Price FROM Products ORDER BY Price DESC shows products from most to least expensive. GROUP BY aggregates data: SELECT CustomerID, COUNT(*) AS OrderCount FROM Orders GROUP BY CustomerID counts orders per customer. Aggregate functions include COUNT, SUM, AVG, MIN, and MAX. HAVING filters grouped results: GROUP BY CustomerID HAVING COUNT(*) > 5 shows only customers with more than 5 orders. JOINs combine related tables: SELECT Customers.CustomerName, Orders.OrderDate FROM Customers INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID retrieves customers with their order dates. Different join types (INNER, LEFT, RIGHT, FULL OUTER) control which rows appear when relationships don't match. Subqueries nest SELECT statements enabling complex logic. Understanding SELECT and its clauses is fundamental to working with relational databases.

Data Manipulation Language (DML)

Data Manipulation Language modifies data within existing tables through INSERT, UPDATE, and DELETE statements. INSERT adds new rows to tables. Basic syntax specifies table name, columns, and values: INSERT INTO Customers (CustomerName, Email, City) VALUES ('John Smith', 'john@example.com', 'Seattle'). Column order in VALUES must match column list. Omitting columns uses NULL for nullable columns or defaults for columns with default values. Inserting into all columns allows omitting column list: INSERT INTO Products VALUES (101, 'Widget', 29.99, 100) assuming four columns. INSERT can add multiple rows in single statement: INSERT INTO Categories (CategoryName) VALUES ('Electronics'), ('Clothing'), ('Books').

UPDATE modifies existing data. Syntax specifies table, columns to change, and WHERE clause identifying rows: UPDATE Products SET Price = Price * 1.10 WHERE CategoryID = 1 increases electronics prices by 10%. Without WHERE clause, UPDATE affects all rows—use carefully! UPDATE can modify multiple columns: UPDATE Customers SET City = 'Portland', State = 'OR' WHERE CustomerID = 5. DELETE removes rows from tables: DELETE FROM Orders WHERE OrderDate < '2020-01-01' removes old orders. DELETE without WHERE removes all rows—extremely dangerous in production! DML statements participate in transactions enabling ROLLBACK to undo changes before COMMIT. Applications should use parameterized queries for DML preventing SQL injection attacks. Understanding DML is essential for application development maintaining database data.

Data Definition Language (DDL)

Data Definition Language defines and modifies database structure through CREATE, ALTER, and DROP statements. CREATE statements create new database objects. CREATE TABLE Customers (CustomerID INT PRIMARY KEY IDENTITY, CustomerName VARCHAR(100) NOT NULL, Email VARCHAR(100) UNIQUE, City VARCHAR(50)) creates Customers table with auto-incrementing primary key, required name, unique email, and optional city. Column definitions specify data types and constraints. CREATE INDEX creates indexes improving query performance: CREATE INDEX IX_Orders_CustomerID ON Orders(CustomerID). CREATE VIEW creates virtual tables: CREATE VIEW ActiveCustomers AS SELECT * FROM Customers WHERE IsActive = 1.

ALTER statements modify existing objects. ALTER TABLE Products ADD Description VARCHAR(500) adds column to existing table. ALTER TABLE Orders ADD CONSTRAINT FK_Orders_Customers FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) adds foreign key constraint. ALTER can modify column definitions, add or drop constraints, and rename objects. DROP statements delete objects: DROP TABLE TempData removes table and all data. DROP INDEX IX_Orders_CustomerID removes index. DROP VIEW removes views. DDL statements typically auto-commit immediately and cannot be rolled back. They require appropriate permissions and affect database schema globally. Care must be taken with production DDL since schema changes impact applications. Azure SQL Database and other relational databases support standard DDL with minor variations. Understanding DDL enables database design and schema evolution over time.

Common Database Objects

Tables

Tables are fundamental database objects storing data in rows and columns as discussed. Every relational database consists primarily of tables organizing information into logical entities. Table design requires careful consideration of columns, data types, constraints, and relationships. Well-designed tables follow normalization principles, have appropriate primary keys, and use meaningful names. Column data types should match data characteristics—integers for counts, decimals for currency, dates for temporal data, and appropriately sized text types. Constraints enforce data integrity at database level. Indexes on frequently queried columns improve performance. Tables should document their purpose through naming conventions and comments.

Temporary tables store intermediate results during complex operations without persisting permanently. They exist only for current session or transaction. Table partitioning splits large tables into smaller physical pieces improving query performance and management. Partitions divide data by ranges (date ranges), lists (geographic regions), or hash values. Queries automatically access relevant partitions only. In Azure SQL Database, tables support features like row-level security restricting access based on user context, dynamic data masking obscuring sensitive data, and temporal tables maintaining automatic history of changes. Understanding table fundamentals is essential since they form the foundation of relational databases storing virtually all data.

Views

Views are virtual tables defined by SELECT queries presenting data from underlying tables without storing data duplicates. Views serve multiple purposes: simplifying complex queries by encapsulating joins and filtering logic users can query simply; providing security by exposing only specific columns or rows restricting sensitive data access; presenting data in different formats or structures without changing base tables; and maintaining abstraction allowing underlying schema changes without impacting applications if view definitions update accordingly. Creating a view: CREATE VIEW CustomerOrders AS SELECT c.CustomerName, o.OrderDate, o.TotalAmount FROM Customers c INNER JOIN Orders o ON c.CustomerID = o.CustomerID. Users query views like tables: SELECT * FROM CustomerOrders WHERE OrderDate > '2024-01-01'.

Views always reflect current underlying data since they execute stored queries rather than caching results. This ensures consistency but means complex views impact query performance. Some databases support materialized views caching query results and refreshing periodically for performance. Indexed views in SQL Server store results physically improving performance for complex aggregations. Views can be updateable allowing INSERT, UPDATE, and DELETE operations if certain conditions are met—single base table, no aggregations, and other restrictions. Most views serve read-only purposes. Security views restrict data: CREATE VIEW PublicCustomers AS SELECT CustomerName, City FROM Customers WHERE ConsentToPublish = 1 exposes only customers who consented. Views provide powerful abstraction mechanism central to database design.

Indexes

Indexes are data structures improving query performance by creating sorted lookup structures enabling fast data location without scanning entire tables. Similar to book indexes helping find topics quickly, database indexes help find rows matching criteria efficiently. Without indexes, queries perform table scans reading every row sequentially—acceptable for small tables but prohibitive for millions of rows. Indexes store sorted copies of column values with pointers to corresponding rows. Searching sorted structures is dramatically faster through binary search or B-tree traversal. Primary keys automatically get indexes. Additional indexes are created on columns frequently used in WHERE clauses, JOIN conditions, or ORDER BY clauses.

Creating an index: CREATE INDEX IX_Orders_OrderDate ON Orders(OrderDate) enables fast queries filtering or sorting by OrderDate. Indexes have trade-offs: they accelerate queries but slow INSERT, UPDATE, and DELETE operations since indexes must be maintained. Indexes consume storage. Too many indexes degrade write performance more than they help reads. Index strategy requires balance based on query patterns. Composite indexes on multiple columns benefit queries filtering or sorting by those columns together. Covering indexes include all columns needed by queries avoiding table access entirely. Azure SQL Database provides automatic index tuning recommending and optionally creating indexes based on workload patterns. Understanding indexes is crucial for database performance optimization.

Stored Procedures

Stored procedures are saved SQL code executing as units on database servers. They accept input parameters, perform operations including queries and data modifications, contain logic with conditionals and loops, and return results or output parameters. Benefits include performance through precompilation and execution plan caching; security by granting execute permissions without direct table access; centralization of business logic ensuring consistent operations across applications; reduced network traffic sending procedure name and parameters rather than entire SQL statements; and transaction management encapsulating complex operations in atomic units. Creating a stored procedure: CREATE PROCEDURE GetCustomerOrders @CustomerID INT AS BEGIN SELECT * FROM Orders WHERE CustomerID = @CustomerID END. Applications execute: EXEC GetCustomerOrders @CustomerID = 5.

Stored procedures can include complex logic with IF statements, WHILE loops, error handling with TRY-CATCH blocks, and transaction control. They encapsulate business rules—validating order totals, applying discounts, updating inventory—ensuring consistent implementation regardless of calling application. Parameters enable flexibility: CREATE PROCEDURE SearchProducts @Category VARCHAR(50), @MinPrice DECIMAL(10,2), @MaxPrice DECIMAL(10,2) accepts search criteria. OUTPUT parameters return computed values. Stored procedures support overloading in some databases. While stored procedures provide significant benefits, they also create maintenance considerations—logic distributed between applications and databases requires coordinated updates. Some development philosophies prefer application logic over stored procedures. Regardless, understanding stored procedures is essential for working with relational databases.

Triggers

Triggers are special stored procedures executing automatically in response to specific database events like INSERT, UPDATE, or DELETE operations on tables. Unlike regular stored procedures called explicitly, triggers fire implicitly when triggering events occur. Triggers serve various purposes: enforcing complex business rules beyond constraint capabilities; maintaining audit trails logging data changes; implementing complex referential integrity cascading changes across tables; synchronizing related data ensuring consistency; and preventing or modifying operations based on conditions. Triggers operate behind the scenes providing powerful but potentially dangerous automation requiring careful design and testing.

Two trigger types exist: AFTER triggers executing after triggering operations complete, accessing inserted and deleted pseudo-tables containing affected rows; and INSTEAD OF triggers replacing triggering operations entirely. Example AFTER trigger logging deletions: CREATE TRIGGER TR_Customers_Delete ON Customers AFTER DELETE AS BEGIN INSERT INTO CustomerAudit (CustomerID, DeletedDate) SELECT CustomerID, GETDATE() FROM deleted END. INSTEAD OF triggers enable updateable views by intercepting operations and executing custom logic. Triggers add overhead since they execute with every triggering operation. Poorly designed triggers cause performance problems or infinite loops. Multiple triggers on same table execute in undefined order creating unpredictability. Despite concerns, triggers provide valuable automation for specific scenarios. Azure SQL Database fully supports triggers enabling sophisticated data management logic.

Real-World Relational Database Scenarios

Scenario 1: E-Commerce Order Management

Business Requirement: Online retailer needs database managing customers, products, and orders.

Database Design:

  • Customers table: CustomerID (PK), CustomerName, Email (UNIQUE), Address, City, State, ZipCode. Stores customer information once referenced by orders.
  • Products table: ProductID (PK), ProductName, Description, Price, CategoryID (FK). Stores product information with category reference.
  • Categories table: CategoryID (PK), CategoryName. Organizes products into categories.
  • Orders table: OrderID (PK), CustomerID (FK), OrderDate, TotalAmount, Status. Stores order headers with customer reference.
  • OrderDetails table: OrderID (FK), ProductID (FK), Quantity, UnitPrice. Junction table storing order line items with composite primary key (OrderID, ProductID).

Normalization: Database is in 3NF—no repeating groups, no partial dependencies, no transitive dependencies. Customer data stored once in Customers. Product information in Products. Order details separate from order headers.

SQL Examples:

  • Query orders for customer: SELECT * FROM Orders WHERE CustomerID = 5 ORDER BY OrderDate DESC
  • Order details with products: SELECT p.ProductName, od.Quantity, od.UnitPrice FROM OrderDetails od JOIN Products p ON od.ProductID = p.ProductID WHERE od.OrderID = 100
  • Customer order history: View joining Customers, Orders, OrderDetails, Products providing complete order information without exposing schema complexity.

Outcome: Normalized design eliminates redundancy, ensures data consistency through foreign keys, and enables complex queries through relationships.

Scenario 2: Healthcare Patient Records

Business Requirement: Hospital needs database tracking patients, appointments, treatments, and prescriptions.

Database Design:

  • Patients table: PatientID (PK), FirstName, LastName, DateOfBirth, SSN (UNIQUE, encrypted), Address. Stores patient demographics.
  • Doctors table: DoctorID (PK), FirstName, LastName, Specialty, LicenseNumber. Stores physician information.
  • Appointments table: AppointmentID (PK), PatientID (FK), DoctorID (FK), AppointmentDate, Status, Notes. Tracks scheduled appointments.
  • Treatments table: TreatmentID (PK), AppointmentID (FK), TreatmentType, Description, Cost. Records treatments performed.
  • Prescriptions table: PrescriptionID (PK), AppointmentID (FK), Medication, Dosage, Frequency. Stores prescribed medications.

Security: Row-level security restricts doctors to their patients only. Views expose de-identified data for research. Audit triggers log all access to sensitive patient information. Encryption protects SSN and diagnosis columns.

Stored Procedures: ScheduleAppointment validates availability before booking. PrescribeMedication checks drug interactions. Centralized logic ensures consistent validation across applications.

Outcome: Relational design maintains patient data integrity, enforces security policies, and enables comprehensive patient history tracking while complying with healthcare regulations.

Scenario 3: University Course Registration

Business Requirement: University needs database managing students, courses, and enrollments.

Database Design:

  • Students table: StudentID (PK), FirstName, LastName, Email (UNIQUE), Major, EnrollmentYear. Stores student information.
  • Courses table: CourseID (PK), CourseName, Credits, DepartmentID (FK), MaxEnrollment. Stores course definitions.
  • Departments table: DepartmentID (PK), DepartmentName, Building. Organizes courses by department.
  • Enrollments table: EnrollmentID (PK), StudentID (FK), CourseID (FK), Semester, Grade. Junction table implementing many-to-many relationship—students enroll in multiple courses, courses have multiple students.
  • Prerequisites table: CourseID (FK), PrerequisiteCourseID (FK). Self-referencing table defining course prerequisites.

Constraints: UNIQUE constraint on (StudentID, CourseID, Semester) prevents duplicate enrollments. CHECK constraint ensures Grade in valid range. Trigger validates student hasn't exceeded credit limits before enrollment.

Views: StudentTranscripts view shows grades, GPAs, and credits for each student. CourseRoster view lists enrolled students per course.

Outcome: Normalized design with junction tables models complex academic relationships, constraints enforce business rules, and views simplify reporting.

Exam Preparation Tips

Key Concepts to Master

  • Relational features: Tables, rows, columns, schemas, primary keys, foreign keys, relationships, constraints
  • Normalization: 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies)
  • Normalization benefits: Eliminates redundancy, prevents update/insertion/deletion anomalies, ensures consistency
  • SQL categories: DDL (define structure), DML (modify data), DQL (query data)
  • Common SQL statements: SELECT, INSERT, UPDATE, DELETE, CREATE, ALTER, DROP
  • Database objects: Tables, views, indexes, stored procedures, triggers, constraints
  • Primary keys: Uniquely identify rows, NOT NULL, enable relationships
  • Foreign keys: Reference primary keys, enforce referential integrity, establish relationships

Practice Questions

Sample DP-900 Exam Questions:

  1. Question: What is the primary purpose of database normalization?
    • A) Improve query performance
    • B) Eliminate data redundancy and prevent anomalies
    • C) Increase storage capacity
    • D) Simplify SQL queries

    Answer: B) Eliminate data redundancy and prevent anomalies - Normalization organizes data to reduce redundancy and prevent update, insertion, and deletion anomalies.

  2. Question: Which SQL statement category is used to create tables?
    • A) DQL
    • B) DML
    • C) DDL
    • D) DCL

    Answer: C) DDL - Data Definition Language (DDL) includes CREATE, ALTER, and DROP statements for defining database structure.

  3. Question: What database constraint ensures that a column or combination of columns uniquely identifies each row?
    • A) Foreign key
    • B) Check constraint
    • C) Primary key
    • D) Unique constraint

    Answer: C) Primary key - Primary key constraints ensure each row is uniquely identifiable and cannot be NULL.

  4. Question: Which database object executes automatically in response to INSERT, UPDATE, or DELETE operations?
    • A) View
    • B) Index
    • C) Stored procedure
    • D) Trigger

    Answer: D) Trigger - Triggers are special stored procedures that execute automatically when specific database events occur.

  5. Question: What is First Normal Form (1NF)?
    • A) Eliminating transitive dependencies
    • B) Ensuring atomic values and no repeating groups
    • C) Removing partial dependencies
    • D) Creating foreign key relationships

    Answer: B) Ensuring atomic values and no repeating groups - 1NF requires each column to contain single values and eliminates repeating column groups.

  6. Question: Which SQL statement retrieves data from a database?
    • A) INSERT
    • B) UPDATE
    • C) SELECT
    • D) DELETE

    Answer: C) SELECT - SELECT is the primary DQL statement for querying and retrieving data.

  7. Question: What database object improves query performance by creating sorted lookup structures?
    • A) View
    • B) Trigger
    • C) Stored procedure
    • D) Index

    Answer: D) Index - Indexes create sorted data structures enabling fast data location without full table scans.

  8. Question: What type of key establishes relationships between tables?
    • A) Primary key
    • B) Composite key
    • C) Candidate key
    • D) Foreign key

    Answer: D) Foreign key - Foreign keys reference primary keys in other tables, establishing relationships and enforcing referential integrity.

DP-900 Success Tip: Remember relational databases organize data in tables with rows and columns, use primary keys for unique identification and foreign keys for relationships. Normalization eliminates redundancy through 1NF (atomic values), 2NF (no partial dependencies), and 3NF (no transitive dependencies). SQL has three main categories: DDL for structure (CREATE, ALTER, DROP), DML for data (INSERT, UPDATE, DELETE), and DQL for queries (SELECT). Key database objects include tables storing data, views as virtual tables, indexes improving performance, stored procedures encapsulating logic, and triggers executing automatically.

Hands-On Practice Lab

Lab Objective

Practice relational database concepts by designing normalized schemas, writing SQL statements, creating database objects, and understanding relationships using Azure SQL Database, local SQL Server, or online SQL sandboxes.

Lab Activities

Activity 1: Design Normalized Schema

  • Choose scenario: Select business case like library (books, authors, members, loans) or school (students, teachers, courses, grades)
  • Identify entities: List main entities and their attributes
  • Apply normalization: Ensure 1NF with atomic values, 2NF eliminating partial dependencies, 3NF removing transitive dependencies
  • Define relationships: Identify one-to-many and many-to-many relationships, create junction tables as needed
  • Specify keys: Define primary keys for each table, foreign keys for relationships
  • Document design: Draw entity-relationship diagram showing tables, columns, keys, and relationships

Activity 2: Create Tables with DDL

  • Access database: Connect to Azure SQL Database, local SQL Server, or online sandbox
  • Write CREATE statements: Create tables from normalized design specifying columns, data types, and constraints
  • Define primary keys: Add PRIMARY KEY constraints with IDENTITY for auto-increment
  • Add foreign keys: Create FOREIGN KEY constraints establishing relationships
  • Include constraints: Add NOT NULL, UNIQUE, CHECK, and DEFAULT constraints enforcing rules
  • Execute and verify: Run CREATE statements, query system views confirming table structures and constraints

Activity 3: Insert and Query Data

  • Insert parent records: Use INSERT statements adding data to tables without foreign keys first (categories, departments)
  • Insert child records: Add data to tables with foreign keys referencing parent records
  • Test constraints: Try inserting invalid data (duplicate primary keys, non-existent foreign keys) observing constraint violations
  • Write SELECT queries: Query single tables with WHERE, ORDER BY, and aggregate functions
  • Join related tables: Write queries joining multiple tables through foreign key relationships
  • Create complex queries: Use GROUP BY, HAVING, subqueries demonstrating SQL capabilities

Activity 4: Update and Delete Data

  • Update records: Write UPDATE statements modifying data with WHERE clauses targeting specific rows
  • Delete records: Use DELETE statements removing data, observing foreign key constraints preventing orphaned records
  • Test cascading: If supported, configure CASCADE on foreign keys testing automatic deletion of related records
  • Use transactions: Wrap DML in BEGIN TRANSACTION, practice COMMIT and ROLLBACK
  • Verify changes: Query data confirming modifications successful

Activity 5: Create Database Objects

  • Create view: Write CREATE VIEW statement joining related tables, query view like table
  • Create index: Add index on frequently queried column, compare query performance before and after
  • Create stored procedure: Write procedure accepting parameters, performing query or data modification, execute with different parameters
  • Create trigger (optional): If supported, create simple trigger logging data changes to audit table
  • Document objects: Note purpose and usage of each created object

Activity 6: Analyze Normalization

  • Start with denormalized table: Create single table with redundant data (customer info repeated in orders)
  • Identify problems: Note redundancy, update anomalies, insertion anomalies, deletion anomalies
  • Apply 1NF: Ensure atomic values, eliminate repeating groups
  • Apply 2NF: Remove partial dependencies creating separate tables
  • Apply 3NF: Eliminate transitive dependencies
  • Compare designs: Document benefits of normalized design over denormalized

Lab Outcomes

After completing this lab, you'll understand relational database fundamentals including tables, keys, relationships, and constraints. You'll be able to design normalized schemas preventing data anomalies and redundancy. You'll have hands-on experience writing DDL creating database structure, DML modifying data, and DQL querying across related tables. You'll understand database objects including views, indexes, stored procedures, and triggers. This knowledge demonstrates relational concepts tested in DP-900 exam and provides foundation for working with Azure SQL Database and other relational databases.

Frequently Asked Questions

What are the key features of relational data?

Relational data organizes information in tables with rows and columns following structured schemas. Key features include tables (relations) storing data in two-dimensional structures with rows representing records and columns representing attributes; schemas defining table structures with column names, data types, and constraints before data insertion; primary keys uniquely identifying each row ensuring no duplicates; foreign keys establishing relationships between tables by referencing primary keys in related tables; relationships connecting tables through foreign keys enabling complex data modeling with one-to-many, many-to-many, and one-to-one relationships; constraints enforcing data integrity rules like NOT NULL requiring values, UNIQUE preventing duplicates, CHECK validating conditions, and referential integrity ensuring foreign keys reference existing records; ACID properties guaranteeing transaction reliability through atomicity, consistency, isolation, and durability; and structured query language (SQL) providing declarative interface for data manipulation. These features make relational databases ideal for transactional systems requiring data consistency, complex querying across related entities, and enforcement of business rules through constraints. Relational model invented by Edgar Codd in 1970 remains foundational for modern database systems including Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL.

What is database normalization and why is it important?

Database normalization organizes data into related tables eliminating redundancy and preventing data anomalies. The process decomposes tables with duplicate information into multiple tables storing each fact once with relationships through foreign keys. Normalization is important because it eliminates data redundancy reducing storage requirements and preventing inconsistencies from duplicate data stored in multiple locations; prevents update anomalies where changing data requires updating multiple locations with risk of missing some; prevents insertion anomalies where adding records requires entering unrelated data; prevents deletion anomalies where removing records loses unrelated information; improves data consistency ensuring single source of truth for each fact; simplifies maintenance by updating data in one location; and enforces referential integrity through foreign key relationships. Normalization forms include First Normal Form (1NF) eliminating repeating groups and ensuring atomic values; Second Normal Form (2NF) removing partial dependencies where non-key columns depend on only part of composite keys; Third Normal Form (3NF) eliminating transitive dependencies where non-key columns depend on other non-key columns. Most relational databases target 3NF balancing data integrity with query complexity. While normalization benefits transactional consistency, analytical systems sometimes denormalize for query performance.

What are the main categories of SQL statements?

SQL (Structured Query Language) statements fall into three main categories serving different purposes. Data Query Language (DQL) retrieves data from databases primarily through SELECT statements specifying columns, tables, filtering conditions with WHERE clauses, sorting with ORDER BY, grouping with GROUP BY, and joining related tables. Data Manipulation Language (DML) modifies data including INSERT adding new rows, UPDATE changing existing data, and DELETE removing rows. These statements operate on data within existing tables. Data Definition Language (DDL) defines database structure including CREATE statements creating new objects like tables, views, or indexes; ALTER statements modifying existing object structures like adding columns; and DROP statements deleting objects. Additional categories include Data Control Language (DCL) managing permissions through GRANT and REVOKE statements, and Transaction Control Language (TCL) managing transactions through COMMIT saving changes, ROLLBACK undoing changes, and SAVEPOINT creating transaction checkpoints. Understanding SQL categories helps organize learning and recognize statement purposes. Azure SQL Database, Azure Database for PostgreSQL, and Azure Database for MySQL all support SQL with minor syntax variations. Power BI and Azure Synapse Analytics also leverage SQL for querying data enabling consistent interface across Azure data services.

What are the most common database objects in relational databases?

Relational databases contain various objects organizing and managing data. Tables are fundamental objects storing data in rows and columns with defined schemas including column names, data types, and constraints. Views are virtual tables defined by queries presenting data from one or more tables, simplifying complex queries, providing security by exposing only specific columns, and presenting data in different formats without storing duplicates. Indexes are structures improving query performance by creating sorted data structures enabling fast lookups similar to book indexes, though they increase storage and slow insertions. Stored procedures are saved SQL code executing as units, accepting parameters, containing logic with conditionals and loops, and improving performance through precompilation. Triggers are special stored procedures executing automatically in response to events like insertions, updates, or deletions, enforcing business rules, maintaining audit trails, or cascading changes. Constraints enforce data integrity including primary keys, foreign keys, unique constraints, check constraints, and default values. Schemas are logical containers organizing database objects, managing permissions, and separating concerns in multi-tenant applications. Sequences generate unique numeric values for primary keys. Understanding these objects enables effective database design and interaction. Azure SQL Database supports all standard objects while adding cloud-specific features like elastic pools and automatic tuning.

What is a primary key and why is it important?

A primary key is a column or combination of columns uniquely identifying each row in a table. Primary keys are crucial because they ensure entity integrity guaranteeing each row is uniquely identifiable preventing duplicate records; enable row identification providing reliable way to reference specific records; serve as foreign key targets allowing other tables to establish relationships; improve query performance through automatic index creation on primary key columns enabling fast lookups; and provide logical organization giving tables clear identity columns. Primary key constraints enforce two rules: uniqueness ensuring no two rows have same primary key value, and NOT NULL requiring primary key columns always have values. Primary keys can be natural (meaningful business values like social security numbers or product codes) or surrogate (system-generated values like identity columns or GUIDs without business meaning). Surrogate keys often prove more stable since business values might change. Most tables should have primary keys as foundation for data integrity and relationships. Composite primary keys use multiple columns useful when no single column uniquely identifies rows, though single-column primary keys generally simplify database design. Every table in normalized relational database should have primary key ensuring reliable identification and relationship foundation.

What is a foreign key and how does it establish relationships?

A foreign key is a column or columns in one table referencing the primary key of another table, establishing relationships between tables. Foreign keys enable relational databases to model real-world relationships through referential integrity ensuring foreign key values must exist as primary key values in referenced tables or be NULL. For example, Orders table might have CustomerID foreign key referencing Customers table CustomerID primary key, ensuring orders only reference existing customers. Foreign keys enforce data consistency preventing orphaned records where child records reference non-existent parent records, cascade operations optionally propagating changes like deleting parent records also deleting related child records, and logical data modeling representing entities and their relationships. Common relationship types include one-to-many where one parent relates to multiple children (one customer has many orders), many-to-many requiring junction tables breaking into two one-to-many relationships (students enroll in multiple courses, courses have multiple students), and one-to-one where tables split for organization or security. Foreign keys are fundamental to normalization enabling data storage in separate tables while maintaining relationships. Database management systems enforce foreign key constraints validating data modifications, rejecting changes violating referential integrity. Understanding foreign keys is essential for relational database design and querying related data through joins.

What are the different normal forms in database normalization?

Database normalization progresses through normal forms, each addressing specific types of data anomalies. First Normal Form (1NF) requires atomic values where each column contains single indivisible values, not lists or arrays; elimination of repeating groups avoiding columns like Phone1, Phone2, Phone3 in favor of separate PhoneNumbers table; and each row uniquely identifiable through primary key. Second Normal Form (2NF) requires meeting 1NF and eliminating partial dependencies where non-key columns depend on entire primary key, not just part of composite keys. This typically splits tables with composite keys separating attributes depending on different key portions. Third Normal Form (3NF) requires meeting 2NF and eliminating transitive dependencies where non-key columns depend on other non-key columns rather than primary key directly. For example, storing both CustomerCity and CustomerState where State depends on City creates transitive dependency resolved by moving to separate table. Boyce-Codd Normal Form (BCNF) strengthens 3NF handling specific edge cases. Fourth Normal Form (4NF) addresses multi-valued dependencies. Fifth Normal Form (5NF) handles join dependencies. Most practical databases target 3NF balancing data integrity with query complexity and performance. Higher normal forms provide additional guarantees but increase complexity. Understanding normalization principles enables designing databases minimizing redundancy and anomalies while maintaining data integrity.

What is the difference between DDL, DML, and DQL in SQL?

SQL statements categorize by their purposes and effects on databases. Data Definition Language (DDL) defines and modifies database structure including CREATE statements for creating tables, views, indexes, or schemas; ALTER statements for modifying existing structures like adding columns or changing data types; and DROP statements for deleting objects. DDL changes affect database schema and typically auto-commit immediately. Data Manipulation Language (DML) modifies data within tables including INSERT statements for adding new rows; UPDATE statements for changing existing data; and DELETE statements for removing rows. DML operates on data content without changing structure and usually participates in transactions allowing rollback. Data Query Language (DQL) retrieves data primarily through SELECT statements specifying what data to return, from which tables, with what filters, in what order, and how to group and aggregate. DQL doesn't modify data or structure, only reads. The distinction helps understand statement purposes and their impact. DDL requires elevated privileges and affects all users. DML affects specific data and uses transaction control. DQL has lowest impact enabling read-only access. Azure SQL Database and other relational databases implement all categories with minor syntax variations. Understanding categories helps organize SQL learning, set appropriate permissions, and predict statement effects on database systems and applications.

Share:

Written by Joe De Coppi - Last Updated November 14, 2025