DP-900 Objective 3.2: Describe Capabilities and Features of Azure Cosmos DB

 • 37 min read • Microsoft Azure Data Fundamentals

DP-900 Exam Focus: This objective covers Azure Cosmos DB as Microsoft's globally distributed, multi-model NoSQL database including use cases for global distribution, low latency, IoT, and real-time analytics; and the five primary APIs—NoSQL (native document database), MongoDB (MongoDB compatibility), Cassandra (wide-column), Gremlin (graph database), and Table (key-value compatible with Azure Table storage). Understanding when to use Cosmos DB and choosing appropriate APIs is essential for the exam.

Understanding Azure Cosmos DB

Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database service, designed for mission-critical applications that require predictable performance, high availability, and elastic scalability at global scale. Unlike traditional databases designed for single-region deployments, Cosmos DB was built from the ground up for global distribution, so data can be replicated across any number of Azure regions with turnkey configuration. The service guarantees single-digit millisecond latencies for reads and writes at the 99th percentile, scales throughput and storage elastically, and provides comprehensive SLAs covering availability (up to 99.999% for multi-region configurations), latency, throughput, and consistency. Cosmos DB is a fully managed Platform-as-a-Service (PaaS) offering, which removes operational concerns such as patching, scaling, replication, and failover.

What makes Cosmos DB distinctive is its multi-model support through different API endpoints. Instead of forcing applications into a single data model, Cosmos DB supports documents (NoSQL API), key-value pairs (Table API), graph structures (Gremlin API), column families (Cassandra API), and MongoDB compatibility (MongoDB API). This flexibility lets you choose the data model that fits a scenario, or migrate existing applications built on MongoDB, Cassandra, or Azure Table storage with minimal code changes while gaining global distribution, guaranteed performance, and enterprise SLAs. Cosmos DB suits internet-scale applications, IoT and telemetry platforms, retail and personalization engines, gaming, real-time analytics, and any scenario that needs global reach, predictable low latency, or flexible schemas that fit poorly into rigid relational models. Knowing these capabilities and APIs is the basis for choosing Cosmos DB appropriately for modern distributed applications.

Core Capabilities and Features

Global Distribution

Azure Cosmos DB provides turnkey global distribution: data can be replicated to any Azure region with a single click or API call. Unlike traditional databases that require complex replication setups, Cosmos DB handles distribution transparently. You select the Azure regions where the database should be present, and Cosmos DB replicates the data, handles failover, and maintains consistency according to the configured level. Global distribution serves several purposes:

  • Lower latency: data sits near users, so European users read from a European replica and Asian users from an Asian replica, each with single-digit millisecond latency.
  • High availability: redundancy across regions lets the database survive a regional outage.
  • Multi-region writes: applications write to the nearest region instead of routing every write to a single primary region.
  • Disaster recovery: traffic fails over automatically to healthy regions when issues occur.

Global distribution configuration is dynamic—add or remove regions anytime without application downtime. Cosmos DB handles data distribution, rebalancing, and synchronization automatically. For read-heavy workloads, multi-region reads distribute query load across regions. For write-heavy workloads or applications requiring local writes, multi-region writes enable writing to any region with conflict resolution policies handling concurrent modifications. Use cases benefiting from global distribution include social networks serving users worldwide, multiplayer games with players across continents, e-commerce platforms with international customers, and IoT applications with devices globally distributed. The ability to transparently distribute data while maintaining consistency and performance makes Cosmos DB ideal for truly global applications where data locality significantly impacts user experience.
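
The routing idea behind "read from the nearest replica, fail over when a region is down" can be sketched in plain Python. This is a toy model, not the Cosmos DB SDK: the function name `pick_read_region`, the region list, and the latency figures are all assumptions made for illustration. In practice the service and its SDKs perform this routing for you.

```python
from typing import Dict, Set

def pick_read_region(latency_ms: Dict[str, float], healthy: Set[str]) -> str:
    """Return the healthy region with the lowest measured latency."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=candidates.get)

# Hypothetical latencies seen by a client in Paris (illustrative numbers only).
paris_client = {"West Europe": 8.0, "East US": 85.0, "Southeast Asia": 170.0}
all_regions = set(paris_client)

nearest = pick_read_region(paris_client, all_regions)
# Simulate a regional outage: the client falls back to the next-closest replica.
fallback = pick_read_region(paris_client, all_regions - {"West Europe"})
```

Here `nearest` resolves to the European replica, and removing that region from the healthy set moves reads to the next-closest one without any change to the calling code.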

Performance and Scalability

Azure Cosmos DB guarantees single-digit millisecond latency (under 10 ms) for both reads and writes at the 99th percentile, backed by SLAs. This predictable low latency enables responsive user experiences regardless of data volume or geographic location. Throughput scales elastically from hundreds to millions of requests per second. You provision throughput, on a database or on an individual container, in Request Units per second (RU/s), a normalized measure of the compute, memory, and I/O an operation requires. Cosmos DB automatically distributes throughput across partitions, which enables horizontal scaling. The autoscale feature adjusts provisioned throughput to the workload, scaling up during peak usage and down during quiet periods to reduce cost while maintaining performance.
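
A rough RU/s budget can be worked out with simple arithmetic. The sketch below assumes a point read of a small item costs about 1 RU and a small write about 5 RU; these per-operation costs are illustrative assumptions (real costs depend on item size and indexing), and the function names are hypothetical. It also assumes autoscale floats between 10% of the configured maximum and the maximum.

```python
import math
from typing import Tuple

def required_rus(reads_per_s: float, writes_per_s: float,
                 ru_per_read: float = 1.0, ru_per_write: float = 5.0) -> int:
    """Estimate the RU/s needed to sustain a steady read/write mix."""
    return math.ceil(reads_per_s * ru_per_read + writes_per_s * ru_per_write)

def autoscale_range(max_rus: int) -> Tuple[int, int]:
    """Assumed autoscale window: 10% of the configured maximum up to the maximum."""
    return (max_rus // 10, max_rus)

# 2000 reads/s at 1 RU each plus 400 writes/s at 5 RU each.
peak = required_rus(reads_per_s=2000, writes_per_s=400)
low, high = autoscale_range(peak)
```

With these assumed costs the peak works out to 4000 RU/s, and an autoscale maximum set to that value would idle at 400 RU/s during quiet periods.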

Storage scales without a fixed upper limit: Cosmos DB automatically adds partitions as data grows. Unlike databases with fixed capacity limits, Cosmos DB can handle petabytes of data. The choice of partition key critically affects performance and scalability. A good partition key distributes data and operations evenly across partitions and prevents hot partitions. The serverless pricing model suits intermittent or unpredictable workloads, charging only for consumed resources without provisioning throughput upfront. This flexibility lets you match cost to different workload patterns. The combination of guaranteed latency, elastic throughput scaling, and elastic storage makes Cosmos DB suitable for both small applications and internet-scale services handling millions of users. Performance guarantees backed by SLAs provide confidence for production deployments where performance directly affects business outcomes.
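
The effect of partition key choice can be illustrated with a small hashing experiment. This is only a sketch: Cosmos DB's actual hash function and partition layout are internal, so the MD5-based `partition_for` helper and the fixed count of 8 partitions are assumptions chosen to make the skew visible.

```python
import hashlib
from collections import Counter
from typing import Iterable

def partition_for(key: str, partitions: int) -> int:
    """Map a partition key value to a bucket with a stable hash (illustrative only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partitions

def load_by_partition(keys: Iterable[str], partitions: int = 8) -> Counter:
    """Count how many items land on each partition."""
    return Counter(partition_for(k, partitions) for k in keys)

# High-cardinality key: one value per device, so items spread across partitions.
by_device = load_by_partition(f"device-{i}" for i in range(1000))
# Low-cardinality key: every item shares one value, so one partition takes all writes.
by_country = load_by_partition("US" for _ in range(1000))
```

Comparing `by_device` with `by_country` shows the difference between an evenly loaded container and a hot partition: the second counter has a single entry holding all 1000 items.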

Consistency Models

Azure Cosmos DB provides five well-defined consistency levels that let you trade off consistency against availability, latency, and throughput. This choice addresses a fundamental distributed-systems challenge: the CAP theorem states that a distributed system can provide at most two of Consistency, Availability, and Partition tolerance. Cosmos DB's levels form a spectrum from strong consistency (linearizable reads, at the cost of higher latency and lower availability) to eventual consistency (maximum availability and performance, with temporary inconsistencies allowed). From strongest to weakest, the five levels are:

  • Strong: reads are linearizable and always return the most recent committed write, as if a single copy of the data existed. Suitable for financial transactions or inventory management where consistency cannot be compromised.
  • Bounded staleness: reads lag behind writes by at most K versions or a time interval T, giving a consistency guarantee with better performance than strong. Suitable for scoreboards or social media feeds that tolerate a bounded delay.
  • Session: within a client session, the client sees its own writes (read-your-writes) and gets monotonic reads and monotonic writes. Suitable for most applications where users must see their own changes immediately.
  • Consistent prefix: reads never see writes out of order, but there is no guarantee on how far they lag. Suitable for social media updates where order matters but immediacy doesn't.
  • Eventual: no ordering guarantee, only eventual convergence, in exchange for the highest availability and lowest latency. Suitable for view counts, likes, or other non-critical data.

Choosing a consistency level depends on application requirements. Most applications use session consistency, which is also the default, because it balances consistency guarantees with performance. Strong consistency suits scenarios requiring absolute consistency. Eventual consistency suits read-heavy workloads with non-critical data. The default level is set on the Cosmos DB account, and individual requests can relax it to a weaker level, so less critical reads in the same database can trade consistency for lower latency.
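
One way to build intuition for the five levels is a toy model that asks: given how far a replica lags, may it serve this read? The sketch below is a deliberate simplification, not how Cosmos DB is implemented. The `Level` enum, the `replica_can_serve` function, and the use of integer version numbers are all assumptions for illustration, and the model does not distinguish consistent prefix from eventual because it tracks only lag, not write ordering.

```python
from enum import Enum

class Level(Enum):
    STRONG = "strong"
    BOUNDED_STALENESS = "bounded_staleness"
    SESSION = "session"
    CONSISTENT_PREFIX = "consistent_prefix"
    EVENTUAL = "eventual"

def replica_can_serve(level: Level, replica_version: int, latest_version: int,
                      session_version: int = 0, max_lag_versions: int = 0) -> bool:
    """Decide whether a replica at `replica_version` may answer a read (toy model)."""
    if level is Level.STRONG:
        # Must reflect the most recent committed write.
        return replica_version == latest_version
    if level is Level.BOUNDED_STALENESS:
        # May lag, but by no more than K versions.
        return latest_version - replica_version <= max_lag_versions
    if level is Level.SESSION:
        # Must include everything this client session has already written or read.
        return replica_version >= session_version
    # Consistent prefix and eventual place no bound on lag in this model.
    return True
```

For example, a replica at version 8 when the latest write is version 10 cannot serve a strong read, can serve a bounded-staleness read with K = 2, and can serve a session read only if the client's own last write was at version 8 or earlier.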

Indexing and Querying

Azure Cosmos DB automatically indexes all data without requiring schema or index definitions. Unlike traditional databases that require explicit index creation and management, the default indexing policy indexes every property of every item as it is written. This removes most index tuning and maintenance while keeping queries efficient. Indexing policies are customizable for advanced scenarios: you can include or exclude specific paths, configure spatial indexes for geospatial queries, or reduce indexing to favor write-heavy workloads. Automatic indexing combined with the schema-agnostic design allows flexible data models in which documents in the same container have different structures, so schemas can evolve without migrations.

Query capabilities vary by API. The NoSQL API supports SQL-like queries with SELECT, WHERE, JOIN, and ORDER BY, enabling rich filtering, sorting, projections, and aggregations, and queries can call JavaScript user-defined functions for custom logic. The MongoDB API supports the MongoDB query language and aggregation pipeline. The Cassandra API uses CQL (Cassandra Query Language). The Gremlin API uses a graph traversal language for relationship-based queries. The Table API supports simple filters, with lookups by partition key and row key being the most efficient. The change feed provides a persistent log of changes to a container (inserts and updates by default), ordered within each partition key, which enables reactive programming: applications subscribe to changes to implement event-driven architectures, materialized views, or replication to other systems. Server-side programming through stored procedures, triggers, and user-defined functions (available in the NoSQL API) executes logic close to the data, reducing network overhead and ensuring transactional consistency within a partition.
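
The change feed's "read from a checkpoint, process, remember where you stopped" pattern can be sketched with an in-memory log. This is a toy model under stated assumptions: the `ChangeFeed` class and its methods are hypothetical, not SDK calls, and unlike this sketch the real feed may surface only the latest version of an item rather than every intermediate update.

```python
from typing import Dict, List, Tuple

class ChangeFeed:
    """Toy append-only log of item versions, kept per partition key."""

    def __init__(self) -> None:
        self._log: Dict[str, List[dict]] = {}

    def upsert(self, partition_key: str, item: dict) -> None:
        """Record an insert or update in arrival order."""
        self._log.setdefault(partition_key, []).append(dict(item))

    def read(self, partition_key: str, checkpoint: int = 0) -> Tuple[List[dict], int]:
        """Return changes after `checkpoint` plus the new checkpoint to resume from."""
        entries = self._log.get(partition_key, [])
        return entries[checkpoint:], len(entries)

feed = ChangeFeed()
feed.upsert("device-1", {"id": "r1", "temp": 20})
feed.upsert("device-1", {"id": "r1", "temp": 22})
batch, checkpoint = feed.read("device-1")            # first poll sees both changes
feed.upsert("device-1", {"id": "r2", "temp": 19})
next_batch, checkpoint = feed.read("device-1", checkpoint)  # second poll sees only the new one
```

Because the consumer stores its own checkpoint, it can stop and resume without reprocessing earlier changes, which is what makes the pattern suitable for notifications and materialized views.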

Azure Cosmos DB APIs

NoSQL API (Native API)

Azure Cosmos DB NoSQL API (formerly called SQL API) is the native core API providing document database functionality with JSON documents and SQL-like query syntax. It's the most feature-rich API and recommended starting point for new applications. NoSQL API stores data as JSON documents in containers (analogous to tables) within databases. Documents have flexible schemas—documents in same container can have completely different structures enabling schema evolution without migrations. The SQL-like query language enables familiar querying: SELECT * FROM products p WHERE p.category = 'Electronics' ORDER BY p.price DESC. This syntax supports complex filters, joins within documents, projections selecting specific fields, aggregations calculating sums or averages, and user-defined functions.
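
To make the semantics of the sample query concrete, the sketch below evaluates the same filter and sort over a few JSON-like documents in plain Python. It does not call the Cosmos DB SDK; the sample documents and the function name `electronics_by_price_desc` are assumptions for illustration. Note that the three documents have different shapes, which a single container permits.

```python
from typing import Dict, List

# Documents in one container may carry different properties.
products: List[Dict] = [
    {"id": "1", "category": "Electronics", "name": "Laptop", "price": 1200, "specs": {"ram_gb": 16}},
    {"id": "2", "category": "Clothing", "name": "Jacket", "price": 90, "sizes": ["M", "L"]},
    {"id": "3", "category": "Electronics", "name": "Headphones", "price": 150},
]

def electronics_by_price_desc(items: List[Dict]) -> List[Dict]:
    """Plain-Python equivalent of WHERE p.category = 'Electronics' ORDER BY p.price DESC."""
    matches = [p for p in items if p.get("category") == "Electronics"]
    return sorted(matches, key=lambda p: p["price"], reverse=True)

result = electronics_by_price_desc(products)
names = [p["name"] for p in result]
```

The clothing item is filtered out and the two electronics items come back most expensive first, even though only one of them has a `specs` property.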

Advanced features include server-side programming through stored procedures, triggers, and user-defined functions written in JavaScript. These execute close to the data, provide ACID transactions within a partition, and reduce network round trips. The change feed records changes to items, enabling reactive programming, event sourcing, or materialized views. Conflict resolution policies handle multi-region write conflicts. The NoSQL API benefits from all Cosmos DB capabilities, including global distribution, tunable consistency, automatic indexing, and elastic scaling. Use cases include modern application backends requiring JSON data storage, content management systems, user profile management, product catalogs with varying attributes, and scenarios needing flexible schemas and rich querying. SDKs are available for .NET, Java, Python, Node.js, and other languages. The NoSQL API suits applications without specific compatibility requirements and gives full access to Cosmos DB's native capabilities.

MongoDB API

The Azure Cosmos DB MongoDB API provides wire protocol compatibility with MongoDB, so MongoDB applications, drivers, and tools work with Cosmos DB with minimal or no code changes. This API suits organizations with MongoDB expertise, existing MongoDB applications, or development teams that prefer MongoDB's document model and query language. The MongoDB API supports the MongoDB query language, aggregation pipeline, indexes, and operators. It is compatible with MongoDB drivers in various programming languages: applications connect using standard MongoDB connection strings that point to the Cosmos DB endpoint. The API supports multiple MongoDB server versions (including 3.6, 4.0, and 4.2), providing flexibility for different application requirements.

Key benefits over running MongoDB yourself include a fully managed service with no cluster management, automatic scaling of throughput and storage, turnkey global distribution across regions (which self-managed MongoDB leaves you to design and operate), enterprise SLAs for availability and performance, built-in security and compliance, and integration with the Azure ecosystem. Use cases include migrating existing MongoDB applications to Azure without code rewrites, new applications that build on MongoDB expertise, scenarios requiring MongoDB compatibility with cloud benefits, and development teams skilled in MongoDB that want a managed service. The MongoDB API translates the MongoDB wire protocol to Cosmos DB's underlying engine, so applications use familiar MongoDB drivers and syntax while benefiting from Cosmos DB's global distribution, consistency models, and guaranteed performance. Organizations choosing between MongoDB Atlas and the Cosmos DB MongoDB API weigh Azure integration, specific feature requirements, global distribution needs, and cost.

Cassandra API

Azure Cosmos DB Cassandra API provides wire protocol compatibility with Apache Cassandra enabling Cassandra applications and tools to work with Cosmos DB. This API suits organizations with Cassandra expertise or applications using Cassandra's wide-column data model. Cassandra API supports Cassandra Query Language (CQL), Cassandra data types, and keyspace/table concepts. It's compatible with Cassandra drivers enabling existing applications to connect with minimal changes. The column-family data model organizes data in tables with rows identified by primary keys and columns grouped into column families. This model excels at time-series data, IoT sensor readings, and scenarios requiring high write throughput with efficient range queries on time-based data.

Benefits over self-managed Cassandra include no operational overhead (no node management, compaction, or repairs), automatic scaling, global distribution with multi-region writes, comprehensive SLAs, and Azure integration. Use cases include migrating Cassandra applications to Azure, IoT and telemetry workloads that use Cassandra's data model for time-series storage, applications requiring wide-column storage, and scenarios needing Cassandra compatibility with managed convenience. The Cassandra API particularly suits write-heavy, time-series workloads where the wide-column model excels; note that storage is handled by the Cosmos DB engine rather than by Apache Cassandra's own log-structured merge tree engine. Organizations with existing Cassandra investments, or applications architected around Cassandra's eventual consistency and partition tolerance, can migrate to gain managed-service benefits while preserving application compatibility. The API provides the familiar Cassandra interface while relying on Cosmos DB's infrastructure for scaling, distribution, and reliability.

Gremlin API (Graph Database)

Azure Cosmos DB Gremlin API provides graph database capabilities using Apache TinkerPop Gremlin graph query language. Graph databases excel at scenarios where relationships between data are as important as the data itself, modeling data as networks of connected entities. Gremlin API uses vertices (nodes) representing entities and edges representing relationships between entities, with properties on both. For example, social network might have Person vertices with Friend edges connecting them, or e-commerce graph with Customer, Product, and Purchase vertices connected by Bought, Viewed, and Recommended edges. Graph queries traverse these relationships: g.V().hasLabel('Person').has('name','Alice').out('Friend').out('Friend') finds friends of friends.
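
The friends-of-friends traversal can be mirrored in plain Python to show what each `out('Friend')` step does. The sample graph and the helper name `out` are assumptions for illustration; this is an adjacency-list sketch, not a Gremlin client.

```python
from typing import Dict, List

# Outgoing 'Friend' edges per Person vertex (hypothetical sample graph).
friend_edges: Dict[str, List[str]] = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Erin"],
    "Dave": [],
    "Erin": [],
}

def out(vertices: List[str], edges: Dict[str, List[str]]) -> List[str]:
    """One traversal step: follow every outgoing edge from each input vertex."""
    return [target for v in vertices for target in edges.get(v, [])]

# Mirrors .has('name','Alice').out('Friend').out('Friend')
friends_of_friends = out(out(["Alice"], friend_edges), friend_edges)
# Dave is reachable through both Bob and Carol, so he appears twice until deduplicated.
unique_fof = sorted(set(friends_of_friends))
```

As in Gremlin, a vertex reachable by two paths shows up once per path; the Gremlin traversal would need a `dedup()` step to get the unique set that `unique_fof` holds here.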

Use cases include social networks analyzing connections, friend suggestions, influence analysis, or community detection; recommendation engines analyzing purchase patterns, user similarities, and product relationships; fraud detection identifying suspicious patterns in transaction networks like rings of accounts or unusual relationship patterns; knowledge graphs representing interconnected information with semantic relationships; network and IT operations mapping dependencies between services, servers, and applications; and supply chain management tracking relationships between suppliers, components, products, and customers. Gremlin API benefits from Cosmos DB's global distribution, elastic scaling, and SLAs while providing graph semantics. Graph queries that would require complex self-joins in relational databases or multiple queries in document databases become simple traversals in graph databases. Choosing Gremlin API suits scenarios where relationships are first-class concerns, queries involve multiple levels of connections, or data naturally forms networks rather than hierarchies or collections.

Table API

The Azure Cosmos DB Table API provides key-value storage compatible with Azure Table storage but with premium capabilities. It uses the same data model as Azure Table storage, in which entities (rows) are identified by the combination of a partition key and a row key, so migrating from Azure Table storage to the Cosmos DB Table API largely comes down to changing the endpoint. The Table API offers significant advantages over Azure Table storage: turnkey global distribution; single-digit millisecond latency, where Table storage offers no comparable latency guarantee; comprehensive SLAs covering availability and performance; five tunable consistency levels, where Table storage provides strong consistency in the primary region and eventual consistency in the secondary; automatic indexing of all properties, so queries are not limited to the keys; and better throughput scalability.
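
The partition key plus row key model can be sketched as a dictionary keyed by the pair. `PartitionKey` and `RowKey` are the property names used by the Table data model; the `TableSketch` class, its methods, and the sample entities are hypothetical and stand in for a real client.

```python
from typing import Dict, List, Optional, Tuple

Entity = Dict[str, object]

class TableSketch:
    """Toy key-value table addressed by (PartitionKey, RowKey)."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, str], Entity] = {}

    def upsert(self, entity: Entity) -> None:
        key = (str(entity["PartitionKey"]), str(entity["RowKey"]))
        self._rows[key] = dict(entity)

    def get(self, partition_key: str, row_key: str) -> Optional[Entity]:
        """Point lookup: both keys are known, so exactly one entity is addressed."""
        return self._rows.get((partition_key, row_key))

    def query_partition(self, partition_key: str) -> List[Entity]:
        """Return every entity in one partition, ordered by RowKey."""
        keys = sorted(k for k in self._rows if k[0] == partition_key)
        return [self._rows[k] for k in keys]

table = TableSketch()
table.upsert({"PartitionKey": "user-42", "RowKey": "profile", "name": "Ada"})
table.upsert({"PartitionKey": "user-42", "RowKey": "device-1", "os": "Android"})
table.upsert({"PartitionKey": "user-7", "RowKey": "profile", "name": "Lin"})
profile = table.get("user-42", "profile")
```

Entities in the same partition need not share properties, and the cheapest access path is the point lookup where both keys are supplied.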

Use cases include applications currently using Azure Table storage requiring better performance, global distribution, or improved SLAs; new applications needing simple key-value storage with partition and row keys; scenarios requiring low-latency access to structured data at global scale; and applications needing Table storage compatibility with premium features. Table API suits similar use cases as Azure Table storage—user profiles, session state, metadata, device information, and scenarios with simple access patterns based on keys rather than complex queries. Organizations choose Table API over Azure Table storage when performance, global distribution, or SLAs justify higher costs. The API maintains Table storage interface simplicity while leveraging Cosmos DB infrastructure. Choosing between Azure Table storage and Cosmos DB Table API depends on performance requirements, global distribution needs, SLA requirements, and budget—Table API costs significantly more but provides enterprise features for mission-critical scenarios.

Cosmos DB Use Cases

Global Applications

Azure Cosmos DB excels for globally distributed applications serving users worldwide. Social media platforms need low latency regardless of user location—Cosmos DB replicates data across regions with users reading from and writing to nearest region. Multi-region writes enable localized updates without routing all writes to distant primary region. Consistency levels balance availability and consistency—social media might use eventual consistency for likes or views but session consistency for posts ensuring users see their own content immediately. International e-commerce platforms serve product catalogs globally with regional inventories. Customers in Europe browse European-hosted data while Asian customers access Asian replicas, both experiencing single-digit millisecond latencies. Price and currency adjustments per region store alongside products. Cosmos DB's global distribution without complex replication logic simplifies architecture.

Multiplayer games require low latency for responsive gameplay. Game state, player profiles, leaderboards, and matchmaking data store in Cosmos DB with players connecting to nearest region. Multi-region writes enable players anywhere to update state without cross-region latency. Session consistency ensures players see game state consistently within their session. Content delivery applications serving streaming video metadata, article content, or user-generated content distribute content globally. Users access metadata from nearby region reducing load times. Cosmos DB's change feed enables cache invalidation or content distribution workflows. Global distribution combined with predictable performance makes Cosmos DB ideal for applications where user location varies and latency directly impacts experience. The ability to add or remove regions dynamically enables expansion into new markets without architectural changes.

IoT and Real-Time Analytics

IoT applications generate massive volumes of telemetry from sensors, devices, and equipment requiring high ingestion throughput and flexible schemas accommodating diverse device types. Cosmos DB handles millions of writes per second ingesting sensor readings, device state, and events. Partition key selection by device ID or sensor ID distributes load evenly. Time-series data benefits from Cassandra API's column-family model or NoSQL API with appropriate partition strategy. Change feed enables real-time processing—as telemetry arrives, downstream systems process data for dashboards, alerting, or machine learning. Eventual consistency suits telemetry ingestion where immediate consistency isn't critical but high availability and throughput matter.

Real-time analytics applications process streaming data for operational intelligence. Manufacturing equipment monitoring ingests sensor data, detects anomalies, and triggers alerts—Cosmos DB stores current state and historical data with single-digit millisecond reads enabling dashboard queries. Retail analytics track inventory, sales, and customer behavior in real-time—stores update inventory as sales occur with Cosmos DB propagating updates across locations. Financial services process trading data, detect fraud, and analyze market conditions in real-time. Vehicle telematics collects GPS, speed, fuel consumption, and diagnostic data from fleets—Cosmos DB handles high write throughput from thousands of vehicles while enabling queries for fleet management dashboards. The combination of high ingestion throughput, global distribution, and real-time query capabilities makes Cosmos DB suitable for IoT and real-time scenarios impossible with traditional databases.

Retail and Personalization

Retail applications leverage Cosmos DB's flexible schemas and low latency for product catalogs, shopping carts, and customer profiles. Product catalogs have varying attributes—electronics have technical specifications, clothing has sizes and colors, groceries have nutritional information. Flexible schema accommodates diverse product types in single container without complex schema management. Categories, pricing, inventory, and images store with products. Global distribution serves international customers with localized pricing and inventory. Shopping carts require low latency as customers add items, calculate totals, and apply promotions—delays cause frustration and abandonment. Session consistency ensures customers see cart updates immediately even with replicated data.

Personalization engines analyze customer behavior, preferences, and interactions recommending products or content. Cosmos DB stores user profiles, browsing history, purchase patterns, and recommendations. Change feed enables real-time personalization—as users interact with site, updates flow through change feed to recommendation engine recalculating suggestions. Graph capabilities using Gremlin API model relationships between customers, products, and behaviors enabling collaborative filtering and network-based recommendations. Customer segmentation and A/B testing store user assignments and results. Order history and loyalty programs track purchases and rewards. The combination of flexible schemas, low latency, and global distribution enables responsive personalized experiences at scale. Retail scenarios particularly benefit from multi-region writes as customers transact from various locations expecting consistent experience regardless of region.

Gaming

Gaming applications demand single-digit millisecond latencies preventing lag affecting gameplay. Cosmos DB stores player profiles, game state, inventories, achievements, and social connections. Multiplayer games synchronize state across players in real-time—Cosmos DB's low latency and high throughput handle rapid updates. Multi-region distribution places game servers and data near players reducing network latency critical for competitive gaming. Leaderboards track scores and rankings queryable with low latency. Cosmos DB scales elastically handling usage spikes during game launches or special events. In-game economies manage virtual currencies, items, and trading requiring transactional consistency within partitions preventing duplication exploits.

Mobile games synchronize progress across devices—players start game on mobile and continue on tablet. Cosmos DB replicates player data ensuring consistency across devices. Session storage tracks active games, matchmaking, and temporary state. Social features including friends lists, clans, and messaging store in graph structure using Gremlin API or documents with references. Game analytics track player behavior, progression, monetization, and retention feeding business intelligence. Change feed enables real-time analytics and personalization adjusting difficulty, offers, or content based on player actions. The performance guarantees, global distribution, and elastic scaling make Cosmos DB excellent choice for gaming backends where latency and availability directly impact player experience and business outcomes.

Real-World Cosmos DB Scenarios

Scenario 1: Global Social Media Platform

Business Requirement: Social network serves users worldwide requiring low latency for posts, comments, likes, and feeds regardless of user location.

Azure Solution: Azure Cosmos DB NoSQL API with Global Distribution

  • Architecture: User profiles, posts, comments, and likes store as JSON documents in Cosmos DB containers distributed across Azure regions in North America, Europe, Asia, and South America. Multi-region writes enable users to post from any location writing to nearest region.
  • Partitioning: User profiles are partitioned by user ID to distribute load evenly. Posts are also partitioned by user ID, which groups a user's posts together. Feeds use a separate container with a fan-out-on-write pattern: when a user posts, feed items are created for each follower, which keeps feed queries efficient.
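  • Consistency: Session consistency, the account default, ensures users see their own posts and interactions immediately (read-your-writes), while reads of view counts and like counts are relaxed to eventual consistency to optimize performance for non-critical metrics. Strong consistency is not available on accounts with multiple write regions, so friend relationships rely on session consistency together with a conflict resolution policy.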
  • Features: Change feed powers notification service—when user posts or comments, change feed triggers notifications to relevant users. Automatic indexing enables searching posts by hashtags or keywords. Stored procedures implement atomic operations like incrementing like counts within transactions.
  • Scaling: Autoscale adjusts throughput based on usage patterns—high during evening hours, lower overnight. Global distribution places data near users providing single-digit millisecond latency worldwide. Elastic scaling handles viral posts or celebrity accounts with millions of followers.
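
The fan-out-on-write pattern from the partitioning bullet can be sketched with two in-memory "containers". The names `publish`, `read_feed`, and the sample follower graph are assumptions for illustration; a real implementation would write to Cosmos DB containers partitioned by author ID and reader ID respectively.

```python
from collections import defaultdict
from typing import Dict, List

followers: Dict[str, List[str]] = {"alice": ["bob", "carol"], "bob": ["alice"]}
posts_by_user: Dict[str, List[dict]] = defaultdict(list)   # partitioned by author ID
feed_by_user: Dict[str, List[dict]] = defaultdict(list)    # partitioned by reader ID

def publish(author: str, post_id: str, text: str) -> None:
    """Store the post once, then copy a small feed item to each follower."""
    posts_by_user[author].append({"id": post_id, "text": text})
    for follower in followers.get(author, []):
        feed_by_user[follower].append({"postId": post_id, "author": author})

def read_feed(user: str) -> List[dict]:
    """A feed read touches a single partition: the reader's own."""
    return list(reversed(feed_by_user[user]))  # newest first

publish("alice", "p1", "Hello from Lisbon")
publish("bob", "p2", "Reply from Tokyo")
publish("alice", "p3", "Second post")
bob_feed = read_feed("bob")
```

The trade-off is visible in the code: each post costs one write per follower, but reading a feed never has to query other users' partitions.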

Outcome: Social platform serves millions of users globally with consistent low latency. Multi-region writes enable local performance. Flexible schema accommodates evolving features. Change feed enables real-time notifications and analytics.

Scenario 2: IoT Fleet Management

Business Requirement: Logistics company tracks thousands of delivery vehicles collecting GPS location, speed, fuel consumption, and diagnostic data every few seconds requiring high ingestion throughput and real-time dashboards.

Azure Solution: Azure Cosmos DB Cassandra API for Telemetry

  • Architecture: Cassandra API chosen for time-series data pattern and high write throughput. Keyspace contains telemetry table with vehicle ID as partition key and timestamp as clustering key enabling efficient time-range queries per vehicle.
  • Ingestion: Thousands of vehicles send telemetry to Azure Event Hubs buffering high-volume ingestion. Azure Functions or Stream Analytics processes events writing to Cosmos DB Cassandra API. Partition key by vehicle ID distributes writes evenly preventing hot partitions.
  • Queries: Fleet management dashboard queries recent telemetry per vehicle using partition key (vehicle ID) and time range efficiently retrieving vehicle history. Aggregations calculate daily mileage, average speed, fuel efficiency. Alerts detect anomalies like speeding or low fuel.
  • Retention: Time-to-live (TTL) automatically deletes old telemetry after 90 days reducing storage costs while maintaining recent data. Historical aggregates store separately for long-term analysis.
  • Scaling: Provisioned throughput scales with fleet size. Global distribution places data near regional operations centers. Eventual consistency suits telemetry where slight delays are acceptable for operational dashboard.
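
The table design in this scenario (vehicle ID as partition key, timestamp as clustering key, 90-day TTL) can be sketched with sorted lists and binary search. This is an in-memory illustration, not CQL or a Cassandra driver: the function names and sample readings are assumptions, and timestamps are plain integers in seconds.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

TTL_SECONDS = 90 * 24 * 3600  # 90-day retention from the scenario

# vehicle_id (partition key) -> rows kept sorted by timestamp (clustering key)
telemetry: Dict[str, List[Tuple[int, dict]]] = defaultdict(list)

def write_reading(vehicle_id: str, ts: int, reading: dict) -> None:
    """Insert a reading at its sorted position within the vehicle's partition."""
    rows = telemetry[vehicle_id]
    index = bisect.bisect_right([t for t, _ in rows], ts)
    rows.insert(index, (ts, reading))

def query_range(vehicle_id: str, start: int, end: int, now: int) -> List[dict]:
    """Return readings with start <= ts <= end that have not passed their TTL."""
    rows = telemetry[vehicle_id]
    stamps = [t for t, _ in rows]
    lo = bisect.bisect_left(stamps, max(start, now - TTL_SECONDS))
    hi = bisect.bisect_right(stamps, end)
    return [reading for _, reading in rows[lo:hi]]

write_reading("truck-1", 1_000, {"speed_kmh": 62})
write_reading("truck-1", 1_060, {"speed_kmh": 70})
write_reading("truck-1", 1_030, {"speed_kmh": 66})   # late arrival still lands in order
write_reading("truck-2", 1_010, {"speed_kmh": 40})
recent = query_range("truck-1", 1_020, 1_100, now=1_200)
```

Because rows are ordered by the clustering key inside each partition, a time-range query for one vehicle is a contiguous slice rather than a scan over the whole table.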

Outcome: Scalable IoT platform ingests millions of telemetry events daily, provides real-time vehicle tracking and diagnostics, enables operational dashboards, and supports predictive maintenance through historical analysis—all with managed service eliminating operational overhead.

Scenario 3: E-Commerce Product Recommendations

Business Requirement: Online retailer needs personalized product recommendations analyzing customer purchase history, browsing behavior, and product relationships.

Azure Solution: Azure Cosmos DB Gremlin API for Graph Relationships

  • Architecture: Graph database models customers, products, and categories as vertices with edges representing relationships—Purchased, Viewed, AddedToCart, RatedHighly. Product similarity edges connect related products. Customer similarity edges identify similar shopping patterns.
  • Recommendations: Gremlin queries traverse graph: "customers who bought this product also bought" finds products connected through purchase edges from similar customers. "You might also like" traverses from customer through purchase edges to products then to similar products. Social recommendations leverage friend relationships and their purchases.
  • Real-time Updates: As customers interact with site, change feed updates recommendation engine. New purchases create edges in graph enabling immediate incorporation into recommendations. Analytics recalculate product similarities and customer segments.
  • Performance: Global distribution serves international customers with low latency. Session consistency ensures customers see consistent recommendations during a browsing session. Partitioning by customer ID favors customer-centric queries, while partitioning by product ID favors product-centric analytics.
  • Integration: Cosmos DB integrates with Azure Machine Learning for advanced recommendation models using collaborative filtering. Azure Functions process change feed calculating recommendation scores. Power BI connects for business intelligence on customer behavior and product performance.
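
The "customers who bought this product also bought" traversal can be sketched as a walk from a product to its buyers and on to their other purchases. The sample data and the function name `also_bought` are assumptions for illustration; a production system would express this as a Gremlin traversal over Purchased edges.

```python
from collections import Counter
from typing import Dict, List, Set

# 'Purchased' edges from customer vertices to product vertices (hypothetical data).
purchased: Dict[str, Set[str]] = {
    "cust-1": {"laptop", "mouse", "dock"},
    "cust-2": {"laptop", "mouse"},
    "cust-3": {"laptop", "backpack"},
    "cust-4": {"phone", "case"},
}

def also_bought(product: str, edges: Dict[str, Set[str]], top_n: int = 3) -> List[str]:
    """Walk product -> buyers -> their other purchases and rank by frequency."""
    counts: Counter = Counter()
    for basket in edges.values():
        if product in basket:
            counts.update(basket - {product})
    # Sort by count (descending), then by name, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_n]]

suggestions = also_bought("laptop", purchased)
```

The mouse ranks first because two laptop buyers also bought it; the remaining items tie at one purchase each and are ordered by name.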

Outcome: Personalized recommendations increase conversion rates and average order values. Graph model naturally expresses relationships difficult in relational databases. Real-time updates ensure recommendations reflect recent interactions. Global distribution provides low latency for international customers.

Exam Preparation Tips

Key Concepts to Master

  • Cosmos DB capabilities: Global distribution, low latency (single-digit ms), elastic scaling, comprehensive SLAs
  • Consistency levels: Strong, bounded staleness, session, consistent prefix, eventual
  • Use cases: Global apps, IoT/telemetry, real-time analytics, gaming, retail, personalization
  • NoSQL API: Native document database with JSON and SQL-like queries, recommended for new apps
  • MongoDB API: MongoDB compatibility for existing MongoDB applications and expertise
  • Cassandra API: Wide-column model for time-series data and Cassandra compatibility
  • Gremlin API: Graph database for relationship-focused data and traversal queries
  • Table API: Key-value storage compatible with Azure Table storage with premium features
  • API selection: Choose based on data model, existing expertise, migration scenarios, query needs

Practice Questions

Sample DP-900 Exam Questions:

  1. Question: Which Azure Cosmos DB capability enables data replication across multiple Azure regions?
    • A) Automatic indexing
    • B) Global distribution
    • C) Change feed
    • D) Stored procedures

    Answer: B) Global distribution - Global distribution enables turnkey data replication across any number of Azure regions worldwide.

  2. Question: Which Cosmos DB API is recommended for new applications without specific compatibility requirements?
    • A) MongoDB API
    • B) Cassandra API
    • C) NoSQL API
    • D) Table API

    Answer: C) NoSQL API - NoSQL API (formerly SQL API) is the native API recommended for new applications providing full Cosmos DB capabilities.

  3. Question: Which Cosmos DB API should be used for migrating an existing MongoDB application to Azure?
    • A) NoSQL API
    • B) Table API
    • C) MongoDB API
    • D) Gremlin API

    Answer: C) MongoDB API - MongoDB API provides wire protocol compatibility enabling MongoDB applications to work with Cosmos DB with minimal changes.

  4. Question: Which Cosmos DB API is best suited for modeling social network relationships and connections?
    • A) Table API
    • B) Cassandra API
    • C) NoSQL API
    • D) Gremlin API

    Answer: D) Gremlin API - Gremlin API provides graph database capabilities ideal for modeling relationships and networks.

  5. Question: What is the guaranteed latency for reads in Azure Cosmos DB at the 99th percentile?
    • A) Under 1 millisecond
    • B) Under 10 milliseconds
    • C) Under 100 milliseconds
    • D) Under 1 second

    Answer: B) Under 10 milliseconds - Cosmos DB guarantees single-digit millisecond read latency at 99th percentile.

  6. Question: Which Cosmos DB consistency level provides read-your-writes consistency within a client session?
    • A) Strong
    • B) Eventual
    • C) Session
    • D) Bounded staleness

    Answer: C) Session - Session consistency ensures clients see their own writes and maintains consistency within sessions.

  7. Question: Which Cosmos DB API provides compatibility with Azure Table storage?
    • A) NoSQL API
    • B) Table API
    • C) Cassandra API
    • D) MongoDB API

    Answer: B) Table API - Table API provides key-value storage compatible with Azure Table storage with premium capabilities.

  8. Question: What is a common use case for Azure Cosmos DB?
    • A) Data warehousing with complex analytics
    • B) File storage and sharing
    • C) Globally distributed applications requiring low latency
    • D) Virtual machine disk storage

    Answer: C) Globally distributed applications requiring low latency - Cosmos DB excels at global distribution with guaranteed low latency for internet-scale applications.

DP-900 Success Tip: Remember that Azure Cosmos DB is a globally distributed, multi-model NoSQL database with guaranteed single-digit millisecond latency, elastic scaling, and comprehensive SLAs. Use cases include global applications, IoT/telemetry, real-time analytics, gaming, and retail workloads that need low latency and flexible schemas. The five APIs are NoSQL (native document database, recommended for new apps), MongoDB (MongoDB compatibility), Cassandra (wide-column for time-series), Gremlin (graph database for relationships), and Table (key-value compatible with Azure Table storage). Choose an API based on data model, existing expertise, and migration needs. Consistency levels range from strong to eventual, enabling tradeoffs between consistency and performance.

Hands-On Practice Lab

Lab Objective

Explore Azure Cosmos DB by creating accounts with different APIs, understanding global distribution and consistency models, and working with documents, queries, and features demonstrating Cosmos DB's unique capabilities.

Lab Activities

Activity 1: Create Cosmos DB Account with NoSQL API

  • Navigate Azure Portal: Go to "Create a resource" and select "Azure Cosmos DB"
  • Select API: Choose NoSQL (formerly SQL) API as the API type
  • Configure account: Specify subscription, resource group, account name, location, and capacity mode (provisioned or serverless)
  • Review settings: Examine global distribution options, backup policies, and network settings
  • Create database: After deployment, create database and container with appropriate partition key
  • Add documents: Use Data Explorer in portal to add sample JSON documents
  • Query documents: Write SQL-like queries filtering, sorting, and projecting data
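
The same steps can also be scripted instead of clicked through in the portal. The sketch below assumes the azure-cosmos Python package is installed and that you supply your own account endpoint and key; the database name labdb, the container name products, the /category partition key, and the sample items are all hypothetical choices for illustration.

```python
def build_price_query(category, max_price):
    """Return a parameterized SQL-like query and its parameter list."""
    query = (
        "SELECT c.id, c.name, c.price FROM c "
        "WHERE c.category = @category AND c.price <= @maxPrice "
        "ORDER BY c.price"
    )
    parameters = [
        {"name": "@category", "value": category},
        {"name": "@maxPrice", "value": max_price},
    ]
    return query, parameters


def run_lab(endpoint, key):
    """Create a database and container, add items, and query them."""
    # Imported here so this module loads even without the SDK installed.
    from azure.cosmos import CosmosClient, PartitionKey

    client = CosmosClient(endpoint, credential=key)
    database = client.create_database_if_not_exists(id="labdb")
    container = database.create_container_if_not_exists(
        id="products",
        partition_key=PartitionKey(path="/category"),
    )

    # Items in one container can have different shapes (note the extra
    # "fuel" property); each needs an "id" and the partition key property.
    container.upsert_item(
        {"id": "1", "category": "camping", "name": "Tent", "price": 120}
    )
    container.upsert_item(
        {"id": "2", "category": "camping", "name": "Stove", "price": 45,
         "fuel": "gas"}
    )

    query, parameters = build_price_query("camping", 100)
    # Passing partition_key scopes the query to a single logical partition.
    return list(container.query_items(
        query=query,
        parameters=parameters,
        partition_key="camping",
    ))
```

Calling run_lab with your account's endpoint and primary key should return the items priced at or below 100 in the camping category. Parameterized queries, as in build_price_query, avoid building query text by string concatenation.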

Activity 2: Explore Global Distribution

  • Review regions: Examine which Azure region your Cosmos DB account is deployed in
  • Add replica (optional): If possible, add read region replicating data to another geographic location
  • Understand failover: Review automatic failover configuration and manual failover options
  • Multi-region writes: Understand option for enabling writes to multiple regions
  • Document benefits: Note how global distribution reduces latency for distributed users and improves availability

Activity 3: Understand Consistency Levels

  • Review consistency options: Examine five consistency levels in portal (Strong, Bounded Staleness, Session, Consistent Prefix, Eventual)
  • Default consistency: Note account's default consistency level (typically Session)
  • Understand tradeoffs: Document characteristics of each level, for example Strong (highest consistency, higher latency) and Eventual (lowest latency, no ordering guarantee for reads)
  • Use case matching: For sample scenarios (financial transactions, social media likes, user profiles), identify appropriate consistency level
  • Per-request override: Understand that the account's default consistency level can be overridden per request, but only relaxed to a weaker level, never strengthened
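
One way to keep the five levels straight is to model their ordering in a few lines of code. The sketch below is a study aid, not SDK code: the Consistency enum, the can_override helper, and the scenario suggestions are hypothetical names and choices. It encodes the rule that a request may use the account default or a weaker level, but not a stronger one.

```python
from enum import IntEnum


class Consistency(IntEnum):
    """Cosmos DB consistency levels, ordered weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5


def can_override(account_default, requested):
    """A per-request override may only match or relax the account default."""
    return requested <= account_default


# Hypothetical suggestions for the sample scenarios in this activity;
# real choices depend on the application's requirements.
SUGGESTED = {
    "financial transactions": Consistency.STRONG,
    "user profiles": Consistency.SESSION,
    "social media likes": Consistency.EVENTUAL,
}

if __name__ == "__main__":
    default = Consistency.SESSION
    for level in Consistency:
        print(level.name, "allowed" if can_override(default, level) else "rejected")
```

With Session as the account default, only Eventual, Consistent Prefix, and Session pass the check.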

Activity 4: Compare Cosmos DB APIs

  • NoSQL API: Document characteristics—JSON documents, SQL-like queries, native Cosmos DB API
  • MongoDB API: Review MongoDB compatibility for existing MongoDB applications
  • Cassandra API: Understand wide-column model for time-series and IoT data
  • Gremlin API: Understand graph capabilities for relationship-based data
  • Table API: Compare with Azure Table storage noting premium features
  • Create comparison: Build table comparing APIs by data model, query language, use cases, and migration scenarios
  • Match scenarios: For sample applications, select appropriate API with justification

Activity 5: Explore Advanced Features

  • Automatic indexing: Note that all document properties are automatically indexed by default; the indexing policy can be customized to include or exclude paths
  • Partition keys: Understand importance of partition key choice for scalability
  • Change feed: Review change feed capability for event-driven architectures
  • TTL: Understand time-to-live for automatic data expiration
  • Stored procedures: Review server-side programming capabilities (NoSQL API)
  • Metrics and monitoring: Examine performance metrics, request units consumption, and latency statistics
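
The TTL item is easier to remember with the precedence rules written out. The sketch below is a plain-Python model of how the container-level default and an item-level ttl value combine in the NoSQL API, as I understand the documented behavior; effective_ttl is a hypothetical helper, not an SDK function.

```python
def effective_ttl(container_default, item_ttl=None):
    """Return seconds until an item expires, or None if it never expires.

    container_default: None (TTL off), -1 (TTL on, no default expiry),
                       or a positive number of seconds.
    item_ttl:          None (not set on the item), -1 (never expire),
                       or a positive number of seconds.
    """
    if container_default is None:
        # TTL is disabled on the container, so any item-level ttl is ignored.
        return None
    # An item-level value, when present, overrides the container default.
    ttl = item_ttl if item_ttl is not None else container_default
    return None if ttl == -1 else ttl


if __name__ == "__main__":
    print(effective_ttl(None, 60))     # TTL off on the container
    print(effective_ttl(-1, 60))       # item-level expiry only
    print(effective_ttl(3600))         # container default applies
    print(effective_ttl(3600, -1))     # item opts out of expiry
```

The expiry countdown is measured from the item's last modified time, so updating an item resets it.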

Activity 6: Use Case Analysis

  • Global application: Design Cosmos DB solution for social media platform with users worldwide
  • IoT telemetry: Design solution for high-throughput sensor data ingestion
  • E-commerce: Design product catalog with flexible schema and personalization
  • Gaming: Design player profile and leaderboard system with low latency
  • For each scenario: Select appropriate API, define partition key strategy, choose consistency level, identify global distribution benefits
  • Compare alternatives: Contrast Cosmos DB solution with alternatives (Azure SQL Database, Table storage) noting advantages

Lab Outcomes

After completing this lab, you'll understand Azure Cosmos DB's globally distributed, multi-model capabilities including turnkey global distribution, guaranteed low latency, elastic scaling, and comprehensive SLAs. You'll know five APIs (NoSQL for documents, MongoDB for compatibility, Cassandra for wide-column, Gremlin for graphs, Table for key-value) and when to use each. You'll understand consistency levels enabling tradeoffs between consistency and performance. You'll recognize appropriate use cases including global applications, IoT, real-time analytics, gaming, and retail. This knowledge demonstrates Cosmos DB understanding tested in DP-900 exam and provides foundation for leveraging Cosmos DB in modern distributed applications.

Frequently Asked Questions

What is Azure Cosmos DB and what are its core capabilities?

Azure Cosmos DB is Microsoft's globally distributed, multi-model NoSQL database service designed for mission-critical applications requiring high availability, low latency, and elastic scalability. Core capabilities include turnkey global distribution, which replicates data across any number of Azure regions with multi-region reads and optional multi-region writes; guaranteed low latency with single-digit millisecond reads and writes at the 99th percentile; elastic scalability of throughput and storage; comprehensive SLAs covering availability (up to 99.999% for multi-region accounts), latency, throughput, and consistency; five well-defined consistency levels balancing consistency and performance from strong to eventual; automatic indexing of all data by default without schema or index management; and support for multiple data models and APIs including document, key-value, graph, and column-family through different API endpoints. Cosmos DB is a fully managed Platform-as-a-Service (PaaS) offering that eliminates operational overhead. It is designed for internet-scale applications, IoT and telemetry, retail and marketing, gaming, and mobile and web applications requiring global reach, predictable performance, and automatic failover. The service handles infrastructure concerns including patching, scaling, replication, and failover automatically while providing industry-leading SLAs. Cosmos DB supports both serverless and provisioned throughput models, enabling cost optimization for workload patterns ranging from occasional access to constant high throughput.

What are common use cases for Azure Cosmos DB?

Azure Cosmos DB suits scenarios requiring global distribution, low latency, high availability, and flexible schemas. Globally distributed applications serve users worldwide needing data replicated near users reducing latency—social networks, multiplayer games, and international e-commerce platforms benefit from multi-region writes and reads. IoT and telemetry applications ingest massive volumes of sensor data, device telemetry, and event streams requiring high write throughput and flexible schemas accommodating evolving device types. Retail and e-commerce platforms handle product catalogs with varying attributes, shopping carts requiring low latency, and personalization engines analyzing user behavior. Real-time analytics process streaming data for dashboards, operational intelligence, and immediate insights. Gaming applications manage player profiles, game state, leaderboards, and in-game transactions requiring single-digit millisecond latency preventing poor player experience. Mobile and web applications backend services handle user authentication, profiles, content management, and synchronization across devices. Personalization and recommendation engines analyze user behavior, preferences, and interactions suggesting relevant content or products. Financial services including fraud detection analyze transactions in real-time, trading platforms require low-latency access to market data, and banking applications handle transactions globally. Healthcare applications manage patient records, medical imaging metadata, and health monitoring data. Content management systems store articles, media metadata, and user-generated content with flexible schemas. Cosmos DB excels when applications require global scale, guaranteed performance, high availability, or flexible data models unsuitable for rigid relational schemas. The multi-model support through different APIs enables various application architectures from document-based to graph databases.

What is the Azure Cosmos DB NoSQL API?

Azure Cosmos DB NoSQL API (formerly SQL API) is the native core API providing document database functionality with JSON documents and SQL-like query syntax. It is the most commonly used Cosmos DB API and is recommended for new applications. NoSQL API stores data as JSON documents in containers, supporting flexible schemas where documents in the same container can have different structures. The SQL-like query language enables familiar querying with SELECT, WHERE, JOIN (a self-join within a single document, not across documents), and ORDER BY, supporting rich queries including filters, projections, aggregations, and user-defined functions. Advanced features include automatic indexing of all properties without manual index management; server-side programming through stored procedures, triggers, and user-defined functions written in JavaScript; change feed, which captures changes for reactive programming and event-driven architectures; and transactions scoped to a single logical partition, providing ACID guarantees for related operations. NoSQL API suits applications requiring document-oriented storage, complex queries beyond simple key-value lookups, flexible schemas accommodating evolving data models, and server-side logic execution. Common scenarios include content management systems storing articles and media metadata, user profile management, product catalogs with varying attributes, and application backends requiring JSON data storage and rich querying. SDKs are available for .NET, Java, Python, Node.js, and other languages, simplifying application development. NoSQL API benefits from all Cosmos DB capabilities including global distribution, tunable consistency, and elastic scaling while providing document database semantics familiar to developers from MongoDB or Couchbase backgrounds, with Azure-native integration and enterprise features.

What is the Azure Cosmos DB MongoDB API?

Azure Cosmos DB MongoDB API provides wire protocol compatibility with MongoDB, enabling MongoDB applications, drivers, and tools to work with Cosmos DB with minimal or no code changes. This API suits organizations with existing MongoDB expertise or applications that want to migrate to a fully managed cloud database while preserving MongoDB compatibility. MongoDB API supports the MongoDB query language, aggregation pipeline, indexes, and operators familiar to MongoDB developers. It is compatible with MongoDB drivers, so existing applications can often connect by changing the connection string. The API supports multiple MongoDB versions, providing flexibility for different application requirements. Key benefits include a fully managed service eliminating MongoDB cluster management, automatic scaling handling throughput and storage growth, turnkey global distribution replicating data across Azure regions, enterprise-grade SLAs for availability, latency, and throughput, and built-in security including encryption and access controls. Use cases include migrating existing MongoDB applications to Azure without rewriting code, applications built with MongoDB requiring cloud hosting, scenarios needing MongoDB compatibility with enterprise features beyond open-source MongoDB, and development teams skilled in MongoDB wanting a managed service. The MongoDB API is a compatibility layer that translates the MongoDB wire protocol to Cosmos DB's underlying engine, so applications use familiar MongoDB SDKs and syntax while benefiting from Cosmos DB's distribution, consistency models, and scaling. Organizations choosing between MongoDB Atlas and Cosmos DB MongoDB API consider factors like Azure ecosystem integration, global distribution requirements, SLA guarantees, and pricing models. Cosmos DB MongoDB API enables 'lift and shift' migrations from on-premises or other cloud MongoDB deployments to Azure with minimal friction.

What is the Azure Cosmos DB Cassandra API?

Azure Cosmos DB Cassandra API provides wire protocol compatibility with Apache Cassandra enabling Cassandra applications, drivers, and tools to work with Cosmos DB. This API suits organizations with Cassandra expertise or applications wanting to migrate from Cassandra to managed service. Cassandra API supports Cassandra Query Language (CQL), Cassandra data types, and keyspace/table concepts familiar to Cassandra users. It's compatible with Cassandra drivers in various languages enabling existing applications to connect with minimal changes. The column-family data model organizes data in tables within keyspaces similar to Cassandra. Key benefits include managed service eliminating Cassandra cluster operations including node management, compaction, and repairs; automatic scaling handling throughput and storage; global distribution with multi-region writes; comprehensive SLAs beyond typical Cassandra deployments; and integration with Azure ecosystem. Use cases include migrating existing Cassandra applications to Azure, IoT and telemetry applications using Cassandra's wide-column model for time-series data, applications requiring Cassandra compatibility with managed service convenience, and scenarios needing Cassandra data model with enterprise features. Cassandra API particularly suits time-series data, IoT sensor data, and applications requiring high write throughput with column-family data model. Organizations choosing Cassandra API often have existing Cassandra expertise, applications architected for Cassandra's data model, or requirements for wide-column storage unsuitable for document databases. The API provides Cassandra compatibility while leveraging Cosmos DB's global distribution, consistency models, and elastic scaling, eliminating operational complexity of managing Cassandra clusters while preserving application compatibility.

What is the Azure Cosmos DB Gremlin API?

Azure Cosmos DB Gremlin API provides graph database capabilities using Apache TinkerPop Gremlin graph query language. This API suits applications modeling data as networks of relationships where connections between entities are as important as entities themselves. Graph databases use vertices (nodes) representing entities and edges representing relationships between entities, with both vertices and edges having properties. Gremlin API uses graph traversal language enabling queries like 'find friends of friends' or 'recommend products based on purchase patterns' naturally expressing relationship-based queries difficult in relational or document databases. Use cases include social networks modeling users and connections with friend suggestions and influence analysis, recommendation engines analyzing relationships between users, products, and behaviors, fraud detection identifying suspicious patterns in transaction networks, knowledge graphs representing interconnected information, network and IT operations mapping infrastructure dependencies, and supply chain management tracking relationships between suppliers, products, and customers. Gremlin API benefits from Cosmos DB's global distribution, elastic scaling, and comprehensive SLAs while providing graph semantics. The API supports Gremlin query language and is compatible with TinkerPop-enabled drivers and tools. Graph queries perform traversals like finding paths between vertices, calculating centrality measures, or pattern matching. Choosing Gremlin API suits scenarios where relationships are first-class concerns, queries involve multiple levels of connections, or data naturally forms networks. Social networking, recommendation systems, and fraud detection scenarios particularly benefit from graph modeling as queries that would require complex joins in relational databases become simple traversals in graph databases.

What is the Azure Cosmos DB Table API?

Azure Cosmos DB Table API provides key-value storage compatible with Azure Table storage while offering premium capabilities. This API suits applications using Azure Table storage that want to upgrade to Cosmos DB's enhanced features, or new applications needing key-value storage with global distribution. Table API uses the same data model as Azure Table storage, with entities (rows) identified by partition key and row key, enabling straightforward migration from Azure Table storage to Cosmos DB by changing the endpoint. Key advantages over Azure Table storage include turnkey global distribution replicating data worldwide; lower latency, with single-digit millisecond reads and writes; comprehensive SLAs covering availability, latency, and throughput; tunable consistency levels, where Azure Table storage offers only strong consistency within the primary region and eventual consistency in the secondary region; and automatic indexing of all properties, enabling efficient queries beyond partition and row keys. Use cases include applications currently using Azure Table storage requiring better performance or global distribution, new applications needing simple key-value storage with premium features, scenarios requiring low-latency key-value access at global scale, and applications needing Table storage compatibility with enterprise features. Table API suits structured data with simple access patterns not requiring complex queries or relationships: the same use cases as Azure Table storage, but with enhanced capabilities. Organizations migrate from Azure Table storage to Cosmos DB Table API for global distribution, improved performance, or better SLAs while maintaining application compatibility. The API provides the familiar Table storage interface with partition and row keys while leveraging Cosmos DB's infrastructure. Choosing between Azure Table storage and Cosmos DB Table API depends on performance requirements, global distribution needs, SLA requirements, and budget. Table API costs more but provides significantly enhanced capabilities for applications where performance and availability justify the cost.

How do you choose the appropriate Azure Cosmos DB API?

Choosing appropriate Cosmos DB API depends on application requirements, existing expertise, migration scenarios, and data model characteristics. Choose NoSQL API for new applications without specific API requirements, document-oriented data models storing JSON documents, applications requiring rich SQL-like queries including filters and aggregations, server-side programming with stored procedures or triggers, and when you want native Cosmos DB experience with newest features first. NoSQL API is the recommended default providing full Cosmos DB capabilities and broadest feature support. Choose MongoDB API when migrating existing MongoDB applications to Azure, development team has MongoDB expertise, applications use MongoDB drivers and tools, or you need MongoDB compatibility with managed service benefits. Choose Cassandra API when migrating Cassandra applications, team has Cassandra expertise, applications require wide-column data model, or scenarios involve time-series data suited to Cassandra's structure. Choose Gremlin API when data naturally forms graphs with relationships as important as entities, applications require graph traversal queries, use cases include social networks, recommendation engines, or fraud detection, or when modeling interconnected data. Choose Table API when migrating from Azure Table storage, applications need simple key-value storage with partition and row keys, compatibility with Table storage is required, or scenarios need Table storage model with premium features. Consider factors including existing application code and migration effort, team expertise and skills, data model characteristics (documents, graphs, column-families, key-value), query requirements (rich queries vs simple lookups), and specific features needed. Note that API choice is per database account and cannot change after creation, so careful upfront evaluation is important. Many organizations use multiple Cosmos DB accounts with different APIs for different application requirements, leveraging appropriate data models for specific scenarios.
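
The selection guidance above can be condensed into a small lookup for exam review. This is a deliberately simplified sketch: choose_api and both tables are hypothetical, and a real decision would also weigh query needs, team skills, and migration effort.

```python
# Existing system being migrated -> API that preserves compatibility.
MIGRATION_API = {
    "mongodb": "MongoDB",
    "cassandra": "Cassandra",
    "azure table storage": "Table",
}

# Data model for a new application -> natural API for that model.
MODEL_API = {
    "document": "NoSQL",
    "graph": "Gremlin",
    "wide-column": "Cassandra",
    "key-value": "Table",
}


def choose_api(existing_system=None, data_model="document"):
    """Suggest a Cosmos DB API; migration compatibility takes priority."""
    if existing_system:
        api = MIGRATION_API.get(existing_system.lower())
        if api:
            return api
    # No recognized migration source: pick by data model, defaulting to NoSQL.
    return MODEL_API.get(data_model.lower(), "NoSQL")


if __name__ == "__main__":
    print(choose_api())                              # new app, no constraints
    print(choose_api(existing_system="MongoDB"))     # migration scenario
    print(choose_api(data_model="graph"))            # relationship-heavy data
```

Because the API is fixed when the account is created, the helper returns exactly one API per call; an organization with several workloads would call it once per workload and create separate accounts.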

Written by Joe De Coppi - Last Updated November 14, 2025