Azure Cosmos DB: Get the Most Out of Provisioned Throughput



Azure Cosmos DB is a globally distributed database with limitless elastic scale. To take advantage of elastic scale, you first need to choose a partition key. Kirill Gavrylyuk stops by Azure Friday to talk with Scott Hanselman about the choice of partition key, and how to use the new metrics charts to troubleshoot a poor partition key choice (e.g., "hot partitions").






The Discussion

  • User profile image

    Let's say you are designing a service like Twitter and your partition key is the date plus the hour, i.e. one key value for each hour of each day. That should give almost equal partitioning, but could the number of partitions become a problem? Each hour would get its own partition holding thousands of tweets, so 24 new partitions would be created every day. So again, can having too many partitions be a problem?

  • User profile image

    Don't worry about having too many partitions. In Cosmos, there are logical and physical partitions. So while you may have millions of logical partitions, you could potentially only have 10 physical partitions.

    As your data and provisioned throughput (RUs) grow, Cosmos may split those 10 into 20 physical partitions, so each physical partition now holds roughly half as many logical partitions as before. This process can continue, giving you nearly unlimited scale, because the system hashes the partition key to spread logical partitions across however many physical partitions exist.

    TL;DR: don't ever worry about having too many partitions; just make sure your data and requests are distributed uniformly across them.

  • User profile image

    @mhassanraza: In your example, all writes in a given hour go to a single, very hot partition (so there is no concurrent distribution). A better option is to use the last 4 characters of a GUID (which could be your message ID) as the partition key; that gives you an even distribution across 64K (16^4 = 65,536) possible partition key values.
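
To make the comparison in the thread concrete, here is a minimal Python sketch that simulates one hour of tweets under both partition key choices: the hour bucket from the question and the last 4 hex characters of a GUID from the reply. It does not call Cosmos DB at all. The function names, the fixed start time, and the `physical_partition` mapping (SHA-256 modulo the partition count) are illustrative assumptions, not how Cosmos DB actually places logical partitions; the sketch only shows how many key values and simulated physical partitions receive writes.

```python
import hashlib
import uuid
from collections import Counter
from datetime import datetime, timedelta


def hour_bucket_key(created_at: datetime) -> str:
    """Partition key from the question: one value per calendar hour."""
    return created_at.strftime("%Y-%m-%dT%H")


def guid_suffix_key(message_id: uuid.UUID) -> str:
    """Partition key from the reply: last 4 hex characters of the message GUID."""
    return message_id.hex[-4:]


def physical_partition(key: str, num_physical: int) -> int:
    """Hypothetical stand-in for key placement (NOT Cosmos DB's real scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_physical


def simulate(num_tweets: int = 10_000):
    """Spread tweets evenly over a single hour and count writes per key value."""
    start = datetime(2018, 1, 1, 12, 0, 0)  # arbitrary hour, chosen for illustration
    hour_counts, suffix_counts = Counter(), Counter()
    for i in range(num_tweets):
        created_at = start + timedelta(seconds=(i * 3600) // num_tweets)
        hour_counts[hour_bucket_key(created_at)] += 1
        suffix_counts[guid_suffix_key(uuid.uuid4())] += 1
    return hour_counts, suffix_counts


def physical_load(key_counts: Counter, num_physical: int = 10) -> Counter:
    """Aggregate writes per simulated physical partition."""
    load = Counter()
    for key, writes in key_counts.items():
        load[physical_partition(key, num_physical)] += writes
    return load


if __name__ == "__main__":
    hour_counts, suffix_counts = simulate()
    for name, counts in (("hour bucket", hour_counts), ("GUID suffix", suffix_counts)):
        load = physical_load(counts)
        print(f"{name}: {len(counts)} key values written, "
              f"{len(load)} of 10 simulated physical partitions used, "
              f"busiest key took {max(counts.values())} writes")
```

Under these assumptions, every write in the hour lands on one hour-bucket key value (and therefore one simulated physical partition), while the GUID-suffix key spreads the same writes over thousands of key values. The trade-off, not covered in the thread, is that a random suffix no longer groups tweets by time, so time-range reads would have to fan out across partitions.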
