Key Generation Strategies in Entity Framework 7

Sign in to queue

Description

In this short video Rowan takes us through a different approach to generating keys in Entity Framework 7 using sequences. This approach is interesting in cases of large INSERT batches primarily because keys are all fetched beforehand. Would love your thoughts!

Here are some links for more information:

As an aside, I spent a couple of days with Rowan (and team) recording additional videos regarding Entity Framework 7 and some of the new features to look for in the coming release. Stay tuned!

Embed

Download

Download this episode

The Discussion

  • User profile image
    fbehrens

    I'm creating the primary keys with Guid.NewGuid()

    It should be unique, shouldn't it ?

    Are there any drawbacks I am not aware ?

  • User profile image
    voroninp

    @fbehrens: Guids are not sequential, but index stores its data as ordered. Probably performance of inserts can degrade. 

  • User profile image
    SonOfSam

     

    Quick Question

    Since we get 10 ids by request and if I get a second request the ids will start at  (10 + 1) and so on, is safe to said that I will decrease the max value of my numeric id either int or long by 10. Meaning

    1. (intmax / 10)
    2. (longmax / 10)
  • User profile image
    SonOfSam

    @fbehrens: I'm not sure if GUID have performance problem during insert, the only drawback of GUID is the way that they are storage in the database, if I'm not mistaken and I'm not database guy, a regular int or bigint will take one page witch is 8k and GUID uses 2 pages witch make your DB size bigger and because of that read will perform better on int and bigint than a GUID. I'm big fan of both and each one its place 

  • User profile image
    dcga

    @fbehrens: there is a slight performance hit you take when using Guids as primary keys, but not like @SonOfSam states.
    Just to clarify:
    Each Sql Server page is 8KB (8x1024 bytes) (of which 96 bytes are dedicated to header info).
    Guids are stored in 16 bytes (128 bits), whereas an Int is stored in 4 bytes (32 bits)

    The size in itself is not really a big problem unless you churn an inmense amount of transactions a day. The problem is the fragmentation that is created when you are using Guids as PK as well as the CLUSTERED INDEX of the table. The PK is just the identifier for the rows in the table, it needs to be UNIQUE and NOT NULL.
    The Clustered Index has a STRICT ROW ORDER; And this is the main ingredient to the whole thing, because the order in which the clustered index is stored is the way the whole table is stored. When you use random GUIDs for the index, it gets fragmented fast and the performance suffers as a result.

    SQL SERVER Guids are composed of 4 groups of bytes, 4 2 2 8 = 16 
    The first 3 groups as little endian and the last group as big endian.
    When SQL Server sorts guids it takes the last group first, gets the last 6 bytes of that group and sorts them, then the first 2 bytes of that group and sorts them, then it continues with the first group, then the second group and lastly the third group.

    What this all means is that GUIDs are random and because of the special sort algorithm that SQL Server uses, they are not ideal for PK unless you make them sequential. This does take away the infalability of the randomness although the probability of a collision is far too low in 99% of the cases.

    This article by Alex Siepman, has code that generates sequential GUIDs replacing the last 6 bytes with a Date sequence (best implementation I have found so far)
    In my own code I made a couple changes on the date range used, ToString() and for serialization purposes but other than that, his solution works perfectly.

    Here is a better explanation of the clustered index. (USE THE INDEX LUKE -> unreasonable-defaults-primary-key-clustering-key)

     

    Edit: removed links because of spam detection... sigh

     

  • User profile image
    jambone

    @dcga MS introduced NEWSEQUENTIALID() T-SQL for sequential guid generation at the DB

    https://msdn.microsoft.com/en-us/library/ms189786.aspx

  • User profile image
    RichardD3

    Is this an "all or nothing" option on the context? Is there no way to turn it on only for specific tables?

    And is there a way to have it use a specific sequence, or are we stuck with "DefaultSequence" for all tables?

  • User profile image
    Gabriel Lamounier

    I really hope they don't name this feature HiLo, because it doesn't looks like an implementation of the HiLo algorithm.

  • User profile image
    dcga

    @jambone: I do know that (a)
    The problem is that the NEWSEQUENTIALID() is tied to the machine that creates the GUID, whereas the strategy I suggested is not tied to one specific computer.

    If your code runs on a different machine then NEWSEQUENTIALID will generate GUIDs in a different sequence, you gain almost nothing.

     

    Here is the link the COMB GUID article by Alex Siepman.
    http://www.siepman.nl/blog/post/2015/06/20/SequentialGuid-Comb-Sql-Server-With-Creation-Date-Time-.aspx

  • User profile image
    mrtaikandi

    I've created a nuget package for a utility class for generating sequential GUIDs independent from the system dll inspired by Alex Siepman blog which you might find interesting. Also the code is available in the github. http://nuget.org/packages/sequentialguidgenerator

  • User profile image
    romiller
    fbehrens (cc @voroninp @SonOfSam @dcga @jambone @mrtaikandimrtaikandi)
     
    By default in EF7, if you add an entity with a GUID primary key and it has not value set (i.e. it's set to the default GUID) then EF will use a client side version of the NEWSEQUENTIALID algorithm to generate values for you.
  • User profile image
    romiller
    @SonOfSam - The pools of IDs are actually shared at the AppDomain level (or ServiceProvider level if you are managing your own DI). So if you have 4 pools with 10 values each then the requests will cycle through the pools and all values get used (with the pools reaching out to the database for the next value when needed). Of course, when the AppDomain ends (web application is restarted etc.) then you will end up losing whatever values are left in each pool.
     
    @RichardD3 - Yes you can also set it individually on the properties that you want to use this strategy. There are overloads of both the property level and model level APIs that allow you to specify which sequence to use.
     
    @Gabriel Lamounier - In a very pure definition of HiLo you are probably correct. But reserving a high number from a central source (in this case a database sequence) and then being free to use all the numbers within the block bounded by the high number is representative of the pattern.

Add Your 2 Cents