Entries:
Comments:
Discussions:

Loading user information from Channel 9

Something went wrong getting user information from Channel 9

Latest Achievement:

Loading user information from MSDN

Something went wrong getting user information from MSDN

Visual Studio Achievements

Latest Achievement:

Loading Visual Studio Achievements

Something went wrong getting the Visual Studio Achievements

Key Generation Strategies in Entity Framework 7

6 minutes, 53 seconds

Download

Right click “Save as…”

In this short video Rowan takes us through a different approach to generating keys in Entity Framework 7 using sequences. This approach is interesting in cases of large INSERT batches primarily because keys are all fetched beforehand. Would love your thoughts!

Here are some links for more information:

As an aside, I spent a couple of days with Rowan (and team) recording additional videos regarding Entity Framework 7 and some of the new features to look for in the coming release. Stay tuned!

Tag:

Follow the discussion

  • Oops, something didn't work.

    Getting subscription
    Subscribe to this conversation
    Unsubscribing
    Subscribing
  • I'm creating the primary keys with Guid.NewGuid()

    It should be unique, shouldn't it ?

    Are there any drawbacks I am not aware ?

  • @fbehrens: Guids are not sequential, but index stores its data as ordered. Probably performance of inserts can degrade. 

  • SonOfSamSonOfSam Ultraman vs Godzilla There is no other way

     

    Quick Question

    Since we get 10 ids by request and if I get a second request the ids will start at  (10 + 1) and so on, is safe to said that I will decrease the max value of my numeric id either int or long by 10. Meaning

    1. (intmax / 10)
    2. (longmax / 10)
  • SonOfSamSonOfSam Ultraman vs Godzilla There is no other way

    @fbehrens: I'm not sure if GUID have performance problem during insert, the only drawback of GUID is the way that they are storage in the database, if I'm not mistaken and I'm not database guy, a regular int or bigint will take one page witch is 8k and GUID uses 2 pages witch make your DB size bigger and because of that read will perform better on int and bigint than a GUID. I'm big fan of both and each one its place 

  • @fbehrens: there is a slight performance hit you take when using Guids as primary keys, but not like @SonOfSam states.
    Just to clarify:
    Each Sql Server page is 8KB (8x1024 bytes) (of which 96 bytes are dedicated to header info).
    Guids are stored in 16 bytes (128 bits), whereas an Int is stored in 4 bytes (32 bits)

    The size in itself is not really a big problem unless you churn an inmense amount of transactions a day. The problem is the fragmentation that is created when you are using Guids as PK as well as the CLUSTERED INDEX of the table. The PK is just the identifier for the rows in the table, it needs to be UNIQUE and NOT NULL.
    The Clustered Index has a STRICT ROW ORDER; And this is the main ingredient to the whole thing, because the order in which the clustered index is stored is the way the whole table is stored. When you use random GUIDs for the index, it gets fragmented fast and the performance suffers as a result.

    SQL SERVER Guids are composed of 4 groups of bytes, 4 2 2 8 = 16 
    The first 3 groups as little endian and the last group as big endian.
    When SQL Server sorts guids it takes the last group first, gets the last 6 bytes of that group and sorts them, then the first 2 bytes of that group and sorts them, then it continues with the first group, then the second group and lastly the third group.

    What this all means is that GUIDs are random and because of the special sort algorithm that SQL Server uses, they are not ideal for PK unless you make them sequential. This does take away the infalability of the randomness although the probability of a collision is far too low in 99% of the cases.

    This article by Alex Siepman, has code that generates sequential GUIDs replacing the last 6 bytes with a Date sequence (best implementation I have found so far)
    In my own code I made a couple changes on the date range used, ToString() and for serialization purposes but other than that, his solution works perfectly.

    Here is a better explanation of the clustered index. (USE THE INDEX LUKE -> unreasonable-defaults-primary-key-clustering-key)

     

    Edit: removed links because of spam detection... sigh

     

  • jambonejambone

    @dcga MS introduced NEWSEQUENTIALID() T-SQL for sequential guid generation at the DB

    https://msdn.microsoft.com/en-us/library/ms189786.aspx

  • Is this an "all or nothing" option on the context? Is there no way to turn it on only for specific tables?

    And is there a way to have it use a specific sequence, or are we stuck with "DefaultSequence" for all tables?

  • Gabriel LamounierGabriel Lamounier

    I really hope they don't name this feature HiLo, because it doesn't looks like an implementation of the HiLo algorithm.

  • @jambone: I do know that (a)
    The problem is that the NEWSEQUENTIALID() is tied to the machine that creates the GUID, whereas the strategy I suggested is not tied to one specific computer.

    If your code runs on a different machine then NEWSEQUENTIALID will generate GUIDs in a different sequence, you gain almost nothing.

     

    Here is the link the COMB GUID article by Alex Siepman.
    http://www.siepman.nl/blog/post/2015/06/20/SequentialGuid-Comb-Sql-Server-With-Creation-Date-Time-.aspx

  • I've created a nuget package for a utility class for generating sequential GUIDs independent from the system dll inspired by Alex Siepman blog which you might find interesting. Also the code is available in the github. http://nuget.org/packages/sequentialguidgenerator

  • fbehrens (cc @voroninp @SonOfSam @dcga @jambone @mrtaikandimrtaikandi)
     
    By default in EF7, if you add an entity with a GUID primary key and it has not value set (i.e. it's set to the default GUID) then EF will use a client side version of the NEWSEQUENTIALID algorithm to generate values for you.
  • @SonOfSam - The pools of IDs are actually shared at the AppDomain level (or ServiceProvider level if you are managing your own DI). So if you have 4 pools with 10 values each then the requests will cycle through the pools and all values get used (with the pools reaching out to the database for the next value when needed). Of course, when the AppDomain ends (web application is restarted etc.) then you will end up losing whatever values are left in each pool.
     
    @RichardD3 - Yes you can also set it individually on the properties that you want to use this strategy. There are overloads of both the property level and model level APIs that allow you to specify which sequence to use.
     
    @Gabriel Lamounier - In a very pure definition of HiLo you are probably correct. But reserving a high number from a central source (in this case a database sequence) and then being free to use all the numbers within the block bounded by the high number is representative of the pattern.

Remove this comment

Remove this thread

Close

Comments closed

Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.