SQL Generation Improvements in Entity Framework 7

Play SQL Generation Improvements in Entity Framework 7

The Discussion

  • User profile image
    Piotr Spikowski

    EF7 identifies which part of a query can run on DB and which part can only be run in-memory. This is excellent. One needs to however be cautious with this and fully control what is added into queries e.g. into where clauses… (yes… you should always control what is added anyway). The point is that the query (when modified) may still work but the performance can be different. You may not even instantly detect the differences. They might be noticed during perf tests (possibly on QA environments).

    The above might be the case as you may not always build queries in “one line”. You may receive a predicate as a param from the outside. You may use repos and use specification patterns etc. So from now on instead of knowing that you should investigate the query more (exception in previous version) you might be using perf results to look for problems.

    The last sample given by you is also interesting (the one where posts are included and a separate query for posts is run). I need to grab the code as I would like to know what happens if I add a predicate to blogs and that predicate can only be run in memory (and I still have the ‘include’ on Posts).

  • User profile image

    About the "Include" improvement, I am curious if the following could be a better way to generate the sql:

    1. Query the parent table into a temp table.
    2. Select from the temp table (this will be resultset #1 on the data reader).
    3. Select from the child table joining into the temp table (resultset #2 on the data reader).
    4. Drop the temp table.  

    All 4 of these sql statements could be batched as a single query sent to the db (separated by ";"). The resulting data reader would have 2 result sets (first for the parent objects and second for the children). This is as opposed to running two separate queries and doing the parent "where" filtering twice. Have you evaluated this option?  Could your current way cause issues if one of the blog names change in the middle of the two queries?  For example, if "ADO.NET Blog" is changed to "ADO.NET", the first query would return that blog name as "ADO.NET Blog" but because of the second repeated query, it's child posts wouldn't be included, right?

  • User profile image


  • User profile image

    Really cool stuff.
    But wouldn't sending 2 queries instead of 1 join to get related posts take more time on the db side?
    I guess if latency was really bad it would be good but what if the db itself is the bottleneck?

  • User profile image

    Is sequential Include() processing used only  for one-to-many and many-to-many relationships?

    Why were separate queries issued ? Are not multiple result sets supported?

    Are you going to repeat joins with 'where' for each step? Entity1.Inlcude("Entity2.Entity3.....Entity100")

    Will EF generate next queries for Entity3...Entity100 if Entity1 relates to Entity2 as 1:0..* and there was empty result set for Entity2?

    Isn't it really more optimal to do all these things in a way @eersonmez suggests?



  • User profile image

    FromSql is great for the tricky sections of queries that need join hints. Is it possible to consume linq from FromSql, the opposite of what was shown?  Maybe joins are enough for composing both ways.

    I don't really liked parts of my queries silently running client side. I believe Linq2sql allowed this and EF broke with that.  EF7 seems a step back in the regards. Now, if turning code client side is explcit (with AsEnumerable now, for example), the feature gets my approval.

    I'm also REALLY concerned about the change in how include is done. This will be cause perf regression all over our code.  Repeating complex queries is awful. Please, please use a temp table, or use xml result sets. I prefer xml because I can consume only part of the query result without the server wastefully reading everything from disk.

  • User profile image

    Is the new Include system compatible with the async world?

    There are now 2 queries to be executed, there is like a single wrapping task to await?

  • User profile image

    Thanks EF team. Everyone one of those features is awesome. I've been wanting a lot of these for a while. I.e. batching, and a way to do full-text search like that. Also, I definitely look forward to trying the new way that joins work for fetching data. I have some really horrible queries going on right now and it looks like that may help a lot. The part where it executes some things at the database level and others in memory looks like it is exactly what I need for one of my queries as well. Definitely looking forward to EF 7. The only downside is no EntityDataSource control which I need for working with Telerik UI controls. And I need a MySQL provider. Can't wait until it's available. If large model startup time is improved like it's supposed to be, this will be a major improvement over EF 6.

  • User profile image

    Would that implementation of .Include() result in race conditions since the two tables are loaded separately?

  • User profile image

    I hope this .Include() "improvement" can be turned off, systems where a round-trip is expensive won't benefit from that, specially for simple queries like the one demonstrated.

    I really don't see the point in this, if I understood correctly this is exact behaviour of using the old .Collection() method after your entity was loaded.

  • User profile image

    @Piotr Spikowski & @br1 - Agreed that excessive client evaluation is a concern. We will be logging warnings in some cases where we know it's probably not what you want (i.e. filtering on the client). We are also going to add a flag to completely disable client eval (tracked by https://github.com/aspnet/EntityFramework/issues/1856).


    @eersonmez – Yes these are valid suggestions. What we have at the moment is already a significant improvement over EF6 and we may look at further improvements (such as multiple result sets on the same query) in the future.


    @Clement – It's possible it could take more time if latency is very high and the data set is very small. But in reality the reduction in the amount of data being sent over the wire should outweigh this.


    @voroninp – Yes, multiple result sets is a possible improvement we will look at. Compared to EF6 though, your 100 includes would have resulted in a query with 100 JOINs in it... it may not be optimal but at least the EF7 version is going to run :) (and 100 joins is probably not a real world scenario).


    @SimoneInvernizzi – Yes, it still works the same - the thread will be freed up while .NET code is waiting on the database (possibly multiple times).


    @akamud – It's different from .Collection() because that would load the related objects for a single entity in a query. This loads the data for all related entities in a single query (i.e. Blogs.Include(b => b.Posts).ToList() with 1000 blogs each with 100 posts still only results in 2 queries).

  • User profile image

    One thing that occurred to me with regard to the multiple queries rather than doing a join. I'm assuming you can't read the data in a streaming fashion then and everything would need to be read into RAM and then merged? I have some apps where I turn off tracking and read large amounts of data that couldn't be loaded all into RAM. At the moment, I'm using my own CUD methods for doing updates as EF is just too slow. I would like to not have to use my own methods for this in the future. Maybe the new batch update support will help and maybe even be faster than my code. I hope that EF 7 is a lot better about handling bulk updates. The only thing EF 6 is useful for is for queries with tracking turned off when working with large amounts of data.

  • User profile image

    It would be nice if OnConfiguring would provide (optionally) a reference to the DI container (if avaiable) so that you can read information like the connection string out of a config file. Right now that seems to be a real pain to do.

    Without access to the initial configuration information the OnConfigure() method is kind of worthless given that DbContext can't get at any configuration stores without it.

  • User profile image

    Wow, EF7 will give us big improvements against EF6.

    What's about the contains-problem we all know from EF6?

    Example of this problem from real world:
    We have a very complex application with a more complex rights management where we have to check access rights of entities. For this we need a lot of extra data.
    In EF6 we tried to get only the fields we need to check rights to minimize loaded data that we will discard when checking rights for the given entity will fail... but this ends up in reloading extra data with ids and contains -> Result: slow query translation by lots of ids and a very confusing and overloaded sql query, e.g. casting of null values, casting of guids/uniqueidentifiers, lots of subselects without any profit in the results...

    I hope generation of sql will get more close to how a human would write the query ;)

Add Your 2 Cents