Tech Off Thread

11 posts

Database design/performance question

  • LostInTangent

    I have a group of about 6 data tables that can collectively contain millions of records (it depends on the user). Users have the ability to create a personal version of the data contained in the 6 tables so that they can modify/add/delete/whatever they want to the data to perform "what-if" scenarios without affecting the live data. The design originally placed a column called VersionID in all 6 tables, as part of each table's primary key, allowing every record in all 6 tables to be replicated under a different version identifier. So when a user created a new version, it would copy all the data in each table, apply a new VersionID value to it, and re-insert it into each table.
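
    To illustrate, the copy step for one of the tables looks roughly like this (the table and column names here are made up, and I'm pretending the live data sits under VersionID = 0):

    DECLARE @NewVersionID int;
    SET @NewVersionID = 42;  -- identifier assigned when the user creates the version

    -- Re-insert the live rows into the same table under the new VersionID
    INSERT INTO dbo.Table1 (VersionID, Table1ID, SomeColumn, OtherColumn)
    SELECT @NewVersionID, Table1ID, SomeColumn, OtherColumn
    FROM dbo.Table1
    WHERE VersionID = 0;  -- the live data

    ...and the same again for the other 5 tables.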

    This solution seems to be pretty crappy in terms of performance/design.

    1) The tables get cluttered with tons of user-created data.
    2) The tables' record counts get blown out of whack if excessive versions are made, which in turn kills performance.
    3) In order to use these tables with any sort of efficiency, an index needed to be put on the VersionID column to allow records to be selected by it, but adding that index affected the other non-version-related operations on each table.
    4) Other issues...

    My question is: what would be a better approach to this solution? Would it be better to create a copy of each of the tables for each version? That way the data would be separated and data operations would perform better since each version would be isolated. Plus, when the version is deleted later, I could simply perform a DROP TABLE on each copy, which would be basically instant and wouldn't wreck the indexes of the live tables by deleting millions of records.
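
    Roughly what I have in mind (table names are made up):

    -- When a user creates version 42, copy each of the 6 live tables into its own per-version table
    SELECT *
    INTO dbo.Table1_Version42
    FROM dbo.Table1;

    -- ...and when the version is deleted, cleanup is just a drop per table
    DROP TABLE dbo.Table1_Version42;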

    Any input is appreciated.

  • blowdart

    Ouch.

    On the face of it, creating new "temporary" (but not in tempdb) tables seems the way to go; even better, create them as the user rather than dbo, so only that user sees them. Or you could have a single table per user, and the size problems would then just be related to that user (well, obviously long-running queries would slow down everyone at the same time).

    The fun part comes in querying them; you're going to have to pass the table name in, and I'm not convinced SQL's automatic performance machinery (plan caching and the like) will cope that well with a stored procedure that isn't fixed to a table.
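
    Something like this is what I mean (procedure and column names are made up) - the table name has to be spliced into the statement with dynamic SQL, so the cached plan differs per table:

    CREATE PROCEDURE dbo.GetVersionRows
        @TableName sysname,
        @AccountID int
    AS
    BEGIN
        DECLARE @sql nvarchar(max);
        -- The table name cannot be a parameter, so build it into the statement
        SET @sql = N'SELECT * FROM ' + QUOTENAME(@TableName)
                 + N' WHERE AccountID = @AccountID;';
        EXEC sp_executesql @sql, N'@AccountID int', @AccountID = @AccountID;
    END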

    Keeping "what-if" data in the same table as live data strikes me as rather worrying.

  • LostInTangent

    Thanks for the response, blowdart. I think creating new tables will probably be the best solution in the long run. Unless someone else on here has a better solution. :D

  • littleguru

    LostInTangent wrote:
    Thanks for the response, blowdart. I think creating new tables will probably be the best solution in the long run. Unless someone else on here has a better solution. :D


    I'm with blowdart here... creating new tables is the best approach for this scenario.

  • thumbtacks2

    The idea of the "what if" scenarios makes me think of using queries instead and manipulating the data in Excel via pivot tables (if it's possible)... or by using a reporting tool. How often are these user-created db tables reused?

  • LostInTangent

    thumbtacks2 wrote:
    The idea of the "what if" scenarios makes me think of using queries instead and manipulating the data in Excel via pivot tables (if it's possible)... or by using a reporting tool. How often are these user-created db tables reused?


    A lot. When the user creates a version, they will use the app just like they did when not in a version. This includes adding/editing/deleting/importing records into all the tables involved. They may choose to keep the version around for a few weeks or just a day.

  • kidzi

    OK, a few things... is there a cleanup strategy? (things more than 30 days old are removed, etc.)

    How do the tables' record counts get blown out of whack? What do you mean by this?

    I understand the VersionID column, but what non-version-related operations are you doing, and if you have indices on those, then what is the real issue? An index isn't a bad thing...

    It seems that if they are performing what-ifs and creating as many rows as they are, making tables for each of those situations will be a nightmare to maintain - even if it is done automatically - because now you'll have to know which table to go into, and looking at the data from a dbo perspective will be much, much harder to aggregate and manage.

    If there is Real data and WhatIf data, then instead of making tables for each version, I'd make a set of six tables for the WhatIf scenarios and put all of the user-generated stuff in there. Then the live data is where it needs to be, and the what-if data can be separated off (and you can do cleanup operations on it at intervals, or however you want to do it).

    If you put those tables in a different database, like "Live.Table1" and "WhatIf.Table1", then you get the benefit of being able to use the same sprocs for both scenarios - you do not have to rewrite all of the 'what if' sprocs, and it stays as close to production as possible.
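
    A minimal sketch of what I mean (procedure, table and column names are made up) - the same procedure script is run in both databases, so the app only has to pick which database it connects to:

    -- Deployed identically to the Live database and the WhatIf database
    CREATE PROCEDURE dbo.GetTable1Rows
        @SomeKey int
    AS
        SELECT *
        FROM dbo.Table1   -- resolves to Live.dbo.Table1 or WhatIf.dbo.Table1
        WHERE SomeKey = @SomeKey;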


  • LostInTangent

    kidzi wrote:
    OK, a few things... is there a cleanup strategy? (things more than 30 days old are removed, etc.)


    No. A user is allowed to create as many versions as they want and keep them around for as long as they need. In reality, they won't keep them for more than a few weeks, but the flexibility is there to allow them to keep them.

    kidzi wrote:
    How do the tables' record counts get blown out of whack? What do you mean by this?


    I apologize, I said that stupidly. What I meant was: with the current design, the number of records in each of the 6 tables gets so heavily ballooned once versions begin to be created that it starts to affect performance.

    kidzi wrote:
    I understand the VersionID column, but what non-version-related operations are you doing, and if you have indices on those, then what is the real issue? An index isn't a bad thing...


    The 6 tables are the backbone of the entire application. There are numerous areas of the app that use the tables to derive calculated data. I wasn't saying having an index on VersionID is bad, but the tables already have numerous other indexes that are required to speed up selecting records independently of versioning.

    kidzi wrote:
    It seems that if they are performing what-ifs and creating as many rows as they are, making tables for each of those situations will be a nightmare to maintain - even if it is done automatically - because now you'll have to know which table to go into, and looking at the data from a dbo perspective will be much, much harder to aggregate and manage.

    If there is Real data and WhatIf data, then instead of making tables for each version, I'd make a set of six tables for the WhatIf scenarios and put all of the user-generated stuff in there. Then the live data is where it needs to be, and the what-if data can be separated off (and you can do cleanup operations on it at intervals, or however you want to do it).


    Knowing which table to look at is simple, because in the business logic I'll know whether a user is currently within a version and, if so, what the version ID is. With that I can easily direct the query at the versioned table as opposed to the live table.

    I agree that having a live and a "what if" set of tables would be very clean, but my concern is that performance would still degrade over time. The live tables would stay intact and work nicely, but as users create more versions, selecting from the "what if" tables will get slower and slower due to the massive amounts of data. Plus, once a version is deleted, performing the DROP TABLE operations on per-version tables would be almost instant, as opposed to having to delete the records from all 6 "what if" tables, which would affect a lot of records and cause the tables' indexes to need to be rebuilt.
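
    For comparison, the cleanup in the two designs would look something like this (names are made up):

    -- Per-version tables: removing a version is basically a metadata operation
    DROP TABLE dbo.Table1_Version42;

    -- Shared "what if" tables: removing a version touches every affected row and its index entries
    DELETE FROM WhatIf.dbo.Table1 WHERE VersionID = 42;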

    kidzi wrote:
    If you put those tables in a different database, like "Live.Table1" and "WhatIf.Table1", then you get the benefit of being able to use the same sprocs for both scenarios - you do not have to rewrite all of the 'what if' sprocs, and it stays as close to production as possible.


    I was already planning on using a separate database.

  • kidzi

    OK then, good luck with that. 1 million rows is not really all that much for SQL Server, and with the proper indices set up, it can work very well. If there are performance issues, then normally it is due to the lack of some index, or due to the way the query is structured (inefficient WHERE clause conditions that aren't sargable, or conditions formatted in a way that an index cannot be used).
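
    For example (made-up table and column), a filter that defeats an index versus an equivalent one that can use it:

    -- Not sargable: the function applied to the column prevents an index seek
    SELECT * FROM dbo.Table1 WHERE YEAR(CreatedDate) = 2007;

    -- Sargable: the same filter written as a range can seek on an index over CreatedDate
    SELECT * FROM dbo.Table1
    WHERE CreatedDate >= '20070101' AND CreatedDate < '20080101';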


  • Nullable

    kidzi wrote:
    OK then, good luck with that. 1 million rows is not really all that much for SQL Server, and with the proper indices set up, it can work very well. If there are performance issues, then normally it is due to the lack of some index, or due to the way the query is structured (inefficient WHERE clause conditions that aren't sargable, or conditions formatted in a way that an index cannot be used).


    It's true that a *few million* records isn't a big deal to SQL... but a few million times a few variations can now equal, let's say, 10 million (still not that bad, as I've built and managed tables with 200 million+ records for some clients).

    The issue comes in with the fact that where you *used* to join on, let's say, "AccountID"... now you have to join on VersionID, AccountID... so where you may have in the past used a clustered index on "blah", now you'd probably want VersionID, blah... I don't know if you follow the point, but you're greatly losing index value.
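
    To put it concretely (made-up index and column names):

    -- Old design: the clustered index is keyed on the natural lookup column
    -- CREATE CLUSTERED INDEX IX_Table1 ON dbo.Table1 (AccountID);

    -- Versioned design: VersionID has to lead the key, which changes the row order
    -- that every other query was relying on
    CREATE CLUSTERED INDEX IX_Table1 ON dbo.Table1 (VersionID, AccountID);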

    If there are 6 tables, having 6 VersionID columns to keep track of gets messy.

    A solution that we (at my company) have used in the past is this very same "version" table approach, with the per-version tables joined together by a UNIONing view:

    CREATE VIEW dbo.MarketingLeads AS  -- (view name is just an example)
        SELECT * FROM dbo.MarketingLeads200701
        UNION ALL
        SELECT * FROM dbo.MarketingLeads200702;
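
    Queries then go against the combined view, e.g. something like:

    SELECT COUNT(*) FROM dbo.MarketingLeads;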

    Peace,

  • kidzi

    Nullable wrote:

    kidzi wrote:
    OK then, good luck with that. 1 million rows is not really all that much for SQL Server, and with the proper indices set up, it can work very well. If there are performance issues, then normally it is due to the lack of some index, or due to the way the query is structured (inefficient WHERE clause conditions that aren't sargable, or conditions formatted in a way that an index cannot be used).


    It's true that a *few million* records isn't a big deal to SQL... but a few million times a few variations can now equal, let's say, 10 million (still not that bad, as I've built and managed tables with 200 million+ records for some clients).

    The issue comes in with the fact that where you *used* to join on, let's say, "AccountID"... now you have to join on VersionID, AccountID... so where you may have in the past used a clustered index on "blah", now you'd probably want VersionID, blah... I don't know if you follow the point, but you're greatly losing index value.

    If there are 6 tables, having 6 VersionID columns to keep track of gets messy.

    A solution that we (at my company) have used in the past is this very same "version" table approach, with the per-version tables joined together by a UNIONing view:

    CREATE VIEW dbo.MarketingLeads AS  -- (view name is just an example)
        SELECT * FROM dbo.MarketingLeads200701
        UNION ALL
        SELECT * FROM dbo.MarketingLeads200702;

    Peace,


    I still do not see how you're greatly losing index value. So you changed from using AccountID to VersionID... what is the big deal? You've already had to modify all of your scripts to support a version - that was part of the initial update. So now your clustered index is on VersionID - great, that is what indices are for. I guess we can agree to disagree on this. :)

