Posted By: zzzzz | Apr 3rd, 2006 @ 5:33 PM
page 1 of 2
Comments: 27 | Views: 19615
zzzzz
zzzzz
Yes its an Economy vehicle
Let me state this i don't want this to turn into a flame war.  I was curious to see the difference between PostgreSql and MSSQL sense its starting look like that i will going to PostgreSql

So here is the performance difference  i can see playing with a small chunk of data 35,534 records 17 megs in size so each record is about 478.42 bytes in size

What i did was load PostgresSql and SQL2005 on the host os  then tested it from a VMWare machine installed on the same computer.  the client tool is Foxpro 9.0 sp1. Used the ODBC connections for MSSQL and PostgreSQL

My computer hardware
p4 3.2gigahertz  with hyper threading turned on
512  megs of memory
100megs network
vmware version 5.0
    guest OS WinXp Pro sp2
Host os
    Win XP pro Sp2
    PosgreSql 8.1.3
    Sql Express 2005

 
the first test was get all 35,534 records and looping it 5 times disconnecting the ODBC connection each time so the servers would have to build the SQL statement each time then i test it without dropping connections

Select * from qclog

Average seconds to return record set disconnecting and reconnecting to the server s
4.2844 PostgreSql
1.6594 MSSql 2005

then with testing without disconnecting
4.6656 PostgreSql
1.2376 MsSql 2005

Select * from qclog order by inspecnum  Bot servers have index on this table.

Average seconds to return record set disconnecting and reconnecting to the servers 
5.2876 PostgreSql
3.0126 MSSql 2005

then with testing without disconnecting
 5.0906 PostgreSql
 2.7782 MsSql 2005


Then a test where the server would have to use a adhoc index in other words sort a field that does not have an index?

Select * from qclog order by dateinspec

Average seconds to return record set disconnecting and reconnecting to the servers
4.9626 PostgreSql
2.5970 MSSql 2005

then with testing without disconnecting
 4.8530 PostgreSql
 2.6688 MsSql 2005

I did not try to optimize the servers at all i just keep the way it is installed no changes to the config files.

I know i will get H E double hockey sticks over this but hey i wanted to see the difference between the two servers, and share it.

here is the test code
DIMENSION testresults(5,2)

FOR i = 1 TO  5
    lnConPost = SQLCONNECT("postgres")
    lnCon2005 = SQLCONNECT("erpdeve", 'justin@emproshunts.com', 'amsc516')
   
    starttime = seconds()
    SQLEXEC(lnconpost, 'Select * from qclog order by dateinspec ', "qclog")
    timed = seconds() - starttime
    testresults[i,1] =  timed

    starttime = seconds()
    SQLEXEC(lncon2005, 'Select * from qclog order by dateinspec ', "qclog2")
    timed = seconds() - starttime
    testresults[i,2] = timed
    SQLDISCONNECT(lnconpost)
    SQLDISCONNECT(lncon2005)
    CLOSE DATABASES all

ENDFOR

average1=0
average2=0

FOR ia = 1 TO 5
    average1 = average1+testresults[ia,1]
    average2 = average2+testresults [ia,2]
ENDFOR
? average1/5
? average2/5
IMHO, although PostgresSQL have Windows version, don't use it if you can install a Linux one, as their Windows version is known to have performance issue and runs very much slower than their Linux one.

 PostgresSql is arguably the worst  performance commercial database on Windows platform.

I'm not intended to say PostgresSQL is a bad one, when running on Linux its performance in some test can outrun other database like Oracle, but seems their code is too optimized for running on Linux.

Interesting results, I've been wondering about the performance differences myself.

In my opinion PostgreSql is much nicer to work with as far as the language goes. I created transforms to generate DDL for a few vendors like SQL Server 2005, Oracle, PostgreSql, and DB2. Our approach was to generate to the SQL 2003 Standard and then override any differences in the output for each vendor. Out of the list PostgreSql was much closer to the standard than the others. In fact, most vendors (excluding PostgreSql) for various features in the standard was a joke. Oracle being the worst by far. I haven't had a chance to work with MySql yet, maybe they're better, not sure.

AdityaG
AdityaG
OMG VISTA FTW LOLZ!!1one
androidi wrote:
Another thing of note:HOWEVER if a lot of (thousands) of records were returned in result of the query, then query from SQL Express was always faster 100% of time. I am thinking that the MySQL connector is slowing things down when there is any records returned. I'd also note that SQL server express memory/working set went from 30 to ~50 MB during runs and MySQL went from 3 to 25.


I think MS does things a bit different when it comes to using memory. I have seen that VS2005, it takes up a good bit more memory but the actual application itself kicks (I need to watch my language) when it comes to speed. (It's not really a leaky memory hog like some of the Flash MX2004 versions... but still takes up more memory than previous versions). I think SQL server was optimized in the same way.

I personally prefer MySQL because I do a lot of work in PHP (can't afford ASP.NET 2.0 hosting yet Tongue Out) but I have heard some really good things about the others. The speeds are usually just a bit off.. but I guess it makes a difference to the uber-enterprise devs.

Either way, I have to say. 1 whole day and no flames. Awesome thread! Wink
blowdart
blowdart
Peek-a-boo
androidi wrote:

I'd also note that SQL server express memory/working set went from 30 to ~50 MB during runs and MySQL went from 3 to 25.



And who cares? Well if you're using MySQL with .NET and getting large results back often then you may want to see if you get better perf with some other solution than MySQL (on windows) + Connector.



Your memory stats are interesting. On my laptop if you look at Task Manager you'll see that a "sleeping" SqlExpress instance takes up around 750K. That's right. K. (My normal Sql server workstation instance on this box takes around 8Mb)

Now once you send a connection towards it and it wakes up it will ramp up, and start taking up its runtime memory footprint, loading the indexes into memory, starting the query optimiser, etc.

MySql on the other hand is always "ramped up".

So first connection/query time is always going to be slower with SqlExpress.

Then there's the speed of the connector. Whilst not a .net issue the options on the ADO connector for MySql are worrying. I can set options on the connection string which simply break ADO. I had to (quickly) pull some data out of a bugzilla database, so I wrote a WSH script to do it. Every time I tried rs["FieldName"] where the contents of the field were null I would get ADO errors. Even if I tried to do a null check. I tried fiddling with the option bitmask, to no avail. I ended up having to do on error resume next. Frankly that worried me.
Sourcecode
Sourcecode
Whatever it is, I didn't do it.

androidi wrote:

And who cares? Well if you're using MySQL with .NET and getting large results back often then you may want to see if you get better perf with some other solution than MySQL (on windows) + Connector.


I am thinking about MySQL to replace SQL Server because of cost implications for our prospective clients. I only us the DB for full text indexing and searching.

*Normally*; you would not return large result sets back to the client unless it is absolutely required.

So would the tests be positive/same between the 2 for smaller results sets?

How would you compare the 2 in a full text index and search context? Your opinions. I don't need numbers..

Thanks

SQL 2005 Express creates databases with the 'autoclose' option switched ON by default, if you create them simply using the 'CREATE DATABASE' statement. When you first create a connection to a particular database, it runs through various checks to mount the database and roll forward the transaction log, and that takes a little time. By contrast the full versions of SQL Server mount all databases at startup.

If you use the SQL Server Management Studio [Express] graphical tools, Auto Close defaults to off. I'm not sure what happens if you use VS2005.
Sourcecode
Sourcecode
Whatever it is, I didn't do it.

zzzzz wrote:

On large or Small data sets it appears MSSQL has most of performance advantage.  35,500 records is not a large data set to return to a client.   But the PostgreSql language options seem to be in the front compared to MSSQL. 

IMHO .Net data access tools suck.  There is to much overhead and not enough flexiblity in its design...  Now its allot better than the orginal ADO and oleDB tools which where worthless...  What i mean by this is once the client has the data it very difficult to change, filiter or move the data into other Datatables.  This should be fixed by with LINQ


Thanks for the input.

This is not a rude remark, or flame bait. I'm quite sure you have a reason for sending 35,000 records over the wire.

My associates, and clients would take away my keyboard for sending that much data over the wire. What is the User to do with all that data? How can an average user consume and/or comprehend that much data?

Sorry, I’m a huge advocate of only giving the necessary data the user *needs*, not loading it all, and letting the user deal with using it. It’s very rare that I give the User more then 20 rows to look at on a page. If it’s not in the first 3 pages a User will search or sort it.

I don’t even give them all columns if their preferences do not specify it. Sure they can add cols at any time. I only pull data required for the view at hand.

Or maybe it’s just me? I’m working on a web-based app so; I have to think about memory and limits. Even on reports or charts I try to get it down the least amount of rows server side before I send it over the wire.

That said it does make it easier for the dev to sort, and filter etc. Also that way you’re using the client processes (which are usually idol) instead of the server.

TommyCarlier
TommyCarlier
I want my scalps!
Our main application is a client/server application (WinForms client, IIS server (ASP.NET web service), SQL Server 2005) where it's very common to send large amounts of data from the server to the client. There are forms with GridControls of 30.000 rows and 30 columns. When measuring everything, we discovered that reading the data from SQL Server, serializing, compressing, encrypting, sending it to the client (via HTTP), decrypting and decompressing took less time than filling the GridControl on the client. The huge advantage you have, is the rich client experience: once the data is on the client, the user can sort, group and filter extremely fast. And if you use client-side caching, you can even reduce the amount of data that is transfered.
Sourcecode
Sourcecode
Whatever it is, I didn't do it.

TommyCarlier wrote:
Our main application is a client/server application (WinForms client, IIS server (ASP.NET web service), SQL Server 2005) where it's very common to send large amounts of data from the server to the client. There are forms with GridControls of 30.000 rows and 30 columns. When measuring everything, we discovered that reading the data from SQL Server, serializing, compressing, encrypting, sending it to the client (via HTTP), decrypting and decompressing took less time than filling the GridControl on the client. The huge advantage you have, is the rich client experience: once the data is on the client, the user can sort, group and filter extremely fast. And if you use client-side caching, you can even reduce the amount of data that is transfered.


I see, and I'm aware of the benefits, and I even agree to a certain extent. With fat apps we did this as well. Yes, it does provide faster client manipulation. However I strongly feel that even though this was acceptable and pertinent on a fat client (even I did it). I don't see it the same way anymore. If you give a user 30,000 rows of data what is he expected to do with all that. He only requires certain rows. He is not going to scroll through 30,000 rows. He won’t open, or modify even 20-100 rows in a session for that particular data set. He quite likely has multiple applications open all eating up memory resources.

I thought about compression and yes it works but only on large datasets, on a smaller dataset it slows things down.

Sure it’s easier from a developer perspective, but where is the real benefit for the User. If you as a User load up a grid, the very first thing you do is filter or sort. It’s very rare the record you search is on the first line. So why not filter or sort before hand? This is in the context of a web app, and possible bandwidth, or target device limits.

I guess it comes down too what your goals are. I suppose there is no “right/wrong way” to do things.

As I stated above I only use a db for indexing and search. The actual data is stored as xml files. It may sound crazy, but the goals of the product demand it. XML storage I’m finding is really quite robust, and extensible.

All our other products to this point were SQL Server back end, and logically so. Unfortunately for this particular product it would be a risky undertaking.

So, how will a web app that loads that much data reliably scale up, or down? Down to other more limited devices (insert Origami, Tablet, or Pocket PC here), or up to many, many users geographically dispersed?

Another point comes to mind with regards to the client caching, when you don’t know the target OS or browser.

Your points are valid. But, I can’t assume a certain device, OS, or browser. Rather, I choose not to.

Is it just me, or is this thread getting way off topic? No thanks to me of course. Big Smile

Thanks for the input.

DoomBringer
DoomBringer
Doom!
zzzzz wrote:
On the memory footprint i never understood why MS is always larger???

On my Computer with nothing running its 8 megs for MSSQL, PostgreSql is 1.24 megs

When running the test MSSQL jump to 50 Megs, PostgreSql  about 10megs

So it seems MS is moving the data to memory, plus MS keeps the prepared version of SQL statement in memory even if the connection which created it is disconnected.   PostgreSql states it does not keep the Prepared Sql statements once the client disconnects.

You can tell SQL Server to consume more or less memory, as you want.  For me, I don't care if it uses that much memory, as long as its performant.  What it might be doing is just allocating a large chunk of memory and using it in whatever way.
I know with Oracle you can tweak memory stuff to no end, SQL server is the same.  I don't know about PostgreSQL.
Sourcecode
Sourcecode
Whatever it is, I didn't do it.

zzzzz wrote:
SourceCode you make good arguments for sending limited data to the client and it seems to be the commone theme when programming to databases todate.  I agree with you reasons and do what you do.

But the reason to send this much data is allow the user to decide how they want to play with  it or create reports in excel or what not.

I here endless complaints from my users when they are looking for something and need to keep fetching more records from the server.
In this case the user has an idea what they looking for but can't figure out how to but it in search box so they just want to scroll through all records until they find it.  The comment is "I'll know what i'm looking for when i find it"


What i have done with the interfaces is put an option in to select how many records to return to the client.  It defaults to 100 records  but a user and reset it to 1,000,000 if they like. This way its up to user if they want a fast interface or a slow one. 





In the context you spell out there, it's fine. I was not assuming to know the right, or wrong way. As I said I also allow large datasets but am re-thinking my motivation for that.

My motivations are turning to. How can I allow a user to search in a way he comprehends. So that his search can be what he is thinking? How can I make programs smarter?

For example we(our older programs) have a user search by selecting combo boxes and dates etc., and the result usually looks similar to.

(Model='Ford') And (Color='blue')

You can select on fields from related child tables etc.. Slick from my point of view.

The above context (and I still use it) has been around for years. Why are we not evolving and making the programs smarter. Closer to what the user is thinking, which is.

”I want to see all blue ford car's.”
 
I feel like the development of software is in a rut. Not from a technology stand point but from a user interaction stand point. That is why I grabbed Coopers latest book, the analogy above is from the book, different context but just the same. He is quite fascinating to read.

So now after reading that I often find my self in a dilemma, asking myself, does the user want this? How can I break this back to the lowest function set, that will help 80% of the users, and put more time into making the software smart. I can’t assume to know what they want but I should be able to take what they want in their language, and turn it into something tangible that will help.

Again off topic. Sorry for that.

There just has to be other ways to do things, are we stuck in a black box?

Thanks for the input Smiley

Another thing of note:

Using C# with MySQL Connector  with old MySQL 3.23.53 on Windows
vs
Using C# with SQL Server Express 2005

Everything locally on same Windows machine and warmed up so that everything is in memory.

The test was simple: select * from blah where name like '%xyz%'

the "xyz" was 1-3 random letters and the query returned 0-200 rows with 3 letters and almost every row with 1 letter. The table had ~30000 rows.

Results:

MySQL was always* faster on the FIRST run (ctrl+f5 Release Build execute). (*provided only few or none results were returned)

On subsequent loops with random letters, when no matches were found, both were about equal.

HOWEVER if the returned rows were in the thousands, then the query from the SQL Server Express 2005 was always faster.

I believe the MySQL .NET Connector has overhead, however without verifying this with newer MySQL 4/5.xx version, I'd not jump into any conclusions over these results. It may be the Connector is using a method of fetching the records that involves more overhead when used with MySQL 3.xx. Pure speculation though.


Another observation was that the SQL Server Express 2005 memory/working set went from 30 to ~50 MB during the runs where MySQL went from 3 to 25.



And who cares? Well if you're using MySQL with .NET and getting large results back often then you may want to see if you get better perf with some other solution than MySQL (on windows) + .NET Connector.

blowdart wrote:
androidi wrote:
I'd also note that SQL server express memory/working set went from 30 to ~50 MB during runs and MySQL went from 3 to 25.

Your memory stats are interesting. On my laptop if you look at Task Manager you'll see that a "sleeping" SqlExpress instance takes up around 750K. That's right. K. (My normal Sql server workstation instance on this box takes around 8Mb)


Here are the stats on my computer when no queries have been made (just booted up the computer):

MSSql 2005 Express: mem usage 6,5 MB peak 25,4 MB pagefault ~9100
MySQLd-nt 3.x: mem usage 2,7 MB peak 2,7 MB pagefaults ~700


The results you quote were from warmed up state, such that both had been queried already hundreds of times for all the records with all kind of random queries. 


Anyway I just put the mem things there as a curious side note. The only important takeaway from my test is that it is better to verify whether the .NET Connector is a source of major overhead in ones particular situation, since in my tests it severely impacted the large results from MySQL.


Sourcecode wrote:
I am thinking about MySQL to replace SQL Server because of cost implications for our prospective clients. I only us the DB for full text indexing and searching.  


I can't really compare the performance of particular DB, the "test" was just something I wrote to quickly see what kind of (dis-)advantages there would be for a very specific scenario occuring on a *development* environment (aka LAPTOP).

I would also point out that I read somewhere (haven't verified this since I only used it for non-commercial use) that using MySQL for commercial purposes is not free?!? I was quite surprised to read this, after all using Sql Server Express for commercial purposes is free (with some very specific exception)... To be safe (in terms of free) I'd recommed to look into PostgreSQL if SQL Express does not meet your needs.

Sourcecode wrote:
*Normally*; you would not return large result sets back to the client unless it is absolutely required.


From user standpoint I'd say it is better to return larger* result set than *just what the user can view at once*. This is best seen in these damned web forums which use paging and someone  thought that the user would never look beyond the first page. If paging is used and 15 rows fit in the screen, return atleast 3*15 rows and let the client decide what is shown. Though in some cases this kind of read-ahead caching can also surface unexpected issues...

*larger in the context of web forums means anything more than what might be needed.  Mad

Having the option to query and view a larger than usual amount of records is sometimes very useful though and if that option is missing the user will be swearing...
Sourcecode
Sourcecode
Whatever it is, I didn't do it.

zzzzz wrote:
In the same boat your in.

The one thing i'm trying to figure out is trying to get Email, hand written notes, and the Erp database into one comprehensive database that can managed from one interface.

It is becoming a common occurence where a customer calls up wants to discuss a conversation that has taken place in email but that person is not here that day so the Service Agent on the Phone is lost for words because half what has been discussed is hidden in some other person's  inbox. 

I don't know how many times i have to go through a users inbox looking for emails dealing with that customer then forwarding it off to the Service Agent.  This is just annoying..  Plus i'm on the recieving end of this when dealing with vendors where sales people have not clue what the other sales people are doing.


hmmm, interesting that the new product we are working on does just that + more Smiley

If your interested in a collaborative effort, and/or more details let me know in a private email.

blowdart
blowdart
Peek-a-boo
Sourcecode wrote:

”I want to see all blue ford car's.”



What ever happened to English Query?
Sourcecode
Sourcecode
Whatever it is, I didn't do it.
androidi wrote:

Sourcecode wrote: *Normally*; you would not return large result sets back to the client unless it is absolutely required.

From user standpoint I'd say it is better to return larger* result set than *just what the user can view at once*. This is best seen in these damned web forums which use paging and someone  thought that the user would never look beyond the first page. If paging is used and 15 rows fit in the screen, return atleast 3*15 rows and let the client decide what is shown. Though in some cases this kind of read-ahead caching can also surface unexpected issues...

*larger in the context of web forums means anything more than what might be needed. 

Having the option to query and view a larger than usual amount of records is sometimes very useful though and if that option is missing the user will be swearing...



I suppose you're correct. The problem is that we are striving to find a middle solution that

1: Gives the user what s/he wants/needs in a goal/task orientated environment.

2: Makes the product accessible between groups of dispersed Users using multiple/different devices.

3: Provides the User with an experience, and expectations that are the same where possible between devices.

4: Keeps infrastructure requirements of the product to a minimum on the server side and trivial on the client side.

5: Try’s to make the most of available bandwidth, and resources server side, and make no assumptions of the bandwidth, or resources on the client side.

So given the 5 constraints above (insert more here), where is the middle line?

The problems arise mostly from the need to support multiple devices, bandwidth, and dispersion of users.

So the choices are either support multiple devices, or not. When you support multiple devices (insert some farmer living in the countryside) your bandwidth expectations are not constant, or guaranteed. A doctor visiting a patient in the countryside does not have a 2mbps connection available. At least we can count on GPRS dial up (in Europe => 128k). We can assume/insist that a nomad user is to have at least a portable telephone that can be used for data, and some sort of handheld device.

So in theory, you could support all this by identifying the device and bandwidth for a given session, and then have your web services or back end systems account for the limitations. However someone with no limitations (insert desktop local access) pulling a large dataset would be resource intensive server side. This will in fact penalize the users with limitations.

So the choice was made to use xml for data stores server side, and DB for search. Each user has a set of data stores that can be shared with other users. The search results are another type of xml data store. The data stores can be pulled, or pushed to the required device/destination.

The data store structure and model code is constant across devices. Adapting the View to the/a device is trivial.

I’m having trouble identifying a middle ground where all Users/devices are treated equal. It should be possible to find a solution. Or maybe we should stop worrying about this stuff and let the chips fall where they may. I prefer the solution. Though I sometimes get carried away with questioning.

Thanks for the input. Smiley

Sourcecode
Sourcecode
Whatever it is, I didn't do it.
blowdart wrote:
Sourcecode wrote:

”I want to see all blue ford car's.”



What ever happened to English Query?



AFAIK, and from what I have researched English Query (input) technology is not up to snuff. It has problems with context and is rarely correct in it’s assumptions (from what I know, and I could be wrong).

English Query (output) on the other hand has some great possibilities for user-aided search.

Where by allowing a user to create a phrase using various selection mechanisms. So it is possible that a user would read/think a query as “I want to see all blue cars” and build that sentence he is thinking about from name/value pair lists. This is what our new product will do. As choices are made the other selection lists update dynamically with choices that would pertain to the last choice made by the user.

The user does not build a search query using some cryptic language that we enforce. Rather they build a phrase.

I will have a second look at it though. I didn’t spend more then a couple day’s researching it.

What are your opinions of English Query? Gut feeling, or otherwise.

Thanks for the input.

Sourcecode wrote:
So given the 5 constraints above (insert more here), where is the middle line?


If the users only occasionally like to have largers sets of data available at once, there are many ways that could be done without stressing the server. You could have a GUI where the user can enter a large result query into a queue which would be executed when the server has very low usage (night?) and the results dumped into separate server where the results are downloadable as packed xml (like wikipedia,imdb). Another solution might be to have separete server dedicated to these large queries such that the primary db would not be affected. The client UI would preferably then allow querying servers both at once (to allow working while waiting large result set)..

Few more fancy/technical solutions come to mind but when the user needs to shift through a ton of data, it is best to have the data locally if at all possible, anyone who has used Access with linked servers knows it just ain't as nice as if the data was local.
Sourcecode
Sourcecode
Whatever it is, I didn't do it.
androidi wrote:
Sourcecode wrote: So given the 5 constraints above (insert more here), where is the middle line?


If the users only occasionally like to have largers sets of data available at once, there are many ways that could be done without stressing the server. You could have a GUI where the user can enter a large result query into a queue which would be executed when the server has very low usage (night?) and the results dumped into separate server where the results are downloadable as packed xml (like wikipedia,imdb). Another solution might be to have separete server dedicated to these large queries such that the primary db would not be affected. The client UI would preferably then allow querying servers both at once (to allow working while waiting large result set)..

Few more fancy/technical solutions come to mind but when the user needs to shift through a ton of data, it is best to have the data locally if at all possible, anyone who has used Access with linked servers knows it just ain't as nice as if the data was local.


Thanks Smiley

Those are possible solutions to research.
page 1 of 2
Comments: 27 | Views: 19615
Microsoft Communities