Hi there,
I am building a proof-of-concept application that makes heavy use of XmlDocument and XML parsing in general. Because it's only a POC, I am working around the performance cost of navigating the XML DOM with XPath by caching. My main goal at the moment is just to get the thing working, and I have built an API around the company's XML schema to expose an object model to my application. My initial requirement was for it to work on .NET 2.0 only, but I want to push for them to adopt 3.5, and one of the reasons is the heavy use of XML and the ability to use LINQ to XML instead.
I have built my object model in a way that lets me rip out the XPath stuff and put LINQ to XML underneath if I get the go-ahead. My question is twofold:
1. What kind of performance gains, if any, can I expect from LINQ to XML over standard XML DOM XPath queries?
2. If I can't go the LINQ to XML route, can anyone suggest the best way to optimise the XPath stuff? For instance, to keep it simple I'm making a lot of use of the XmlDocument.SelectNodes and SelectSingleNode methods. The application is really slow when it comes to refreshing the cache and loading the XML back up. Any suggestions here would be good.
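A minimal sketch of that layering, assuming a hypothetical IHeadlineSource interface and a simplified, namespace-free NewsML-like layout (NewsML/NewsItem/NewsComponent/NewsLines/HeadLine). Callers depend only on the interface, so the .NET 2.0 XmlDocument/XPath implementation could later be replaced by the LINQ to XML one without touching them. The names and the path are illustrative, not the actual company API or schema.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical abstraction: the application codes against this interface only,
// so the XML technology underneath can be swapped later.
public interface IHeadlineSource
{
    IList<string> GetHeadlines(string xml);
}

// .NET 2.0-style implementation: XmlDocument + XPath.
public class XPathHeadlineSource : IHeadlineSource
{
    public IList<string> GetHeadlines(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        List<string> result = new List<string>();
        // Explicit path through the assumed layout (no '//' scan).
        foreach (XmlNode node in doc.SelectNodes("/NewsML/NewsItem/NewsComponent/NewsLines/HeadLine"))
        {
            result.Add(node.InnerText);
        }
        return result;
    }
}

// .NET 3.5 implementation: the same query expressed with LINQ to XML axes.
public class LinqHeadlineSource : IHeadlineSource
{
    public IList<string> GetHeadlines(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        // doc.Root is assumed to be the NewsML element.
        return doc.Root
            .Elements("NewsItem")
            .Elements("NewsComponent")
            .Elements("NewsLines")
            .Elements("HeadLine")
            .Select(e => e.Value)
            .ToList();
    }
}
```

A real NewsML file that declares a namespace would need an XmlNamespaceManager on the XPath side and an XNamespace on the LINQ side.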
Thanks
Lee
-
-
I can't answer either question with any degree of certainty; however, my limited understanding of LINQ to XML is that there is no performance guarantee in switching from XPath to LINQ.
Reviewing various articles and newsgroups, I have seen a lot of unverified claims and rhetoric.
The lack of evidence is silly. Take a look here, for example: the author claims that the LINQ version should 'probably' be faster in 'most cases', but then she provides a ridiculous sample using an XML fragment that is neither well-formed nor rationally structured. Basically, she used an extremely poorly designed XML fragment to illustrate her point. Cherry-picking data to prove a theory isn't proof. If anyone ever came to me with that data structure I would throw them out of my office, seriously.
I'm not in a position to say with any authority which is faster, but I find it hard to imagine that LINQ could outperform a well-optimized XPath query against a reasonably well-structured document.
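Rather than trusting claims either way, both approaches can be timed on your own files. The sketch below (class and method names are made up) does one untimed warm-up run, then averages load-plus-query time over several runs with Stopwatch; loading is included because that is what a cache refresh pays. It is a rough harness, not a rigorous benchmark (no GC control, single process), and it says nothing in advance about which approach will win on your data.

```csharp
using System;
using System.Diagnostics;
using System.Xml;
using System.Xml.Linq;

public static class XmlQueryTimer
{
    // Runs 'action' once untimed (JIT, caches), then 'iterations' more times,
    // and returns the average elapsed milliseconds per timed run.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (iterations <= 0) throw new ArgumentOutOfRangeException("iterations");
        action(); // warm-up, not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / iterations;
    }

    // Times the same logical query both ways against the same XML text.
    public static void Compare(string xml, string xpath, Func<XDocument, int> linqQuery, int iterations)
    {
        double xpathMs = AverageMilliseconds(() =>
        {
            XmlDocument doc = new XmlDocument();
            doc.LoadXml(xml);
            int count = doc.SelectNodes(xpath).Count;
        }, iterations);

        double linqMs = AverageMilliseconds(() =>
        {
            XDocument doc = XDocument.Parse(xml);
            int count = linqQuery(doc);
        }, iterations);

        Console.WriteLine("XmlDocument + XPath: {0:F3} ms/run", xpathMs);
        Console.WriteLine("LINQ to XML:         {0:F3} ms/run", linqMs);
    }
}
```

For example, with System.Linq in scope, XmlQueryTimer.Compare(xmlText, "/NewsML/NewsItem", d => d.Root.Elements("NewsItem").Count(), 50) prints both averages for the same query.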
That's just my 2 cents though, and I really don't have the experience with LINQ to substantiate it. I do, however, have tons of experience with XPath, XSL and XML (over 7 years of hardcore use).
-
Thanks for your reply. I guess it would not be worth forcing a move to 3.5 just for LINQ, so I'll wait until I get the nod to develop on 3.5. If anything, LINQ makes the code more semantic, faster to develop and easier to read, which I am all for.
OK, so my next question: with your experience of XPath, can you give me any tips or best practices I should follow when navigating the XML DOM?
I have basically built an API around something called NewsML, and there are maybe 20-30 files I need to parse on each refresh of the cache. It's a lot of XML to load, and my code takes a long time to get through it. I am just using the SelectNodes and SelectSingleNode methods of the XmlNode class with lots of foreach loops over XmlNodeLists, which doesn't strike me as the most optimized way of navigating XML.
Anyway, your thoughts would be much appreciated.
Cheers
Lee
-
You're using XPath on XmlDocument? I recommend you try XPathDocument instead. It's designed for reading XML via XPath queries and performs better.
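A minimal sketch of the XPathDocument suggestion, assuming the same kind of namespace-free NewsML-like layout (/NewsML/NewsItem/NewsComponent/NewsLines/HeadLine) and a made-up NewsHeadlineReader class. XPathDocument is a read-only store intended for XPath queries, and here the expression is compiled once with XPathExpression.Compile and reused for every file in a refresh instead of being re-parsed on each call.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class NewsHeadlineReader
{
    // Compiled once and reused. The path is an assumed layout; adjust to the real schema.
    // XPathExpression is not documented as thread-safe, so if the refresh runs on
    // several threads, consider giving each thread its own HeadlinePath.Clone().
    private static readonly XPathExpression HeadlinePath =
        XPathExpression.Compile("/NewsML/NewsItem/NewsComponent/NewsLines/HeadLine");

    // Reads the headlines from one document using the read-only XPathDocument store.
    public static List<string> ReadHeadlines(TextReader reader)
    {
        XPathDocument doc = new XPathDocument(reader);
        XPathNavigator nav = doc.CreateNavigator();
        List<string> headlines = new List<string>();
        XPathNodeIterator it = nav.Select(HeadlinePath);
        while (it.MoveNext())
        {
            headlines.Add(it.Current.Value);
        }
        return headlines;
    }

    // Loads every file in 'paths' (e.g. the files parsed on one cache refresh).
    public static List<string> ReadAll(IEnumerable<string> paths)
    {
        List<string> all = new List<string>();
        foreach (string path in paths)
        {
            using (StreamReader reader = File.OpenText(path))
            {
                all.AddRange(ReadHeadlines(reader));
            }
        }
        return all;
    }
}
```

Because XPathDocument is read-only, it suits the load, query, discard pattern of a cache refresh; if the documents need editing, XmlDocument is still the appropriate type.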
-
For help with XPath performance, try to avoid using the XPath recursive descent operator '//'. It is one of the most common mistakes that I see when reviewing under-performing XPath. For some reason, everyone just feels that they *have to* use it even when they don't need it - doh!
Also, be as node-specific as you can in the XPath. In other words, if the node you are looking for can only ever be 5 levels deep, then your XPath query should be written so that it only looks at the 5th level for the node instead of at every level above or below it.
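The two tips above can be sketched against XmlDocument, since that is what the original code uses; the class name and the NewsML-like path are assumptions. '//HeadLine' expands to /descendant-or-self::node()/child::HeadLine, so the whole tree is examined, while the explicit path only looks at the children along that route. Inside a foreach, the query is made relative to the current node so each call searches only that item's subtree instead of starting again from the root.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class XPathSpecificity
{
    // Slow pattern: '//' makes the engine examine every node in the document.
    public static int CountWithDescendantScan(XmlDocument doc)
    {
        return doc.SelectNodes("//HeadLine").Count;
    }

    // Faster pattern: spell out the (assumed) path so only children along it are examined.
    public static int CountWithExplicitPath(XmlDocument doc)
    {
        return doc.SelectNodes("/NewsML/NewsItem/NewsComponent/NewsLines/HeadLine").Count;
    }

    // Inside a loop, query relative to the current node rather than from the root.
    public static List<string> HeadlinePerItem(XmlDocument doc)
    {
        List<string> result = new List<string>();
        foreach (XmlNode item in doc.SelectNodes("/NewsML/NewsItem"))
        {
            // Relative path: only this NewsItem's subtree is searched.
            XmlNode headline = item.SelectSingleNode("NewsComponent/NewsLines/HeadLine");
            result.Add(headline == null ? "" : headline.InnerText);
        }
        return result;
    }
}
```

The two count methods agree only when HeadLine never appears anywhere else in the document; if it does, '//' silently picks up the extra matches, which is another reason to be specific.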
HTH -
Hi, I'm looking for an implementation of any kind of NewsML reader (or parser). Do you have one? If you can help me with some info or by sharing your API, I'd be grateful. Thanks in advance.
Guillermo G.
-
Necro? I just have one comment, which is that XLinq works quite nicely with XPath. You can do a.XPathSelectElement("b/c/d/e") and get back an XElement, whereas with plain XLinq you'd have to do a.Element("b").Element("c").Element("d").Element("e") (I don't think XElement.Element contains any magic there?)
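A small sketch of the two forms from the post (the class name is made up). XPathSelectElement is an extension method defined in the System.Xml.XPath.Extensions class, so it needs using System.Xml.XPath. XElement.Element has no magic: it returns the first child with that name, or null. That leads to one behavioural difference: the XPath form considers every b/c/d/e path and returns the first match in document order, while the chain only follows the first b, then the first c under it, and so on.

```csharp
using System.Xml.Linq;
using System.Xml.XPath; // brings the XPathSelectElement extension method into scope

public static class XPathVsElementChain
{
    // Returns the first element matching b/c/d/e in document order, or null.
    public static XElement ViaXPath(XElement a)
    {
        return a.XPathSelectElement("b/c/d/e");
    }

    // Follows only the first child of each name; ?. returns null at a missing level.
    public static XElement ViaElementChain(XElement a)
    {
        return a.Element("b")?.Element("c")?.Element("d")?.Element("e");
    }
}
```

The ?. operators (C# 6 and later) stop the chain at a missing level; written without them, as in the post, the chain throws a NullReferenceException instead.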
-
In my opinion, XML-related programs and class libraries should use the System.Xml.Linq.dll library. It has methods that simplify your work. Do not use System.Xml directly; that will complicate things. For querying, use LINQ to XML or, if the language you use does not support LINQ, the XPathSelectElement extension method (defined in the System.Xml.XPath.Extensions class), which takes an XPath query string and returns an XElement. If you want to create a data type in XML, look at this project I made and extend it to your liking: http://borgdylan.web.officelive.com/LinqXmlDatadll.aspx