Okay, I have a particulary ugly block of xml that I need to parse out and stick into sql server.
The challenge is getting this thing into the correct format since it has both element and attribute
peppered in it. I was just going to use a dataset and read the xml into it. But there are some problems
with that because the repeating <geo_co> tag in mail and residence address blocks gum up the works.
Anyone point me in the right direction? I have started to sort it out, but the voting regions return a
very strange row, this one has me baffled, I spent the better part of a day futzing with it. Any help would
------------[ Cut Here ]--------------------
DECLARE @IDOC INT
DECLARE @DOC VARCHAR(4000)
I was afraid that this would stump everyone. Here is the situation in case anyone replies with an overly simplistic answer. Unfortunately, this XML is being genarated by a state voting system so I have no way of asking them to modify it, what you see if
what I am getting and what I am expected to interface to. At this point I am thinking that I am going to have to parse this mess out line by line without being able to use an XML document to parse it. If anyone has any brilliant advice I'd love to hear it.
But, so far DataSet.ReadXML() is failing as well as OPENXML. So, I am faced with quiet the conundrum. What have otherse in this situation done? You are faced with a butchered XML file that the provider will not or cannot modify, that is close enough to real
XML to be flustering but not valid enough to be read as valid XML. ideas people? Or is this one just too tough for everyone?
Erm, who is Rumpi Gravenstein and how do they feel about you posting this information to this forum?
Rumpi is a test name, all the data is for testing alone and is a publicly published standard for the state that is requiring this interface.
I'm not sure if I got that right, but here ,latitude varchar(20) '/boe_vr_init/mail_address/geo_co'
,longitude varchar(20) '/boe_vr_init/mail_address/geo_co'
you're actually selecting the XML node geo_co instead of the latitude and longitude?
So, setting the correct XPath to
respectively might help.
On the other hand, I don't know * about SQL Server so, this post may be a troll. Anyway, just thinking...
Update: Oops, my bad. Forget this post.
I can't help much with the XML, but it's interesting to see test data that isn't all fake. The addresses listed in Cuyahoga Falls is legitimate. I know because I've been there (not to the suite, but to the office building).
Indeed! The address exists, that is interesting. The specification is public knowledge so I doubt that the publisher of it would put sensitive data into it. Just a short background. This is a voter registration system that is being handled by a mid western
state. They require that the counties within that state interface to their main voter registration system. The rub is that the state's system is Oracle (YUK!) and the XML that they give you is butt ugly. The communication layer between the state and the counties
is all done via XML packets. The state sends a county a packet and you respond back with a packet. What is ugly about it is that withing the spec the Oracle produced XML does not conform to the common XML specs. For instance, if they send more than 1 voter
in an XML file it will actually look like this!!!!! YES, I am not lying! The actually will place a new <?xml> tag at the top of each new row in the file instead of wrapping it in a root tag. For the life of me I cannot figure out why they (or Oracle) would
do this, it boggles the mind.
-----------[ XML voter registration packet ]--------
<?xml version="1.0" encoding="UTF-8"?>
<?xml version="1.0" encoding="UTF-8"?>
------------[ end of line ]----------
It sounds pretty nasty. i'm not sure if any of our xml stack will cope with the multiple <?xml tags.
Can you process it in the middle tier maybe?
Use string / regex stuff to pull out the fragments and then process them into some sql friendly strucure ( generate a batch of inserts maybe? ) using an xmlreader over the fragments?
It could get kinda chunky depending upon the filesize, but you could store the data as a string and
Or string.replace("<?xml version=""1.0"" encoding=""UTF-8""?>" & vbCrLf, "")
That is, if you know it is coming from a windows machine. Otherwise, you would need to find out how they do their CRs or just ommit the & vbCrLf since when you read in the xml it doesnt care about white space.
My experience with XSLT is that if you can understand it and it is at least Formatted correctly (tag hierarchy), you can write a declaritive StyleSheet to transform it into just about anything you want. Depending on the complexity, this might not be better
than a procedural analysis of the XML.
Of course providing multiple voters as multiple XML blocks is totally stupid, and requires that you write at least SOME procedural code to break it up.
I would suggest creating a format which is your objective (ie. one that will work), and then it should be pretty clear what kind of transformation needs to take place. If it is not complicated, then it should be pretty easy to write an XSLT to take care of
it, especially if you just need to change a couple of tag names.
Regardless, it sounds like you won't be able to get away with directly importing it into SQL Server.
ps. I didn't actually have time to examine the structure of the XML. If you'd like me to make a more specific suggestion, let me know that you are interested in this kind of solution and what kind of changes need to take place.
That's odd. I don't see anything wrong with the XML you posted. Sure 'geo_co' is used twice, but that should be fine. It means the same thing in both cases, simply with a different context. I have little experience dealing with XML in SQL Server, but
I'd have to say that if it can't parse this it is SQL Server's problem...
Actually, I take that back...I could see a problem with importing this directly in that this XML block is really several table joined together and then seperated structurally in the XML. Still...SQL Server should be able to parse this!
Forgive me if this is a dumb idea, but does the state have any published schema for this document -- can you just get the XSD and use that to generate appropriate classes to read the data?
Alternatively, you can try using the XSD.exe tool in the .NET framework to have it guess at a schema (the exact parameter escapes me ATM, but it's there).
Since it looks like you'll need to do some middle-tier processing anyway (to break-up the multiple documents in the stream), can you load up each document into an XPathDocument and parse out the needed info? You could then use your favorite ADO.Net technique
to load the data...
Comments have been closed since this content was published more than 30 days ago, but if you'd like to continue the conversation, please create a new thread in our Forums, or Contact Us and let us know.