Tech Off Thread

9 posts

Grab data from this HTML page

  • henryed07

    Hi, I have this website. It's written in HTML, and there are no XML pages or anything similar to XML. I have been told that I should use the string.IndexOf method and then extract the data with string.Substring.

    I am new to coding and was wondering if someone could help me create the app so that I can sell it.

     

  • W3bbo

    Don't. String parsing is flaky and brittle, and it will fail over time.

    Use HtmlAgilityPack instead. It provides a DOM-like API for working with HTML elements and can reliably parse broken HTML pages.
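
    To make the brittleness concrete, here is a minimal sketch in Python (not the .NET code this thread is about). `str.find` and slicing stand in for string.IndexOf and string.Substring; the helper name `extract_between` and the altered markup are made up for illustration.

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between two markers, or None if either is missing."""
    start = text.find(start_marker)          # rough analogue of string.IndexOf
    if start == -1:
        return None
    start += len(start_marker)
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end]                   # rough analogue of string.Substring


original = '<option value="182/1">1</option>'
# The same option after a harmless markup change (assumed for illustration):
# single quotes and an extra attribute.
changed = "<option class='r' value='182/1'>1</option>"

marker = '<option value="'
value_original = extract_between(original, marker, '"')
value_changed = extract_between(changed, marker, '"')

print(value_original)  # 182/1
print(value_changed)   # None
```

    Both strings mean the same thing to a browser, but the marker search only survives the first one, which is the kind of failure a real HTML parser avoids.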

  • spivonious

    Correct me if I'm wrong, but can't XML parsers parse HTML? They're both markup languages, after all.

  • PerfectPhase

    @spivonious: True XHTML can be parsed by an XML parser, but HTML can't be parsed reliably that way because it does not require all tags to be closed, <br> for example. As W3bbo said, HtmlAgilityPack is your friend in these cases.
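
    A quick way to see the difference, sketched in Python with the standard library's xml.etree.ElementTree standing in for any strict XML parser. The helper `parses_as_xml` and the two sample strings are illustrative, not taken from the site in question.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


xhtml_fragment = "<p>line one<br/>line two</p>"  # <br/> is self-closed
html_fragment = "<p>line one<br>line two</p>"    # <br> is never closed

print(parses_as_xml(xhtml_fragment))  # True
print(parses_as_xml(html_fragment))   # False
```

    The second fragment is perfectly ordinary HTML, yet the XML parser rejects it because the unclosed <br> leaves the </p> mismatched.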

  • exoteric

    @PerfectPhase: Does HtmlAgilityPack provide a method for this?

    "HtmlAgilityDocument" -> XDocument

    I see no reason to have to work outside of XML, even if the source document isn't strictly XML.

  • PerfectPhase

    @exoteric: I think it has a SaveAsXml function whose output you could then parse with XDocument, but I have not used it myself. HAP is more XPath-based in its query system.

  • W3bbo

    exoteric wrote:

    @PerfectPhase: Does HtmlAgilityPack provide a method for this?

    "HtmlAgilityDocument" -> XDocument

    I see no reason to have to work outside of XML, even if the source document isn't strictly XML.

    In principle, no; but HTML has some special nuances of its own that make it difficult to model as XML. If you've got a pure, validating (non-XHTML) document then you can get away with it, but so many web pages contain broken HTML that needs even more special and arcane processing rules, and those rules aren't available in any XML-processing library.

  • henryed07

    Thank you for the information.

    Here is the code:

    <select name="routeId" size="1" class="selection" onchange="OnLinieChanged()">
        <option value="-1"> </option>
        <option value="182/1">1</option>
        <option value="182/1A">1A</option>
        <option value="182/2">2</option>
        <option value="182/2A">2A</option>

    How would one get this information from the code to the phone? I would put all the routes in a drop-down list, and when the user selects a route, the stops in a second drop-down list would change.
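
    One possible way to pull the routes out of that snippet, sketched in Python with the standard library's html.parser rather than HtmlAgilityPack. The class name `RouteOptionParser` is hypothetical; the markup is the snippet from the post, left unclosed exactly as posted, which a tolerant HTML parser copes with.

```python
from html.parser import HTMLParser


class RouteOptionParser(HTMLParser):
    """Collect (value, label) pairs from the <option>s of one named <select>."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.routes = []
        self._in_select = False
        self._current_value = None
        self._label_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self._in_select = True
        elif tag == "option" and self._in_select:
            self._current_value = attrs.get("value")
            self._label_parts = []

    def handle_data(self, data):
        # Only keep text that sits inside an <option> we are tracking.
        if self._current_value is not None:
            self._label_parts.append(data)

    def handle_endtag(self, tag):
        if tag == "option" and self._current_value is not None:
            label = "".join(self._label_parts).strip()
            self.routes.append((self._current_value, label))
            self._current_value = None
        elif tag == "select":
            self._in_select = False


PAGE = """
<select name="routeId" size="1" class="selection" onchange="OnLinieChanged()">
    <option value="-1"> </option>
    <option value="182/1">1</option>
    <option value="182/1A">1A</option>
    <option value="182/2">2</option>
    <option value="182/2A">2A</option>
"""

parser = RouteOptionParser("routeId")
parser.feed(PAGE)
parser.close()

# Drop the blank placeholder entry (value "-1", empty label).
routes = [(value, label) for value, label in parser.routes if label]
print(routes)
# [('182/1', '1'), ('182/1A', '1A'), ('182/2', '2'), ('182/2A', '2A')]
```

    The labels would fill the first drop-down list, and the value of the selected route (182/1, for example) would presumably be what the site expects when asked for that route's stops. The thread doesn't show the stops page, so that step is left out here.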

  • teslaBytes

    Yeah, just parse the HTML as raw input from a socket, get the info you need, and create a web service to query the pattern.
