Tech Off Thread


Retrieving all valid URLs/URIs from a string

    SlackmasterK

    So, I'm trying to add trackbacks to my blog, whose code I'm writing myself (with your help). As I understand it, I need to pull all URLs/URIs out of the body of a post as I publish it, fetch each of those pages, and look for a trackback:ping attribute inside an rdf:RDF block in the response. Correct me if I'm wrong. I got the TrackBack technical specification from: http://www.sixapart.com/pronet/docs/trackback_spec
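
    A minimal C# sketch of the last step described above: finding the trackback:ping value inside an rdf:RDF block of a page you have already downloaded. It assumes the ping URL appears as a double-quoted trackback:ping="..." attribute, and the names TrackbackDiscovery and FindPingUrl are made up for illustration. A page listing several entries may carry several such blocks; this sketch simply returns the first ping URL it finds.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TrackbackDiscovery
{
    // Matches each <rdf:RDF ...> ... </rdf:RDF> block, across line breaks.
    static readonly Regex RdfBlock = new Regex(@"<rdf:RDF\b.*?</rdf:RDF>",
        RegexOptions.Singleline | RegexOptions.IgnoreCase);

    // Captures the double-quoted value of a trackback:ping attribute.
    static readonly Regex PingAttr = new Regex(@"trackback:ping\s*=\s*""(?<ping>[^""]+)""",
        RegexOptions.IgnoreCase);

    // Returns the first ping URL found inside an RDF block, or null if none.
    public static string FindPingUrl(string html)
    {
        foreach (Match block in RdfBlock.Matches(html))
        {
            Match ping = PingAttr.Match(block.Value);
            if (ping.Success)
                return ping.Groups["ping"].Value;
        }
        return null;
    }
}
```

    Restricting the search to RDF blocks avoids picking up the string "trackback:ping=" if it happens to appear in ordinary page text.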

    I can easily get a string containing the body of an entry as I post it. How can I retrieve all the URIs/URLs in that string without resorting to Substring() and IndexOf() calls for every possible extension (/, .php, .aspx, etc.)?

    TommyCarlier

    I think a regular expression is definitely the way to go for this problem. You define a regular expression that matches URIs, and then use the Matches method to get all the matching URIs.
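
    A rough sketch of that approach in C#, assuming a deliberately loose pattern (http:// or https:// followed by anything that isn't whitespace, a quote or an angle bracket) is good enough for blog post bodies; UrlExtractor and ExtractUrls are hypothetical names. Because the pattern doesn't care how a URL ends, there's no need to enumerate extensions like .php or .aspx.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class UrlExtractor
{
    // Loose pattern: scheme, then everything up to whitespace, a quote or < >.
    // It finds candidate URLs; it does not validate them.
    static readonly Regex UrlPattern = new Regex(@"https?://[^\s""'<>]+",
        RegexOptions.IgnoreCase);

    public static List<string> ExtractUrls(string text)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(text))
        {
            // Trim punctuation that usually ends a sentence rather than the URL
            string url = m.Value.TrimEnd('.', ',', ';', ':', '!', '?', ')');
            if (!urls.Contains(url))
                urls.Add(url); // keep first-seen order, skip duplicates
        }
        return urls;
    }
}
```

    Calling UrlExtractor.ExtractUrls(entryBody) on the post body would give a de-duplicated list of candidate links to fetch for trackback discovery. A stricter pattern, or System.Uri.TryCreate on each match, could weed out malformed ones.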

    YuviPanda

    I think this regexp is what you need:
    <a href="(?<url>http://.*?)"[^>]*>(?<text>.+?)<\/a>
    That's a .NET regexp, though, so you may need to adapt it to your blogging platform's language.
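
    The same idea from C#, as a sketch: named groups url and text, read back through Match.Groups. This version (the LinkExtractor class is a hypothetical name) uses [^>]* to stop at the end of the opening tag so that several links on one line produce separate matches, accepts https:// as well as http://, and tolerates other attributes before href. Like any regex over HTML, it will miss unusual markup such as single-quoted or unquoted attribute values.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Opening <a ...> tag with a double-quoted http(s) href, then the link text.
    // [^>]* never runs past the end of the opening tag.
    static readonly Regex AnchorPattern = new Regex(
        @"<a\s[^>]*?href\s*=\s*""(?<url>https?://[^""]+)""[^>]*>(?<text>.+?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns (url, text) pairs in document order.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var links = new List<KeyValuePair<string, string>>();
        foreach (Match m in AnchorPattern.Matches(html))
            links.Add(new KeyValuePair<string, string>(
                m.Groups["url"].Value, m.Groups["text"].Value));
        return links;
    }
}
```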

    SlackmasterK

    Got that working. Now I'm looking into trackbacks; they're very popular in the blogosphere, but apparently difficult to implement when you write your own blogging platform. I've researched the spec and gotten it to the point where, by all logic, it should work. However, my trackback pings keep returning an error saying my referring URL is missing, even though it's there. The code is below; if anyone can tell me wtf I'm doing wrong, I'd appreciate it.

    string content =
    "POST http://admin.damnednice.com?Req=Trackback&ID=" + Entries.Rows[0]["ID"].ToString() + "\r\n" +
    "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n" +
    "title=" + Entries.Rows[0]["Title"] + "\r\n" +
    "excerpt=" + Entries.Rows[0]["Description"] + "\r\n" +
    "url=\"http://admin.damnednice.com?Req=Post&ID=" + Entries.Rows[0]["ID"].ToString() + "\"\r\n" +
    "blog-name=DamnedNice\r\n" +
    "added = " + DateTime.Now.ToShortDateString() + " " + DateTime.Now.ToShortTimeString();


    Resp = Lib.readHtmlPage(URI,content);
    if (Resp != null && Resp.Contains("trackback:ping=")) {
    i1 = Resp.IndexOf("trackback:ping=") + 16;
    i2 = Resp.IndexOf("\"", i1);
    if (i1 > 0 && i2 > 0) tburl = Resp.Substring(i1, (i2 - i1));
    if (i1 > 0 && i2 > i1)
    Resp = Lib.readHtmlPage(tburl, content);

    // ...

    public string readHtmlPage(string url, string e) {
        // ContentLength must be a byte count; e.Length counts characters,
        // which differs from the UTF-8 byte count for non-ASCII text
        byte[] body = System.Text.Encoding.UTF8.GetBytes(e);
        System.Net.HttpWebRequest objRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);

        objRequest.Method = "POST";
        objRequest.ContentLength = body.Length;
        objRequest.ContentType = "application/x-www-form-urlencoded";
        objRequest.KeepAlive = false;

        // Write the request body; the using block closes the stream exactly once
        try {
            using (System.IO.Stream reqStream = objRequest.GetRequestStream())
                reqStream.Write(body, 0, body.Length);
        }
        catch (Exception ex) { return ex.ToString(); }

        // Read the whole response body as a string
        using (System.Net.HttpWebResponse objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(objResponse.GetResponseStream())) {
            return sr.ReadToEnd();
        }
    }
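
    For comparison: as I read the TrackBack spec linked in the first post, a ping is an ordinary form post. The method, target URL and Content-Type belong on the request object, and the body is just title, excerpt, url and blog_name as name=value pairs joined by &, each value URL-encoded. The content string above instead puts a request line and a header into the body, separates the fields with \r\n, quotes the url value and writes blog-name with a hyphen, so a server parsing it as a form very likely never sees a url field. The permalink itself also contains &ID=, which splits the pair unless it is encoded. A sketch under those assumptions (TrackbackPing, BuildBody and GetError are hypothetical names), including a reader for the XML response the spec describes (<error>0</error> on success, <error>1</error> plus a <message> on failure):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TrackbackPing
{
    // Builds an application/x-www-form-urlencoded body for a TrackBack ping.
    // Values are percent-encoded, so '&' and '=' inside a permalink such as
    // "?Req=Post&ID=5" stay inside the url value instead of splitting it.
    public static string BuildBody(string title, string excerpt, string url, string blogName)
    {
        // The spec treats url as the one required field
        if (string.IsNullOrEmpty(url))
            throw new ArgumentException("url is required for a TrackBack ping", "url");

        var fields = new[]
        {
            new KeyValuePair<string, string>("title", title),
            new KeyValuePair<string, string>("excerpt", excerpt),
            new KeyValuePair<string, string>("url", url),
            new KeyValuePair<string, string>("blog_name", blogName) // underscore, not hyphen
        };

        // Optional fields left null are omitted; pairs are joined with '&', not "\r\n"
        return string.Join("&", fields
            .Where(f => f.Value != null)
            .Select(f => f.Key + "=" + Uri.EscapeDataString(f.Value)));
    }

    // Reads the XML response: <error>0</error> means success (returns null);
    // otherwise returns the <message> text, or a fallback if there is none.
    public static string GetError(string responseXml)
    {
        XElement root = XDocument.Parse(responseXml.Trim()).Root;
        string code = (string)root.Element("error");
        if (code != null && code.Trim() == "0")
            return null;
        return (string)root.Element("message") ?? "TrackBack server reported an error";
    }
}
```

    Something like Resp = Lib.readHtmlPage(tburl, TrackbackPing.BuildBody(title, excerpt, permalink, "DamnedNice")); followed by TrackbackPing.GetError(Resp) would send the ping and surface the server's message. Fetching the linked page for discovery, by contrast, should only need a plain GET (for example new System.Net.WebClient().DownloadString(URI)), not a POST of the ping body.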
