Retrieving all valid URLs/URIs from a string

  • User profile image

    So, I'm trying to add trackbacks to my blog, which runs on code I'm writing myself (with your help). As I understand it, when I publish a post I need to pull all URLs/URIs out of its body, fetch each of those pages, and look for a trackback:ping attribute inside an rdf:RDF element in the response. Correct me if I'm wrong. I retrieved the tech specs to the Trackback from:

    I can easily get a string representing the body of an entry as I post it. How can I retrieve all URIs/URLs within that string without having to mess with substrings and calling IndexOf() for each possible extension (/, .php, .aspx, etc.)?
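The discovery step described above (finding trackback:ping inside an rdf:RDF block on a fetched page) could be sketched roughly as follows. This is only an illustration: `TrackbackDiscovery` and `FindPingUrl` are made-up names, and it assumes the page embeds the RDF block as literal text, which a regular expression can find without a full XML parse.

```csharp
using System.Text.RegularExpressions;

public static class TrackbackDiscovery
{
    // Looks for the first trackback:ping="..." attribute that appears
    // between an opening <rdf:RDF and its closing </rdf:RDF> tag.
    private static readonly Regex PingPattern = new Regex(
        "<rdf:RDF[\\s\\S]*?trackback:ping=\"(?<ping>[^\"]+)\"[\\s\\S]*?</rdf:RDF>",
        RegexOptions.IgnoreCase);

    // Returns the ping URL, or null when the page advertises none.
    public static string FindPingUrl(string pageHtml)
    {
        if (string.IsNullOrEmpty(pageHtml)) return null;
        Match m = PingPattern.Match(pageHtml);
        return m.Success ? m.Groups["ping"].Value : null;
    }
}
```

A page that has no RDF block simply yields null, so the caller can skip it without any IndexOf bookkeeping.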

  • User profile image

    I think a regular expression is definitely the way to go for this problem. You define a regular expression that matches URIs, and then use the Matches method to get all the matching URIs.

  • User profile image

    I think this regexp is what you need:
    <a\s[^>]*?href="(?<url>https?://[^"]+)"[^>]*>(?<text>.+?)<\/a>
    That's a .NET regexp though, so you may need to modify it to work in your blogging platform's language.
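Putting the two replies together, a minimal sketch of the Matches approach might look like this. `LinkExtractor` and `ExtractUrls` are hypothetical names, and the pattern only picks up absolute http/https links inside double-quoted href attributes, so it is a starting point rather than a complete URL parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LinkExtractor
{
    // Captures the href value of anchor tags; [^"]+ stops at the closing
    // quote, so several links on one line are matched separately.
    private static readonly Regex HrefPattern = new Regex(
        "<a\\s[^>]*?href=\"(?<url>https?://[^\"]+)\"",
        RegexOptions.IgnoreCase);

    // Returns each distinct, well-formed absolute URL in document order.
    public static List<string> ExtractUrls(string html)
    {
        var urls = new List<string>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            string candidate = m.Groups["url"].Value;
            if (Uri.IsWellFormedUriString(candidate, UriKind.Absolute)
                && !urls.Contains(candidate))
            {
                urls.Add(candidate);
            }
        }
        return urls;
    }
}
```

Each returned URL can then be fetched and checked for a trackback:ping attribute.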

  • User profile image

    Got that working... Now, I'm looking into Trackbacks; they're very popular in the blogosphere, but apparently difficult to implement when one writes his own blogging platform. I've researched the tech specs and I've gotten it to the point where by all logic, it should work. However, my trackback sends keep returning an error saying I'm missing my referring URL despite it being there. The code for such is as follows; if anyone can tell me wtf I'm doing wrong I'd appreciate it.

    string content =
    "POST" + Entries.Rows[0]["ID"].ToString() + "\r\n" +
    "Content-Type: application/x-www-form-urlencoded; charset=utf-8\r\n" +
    "title=" + Entries.Rows[0]["Title"] + "\r\n" +
    "excerpt=" + Entries.Rows[0]["Description"] + "\r\n" +
    "url=\"" + Entries.Rows[0]["ID"].ToString() + "\"\r\n" +
    "blog-name=DamnedNice\r\n" +
    "added = " + DateTime.Now.ToShortDateString() + " " + DateTime.Now.ToShortTimeString();

    Resp = Lib.readHtmlPage(URI,content);
    if (Resp != null && Resp.Contains("trackback:ping=")) {
        // +16 skips the 15-character attribute name plus the opening quote
        i1 = Resp.IndexOf("trackback:ping=") + 16;
        i2 = Resp.IndexOf("\"", i1);
        if (i2 > i1) {
            tburl = Resp.Substring(i1, i2 - i1);
            Resp = Lib.readHtmlPage(tburl, content);
        }
    }

    // ...

    // POSTs the form-encoded body to url and returns the response text.
    public string readHtmlPage(string url, string body) {
        // ContentLength is measured in bytes, so encode first; string.Length counts characters.
        byte[] payload = System.Text.Encoding.UTF8.GetBytes(body);
        System.Net.HttpWebRequest objRequest = (System.Net.HttpWebRequest)System.Net.WebRequest.Create(url);

        objRequest.Method = "POST";
        objRequest.ContentType = "application/x-www-form-urlencoded; charset=utf-8";
        objRequest.ContentLength = payload.Length;
        objRequest.KeepAlive = false;

        try {
            using (System.IO.Stream requestStream = objRequest.GetRequestStream())
                requestStream.Write(payload, 0, payload.Length);
        }
        catch (Exception ex) { return ex.ToString(); }

        // Dispose the response and reader once the body has been read.
        using (System.Net.HttpWebResponse objResponse = (System.Net.HttpWebResponse)objRequest.GetResponse())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(objResponse.GetResponseStream()))
            return sr.ReadToEnd();
    }
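One possible cause of the "missing URL" error, going by the TrackBack specification: a ping is an ordinary form-encoded POST whose body consists of name=value pairs joined with `&`, using the field names title, excerpt, url and blog_name (underscore, not hyphen). The content string above instead puts request-line and header text into the body, separates fields with CRLF, and sends the entry ID in quotes as the url, so the receiver may never see a url field it can parse. A minimal sketch of building such a body follows; `TrackbackPing`, `BuildBody` and the example permalink are illustrative assumptions, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TrackbackPing
{
    // Builds an application/x-www-form-urlencoded body for a TrackBack ping.
    // permalink should be the full public URL of the entry, not its ID.
    public static string BuildBody(string title, string excerpt,
                                   string permalink, string blogName)
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("title", title),
            new KeyValuePair<string, string>("excerpt", excerpt),
            new KeyValuePair<string, string>("url", permalink),
            new KeyValuePair<string, string>("blog_name", blogName)
        };

        var body = new StringBuilder();
        foreach (var field in fields)
        {
            if (body.Length > 0) body.Append('&');
            // Percent-encode each value so spaces, '&' and '=' survive transport.
            body.Append(field.Key).Append('=').Append(Uri.EscapeDataString(field.Value));
        }
        return body.ToString();
    }
}
```

For example, `TrackbackPing.BuildBody("My Post", "Hi & bye", "http://example.com/post/42", "DamnedNice")` produces `title=My%20Post&excerpt=Hi%20%26%20bye&url=http%3A%2F%2Fexample.com%2Fpost%2F42&blog_name=DamnedNice`, which can be passed as the second argument to readHtmlPage when posting to the discovered trackback URL. The page that is scanned for trackback:ping would normally be fetched with a plain GET rather than by posting the ping body to it.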