Tech Off Thread

Regex Question

  • Animaonline

    How do I extract:
    77u/Q29uc29sZS5Xcml0ZSgiVGVsbCBtZSB5b3VyIG5hbWUgc2lyOiAiKTsNCiBzdHJpbmcgbmFt
    ZSA9IENvbnNvbGUuUmVhZExpbmUoKTsNCiAJaWYgKG5hbWUuVG9Mb3dlcigpID09ICJyb21hbiIp
    DQogCXsgCQlDb25zb2xlLldyaXRlTGluZSgiV2VsY29tZSBNYXN0ZXIhIik7DQogCX0NCgllbHNl
    DQoJew0KCQlDb25zb2xlLldyaXRlTGluZSgiSW50cnVkZXIhIik7DQoJfQ0KCQ0KIENvbnNvbGUu
    V3JpdGVMaW5lKCJDb21wbGV0ZS4uLiIpOw0K



    From:

    Content-Disposition: attachment; filename="ACSS Script.acss"

    77u/Q29uc29sZS5Xcml0ZSgiVGVsbCBtZSB5b3VyIG5hbWUgc2lyOiAiKTsNCiBzdHJpbmcgbmFt
    ZSA9IENvbnNvbGUuUmVhZExpbmUoKTsNCiAJaWYgKG5hbWUuVG9Mb3dlcigpID09ICJyb21hbiIp
    DQogCXsgCQlDb25zb2xlLldyaXRlTGluZSgiV2VsY29tZSBNYXN0ZXIhIik7DQogCX0NCgllbHNl
    DQoJew0KCQlDb25zb2xlLldyaXRlTGluZSgiSW50cnVkZXIhIik7DQoJfQ0KCQ0KIENvbnNvbGUu
    V3JpdGVMaW5lKCJDb21wbGV0ZS4uLiIpOw0K
    ------=_Part_147_27014827.1200381734762


    Using Regex.


    Thanks
    - Roman

  • stevo_

    I don't think regex is the answer for this. Given a multipart post, you want to find the boundaries and split the content on them.

    You could search for "parsing multipart form data" and probably find some answers. It's been ages since I wrote a multipart parser myself, so I wouldn't be much help.

  • rcardona

    Regex is most often used to match patterns within a single line. You can make it work across multiple lines, but as stevo_ says, it's probably not the solution for this problem. You need to write a multi-line processor: skip the headers (they end at the first blank line), then capture the encoded text line by line until you reach a line containing the marker, i.e. IndexOf('---marker----') <> -1. Capture that line from its beginning up to the marker and you're done.
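
The line-by-line approach described above could look roughly like the sketch below. It is written in Python purely for illustration (the thread itself appears to be about .NET), and the function name `extract_body` and the default marker of six dashes are assumptions, not something from the original post. It also assumes the input starts with the part's headers and that the real message lines carry no forum indentation.

```python
def extract_body(part_text, marker="------"):
    """Return the encoded text between the header block and the marker line."""
    body = []
    in_body = False
    for line in part_text.splitlines():
        stripped = line.strip()
        if not in_body:
            # Headers end at the first blank line.
            if stripped == "":
                in_body = True
            continue
        idx = line.find(marker)  # same idea as IndexOf(...) <> -1
        if idx != -1:
            # Keep whatever precedes the marker on this line, then stop.
            head = line[:idx].strip()
            if head:
                body.append(head)
            break
        if stripped:
            body.append(stripped)
    return "".join(body)
```

The result is still base64 text, so it would need to be passed through a base64 decoder (for example `base64.b64decode` in Python) to recover the attachment bytes.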

  • Matthew van Eerde

    base64 uses a-z, A-Z, 0-9, +, and /, with = as padding at the end and occasional newlines.

    So \n\n([a-zA-Z0-9+/=\n]+) should do the trick. If the message uses CRLF line endings, allow \r as well.
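
As a rough illustration of the regex idea, here is a minimal Python sketch. The name `extract_with_regex` is hypothetical, and the pattern is a slight variation on the one above: it also accepts `=` padding and `\r`, on the assumption that real MIME messages may use CRLF line endings. It further assumes the first blank line in the input is the one that ends the headers.

```python
import base64
import re

# Blank line, then a run of base64 characters and line breaks.
BODY_RE = re.compile(r"\r?\n\r?\n([A-Za-z0-9+/=\r\n]+)")

def extract_with_regex(part_text):
    """Return the decoded bytes of the first base64 run after a blank line."""
    match = BODY_RE.search(part_text)
    if match is None:
        return None
    # Drop the line breaks before decoding.
    encoded = "".join(match.group(1).split())
    return base64.b64decode(encoded)
```

The character class stops at the first `-` of the boundary line, which is what ends the match here. If an earlier blank line is followed by ordinary text, the pattern would happily capture letters from that text instead, so this is fragile on anything but a single isolated part.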

  • DoomBringer

    Congrats, you're using regex. Now you have two problems. It looks like you're parsing MIME; why are you doing that by hand?

  • stevo_

    By hand? :P

    Could well be he doesn't have access to a 'ready made' parser.

  • DoomBringer

    That could be true, but there are so many libraries out there for this sort of thing. Of course, I admit to writing my own MIME parser, but that was because I had to inspect the raw MIME before any library did anything to it.
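
To illustrate the "use a library" route, here is a minimal sketch using Python's standard `email` package (chosen only for illustration; a .NET project would need its own MIME library). The helper name `attachments` is hypothetical. It assumes the complete message is available, including the top-level `Content-Type: multipart/...; boundary=...` header and a `Content-Transfer-Encoding: base64` header on the part, neither of which is shown in the snippet from the original question.

```python
from email import message_from_string

def attachments(raw_message):
    """Map each attachment's filename to its decoded bytes."""
    msg = message_from_string(raw_message)
    found = {}
    for part in msg.walk():
        if part.get_content_disposition() == "attachment":
            # decode=True applies the part's Content-Transfer-Encoding.
            found[part.get_filename()] = part.get_payload(decode=True)
    return found
```

With a parser like this, boundary handling, header folding, and base64 decoding are all left to the library rather than to hand-written matching.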

  • odujosh

    If these parts are always the same:
    Content-Disposition:
    <blank line(s)>
    stuff you care about
    ------ (six dashes)


    Search for the line where StartsWith("Content-Disposition") is true and ignore that line.
    Ignore blank lines by calling Trim and then testing the length.
    Collect the lines that follow until you see the six dashes.

    I.e., you don't really need regular expressions. A streaming approach also keeps you from having to read the whole thing into memory, so it can be more performant.
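
The streaming steps above can be sketched as a small generator. This is Python for illustration only, the name `iter_body_lines` is hypothetical, and it relies on the same assumption as the steps themselves: the Content-Disposition line is the last header before the body, and the body ends at a line starting with six dashes.

```python
def iter_body_lines(lines):
    """Yield body lines one at a time from any iterable of lines."""
    in_body = False
    for line in lines:
        text = line.strip()
        if not in_body:
            # Skip everything up to and including the Content-Disposition line.
            if text.startswith("Content-Disposition"):
                in_body = True
            continue
        if text.startswith("------"):
            return  # boundary reached, stop reading
        if len(text) > 0:  # ignore blank lines
            yield text
```

Because it accepts any iterable, it can be fed an open file object directly, so only one line is held in memory at a time. If other headers follow Content-Disposition, they would be yielded as body lines, so the fixed-layout assumption matters.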

    I think about regular expressions mostly when I'm trying to enforce a tighter format or do simple pattern matching. Here I think all you would make is a mess that no one but its creator will ever be able to figure out.

    If that doesn't scare you away, I recommend decomposing it and figuring out a pattern for each part. Literals are easy; you just have to be able to predict a domain that will produce a match.

    ? is a very handy operator.



