The Sandbox Thread

7 posts

Forum Read Only

This forum has been made read only by the site admins. No new threads or comments can be added.

IsPalindrome checker for mock whiteboard problem

Back to Forum: The Sandbox
  • User profile image
    Maurits

    C# app to drive the IsPalindrome function I posted in the mock whiteboard problem thread

    Requirements:
    .NET framework 1.1

    EDIT: In the latest version, I have set textbox1.maxlength to 0 to allow verifying this beautiful palindrome:
    A 17,259 word Palindrome (or Palindromic Sentence)
    Default of 32767 was too short

    Possible improvements - I could add checkboxes for the various CompareOptions flags.  Something like
    [x] Ignore Case
    [x] Ignore Kana Type (different Japanese alphabets)
    [x] Ignore Non-Space (accents, umlauts, etc.)
    ...

    EDIT2: added checkboxes.  Also added an iterative checker.
    Iterative checker is slower but more memory efficient - or it would be if I could iterate on graphemes without having to build an array (Whidbey beta 2 better coding fixes this, I hear.)

    EDIT3:
    Added normalization options per FoldStringA.  It handles some ligatures like ae, but not others like fl.  Whidbey will bring NormalizationForms which will allow all ligatures to be expanded per michkap.
    Some FoldStringA options are mutually exclusive, but perhaps the Expand Ligatures option could be made a checkbox independent of what-to-do-with-accents radio buttons.  This would require two calls to FoldStringA.
    Rewrote StringReverse to use StringBuilder for efficiency.
    Did away with unnecessary String[] text-element cache in the iterative verson.
    Both the last two rewrites depend heavily on navigating the int[] returned by ParseCombiningCharacters.

    EDIT4: Normalization w/ FoldStringA has a nasty habit of changing characters it doesn't understand into literal question marks... turning it off by default.

  • User profile image
    michkap

    "Compare To Reverse" fails if you have U+0061 U+030a at the beginning and end (that is "a+combining ring", one of what a user calls a character.

    And of course it will fail for surrogate pairs, too, in the same way.

    You have to use StringInfo.ParseCombiningCharacters or the new StringInfo methods in Whidbey Beta 2 to make those cases work...

  • User profile image
    Maurits

    Hmmm... trying to reproduce.
    Do you have any tips for entering Unicode characters into a .NET textbox?

    Any way to normalize a string in C#?

    KNOWN ISSUES:

    Performance:
    StringReverse should really use a StringBuilder to add all the text elements together

    The iterative string walker shouldn't need to make copies of the individual text elements... instead, it should be able to use Substring to pull out the [i]th text element, using the ParseCombiningCharacters array of text element offsets (and the string length for the length of final element.)  Effectively this means rewriting the Whidbey Beta 2 LengthInTextElements and SubstringByTextElements methods.

    UNDER INVESTIGATION:
    what about things like "fl"?  Consider a palindrome

       Am rofl! (Al for ma.)

    In an IM client this could reasonably be assume to be rendered simply.
    But suppose this was an alibi in a criminal case, and the story was written up in Time magazine.  They, being a stodgy old publication (not to pick on Time) would probably render it in a professional word processing application.

    This would be quite likely to combine the "fl" into a single grapheme.

    Would this then break the palindrome-ness of the string?  Certainly the "l" in Al and the "f" in "for" can't combine.

    If it does break the palindrome-ness of the string, is that a bug?  Or a feature?

    What about things like Åbba?  The circle over the A includes a "combine with the next character" meaning... but if the string is reversed, there IS no "next character."  Is it possible to well-define a "correct" behavior here?  Is this behavior "natural"?

    EDIT: "fl" update - this DOES appear to affect palindromicity.
    Am rofl! (Al for ma.) <-- IS a palindrome
    Am rofl! (Al for ma.) <-- IS NOT a palindrome under current implementation

    Trying to "fix" this makes the problem much more difficult.  It is not enough to compare the string Unicode-character-by-Unicode-character any more - we have to worry about comparing individual characters to sequences of characters.  The "equivalence" part of this "equivalence relation" is beginning to dissolve.

    Further down this road are more opportunities for development.
    For example, in Spanish, the sequence "LL" is considered a single letter.  It's even in the dictionary under a separate entry, between "L" and "M".  Similarly, "CH" is another letter between "C" and "D".  It would be nice to consider "y Chachi" an acronym under relaxed rules - the "Ch"'s are each single "letter"s, and y matches i when used as a vowel (as it ALWAYS is in Spanish [except for foreign words])

    The letter K is used only for foreign words and does not appear in the Spanish alphabet - similar to English's use of a cedilla for foreign words only.

  • User profile image
    michkap

    Try Unicode normalization -- form KC solves your ligature problem. Try adding a .Normalize(NormalizationForm.FormKC) to your initial string.

    For letters like A-ring, they are done as a unit. That is why normalization and text elements are your friends.

    However, sort elements are not considered unique letters. So I do not think they are really an issue.

  • User profile image
    Maurits

    NormalizationForm is another Whidbey Beta 2-ism - well, I'll do what I can. Smiley

    Added normalization powered by FoldStringA (handled ae, FoldStringW didn't - neither handles fl *sigh*)
    Also rewrote StringReverse to use StringBuilder sensibly with the ParseCombiningCharacters array.
    Also rewrote iterative checker to use the ParseCombiningCharacters array rather than building a String[] cache of text elements

  • User profile image
    michkap

    See the code at http://blogs.msdn.com/michkap/archive/2005/04/30/413676.aspx for a good example of how best to do all of this.

    For an interview at Microsoft, it is unlikely that the person doing the interview would be unable to accept using Whidbey features in the solution.... Smiley

  • User profile image
    michkap

    An one more time, to handle the sort elements as well. Smiley

    http://blogs.msdn.com/michkap/archive/2005/06/15/429279.aspx 

Conversation locked

This conversation has been locked by the site admins. No new comments can be made.