C# app to drive the IsPalindrome function I posted in the
mock whiteboard problem thread
Requirements:
.NET framework 1.1
EDIT: In the latest version, I have set textbox1.maxlength to 0 to allow verifying this beautiful palindrome:
A 17,259 word Palindrome (or Palindromic Sentence)
Default of 32767 was too short
Possible improvements - I could add checkboxes for the various CompareOptions flags. Something like
[x] Ignore Case
[x] Ignore Kana Type (different Japanese alphabets)
[x] Ignore Non-Space (accents, umlauts, etc.)
...
EDIT2: added checkboxes. Also added an iterative checker.
Iterative checker is slower but more memory efficient - or it would be if I could iterate on graphemes without having to build an array (Whidbey beta 2 better coding fixes this, I hear.)
EDIT3:
Added normalization options per FoldStringA. It handles some ligatures like ae, but not others like fl. Whidbey will bring NormalizationForms which will allow all ligatures to be expanded per michkap.
Some FoldStringA options are mutually exclusive, but perhaps the Expand Ligatures option could be made a checkbox independent of what-to-do-with-accents radio buttons. This would require two calls to FoldStringA.
Rewrote StringReverse to use StringBuilder for efficiency.
Did away with unnecessary String[] text-element cache in the iterative verson.
Both the last two rewrites depend heavily on navigating the int[] returned by ParseCombiningCharacters.
EDIT4: Normalization w/ FoldStringA has a nasty habit of changing characters it doesn't understand into literal question marks... turning it off by default.
Forum Read Only
This forum has been made read only by the site admins. No new threads or comments can be added.
-
-
"Compare To Reverse" fails if you have U+0061 U+030a at the beginning and end (that is "a+combining ring", one of what a user calls a character.
And of course it will fail for surrogate pairs, too, in the same way.
You have to use StringInfo.ParseCombiningCharacters or the new StringInfo methods in Whidbey Beta 2 to make those cases work... -
Hmmm... trying to reproduce.
Do you have any tips for entering Unicode characters into a .NET textbox?
Any way to normalize a string in C#?
KNOWN ISSUES:
Performance:
StringReverse should really use a StringBuilder to add all the text elements together
The iterative string walker shouldn't need to make copies of the individual text elements... instead, it should be able to use Substring to pull out the [i]th text element, using the ParseCombiningCharacters array of text element offsets (and the string length for the length of final element.) Effectively this means rewriting the Whidbey Beta 2 LengthInTextElements and SubstringByTextElements methods.
UNDER INVESTIGATION:
what about things like "fl"? Consider a palindrome
Am rofl! (Al for ma.)
In an IM client this could reasonably be assume to be rendered simply.
But suppose this was an alibi in a criminal case, and the story was written up in Time magazine. They, being a stodgy old publication (not to pick on Time) would probably render it in a professional word processing application.
This would be quite likely to combine the "fl" into a single grapheme.
Would this then break the palindrome-ness of the string? Certainly the "l" in Al and the "f" in "for" can't combine.
If it does break the palindrome-ness of the string, is that a bug? Or a feature?
What about things like Åbba? The circle over the A includes a "combine with the next character" meaning... but if the string is reversed, there IS no "next character." Is it possible to well-define a "correct" behavior here? Is this behavior "natural"?
EDIT: "fl" update - this DOES appear to affect palindromicity.
Am rofl! (Al for ma.) <-- IS a palindrome
Am rofl! (Al for ma.) <-- IS NOT a palindrome under current implementation
Trying to "fix" this makes the problem much more difficult. It is not enough to compare the string Unicode-character-by-Unicode-character any more - we have to worry about comparing individual characters to sequences of characters. The "equivalence" part of this "equivalence relation" is beginning to dissolve.
Further down this road are more opportunities for development.
For example, in Spanish, the sequence "LL" is considered a single letter. It's even in the dictionary under a separate entry, between "L" and "M". Similarly, "CH" is another letter between "C" and "D". It would be nice to consider "y Chachi" an acronym under relaxed rules - the "Ch"'s are each single "letter"s, and y matches i when used as a vowel (as it ALWAYS is in Spanish [except for foreign words])
The letter K is used only for foreign words and does not appear in the Spanish alphabet - similar to English's use of a cedilla for foreign words only. -
Try Unicode normalization -- form KC solves your ligature problem. Try adding a .Normalize(NormalizationForm.FormKC) to your initial string.
For letters like A-ring, they are done as a unit. That is why normalization and text elements are your friends.
However, sort elements are not considered unique letters. So I do not think they are really an issue. -
NormalizationForm is another Whidbey Beta 2-ism - well, I'll do what I can.

Added normalization powered by FoldStringA (handled ae, FoldStringW didn't - neither handles fl *sigh*)
Also rewrote StringReverse to use StringBuilder sensibly with the ParseCombiningCharacters array.
Also rewrote iterative checker to use the ParseCombiningCharacters array rather than building a String[] cache of text elements -
See the code at http://blogs.msdn.com/michkap/archive/2005/04/30/413676.aspx for a good example of how best to do all of this.
For an interview at Microsoft, it is unlikely that the person doing the interview would be unable to accept using Whidbey features in the solution....
-
An one more time, to handle the sort elements as well.

http://blogs.msdn.com/michkap/archive/2005/06/15/429279.aspx