Sending UTF-8 between Java and C#

  • GurliGebis

    I'm having problems sending UTF-8 encoded data between Java and C#: the data is broken when it reaches the other end.

    In Java I do this:
    BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"), 8192);

    In C# I do this:
    StreamWriter writer = new StreamWriter(client.GetStream(), System.Text.Encoding.UTF8);

    When I have this in C# :
    String s = "287";
    writer.WriteLine(s);

    In Java, I do this:
    String line = reader.readLine();
    int i = Integer.parseInt(line);

    This fails, since line contains 4 chars, with these values:
    [0] = '' 65279
    [1] = '2' 50
    [2] = '8' 56
    [3] = '7' 55

    Does anyone know why this happens, and how to fix it?
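The symptom above can be reproduced without a socket. This is a minimal sketch that assumes the C# side put the three UTF-8 BOM bytes (EF BB BF) on the wire before "287" and a CR LF; the class name `BomRepro` and the byte array are made up for illustration. Java's UTF-8 decoder passes the BOM through as the character U+FEFF (decimal 65279), which is why `Integer.parseInt` rejects the line.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
final class BomRepro {
    // Decodes the bytes as UTF-8 and returns the first line, like the reader in the post.
    static String firstLine(byte[] wire) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed wire content: UTF-8 BOM, the digits "287", then CR LF.
        byte[] wire = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '2', '8', '7', '\r', '\n'};
        String line = firstLine(wire);
        for (int i = 0; i < line.length(); i++) {
            // Prints 65279, 50, 56, 55 for indexes 0 to 3.
            System.out.println("[" + i + "] = " + (int) line.charAt(i));
        }
    }
}
```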

  • littleguru

    Doesn't the first char contain the UTF-8 identifier? Am I remembering correctly?

    Edit: Sven said it is a BOM (byte-order mark). I was remembering correctly.

    Strange that Java doesn't recognize it.

  • Sven Groot

    The first character is the Unicode Byte-Order Mark (BOM). It's strange that Java doesn't recognize this (I bet there's an option that makes it recognize it though).

    However, if you want, you can prevent .NET from writing the BOM. You do this by creating the writer like this:
    StreamWriter writer = new StreamWriter(client.GetStream(), new System.Text.UTF8Encoding(false));

    The false parameter to the constructor tells it not to use a BOM.
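If the sender cannot be changed, the receiving side can also drop the mark itself. The following is only a sketch of that alternative, assuming the BOM can appear solely as the first character of the first line read; `BomStripper` and `stripBom` are hypothetical names, not an existing Java API.

```java
// Hypothetical helper class, not part of the original code.
final class BomStripper {
    private static final char BOM = '\uFEFF';

    // Returns the line without a leading U+FEFF; other input is returned unchanged.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) {
        String line = "\uFEFF287"; // what readLine() returned in the first post
        int i = Integer.parseInt(stripBom(line));
        System.out.println(i); // prints 287
    }
}
```

Only the first line of the stream needs this treatment, since a writer emits the BOM once at the start.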

  • GurliGebis

    Got that part working, but now I have another problem.
    It reads correctly as long as I'm sending nothing but ASCII. As soon as I send anything non-ASCII, it fails on the Java side with this exception:

    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.

    Any ideas of what might be wrong?
    (I'm sending an int first, followed by a \n, and then I send some XML data. On the Java side I read the number of chars, read that many chars into a char array, and then generate a String from that.)
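One thing worth checking in a length-prefixed scheme like this is the unit of the length. The thread does not show the actual code, so the sketch below is only an illustration with hypothetical names (`LengthUnits`, `readExactly`). It shows that char count and UTF-8 byte count differ as soon as non-ASCII text is involved, and that a single `read` call may return fewer chars than requested, so the read has to loop.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original code.
final class LengthUnits {
    // Number of UTF-16 chars, the unit a Reader counts in.
    static int charCount(String s) {
        return s.length();
    }

    // Number of bytes the same text occupies on the wire as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Reads exactly 'count' chars; Reader.read may deliver fewer per call.
    static String readExactly(Reader in, int count) {
        char[] buf = new char[count];
        int off = 0;
        try {
            while (off < count) {
                int n = in.read(buf, off, count - off);
                if (n < 0) {
                    throw new EOFException("stream ended after " + off + " chars");
                }
                off += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buf);
    }

    public static void main(String[] args) {
        String xml = "<name>S\u00F8ren</name>"; // made-up sample payload
        System.out.println(charCount(xml));     // prints 18
        System.out.println(utf8ByteCount(xml)); // prints 19
        System.out.println(readExactly(new StringReader(xml + "trailing"), charCount(xml)).equals(xml)); // prints true
    }
}
```

If the sender writes a byte count but the receiver reads that many chars (or the other way round), the two sides agree for pure ASCII and drift apart on the first non-ASCII character.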

  • littleguru

    Is the int converted to a string before sending? Could you post some sample code?

  • GurliGebis

    Yes, it's all sent as XML. The failure happens when I try to parse XML that contains non-ASCII chars.
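The thread ends without the parsing code, so the cause stays open. One common way to get this Xerces exception is to turn an already decoded String back into bytes with the platform default charset (for example `xml.getBytes()`) and let the parser read those bytes as UTF-8. The sketch below assumes that situation and sidesteps it by handing the parser a character stream; `XmlFromString` and `rootText` are hypothetical names, while `DocumentBuilder.parse(InputSource)` and `InputSource(Reader)` are the standard JAXP calls.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original code.
final class XmlFromString {
    // Parses XML that is already a String and returns the root element's text.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream needs no byte decoding, so no charset mismatch can occur here.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name>S\u00F8ren</name>"; // made-up sample payload
        System.out.println(rootText(xml).equals("S\u00F8ren")); // prints true
    }
}
```

If the parser has to be fed bytes instead, converting with `xml.getBytes(StandardCharsets.UTF_8)` keeps the bytes consistent with a UTF-8 declaration.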
