Sending UTF-8 between Java and C#

  • GurliGebis

    I'm having problems sending UTF-8 encoded data between Java and C#: the data is broken when it reaches the other end.

    In Java I do this:
    BufferedReader reader = new BufferedReader(new InputStreamReader(socket.getInputStream(), "UTF-8"), 8192);

    In C# I do this:
    StreamWriter writer = new StreamWriter(client.GetStream(), System.Text.Encoding.UTF8);

    When I have this in C#:
    String s = "287";
    writer.WriteLine(s);

    In Java, I do this:
    String line = reader.readLine();
    int i = Integer.parseInt(line);

    This fails, since line contains 4 chars, with these values:
    [0] = '\uFEFF' 65279
    [1] = '2' 50
    [2] = '8' 56
    [3] = '7' 55

    Does anyone know why this happens, and maybe how to fix it?
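
The symptom above can be reproduced without a socket. The following is a minimal sketch, assuming the C# side emitted the three-byte UTF-8 preamble (EF BB BF) ahead of "287" and a line break; the class name BomRepro and the helper readFirstLine are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class BomRepro {
    // Decodes the bytes as UTF-8 and returns the first line,
    // the same way the BufferedReader in the post does.
    static String readFirstLine(byte[] wire) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(wire), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed wire content: EF BB BF (U+FEFF in UTF-8), then "287\r\n".
        byte[] wire = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, '2', '8', '7', '\r', '\n'};
        String line = readFirstLine(wire);

        System.out.println(line.length());        // 4
        System.out.println((int) line.charAt(0)); // 65279
        try {
            Integer.parseInt(line);
        } catch (NumberFormatException e) {
            System.out.println("parseInt rejected the line because of the leading U+FEFF");
        }
    }
}
```

Java's UTF-8 decoder passes U+FEFF through as an ordinary character, which is why the line has four chars instead of three.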

  • littleguru

    Doesn't the first char contain the UTF-8 identifier? Am I remembering correctly?

    Edit: Sven said it is a BOM (byte-order mark). I was remembering correctly.

    Strange that Java doesn't recognize it.

  • Sven Groot

    The first character is the Unicode Byte-Order Mark (BOM). It's strange that Java doesn't recognize this (I bet there's an option that makes it recognize it though).

    However, if you want, you can prevent .NET from writing the BOM. You do this by creating the writer like this:
    StreamWriter writer = new StreamWriter(client.GetStream(), new System.Text.UTF8Encoding(false));

    The false parameter to the constructor tells it not to use a BOM.
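
If changing the C# side is not an option, the Java side can drop the mark itself. This is a minimal sketch, assuming the BOM only ever appears as the first character of the first line read; BomStripper and stripBom are illustrative names, not part of any library.

```java
final class BomStripper {
    private static final char BOM = '\uFEFF';

    // Returns the line without a leading U+FEFF, if one is present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) {
        String received = "\uFEFF287"; // what readLine() returned in the first post
        System.out.println(Integer.parseInt(stripBom(received))); // 287
    }
}
```

Applied to the first line only, this leaves any later U+FEFF in the payload untouched.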

  • GurliGebis

    Got that part working, now I have another problem.
    It reads the data correctly as long as I'm sending nothing but ASCII. As soon as I send anything non-ASCII, it fails on the Java side with this exception:

    com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 3-byte UTF-8 sequence.

    Any ideas of what might be wrong?
    (I'm sending an int first, followed by a \n, and then I send some XML data. On the Java side I read that int as the number of chars, read that many chars into a char array, and then generate a String from that.)
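
The thread does not establish the cause of this exception. One thing worth ruling out with a framing scheme like the one described is a mismatch between character counts and byte counts: a non-ASCII character takes more than one byte in UTF-8, so a message cut at the wrong offset can end in the middle of a multi-byte sequence. The sketch below is an illustration under that assumption, not the poster's code. It prefixes each message with its length in bytes and reads exactly that many bytes before decoding. ByteFramedMessages, writeMessage, readMessage and roundTrip are hypothetical names, and the stream is assumed to carry no BOM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ByteFramedMessages {
    // Writes "<byte count>\n" followed by the UTF-8 bytes of the payload.
    static void writeMessage(OutputStream out, String payload) throws IOException {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        out.write((body.length + "\n").getBytes(StandardCharsets.US_ASCII));
        out.write(body);
        out.flush();
    }

    // Reads one message produced by writeMessage.
    static String readMessage(InputStream in) throws IOException {
        StringBuilder header = new StringBuilder();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) {
                throw new EOFException("stream ended inside the length header");
            }
            header.append((char) b);
        }
        // trim() tolerates a "\r" left over from a "\r\n" line ending.
        byte[] body = new byte[Integer.parseInt(header.toString().trim())];
        new DataInputStream(in).readFully(body); // keeps reading until the array is full
        return new String(body, StandardCharsets.UTF_8);
    }

    // Sends a payload through an in-memory buffer and reads it back.
    static String roundTrip(String payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeMessage(buffer, payload);
            return readMessage(new ByteArrayInputStream(buffer.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name>Gr\u00F8d</name>";
        System.out.println(xml.length());                                // 17 chars
        System.out.println(xml.getBytes(StandardCharsets.UTF_8).length); // 18 bytes
        System.out.println(roundTrip(xml).equals(xml));                  // true
    }
}
```

Counting bytes rather than chars keeps the header and the body in the same unit, so the reader never has to decode text just to find where a message ends.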

  • littleguru

    Is the int converted to a string before sending? Could you post some sample code?

  • GurliGebis

    Yes, it's all sent as XML. It fails when I try to parse XML that contains non-ASCII chars.