Tech Off Thread

Override XMLHTTPRequest charset

  • Sven Groot

    I'm requesting a text file using IE7's XMLHTTPRequest. XMLHTTPRequest interprets the file as UTF-8, but the file is not actually UTF-8, so the decoded text comes out broken. The server doesn't return a charset directive, and I have no control at all over the server.

    How can I make XMLHTTPRequest interpret the response with a different encoding? Or, failing that, how can I manually decode the responseBody byte array into a string, using nothing but JavaScript?

    An alternative method to download the file would also work. The script has local machine privileges so it can do more than a regular browser script.

  • Rossj

    When the server returns no charset, XHR defaults to UTF-8 and relies on a BOM to detect other encodings, so you might try calling setRequestHeader() on the XHR to send an Accept-Charset header and see whether that changes the server's mind.

    Failing that, maybe this might help.
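
The Accept-Charset suggestion above could be sketched roughly as follows. The helper name requestWithCharset, the example URL, and the charset value are assumptions made for illustration; the thread names none of them. The helper takes the XHR object as a parameter so it can be tried against any object that exposes open(), setRequestHeader(), and send().

```javascript
// Sketch only: ask the server for a specific charset via Accept-Charset.
// `requestWithCharset` is a hypothetical helper, not part of any API.
// `xhr` is any object with the XMLHttpRequest open/setRequestHeader/send methods.
function requestWithCharset(xhr, url, charset, onDone) {
    xhr.open("GET", url, true);
    // The server is free to ignore this header entirely.
    xhr.setRequestHeader("Accept-Charset", charset);
    xhr.onreadystatechange = function () {
        if (xhr.readyState === 4) {
            // Hand back the status and whatever text the XHR decoded.
            onDone(xhr.status, xhr.responseText);
        }
    };
    xhr.send(null);
}

// Hypothetical usage (URL and charset are placeholders):
// requestWithCharset(new XMLHttpRequest(), "http://example.invalid/data.txt",
//     "iso-8859-1", function (status, text) { /* inspect text here */ });
```

This only changes what the client asks for. If the server still sends no charset, the response is decoded exactly as before. Current browsers also treat Accept-Charset as a forbidden header name and ignore the call, so the sketch is mainly relevant to older engines such as the IE7 case discussed here.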

  • Sven Groot

    Nope, that doesn't work either. Thanks anyway.

    The only reason it's really a problem is that the file I'm downloading uses a single-byte character above 0x7F (so not strictly ASCII) for an escape sequence. The byte for that character (§) starts with 10 in binary, which marks a continuation byte, so it can never occur as the first byte of a valid multi-byte UTF-8 character. As a result, XMLHTTPRequest translates each one to 0xFFFF.

    At the moment I'm considering working around it by using xmlhttp.responseText.replace(/\uffff/g, '§'). This wouldn't work right for other characters above 0x7F in the stream, but those weren't displayed correctly even with the correct character set.
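
To make the bit-pattern argument concrete, here is a small sketch that classifies a byte by the role it could play in UTF-8. It assumes § is stored as the single byte 0xA7, as it would be in ISO-8859-1 or Windows-1252; the thread never names the file's actual encoding. The function name utf8ByteRole is made up for this example.

```javascript
// Sketch only: classify one byte (0..255) by its leading bit pattern.
// This looks at bit patterns alone and does not validate whole sequences.
function utf8ByteRole(b) {
    if ((b & 0x80) === 0x00) return "ascii";         // 0xxxxxxx
    if ((b & 0xC0) === 0x80) return "continuation";  // 10xxxxxx
    if ((b & 0xE0) === 0xC0) return "lead-2";        // 110xxxxx
    if ((b & 0xF0) === 0xE0) return "lead-3";        // 1110xxxx
    if ((b & 0xF8) === 0xF0) return "lead-4";        // 11110xxx
    return "invalid";                                // 11111xxx
}

// Assumption: the section sign is the single byte 0xA7 (binary 10100111).
var role = utf8ByteRole(0xA7); // "continuation"
```

A continuation byte with no lead byte in front of it cannot start a character, which is why a UTF-8 decoder has to substitute something else for it.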
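
A minimal sketch of that workaround, next to the manual decoding asked about in the first post. Both function names are invented for this example. decodeLatin1 rests on two assumptions the thread does not confirm: that the file is ISO-8859-1 (where every byte maps to the code point of the same value), and that the bytes have already been copied into a plain array of numbers. Getting them out of responseBody in IE is a separate problem that this sketch does not address.

```javascript
// Sketch only: put the section sign back wherever the XHR left U+FFFF.
// Every other byte that failed to decode would also become a section sign.
function restoreSectionSigns(text) {
    return text.replace(/\uffff/g, "\u00a7");
}

// Sketch only: decode bytes as ISO-8859-1 (an assumed encoding).
// `bytes` is assumed to be a plain array of numbers in the range 0..255.
function decodeLatin1(bytes) {
    var chars = [];
    for (var i = 0; i < bytes.length; i++) {
        // In ISO-8859-1 the byte value is the Unicode code point.
        chars.push(String.fromCharCode(bytes[i] & 0xFF));
    }
    return chars.join("");
}

var patched = restoreSectionSigns("a\uffffb");   // "a§b"
var decoded = decodeLatin1([0x41, 0xA7, 0x42]);  // "A§B"
```

The replace() approach loses information, since every undecodable byte ends up as the same character. Decoding the raw bytes keeps each byte distinct, provided the assumed encoding is the right one.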
