Converting Charsets

Posted: 22 Aug 2007, 10:31
by RaMireZ
Hi there,

I was trying to solve my problem with the help of the big boards, but it was no use, so I'll try to get a solution on this rather tiny one ;)

I am currently coding an email client. The main functionality is already there: I am able to show everything that was sent with the email. Unfortunately, I don't know how to handle the charsets.
When you receive an email, it is (if necessary) encoded in a certain charset. The charset is always given in the header of the email, so it's not an issue to find out which character set I need.
But what comes after that? At which point do I have to convert the text from the given charset? And how do I convert it? Please help me with this... it is the last thing I have to work on for my email client.

Posted: 22 Aug 2007, 10:48
by TiKu
I'm not sure it is the best way to do this, but I think the MultiByteToWideChar API function is what you're looking for. It converts a string to UTF-16, and you can specify the character set (code page) of the input string.
If UTF-16 isn't what you want, you may use WideCharToMultiByte to convert the result of the first conversion back to any other character set.
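As a language-independent picture of that two-step idea, here is a minimal sketch in Python rather than VB: the received bytes are first decoded into Unicode text using the charset from the header, and the text is encoded again only if another charset is needed. The function name convert_charset and the sample bytes are made up for illustration.

```python
def convert_charset(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Re-encode bytes from one charset to another via Unicode text."""
    # Step 1: bytes in the source charset -> Unicode string
    # (the role MultiByteToWideChar plays in the Win32 API).
    text = data.decode(source_charset)
    # Step 2: Unicode string -> bytes in the target charset
    # (the role WideCharToMultiByte plays in the Win32 API).
    return text.encode(target_charset)


# Sample ISO-8859-1 bytes as they might arrive in a mail body (0xFC and 0xDF are non-ASCII).
latin1_bytes = b"Gr\xfc\xdfe"
utf8_bytes = convert_charset(latin1_bytes, "iso-8859-1", "utf-8")
print(utf8_bytes)
```

An email client that only displays text can stop after step 1, because the decoded Unicode string is what gets shown.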

HTH
TiKu

Posted: 22 Aug 2007, 11:10
by RaMireZ
OK, thanks for a start...

What you said sounds good, but I wonder about the following:

1. Does Windows support UTF-16? (As it seems to be implemented in the API, I think so.)

2. I don't know what I am actually doing here. Can you explain the concept of converting from one charset to another?
This can be independent of a programming language, but I'd be happy if someone has an example for VB.
Please tell me about the process of charset conversion.

3. Does anyone know of tutorials for this?

4. Which DLL contains the function you were talking about?

Posted: 22 Aug 2007, 11:49
by TiKu
RaMireZ wrote:1. Does Windows support UTF-16? (As it seems to be implemented in the API, I think so.)
Actually Visual Basic 6.0 uses UTF-16 internally.
RaMireZ wrote:2. I don't know what I am actually doing here. Can you explain the concept of converting from one charset to another?
This can be independent of a programming language, but I'd be happy if someone has an example for VB.
Please tell me about the process of charset conversion.
Not tested:

Code:

Private Declare Function MultiByteToWideChar Lib "kernel32.dll" (ByVal CodePage As Long, ByVal dwFlags As Long, ByVal lpMultiByteStr As Long, ByVal cbMultiByte As Long, ByVal lpWideCharStr As Long, ByVal cchWideChar As Long) As Long

Dim buffer As String
Dim bufferSize As Long

' Assumes the input text is stored in a Byte array called arrInputBytes,
' that it is null-terminated (i.e. its last byte is 0), and that
' inputCodePage holds the Windows code page number of the input charset.
' First call: passing 0 for the output buffer returns the required size
' in characters, including the terminating null.
bufferSize = MultiByteToWideChar(inputCodePage, 0, VarPtr(arrInputBytes(0)), -1, 0, 0)
If bufferSize > 0 Then
    buffer = String$(bufferSize, vbNullChar)
    ' Second call: perform the actual conversion into the buffer.
    MultiByteToWideChar inputCodePage, 0, VarPtr(arrInputBytes(0)), -1, StrPtr(buffer), bufferSize
    ' Strip the terminating null.
    buffer = Left$(buffer, bufferSize - 1)
End If
' buffer now contains the converted string in UTF-16 (empty if the conversion failed)
RaMireZ wrote:4. Which DLL contains the function you were talking about?
kernel32.dll
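
One detail the code above leaves open is where inputCodePage comes from: the mail header names the charset as text, while MultiByteToWideChar expects a numeric Windows code page. A small lookup table is one way to bridge that. The sketch below is in Python for brevity, the names CODE_PAGES and code_page_for are hypothetical, and the table lists only a handful of common charsets.

```python
from typing import Optional

# Hypothetical lookup table: MIME charset name -> Windows code page number.
# Only a few common charsets are listed; extend as needed.
CODE_PAGES = {
    "us-ascii": 20127,
    "iso-8859-1": 28591,
    "iso-8859-2": 28592,
    "iso-8859-15": 28605,
    "windows-1252": 1252,
    "koi8-r": 20866,
    "utf-8": 65001,
}


def code_page_for(charset: str) -> Optional[int]:
    """Return the Windows code page for a charset name, or None if unknown."""
    # Charset names are case-insensitive and may be quoted in the header.
    return CODE_PAGES.get(charset.strip().strip('"').lower())


print(code_page_for("ISO-8859-1"))
```

Returning None for an unknown charset leaves it to the caller to pick a fallback instead of silently guessing one.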

Posted: 22 Aug 2007, 11:54
by RaMireZ
I am currently not at home but at work, so I can't try it.
Usually I wouldn't ask the following question, because I could just test it myself, but...

In email text, some character sequences stand for other characters. For example:

Sometimes "=3D" occurs. This sequence needs to be replaced by "=", since that is the character it represents. Will MultiByteToWideChar be able to "translate" that? And when do I use the other function, WideCharToMultiByte?

Posted: 22 Aug 2007, 12:05
by TiKu
RaMireZ wrote:In email text, some character sequences stand for other characters. For example:

Sometimes "=3D" occurs. This sequence needs to be replaced by "=", since that is the character it represents. Will MultiByteToWideChar be able to "translate" that?
I don't think so. This is called "quoted-printable" encoding. Wikipedia has a good article about it. I think you'll first have to decode the quoted-printable text into bytes in the charset specified in the mail header. Then you can pass those bytes to MultiByteToWideChar to convert them to UTF-16.
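The order of those two steps can be sketched in Python, whose standard library ships a quoted-printable decoder in the quopri module. This is only an illustration of the sequence, not VB code. The function name decode_body and the sample bodies are made up.

```python
import quopri


def decode_body(raw_body: bytes, charset: str) -> str:
    """Decode a quoted-printable mail body into Unicode text."""
    # Step 1: undo quoted-printable, e.g. "=3D" -> "=" and "=FC" -> byte 0xFC.
    charset_bytes = quopri.decodestring(raw_body)
    # Step 2: interpret those bytes in the charset named in the mail header.
    return charset_bytes.decode(charset)


print(decode_body(b"1+1=3D2", "us-ascii"))

# Non-ASCII characters arrive as "=XX" escapes of their charset bytes.
greeting = decode_body(b"Gr=FC=DFe", "iso-8859-1")
```

Doing the charset conversion first would not work, because "=FC" is still three ASCII characters until the quoted-printable layer has been removed.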
RaMireZ wrote:And when do I use the other function, WideCharToMultiByte?
You use it to convert a UTF-16 string to another charset. This may become helpful if you want to send e-mails.
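For the sending direction, the same two layers are applied in reverse order. A minimal Python sketch, assuming the body is sent as quoted-printable; the function name encode_body and the sample text are made up:

```python
import quopri


def encode_body(text: str, charset: str) -> bytes:
    """Encode Unicode text as a quoted-printable mail body."""
    # Step 1: Unicode text -> bytes in the charset announced in the header
    # (the role WideCharToMultiByte plays in the Win32 API).
    charset_bytes = text.encode(charset)
    # Step 2: escape "=" and non-ASCII bytes as "=XX" so the body is 7-bit safe.
    return quopri.encodestring(charset_bytes)


body = encode_body("a=b, Gr\u00fc\u00dfe", "iso-8859-1")
print(body)
```

The charset passed here has to match the one written into the Content-Type header of the outgoing mail, otherwise the receiver decodes the bytes with the wrong table.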

Posted: 22 Aug 2007, 12:11
by RaMireZ
Alright, thanks a lot for now.

I will try this at home and tell you whether I was successful.

(I knew those smaller boards are sometimes more useful ;) )

Posted: 22 Aug 2007, 12:18
by RaMireZ
I'm sorry for double-posting... but I would appreciate it if you could maybe give an example? Only if it doesn't bother you, of course ;)

Posted: 22 Aug 2007, 12:23
by TiKu
RaMireZ wrote:I'm sorry for double-posting... but I would appreciate it if you could maybe give an example? Only if it doesn't bother you, of course ;)
An example for decoding quoted-printable text and converting it to UTF-16? Sorry, my time is very limited. Also, I have never done this before, so I'd be starting from the same point as you.
With the Wikipedia article, Google and the code I gave you above, it shouldn't be that difficult.

Posted: 22 Aug 2007, 12:28
by RaMireZ
Ah OK, sorry, it sounded like you had done this before.

I will tell you later or tomorrow whether I was able to cope with it.