BorlandTalk.com Forum Index BorlandTalk.com
Borland discussion newsgroups
 

RFC 821 "Period promotion" vs "Quoted-Printable"
BorlandTalk.com Forum Index -> Delphi Internet Winsock
JonB
Posted: Wed Dec 08, 2004 11:41 pm    Post subject: RFC 821 "Period promotion" vs "Quoted-Printable"



If a message is encoded as quoted-printable and a line in the message body
starts with a period, RFC 821 says the sender should promote it to a double
period, and the receiver of the mail must demote it back to a single period.

That works fine in TIdMessageClient.

What doesn't work is that TIdCoderQuotedPrintable has code that specifically
encodes the first period. I can't find an RFC that says that should happen,
and I can't find an email server that handles an encoded first period
properly (it doesn't work on google, yahoo, exchange, or sendmail).

So if your message body has this line

.mystyle{color:red}

it should become

..mystyle{color:red}

not

=2E.mystyle{color:red}

Anybody know an RFC that says first periods should be promoted?


Jon B.


Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 1:12 am    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"




"JonB" <jonb of cirris.com> wrote


Quote:
What doesn't work is TIdCoderQuotedPrintable has code to
specifically change the first period to be encoded.

You did not say which version of Indy you are actually using, but I am going
to assume Indy 9 because TIdCoderQuotedPrintable in Indy 9 is buggy in this
regard. It thinks that it is protecting the data by encoding lines that
contain just a period and nothing else. Such lines would break the
transmission. What it fails to do properly, though, is to actually make
sure that EOL sequences appear around the period. Currently, it just looks
at the first period starting a line, it does not look at the data
surrounding the period.

TIdCoderQuotedPrintable in Indy 10, on the other hand, does not have the bug
(I don't think). Instead of doubling up the periods, it just encodes the
starting period like any other unsafe character, and leaves other periods
alone. In other words:

.mystyle{color:red}

becomes

=2Emystyle{color:red}

Quote:
Anybody know an RFC that says first periods should be promoted?

You already stated it - RFC 821.

I think you meant "encoded" rather than "promoted", right? In which case,
there is no RFC that states the starting period should be encoded. It is a
bug of TIdCoderQuotedPrintable in Indy 9.


Gambit



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 1:25 am    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"




"Remy Lebeau (TeamB)" <no.spam (AT) no (DOT) spam.com> wrote


Quote:
Currently, it just looks at the first period starting a line, it
does not look at the data surrounding the period.

Oh, I should mention that the starting periods of the message body are
doubled up before TIdCoderQuotedPrintable is even invoked. That is another
bug, I think. If TIdMessageClient did not do that in the first place, then
lines such as this:

.mystyle{color:red}

would be encoded by TIdCoderQuotedPrintable in Indy 9 as this:

=2Emystyle{color:red}

So the question is now this: What is the proper encoding scheme, as defined
by RFC 821? Should the periods be doubled up first and then the new data
encoded by QP? Or should the periods be left alone prior to encoding and
the QP encoding should handle doubling up the single periods?

The answer turns out to be - neither. I think the periods do not need to be
doubled up until the actual transmission of data, not during the encoding of
data. Which means that the single periods should be left as single periods
prior to encoding, then QP encoding can choose whether to leave them as
periods or encode them as =2E, and then the final transmission - after all
things have been processed - should be validating the need for doubling up
any remaining periods prior to sending the data over the socket.

That is my take on it, anyway. RFC 821 has no mention of encoding schemes,
because RFC 821 is for the SMTP protocol, which doesn't care about how data
is encoded, just as long as the periods are handled properly. QP encoding
doesn't come into effect until you take MIME encoding of data into account,
which is a separate issue.
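
The ordering described above can be sketched as a transport-level helper that only ever sees lines that have already been through the encoder. This is a minimal illustration under stated assumptions, not Indy's code: DotStuffLine is a hypothetical name, the sample strings are made up, and the program is written as a plain console program for Delphi or Free Pascal.

```pascal
program DotStuffDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper (not part of Indy): applies the RFC 821 sender rule
// to one line that is otherwise ready to be written to the socket.
function DotStuffLine(const ALine: string): string;
begin
  if (ALine <> '') and (ALine[1] = '.') then
    Result := '.' + ALine   // leading period: insert one more
  else
    Result := ALine;        // anything else goes out untouched
end;

begin
  // The encoder left the period alone, so the transport doubles it.
  WriteLn(DotStuffLine('.mystyle{color:red}'));    // ..mystyle{color:red}
  // The encoder already turned the period into =2E: nothing to double.
  WriteLn(DotStuffLine('=2Emystyle{color:red}'));  // =2Emystyle{color:red}
  // A lone period must not be mistaken for the end-of-data marker.
  WriteLn(DotStuffLine('.'));                      // ..
end.
```

Because the helper knows nothing about quoted-printable, it works the same whether or not the encoder chose to emit =2E.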


Gambit



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 1:29 am    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"Remy Lebeau (TeamB)" <no.spam (AT) no (DOT) spam.com> wrote


Quote:
Oh, I should mention that the starting periods of the message body are
doubled up before TIdCoderQuotedPrintable is even invoked. That is
another bug, I think. If TIdMessageClient did not do that in the first
place,
then lines such as this:

.mystyle{color:red}

would be encoded by TIdCoderQuotedPrintable in Indy 9 as this:

=2Emystyle{color:red}

Actually, this applies to Indy 10 as well.


Gambit



JonB
Posted: Thu Dec 09, 2004 4:33 am    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

On Wed, 8 Dec 2004 17:29:51 -0800, Remy Lebeau (TeamB)
<no.spam (AT) no (DOT) spam.com> wrote:

Remy,

I thought all this was Indy 10 (just installed it a few months ago), not
Indy 9. I'll double check that. Sorry for the oversight.

RFC 2045 covers the quoted-printable mime format. I was wondering if I'm
missing an RFC here. Something other than 2045 or 821. I'm locking down
the message behavior with some DUnit test cases and I need to know if I've
missed some RFC.

As I read 821 and 2045 Indy is expected to:
Add a period to any line in the body of the mime content that starts
with a period so:

.mystyle{}

becomes

..mystyle{}

The RFC's do not specify whether Indy should encode the periods as =2E.
I would assume you could since 2045 says anything can be encoded but I can
tell you it will not work if you do. I've been running it through email
servers and they are not recognizing =2E. as .. and changing it back to .
I know for sure yahoo and exchange don't like it and I'm fairly sure
google and sendmail don't either.

So I'm locking it down to never encode a period and always add on an
additional period to any line that starts with a period.

I hate to break it. If anybody thinks I'm reading this wrong, please let me
know.

Jon B

Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 9:04 am    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"JonB" <jonb of cirris.com> wrote

On Wed, 8 Dec 2004 17:29:51 -0800, Remy Lebeau (TeamB)
<no.spam (AT) no (DOT) spam.com> wrote:

Quote:
RFC 2045 covers the quoted-printable mime format.
[snip]
As I read 821 and 2045 Indy is expected to:

RFC 2045 contains no provisions whatsoever regarding the handling of
starting periods when applying quoted-printable encoding. The doubling of
the periods only applies to the TRANSPORT of the message data, not the
ENCODING of the message data prior to transport.

In that regard, I think Indy is currently implemented wrong. It doubles the
starting periods prior to encoding the data. It should be performing the
doubling after the encoding. In which case, QP encoding may translate
starting periods to =2E, but is not required to. If it does, the transport
won't have to worry about doubling them because they won't exist anymore.

On the receiving side, the doubling has to be removed during TRANSPORT
before DECODING can be applied. By making Indy double the periods prior to
encoding, I think Indy is breaking the receiver's ability to properly
receive and decode the original data. In other words:

--- sending ---
encode data...
double starting periods...
send data...

--- receiving ---
read data...
remove starting periods...
decode data...

As you can see, the operations are mirror images of each other. Indy does
this instead, though:

--- sending ---
double starting periods...
encode data...
send data...

--- receiving ---
read data...
remove starting periods...
decode data...

They are not mirror images anymore. That is the bug, and it exists in both
Indy 9 and 10, as far as I can see.
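
The difference between the two orderings can be reproduced with a small console program. This is only a sketch under loud assumptions: ToyQPEncode and ToyQPDecode are hypothetical stand-ins that handle just '=' and a leading period (real quoted-printable does far more), and Stuff and Unstuff are illustrative names for the two RFC 821 rules, not Indy routines.

```pascal
program OrderDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Toy stand-in for a QP encoder: encodes '=' and a leading period only.
function ToyQPEncode(const ALine: string): string;
var
  i: Integer;
begin
  Result := '';
  for i := 1 to Length(ALine) do
    if ALine[i] = '=' then
      Result := Result + '=3D'
    else if (i = 1) and (ALine[i] = '.') then
      Result := Result + '=2E'
    else
      Result := Result + ALine[i];
end;

// Toy decoder: turns '=XX' hex escapes back into characters.
// No soft line breaks and no error handling.
function ToyQPDecode(const ALine: string): string;
var
  i: Integer;
begin
  Result := '';
  i := 1;
  while i <= Length(ALine) do
  begin
    if (ALine[i] = '=') and (i + 2 <= Length(ALine)) then
    begin
      Result := Result + Chr(StrToInt('$' + Copy(ALine, i + 1, 2)));
      Inc(i, 3);
    end
    else
    begin
      Result := Result + ALine[i];
      Inc(i);
    end;
  end;
end;

// Sender rule: insert one extra period in front of a leading period.
function Stuff(const ALine: string): string;
begin
  if (ALine <> '') and (ALine[1] = '.') then
    Result := '.' + ALine
  else
    Result := ALine;
end;

// Receiver rule: drop a leading period unless the line is only a period.
function Unstuff(const ALine: string): string;
begin
  if (Length(ALine) > 1) and (ALine[1] = '.') then
    Result := Copy(ALine, 2, MaxInt)
  else
    Result := ALine;
end;

const
  Original = '.mystyle{color:red}';
begin
  // Mirrored order: encode, then double; the receiver undoes it in reverse.
  WriteLn(ToyQPDecode(Unstuff(Stuff(ToyQPEncode(Original)))));
  // Order described above for Indy: double first, then encode.
  WriteLn(ToyQPDecode(Unstuff(ToyQPEncode(Stuff(Original)))));
end.
```

With the mirrored order the receiver gets back .mystyle{color:red}. With the doubling done before the encoding, the line travels as =2E.mystyle{color:red}, the receiver finds no leading period to remove, and the decoded result is ..mystyle{color:red}, one period too many.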

Quote:
.mystyle{}

becomes

..mystyle{}

Only if QP decides not to encode the starting periods. Otherwise, it should
become this instead:

=2Emystyle{}

Quote:
The RFC's do not specify whether Indy should encode the periods as =2E.

Actually, RFC 2045 says that QP does not have to encode them at all.

Quote:
I would assume you could since 2045 says anything can be
encoded but I can tell you it will not work if you do.

That is most likely because Indy is reversing the proper sequence of
operations.

Quote:
I've been running it through email servers and they are not recognizing
=2E. as .. and changing it back to .

They are not supposed to. They are supposed to convert ".." to "." first
and then decode "=2E" to ".". When receiving a message, Indy does remove
the doubling and then decode the data in the proper order. The issue is
when sending a message, the doubling and encoding are performed in the wrong
order.

Quote:
So I'm locking it down to never encode a period and always add
on an additional period to any line that starts with a period.

That is not the proper fix. TIdMessageClient itself contains the real bug
by invoking the QP encoding at the wrong time. The QP encoding itself
should make no assumptions about starting periods at all. A period is just
a period, no hidden meaning behind it.


Gambit



Chad Z. Hower aka Kudzu
Posted: Thu Dec 09, 2004 1:34 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

JonB <jonb of cirris.com> wrote in news:opsip4aec226c924@jon:
Quote:
.mystyle{}

becomes

..mystyle{}

This is incorrect.

.

should become

..

But not
.Anything

It should only do it when . is on a line by itself. As for what RFC, IIRC
it's in 822.

Quote:
The RFC's do not specify whether Indy should encode the periods as
=2E.

This is a separate issue - this is maybe correct just being triggered by an
initial bug.



--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Want to keep up to date with Indy?

Join Indy News - it free!

http://www.atozed.com/indy/news/

Chad Z. Hower aka Kudzu
Posted: Thu Dec 09, 2004 1:35 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

"Remy Lebeau (TeamB)" <no.spam (AT) no (DOT) spam.com> wrote in
news:41b8151f$1 (AT) newsgroups (DOT) borland.com:
Quote:
In that regard, I think Indy is currently implemented wrong. It doubles
the starting periods prior to encoding the data. It should be
performing the doubling after the encoding. In which case, QP encoding

Yes, this as well. The problem is that we use stream based encoders. We might
just have to change the . --> .. translation to only work on unencoded
messages, and whenever data is passed to an encoder it is the encoder's
responsibility to take care of any such . items.



--
Chad Z. Hower (a.k.a. Kudzu) - http://www.hower.org/Kudzu/
"Programming is an art form that fights back"

Got Indy? Got the book?

http://www.atozed.com/indy/book/

JonB
Posted: Thu Dec 09, 2004 3:47 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

Remy and Chad - thanks for the help. I sure do need it :)

Remy said:

Quote:
[servers] are supposed to convert ".." to "." first and then decode "=2E"
to ".".

Of course - I thought the order was reversed. This fits with what I'm
seeing. Thanks!

Chad said:

Quote:
It should only do it when . is on a line by itself.

That's what I first thought from reading the rfc but it says:

"Before sending a line of mail text the sender-SMTP checks
the first character of the line. If it is a period, one
additional period is inserted at the beginning of the line."

It doesn't say anything about whether there is more data on the line but
you can assume since .Anything is in no danger of triggering a crlf.crlf
sequence then you wouldn't have to encode it.

So let me see if I can write the test case here. There are two issues that
need to be locked down: encoding periods and mime/quoted-printable encoding.

******

For encoding periods:

It's a drag to have WriteTextPart be inside SendBody - I can't get to it.
But if I define a function that is EncodeLonePeriods I could test it like
this:

procedure T_TIdMessageClient.Test_EncodeLonePeriods;
var
  data: string;
begin
  data := mObj.EncodePeriods('.');
  Assert(data = '..');
  data := mObj.EncodePeriods('.'#13#10);
  Assert(data = '..'#13#10);
  data := mObj.EncodePeriods('.a');
  Assert(data = '.a');
  data := mObj.EncodePeriods('a.');
  Assert(data = 'a.');
end;

******

For quoted-printable encoding:

procedure T_TIdEncoderQuotedPrintable.Test_EncodeString;
var
  data: string;
begin
  // Periods should pass through the QP encoder unchanged.
  data := TIdEncoderQuotedPrintable.EncodeString('.');
  Assert(data = '.');
  data := TIdEncoderQuotedPrintable.EncodeString('.'#13#10);
  Assert(data = '.'#13#10);
  data := TIdEncoderQuotedPrintable.EncodeString('..');
  Assert(data = '..');
  data := TIdEncoderQuotedPrintable.EncodeString('..'#13#10);
  Assert(data = '..'#13#10);
  // An over-long line should get a soft line break ('=' followed by CRLF).
  data := TIdEncoderQuotedPrintable.EncodeString(
    '123456789 123456789 123456789 123456789 ' +
    '123456789 123456789 123456789 123456789'#13#10);
  Assert(data =
    '123456789 123456789 123456789 123456789 ' +
    '123456789 123456789 123456789 12='#13#10'3456789'#13#10);
end;


Sound good?

Thanks,

Jon B.



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 6:05 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"Chad Z. Hower aka Kudzu" <cpub (AT) hower (DOT) org> wrote


Quote:
Yes, this as well. The problem is that we use stream based encoders.

I don't see that as being a problem in this particular case.
TIdMessageClient is encoding data one string at a time. It passes a string
to TIdEncoderQuotedPrintable as input and it outputs a string:

for i := 0 to ATextPart.Body.Count - 1 do begin
  LBodyLine := ATextPart.Body[i];
  // the period is doubled here, BEFORE the line is encoded
  if (LBodyLine <> '') and (LBodyLine[1] = '.') then begin
    ATextPart.Body[i] := '.' + LBodyLine;
  end;
  LData := TIdEncoderQuotedPrintable.EncodeString(ATextPart.Body[i] + EOL);
  if TransferEncoding = iso2022jp then begin
    IOHandler.Write(Encode2022JP(LData))
  end else begin
    IOHandler.Write(LData);
  end;
end;

It should do this instead:

for i := 0 to ATextPart.Body.Count - 1 do begin
  LBodyLine := ATextPart.Body[i];
  LData := TIdEncoderQuotedPrintable.EncodeString(ATextPart.Body[i] + EOL);
  // the period is doubled here, AFTER the line has been encoded
  if (LData <> '') and (LData[1] = '.') then begin
    LData := '.' + LData;
  end;
  if TransferEncoding = iso2022jp then begin
    IOHandler.Write(Encode2022JP(LData))
  end else begin
    IOHandler.Write(LData);
  end;
end;

The only problem with this is that the encoded string from
TIdEncoderQuotedPrintable may have multiple lines in it. Which means that
the string would have to be broken up, each line checked for periods, and
then the lines concatenated back together (or just transmitted separately).
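
A minimal sketch of that split, check, and rejoin step, assuming the encoder hands back one string holding several CRLF-terminated lines. DotStuffBlock is a hypothetical helper and the sample input is invented; nothing here is taken from Indy.

```pascal
program StuffEncodedBlock;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes;

// Hypothetical helper: AEncoded holds one or more CRLF-terminated lines,
// such as an encoder that inserts soft line breaks might return.
function DotStuffBlock(const AEncoded: string): string;
var
  Lines: TStringList;
  i: Integer;
begin
  Result := '';
  Lines := TStringList.Create;
  try
    Lines.Text := AEncoded;              // split into individual lines
    for i := 0 to Lines.Count - 1 do
    begin
      if (Lines[i] <> '') and (Lines[i][1] = '.') then
        Lines[i] := '.' + Lines[i];      // double a leading period
      Result := Result + Lines[i] + #13#10;  // rejoin with explicit CRLF
    end;
  finally
    Lines.Free;
  end;
end;

begin
  // The second physical line starts with a period only because the
  // (imaginary) encoder wrapped the text at that point.
  Write(DotStuffBlock('a long line that was wrapped=' + #13#10 +
                      '.rest of it' + #13#10));
end.
```

The output is the first line unchanged followed by ..rest of it. Note that assigning to TStringList.Text also treats a bare CR or LF as a line break, which is fine for encoder output that always uses CRLF but is an assumption worth checking.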

Perhaps EncodeString() can be expanded on to return a TIdStringList instead
of a string? In fact, TIdEncoderQuotedPrintable.Encode() already encodes
the data to a TIdStringList internally, and then just returns the Text
property at the end. We could add a couple of new methods to the encoders
to fill in a TIdStringList with the encoded data, and then have
TIdEncoderQuotedPrintable.Encode() call that method for the actual encoding
work and then return the resulting Text. For example:

TIdEncoder = class(TIdBaseComponent)
public
  function Encode(const ASrc: string): string; overload;
  function Encode(ASrcStream: TIdStreamRandomAccess;
    const ABytes: Integer = MaxInt): string; overload; virtual; abstract;
  class function EncodeString(const AIn: string): string;

  // add these methods
  procedure EncodeToStringList(ASrcStream: TIdStreamRandomAccess;
    var VDest: TIdStringList; const ABytes: Integer = MaxInt); virtual; abstract;
  class procedure EncodeStringToStringList(const AIn: string;
    var VDest: TIdStringList);
end;

procedure TIdEncoderQuotedPrintable.EncodeToStringList(
  ASrcStream: TIdStreamRandomAccess; var VDest: TIdStringList;
  const ABytes: Integer);
begin
  // do the work here, fill in VDest as needed...
end;

function TIdEncoderQuotedPrintable.Encode(ASrcStream: TIdStreamRandomAccess;
  const ABytes: integer): string;
var
  st : TIdStringList;
begin
  st := TIdStringList.Create;
  try
    EncodeToStringList(ASrcStream, st, ABytes);
    Result := st.Text;
  finally
    FreeAndNil(st);
  end;
end;

Then TIdMessageClient can call the new EncodeToStringList() method directly
instead of using EncodeString():

LData := TIdStringList.Create();
try
  for i := 0 to ATextPart.Body.Count - 1 do begin
    LData.Clear; // start each body line with an empty list
    TIdEncoderQuotedPrintable.EncodeStringToStringList(
      ATextPart.Body[i] + EOL, LData);
    for j := 0 to LData.Count-1 do begin
      if (LData[j] <> '') and (LData[j][1] = '.') then begin
        LData[j] := '.' + LData[j];
      end;
      if TransferEncoding = iso2022jp then begin
        LData[j] := Encode2022JP(LData[j]);
      end;
    end;
    IOHandler.Write(LData);
  end;
finally
  FreeAndNil(LData);
end;


Quote:
We might just have to change the . --> .. translation to only work on
unencoded messages, and whenever data is passed to an encoder it is the
encoder's responsibility to take care of any such . items.

I do not agree. Doubling the periods is the transport's responsibility, not
the encoder's.


Gambit



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 6:10 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"Chad Z. Hower aka Kudzu" <cpub (AT) hower (DOT) org> wrote


Quote:
It should only do it when . is on a line by itself.

Wrong. Although a line with just a period is the troublesome scenario, the
checking has to be performed on ALL lines regardless of length. According
to RFC 821:

To allow all user composed text to be transmitted transparently the
following procedures are used.

1. Before sending a line of mail text the sender-SMTP checks
the first character of the line. If it is a period, one
additional period is inserted at the beginning of the line.

2. When a line of mail text is received by the receiver-SMTP
it checks the line. If the line is composed of a single
period it is the end of mail. If the first character is a
period and there are other characters on the line, the first
character is deleted.

There is no provision in that statement that says a line containing just a
period is the only kind of line that is converted. It says that any line
beginning with a period has to be converted.
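
The receiver's half of the quoted procedure (rule 2) can be sketched as a single function. This is an illustration only: ReceiveLine is a hypothetical name, and the line is assumed to arrive with its CRLF already removed.

```pascal
program ReceiverRuleDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: applies rule 2 to one received line.
// Returns False when the line is the end-of-mail marker; otherwise
// returns True and places the text to keep in AText.
function ReceiveLine(const ALine: string; out AText: string): Boolean;
begin
  AText := '';
  if ALine = '.' then
    Result := False                      // a single period: end of mail
  else
  begin
    if (ALine <> '') and (ALine[1] = '.') then
      AText := Copy(ALine, 2, MaxInt)    // delete the first period
    else
      AText := ALine;
    Result := True;
  end;
end;

var
  S: string;
begin
  // Sender followed rule 1: the original period survives.
  if ReceiveLine('..Anything', S) then WriteLn(S);   // .Anything
  // Sender skipped rule 1: the original period is lost.
  if ReceiveLine('.Anything', S) then WriteLn(S);    // Anything
  if not ReceiveLine('.', S) then WriteLn('<end of mail>');
end.
```

The second call shows why the sender has to double every leading period, not only a lone one: the receiver deletes the first period of any longer line without checking what follows it.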


Gambit



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 6:13 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"JonB" <jonb of cirris.com> wrote


Quote:
It doesn't say anything about whether there is more data on the
line but you can assume since .Anything is in no danger of triggering
a crlf.crlf sequence then you wouldn't have to encode it.

Yes, you would have to encode it. The RFC specifically requires the
receiver to remove the beginning period if there are *any* characters
following the period:

2. When a line of mail text is received by the receiver-SMTP
it checks the line. If the line is composed of a single
period it is the end of mail. If the first character is a
period and there are *other characters* on the line, the first
character is deleted.

If you do not encode ".Anything" as "..Anything", the receiver will lose the
original period and only see "Anything" instead.

Quote:
It's a drag to have WriteTextPart be inside SendBody - I can't get to it.

Please see my other reply for a different way to fix TIdMessageClient and
TIdEncoderQuotedPrintable.


Gambit



JonB
Posted: Thu Dec 09, 2004 6:28 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

Remy,

Thanks - you are really helping to focus this down correctly.

Okay, so for period encoding I know I have the correct code when none of
these asserts generate exceptions:

procedure T_TIdMessageClient.Test_EncodeLonePeriods;
var
  data: string;
begin
  data := mObj.EncodePeriods('.');
  Assert(data = '..');
  data := mObj.EncodePeriods('.'#13#10);
  Assert(data = '..'#13#10);
  data := mObj.EncodePeriods('.a');
  Assert(data = '..a');
  data := mObj.EncodePeriods('a.');
  Assert(data = 'a.');
end;


Ciaran Costelloe
Posted: Thu Dec 09, 2004 8:13 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"

Quote:
2. When a line of mail text is received by the receiver-SMTP
it checks the line. If the line is composed of a single
period it is the end of mail. If the first character is a
period and there are *other characters* on the line, the first
character is deleted.

This is a badly-phrased part of the RFC. If the line is not composed of a
single period, but it starts with a period, then there will always be other
characters on the line - specifically there will always be a second period
after the first.

What it means is that if a line is just a period, it is the end of the mail,
otherwise remove any period that is at the start of a line.

Ciaran



Remy Lebeau (TeamB)
Posted: Thu Dec 09, 2004 8:45 pm    Post subject: Re: RFC 821 "Period promotion" vs "Quoted-Printable"


"Ciaran Costelloe" <ccostelloe (AT) flogas (DOT) ie> wrote


Quote:
This is a badly-phrased part of the RFC.

Actually, I think it is worded very clearly. It clearly defines two
separate conditions when handling periods.

Quote:
If the line is not composed of a single period, but it starts with a
period, then there will always be other characters on the line -
specifically there will always be a second period after the first.

Only if the sender followed rule #1 properly ;-)


Gambit



All times are GMT