 |
BorlandTalk.com Borland discussion newsgroups
Avatar Zondertau Guest
|
Posted: Sun May 08, 2005 5:09 pm Post subject: Voting for FastCode rules on memory reading |
|
|
Hi to all,
I would like to suggest a change of FastCode rules for reading memory.
I would suggest that these two rules:
- It is allowed to read past the end of AnsiStrings and WideStrings
including the dword containing the zero-terminator.
- It is not allowed to read past the end of ShortStrings and PChar
strings.
Be replaced by this new rule:
- It is allowed to read anywhere from any memory page containing at
least one byte of a ShortString, PChar, AnsiString or WideString
string, including a #0 terminator.
I would like us to vote on this issue.
My reason for voting for such a change is this:
Memory access permissions are set on a per-memory-page basis. This
means that if reading is allowed for a certain byte in page X, then
every byte in that same page can be read without further side effects.
Memory pages are 4096 = 2^12 bytes large and start at multiples of this
number. This means that any aligned structure whose size is a power of
two and no larger than 4096 bytes lies entirely in one page.
This is useful, because it would allow reading entire aligned MMX and
XMM registers from anywhere in a string. This means MMX/SSE string
processing code needs fewer checks, while it can still work at the same
level of reliability.
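The claim about aligned power-of-two blocks can be checked by brute force. The sketch below is only illustrative: `PAGE_SIZE`, `same_page` and `all_aligned_blocks_fit` are names made up for this example, and it checks address arithmetic on plain integers rather than touching real memory.

```python
PAGE_SIZE = 4096  # IA32 page size, as stated in the post above

def same_page(addr, size):
    """True if the block [addr, addr + size) lies within a single page."""
    return addr // PAGE_SIZE == (addr + size - 1) // PAGE_SIZE

def all_aligned_blocks_fit(size, span=4 * PAGE_SIZE):
    """Check every size-aligned block start in [0, span)."""
    return all(same_page(addr, size) for addr in range(0, span, size))

# Power-of-two sizes from 1 byte up to one full page.
sizes = [1 << n for n in range(13)]
results = {size: all_aligned_blocks_fit(size) for size in sizes}
```

Every entry in `results` should come out true, while an unaligned block such as 16 bytes starting at address 4088 fails `same_page`.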
 |
Dennis Guest
|
Posted: Sun May 08, 2005 6:00 pm Post subject: Re: Voting for FastCode rules on memory reading |
|
|
Hi Avatar
I am against these rules, but I have no real arguments. I hope somebody
has the knowledge to answer questions like:
Can we be 100% sure that there are no other mechanisms that can give AV's or
any other errors if we read outside memory ranges allocated for "us"?
What about the new non execute bits?
Which page sizes exist on any Windows system - eg Windows 95 etc?
Can we think of any problems related to multithreading and/or dual processor
PC's?
Can we think of memory manager related problems?
Can we think of future security features?
The requested rules look quite dirty to me, and I am strongly against getting
dirty just to get a little extra speed. But this is not really a strong
argument ;-)
Let us consider this carefully before changing anything. It creates too much
noise and extra work to change rules back and forth.
Regards
Dennis
 |
Avatar Zondertau Guest
|
Posted: Sun May 08, 2005 6:21 pm Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | I am against these rules, but I have no real arguments. I hope
somebody has the knowledge to answer questions like:
Can we be 100% sure that there are no other mechanisms that can give
AV's or any other errors if we read outside memory ranges allocated
for "us"?
What about the new non execute bits?
|
These are set per page:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/memory/base/data_execution_prevention.asp
(http://tinyurl.com/7wala)
| Quote: | Which page sizes exist on any Windows system - eg Windows 95 etc?
|
They are determined by the CPU, not the OS. Page size has been 4096 for
all IA32 CPUs (that is, since the 80386).
See IA32 developers manual volume 1, paragraph 2.1.3
ftp://download.intel.com/design/Pentium4/manuals/25366515.pdf
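For anyone who would rather check the page size on their own machine than take the manual's word for it, Python's standard `mmap` module exposes it. This is only a sanity-check sketch: the value it reports depends on the CPU and OS it runs on, so it is not guaranteed to be 4096 outside IA32.

```python
import mmap

page_size = mmap.PAGESIZE  # page size reported for this machine

# Page sizes are powers of two on the architectures I know of,
# so exactly one bit should be set.
is_power_of_two = page_size > 0 and (page_size & (page_size - 1)) == 0

print(page_size, is_power_of_two)
```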
| Quote: | Can we think of any problems related to multithreading and/or dual
processor PC's?
|
No, because the extra data that has been read is ignored anyway.
| Quote: | Can we think of memory manager related problems?
|
This isn't related to memory managers, because reading (in contrast to
writing) has no effect on the data.
| Quote: | Can we think of future security features?
|
All current security features, both in the CPU and in the OS
(VirtualProtect) are defined to work on page level.
| Quote: | The requested rules look quite dirty to me, and I am strongly against
getting dirty just to get a little extra speed. But this is not
really a strong argument
|
The same could be said about the current DWORD boundaries. Using DWORD
boundaries is quite arbitrary. If you're allowing reading irrelevant
bytes anyway, then you might just as well allow the greatest unit that
is safely possible. This is much less arbitrary.
Of course we could also vote on another boundary, like 32 bytes (AFAIK
the size of a cache line), but this seems the choice that makes most
sense.
| Quote: | Let us consider this carefully before changing anything. It creates
too much noise and extra work to change rules back and forth.
|
Making rules more strict, like the writing issue before, takes more
work than relaxing them.
 |
Pierre le Riche Guest
|
Posted: Sun May 08, 2005 8:30 pm Post subject: Re: Voting for FastCode rules on memory reading |
|
|
Hi,
| Quote: | - It is allowed to read anywhere from any memory page containing at
least one byte of a ShortString, PChar, AnsiString or WideString
string, including a #0 terminator.
|
Sounds ok to me... I vote yes.
Regards,
Pierre
 |
Thorsten Engler [NexusDB] Guest
|
Posted: Mon May 09, 2005 12:23 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | - It is allowed to read anywhere from any memory page containing at
least one byte of a ShortString, PChar, AnsiString or WideString
string, including a #0 terminator.
|
Sounds good.
 |
Aleksandr Sharahov Guest
|
Posted: Mon May 09, 2005 6:38 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | - It is allowed to read anywhere from any memory page containing at
least one byte of a ShortString, PChar, AnsiString or WideString
string, including a #0 terminator.
|
Yes.
--
Aleksandr.
 |
Anders Isaksson Guest
|
Posted: Mon May 09, 2005 7:10 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
On 8 May 2005 10:09:43 -0700, "Avatar Zondertau" <avatarzt (AT) gmail (DOT) com>
wrote:
| Quote: | Memory pages are 4096 = 2^12 bytes large and start at multiples of this
number.
|
Is this fixed in stone? Will this always be true? On what level is the
page size determined (CPU, OS?)
| Quote: | This means that any aligned structure whose size is a power of
two and no larger than 4096 bytes lies entirely in one page.
|
Why does it mean that? Why can't a structure start in the middle of a
page and continue into the next?
And what about a structure which doesn't have a 'power of two' size?
--
Anders Isaksson, Sweden
BlockCAD: http://web.telia.com/~u16122508/proglego.htm
Gallery: http://web.telia.com/~u16122508/gallery/index.htm
 |
Avatar Zondertau Guest
|
Posted: Mon May 09, 2005 8:01 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | Memory pages are 4096 = 2^12 bytes large and start at multiples of this
number.
Is this fixed in stone? Will this always be true? On what level is the
page size determined (CPU, OS?)
|
It is determined by the CPU and fixed in the IA32 architecture.
| Quote: | This means that any aligned structure whose size is a power of
two and no larger than 4096 bytes lies entirely in one page.
Why does it mean that? Why can't a structure start in the middle of a
page and continue into the next?
|
I was speaking of aligned structures.
| Quote: | And what about a structure which doesn't have a 'power of two' size?
|
Those don't have natural alignment boundaries, so the word "aligned"
doesn't apply there.
The structures I was referring to are those that you read; if you read an
aligned 16-byte block, it will always be in only one page.
 |
Eric Grange Guest
|
Posted: Mon May 09, 2005 8:04 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | Is this fixed in stone? Will this always be true?
|
Not universally, but FastCode ASM functions being tied to Win32 usage
means it will pretty much be true forever (on Win32).
| Quote: | Why does it mean that? Why can't a structure start in the middle of a
page and continue into the next?
|
Indeed, it can happen IIUC.
Eric
 |
Eric Grange Guest
|
Posted: Mon May 09, 2005 8:06 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | I was speaking of aligned structures.
|
Won't aligned structures be aligned to 16 bytes rather than 2048?
In the MM challenge, there was a cache associativity boost found
when not being too highly-aligned f.i.
Eric
 |
Avatar Zondertau Guest
|
Posted: Mon May 09, 2005 9:00 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | Is this fixed in stone? Will this always be true?
Not universally, but FastCode ASM functions being tied to Win32 usage
means it will pretty much be true forever (on Win32).
|
The page size of 4096 is part of IA32 protected mode. The choice of
using those pages (as opposed to only segments) is made by Win32.
| Quote: | Why does it mean that? Why can't a structure start in the middle of a
page and continue into the next?
Indeed, it can happen IIUC.
|
Not if they are aligned. Say an aligned structure has size 2^n; this
means that it must start at some offset m*2^n. As long as 2^n <= 4096,
this means that the structure is contained in only one page (which
starts at some offset k*2^12).
The point here is that if you read a full MMX or XMM register (and
possibly larger ones in the future) from an aligned memory location, all
of the data comes from the same page. Therefore if you want to read one
byte, but read instead the entire aligned XMM register it is contained
in, all data will come from the same page as that byte, and thus has the
same access permissions.
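The argument above can be sketched as address arithmetic. Assumptions: `PAGE_SIZE`, `XMM_SIZE`, `aligned_block_start`, `page_of` and `block_shares_page` are illustrative names only, and addresses are plain integers rather than real pointers.

```python
PAGE_SIZE = 4096  # IA32 page size
XMM_SIZE = 16     # bytes in one XMM register

def aligned_block_start(addr, size=XMM_SIZE):
    """Round addr down to the start of the size-aligned block containing it."""
    return addr & ~(size - 1)

def page_of(addr):
    """Number of the page that contains addr."""
    return addr // PAGE_SIZE

def block_shares_page(addr, size=XMM_SIZE):
    """True if the whole aligned block around addr is in addr's own page."""
    start = aligned_block_start(addr, size)
    return page_of(start) == page_of(addr) == page_of(start + size - 1)

# Check every byte address in the first three pages.
holds_everywhere = all(block_shares_page(a) for a in range(3 * PAGE_SIZE))
```

The rounding step relies on the block size being a power of two, which is exactly the condition stated above.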
 |
Avatar Zondertau Guest
|
Posted: Mon May 09, 2005 9:25 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | I was speaking of aligned structures.
Won't aligned structures be aligned to 16 bytes rather than 2048?
|
Yes, they are aligned to 16 bytes. This means however that they cannot
cross any 4096 byte multiple boundaries (which are the page boundaries)
either (this goes for all multiples of any power of 2 that is at least
16, including 2048).
Here is an example of an aligned 4-byte structure xxxx:
0123456789ABCDEF
xxxx
yyyyyyyy
As you can see it is contained entirely in exactly one aligned 8-byte
structure: yyyyyyyy.
This is what I meant: a 16-byte XMM register (for example) is contained
in exactly one page, so an aligned read of it doesn't cross page
boundaries, just like xxxx didn't cross 8-byte boundaries.
| Quote: | In the MM challenge, there was a cache associativity boost found
when not being too highly-aligned f.i.
|
The reason for allowing reading the entire page is that all bytes in
the page have the same protection, so if byte x can be read with no AV,
then byte y in the same page can also safely be read.
Of course only testing will show if this actually brings a speed
advantage, but ISTM that it probably will, because in some situations it
can remove some special cases (like reading all of the string using
8-byte MMX registers, but having to include a check so that the last part
is read in a 4-byte register).
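The kind of special case that goes away can be illustrated with a toy terminator scan. This is a Python sketch of the idea, not FastCode assembler: `strlen_aligned`, `CHUNK` and the `memory` buffer are invented for illustration, and a bytearray stands in for the readable page.

```python
CHUNK = 8  # bytes in one MMX register

def strlen_aligned(memory, start):
    """Find the #0 terminator by scanning whole aligned CHUNK-byte blocks.

    `memory` stands in for readable address space and `start` is the
    string's first byte. Every read is a full aligned block, so bytes
    before `start` and after the terminator are read but ignored.
    """
    block = start & ~(CHUNK - 1)            # aligned block containing start
    while block < len(memory):
        data = memory[block:block + CHUNK]  # one aligned "register" load
        for i, byte in enumerate(data):
            addr = block + i
            if addr >= start and byte == 0:
                return addr - start
        block += CHUNK
    raise ValueError("no terminator found")

memory = bytearray(b"\xff" * 64)
text = b"FastCode"
start = 13                                  # deliberately unaligned
memory[start:start + len(text)] = text
memory[start + len(text)] = 0               # the #0 terminator
length = strlen_aligned(memory, start)
```

There is no separate code path for the head or the tail of the string here; the only extra work is ignoring the bytes in front of `start`. Whether that is actually faster in assembler can only be shown by measuring.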
 |
Florent Ouchet Guest
|
Posted: Mon May 09, 2005 10:00 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
Hi,
I vote no and explain my position with 3 remarks:
* Intel is not using this optimisation when working on unaligned
data. Its code starts with scalar processing of the first unaligned bytes,
then uses SSE/MMX vector processing until the remaining data no longer
fills one vector register, then finishes the last bytes with scalar processing.
* I don't think using only XMM registers will result in a real gain in
performance; you will have to build masks to distinguish valid from
invalid data.
* It should be possible to trigger some data breakpoints while reading
the extra bytes.
Cheers,
Florent
 |
Avatar Zondertau Guest
|
Posted: Mon May 09, 2005 11:12 am Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | * Intel is not using this optimisation when working on unaligned
data. Its code starts with scalar processing of the first unaligned bytes,
then uses SSE/MMX vector processing until the remaining data no longer
fills one vector register, then finishes the last bytes with scalar processing.
|
In what Intel code is this? Are they providing a reason why they're
doing this (speed or safety)?
| Quote: | * I don't think using only XMM registers will result in a real gain in
performance; you will have to build masks to distinguish valid from
invalid data.
|
That depends on the task you're doing. In some cases masking may be
needed, but in others (like Pos(Ex) and related stuff) this isn't always
needed.
The speed gain or loss can only be determined by measuring.
| Quote: | * It should be possible to trigger some data breakpoints while reading
extra-bytes.
|
That is possible, but doesn't cause bugs. Note that with the current
rule it is also possible that a data breakpoint is hit while it
shouldn't have been, so if this is really the problem then the rule
should be changed anyway.
IMHO your third argument is valid, but if this would really be an issue
it would also apply to the current rule of allowing DWORD reads. I think
it is not an issue, because it doesn't introduce bugs. The first
argument may be valid, depending on what code this is about and what the
motivation is.
 |
Eric Grange Guest
|
Posted: Mon May 09, 2005 12:14 pm Post subject: Re: Voting for FastCode rules on memory reading |
|
|
| Quote: | Not if they are aligned. Say an aligned structure has size 2^n;
|
This is the part I do not understand, an aligned structure has a
size that is a multiple of 16 + <15 bytes.
Also I can have an aligned structure 32 bytes long that is across
the 4096 page boundary (one half in the 1st page, the second
half in the second page).
| Quote: | The point here is that if you read a full MMX or XMM register (and
possibly larger ones in the future) from an aligned memory location, all
of the data comes from the same page.
|
For a 16 byte entity, yes, but an aligned structure or entity can
be larger than 2^n, and not be a 2^n.
Also, if the allocator isn't 16 byte aligned, but 8 byte aligned
(like WinMem f.i.) then an MMX access will be in a single page,
but an XMM won't.
There is also the issue of the extra bytes added by Delphi at the
beginning of strings f.i., which throw off data-alignment even if
the allocator is aligned.
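These cases can be put into numbers. A sketch with made-up addresses near the first page boundary; `PAGE_SIZE` and `pages_touched` are a constant and a helper invented for this example, and nothing here reads real memory.

```python
PAGE_SIZE = 4096

def pages_touched(addr, size):
    """Return the page numbers covered by the block [addr, addr + size)."""
    first = addr // PAGE_SIZE
    last = (addr + size - 1) // PAGE_SIZE
    return list(range(first, last + 1))

# 32-byte structure that is only 16-byte aligned, straddling a page boundary.
struct_pages = pages_touched(4080, 32)

# 16-byte XMM-sized read from an address that is only 8-byte aligned.
xmm_pages = pages_touched(4088, 16)

# 8-byte MMX-sized read from the same 8-byte-aligned address.
mmx_pages = pages_touched(4088, 8)
```

The first two blocks cover pages 0 and 1, while the 8-byte read stays in page 0. The single-page guarantee only holds when the read is aligned to its own size.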
Eric