BorlandTalk.com Borland discussion newsgroups
Lai Chi Wai Guest
Posted: Sun Mar 13, 2005 4:27 pm
Post subject: MMX and SSE support on Borland C++Builder 6
Dear all,
In my job, I need to write MMX and SSE code to optimize vector and
parallel routines. I came across this page at Intel:
http://www.intel.com/cd/ids/developer/asmo-na/eng/catalog/19785.htm
At first I thought that "assembler and optimization" support would mean
inline assembly support. That turns out not to be the whole story.
I can write inline assembly, but the compiler lacks support for data
alignment! The largest data alignment I can use is a quadword. Worse, it
only guarantees data size (padding) and not the alignment of the memory
location.
I have searched the web for a few days and found nothing helpful. So I
would like to ask you experts: how can I get the same effect as
__declspec(align(16)) in MSVC++ and Intel's compiler? Thanks a lot.

Bob Gonder Guest
Posted: Sun Mar 13, 2005 5:15 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Lai Chi Wai wrote:
| Quote: | I can write inline assembly, but the compiler lacks support for data
alignment! The largest data alignment I can use is a quadword. Worse, it
only guarantees data size (padding) and not the alignment of the memory
location.
|
Does -a16 not do what you want?
Either on the compile line, or as a pragma
#pragma option push -a16
//16byte variables
#pragma option pop

Lai Chi Wai Guest
Posted: Sun Mar 13, 2005 5:41 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Dear all,
It does not work, whether I set #pragma option, #pragma pack, or -a16 on
the bcc32 command line. The compiler seems to accept 16 bytes as a data
alignment, but it does not really support it. When I use #pragma alignment,
it displays 8 bytes even with -a16 set.
Thanks.

Bob Gonder Guest
Posted: Mon Mar 14, 2005 3:18 am
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Lai Chi Wai wrote:
| Quote: | It does not work, whether I set #pragma option, #pragma pack, or -a16 on
the bcc32 command line. The compiler seems to accept 16 bytes as a data
alignment, but it does not really support it. When I use #pragma alignment,
it displays 8 bytes even with -a16 set.
|
If your data is 8-byte, then it is aligned 8-byte.
To align 16-byte, you must have a 128-bit integer.
What are you using for 128-bit integers, and does BCB6 support it?
When all else fails, you can always cheat.
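Allocate extra space (15 bytes more than you need), then round the address up:
char *raw = new char[2 * sizeof(__int64) + 15];
/* Round up on an integer copy of the address (Win32: unsigned long holds a pointer). */
__int64 *i64Value = (__int64 *)(((unsigned long)raw + 15) & ~15UL);
/* Use i64Value */
delete[] raw;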
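A minimal sketch of the same over-allocation trick wrapped in reusable helpers. It assumes a C++11 compiler rather than C++Builder 6, and the names aligned_malloc16 and aligned_free16 are hypothetical, not part of any Borland or Microsoft library. The original malloc pointer is stored just below the aligned block so it can be freed later.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns `size` usable bytes starting on a 16-byte boundary.
// Extra room is requested for the round-up slack and for one stashed pointer.
void* aligned_malloc16(std::size_t size) {
    const std::size_t align = 16;
    void* raw = std::malloc(size + (align - 1) + sizeof(void*));
    if (raw == nullptr) return nullptr;

    // Leave room for the bookkeeping pointer, then round up to a multiple of 16.
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
    assert(aligned % align == 0);

    // Remember the pointer malloc really returned, just below the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical counterpart: releases a block obtained from aligned_malloc16.
void aligned_free16(void* p) {
    if (p != nullptr) std::free(reinterpret_cast<void**>(p)[-1]);
}
```

A caller would request a block with aligned_malloc16(4 * sizeof(float)), use it as SSE operand storage, and release it with aligned_free16 rather than free or delete[].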

Ed Mulroy [TeamB] Guest
Posted: Mon Mar 14, 2005 3:55 am
Post subject: Re: MMX and SSE support on Borland C++Builder 6
One way people make an align directive work is to put it into an __asm
statement and then have the compiler call the assembler by putting
#pragma inline at the top of the file.
However, the alignment cannot be set larger than that of the
segment, a requirement which has existed for as long as the Intel
series of processors has had assemblers. If I remember correctly, the
segments under Win32 are not aligned on 16-byte or greater boundaries,
and therefore the Windows program loader will not honor larger
boundaries.
.. Ed
| Quote: | Lai Chi Wai wrote in message
news:4234794b$1 (AT) newsgroups (DOT) borland.com...
It does not work, whether I set #pragma option, #pragma pack, or -a16
on the bcc32 command line. The compiler seems to accept 16 bytes as a
data alignment, but it does not really support it. When I use
#pragma alignment, it displays 8 bytes even with -a16 set.
|

Lai Chi Wai Guest
Posted: Mon Mar 14, 2005 6:21 am
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Dear all,
For parallel processing, it is beneficial to use buffers and pointers. For
vector (I mean math) algorithms, it is not.
For SISD, it is beneficial to have the compiler place data at memory
locations that give the best CPU memory access performance.
For SIMD it is nearly a requirement. For maximum performance:
0 == int(&DataStructure) % 16;
So here comes __declspec(align(16)) in MSVC++ and Intel C++ (GCC spells it
__attribute__((aligned(16)))), which tells the compiler to align data on a
128-bit boundary. Please note that this requires alignment of the actual
memory location, not just padding.
In C++Builder, there is no such directive. To achieve the same effect, I
need -a16 to function properly at least.
Does anyone know whether the behavior of -a16 is a bug, a deficiency, or
just a joke from C++Builder? And is there a way to make it work?
class Vec4
{
float x;
float y;
float z;
float w;
};
void main(void)
{
float F4;
Vec4 V4;
}
Memory address   Data structure
0x0010:          F4
0x0014:
0x0018:
0x001C:
0x0020:          V4
0x0024:
0x0028:
0x002C:
0x0030:
p.s. A segment is 4096 bytes by default in Win32. I do not think a
32768-bit CPU will be out soon.
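A minimal sketch of the distinction drawn in this post between padding (member layout inside a structure) and the alignment of the memory location itself. It assumes a C++11 compiler that honors #pragma pack (MSVC, GCC and Clang do), not C++Builder 6; the names Packed, Natural and is_aligned are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Packing controls where members sit INSIDE the structure...
#pragma pack(push, 1)
struct Packed { char tag; float value; };   // value at offset 1, no padding
#pragma pack(pop)

struct Natural { char tag; float value; };  // value padded up to alignof(float)

// ...whereas this checks where an object actually lives in memory.
// It mirrors the post's `0 == int(&DataStructure) % 16`, but uses an
// integer type that is guaranteed wide enough to hold a pointer.
bool is_aligned(const void* p, std::size_t n) {
    assert(n != 0);
    return reinterpret_cast<std::uintptr_t>(p) % n == 0;
}
```

With these definitions, offsetof(Packed, value) is 1 while offsetof(Natural, value) equals alignof(float), yet neither fact says anything about whether a given Natural variable starts on a 16-byte boundary; only a check such as is_aligned(&variable, 16) answers that.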

Ed Mulroy [TeamB] Guest
Posted: Mon Mar 14, 2005 1:11 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
| Quote: | p.s. A segment is 4096 bytes by default in Win32. I do not think
a 32768-bit CPU will be out soon.
|
Yes, but the issue is the alignment of the segment, not its size.
Have you examined the physical address in a loaded copy of the
programs you describe, those which use the __declspec(align(16))
directive, to verify that they did in fact align the data block in
question on a 16-byte boundary at runtime?
.. Ed

Ed Mulroy [TeamB] Guest
Posted: Mon Mar 14, 2005 2:22 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
| Quote: | ... In the meantime, I will switch to MSVC++ and pure assembly for
the vector and parallel library. ...
|
We may not be fully communicating.
Please check the actual alignment of the physical address that you
experience in a loaded program under MSVC++ 6.0, 7.0 and 7.1 and GCC
to verify that the data block which is decorated with
__declspec(align(16)) actually is aligned on a 16 byte boundary. It
is not sufficient that a compiler, linker and loader system merely
accept the directive. For best performance the alignment must
actually be in the physical address.
.. Ed

Lai Chi Wai Guest
Posted: Mon Mar 14, 2005 2:25 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Dear all,
In MSVC++ 6.0, 7.0 and 7.1, __declspec(align(16)) aligns all data
declared with it on a 16-byte boundary.
After searching the web and experimenting, I found that there is really no
way for C++Builder to do the same thing as __declspec(align(16)). The
directive is not about packing (though packing affects the alignment
requirement), but about how the compiler assigns memory locations to
variables.
C++Builder is simply behind on this, and I hope it will be supported in
the next version. In the meantime, I will switch to MSVC++ and pure
assembly for the vector and parallel library.
(p.s. To my knowledge, the stack is allocated in the same segment;
alignment here means how data are aligned on the function's stack.)
(p.s. MMX, SSE, SSE2 and most SIMD coprocessors require properly
aligned data for maximum performance.)
Thanks a lot.

Bob Gonder Guest
Posted: Mon Mar 14, 2005 6:13 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6

Bob Gonder Guest
Posted: Mon Mar 14, 2005 6:39 pm
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Lai Chi Wai wrote:
(Oops, lost track and pressed send before writing anything)
| Quote: | For parallel processing, it is beneficial to use buffers and pointers. For
vector (I mean math) algorithms, it is not.
|
So, which do you want? Pointers or static?
| Quote: | For SIMD it is nearly a requirement. For maximum performance:
0 == int(&DataStructure) % 16;
|
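#include <assert.h>
#define n 16
static char filler[n];
static Vec4 DataStructure[100];
int main()
{
    /* A pointer cannot be used with % directly; test an integer copy of the address. */
    assert( 0 == ((unsigned long)&DataStructure % 16) );
    // fiddle with n until the assert stops firing
    return 0;
}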
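A minimal sketch of a variation on the filler idea that does not depend on tuning n by hand: reserve 15 spare bytes in static storage and compute the first 16-byte boundary at run time. It assumes a C++11 compiler rather than C++Builder 6, and g_storage, kCount and aligned_vec4_array are hypothetical names. Viewing raw bytes as Vec4 relies on Vec4 being a plain aggregate of floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Vec4 { float x, y, z, w; };

const std::size_t kCount = 100;  // same element count as in the post

// Static storage with 15 bytes of slack, so a 16-byte boundary followed by
// kCount elements always fits, wherever the linker places the array.
static unsigned char g_storage[sizeof(Vec4) * kCount + 15];

// Returns the first 16-byte-aligned address inside g_storage, viewed as
// an array of kCount Vec4 elements.
Vec4* aligned_vec4_array() {
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(g_storage);
    addr = (addr + 15) & ~static_cast<std::uintptr_t>(15);  // round up
    assert(addr % 16 == 0);
    return reinterpret_cast<Vec4*>(addr);
}
```

The cost is at most 15 wasted bytes and one indirection through the returned pointer, in exchange for not having to re-tune the filler whenever other static data changes.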
| Quote: | In C++Builder, there is no such directive. To achieve the same
effect, I need -a16 to function properly at least.
|
It does function properly, or at least as documented.
| Quote: | Does anyone know whether the behavior of -a16 is a bug, a deficiency, or
just a joke from C++Builder? And is there a way to make it work?
class Vec4
{
float x;
float y;
float z;
float w;
};
|
Those are 4-byte values.
They don't require 16-byte alignment.
| Quote: | void main(void)
{
float F4;
Vec4 V4;
|
Stack variables are just subtracted from esp.
{ esp -= sizeof( float )*5 }
There is no alignment code involved.
The stack itself (in Win32) is maintained on a 4-byte alignment.
So that's the best you will get from simple stack variables.
Assembly is your best bet.
_main:
push ebp
mov ebp,esp
and esp,-16
sub esp,2 * 16
F4 equ esp+16
V4 equ esp
leave
ret
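A minimal sketch of the address arithmetic performed by `and esp,-16` followed by `sub esp,2 * 16` in the listing above, modelled on plain 32-bit integers so it can be checked without touching the real stack. The Frame type, the function name and the sample value of esp are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Addresses of the two 16-byte slots carved out by the prologue.
struct Frame {
    std::uint32_t v4;  // corresponds to "V4 equ esp"
    std::uint32_t f4;  // corresponds to "F4 equ esp+16"
};

// Models the prologue on a 32-bit stack pointer value.
Frame carve_aligned_frame(std::uint32_t esp) {
    esp &= static_cast<std::uint32_t>(-16);  // -16 is 0xFFFFFFF0: round DOWN to 16
    esp -= 2 * 16;                           // reserve two 16-byte slots
    Frame f;
    f.v4 = esp;
    f.f4 = esp + 16;
    assert(f.v4 % 16 == 0 && f.f4 % 16 == 0);
    return f;
}
```

For a sample entry value of 0x0012FF7C, the mask gives 0x0012FF70 and the subtraction gives 0x0012FF50, so V4 sits at 0x0012FF50 and F4 at 0x0012FF60. Rounding down is safe here because the stack grows toward lower addresses, and `leave` restores esp from ebp, so the adjustment does not leak out of the function.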

Lai Chi Wai Guest
Posted: Tue Mar 15, 2005 7:04 am
Post subject: Re: MMX and SSE support on Borland C++Builder 6
Dear all,
If I adjust the stack pointer like this, won't it affect how variables are
accessed? Or will the compiler know about it? Thanks.
I checked MSVC++ 6.0 (with the Processor Pack), 7.0 and 7.1: they can align
data on a 16-byte memory address boundary, which is necessary for optimal
performance with MMX, SSE and SSE2. Please check Intel's site for
information about MMX, SSE and SSE2.
class Vector3
{
public:
float x;
float y;
float z;
};
__declspec(align(16)) Vector3 V1;
__declspec(align(16)) Vector3 V2;
__declspec(align(16)) Vector3 V3;
assert(0 == int(&V1) % 16);
assert(0 == int(&V2) % 16);
assert(0 == int(&V3) % 16);
All the assertions hold. This is what I want. In C++Builder, though,
there is no such facility.
This is what I want to do next:
__asm
{
MOVAPS xmm0, V1
MOVAPS xmm1, V2
// ... some calculations
MOVAPS V3, xmm7 // store the result
}
Without proper alignment, MOVAPS raises a general-protection exception.
Using MOVUPS avoids the alignment problem, but Vector3 is only 12 bytes, so
I still need to pad a dummy float between V1 and V2, between V2 and V3, and
after V3 so that the 16-byte MOVUPS access does not contaminate other
variables.
Yours sincerely,
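A minimal sketch of the padding idea from this post: carrying the dummy float inside the type itself, so that a 16-byte load or store starting at the first component never reaches a neighbouring variable. It assumes a C++11 compiler and a 4-byte float; Vector3Padded and to_padded are hypothetical names. Padding only fixes the size, and the address still has to be aligned separately before MOVAPS can be used.

```cpp
#include <cassert>

struct Vector3 { float x, y, z; };  // 12 bytes with a 4-byte float

// Same three components plus one unused float, so a 16-byte access that
// starts at &x stays inside this object.
struct Vector3Padded { float x, y, z, pad; };

static_assert(sizeof(Vector3Padded) == 16,
              "sketch assumes a 4-byte float and no extra padding");

// Copies a Vector3 into the padded form; pad is zeroed so the fourth lane
// holds a defined value during SIMD arithmetic.
Vector3Padded to_padded(const Vector3& v) {
    Vector3Padded p;
    p.x = v.x;
    p.y = v.y;
    p.z = v.z;
    p.pad = 0.0f;
    return p;
}
```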