BorlandTalk.com Borland discussion newsgroups

Eric Grange (Guest), posted Mon May 02, 2005 8:07 am
Subject: MM call for benchmarks
Could those involved in the MM challenge, or not, supply
small "memory usage" benchmarks?
By usage I mean a benchmark where memory is allocated and reallocated,
but also *used*, and where that usage, as in real-world applications,
represents a fair share of the benchmark's execution time.
These could be anything from your own implementation of a dynamic
list, a tree, a mini XML parser, a mini report generator,
or any other task that involves both memory and its usage.
Ideally, all MMs should score close to each other in those cases, but
experience has shown there can be vulnerabilities (aligned MMs in the
Double bench, WinMem in the new LinkedList bench), and vulnerabilities
are things we want to avoid if possible :)
Most current benchmarks focus only on allocation/reallocation/release
cycle times, and few make any significant use of the allocated memory.
That has led to oddities like the alignment one, which was discovered
rather late (the case may not be frequent, but allocating "too aligned"
memory still caused a sixfold performance drop). Another is a speedup
in the B&V but a slowdown in a real-world application, as Pierre
reported in another thread and as I have experienced too.
I guess the current benchmarks cover the "pathological" and "abstract"
memory usage cases fairly well; now is probably the time
to add "good" usage scenarios. :)
Eric
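A minimal sketch of the kind of benchmark asked for above, written as a stand-alone Delphi / Free Pascal (Delphi mode) program rather than as a B&V benchmark class, so it does not use the `TFastcodeMMBenchmark` framework. It builds a singly linked list with one heap allocation per node, then walks it several times, so that most of the run time goes into reading the allocated memory rather than allocating it. The node layout, list size, pass count and helper names are arbitrary assumptions.

```pascal
program ListUsageSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  PNode = ^TNode;
  TNode = record
    Next: PNode;
    Value: Integer;
  end;

// Allocates Count nodes (one New per node); the list holds 1..Count in order.
function BuildList(Count: Integer): PNode;
var
  I: Integer;
  Node: PNode;
begin
  Result := nil;
  for I := Count downto 1 do
  begin
    New(Node);
    Node^.Value := I;
    Node^.Next := Result;  // prepend, so the head ends up holding 1
    Result := Node;
  end;
end;

// Walks the whole list Passes times and sums every value it reads.
function SumList(Head: PNode; Passes: Integer): Int64;
var
  P: Integer;
  Node: PNode;
begin
  Result := 0;
  for P := 1 to Passes do
  begin
    Node := Head;
    while Node <> nil do
    begin
      Inc(Result, Node^.Value);
      Node := Node^.Next;
    end;
  end;
end;

// Releases every node.
procedure FreeList(Head: PNode);
var
  Next: PNode;
begin
  while Head <> nil do
  begin
    Next := Head^.Next;
    Dispose(Head);
    Head := Next;
  end;
end;

var
  Head: PNode;
begin
  Head := BuildList(100000);
  Writeln('Sum over 10 passes: ', SumList(Head, 10));
  FreeList(Head);
end.
```

With these parameters the program prints `Sum over 10 passes: 50000500000` whatever the MM; only the time differs, and an MM that scatters consecutive nodes across many pages or cache sets would be expected to slow the traversal.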

Nathanial Woolls (Guest), posted Mon May 02, 2005 11:51 am
Subject: Re: MM call for benchmarks
> Could those involved - or not - in the MM challenge supply
> small "memory usage" benchmarks?
Do the current usage logger benchmarks not satisfy this?

Eric Grange (Guest), posted Mon May 02, 2005 12:59 pm
Subject: Re: MM call for benchmarks
> Do the current usage logger benchmarks not satisfy this?
They consist only of allocations, reallocations and releases;
the allocated memory is practically never accessed (only the first
and last bytes are, and more as a check that the block was
actually allocated than anything else).
Thus they test fragmentation and raw MM allocation performance
well, but they say nothing about how efficiently the application
can use the allocated memory. It's entirely possible that some MMs
suffer from cache issues. The Double benchmark is plagued by cache
associativity issues; other apps may encounter more issues, like poor
coherency, cache lines shared across CPUs in multi-CPU situations,
or data spread over many pages, which may stress the Windows
swap/cache too much, etc. And I guess some of the issues may be
unexpected for us, like the associativity one was :)
Eric
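The associativity problem mentioned above comes from how a set-associative cache picks a set for an address: roughly (address div line size) mod number of sets. The sketch below only does that arithmetic, assuming 64-byte lines and 64 sets (for example a 32 KB, 8-way L1 data cache; real values depend on the CPU). Under those assumptions, blocks whose start addresses differ by a multiple of 4 KB (64 x 64) map to the same set, so parallel arrays that an MM places on such boundaries compete for the same few ways. `NativeUInt` assumes a Delphi or Free Pascal version that provides it.

```pascal
program CacheSetSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

const
  LineSize = 64;  // ASSUMPTION: 64-byte cache lines
  NumSets = 64;   // ASSUMPTION: 64 sets (e.g. 32 KB, 8-way)

// Cache set that a given address maps to.
function CacheSetIndex(Addr: NativeUInt): NativeUInt;
begin
  Result := (Addr div LineSize) mod NumSets;
end;

var
  I: Integer;
begin
  // Addresses 4 KB apart all land in set 0 ...
  for I := 0 to 3 do
    Writeln(I * 4096, ' -> set ', CacheSetIndex(I * 4096));
  // ... while shifting by one cache line moves to the next set.
  Writeln(4096 + 64, ' -> set ', CacheSetIndex(4096 + 64));
end.
```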

Dennis (Guest), posted Mon May 02, 2005 1:02 pm
Subject: Re: MM call for benchmarks
Hi
I agree.
We could/should add usage of allocated memory.
Let us try it. It is simple to call FillChar on the allocated memory.
Regards
Dennis
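A minimal sketch of the FillChar idea as a stand-alone test outside the B&V: allocate a block, write it with `FillChar`, then read every byte back so the memory is really touched. The returned sum doubles as a simple correctness check. The helper name `TouchBlock` is hypothetical.

```pascal
program FillCharSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

// Hypothetical helper: allocates Size bytes, fills them with Value, reads
// every byte back and returns the sum (Size * Value), then frees the block.
function TouchBlock(Size: Integer; Value: Byte): Int64;
var
  P: Pointer;
  I: Integer;
begin
  GetMem(P, Size);
  try
    FillChar(P^, Size, Value);                // write pass
    Result := 0;
    for I := 0 to Size - 1 do                 // read pass
      Inc(Result, PByte(PAnsiChar(P) + I)^);
  finally
    FreeMem(P);
  end;
end;

begin
  Writeln(TouchBlock(1024 * 1024, 1));  // prints 1048576
end.
```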

Dennis (Guest), posted Mon May 02, 2005 1:03 pm
Subject: Re: MM call for benchmarks
unit DoubleFPBenchmark3Unit;
interface
uses Windows, BenchmarkClassUnit, Classes, Math;
type
TDoubleFPThreads3 = class(TFastcodeMMBenchmark)
public
procedure RunBenchmark; override;
class function GetBenchmarkName: string; override;
class function GetBenchmarkDescription: string; override;
class function GetSpeedWeight: Double; override;
class function GetCategory: TBenchmarkCategory; override;
end;
implementation
uses SysUtils;
type
TDoubleFPThread3 = class(TThread)
FBenchmark: TFastcodeMMBenchmark;
procedure Execute; override;
end;
TRegtangularComplexD = packed record
RealPart, ImaginaryPart : Double;
end;
//Loading some double values
procedure TestFunction(var Res : TRegtangularComplexD; const X, Y :
TRegtangularComplexD);
begin
Res.RealPart := X.RealPart + Y.RealPart
+ X.RealPart + Y.RealPart
+ X.RealPart + Y.RealPart
+ X.RealPart + Y.RealPart
+ X.RealPart + Y.RealPart;
Res.ImaginaryPart := X.ImaginaryPart + Y.ImaginaryPart
+ X.ImaginaryPart + Y.ImaginaryPart
+ X.ImaginaryPart + Y.ImaginaryPart
+ X.ImaginaryPart + Y.ImaginaryPart
+ X.ImaginaryPart + Y.ImaginaryPart;
end;
procedure TDoubleFPThread3.Execute;
var
I1, I2, I3, I4, I5, I6, I7, I8, J1, J2, J3, J4, J5, J6 : Integer;
//Need many arrays because a 4 byte aligned array can be 8 byte aligned by pure chance
Src1Array1 : array of TRegtangularComplexD;
Src2Array1 : array of TRegtangularComplexD;
ResultArray1 : array of TRegtangularComplexD;
Src1Array2 : array of TRegtangularComplexD;
Src2Array2 : array of TRegtangularComplexD;
ResultArray2 : array of TRegtangularComplexD;
Src1Array3 : array of TRegtangularComplexD;
Src2Array3 : array of TRegtangularComplexD;
ResultArray3 : array of TRegtangularComplexD;
Src1Array4 : array of TRegtangularComplexD;
Src2Array4 : array of TRegtangularComplexD;
ResultArray4 : array of TRegtangularComplexD;
Src1Array5 : array of TRegtangularComplexD;
Src2Array5 : array of TRegtangularComplexD;
ResultArray5 : array of TRegtangularComplexD;
Src1Array6 : array of TRegtangularComplexD;
Src2Array6 : array of TRegtangularComplexD;
ResultArray6 : array of TRegtangularComplexD;
BenchArraySize : Integer;
const
MINBENCHARRAYSIZE : Integer = 9500;
MAXBENCHARRAYSIZE : Integer = 10000;
NOOFRUNS : Integer = 2;
begin
for BenchArraySize := MINBENCHARRAYSIZE to MAXBENCHARRAYSIZE do
begin
SetLength(Src1Array1, BenchArraySize);
SetLength(Src2Array1, BenchArraySize);
SetLength(ResultArray1, BenchArraySize);
SetLength(Src1Array2, BenchArraySize);
SetLength(Src2Array2, BenchArraySize);
SetLength(ResultArray2, BenchArraySize);
SetLength(Src1Array3, BenchArraySize);
SetLength(Src2Array3, BenchArraySize);
SetLength(ResultArray3, BenchArraySize);
SetLength(Src1Array4, BenchArraySize);
SetLength(Src2Array4, BenchArraySize);
SetLength(ResultArray4, BenchArraySize);
SetLength(Src1Array5, BenchArraySize);
SetLength(Src2Array5, BenchArraySize);
SetLength(ResultArray5, BenchArraySize);
SetLength(Src1Array6, BenchArraySize);
SetLength(Src2Array6, BenchArraySize);
SetLength(ResultArray6, BenchArraySize);
FBenchmark.UpdateUsageStatistics;
//Fill source arrays
for I1 := 0 to BenchArraySize-1 do
begin
Src1Array1[I1].RealPart := 1;
Src1Array1[I1].ImaginaryPart := 1;
Src2Array1[I1].RealPart := 1;
Src2Array1[I1].ImaginaryPart := 1;
end;
//Run on one set of arrays at a time
//Only 6 memory blocks active at a time
for J1 := 0 to NOOFRUNS do
for I3 := 0 to BenchArraySize-1 do
begin
TestFunction(ResultArray1[I3], Src1Array1[I3], Src2Array1[I3]);
TestFunction(ResultArray2[I3], Src1Array2[I3], Src2Array2[I3]);
end;
for J2 := 0 to NOOFRUNS do
for I4 := 0 to BenchArraySize-1 do
begin
TestFunction(ResultArray3[I4], Src1Array3[I4], Src2Array3[I4]);
TestFunction(ResultArray4[I4], Src1Array4[I4], Src2Array4[I4]);
end;
for J5 := 0 to NOOFRUNS do
for I5 := 0 to BenchArraySize-1 do
begin
TestFunction(ResultArray5[I5], Src1Array5[I5], Src2Array5[I5]);
TestFunction(ResultArray6[I5], Src1Array6[I5], Src2Array6[I5]);
end;
end;
end;
class function TDoubleFPThreads3.GetBenchmarkDescription: string;
begin
Result := 'A benchmark that tests access to Double FP variables '
+ 'in a dynamic array. '
+ 'Gives bonus for 8 byte aligned blocks. Also reveals set associativity '
+ 'related issues. '
+ 'Benchmark submitted by Dennis Kjaer Christensen.';
end;
class function TDoubleFPThreads3.GetBenchmarkName: string;
begin
Result := 'Double Variables Access 6 arrays at a time';
end;
class function TDoubleFPThreads3.GetCategory: TBenchmarkCategory;
begin
Result := bmMemoryUsage;
end;
class function TDoubleFPThreads3.GetSpeedWeight: Double;
begin
Result := 0.9;
end;
procedure TDoubleFPThreads3.RunBenchmark;
var
DoubleFPThread3 : TDoubleFPThread3;
begin
inherited;
DoubleFPThread3 := TDoubleFPThread3.Create(True);
DoubleFPThread3.FreeOnTerminate := False;
DoubleFPThread3.FBenchmark := Self;
DoubleFPThread3.Resume;
DoubleFPThread3.WaitFor;
DoubleFPThread3.Free;
end;
end.
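The comment about needing many arrays rests on where each array's data happens to start. A small, hypothetical helper that reports the alignment of an address (the largest power of two, up to a cap, that divides it) makes this visible. The sketch allocates six `Double` arrays of sizes similar to the benchmark's and prints each one's alignment; the printed values depend entirely on the memory manager in use, and `NativeUInt` assumes a Delphi or Free Pascal version that provides it.

```pascal
program AlignmentSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TDoubleArray = array of Double;

// Largest power of two (capped at MaxAlign) that divides the address.
function AlignmentOf(P: Pointer; MaxAlign: NativeUInt): NativeUInt;
begin
  Result := MaxAlign;
  while (Result > 1) and (NativeUInt(P) mod Result <> 0) do
    Result := Result div 2;
end;

var
  Arrays: array[1..6] of TDoubleArray;
  I: Integer;
begin
  for I := Low(Arrays) to High(Arrays) do
  begin
    SetLength(Arrays[I], 9500 + I);
    // Pointer(DynArray) is the address of element 0
    Writeln('array ', I, ': data aligned to ',
      AlignmentOf(Pointer(Arrays[I]), 16), ' bytes');
  end;
end.
```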

Dennis (Guest), posted Mon May 02, 2005 1:23 pm
Subject: Re: MM call for benchmarks
unit MoveBenchmark1Unit;
interface
uses Windows, BenchmarkClassUnit, Classes, Math;
type
TMoveThreads1 = class(TFastcodeMMBenchmark)
public
procedure RunBenchmark; override;
class function GetBenchmarkName: string; override;
class function GetBenchmarkDescription: string; override;
class function GetSpeedWeight: Double; override;
class function GetCategory: TBenchmarkCategory; override;
end;
implementation
uses SysUtils, MoveJOHUnit9;
type
TMoveThread1 = class(TThread)
FBenchmark: TFastcodeMMBenchmark;
procedure Execute; override;
end;
procedure TMoveThread1.Execute;
var
I1, I2, I3, I4, I5 : Integer;
//Need many arrays because a 4 byte aligned array can be 16 byte aligned by pure chance
//Reallocs might change alignment
SrcArray1 : array of Byte;
DestArray1 : array of Byte;
SrcArray2 : array of Byte;
DestArray2 : array of Byte;
SrcArray3 : array of Byte;
DestArray3 : array of Byte;
SrcArray4 : array of Byte;
DestArray4 : array of Byte;
SrcArray5 : array of Byte;
DestArray5 : array of Byte;
SrcArray6 : array of Byte;
DestArray6 : array of Byte;
SrcArray7 : array of Byte;
DestArray7 : array of Byte;
SrcArray8 : array of Byte;
DestArray8 : array of Byte;
BenchArraySize : Integer;
const
NOOFRUNS : Integer = 2;
MINBENCHARRAYSIZE : Integer = 2000;//~2 KB per array
MAXBENCHARRAYSIZE : Integer = 2000000;//upper bound; the largest size reached is 1,024,000 bytes (~1 MB)
STEPSIZE : Integer = 2;
NOOFMOVESPERRUN : Integer = 1000;
begin
for I1 := 0 to NOOFRUNS do
begin
BenchArraySize := MINBENCHARRAYSIZE;
while BenchArraySize <= MAXBENCHARRAYSIZE do
begin
//8 extra bytes because every Move below starts at element [8]
SetLength(SrcArray1, BenchArraySize+8);
SetLength(DestArray1, BenchArraySize+8);
SetLength(SrcArray2, BenchArraySize+8);
SetLength(DestArray2, BenchArraySize+8);
SetLength(SrcArray3, BenchArraySize+8);
SetLength(DestArray3, BenchArraySize+8);
SetLength(SrcArray4, BenchArraySize+8);
SetLength(DestArray4, BenchArraySize+8);
SetLength(SrcArray5, BenchArraySize+8);
SetLength(DestArray5, BenchArraySize+8);
SetLength(SrcArray6, BenchArraySize+8);
SetLength(DestArray6, BenchArraySize+8);
SetLength(SrcArray7, BenchArraySize+8);
SetLength(DestArray7, BenchArraySize+8);
SetLength(SrcArray8, BenchArraySize+8);
SetLength(DestArray8, BenchArraySize+8);
for I2 := 1 to NOOFMOVESPERRUN do
begin
MoveJOH_SSE_9(SrcArray1[8], DestArray1[8], BenchArraySize);
MoveJOH_SSE_9(DestArray2[8], SrcArray2[8], BenchArraySize);
end;
for I3 := 1 to NOOFMOVESPERRUN do
begin
MoveJOH_SSE_9(SrcArray3[8], DestArray3[8], BenchArraySize);
MoveJOH_SSE_9(DestArray4[8], SrcArray4[8], BenchArraySize);
end;
for I4 := 1 to NOOFMOVESPERRUN do
begin
MoveJOH_SSE_9(SrcArray5[8], DestArray5[8], BenchArraySize);
MoveJOH_SSE_9(DestArray6[8], SrcArray6[8], BenchArraySize);
end;
for I5 := 1 to NOOFMOVESPERRUN do
begin
MoveJOH_SSE_9(SrcArray7[8], DestArray7[8], BenchArraySize);
MoveJOH_SSE_9(DestArray8[8], SrcArray8[8], BenchArraySize);
end;
BenchArraySize := BenchArraySize * STEPSIZE;
FBenchmark.UpdateUsageStatistics;
end;
end;
end;
class function TMoveThreads1.GetBenchmarkDescription: string;
begin
Result := 'A benchmark that tests high speed Move with SSE. '
+ 'Gives bonus for 16 byte aligned blocks. '
+ 'Benchmark submitted by Dennis Kjaer Christensen.';
end;
class function TMoveThreads1.GetBenchmarkName: string;
begin
Result := 'Move Benchmark1 4 arrays at a time';
end;
class function TMoveThreads1.GetCategory: TBenchmarkCategory;
begin
Result := bmMemoryUsage;
end;
class function TMoveThreads1.GetSpeedWeight: Double;
begin
Result := 0.9;
end;
procedure TMoveThreads1.RunBenchmark;
var
MoveThread1 : TMoveThread1;
begin
inherited;
MoveThread1 := TMoveThread1.Create(True);
MoveThread1.FreeOnTerminate := False;
MoveThread1.FBenchmark := Self;
MoveThread1.Resume;
MoveThread1.WaitFor;
MoveThread1.Free;
end;
end.
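To see which block sizes the Move benchmark actually visits, the sketch below replays only the size schedule of its inner `while` loop, with the same constants and no moves. With these constants the sizes double from 2,000 bytes up to 1,024,000 bytes, ten sizes in all, because the next step (2,048,000) exceeds `MAXBENCHARRAYSIZE`. The function names are hypothetical.

```pascal
program MoveSizeScheduleSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

const
  MINBENCHARRAYSIZE = 2000;
  MAXBENCHARRAYSIZE = 2000000;
  STEPSIZE = 2;

// Number of sizes the benchmark's while loop visits.
function SizeCount: Integer;
var
  Size: Integer;
begin
  Result := 0;
  Size := MINBENCHARRAYSIZE;
  while Size <= MAXBENCHARRAYSIZE do
  begin
    Inc(Result);
    Size := Size * STEPSIZE;
  end;
end;

// Largest size the loop visits.
function LargestSize: Integer;
var
  Size: Integer;
begin
  Result := 0;
  Size := MINBENCHARRAYSIZE;
  while Size <= MAXBENCHARRAYSIZE do
  begin
    Result := Size;
    Size := Size * STEPSIZE;
  end;
end;

begin
  Writeln(SizeCount, ' sizes, largest ', LargestSize, ' bytes');
end.
```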
--
I am protected by the free SPAMfighter for private users.
It has so far saved me from receiving 79 spam mails.
Paying users do not get this message in their e-mails.
Get it for free here: www.spamfighter.dk

Dennis (Guest), posted Mon May 02, 2005 1:23 pm
Subject: Re: MM call for benchmarks
unit MoveBenchmark2Unit;
interface
uses Windows, BenchmarkClassUnit, Classes, Math;
type
TMoveThreads2 = class(TFastcodeMMBenchmark)
public
procedure RunBenchmark; override;
class function GetBenchmarkName: string; override;
class function GetBenchmarkDescription: string; override;
class function GetSpeedWeight: Double; override;
class function GetCategory: TBenchmarkCategory; override;
end;
implementation
uses SysUtils, MoveJOHUnit9;
type
TMoveThread2 = class(TThread)
FBenchmark: TFastcodeMMBenchmark;
procedure Execute; override;
end;
procedure TMoveThread2.Execute;
var
I1, J1, J2, J3, J4, J5, J6, J7, J8 : Integer;
//Need many arrays because a 4 byte aligned array can be 16 byte aligned by pure chance
//Reallocs might change alignment
SrcArray1 : array of Byte;
DestArray1 : array of Byte;
SrcArray2 : array of Byte;
DestArray2 : array of Byte;
SrcArray3 : array of Byte;
DestArray3 : array of Byte;
SrcArray4 : array of Byte;
DestArray4 : array of Byte;
SrcArray5 : array of Byte;
DestArray5 : array of Byte;
SrcArray6 : array of Byte;
DestArray6 : array of Byte;
SrcArray7 : array of Byte;
DestArray7 : array of Byte;
SrcArray8 : array of Byte;
DestArray8 : array of Byte;
BenchArraySize : Integer;
const
NOOFRUNS : Integer = 2;
MINBENCHARRAYSIZE : Integer = 2000;
MAXBENCHARRAYSIZE : Integer = 2000000;
STEPSIZE : Integer = 2;
NOOFMOVESPERRUN : Integer = 1000;
begin
for I1 := 0 to NOOFRUNS do
begin
BenchArraySize := MINBENCHARRAYSIZE;
while BenchArraySize <= MAXBENCHARRAYSIZE do
begin
//8 extra bytes because every Move below starts at element [8]
SetLength(SrcArray1, BenchArraySize+8);
SetLength(DestArray1, BenchArraySize+8);
SetLength(SrcArray2, BenchArraySize+8);
SetLength(DestArray2, BenchArraySize+8);
SetLength(SrcArray3, BenchArraySize+8);
SetLength(DestArray3, BenchArraySize+8);
SetLength(SrcArray4, BenchArraySize+8);
SetLength(DestArray4, BenchArraySize+8);
SetLength(SrcArray5, BenchArraySize+8);
SetLength(DestArray5, BenchArraySize+8);
SetLength(SrcArray6, BenchArraySize+8);
SetLength(DestArray6, BenchArraySize+8);
SetLength(SrcArray7, BenchArraySize+8);
SetLength(DestArray7, BenchArraySize+8);
SetLength(SrcArray8, BenchArraySize+8);
SetLength(DestArray8, BenchArraySize+8);
for J1 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(SrcArray1[8], DestArray1[8], BenchArraySize);
for J2 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(DestArray2[8], SrcArray2[8], BenchArraySize);
for J3 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(SrcArray3[8], DestArray3[8], BenchArraySize);
for J4 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(DestArray4[8], SrcArray4[8], BenchArraySize);
for J5 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(SrcArray5[8], DestArray5[8], BenchArraySize);
for J6 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(DestArray6[8], SrcArray6[8], BenchArraySize);
for J7 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(SrcArray7[8], DestArray7[8], BenchArraySize);
for J8 := 1 to NOOFMOVESPERRUN do
MoveJOH_SSE_9(DestArray8[8], SrcArray8[8], BenchArraySize);
BenchArraySize := BenchArraySize * STEPSIZE;
FBenchmark.UpdateUsageStatistics;
end;
end;
end;
class function TMoveThreads2.GetBenchmarkDescription: string;
begin
Result := 'A benchmark that tests high speed Move with SSE. '
+ 'Gives bonus for 16 byte aligned blocks. '
+ 'Benchmark submitted by Dennis Kjaer Christensen.';
end;
class function TMoveThreads2.GetBenchmarkName: string;
begin
Result := 'Move Benchmark2 2 arrays at a time';
end;
class function TMoveThreads2.GetCategory: TBenchmarkCategory;
begin
Result := bmMemoryUsage;
end;
class function TMoveThreads2.GetSpeedWeight: Double;
begin
Result := 0.9;
end;
procedure TMoveThreads2.RunBenchmark;
var
MoveThread2 : TMoveThread2;
begin
inherited;
MoveThread2 := TMoveThread2.Create(True);
MoveThread2.FreeOnTerminate := False;
MoveThread2.FBenchmark := Self;
MoveThread2.Resume;
MoveThread2.WaitFor;
MoveThread2.Free;
end;
end.

Eric Grange (Guest), posted Mon May 02, 2005 1:40 pm
Subject: Re: MM call for benchmarks
> Let us try it. It is simple to call FillChar on the allocated memory.
But then you may not be testing usage, just initialization.
Usage means accessing the allocated memory several times,
in a fashion that depends on an algorithm. The point here would
be to reproduce specific real-world usage patterns rather than
guesstimate them.
Eric
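One way to get access that depends on an algorithm rather than on a linear sweep is pointer chasing: each element stores the index of the next one to visit, so every read depends on the previous one and hardware prefetching helps little. The sketch below uses Sattolo's algorithm, a generic technique chosen here for illustration, to turn an array into a single random cycle and then follows it back to the start. Array size and helper names are arbitrary assumptions.

```pascal
program PointerChaseSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TIntArray = array of Integer;

// Sattolo's algorithm: shuffles the identity permutation into one cycle
// that visits every index exactly once before returning to the start.
procedure MakeCycle(var Next: TIntArray);
var
  I, J, T: Integer;
begin
  for I := 0 to High(Next) do
    Next[I] := I;
  for I := High(Next) downto 1 do
  begin
    J := Random(I);  // 0 <= J < I (strictly less, unlike Fisher-Yates)
    T := Next[I];
    Next[I] := Next[J];
    Next[J] := T;
  end;
end;

// Allocates N indices (N >= 1), links them into a random cycle and follows
// it back to index 0; each step is a dependent read. Returns the step count.
function ChaseCycle(N: Integer): Integer;
var
  Next: TIntArray;
  Idx: Integer;
begin
  SetLength(Next, N);
  MakeCycle(Next);
  Result := 0;
  Idx := 0;
  repeat
    Idx := Next[Idx];
    Inc(Result);
  until Idx = 0;
end;

begin
  Randomize;
  Writeln(ChaseCycle(1 shl 20));  // prints 1048576
end.
```

Because the permutation is a single cycle, the walk always takes exactly N steps, so the result is deterministic even though the visiting order is random.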

Dennis (Guest), posted Mon May 02, 2005 2:11 pm
Subject: Re: MM call for benchmarks
Hi Eric
> But then you may not be testing usage, just initialization.
I agree.
Isn't calling FillChar twice in the replay benchmarks still better than doing nothing?
And then add other realistic benchmarks.
Regards
Dennis

Robert Houdart (Guest), posted Mon May 02, 2005 2:40 pm
Subject: Re: MM call for benchmarks
"Dennis" <marianndkc (AT) home3 (DOT) gvdnet.dk> wrote in message...
> Calling FillChar twice in replays is better than doing nothing?
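Since cache lines are 64 bytes, another idea is to read or write one byte in
each 64-byte block, more or less like the following:
procedure UseMemory(p: PAnsiChar; Size: integer);
// Writes one byte every 64 bytes, i.e. roughly one per cache line.
// PAnsiChar (rather than PChar) keeps the step at 1 byte per unit even in
// Unicode Delphi versions, where PChar is a 2-byte PWideChar.
var
  pend: PAnsiChar;
begin
  pend := p + Size;   // one past the last byte of the block
  while p < pend do begin
    p^ := #99;        // write a byte in the current 64-byte block
    inc(p, 64);       // advance to the next 64-byte block
  end;
end;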

Eric W. Carman (Guest), posted Mon May 02, 2005 4:11 pm
Subject: Re: MM call for benchmarks
What about allocating large arrays of various types (Integer, String, etc.)
and filling them with random values. Then perform a sort on them. That
should exercise the memory in a real-world sense.
Thoughts?
Best Regards,
Eric
"Eric Grange" <egrangeNO (AT) SPAMglscene (DOT) org> wrote
> Could those involved - or not - in the MM challenge supply
> small "memory usage" benchmarks?
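A minimal sketch of the sort suggestion for the string case, using the standard `TStringList` from the `Classes` unit: each added string is its own heap allocation, and sorting reads and compares the string data over and over. The function name and the hex-string contents are illustrative assumptions.

```pascal
program StringSortSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  Classes, SysUtils;

// Adds Count random 8-digit hex strings to a TStringList, sorts it and
// checks that neighbours are in order under a case-insensitive comparison.
function SortRandomStrings(Count: Integer): Boolean;
var
  List: TStringList;
  I: Integer;
begin
  List := TStringList.Create;
  try
    for I := 1 to Count do
      List.Add(IntToHex(Random(MaxInt), 8));
    List.Sort;  // case-insensitive by default
    Result := True;
    for I := 1 to List.Count - 1 do
      if AnsiCompareText(List[I - 1], List[I]) > 0 then
        Result := False;
  finally
    List.Free;
  end;
end;

begin
  Randomize;
  Writeln(SortRandomStrings(100000));
end.
```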

Brian Cook (Guest), posted Mon May 02, 2005 4:29 pm
Subject: Re: MM call for benchmarks
> Could those involved - or not - in the MM challenge supply
> small "memory usage" benchmarks?
I'm not involved, but I have a suggestion. I would be apprehensive about
using a memory manager that did not behave predictably when the operating
system did not fulfill a memory request. At a minimum, the memory
manager should not fault or return an invalid address.
I suggest adding a pass/fail test for out-of-memory behaviour.
- Brian

Eric Grange (Guest), posted Mon May 02, 2005 4:34 pm
Subject: Re: MM call for benchmarks
> Since cache lines are 64-byte, another idea is to read or write a byte in
> each individual 64-byte block. More or less like the following:
Yep.
To mimic usage patterns we could refine it to:
- write all cache lines in a newly allocated block
- read some (random?) cache lines in a block before a realloc,
  then write to all new cache lines after a realloc (if it grew)
- read all cache lines in a block just before it is freed
We could also add some random accesses throughout the benchmark, like
reading cache lines from blocks allocated at the same time, or from 2+
blocks chosen at random.
Eric
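The refined pattern above could look roughly like this for a single block. This is a sketch under stated assumptions: 64-byte cache lines, four randomly chosen lines read before the realloc, and hypothetical helper names (`WriteLines`, `ReadRandomLines`, `ReadLines`, `BlockLifecycle`). In a real benchmark the sizes would come from its own allocation pattern. Writing one byte per line stands in for touching the whole line, and the function returns how many written bytes it read back, which gives a cheap consistency check.

```pascal
program BlockLifecycleSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

const
  CacheLine = 64;  // ASSUMPTION: 64-byte cache lines

// Writes 1 at every multiple of CacheLine from the first one >= StartOfs
// up to (but excluding) Size.
procedure WriteLines(P: Pointer; StartOfs, Size: Integer);
var
  Ofs: Integer;
begin
  Ofs := ((StartOfs + CacheLine - 1) div CacheLine) * CacheLine;
  while Ofs < Size do
  begin
    (PAnsiChar(P) + Ofs)^ := #1;
    Inc(Ofs, CacheLine);
  end;
end;

// Reads Count randomly chosen line offsets of a Size-byte block (Size > 0).
function ReadRandomLines(P: Pointer; Size, Count: Integer): Integer;
var
  K, Lines: Integer;
begin
  Result := 0;
  Lines := (Size + CacheLine - 1) div CacheLine;
  for K := 1 to Count do
    Inc(Result, Ord((PAnsiChar(P) + Random(Lines) * CacheLine)^));
end;

// Reads every line offset of a Size-byte block.
function ReadLines(P: Pointer; Size: Integer): Integer;
var
  Ofs: Integer;
begin
  Result := 0;
  Ofs := 0;
  while Ofs < Size do
  begin
    Inc(Result, Ord((PAnsiChar(P) + Ofs)^));
    Inc(Ofs, CacheLine);
  end;
end;

// One block lifetime: allocate and write, read a few lines, realloc and
// write any new lines, then read every line just before freeing.
function BlockLifecycle(InitialSize, NewSize: Integer): Integer;
var
  P: Pointer;
begin
  GetMem(P, InitialSize);
  try
    WriteLines(P, 0, InitialSize);
    Result := ReadRandomLines(P, InitialSize, 4);
    ReallocMem(P, NewSize);
    if NewSize > InitialSize then
      WriteLines(P, InitialSize, NewSize);
    Inc(Result, ReadLines(P, NewSize));
  finally
    FreeMem(P);
  end;
end;

begin
  Randomize;
  Writeln(BlockLifecycle(1000, 5000));  // 4 random reads + 79 lines = 83
end.
```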

Eric Grange (Guest), posted Mon May 02, 2005 4:36 pm
Subject: Re: MM call for benchmarks
> I suggest adding a pass / fail test for out of memory behaviour.
AFAIK it's already there (Validate13, I think): it verifies that you
get an EOutOfMemory when running out of memory. Don't run it with
other programs running; the Delphi IDE, for instance, doesn't take kindly
to getting an out of memory itself... even if the B&V survives,
other programs may not :p
Eric
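A pass/fail check in the spirit of the one described above might look like the sketch below; it is not the actual Validate13 code. Instead of really exhausting memory, it makes one request that should be impossible to satisfy, so it is expected to fail fast without starving other programs. It assumes that a failed `GetMem` raises `EOutOfMemory` once `SysUtils` is in use (the Delphi behaviour; Free Pascal's SysUtils maps the heap's runtime error 203 to the same exception), and the request size is an arbitrary assumption.

```pascal
program OutOfMemorySketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

// Pass/fail: an impossible request must surface as EOutOfMemory rather than
// as an access violation or an invalid pointer.
function OutOfMemoryIsReported: Boolean;
var
  P: Pointer;
begin
  try
    // ASSUMPTION: a request this close to the address-space limit cannot succeed.
    GetMem(P, High(NativeInt) - 1024 * 1024);
    FreeMem(P);  // should not get here; release the block if it did
    Result := False;
  except
    on EOutOfMemory do
      Result := True;
  end;
end;

begin
  if OutOfMemoryIsReported then
    Writeln('PASS: EOutOfMemory raised')
  else
    Writeln('FAIL: huge allocation succeeded');
end.
```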

Dennis (Guest), posted Mon May 02, 2005 5:38 pm
Subject: Re: MM call for benchmarks
Hi
I agree.
Perhaps using Selection sort?
Not Bubble.
Regards
Dennis
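A minimal selection sort sketch for an `Integer` dynamic array filled with random values. Selection sort does O(n^2) reads but at most n - 1 swaps, which may be one reason to prefer it over bubble sort here (an assumption; the post does not say why). Array size and helper names are arbitrary.

```pascal
program SelectionSortSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TIntArray = array of Integer;

// In-place selection sort: scans the unsorted tail for its minimum and
// swaps it into place, so there is at most one swap per position.
procedure SelectionSort(var A: TIntArray);
var
  I, J, MinIdx, T: Integer;
begin
  for I := 0 to High(A) - 1 do
  begin
    MinIdx := I;
    for J := I + 1 to High(A) do
      if A[J] < A[MinIdx] then
        MinIdx := J;
    if MinIdx <> I then
    begin
      T := A[I];
      A[I] := A[MinIdx];
      A[MinIdx] := T;
    end;
  end;
end;

// Fills a fresh dynamic array with random values, sorts it, checks order.
function SortRandomInts(Count: Integer): Boolean;
var
  A: TIntArray;
  I: Integer;
begin
  SetLength(A, Count);
  for I := 0 to Count - 1 do
    A[I] := Random(MaxInt);
  SelectionSort(A);
  Result := True;
  for I := 1 to Count - 1 do
    if A[I - 1] > A[I] then
      Result := False;
end;

begin
  Randomize;
  Writeln(SortRandomInts(5000));
end.
```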