BorlandTalk.com Forum Index BorlandTalk.com
Borland discussion newsgroups
 

MMX for inverse DCT

 
BorlandTalk.com Forum Index -> Delphi Language BASM
Nils Haeck
Posted: Wed Apr 25, 2007 4:23 am    Post subject: MMX for inverse DCT

Hi Guys,

For my Jpeg codec, I'd like to create an MMX-enabled version of the fast
inverse Discrete Cosine transform. Here's the salient piece of code in
pascal:

// Even part

tmp0 := p0^ * Quant[QIdx ];
tmp1 := p2^ * Quant[QIdx + 16];
tmp2 := p4^ * Quant[QIdx + 32];
tmp3 := p6^ * Quant[QIdx + 48];

tmp10 := tmp0 + tmp2;
tmp11 := tmp0 - tmp2;

tmp13 := tmp1 + tmp3;
tmp12 := Multiply(tmp1 - tmp3, FIX_1_414213562) - tmp13;

tmp0 := tmp10 + tmp13;
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;

// Odd part

tmp4 := p1^ * Quant[QIdx + 8];
tmp5 := p3^ * Quant[QIdx + 24];
tmp6 := p5^ * Quant[QIdx + 40];
tmp7 := p7^ * Quant[QIdx + 56];

z13 := tmp6 + tmp5;
z10 := tmp6 - tmp5;
z11 := tmp4 + tmp7;
z12 := tmp4 - tmp7;

tmp7 := z11 + z13;
tmp11 := Multiply(z11 - z13, FIX_1_414213562);

z5 := Multiply(z10 + z12, FIX_1_847759065);
tmp10 := Multiply(z12, FIX_1_082392200) - z5;
tmp12 := Multiply(z10, - FIX_2_613125930) + z5;

tmp6 := tmp12 - tmp7;
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;

w0^ := tmp0 + tmp7;
w7^ := tmp0 - tmp7;
w1^ := tmp1 + tmp6;
w6^ := tmp1 - tmp6;
w2^ := tmp2 + tmp5;
w5^ := tmp2 - tmp5;
w4^ := tmp3 + tmp4;
w3^ := tmp3 - tmp4;

p0..p7 and w0..w7 are pointers to integers (the routine works with 4-byte
integers). The Multiply function is this:

function Multiply(A, B: integer): integer;
begin
  Result := (A * B) div (1 shl 9); // seems to be converted to SAR by D7 just fine
end;

The Quant variable is just an array of 64 integers. The FIX are just integer
constants.
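As a minimal sketch of what the 9-bit fixed-point Multiply does: the program name is made up for illustration, and the literal 724 is assumed to be Round(512 * 1.414213562). One thing worth checking in any MMX/SSE port is rounding: Pascal's div truncates toward zero, whereas a bare arithmetic shift (PSRAD, for example) rounds toward negative infinity, so negative products can differ by one.

```pascal
program FixedPointDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Assumed value: Round(512 * 1.414213562) = 724
  FIX_1_414213562 = 724;

// Same body as the Multiply function quoted above: 9 fractional bits
function Multiply(A, B: integer): integer;
begin
  Result := (A * B) div (1 shl 9);
end;

begin
  // 1000 * 724 = 724000; 724000 / 512 = 1414.0625
  WriteLn(Multiply(1000, FIX_1_414213562));   // prints 1414
  // div truncates toward zero, so the negative case mirrors the positive one
  WriteLn(Multiply(-1000, FIX_1_414213562));  // prints -1414
end.
```

An arithmetic shift of -724000 by 9 bits would give -1415 instead, so a SIMD version that has to match this routine bit for bit needs a correction for negative values.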

The above does one row of the 8x8 DCT transform, so this part is repeated 8
times. I was thinking I could use MMX, doing 2 of the rows simultaneously
(thus requiring 4 loop iterations instead of 8). The MMX registers would each
store two 4-byte integers.

There's also some range limiting going on later in the routine, for which (I
think) MMX is also suitable.
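To make the two-rows-at-a-time idea concrete, here is a plain Pascal sketch that only emulates the lane layout; it is not MMX code. TLanes2, MakeLanes, AddLanes and SubLanes are hypothetical names standing in for a 64-bit register with two 32-bit lanes and for the PADDD/PSUBD instructions. Note that MMX has no packed 32-bit multiply (PMULLW, PMULHW and PMADDWD all work on 16-bit words), so the Multiply steps would need 16-bit data or a later instruction set.

```pascal
program TwoLaneSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical stand-in for one MMX register: two 32-bit lanes,
  // lane 0 belongs to the first row and lane 1 to the second row
  TLanes2 = array[0..1] of integer;

function MakeLanes(A, B: integer): TLanes2;
begin
  Result[0] := A;
  Result[1] := B;
end;

// Lane-wise add, the scalar equivalent of PADDD
function AddLanes(const A, B: TLanes2): TLanes2;
begin
  Result[0] := A[0] + B[0];
  Result[1] := A[1] + B[1];
end;

// Lane-wise subtract, the scalar equivalent of PSUBD
function SubLanes(const A, B: TLanes2): TLanes2;
begin
  Result[0] := A[0] - B[0];
  Result[1] := A[1] - B[1];
end;

var
  tmp0, tmp2, tmp10, tmp11: TLanes2;
begin
  tmp0 := MakeLanes(100, 200);
  tmp2 := MakeLanes(30, -50);
  // First butterfly of the even part, done for both rows at once
  tmp10 := AddLanes(tmp0, tmp2);
  tmp11 := SubLanes(tmp0, tmp2);
  WriteLn(tmp10[0], ' ', tmp10[1]);  // prints 130 150
  WriteLn(tmp11[0], ' ', tmp11[1]);  // prints 70 250
end.
```

The range limiting maps onto this model as well: PACKUSWB saturates signed 16-bit words to the 0..255 byte range, which is the same clamp, provided the values have already been descaled to fit in 16 bits.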

Questions:

- Is the above assumption feasible, or are there perhaps too many temp
variables, so I would end up just copying back and forth between MMX registers
and memory?

- Would SSE be a better choice? (perhaps with "single" floating points)

- Would just plain ASM also help?

- Would multi-threading be a better option? (The IDCT for the total image
can be nicely split into a number of subbands, spread out over threads, since
they're all independent.) I know that this won't improve things on a
single-core CPU, but with the multi-core CPUs that seem to be becoming popular,
it might help?

- Anyone interested in implementing this? (I would give this person a free
copy of my total jpeg codec library). Just a partial implementation would
help too. I can send anyone interested the unit + a test unit.

Kind regards,

Nils Haeck
www.simdesign.nl

Dennis
Posted: Thu Apr 26, 2007 12:02 am    Post subject: Re: MMX for inverse DCT

Hi Nils

I see this as a good Fastcode challenge. I planned to do it a long time ago,
but never got to it.

What do others think?

Best regards
Dennis Kjaer Christensen

Nils Haeck
Posted: Fri Apr 27, 2007 3:39 am    Post subject: Re: MMX for inverse DCT

That would be cool!

Of course the IDCT is not only used in jpeg but also in mpeg. So creating a
*fast* IDCT routine will find uses in many places.

Nils

Nils Haeck
Posted: Fri Apr 27, 2007 3:47 am    Post subject: Re: MMX for inverse DCT

FWIW, there are 2 versions, the fast version and the accurate version. I
post the complete routines here:

The int multiplies work with 9-bit accuracy. The FIX constants can thus be
calculated as e.g.:

FIX_0_298631336 = Round(1 shl 9 * 0.298631336)
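For illustration, a small sketch that evaluates this formula for two of the constants used in the routines below. The program name and cScale are made up; the constant names follow the FIX_ pattern already used in the code.

```pascal
program FixConstDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  // Hypothetical name for the 9-bit scale factor (512)
  cScale = 1 shl 9;
  // 512 * 0.298631336 = 152.899..., which rounds to 153
  FIX_0_298631336 = Round(cScale * 0.298631336);
  // 512 * 1.847759065 = 946.052..., which rounds to 946
  FIX_1_847759065 = Round(cScale * 1.847759065);

begin
  WriteLn(FIX_0_298631336);  // prints 153
  WriteLn(FIX_1_847759065);  // prints 946
end.
```

With only 9 fractional bits each constant is off by at most half a step (1/1024) from the real value, which is the accuracy trade-off mentioned above.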

I'm interested in MMX/SSE routines for both, although the fast version is
most important. Users that want to use the accurate routine expect it to
take longer anyway.

There are also versions for 4x4 and 2x2 (in order to get downscaled jpeg
images).

Nils

procedure InverseDCTIntFast8x8(var Coef: TsdCoefBlock; var Sample:
TsdSampleBlock; var Quant, Wrksp: TsdIntArray64);
var
i, QIdx: integer;
dci: integer;
dcs: byte;
p0, p1, p2, p3, p4, p5, p6, p7: Psmallint;
w0, w1, w2, w3, w4, w5, w6, w7: Pinteger;
s0, s1, s2, s3, s4, s5, s6, s7: Pbyte;
tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7: integer;
tmp10, tmp11, tmp12, tmp13: integer;
z5, z10, z11, z12, z13: integer;
begin
QIdx := 0;
// First do the columns
p0 := @Coef[ 0]; p1 := @Coef[ 8]; p2 := @Coef[16]; p3 := @Coef[24];
p4 := @Coef[32]; p5 := @Coef[40]; p6 := @Coef[48]; p7 := @Coef[56];
w0 := @Wrksp[ 0]; w1 := @Wrksp[ 8]; w2 := @Wrksp[16]; w3 := @Wrksp[24];
w4 := @Wrksp[32]; w5 := @Wrksp[40]; w6 := @Wrksp[48]; w7 := @Wrksp[56];
for i := 0 to 7 do
begin
if (p1^ = 0) and (p2^ = 0) and (p3^ = 0) and (p4^ = 0) and
(p5^ = 0) and (p6^ = 0) and (p7^ = 0) then
begin
dci := p0^ * Quant[QIdx];
w0^ := dci; w1^ := dci; w2^ := dci; w3^ := dci;
w4^ := dci; w5^ := dci; w6^ := dci; w7^ := dci;
end else
begin
// Even part

tmp0 := p0^ * Quant[QIdx ];
tmp1 := p2^ * Quant[QIdx + 16];
tmp2 := p4^ * Quant[QIdx + 32];
tmp3 := p6^ * Quant[QIdx + 48];

tmp10 := tmp0 + tmp2; // phase 3
tmp11 := tmp0 - tmp2;

tmp13 := tmp1 + tmp3; // phases 5-3
tmp12 := Multiply(tmp1 - tmp3, FIX_1_414213562) - tmp13; // 2*c4

tmp0 := tmp10 + tmp13; // phase 2
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;

// Odd part

tmp4 := p1^ * Quant[QIdx + 8];
tmp5 := p3^ * Quant[QIdx + 24];
tmp6 := p5^ * Quant[QIdx + 40];
tmp7 := p7^ * Quant[QIdx + 56];

z13 := tmp6 + tmp5; // phase 6
z10 := tmp6 - tmp5;
z11 := tmp4 + tmp7;
z12 := tmp4 - tmp7;

tmp7 := z11 + z13; // phase 5
tmp11 := Multiply(z11 - z13, FIX_1_414213562); // 2*c4

z5 := Multiply(z10 + z12, FIX_1_847759065); // 2*c2
tmp10 := Multiply(z12, FIX_1_082392200) - z5; // 2*(c2-c6)
tmp12 := Multiply(z10, - FIX_2_613125930) + z5; // -2*(c2+c6)

tmp6 := tmp12 - tmp7; // phase 2
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;

w0^ := tmp0 + tmp7;
w7^ := tmp0 - tmp7;
w1^ := tmp1 + tmp6;
w6^ := tmp1 - tmp6;
w2^ := tmp2 + tmp5;
w5^ := tmp2 - tmp5;
w4^ := tmp3 + tmp4;
w3^ := tmp3 - tmp4;

end;
// Advance block pointers
inc(p0); inc(p1); inc(p2); inc(p3); inc(p4); inc(p5); inc(p6); inc(p7);
inc(w0); inc(w1); inc(w2); inc(w3); inc(w4); inc(w5); inc(w6); inc(w7);
inc(QIdx);
end;

// Next do the rows
w0 := @Wrksp[0]; w1 := @Wrksp[1]; w2 := @Wrksp[2]; w3 := @Wrksp[3];
w4 := @Wrksp[4]; w5 := @Wrksp[5]; w6 := @Wrksp[6]; w7 := @Wrksp[7];
s0 := @Sample[0]; s1 := @Sample[1]; s2 := @Sample[2]; s3 := @Sample[3];
s4 := @Sample[4]; s5 := @Sample[5]; s6 := @Sample[6]; s7 := @Sample[7];
for i := 0 to 7 do
begin
if (w1^ = 0) and (w2^ = 0) and (w3^ = 0) and (w4^ = 0) and
(w5^ = 0) and (w6^ = 0) and (w7^ = 0) then
begin
dcs := RangeLimit(w0^);
s0^ := dcs; s1^ := dcs; s2^ := dcs; s3^ := dcs;
s4^ := dcs; s5^ := dcs; s6^ := dcs; s7^ := dcs;
end else
begin
// Even part

tmp10 := w0^ + w4^;
tmp11 := w0^ - w4^;

tmp13 := w2^ + w6^;
tmp12 := Multiply(w2^ - w6^, FIX_1_414213562) - tmp13;

tmp0 := tmp10 + tmp13;
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;

// Odd part

z13 := w5^ + w3^;
z10 := w5^ - w3^;
z11 := w1^ + w7^;
z12 := w1^ - w7^;

tmp7 := z11 + z13; // phase 5
tmp11 := Multiply(z11 - z13, FIX_1_414213562); // 2*c4

z5 := Multiply(z10 + z12, FIX_1_847759065); // 2*c2
tmp10 := Multiply(z12, FIX_1_082392200) - z5; // 2*(c2-c6)
tmp12 := Multiply(z10, - FIX_2_613125930) + z5; // -2*(c2+c6)

tmp6 := tmp12 - tmp7; // phase 2
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;

// Final output stage: scale down by a factor of 8 and range-limit

s0^ := RangeLimit(tmp0 + tmp7);
s7^ := RangeLimit(tmp0 - tmp7);
s1^ := RangeLimit(tmp1 + tmp6);
s6^ := RangeLimit(tmp1 - tmp6);
s2^ := RangeLimit(tmp2 + tmp5);
s5^ := RangeLimit(tmp2 - tmp5);
s4^ := RangeLimit(tmp3 + tmp4);
s3^ := RangeLimit(tmp3 - tmp4);

end;
// Advance block pointers
inc(s0, 8); inc(s1, 8); inc(s2, 8); inc(s3, 8);
inc(s4, 8); inc(s5, 8); inc(s6, 8); inc(s7, 8);
inc(w0, 8); inc(w1, 8); inc(w2, 8); inc(w3, 8);
inc(w4, 8); inc(w5, 8); inc(w6, 8); inc(w7, 8);
end;
end;

procedure InverseDCTIntAccurate8x8(var Coef: TsdCoefBlock; var Sample:
TsdSampleBlock; var Quant, Wrksp: TsdIntArray64);
var
i, QIdx: integer;
dci: integer;
dcs: byte;
p0, p1, p2, p3, p4, p5, p6, p7: Psmallint;
w0, w1, w2, w3, w4, w5, w6, w7: Pinteger;
s0, s1, s2, s3, s4, s5, s6, s7: Pbyte;
z1, z2, z3, z4, z5, z10, z11, z12, z13: integer;
tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp10, tmp11, tmp12,
tmp13: integer;
begin
QIdx := 0;
// First do the columns
p0 := @Coef[ 0]; p1 := @Coef[ 8]; p2 := @Coef[16]; p3 := @Coef[24];
p4 := @Coef[32]; p5 := @Coef[40]; p6 := @Coef[48]; p7 := @Coef[56];
w0 := @Wrksp[ 0]; w1 := @Wrksp[ 8]; w2 := @Wrksp[16]; w3 := @Wrksp[24];
w4 := @Wrksp[32]; w5 := @Wrksp[40]; w6 := @Wrksp[48]; w7 := @Wrksp[56];
for i := 0 to 7 do
begin
if (p1^ = 0) and (p2^ = 0) and (p3^ = 0) and (p4^ = 0) and
(p5^ = 0) and (p6^ = 0) and (p7^ = 0) then
begin
dci := p0^ * Quant[QIdx];
w0^ := dci; w1^ := dci; w2^ := dci; w3^ := dci;
w4^ := dci; w5^ := dci; w6^ := dci; w7^ := dci;
end else
begin
{ Even part: reverse the even part of the forward DCT. }
{ The rotator is sqrt(2)*c(-6). }

z2 := p2^ * Quant[QIdx + 2 * 8];
z3 := p6^ * Quant[QIdx + 6 * 8];

z1 := MULTIPLY(z2 + z3, FIX_0_541196100);
tmp2 := z1 + MULTIPLY(z3, - FIX_1_847759065);
tmp3 := z1 + MULTIPLY(z2, FIX_0_765366865);

z2 := p0^ * Quant[QIdx + 0 * 8];
z3 := p4^ * Quant[QIdx + 4 * 8];

tmp0 := (z2 + z3);
tmp1 := (z2 - z3);

tmp10 := tmp0 + tmp3;
tmp13 := tmp0 - tmp3;
tmp11 := tmp1 + tmp2;
tmp12 := tmp1 - tmp2;

{ Odd part per figure 8; the matrix is unitary and hence its
transpose is its inverse. i0..i3 are y7,y5,y3,y1 respectively. }

tmp0 := p7^ * Quant[QIdx + 7 * 8];
tmp1 := p5^ * Quant[QIdx + 5 * 8];
tmp2 := p3^ * Quant[QIdx + 3 * 8];
tmp3 := p1^ * Quant[QIdx + 1 * 8];

z1 := tmp0 + tmp3;
z2 := tmp1 + tmp2;
z3 := tmp0 + tmp2;
z4 := tmp1 + tmp3;
z5 := MULTIPLY(z3 + z4, FIX_1_175875602); { sqrt(2) * c3 }

tmp0 := MULTIPLY(tmp0, FIX_0_298631336); { sqrt(2) * (-c1+c3+c5-c7) }
tmp1 := MULTIPLY(tmp1, FIX_2_053119869); { sqrt(2) * ( c1+c3-c5+c7) }
tmp2 := MULTIPLY(tmp2, FIX_3_072711026); { sqrt(2) * ( c1+c3+c5-c7) }
tmp3 := MULTIPLY(tmp3, FIX_1_501321110); { sqrt(2) * ( c1+c3-c5-c7) }
z1 := MULTIPLY(z1, - FIX_0_899976223); { sqrt(2) * (c7-c3) }
z2 := MULTIPLY(z2, - FIX_2_562915447); { sqrt(2) * (-c1-c3) }
z3 := MULTIPLY(z3, - FIX_1_961570560); { sqrt(2) * (-c3-c5) }
z4 := MULTIPLY(z4, - FIX_0_390180644); { sqrt(2) * (c5-c3) }

Inc(z3, z5);
Inc(z4, z5);

Inc(tmp0, z1 + z3);
Inc(tmp1, z2 + z4);
Inc(tmp2, z2 + z3);
Inc(tmp3, z1 + z4);

{ Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 }

w0^ := tmp10 + tmp3;
w7^ := tmp10 - tmp3;
w1^ := tmp11 + tmp2;
w6^ := tmp11 - tmp2;
w2^ := tmp12 + tmp1;
w5^ := tmp12 - tmp1;
w3^ := tmp13 + tmp0;
w4^ := tmp13 - tmp0;

end;
// Advance block pointers
inc(p0); inc(p1); inc(p2); inc(p3); inc(p4); inc(p5); inc(p6); inc(p7);
inc(w0); inc(w1); inc(w2); inc(w3); inc(w4); inc(w5); inc(w6); inc(w7);
inc(QIdx);
end;

// Next do the rows
w0 := @Wrksp[0]; w1 := @Wrksp[1]; w2 := @Wrksp[2]; w3 := @Wrksp[3];
w4 := @Wrksp[4]; w5 := @Wrksp[5]; w6 := @Wrksp[6]; w7 := @Wrksp[7];
s0 := @Sample[0]; s1 := @Sample[1]; s2 := @Sample[2]; s3 := @Sample[3];
s4 := @Sample[4]; s5 := @Sample[5]; s6 := @Sample[6]; s7 := @Sample[7];
for i := 0 to 7 do
begin
if (w1^ = 0) and (w2^ = 0) and (w3^ = 0) and (w4^ = 0) and
(w5^ = 0) and (w6^ = 0) and (w7^ = 0) then
begin
dcs := RangeLimit(w0^);
s0^ := dcs; s1^ := dcs; s2^ := dcs; s3^ := dcs;
s4^ := dcs; s5^ := dcs; s6^ := dcs; s7^ := dcs;
end else
begin
{ Even part: reverse the even part of the forward DCT. }
{ The rotator is sqrt(2)*c(-6). }

z2 := w2^;
z3 := w6^;

z1 := MULTIPLY(z2 + z3, FIX_0_541196100);
tmp2 := z1 + MULTIPLY(z3, - FIX_1_847759065);
tmp3 := z1 + MULTIPLY(z2, FIX_0_765366865);

tmp0 := w0^ + w4^;
tmp1 := w0^ - w4^;

tmp10 := tmp0 + tmp3;
tmp13 := tmp0 - tmp3;
tmp11 := tmp1 + tmp2;
tmp12 := tmp1 - tmp2;

{ Odd part per figure 8; the matrix is unitary and hence its
transpose is its inverse. i0..i3 are y7,y5,y3,y1 respectively. }

tmp0 := w7^;
tmp1 := w5^;
tmp2 := w3^;
tmp3 := w1^;

z1 := tmp0 + tmp3;
z2 := tmp1 + tmp2;
z3 := tmp0 + tmp2;
z4 := tmp1 + tmp3;
z5 := MULTIPLY(z3 + z4, FIX_1_175875602); { sqrt(2) * c3 }

tmp0 := MULTIPLY(tmp0, FIX_0_298631336); { sqrt(2) * (-c1+c3+c5-c7) }
tmp1 := MULTIPLY(tmp1, FIX_2_053119869); { sqrt(2) * ( c1+c3-c5+c7) }
tmp2 := MULTIPLY(tmp2, FIX_3_072711026); { sqrt(2) * ( c1+c3+c5-c7) }
tmp3 := MULTIPLY(tmp3, FIX_1_501321110); { sqrt(2) * ( c1+c3-c5-c7) }
z1 := MULTIPLY(z1, - FIX_0_899976223); { sqrt(2) * (c7-c3) }
z2 := MULTIPLY(z2, - FIX_2_562915447); { sqrt(2) * (-c1-c3) }
z3 := MULTIPLY(z3, - FIX_1_961570560); { sqrt(2) * (-c3-c5) }
z4 := MULTIPLY(z4, - FIX_0_390180644); { sqrt(2) * (c5-c3) }

Inc(z3, z5);
Inc(z4, z5);

Inc(tmp0, z1 + z3);
Inc(tmp1, z2 + z4);
Inc(tmp2, z2 + z3);
Inc(tmp3, z1 + z4);

{ Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 }

s0^ := RangeLimit(tmp10 + tmp3);
s7^ := RangeLimit(tmp10 - tmp3);
s1^ := RangeLimit(tmp11 + tmp2);
s6^ := RangeLimit(tmp11 - tmp2);
s2^ := RangeLimit(tmp12 + tmp1);
s5^ := RangeLimit(tmp12 - tmp1);
s3^ := RangeLimit(tmp13 + tmp0);
s4^ := RangeLimit(tmp13 - tmp0);

end;
// Advance block pointers
inc(s0, 8); inc(s1, 8); inc(s2, 8); inc(s3, 8);
inc(s4, 8); inc(s5, 8); inc(s6, 8); inc(s7, 8);
inc(w0, 8); inc(w1, 8); inc(w2, 8); inc(w3, 8);
inc(w4, 8); inc(w5, 8); inc(w6, 8); inc(w7, 8);
end;
end;

Dennis
Posted: Sat Apr 28, 2007 3:17 pm    Post subject: Re: MMX for inverse DCT

Hi

Who volunteers to do the B&V?

Best regards
Dennis Kjaer Christensen

Dennis
Posted: Sat Apr 28, 2007 10:59 pm    Post subject: Re: MMX for inverse DCT

Hi Nils

Have you posted the type declarations?

TsdCoefBlock
TsdSampleBlock
TsdIntArray64

I am working on a 0.0.1 B&V. As the numbers say, it is just a very basic
start ;-)

Best regards
Dennis Kjaer Christensen

Nils Haeck
Posted: Mon Apr 30, 2007 4:14 am    Post subject: Re: MMX for inverse DCT

Hi Dennis,

Even for a very basic start: thanks a lot!

Maybe a stupid question, but what does "B&V" mean?

Here are the declarations:

TsdCoefBlock = array[0..63] of smallint;
PsdCoefBlock = ^TsdCoefBlock;
TsdSampleBlock = array[0..63] of byte;
PsdSampleBlock = ^TsdSampleBlock;
TsdIntArray64 = array[0..63] of integer;

And here are the constants + additional functions, just in case:

const
// we use 9 bits of precision, so must multiply by 2^9
cIAccConstScale = 1 shl 9;

const
FIX_0_298631336 = Round(cIAccConstScale * 0.298631336);
FIX_0_390180644 = Round(cIAccConstScale * 0.390180644);
FIX_0_541196100 = Round(cIAccConstScale * 0.541196100);
FIX_0_765366865 = Round(cIAccConstScale * 0.765366865);
FIX_0_899976223 = Round(cIAccConstScale * 0.899976223);
FIX_1_175875602 = Round(cIAccConstScale * 1.175875602);
FIX_1_501321110 = Round(cIAccConstScale * 1.501321110);
FIX_1_847759065 = Round(cIAccConstScale * 1.847759065);
FIX_1_961570560 = Round(cIAccConstScale * 1.961570560);
FIX_2_053119869 = Round(cIAccConstScale * 2.053119869);
FIX_2_562915447 = Round(cIAccConstScale * 2.562915447);
FIX_3_072711026 = Round(cIAccConstScale * 3.072711026);

// integer multiply with shift arithmetic right
function Multiply(A, B: integer): integer;
begin
  // Delphi seems to convert the "div" here to SAR just fine (D7), so we
  // don't use ASM but plain pascal
  Result := (A * B) div (1 shl 9);
end;

// Descale and range limit to byte domain. We shift right over
// 12 bits: 9 bits to remove precision, and 3 bits to get rid of the
// additional factor 8 introduced by the IDCT transform.
function RangeLimit(A: integer): integer;
begin
  // Delphi seems to convert the "div" here to SAR just fine (D7), so we
  // don't use ASM but plain pascal
  Result := A div (1 shl 12) + 128;
  if Result < 0 then
    Result := 0
  else
    if Result > 255 then
      Result := 255;
end;
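
A quick sketch to sanity-check the helpers (the program name is made up). The fast routine also references FIX_1_082392200, FIX_1_414213562 and FIX_2_613125930, which are not in the list above; the definitions below assume they follow the same Round(cIAccConstScale * x) pattern.

```pascal
program RangeLimitDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

const
  cIAccConstScale = 1 shl 9;
  // Assumption: same pattern as the listed constants
  FIX_1_082392200 = Round(cIAccConstScale * 1.082392200);
  FIX_1_414213562 = Round(cIAccConstScale * 1.414213562);
  FIX_2_613125930 = Round(cIAccConstScale * 2.613125930);

// Same logic as RangeLimit above: drop 12 bits, add the 128 level
// shift, then clamp to the byte range 0..255
function RangeLimit(A: integer): integer;
begin
  Result := A div (1 shl 12) + 128;
  if Result < 0 then
    Result := 0
  else if Result > 255 then
    Result := 255;
end;

begin
  WriteLn(FIX_1_082392200, ' ', FIX_1_414213562, ' ', FIX_2_613125930);
  // prints 554 724 1338
  WriteLn(RangeLimit(0));            // prints 128 (mid grey)
  WriteLn(RangeLimit(100 * 4096));   // prints 228
  WriteLn(RangeLimit(200 * 4096));   // prints 255 (clamped high)
  WriteLn(RangeLimit(-200 * 4096));  // prints 0 (clamped low)
end.
```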

Lord Crc
Posted: Mon Apr 30, 2007 4:34 pm    Post subject: Re: MMX for inverse DCT

On Mon, 30 Apr 2007 01:14:04 +0200, "Nils Haeck" <bla (AT) bla (DOT) com> wrote:

Quote:
Maybe a stupid question, but what means "B&V"?

Benchmark and Validation, used to time and verify entries :)

- Asbjørn
Back to top
Dennis
Guest





PostPosted: Tue May 01, 2007 10:41 pm    Post subject: Re: MMX for inverse DCT Reply with quote

Hi

Quote:
Benchmark and Validation, used to time and verify entries :)

Yes. We have one B&V tool for each challenge.

http://fastcode.sourceforge.net/

Best regards
Dennis Kjaer Christensen

Sanyin
Posted: Mon May 07, 2007 8:04 pm    Post subject: Re: MMX for inverse DCT

"Nils Haeck" <bla (AT) bla (DOT) com> wrote in message
news:462e8f35 (AT) newsgroups (DOT) borland.com...
Quote:
Hi Guys,

Please send me a copy of your codec; I have some IDCT asm implementations.


prevodilac @ hotmail.com

Nils Haeck
Posted: Tue May 08, 2007 2:11 am    Post subject: Re: MMX for inverse DCT

Do you have a proper website + email address instead of a hotmail one?

Nils
www.simdesign.nl

Sanyin
Posted: Tue May 08, 2007 8:12 am    Post subject: Re: MMX for inverse DCT

"Nils Haeck" <bla (AT) bla (DOT) com> wrote in message
news:463f93c8 (AT) newsgroups (DOT) borland.com...
Quote:
Do you have a proper website + email address instead of a hotmail one?

Not yet, sorry.
I wonder if this SIMD jpeg library can be used with Delphi (compiling with
BC++).
I need to load cropped jpegs as fast as possible (for thumbnails) and to read
CMYK images, so your library would be a nice solution.
If you can't share it, I'll try to build my own.
Thanks.
All times are GMT