BorlandTalk.com Borland discussion newsgroups

Nils Haeck
Posted: Wed Apr 25, 2007 4:23 am   Post subject: MMX for inverse DCT

Hi Guys,
For my JPEG codec, I'd like to create an MMX-enabled version of the fast
inverse discrete cosine transform (IDCT). Here's the salient piece of code in
Pascal:
// Even part
tmp0 := p0^ * Quant[QIdx ];
tmp1 := p2^ * Quant[QIdx + 16];
tmp2 := p4^ * Quant[QIdx + 32];
tmp3 := p6^ * Quant[QIdx + 48];
tmp10 := tmp0 + tmp2;
tmp11 := tmp0 - tmp2;
tmp13 := tmp1 + tmp3;
tmp12 := Multiply(tmp1 - tmp3, FIX_1_414213562) - tmp13;
tmp0 := tmp10 + tmp13;
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;
// Odd part
tmp4 := p1^ * Quant[QIdx + 8];
tmp5 := p3^ * Quant[QIdx + 24];
tmp6 := p5^ * Quant[QIdx + 40];
tmp7 := p7^ * Quant[QIdx + 56];
z13 := tmp6 + tmp5;
z10 := tmp6 - tmp5;
z11 := tmp4 + tmp7;
z12 := tmp4 - tmp7;
tmp7 := z11 + z13;
tmp11 := Multiply(z11 - z13, FIX_1_414213562);
z5 := Multiply(z10 + z12, FIX_1_847759065);
tmp10 := Multiply(z12, FIX_1_082392200) - z5;
tmp12 := Multiply(z10, - FIX_2_613125930) + z5;
tmp6 := tmp12 - tmp7;
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;
w0^ := tmp0 + tmp7;
w7^ := tmp0 - tmp7;
w1^ := tmp1 + tmp6;
w6^ := tmp1 - tmp6;
w2^ := tmp2 + tmp5;
w5^ := tmp2 - tmp5;
w4^ := tmp3 + tmp4;
w3^ := tmp3 - tmp4;
p0..p7 point to the (smallint) coefficients and w0..w7 to integers in the
workspace; the routine works with 4-byte integers. The Multiply function is this:
function Multiply(A, B: integer): integer;
begin
Result := (A * B) div (1 shl 9); // seems to be converted to SAR by D7 fine
end;
The Quant variable is just an array of 64 integers, and the FIX_* values are
integer constants.
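One detail matters before porting Multiply to MMX: Pascal's `div` truncates toward zero, while an arithmetic right shift (x86 `SAR`, MMX `PSRAD`) rounds toward minus infinity. To keep `div` semantics, a compiler has to add a correction for negative operands before the shift; a bare `PSRAD` has no such correction, so a SIMD version may differ by one from the Pascal reference for negative products. The sketch below shows both behaviours side by side; `DescaleDiv` and `DescaleSar` are hypothetical names, not part of the codec.

```pascal
// Descale a fixed-point product by 2^9 the way the posted Multiply does:
// "div" truncates toward zero.
function DescaleDiv(P: longint): longint;
begin
  DescaleDiv := P div 512;
end;

// Descale the way an arithmetic right shift by 9 (SAR / PSRAD) does:
// the result is rounded toward minus infinity. Delphi's "shr" is a
// logical shift, so negative inputs go through the bitwise complement,
// using floor(P / 512) = not ((not P) shr 9) for P < 0.
function DescaleSar(P: longint): longint;
begin
  if P >= 0 then
    DescaleSar := P shr 9
  else
    DescaleSar := not ((not P) shr 9);
end;
```

For example, DescaleDiv(-513) gives -1 while DescaleSar(-513) gives -2. If a SIMD version has to match the Pascal routine bit for bit, one option is to add the bias 511 to negative products before the arithmetic shift, since (P + 511) shifted right by 9 equals P div 512 for P < 0.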
The code above does one 1-D pass over a single column of the 8x8 block, so it
is repeated 8 times. I was thinking I could use MMX to process two columns
simultaneously (so the loop runs only 4 times), with each MMX register holding
two 4-byte integers.
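A rough way to prototype the two-at-a-time idea before writing any assembler is to model one MMX register as a pair of 32-bit lanes and run the scalar code lane by lane. The sketch below does this for the even part of the fast column pass; `TPair`, `TEven2`, `Pair`, `PairAdd`, `PairSub`, `PairMul` and `EvenPart2` are hypothetical names. The additions and subtractions map onto `PADDD`/`PSUBD`, but MMX has no packed 32x32-bit multiply (its multiplies, `PMULLW`, `PMULHW` and `PMADDWD`, take 16-bit inputs; `PMULLD` only arrived with SSE4.1), so `PairMul` would need 16-bit data or a different instruction set.

```pascal
type
  // Hypothetical model of one MMX register holding two 32-bit lanes.
  TPair = record
    L0, L1: longint;
  end;
  // The four even-part outputs (tmp0..tmp3) for two columns at once.
  TEven2 = record
    T0, T1, T2, T3: TPair;
  end;

const
  FIX_1_414213562 = 724; // Round(512 * 1.414213562)

function Pair(A, B: longint): TPair;
var
  R: TPair;
begin
  R.L0 := A;
  R.L1 := B;
  Pair := R;
end;

// Lane-wise add, as PADDD does it.
function PairAdd(const A, B: TPair): TPair;
var
  R: TPair;
begin
  R.L0 := A.L0 + B.L0;
  R.L1 := A.L1 + B.L1;
  PairAdd := R;
end;

// Lane-wise subtract, as PSUBD does it.
function PairSub(const A, B: TPair): TPair;
var
  R: TPair;
begin
  R.L0 := A.L0 - B.L0;
  R.L1 := A.L1 - B.L1;
  PairSub := R;
end;

// Lane-wise version of the thread's Multiply. MMX has no single
// instruction for this on 32-bit lanes.
function PairMul(const A: TPair; C: longint): TPair;
var
  R: TPair;
begin
  R.L0 := (A.L0 * C) div 512;
  R.L1 := (A.L1 * C) div 512;
  PairMul := R;
end;

// Even part of the fast column pass for two adjacent columns.
// X0, X2, X4, X6 hold the dequantized inputs from rows 0, 2, 4 and 6.
function EvenPart2(const X0, X2, X4, X6: TPair): TEven2;
var
  tmp10, tmp11, tmp12, tmp13: TPair;
  E: TEven2;
begin
  tmp10 := PairAdd(X0, X4);
  tmp11 := PairSub(X0, X4);
  tmp13 := PairAdd(X2, X6);
  tmp12 := PairSub(PairMul(PairSub(X2, X6), FIX_1_414213562), tmp13);
  E.T0 := PairAdd(tmp10, tmp13);
  E.T3 := PairSub(tmp10, tmp13);
  E.T1 := PairAdd(tmp11, tmp12);
  E.T2 := PairSub(tmp11, tmp12);
  EvenPart2 := E;
end;
```

Because each lane only ever sees its own column, the results match the scalar code column by column. The register-pressure question then becomes whether the dozen or so live temporaries fit into the eight MMX registers (MM0..MM7).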
There's also some range limiting going on later in the routine, for which (I
think) MMX is also suitable.
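The range limiting does map well onto the MMX packing instructions, which saturate instead of wrapping around: `PACKSSDW` clamps signed 32-bit lanes to signed 16 bits, and `PACKUSWB` clamps signed 16-bit words to unsigned bytes. Chaining the two clamps gives exactly a clamp to 0..255, so the compare-and-branch in the RangeLimit function Nils posts later in the thread can disappear. The Pascal below only models that behaviour; the function names are hypothetical, and the descale uses `div` so the result can be compared directly with RangeLimit.

```pascal
// Model of PACKSSDW on one lane: signed 32-bit -> signed 16-bit, saturating.
function SatS16(X: longint): longint;
begin
  if X > 32767 then
    SatS16 := 32767
  else if X < -32768 then
    SatS16 := -32768
  else
    SatS16 := X;
end;

// Model of PACKUSWB on one lane: signed 16-bit -> unsigned 8-bit, saturating.
function SatU8(X: longint): longint;
begin
  if X > 255 then
    SatU8 := 255
  else if X < 0 then
    SatU8 := 0
  else
    SatU8 := X;
end;

// Descale by 2^12 and level-shift by 128 on 32-bit values, then clamp
// to 0..255 through the two saturating packs.
function RangeLimitSat(A: longint): longint;
begin
  RangeLimitSat := SatU8(SatS16(A div 4096 + 128));
end;
```

The descale and the +128 level shift still have to happen on 32-bit lanes before packing; only the final clamp comes for free.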
Questions:
- Is the above approach feasible, or are there too many temporary variables,
so that I would end up just copying back and forth between MMX registers and
memory?
- Would SSE be a better choice (perhaps with single-precision floating point)?
- Would plain ASM (without MMX) also help?
- Would multi-threading be a better option? The IDCT for the whole image can
be nicely split into a number of subbands spread out over threads, since
they're all independent. I know that this won't improve things on a
single-core CPU, but with the multi-core CPUs that seem to be becoming
popular, it might help.
- Is anyone interested in implementing this? I would give this person a free
copy of my complete JPEG codec library. Just a partial implementation would
help too. I can send anyone interested the unit + a test unit.
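On the multi-threading question: since every 8x8 block is transformed independently, the image can be split into horizontal bands of block rows, one band per thread. The full routines posted later in the thread take the workspace as a `var Wrksp` parameter, so each thread would need its own workspace array rather than a shared one. The helper below is a hypothetical sketch of the band arithmetic only; creating and joining the threads (for example with Delphi's TThread) is left out.

```pascal
// Hypothetical helper: first block row of band Index when NumRows rows of
// 8x8 blocks are split into NumBands bands. Band Index covers the rows
// BandStart(Index) .. BandStart(Index + 1) - 1.
function BandStart(Index, NumRows, NumBands: longint): longint;
begin
  BandStart := (Index * NumRows) div NumBands;
end;

// Number of block rows in band Index; band sizes differ by at most one.
function BandSize(Index, NumRows, NumBands: longint): longint;
begin
  BandSize := BandStart(Index + 1, NumRows, NumBands) -
    BandStart(Index, NumRows, NumBands);
end;
```

For a 600-pixel-high image (75 block rows) and 4 threads, the bands start at block rows 0, 18, 37 and 56, giving bands of 18, 19, 19 and 19 rows.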
Kind regards,
Nils Haeck
www.simdesign.nl

Dennis
Posted: Thu Apr 26, 2007 12:02 am   Post subject: Re: MMX for inverse DCT

Hi Nils
I see this as a good Fastcode challenge. I planned to do it a long time ago,
but never got to it.
What do others think?
Best regards
Dennis Kjaer Christensen

Nils Haeck
Posted: Fri Apr 27, 2007 3:39 am   Post subject: Re: MMX for inverse DCT

That would be cool!
Of course the IDCT is used not only in JPEG but also in MPEG, so a *fast*
IDCT routine will find uses in many places.
Nils

Nils Haeck
Posted: Fri Apr 27, 2007 3:47 am   Post subject: Re: MMX for inverse DCT

FWIW, there are two versions: a fast one and an accurate one. Here are the
complete routines.
The integer multiplies work with 9-bit precision, so the FIX constants can be
calculated as, e.g.:
FIX_0_298631336 = Round(1 shl 9 * 0.298631336)
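As a quick check of what 9-bit precision means in practice, the sketch below computes FIX constants with the rule above and applies the thread's Multiply to one of them. `FixConst` is a hypothetical helper; `Multiply` and `cIAccConstScale` follow the definitions Nils posts later in the thread, with `longint` used explicitly so the integers are 32-bit in any compiler mode.

```pascal
const
  cIAccConstScale = 1 shl 9; // 9 bits of precision, as in the thread

// The thread's Multiply: fixed-point product, descaled with "div".
function Multiply(A, B: longint): longint;
begin
  Multiply := (A * B) div cIAccConstScale;
end;

// Hypothetical helper: fixed-point constant for a real factor.
function FixConst(F: double): longint;
begin
  FixConst := Round(cIAccConstScale * F);
end;
```

FixConst(1.414213562) gives 724 and FixConst(0.298631336) gives 153. Multiply(1000, 724) returns 1414, against an exact 1000 * sqrt(2) of about 1414.2; the absolute error grows with the size of the operand, since 724/512 = 1.4140625 is slightly below sqrt(2).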
I'm interested in MMX/SSE routines for both, although the fast version is
most important. Users that want to use the accurate routine expect it to
take longer anyway.
There are also versions for 4x4 and 2x2 (used to get downscaled JPEG
images).
Nils
procedure InverseDCTIntFast8x8(var Coef: TsdCoefBlock; var Sample:
TsdSampleBlock; var Quant, Wrksp: TsdIntArray64);
var
i, QIdx: integer;
dci: integer;
dcs: byte;
p0, p1, p2, p3, p4, p5, p6, p7: Psmallint;
w0, w1, w2, w3, w4, w5, w6, w7: Pinteger;
s0, s1, s2, s3, s4, s5, s6, s7: Pbyte;
tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7: integer;
tmp10, tmp11, tmp12, tmp13: integer;
z5, z10, z11, z12, z13: integer;
begin
QIdx := 0;
// First do the columns
p0 := @Coef[ 0]; p1 := @Coef[ 8]; p2 := @Coef[16]; p3 := @Coef[24];
p4 := @Coef[32]; p5 := @Coef[40]; p6 := @Coef[48]; p7 := @Coef[56];
w0 := @Wrksp[ 0]; w1 := @Wrksp[ 8]; w2 := @Wrksp[16]; w3 := @Wrksp[24];
w4 := @Wrksp[32]; w5 := @Wrksp[40]; w6 := @Wrksp[48]; w7 := @Wrksp[56];
for i := 0 to 7 do
begin
if (p1^ = 0) and (p2^ = 0) and (p3^ = 0) and (p4^ = 0) and
(p5^ = 0) and (p6^ = 0) and (p7^ = 0) then
begin
dci := p0^ * Quant[QIdx];
w0^ := dci; w1^ := dci; w2^ := dci; w3^ := dci;
w4^ := dci; w5^ := dci; w6^ := dci; w7^ := dci;
end else
begin
// Even part
tmp0 := p0^ * Quant[QIdx ];
tmp1 := p2^ * Quant[QIdx + 16];
tmp2 := p4^ * Quant[QIdx + 32];
tmp3 := p6^ * Quant[QIdx + 48];
tmp10 := tmp0 + tmp2; // phase 3
tmp11 := tmp0 - tmp2;
tmp13 := tmp1 + tmp3; // phases 5-3
tmp12 := Multiply(tmp1 - tmp3, FIX_1_414213562) - tmp13; // 2*c4
tmp0 := tmp10 + tmp13; // phase 2
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;
// Odd part
tmp4 := p1^ * Quant[QIdx + 8];
tmp5 := p3^ * Quant[QIdx + 24];
tmp6 := p5^ * Quant[QIdx + 40];
tmp7 := p7^ * Quant[QIdx + 56];
z13 := tmp6 + tmp5; // phase 6
z10 := tmp6 - tmp5;
z11 := tmp4 + tmp7;
z12 := tmp4 - tmp7;
tmp7 := z11 + z13; // phase 5
tmp11 := Multiply(z11 - z13, FIX_1_414213562); // 2*c4
z5 := Multiply(z10 + z12, FIX_1_847759065); // 2*c2
tmp10 := Multiply(z12, FIX_1_082392200) - z5; // 2*(c2-c6)
tmp12 := Multiply(z10, - FIX_2_613125930) + z5; // -2*(c2+c6)
tmp6 := tmp12 - tmp7; // phase 2
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;
w0^ := tmp0 + tmp7;
w7^ := tmp0 - tmp7;
w1^ := tmp1 + tmp6;
w6^ := tmp1 - tmp6;
w2^ := tmp2 + tmp5;
w5^ := tmp2 - tmp5;
w4^ := tmp3 + tmp4;
w3^ := tmp3 - tmp4;
end;
// Advance block pointers
inc(p0); inc(p1); inc(p2); inc(p3); inc(p4); inc(p5); inc(p6); inc(p7);
inc(w0); inc(w1); inc(w2); inc(w3); inc(w4); inc(w5); inc(w6); inc(w7);
inc(QIdx);
end;
// Next do the rows
w0 := @Wrksp[0]; w1 := @Wrksp[1]; w2 := @Wrksp[2]; w3 := @Wrksp[3];
w4 := @Wrksp[4]; w5 := @Wrksp[5]; w6 := @Wrksp[6]; w7 := @Wrksp[7];
s0 := @Sample[0]; s1 := @Sample[1]; s2 := @Sample[2]; s3 := @Sample[3];
s4 := @Sample[4]; s5 := @Sample[5]; s6 := @Sample[6]; s7 := @Sample[7];
for i := 0 to 7 do
begin
if (w1^ = 0) and (w2^ = 0) and (w3^ = 0) and (w4^ = 0) and
(w5^ = 0) and (w6^ = 0) and (w7^ = 0) then
begin
dcs := RangeLimit(w0^);
s0^ := dcs; s1^ := dcs; s2^ := dcs; s3^ := dcs;
s4^ := dcs; s5^ := dcs; s6^ := dcs; s7^ := dcs;
end else
begin
// Even part
tmp10 := w0^ + w4^;
tmp11 := w0^ - w4^;
tmp13 := w2^ + w6^;
tmp12 := Multiply(w2^ - w6^, FIX_1_414213562) - tmp13;
tmp0 := tmp10 + tmp13;
tmp3 := tmp10 - tmp13;
tmp1 := tmp11 + tmp12;
tmp2 := tmp11 - tmp12;
// Odd part
z13 := w5^ + w3^;
z10 := w5^ - w3^;
z11 := w1^ + w7^;
z12 := w1^ - w7^;
tmp7 := z11 + z13; // phase 5
tmp11 := Multiply(z11 - z13, FIX_1_414213562); // 2*c4
z5 := Multiply(z10 + z12, FIX_1_847759065); // 2*c2
tmp10 := Multiply(z12, FIX_1_082392200) - z5; // 2*(c2-c6)
tmp12 := Multiply(z10, - FIX_2_613125930) + z5; // -2*(c2+c6)
tmp6 := tmp12 - tmp7; // phase 2
tmp5 := tmp11 - tmp6;
tmp4 := tmp10 + tmp5;
// Final output stage: scale down by a factor of 8 and range-limit
s0^ := RangeLimit(tmp0 + tmp7);
s7^ := RangeLimit(tmp0 - tmp7);
s1^ := RangeLimit(tmp1 + tmp6);
s6^ := RangeLimit(tmp1 - tmp6);
s2^ := RangeLimit(tmp2 + tmp5);
s5^ := RangeLimit(tmp2 - tmp5);
s4^ := RangeLimit(tmp3 + tmp4);
s3^ := RangeLimit(tmp3 - tmp4);
end;
// Advance block pointers
inc(s0, 8); inc(s1, 8); inc(s2, 8); inc(s3, 8);
inc(s4, 8); inc(s5, 8); inc(s6, 8); inc(s7, 8);
inc(w0, 8); inc(w1, 8); inc(w2, 8); inc(w3, 8);
inc(w4, 8); inc(w5, 8); inc(w6, 8); inc(w7, 8);
end;
end;
procedure InverseDCTIntAccurate8x8(var Coef: TsdCoefBlock; var Sample:
TsdSampleBlock; var Quant, Wrksp: TsdIntArray64);
var
i, QIdx: integer;
dci: integer;
dcs: byte;
p0, p1, p2, p3, p4, p5, p6, p7: Psmallint;
w0, w1, w2, w3, w4, w5, w6, w7: Pinteger;
s0, s1, s2, s3, s4, s5, s6, s7: Pbyte;
z1, z2, z3, z4, z5, z10, z11, z12, z13: integer;
tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7, tmp10, tmp11, tmp12,
tmp13: integer;
begin
QIdx := 0;
// First do the columns
p0 := @Coef[ 0]; p1 := @Coef[ 8]; p2 := @Coef[16]; p3 := @Coef[24];
p4 := @Coef[32]; p5 := @Coef[40]; p6 := @Coef[48]; p7 := @Coef[56];
w0 := @Wrksp[ 0]; w1 := @Wrksp[ 8]; w2 := @Wrksp[16]; w3 := @Wrksp[24];
w4 := @Wrksp[32]; w5 := @Wrksp[40]; w6 := @Wrksp[48]; w7 := @Wrksp[56];
for i := 0 to 7 do
begin
if (p1^ = 0) and (p2^ = 0) and (p3^ = 0) and (p4^ = 0) and
(p5^ = 0) and (p6^ = 0) and (p7^ = 0) then
begin
dci := p0^ * Quant[QIdx];
w0^ := dci; w1^ := dci; w2^ := dci; w3^ := dci;
w4^ := dci; w5^ := dci; w6^ := dci; w7^ := dci;
end else
begin
{ Even part: reverse the even part of the forward DCT. }
{ The rotator is sqrt(2)*c(-6). }
z2 := p2^ * Quant[QIdx + 2 * 8];
z3 := p6^ * Quant[QIdx + 6 * 8];
z1 := MULTIPLY(z2 + z3, FIX_0_541196100);
tmp2 := z1 + MULTIPLY(z3, - FIX_1_847759065);
tmp3 := z1 + MULTIPLY(z2, FIX_0_765366865);
z2 := p0^ * Quant[QIdx + 0 * 8];
z3 := p4^ * Quant[QIdx + 4 * 8];
tmp0 := (z2 + z3);
tmp1 := (z2 - z3);
tmp10 := tmp0 + tmp3;
tmp13 := tmp0 - tmp3;
tmp11 := tmp1 + tmp2;
tmp12 := tmp1 - tmp2;
{ Odd part per figure 8; the matrix is unitary and hence its
transpose is its inverse. i0..i3 are y7,y5,y3,y1 respectively. }
tmp0 := p7^ * Quant[QIdx + 7 * 8];
tmp1 := p5^ * Quant[QIdx + 5 * 8];
tmp2 := p3^ * Quant[QIdx + 3 * 8];
tmp3 := p1^ * Quant[QIdx + 1 * 8];
z1 := tmp0 + tmp3;
z2 := tmp1 + tmp2;
z3 := tmp0 + tmp2;
z4 := tmp1 + tmp3;
z5 := MULTIPLY(z3 + z4, FIX_1_175875602); { sqrt(2) * c3 }
tmp0 := MULTIPLY(tmp0, FIX_0_298631336); { sqrt(2) * (-c1+c3+c5-c7) }
tmp1 := MULTIPLY(tmp1, FIX_2_053119869); { sqrt(2) * ( c1+c3-c5+c7) }
tmp2 := MULTIPLY(tmp2, FIX_3_072711026); { sqrt(2) * ( c1+c3+c5-c7) }
tmp3 := MULTIPLY(tmp3, FIX_1_501321110); { sqrt(2) * ( c1+c3-c5-c7) }
z1 := MULTIPLY(z1, - FIX_0_899976223); { sqrt(2) * (c7-c3) }
z2 := MULTIPLY(z2, - FIX_2_562915447); { sqrt(2) * (-c1-c3) }
z3 := MULTIPLY(z3, - FIX_1_961570560); { sqrt(2) * (-c3-c5) }
z4 := MULTIPLY(z4, - FIX_0_390180644); { sqrt(2) * (c5-c3) }
Inc(z3, z5);
Inc(z4, z5);
Inc(tmp0, z1 + z3);
Inc(tmp1, z2 + z4);
Inc(tmp2, z2 + z3);
Inc(tmp3, z1 + z4);
{ Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 }
w0^ := tmp10 + tmp3;
w7^ := tmp10 - tmp3;
w1^ := tmp11 + tmp2;
w6^ := tmp11 - tmp2;
w2^ := tmp12 + tmp1;
w5^ := tmp12 - tmp1;
w3^ := tmp13 + tmp0;
w4^ := tmp13 - tmp0;
end;
// Advance block pointers
inc(p0); inc(p1); inc(p2); inc(p3); inc(p4); inc(p5); inc(p6); inc(p7);
inc(w0); inc(w1); inc(w2); inc(w3); inc(w4); inc(w5); inc(w6); inc(w7);
inc(QIdx);
end;
// Next do the rows
w0 := @Wrksp[0]; w1 := @Wrksp[1]; w2 := @Wrksp[2]; w3 := @Wrksp[3];
w4 := @Wrksp[4]; w5 := @Wrksp[5]; w6 := @Wrksp[6]; w7 := @Wrksp[7];
s0 := @Sample[0]; s1 := @Sample[1]; s2 := @Sample[2]; s3 := @Sample[3];
s4 := @Sample[4]; s5 := @Sample[5]; s6 := @Sample[6]; s7 := @Sample[7];
for i := 0 to 7 do
begin
if (w1^ = 0) and (w2^ = 0) and (w3^ = 0) and (w4^ = 0) and
(w5^ = 0) and (w6^ = 0) and (w7^ = 0) then
begin
dcs := RangeLimit(w0^);
s0^ := dcs; s1^ := dcs; s2^ := dcs; s3^ := dcs;
s4^ := dcs; s5^ := dcs; s6^ := dcs; s7^ := dcs;
end else
begin
{ Even part: reverse the even part of the forward DCT. }
{ The rotator is sqrt(2)*c(-6). }
z2 := w2^;
z3 := w6^;
z1 := MULTIPLY(z2 + z3, FIX_0_541196100);
tmp2 := z1 + MULTIPLY(z3, - FIX_1_847759065);
tmp3 := z1 + MULTIPLY(z2, FIX_0_765366865);
tmp0 := w0^ + w4^;
tmp1 := w0^ - w4^;
tmp10 := tmp0 + tmp3;
tmp13 := tmp0 - tmp3;
tmp11 := tmp1 + tmp2;
tmp12 := tmp1 - tmp2;
{ Odd part per figure 8; the matrix is unitary and hence its
transpose is its inverse. i0..i3 are y7,y5,y3,y1 respectively. }
tmp0 := w7^;
tmp1 := w5^;
tmp2 := w3^;
tmp3 := w1^;
z1 := tmp0 + tmp3;
z2 := tmp1 + tmp2;
z3 := tmp0 + tmp2;
z4 := tmp1 + tmp3;
z5 := MULTIPLY(z3 + z4, FIX_1_175875602); { sqrt(2) * c3 }
tmp0 := MULTIPLY(tmp0, FIX_0_298631336); { sqrt(2) * (-c1+c3+c5-c7) }
tmp1 := MULTIPLY(tmp1, FIX_2_053119869); { sqrt(2) * ( c1+c3-c5+c7) }
tmp2 := MULTIPLY(tmp2, FIX_3_072711026); { sqrt(2) * ( c1+c3+c5-c7) }
tmp3 := MULTIPLY(tmp3, FIX_1_501321110); { sqrt(2) * ( c1+c3-c5-c7) }
z1 := MULTIPLY(z1, - FIX_0_899976223); { sqrt(2) * (c7-c3) }
z2 := MULTIPLY(z2, - FIX_2_562915447); { sqrt(2) * (-c1-c3) }
z3 := MULTIPLY(z3, - FIX_1_961570560); { sqrt(2) * (-c3-c5) }
z4 := MULTIPLY(z4, - FIX_0_390180644); { sqrt(2) * (c5-c3) }
Inc(z3, z5);
Inc(z4, z5);
Inc(tmp0, z1 + z3);
Inc(tmp1, z2 + z4);
Inc(tmp2, z2 + z3);
Inc(tmp3, z1 + z4);
{ Final output stage: inputs are tmp10..tmp13, tmp0..tmp3 }
s0^ := RangeLimit(tmp10 + tmp3);
s7^ := RangeLimit(tmp10 - tmp3);
s1^ := RangeLimit(tmp11 + tmp2);
s6^ := RangeLimit(tmp11 - tmp2);
s2^ := RangeLimit(tmp12 + tmp1);
s5^ := RangeLimit(tmp12 - tmp1);
s3^ := RangeLimit(tmp13 + tmp0);
s4^ := RangeLimit(tmp13 - tmp0);
end;
// Advance block pointers
inc(s0, 8); inc(s1, 8); inc(s2, 8); inc(s3, 8);
inc(s4, 8); inc(s5, 8); inc(s6, 8); inc(s7, 8);
inc(w0, 8); inc(w1, 8); inc(w2, 8); inc(w3, 8);
inc(w4, 8); inc(w5, 8); inc(w6, 8); inc(w7, 8);
end;
end;
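Both passes of InverseDCTIntFast8x8 run the same 8-point kernel; only the addressing and the dequantization differ. Pulling the kernel out into a function, as in the hypothetical sketch below, makes it easier to reason about a two-columns-at-once MMX version and to confirm that the all-zero-AC shortcut in each loop gives the same flat result as the full butterfly. The three FIX constants not listed in the thread's constant block (FIX_1_082392200, FIX_1_414213562, FIX_2_613125930) are assumed to use the same Round(512 * x) rule; `TVec8`, `Vec8`, `FastIDCT1D` and `IsFlat` are illustrative names.

```pascal
type
  TVec8 = array[0..7] of longint;

const
  // Assumed to follow the thread's Round(512 * x) rule.
  FIX_1_082392200 = 554;
  FIX_1_414213562 = 724;
  FIX_1_847759065 = 946;
  FIX_2_613125930 = 1338;

// The thread's Multiply, with an explicit 32-bit type.
function Multiply(A, B: longint): longint;
begin
  Multiply := (A * B) div 512;
end;

// Hypothetical helper to build a vector of eight values.
function Vec8(A0, A1, A2, A3, A4, A5, A6, A7: longint): TVec8;
var
  V: TVec8;
begin
  V[0] := A0; V[1] := A1; V[2] := A2; V[3] := A3;
  V[4] := A4; V[5] := A5; V[6] := A6; V[7] := A7;
  Vec8 := V;
end;

// One 8-point pass of the fast IDCT, copied from the column/row loops of
// InverseDCTIntFast8x8. X holds dequantized inputs in natural order.
function FastIDCT1D(const X: TVec8): TVec8;
var
  tmp0, tmp1, tmp2, tmp3, tmp4, tmp5, tmp6, tmp7: longint;
  tmp10, tmp11, tmp12, tmp13: longint;
  z5, z10, z11, z12, z13: longint;
  Y: TVec8;
begin
  // Even part
  tmp10 := X[0] + X[4];
  tmp11 := X[0] - X[4];
  tmp13 := X[2] + X[6];
  tmp12 := Multiply(X[2] - X[6], FIX_1_414213562) - tmp13;
  tmp0 := tmp10 + tmp13;
  tmp3 := tmp10 - tmp13;
  tmp1 := tmp11 + tmp12;
  tmp2 := tmp11 - tmp12;
  // Odd part
  z13 := X[5] + X[3];
  z10 := X[5] - X[3];
  z11 := X[1] + X[7];
  z12 := X[1] - X[7];
  tmp7 := z11 + z13;
  tmp11 := Multiply(z11 - z13, FIX_1_414213562);
  z5 := Multiply(z10 + z12, FIX_1_847759065);
  tmp10 := Multiply(z12, FIX_1_082392200) - z5;
  tmp12 := Multiply(z10, -FIX_2_613125930) + z5;
  tmp6 := tmp12 - tmp7;
  tmp5 := tmp11 - tmp6;
  tmp4 := tmp10 + tmp5;
  // Output butterflies, same order as the thread
  Y[0] := tmp0 + tmp7; Y[7] := tmp0 - tmp7;
  Y[1] := tmp1 + tmp6; Y[6] := tmp1 - tmp6;
  Y[2] := tmp2 + tmp5; Y[5] := tmp2 - tmp5;
  Y[4] := tmp3 + tmp4; Y[3] := tmp3 - tmp4;
  FastIDCT1D := Y;
end;

// True when every element of V equals D, i.e. what the DC-only
// shortcut in the thread writes out.
function IsFlat(const V: TVec8; D: longint): boolean;
var
  i: longint;
begin
  IsFlat := True;
  for i := 0 to 7 do
    if V[i] <> D then
      IsFlat := False;
end;
```

With only X[0] non-zero, every Multiply in the kernel receives 0, so the output is eight copies of X[0], which is exactly what the shortcut branch writes without doing the arithmetic.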

Dennis
Posted: Sat Apr 28, 2007 3:17 pm   Post subject: Re: MMX for inverse DCT

|
Hi
Who volunteers to do the B&V?
Best regards
Dennis Kjaer Christensen

Dennis
Posted: Sat Apr 28, 2007 10:59 pm   Post subject: Re: MMX for inverse DCT

Hi Nils
Have you posted the type declarations?
TsdCoefBlock
TsdSampleBlock
TsdIntArray64
I am working on a 0.0.1 B&V. As the version number says, it is just a very
basic start ;-)
Best regards
Dennis Kjaer Christensen

Nils Haeck
Posted: Mon Apr 30, 2007 4:14 am   Post subject: Re: MMX for inverse DCT

Hi Dennis,
Even for a very basic start: thanks a lot!
Maybe a stupid question, but what does "B&V" mean?
Here are the declarations:
TsdCoefBlock = array[0..63] of smallint;
PsdCoefBlock = ^TsdCoefBlock;
TsdSampleBlock = array[0..63] of byte;
PsdSampleBlock = ^TsdSampleBlock;
TsdIntArray64 = array[0..63] of integer;
And here are the constants + additional functions, just in case:
const
// we use 9 bits of precision, so must multiply by 2^9
cIAccConstScale = 1 shl 9;
const
FIX_0_298631336 = Round(cIAccConstScale * 0.298631336);
FIX_0_390180644 = Round(cIAccConstScale * 0.390180644);
FIX_0_541196100 = Round(cIAccConstScale * 0.541196100);
FIX_0_765366865 = Round(cIAccConstScale * 0.765366865);
FIX_0_899976223 = Round(cIAccConstScale * 0.899976223);
FIX_1_175875602 = Round(cIAccConstScale * 1.175875602);
FIX_1_501321110 = Round(cIAccConstScale * 1.501321110);
FIX_1_847759065 = Round(cIAccConstScale * 1.847759065);
FIX_1_961570560 = Round(cIAccConstScale * 1.961570560);
FIX_2_053119869 = Round(cIAccConstScale * 2.053119869);
FIX_2_562915447 = Round(cIAccConstScale * 2.562915447);
FIX_3_072711026 = Round(cIAccConstScale * 3.072711026);
// integer multiply with shift arithmetic right
function Multiply(A, B: integer): integer;
begin
// Delphi seems to convert the "div" here to SAR just fine (D7), so we
// don't use ASM but plain pascal
Result := (A * B) div (1 shl 9);
end;
// Descale and range limit to the byte domain. We shift right by
// 12 bits: 9 bits to remove the fixed-point precision, and 3 bits to get
// rid of the additional factor 8 introduced by the IDCT transform.
function RangeLimit(A: integer): integer;
begin
// Delphi seems to convert the "div" here to SAR just fine (D7), so we
// don't use ASM but plain pascal
Result := A div (1 shl 12) + 128;
if Result < 0 then
Result := 0
else
if Result > 255 then
Result := 255;
end;

Lord Crc
Posted: Mon Apr 30, 2007 4:34 pm   Post subject: Re: MMX for inverse DCT

On Mon, 30 Apr 2007 01:14:04 +0200, "Nils Haeck" <bla (AT) bla (DOT) com> wrote:
> Maybe a stupid question, but what does "B&V" mean?
Benchmark and Validation, used to time and verify entries :)
- Asbjørn

Dennis
Posted: Tue May 01, 2007 10:41 pm   Post subject: Re: MMX for inverse DCT

Hi
> Benchmark and Validation, used to time and verify entries
Yes. We have one B&V tool for each challenge:
http://fastcode.sourceforge.net/
Best regards
Dennis Kjaer Christensen

Sanyin
Posted: Mon May 07, 2007 8:04 pm   Post subject: Re: MMX for inverse DCT

"Nils Haeck" <bla (AT) bla (DOT) com> wrote in message
news:462e8f35 (AT) newsgroups (DOT) borland.com...
> Hi Guys,
Please send me a copy of your codec; I have some IDCT asm implementations.
prevodilac @ hotmail.com

Nils Haeck
Posted: Tue May 08, 2007 2:11 am   Post subject: Re: MMX for inverse DCT

Do you have a proper website + email address instead of a hotmail one?
Nils
www.simdesign.nl

Sanyin
Posted: Tue May 08, 2007 8:12 am   Post subject: Re: MMX for inverse DCT

"Nils Haeck" <bla (AT) bla (DOT) com> wrote in message
news:463f93c8 (AT) newsgroups (DOT) borland.com...
> Do you have a proper website + email address instead of a hotmail one?
Not yet, sorry.
I wonder whether this SIMD JPEG library can be used with Delphi (compiling
with BC++).
I need to load cropped JPEGs as fast as possible (for thumbnails) and to read
CMYK images, so your library would be a nice solution.
If you can't share it, I'll try to build my own.
Thanks.