Post #72,433
1/4/03 8:50:00 AM
|
I've done the subtract
It returns 0 as near as I can tell.
The point is, regardless of error the calculation should be deterministic. If you do exactly the same thing on each side of the equality, you should get exactly the same result.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,461
1/4/03 12:57:27 PM
|
Assembly dump?
From my gcc .s intermediate file I get the following sequence (note I placed them side by side for comparison purposes - the sequence is to run thru the left side first): \nLC1:\n .long 0x0,0x40240000\n\n pushl $3 pushl $3\n fldl LC1 fldl LC1\n subl $8,%esp subl $8,%esp\n fstpl (%esp) fstpl (%esp)\n call _pow call _pow\n addl $16,%esp addl $16,%esp\n movl %eax,-4(%ebp) movl %eax,-4(%ebp)\n movl $123456,%ecx movl $123456,%ecx\n movl %ecx,%eax movl %ecx,%eax\n cltd cltd\n idivl -4(%ebp) idivl -4(%ebp)\n movl %eax,%ebx movl %eax,-4(%ebp)\n\n addl $-4,%esp cmpl -4(%ebp),%ebx\n Other than what registers (and offsets) are being used, the operations appear identical. On my tests, the equality test did return true.
|
Post #72,685
1/5/03 2:11:21 PM
|
I did my dump with GDB
same result. Except equality returns "false". And, when I save the results of 2 computations in variables, they have the same value, bit by bit. I tried float, double and long double.
I have a hypothesis: what if pow uses random rounding to help eliminate rounding errors? It's so random that single command difference in calling seq can affect it. Far fetched, I know.
--
We have only 2 things to worry about: That things will never get back to normal, and that they already have.
|
Post #72,893
1/6/03 10:34:38 AM
|
What I get:
\n\tmovl\t$0, 8(%esp)\n\tmovl\t$1074266112, 12(%esp)\n\tmovl\t$0, (%esp)\n\tmovl\t$1076101120, 4(%esp)\n\tcall\tpow\n\tfldl\t.LC0\n\tfdivp\t%st, %st(1)\n\tfstpl\t-8(%ebp)\n\n\tmovl\t$0, 8(%esp)\n\tmovl\t$1074266112, 12(%esp)\n\tmovl\t$0, (%esp)\n\tmovl\t$1076101120, 4(%esp)\n\tcall\tpow\n\tfldl\t.LC0\n\tfdivp\t%st, %st(1)\n\tfldl\t-8(%ebp)\n\n\tfucompp\n Looks the same to me, other than the fstdl vs. fldl.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,905
1/6/03 10:49:00 AM
|
Speculating...
...but I think that internally the FP processor is using Long Doubles (12 bytes) but it is storing the first result on the stack as a Double (8 bytes). That is, the operation that takes the first result out of the coprocessor: \tfstpl\t-8(%ebp) Results in a truncation of the result by placing it back on a stack. Notice how on the result of the second half of the compare, it is not pulled back locally - being left in the floating point register: \tfdivp\t%st, %st(1)\n\tfldl\t-8(%ebp) The compare is then done within the FPU. My guess is that the first calculation has been truncated to 8 bytes after being pulled out of the FPU, but the second calculation is never pulled out and remained in 12 byte form. Hence, you can get an inequality when the truncated bits differ. Just speculation, of course. :-)
|
Post #72,913
1/6/03 11:06:04 AM
|
gcc 2.95.4 results
\n.LM3:\n\taddl $-8,%esp\n\tfldl .LC0\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tfldl .LC1\n\tsubl $8,%esp\n\tfstpl (%esp)\n.LCFI3:\n\tcall pow\n\taddl $16,%esp\n\tfstpl -8(%ebp)\n\tfldl .LC0\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tfldl .LC1\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tcall pow\n\taddl $16,%esp\n\tfldl .LC2\n\tfldl -8(%ebp)\n\tfdivrp %st,%st(1)\n\tfldl .LC2\n\tfdivp %st,%st(2)\n\tfucompp\n Being a Bear Of Little Clue when it comes to interpreting x86 assembler, I'm not quite sure what the above means... ;-) However, the comparison does work. :-P
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,925
1/6/03 11:35:48 AM
|
Re: gcc 2.95.4 results
ESP is the 32-bit stack pointer. Instructions that start with F are FPU operations. EBP is the 32-bit "base pointer" for 32-bit memory operations.
Unfortunately, the business end of things is missing, the call to "pow". That will almost certainly be an implementation of 2^(y log2 x) as mentined in the code I posted.
-drl
|
Post #72,950
1/6/03 1:18:01 PM
|
EBP is base of stack frame
EBP(-x) is a local variable, EBP(+x) is a function parameter. EBP(+4) (or may be EBP(0)), I suspect, is the function's return address.
--
We have only 2 things to worry about: That things will never get back to normal, and that they already have.
|
Post #72,941
1/6/03 12:30:53 PM
|
I86 Assembly is not my specialty...
...having done much more with the MC680x0 and MC6888x combinations. The considerations are similar, but I find x86 assembly to be harder to read and write. (Bottom line is I can guess, but it takes someone that is more up to speed on Assembly to spot the exact problem - I haven't done ASM for money in almost 10 years).
Ok, the difference between my original code and both your samples is that my code does the compare in the cpu not in the fpu. The difference between your code that returns equality and the one that doesn't is the order of the divide operations (fdivp). In the first code, the result of the left side is calculated all the way through the divide before the intermediate result is placed on the stack. Means the rounding occurs after the divide operation.
In the second case, the divides are saved as the last operation (kind of interspersing the calculation of the left and right hand sides). Means that there is some initial rounding which may occur but it is prior to the divide. Once division starts taking place, everything is done internal to the FPU.
Don't have enuf expertise to figure it out totally. It is surprising that the code has compiled out so differently - the newest code looks much cleaner (though it returns what appears to be an errant result).
|
Post #72,965
1/6/03 2:05:28 PM
1/6/03 2:12:59 PM
|
ASM Comments
Note: I've marked in red where I see the mismatch in precision being introduced. Although the 2.95 does test equal, I would speculate that if you placed some different numbers in the power function, it may also cause a failure in the equality (due to the fact that the result of the first power function is truncated to 64 bits, whereas the second one is carried on through the 80 bit FP register). ggcc 3.2.1 resultsSTEPA:\n movl $0, 8(%esp)\n movl $1074266112, 12(%esp) -- 3.0\n movl $0, (%esp)\n movl $1076101120, 4(%esp) -- 10.0\n -- stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) 0\n -- 4(esp) 10.0\n -- 0(esp) 0\n call pow -- pow(10.0, 3.0)\n -- result left in ST(0)\n fldl .LC0 -- push 123456 onto stack\n -- ST(0)=123456 and ST(1)=pow(10.0, 3.0)\n fdivp %st, %st(1) -- 123456 / pow(10.0, 3.0)\n fstpl -8(%ebp) -- store result in local variable\n -- note: 80 bit ST result narrowed to 64 bits\nSTEPB:\n movl $0, 8(%esp)\n movl $1074266112, 12(%esp) -- 3.0\n movl $0, (%esp)\n movl $1076101120, 4(%esp) -- 10.0\n -- stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) 0\n -- 4(esp) 10.0\n -- 0(esp) 0\n call pow -- pow(10.0, 3.0)\n -- result left in ST(0)\n fldl .LC0 -- push 123456 onto stack\n -- ST(0)=123456 and ST(1)=pow(10.0, 3.0)\n fdivp %st, %st(1) -- 123456 / pow(10.0, 3.0)\n fldl -8(%ebp) -- push result from STEPA into the FP stack\n -- FP Stack now has:\n -- ST(0) = STEPA result\n -- ST(1) = STEPB result\n -- note that STEPB result was never round to 64bits\n fucompp -- compare ST(0) with ST(1)\n gcc 2.95.4 resultsSTEPA:\n fldl .LC0 -- push 3.0 into FPStack ST(0)\n subl $8,%esp -- reserve another 8 bytes on the stack\n fstpl (%esp) -- pull ST(0)=3.0 off the stack\n fldl .LC1 -- push 10.0 into FPStack ST(0)\n subl $8,%esp -- reserve another 8 bytes on the stack\n fstpl (%esp) -- pull ST(0)=10.0 off the stack\n -- stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) ?\n -- 4(esp) 10.0\n -- 0(esp) ?\n call pow -- pow(10.0, 3.0)\n -- result left in ST(0)\n addl $16,%esp -- reclaim stack space\n fstpl -8(%ebp) -- store result in local variable\n -- Conversion to 64 bits here.\n -- but 10^^3 is less likely to have stray bits\n\nSTEPB:\n fldl .LC0 -- push 3.0 into FPStack ST(0)\n subl $8,%esp -- reserve another 8 bytes on the stack\n fstpl (%esp) -- pull ST(0)=3.0 off the stack\n fldl .LC1 -- push 10.0 into FPStack ST(0)\n subl $8,%esp -- reserve another 8 bytes on the stack\n fstpl (%esp) -- pull ST(0)=10.0 off the stack\n -- stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) ?\n -- 4(esp) 10.0\n -- 0(esp) ?\n call pow -- pow(10.0, 3.0)\n -- result left in ST(0)\n addl $16,%esp -- reclaim stack space\n\n fldl .LC2 -- push 123456.0 into FPStack ST(0)\n -- ST(0)=123456.0\n -- ST(1)=pow(10.0, 3.0) => From STEPB\n fldl -8(%ebp) -- push pow(10.0, 3.0)=>STEPA into FPStack ST(0)\n -- ST(0)=pow(10.0, 3.0) => From STEPA\n -- ST(1)=123456.0\n -- ST(2)=pow(10.0, 3.0) => From STEPB\n fdivrp %st,%st(1) -- Divide ST(1) / ST(0) and pop 2\n -- 123456.0 / pow(10.0, 3.0)\n -- ST(0)=123456.0/pow(10.0, 3.0) => From STEPA\n -- ST(1)=pow(10.0, 3.0) => From STEPB\n fldl .LC2 -- push 123456.0 into FPStack ST(0)\n -- ST(0)=123456.0\n -- ST(1)=123456.0/pow(10.0, 3.0) => From STEPA\n -- ST(2)=pow(10.0, 3.0) => From STEPB\n fdivp %st,%st(2) -- Divide ST(2) / ST(0)\n -- ST(0)=123456.0/pow(10.0, 3.0) => From STEPB\n -- ST(1)=123456.0/pow(10.0, 3.0) => From STEPA\n fucompp -- compare ST(0) with ST(1)\n
Edited by ChrisR
Jan. 6, 2003, 02:11:10 PM EST
Edited by ChrisR
Jan. 6, 2003, 02:12:59 PM EST
|
Post #72,982
1/6/03 3:08:52 PM
|
Errrr...
Wow. Thanks for the comprehensive answer. :-D
So, the FPU on a Xeon chip is 80 bits?
Anyone aware of a way to force the compiler to use IEEE compliant 64-bit only for compares? I tried a few settings (-ffloat-store and -mieee-fp) but neither seemed to make a difference in the output.
I can get this to work in gcc 3.2.1 by explicitly storing to a double before making the compare, but I have been informed that there are a number of areas in the code that will fail with this behaviour.
On a side note, does anyone know how to view the intermediary ASM from Visual C++ 6? I want to compare that, gcc 2.95, gcc 3.2.1, and gcc 2.95 Solaris (which is where the system currently runs). If I can show that the behavior can be induced on either NT or Solaris, then this ceases to be a Linux-only problem.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,999
1/6/03 4:06:35 PM
|
Getting ASM from VC6
Project/Settings/C-C++ tab/Category dropdown -> select "Listing Files", then one of "Assem. Only", "Assem, Mach Code, Source" etc. in the listing file type dropdown.
-drl
|
Post #73,005
1/6/03 4:25:49 PM
|
Results
Source:
#include <math.h>
void main(void) { (123456 / pow(10.0, 3)) == (123456 / pow(10.0, 3)); return; }
Output:
\tTITLE\tC:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp \t.386P include listing.inc if @Version gt 510 .model FLAT else _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS _DATA\tSEGMENT DWORD USE32 PUBLIC 'DATA' _DATA\tENDS CONST\tSEGMENT DWORD USE32 PUBLIC 'CONST' CONST\tENDS _BSS\tSEGMENT DWORD USE32 PUBLIC 'BSS' _BSS\tENDS $$SYMBOLS\tSEGMENT BYTE USE32 'DEBSYM' $$SYMBOLS\tENDS $$TYPES\tSEGMENT BYTE USE32 'DEBTYP' $$TYPES\tENDS _TLS\tSEGMENT DWORD USE32 PUBLIC 'TLS' _TLS\tENDS ;\tCOMDAT _main _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS FLAT\tGROUP _DATA, CONST, _BSS \tASSUME\tCS: FLAT, DS: FLAT, SS: FLAT endif PUBLIC\t_main EXTRN\t_pow:NEAR EXTRN\t__chkesp:NEAR EXTRN\t__fltused:NEAR ;\tCOMDAT _main _TEXT\tSEGMENT _main\tPROC NEAR\t\t\t\t\t; COMDAT ; File C:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp ; Line 4 \tpush\tebp \tmov\tebp, esp \tsub\tesp, 64\t\t\t\t\t; 00000040H \tpush\tebx \tpush\tesi \tpush\tedi \tlea\tedi, DWORD PTR [ebp-64] \tmov\tecx, 16\t\t\t\t\t; 00000010H \tmov\teax, -858993460\t\t\t\t; ccccccccH \trep stosd ; Line 5 \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tfstp\tST(0) \tadd\tesp, 16\t\t\t\t\t; 00000010H \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tfstp\tST(0) \tadd\tesp, 16\t\t\t\t\t; 00000010H ; Line 7 \tpop\tedi \tpop\tesi \tpop\tebx \tadd\tesp, 64\t\t\t\t\t; 00000040H \tcmp\tebp, esp \tcall\t__chkesp \tmov\tesp, ebp \tpop\tebp \tret\t0 _main\tENDP _TEXT\tENDS END
-drl
|
Post #73,008
1/6/03 4:33:29 PM
1/6/03 4:34:54 PM
|
Request...
Could you change that line to: if ((123456 / pow(10.0, 3)) == (123456 / pow(10.0, 3))) {\n printf("hello\\n");\n} As it stands, I can't if it really does a compare. Thanks.
Edited by ChrisR
Jan. 6, 2003, 04:34:54 PM EST
|
Post #73,009
1/6/03 4:36:04 PM
|
Re: Request...
OK:
\tTITLE\tC:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp \t.386P include listing.inc if @Version gt 510 .model FLAT else _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS _DATA\tSEGMENT DWORD USE32 PUBLIC 'DATA' _DATA\tENDS CONST\tSEGMENT DWORD USE32 PUBLIC 'CONST' CONST\tENDS _BSS\tSEGMENT DWORD USE32 PUBLIC 'BSS' _BSS\tENDS $$SYMBOLS\tSEGMENT BYTE USE32 'DEBSYM' $$SYMBOLS\tENDS $$TYPES\tSEGMENT BYTE USE32 'DEBTYP' $$TYPES\tENDS _TLS\tSEGMENT DWORD USE32 PUBLIC 'TLS' _TLS\tENDS ;\tCOMDAT _main _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS FLAT\tGROUP _DATA, CONST, _BSS \tASSUME\tCS: FLAT, DS: FLAT, SS: FLAT endif PUBLIC\t_main PUBLIC\t__real@8@400ff120000000000000 EXTRN\t_pow:NEAR EXTRN\t__chkesp:NEAR EXTRN\t__fltused:NEAR ;\tCOMDAT __real@8@400ff120000000000000 ; File C:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp CONST\tSEGMENT __real@8@400ff120000000000000 DQ 040fe240000000000r ; 123456 CONST\tENDS ;\tCOMDAT _main _TEXT\tSEGMENT _main\tPROC NEAR\t\t\t\t\t; COMDAT ; File C:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp ; Line 4 \tpush\tebp \tmov\tebp, esp \tsub\tesp, 72\t\t\t\t\t; 00000048H \tpush\tebx \tpush\tesi \tpush\tedi \tlea\tedi, DWORD PTR [ebp-72] \tmov\tecx, 18\t\t\t\t\t; 00000012H \tmov\teax, -858993460\t\t\t\t; ccccccccH \trep stosd ; Line 5 \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tadd\tesp, 16\t\t\t\t\t; 00000010H \tfdivr\tQWORD PTR __real@8@400ff120000000000000 \tfstp\tQWORD PTR -8+[ebp] \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tadd\tesp, 16\t\t\t\t\t; 00000010H \tfdivr\tQWORD PTR __real@8@400ff120000000000000 \tfld\tQWORD PTR -8+[ebp] \tfcompp ; Line 7 \tpop\tedi \tpop\tesi \tpop\tebx \tadd\tesp, 72\t\t\t\t\t; 00000048H \tcmp\tebp, esp \tcall\t__chkesp \tmov\tesp, ebp \tpop\tebp \tret\t0 _main\tENDP _TEXT\tENDS END
-drl
|
Post #73,013
1/6/03 4:54:50 PM
1/6/03 4:57:34 PM
|
Trimming it down...
Just wondering whether it printed the "hello" - i.e. did the equality test as true or false? (Note: see [link|http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/_core_.2f.op.asp|/Op Option] for a description of the settings for VC6). \nSTEPA:\n push 1074266112 -- 3.0\n push 0\n push 1076101120 -- 10.0\n push 0\n -- cpu stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) 0\n -- 4(esp) 10.0\n -- 0(esp) 0\n call _pow -- pow(10.0, 3.0)\n -- result left in FPU ST(0)\n add esp, 16 -- reclaim cpu stack space\n fdivr QWORD PTR __real@8@400ff120000000000000 -- 123456.0 / pow(10.0, 3.0) \n fstp QWORD PTR -8+[ebp] -- store result in local variable\n -- not sure if this rounds to 64 bits\n -- QWORD is quad-word - 8 bytes in length\n\nSTEPB:\n push 1074266112 -- 3.0\n push 0\n push 1076101120 -- 10.0\n push 0\n -- cpu stack should look something like\n -- 12(esp) 3.0\n -- 8(esp) 0\n -- 4(esp) 10.0\n -- 0(esp) 0\n call _pow -- pow(10.0, 3.0)\n add esp, 16 -- reclaim cpu stack space \n fdivr QWORD PTR __real@8@400ff120000000000000 -- 123456.0 / pow(10.0, 3.0)\n\n fld QWORD PTR -8+[ebp] -- load STEPA result into fpu ST(0)\n -- ST(0) = STEPA result\n -- ST(1) = STEPB result\n fcompp -- compare ST(0) and ST(1)
Edited by ChrisR
Jan. 6, 2003, 04:57:34 PM EST
|
Post #73,015
1/6/03 4:57:55 PM
|
Re: Trimming it down...
Yes, it did the braces.
/Od BTW (no opt).
-drl
|
Post #73,025
1/6/03 6:07:55 PM
|
Scratches head...
Kind of goes against my theory. :-(
From looking at the ASM again, it doesn't look like you put any instructions within the conditional brackets. The ASM does an float compare but it doesn't actually use the result for any branch or jump. Could you try it with a simple print within the conditionals?
|
Post #73,062
1/6/03 8:51:33 PM
|
Re: Scratches head...
Oh sure. It won't really be essentially different, but here goes:
\tTITLE\tC:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp \t.386P include listing.inc if @Version gt 510 .model FLAT else _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS _DATA\tSEGMENT DWORD USE32 PUBLIC 'DATA' _DATA\tENDS CONST\tSEGMENT DWORD USE32 PUBLIC 'CONST' CONST\tENDS _BSS\tSEGMENT DWORD USE32 PUBLIC 'BSS' _BSS\tENDS $$SYMBOLS\tSEGMENT BYTE USE32 'DEBSYM' $$SYMBOLS\tENDS $$TYPES\tSEGMENT BYTE USE32 'DEBTYP' $$TYPES\tENDS _TLS\tSEGMENT DWORD USE32 PUBLIC 'TLS' _TLS\tENDS ;\tCOMDAT ??_C@_02ELOP@42?$AA@ CONST\tSEGMENT DWORD USE32 PUBLIC 'CONST' CONST\tENDS ;\tCOMDAT _main _TEXT\tSEGMENT PARA USE32 PUBLIC 'CODE' _TEXT\tENDS FLAT\tGROUP _DATA, CONST, _BSS \tASSUME\tCS: FLAT, DS: FLAT, SS: FLAT endif PUBLIC\t_main PUBLIC\t??_C@_02ELOP@42?$AA@\t\t\t\t; `string' PUBLIC\t__real@8@400ff120000000000000 EXTRN\t_pow:NEAR EXTRN\t_puts:NEAR EXTRN\t__chkesp:NEAR EXTRN\t__fltused:NEAR ;\tCOMDAT ??_C@_02ELOP@42?$AA@ ; File C:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp CONST\tSEGMENT ??_C@_02ELOP@42?$AA@ DB '42', 00H\t\t\t; `string' CONST\tENDS ;\tCOMDAT __real@8@400ff120000000000000 CONST\tSEGMENT __real@8@400ff120000000000000 DQ 040fe240000000000r ; 123456 CONST\tENDS ;\tCOMDAT _main _TEXT\tSEGMENT _main\tPROC NEAR\t\t\t\t\t; COMDAT ; File C:\\My Documents\\Visual Studio Projects\\Junk1\\Junk.cpp ; Line 5 \tpush\tebp \tmov\tebp, esp \tsub\tesp, 72\t\t\t\t\t; 00000048H \tpush\tebx \tpush\tesi \tpush\tedi \tlea\tedi, DWORD PTR [ebp-72] \tmov\tecx, 18\t\t\t\t\t; 00000012H \tmov\teax, -858993460\t\t\t\t; ccccccccH \trep stosd ; Line 6 \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tadd\tesp, 16\t\t\t\t\t; 00000010H \tfdivr\tQWORD PTR __real@8@400ff120000000000000 \tfstp\tQWORD PTR -8+[ebp] \tpush\t1074266112\t\t\t\t; 40080000H \tpush\t0 \tpush\t1076101120\t\t\t\t; 40240000H \tpush\t0 \tcall\t_pow \tadd\tesp, 16\t\t\t\t\t; 00000010H \tfdivr\tQWORD PTR __real@8@400ff120000000000000 \tfcomp\tQWORD PTR -8+[ebp] \tfnstsw\tax \ttest\tah, 64\t\t\t\t\t; 00000040H \tje\tSHORT $L928 \tpush\tOFFSET FLAT:??_C@_02ELOP@42?$AA@\t; `string' \tcall\t_puts \tadd\tesp, 4 $L928: ; Line 8 \tpop\tedi \tpop\tesi \tpop\tebx \tadd\tesp, 72\t\t\t\t\t; 00000048H \tcmp\tebp, esp \tcall\t__chkesp \tmov\tesp, ebp \tpop\tebp \tret\t0 _main\tENDP _TEXT\tENDS END
-drl
|
Post #73,070
1/6/03 9:27:16 PM
|
Thanks...
...I was just wanting to see the instructions that immediately followed the float compare. In this case it was:
fnstsw ax test ah, 64 je SHORT $L928
Which is strange compared to my experience with the MC68881, where there are direct floating point branch instructions (at least that's the way I remember it).
|
Post #73,071
1/6/03 9:29:58 PM
|
BTW
Did you figure out what I put between the brackets? :)
-drl
|
Post #73,091
1/6/03 10:10:48 PM
|
Hmmmm
Well, I see the instructions: push OFFSET FLAT:??_C@_02ELOP@42?$AA@ ; `string'\ncall _puts And I know that _puts is most likely the printf function, but I can't really decode the constant that's pushed on the stack there. The assembler comment 'string is pretty useless - yes I know it's a string - tell me something I didn't know. But I can't make out the rest of the encryption (something with ELOP in it). :-)
|
Post #73,096
1/6/03 10:21:20 PM
|
puts("42");
-drl
|
Post #73,029
1/6/03 6:13:49 PM
|
gcc 2.95.26
Just as another data point, I tried this version under cygwin and it also fails to give an equality. \n\tfldl LC0\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tfldl LC1\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tcall _pow\n\taddl $16,%esp\n\tfldl LC2\n\tfdivp %st,%st(1)\n\tfstpl -8(%ebp)\n\tfldl LC0\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tfldl LC1\n\tsubl $8,%esp\n\tfstpl (%esp)\n\tcall _pow\n\taddl $16,%esp\n\tfldl LC2\n\tfdivp %st,%st(1)\n\tfldl -8(%ebp)\n\tfucompp\n\tfnstsw %ax\n\tandb $68,%ah\n\txorb $64,%ah\n\tjne L3\n
|
Post #72,465
1/4/03 1:12:28 PM
|
Did you try it with optimizations off?
-drl
|
Post #72,470
1/4/03 1:21:38 PM
|
A really good optimizing compiler...
...would have seen that the expression could be completely evaluated at compile time and eliminated any runtime evaluation.
My personal suspicion is that there's an intermediate form of optimization where the right hand side is evaluated in the FP chip (which uses 12 byte representations) and then returns the result to the stack as a double (8 bytes in length). Then it evaluates the right hand side in the FP chip (as a 12 byte result) but does not convert it back to double format (8 bytes) before it does the FCompare. In other words, the right side is converted to and from double to the double long format, but the left hand side stays as a double long.
Would have to see the assembly to see if that's what's happening. I can't seem to emulate here though.
|
Post #72,476
1/4/03 1:35:10 PM
|
Yes..
..I was thinking along those lines. That's why I suggested to Scott that he append an L to the integer in his division.
Still, it's an interesting thing he's hit on.
-drl
|
Post #72,686
1/5/03 2:13:15 PM
|
Compiler has no way to predict how pow() is implemented
For all it knows, it can have side effects.
--
We have only 2 things to worry about: That things will never get back to normal, and that they already have.
|
Post #72,689
1/5/03 2:23:03 PM
|
IEEE
has gone far toward eliminating this type of issue. The hardware has to be compliant - that's not the responsibility of the compiler.
-drl
|
Post #72,774
1/5/03 7:14:41 PM
|
pow() is library, not hardware
it may be hardware on some architectures, but it's not standard.
--
We have only 2 things to worry about: That things will never get back to normal, and that they already have.
|
Post #72,783
1/5/03 7:32:58 PM
1/5/03 7:38:57 PM
|
True
..but that's not the point - it's standardization of representation, and denormalization.
-drl
Edited by deSitter
Jan. 5, 2003, 07:38:57 PM EST
|
Post #72,810
1/5/03 8:57:17 PM
|
Re: Compiler has no way to predict how pow() is implemented
Was poking around in my numerics code, and found the code from the Cephes math library below (attribution included) to illustrate the point I wanted to make - that pow(x,y) is exp(y log x). Note that FORTRAN has an exponentiation operator ** and that it works differently for small integers as exponents - so there would be alternate entry points.
/*\t\t\t\t\t\t\tpowf.c * *\tPower function * * * * SYNOPSIS: * * float x, y, z, powf(); * * z = powf( x, y ); * * * * DESCRIPTION: * * Computes x raised to the yth power. Analytically, * * x**y = exp( y log(x) ). * * Following Cody and Waite, this program uses a lookup table * of 2**-i/16 and pseudo extended precision arithmetic to * obtain an extra three bits of accuracy in both the logarithm * and the exponential. * * * * ACCURACY: * * Relative error: * arithmetic domain # trials peak rms * IEEE -10,10 100,000 1.4e-7 3.6e-8 * 1/10 < x < 10, x uniformly distributed. * -10 < y < 10, y uniformly distributed. * * * ERROR MESSAGES: * * message condition value returned * powf overflow x**y > MAXNUMF MAXNUMF * powf underflow x**y < 1/MAXNUMF 0.0 * powf domain x<0 and y noninteger 0.0 * */ \x0c /* Cephes Math Library Release 2.2: June, 1992 Copyright 1984, 1987, 1988 by Stephen L. Moshier Direct inquiries to 30 Frost Street, Cambridge, MA 02140 */
#include "mconf.h" static char fname[] = {"powf"};
/* 2^(-i/16) * The decimal values are rounded to 24-bit precision */ static float A[] = { 1.00000000000000000000E0, 9.57603275775909423828125E-1, 9.17004048824310302734375E-1, 8.78126084804534912109375E-1, 8.40896427631378173828125E-1, 8.05245161056518554687500E-1, 7.71105408668518066406250E-1, 7.38413095474243164062500E-1, 7.07106769084930419921875E-1, 6.77127778530120849609375E-1, 6.48419797420501708984375E-1, 6.20928883552551269531250E-1, 5.94603538513183593750000E-1, 5.69394290447235107421875E-1, 5.45253872871398925781250E-1, 5.22136867046356201171875E-1, 5.00000000000000000000E-1 }; /* continuation, for even i only * 2^(i/16) = A[i] + B[i/2] */ static float B[] = { 0.00000000000000000000E0, -5.61963907099083340520586E-9, -1.23776636307969995237668E-8, 4.03545234539989593104537E-9, 1.21016171044789693621048E-8, -2.00949968760174979411038E-8, 1.89881769396087499852802E-8, -6.53877009617774467211965E-9, 0.00000000000000000000E0 };
/* 1 / A[i] * The decimal values are full precision */ static float Ainv[] = { 1.00000000000000000000000E0, 1.04427378242741384032197E0, 1.09050773266525765920701E0, 1.13878863475669165370383E0, 1.18920711500272106671750E0, 1.24185781207348404859368E0, 1.29683955465100966593375E0, 1.35425554693689272829801E0, 1.41421356237309504880169E0, 1.47682614593949931138691E0, 1.54221082540794082361229E0, 1.61049033194925430817952E0, 1.68179283050742908606225E0, 1.75625216037329948311216E0, 1.83400808640934246348708E0, 1.91520656139714729387261E0, 2.00000000000000000000000E0 };
#ifdef DEC #define MEXP 2032.0 #define MNEXP -2032.0 #else #define MEXP 2048.0 #define MNEXP -2400.0 #endif
/* log2(e) - 1 */ #define LOG2EA 0.44269504088896340736F extern float MAXNUMF;
#define F W #define Fa Wa #define Fb Wb #define G W #define Ga Wa #define Gb u #define H W #define Ha Wb #define Hb Wb
#ifdef ANSIC float floorf( float ); float frexpf( float, int *); float ldexpf( float, int ); float powif( float, int ); #else float floorf(), frexpf(), ldexpf(), powif(); #endif
/* Find a multiple of 1/16 that is within 1/16 of x. */ #define reduc(x) 0.0625 * floorf( 16 * (x) )
#ifdef ANSIC float powf( float x, float y ) #else float powf( x, y ) float x, y; #endif { float u, w, z, W, Wa, Wb, ya, yb; /* float F, Fa, Fb, G, Ga, Gb, H, Ha, Hb */ int e, i, nflg;
nflg = 0;\t/* flag = 1 if x<0 raised to integer power */ w = floorf(y); if( w < 0 ) \tz = -w; else \tz = w; if( (w == y) && (z < 32768.0) ) \t{ \ti = w; \tw = powif( x, i ); \treturn( w ); \t}
if( x <= 0.0F ) \t{ \tif( x == 0.0 ) \t\t{ \t\tif( y == 0.0 ) \t\t\treturn( 1.0 ); /* 0**0 */ \t\telse \t\t\treturn( 0.0 ); /* 0**y */ \t\t} \telse \t\t{ \t\tif( w != y ) \t\t\t{ /* noninteger power of negative number */ \t\t\tmtherr( fname, DOMAIN ); \t\t\treturn(0.0); \t\t\t} \t\tnflg = 1; \t\tif( x < 0 ) \t\t\tx = -x; \t\t} \t}
/* separate significand from exponent */ x = frexpf( x, &e );
/* find significand in antilog table A[] */ i = 1; if( x <= A[9] ) \ti = 9; if( x <= A[i+4] ) \ti += 4; if( x <= A[i+2] ) \ti += 2; if( x >= A[1] ) \ti = -1; i += 1;
/* Find (x - A[i])/A[i] * in order to compute log(x/A[i]): * * log(x) = log( a x/a ) = log(a) + log(x/a) * * log(x/a) = log(1+v), v = x/a - 1 = (x-a)/a */ x -= A[i]; x -= B[ i >> 1 ]; x *= Ainv[i];
/* rational approximation for log(1+v): * * log(1+v) = v - 0.5 v^2 + v^3 P(v) * Theoretical relative error of the approximation is 3.5e-11 * on the interval 2^(1/16) - 1 > v > 2^(-1/16) - 1 */ z = x*x; w = (((-0.1663883081054895 * x + 0.2003770364206271) * x - 0.2500006373383951) * x + 0.3333331095506474) * x * z; w -= 0.5 * z;
/* Convert to base 2 logarithm: * multiply by log2(e) */ w = w + LOG2EA * w; /* Note x was not yet added in * to above rational approximation, * so do it now, while multiplying * by log2(e). */ z = w + LOG2EA * x; z = z + x;
/* Compute exponent term of the base 2 logarithm. */ w = -i; w *= 0.0625; /* divide by 16 */ w += e; /* Now base 2 log of x is w + z. */
/* Multiply base 2 log by y, in extended precision. */
/* separate y into large part ya * and small part yb less than 1/16 */ ya = reduc(y); yb = y - ya;
F = z * y + w * yb; Fa = reduc(F); Fb = F - Fa;
G = Fa + w * ya; Ga = reduc(G); Gb = G - Ga;
H = Fb + Gb; Ha = reduc(H); w = 16 * (Ga + Ha);
/* Test the power of 2 for overflow */ if( w > MEXP ) \t{ \tmtherr( fname, OVERFLOW ); \treturn( MAXNUMF ); \t}
if( w < MNEXP ) \t{ \tmtherr( fname, UNDERFLOW ); \treturn( 0.0 ); \t}
e = w; Hb = H - Ha;
if( Hb > 0.0 ) \t{ \te += 1; \tHb -= 0.0625; \t}
/* Now the product y * log2(x) = Hb + e/16.0. * * Compute base 2 exponential of Hb, * where -0.0625 <= Hb <= 0. * Theoretical relative error of the approximation is 2.8e-12. */ /* z = 2**Hb - 1 */ z = ((( 9.416993633606397E-003 * Hb + 5.549356188719141E-002) * Hb + 2.402262883964191E-001) * Hb + 6.931471791490764E-001) * Hb;
/* Express e/16 as an integer plus a negative number of 16ths. * Find lookup table entry for the fractional power of 2. */ if( e < 0 ) \ti = -( -e >> 4 ); else \ti = (e >> 4) + 1; e = (i << 4) - e; w = A[e]; z = w + w * z; /* 2**-e * ( 1 + (2**Hb-1) ) */ z = ldexpf( z, i ); /* multiply by integer power of 2 */
if( nflg ) \t{ /* For negative x, * find out if the integer exponent * is odd or even. */ \tw = 2 * floorf( (float) 0.5 * w ); \tif( w != y ) \t\tz = -z; /* odd exponent */ \t}
return( z ); }
-drl
|
Post #72,466
1/4/03 1:13:29 PM
|
An option of interest?
The gcc manual has the following options described:
-mieee-fp -mno-ieee-fp Control whether or not the compiler uses IEEE floating point comparisons. These handle correctly the case where the result of a comparison is unordered.
|
Post #72,539
1/4/03 5:56:33 PM
|
I'll look into that. Datum: gcc 3.2.1 on glibc 2.3.1
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,882
1/6/03 10:12:59 AM
|
Doesn't seem to make any difference.
Another oddity:
double y = 0.0; y = pow(10.0, 3); if ((123456 / y) == (123456 / 7)) { printf("works\\n"); }
prints 'works'.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,927
1/6/03 11:36:57 AM
|
Re: Doesn't seem to make any difference.
This has to be gcc 3.x in interaction with 2.95.x horkage.
Will gcc 3 ever be ready?
-drl
|