Post #72,913
1/6/03 11:06:04 AM
|
gcc 2.95.4 results
.LM3:
    addl $-8,%esp
    fldl .LC0
    subl $8,%esp
    fstpl (%esp)
    fldl .LC1
    subl $8,%esp
    fstpl (%esp)
.LCFI3:
    call pow
    addl $16,%esp
    fstpl -8(%ebp)
    fldl .LC0
    subl $8,%esp
    fstpl (%esp)
    fldl .LC1
    subl $8,%esp
    fstpl (%esp)
    call pow
    addl $16,%esp
    fldl .LC2
    fldl -8(%ebp)
    fdivrp %st,%st(1)
    fldl .LC2
    fdivp %st,%st(2)
    fucompp

Being a Bear Of Little Clue when it comes to interpreting x86 assembler, I'm not quite sure what the above means... ;-) However, the comparison does work. :-P
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,925
1/6/03 11:35:48 AM
|
Re: gcc 2.95.4 results
ESP is the 32-bit stack pointer. Instructions that start with F are FPU operations. EBP is the 32-bit "base pointer" for 32-bit memory operations.
Unfortunately, the business end of things is missing: the code behind the call to "pow". That will almost certainly be an implementation of 2^(y log2 x), as mentioned in the code I posted.
-drl
|
Post #72,950
1/6/03 1:18:01 PM
|
EBP is base of stack frame
EBP(-x) is a local variable, EBP(+x) is a function parameter. EBP(+4) (or maybe EBP(0)), I suspect, is the function's return address.
--
We have only 2 things to worry about: That things will never get back to normal, and that they already have.
|
Post #72,941
1/6/03 12:30:53 PM
|
I86 Assembly is not my specialty...
...having done much more with the MC680x0 and MC6888x combinations. The considerations are similar, but I find x86 assembly to be harder to read and write. (Bottom line is I can guess, but it takes someone that is more up to speed on Assembly to spot the exact problem - I haven't done ASM for money in almost 10 years).
Ok, the difference between my original code and both your samples is that my code does the compare in the CPU, not in the FPU. The difference between your code that returns equality and the one that doesn't is the order of the divide operations (fdivp). In the first code, the result of the left side is calculated all the way through the divide before the intermediate result is placed on the stack. That means the rounding occurs after the divide operation.
In the second case, the divides are saved as the last operation (kind of interspersing the calculation of the left and right hand sides). That means there is some initial rounding which may occur, but it is prior to the divide. Once division starts taking place, everything is done internal to the FPU.
I don't have enough expertise to figure it out totally. It is surprising that the code has compiled out so differently - the newest code looks much cleaner (though it returns what appears to be an errant result).
|
Post #72,965
1/6/03 2:05:28 PM
1/6/03 2:12:59 PM
|
ASM Comments
Note: I've flagged in the comments where I see the mismatch in precision being introduced. Although the 2.95 does test equal, I would speculate that if you placed some different numbers in the power function, it may also cause a failure in the equality (due to the fact that the result of the first power function is truncated to 64 bits, whereas the second one is carried on through the 80 bit FP register).

gcc 3.2.1 results

STEPA:
    movl $0, 8(%esp)
    movl $1074266112, 12(%esp)  -- 3.0
    movl $0, (%esp)
    movl $1076101120, 4(%esp)   -- 10.0
                                -- stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) 0
                                --    4(esp) 10.0
                                --    0(esp) 0
    call pow                    -- pow(10.0, 3.0)
                                -- result left in ST(0)
    fldl .LC0                   -- push 123456 onto stack
                                -- ST(0)=123456 and ST(1)=pow(10.0, 3.0)
    fdivp %st, %st(1)           -- 123456 / pow(10.0, 3.0)
    fstpl -8(%ebp)              -- store result in local variable
                                -- note: 80 bit ST result narrowed to 64 bits
STEPB:
    movl $0, 8(%esp)
    movl $1074266112, 12(%esp)  -- 3.0
    movl $0, (%esp)
    movl $1076101120, 4(%esp)   -- 10.0
                                -- stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) 0
                                --    4(esp) 10.0
                                --    0(esp) 0
    call pow                    -- pow(10.0, 3.0)
                                -- result left in ST(0)
    fldl .LC0                   -- push 123456 onto stack
                                -- ST(0)=123456 and ST(1)=pow(10.0, 3.0)
    fdivp %st, %st(1)           -- 123456 / pow(10.0, 3.0)
    fldl -8(%ebp)               -- push result from STEPA into the FP stack
                                -- FP Stack now has:
                                --   ST(0) = STEPA result
                                --   ST(1) = STEPB result
                                -- note that STEPB result was never rounded to 64 bits
    fucompp                     -- compare ST(0) with ST(1)

gcc 2.95.4 results

STEPA:
    fldl .LC0                   -- push 3.0 into FPStack ST(0)
    subl $8,%esp                -- reserve another 8 bytes on the stack
    fstpl (%esp)                -- pull ST(0)=3.0 off the stack
    fldl .LC1                   -- push 10.0 into FPStack ST(0)
    subl $8,%esp                -- reserve another 8 bytes on the stack
    fstpl (%esp)                -- pull ST(0)=10.0 off the stack
                                -- stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) ?
                                --    4(esp) 10.0
                                --    0(esp) ?
    call pow                    -- pow(10.0, 3.0)
                                -- result left in ST(0)
    addl $16,%esp               -- reclaim stack space
    fstpl -8(%ebp)              -- store result in local variable
                                -- Conversion to 64 bits here.
                                -- but 10^^3 is less likely to have stray bits

STEPB:
    fldl .LC0                   -- push 3.0 into FPStack ST(0)
    subl $8,%esp                -- reserve another 8 bytes on the stack
    fstpl (%esp)                -- pull ST(0)=3.0 off the stack
    fldl .LC1                   -- push 10.0 into FPStack ST(0)
    subl $8,%esp                -- reserve another 8 bytes on the stack
    fstpl (%esp)                -- pull ST(0)=10.0 off the stack
                                -- stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) ?
                                --    4(esp) 10.0
                                --    0(esp) ?
    call pow                    -- pow(10.0, 3.0)
                                -- result left in ST(0)
    addl $16,%esp               -- reclaim stack space

    fldl .LC2                   -- push 123456.0 into FPStack ST(0)
                                -- ST(0)=123456.0
                                -- ST(1)=pow(10.0, 3.0) => From STEPB
    fldl -8(%ebp)               -- push pow(10.0, 3.0)=>STEPA into FPStack ST(0)
                                -- ST(0)=pow(10.0, 3.0) => From STEPA
                                -- ST(1)=123456.0
                                -- ST(2)=pow(10.0, 3.0) => From STEPB
    fdivrp %st,%st(1)           -- Divide ST(1) / ST(0), result to ST(1), and pop
                                -- 123456.0 / pow(10.0, 3.0)
                                -- ST(0)=123456.0/pow(10.0, 3.0) => From STEPA
                                -- ST(1)=pow(10.0, 3.0) => From STEPB
    fldl .LC2                   -- push 123456.0 into FPStack ST(0)
                                -- ST(0)=123456.0
                                -- ST(1)=123456.0/pow(10.0, 3.0) => From STEPA
                                -- ST(2)=pow(10.0, 3.0) => From STEPB
    fdivp %st,%st(2)            -- Divide ST(0) / ST(2), result to ST(2), and pop
                                -- ST(0)=123456.0/pow(10.0, 3.0) => From STEPA
                                -- ST(1)=123456.0/pow(10.0, 3.0) => From STEPB
    fucompp                     -- compare ST(0) with ST(1)
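To make the 64-bit versus 80-bit point concrete, here is a small C sketch that imitates it with long double. It is only an analogy: the function names are made up, the divisor is hard-coded as exactly 1000 rather than taken from pow(), and it assumes long double is wider than double (true for gcc on x86, not true for every compiler).

```c
#include <float.h>

/* 1 if long double carries more mantissa bits than double on this platform. */
int long_double_is_wider(void)
{
    return LDBL_MANT_DIG > DBL_MANT_DIG;
}

/* Quotient kept at extended precision, like a value left in an x87 register. */
long double quotient_extended(void)
{
    return 123456.0L / 1000.0L;
}

/* Same quotient rounded to a 64-bit double, like a value stored with fstpl. */
double quotient_narrowed(void)
{
    return (double)quotient_extended();
}

/* 1 if the narrowed and extended quotients compare equal, 0 otherwise. */
int narrowed_equals_extended(void)
{
    return (long double)quotient_narrowed() == quotient_extended();
}
```

Where long double is wider, narrowed_equals_extended() returns 0, because 123.456 has no exact binary representation and the two roundings land on different values. That is the same shape as the STEPA/STEPB mismatch in the 3.2.1 listing.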
Edited by ChrisR
Jan. 6, 2003, 02:11:10 PM EST
Edited by ChrisR
Jan. 6, 2003, 02:12:59 PM EST
|
Post #72,982
1/6/03 3:08:52 PM
|
Errrr...
Wow. Thanks for the comprehensive answer. :-D
So, the FPU on a Xeon chip is 80 bits?
Anyone aware of a way to force the compiler to use IEEE compliant 64-bit only for compares? I tried a few settings (-ffloat-store and -mieee-fp) but neither seemed to make a difference in the output.
I can get this to work in gcc 3.2.1 by explicitly storing to a double before making the compare, but I have been informed that there are a number of areas in the code that will fail with this behaviour.
On a side note, does anyone know how to view the intermediary ASM from Visual C++ 6? I want to compare that, gcc 2.95, gcc 3.2.1, and gcc 2.95 Solaris (which is where the system currently runs). If I can show that the behavior can be induced on either NT or Solaris, then this ceases to be a Linux-only problem.
Regards,
-scott anderson
"Welcome to Rivendell, Mr. Anderson..."
|
Post #72,999
1/6/03 4:06:35 PM
|
Getting ASM from VC6
Project/Settings/C-C++ tab/Category dropdown -> select "Listing Files", then one of "Assem. Only", "Assem, Mach Code, Source" etc. in the listing file type dropdown.
-drl
|
Post #73,005
1/6/03 4:25:49 PM
|
Results
Source:
#include <math.h>
void main(void)
{
    (123456 / pow(10.0, 3)) == (123456 / pow(10.0, 3));
    return;
}
Output:
    TITLE   C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
$$SYMBOLS   SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS   ENDS
$$TYPES SEGMENT BYTE USE32 'DEBTYP'
$$TYPES ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
;   COMDAT _main
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC  _main
EXTRN   _pow:NEAR
EXTRN   __chkesp:NEAR
EXTRN   __fltused:NEAR
;   COMDAT _main
_TEXT   SEGMENT
_main   PROC NEAR                   ; COMDAT
; File C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
; Line 4
    push    ebp
    mov     ebp, esp
    sub     esp, 64                 ; 00000040H
    push    ebx
    push    esi
    push    edi
    lea     edi, DWORD PTR [ebp-64]
    mov     ecx, 16                 ; 00000010H
    mov     eax, -858993460         ; ccccccccH
    rep stosd
; Line 5
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    fstp    ST(0)
    add     esp, 16                 ; 00000010H
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    fstp    ST(0)
    add     esp, 16                 ; 00000010H
; Line 7
    pop     edi
    pop     esi
    pop     ebx
    add     esp, 64                 ; 00000040H
    cmp     ebp, esp
    call    __chkesp
    mov     esp, ebp
    pop     ebp
    ret     0
_main   ENDP
_TEXT   ENDS
END
-drl
|
Post #73,008
1/6/03 4:33:29 PM
1/6/03 4:34:54 PM
|
Request...
Could you change that line to:
if ((123456 / pow(10.0, 3)) == (123456 / pow(10.0, 3))) {
    printf("hello\n");
}
As it stands, I can't tell if it really does a compare. Thanks.
Edited by ChrisR
Jan. 6, 2003, 04:34:54 PM EST
|
Post #73,009
1/6/03 4:36:04 PM
|
Re: Request...
OK:
    TITLE   C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
$$SYMBOLS   SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS   ENDS
$$TYPES SEGMENT BYTE USE32 'DEBTYP'
$$TYPES ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
;   COMDAT _main
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC  _main
PUBLIC  __real@8@400ff120000000000000
EXTRN   _pow:NEAR
EXTRN   __chkesp:NEAR
EXTRN   __fltused:NEAR
;   COMDAT __real@8@400ff120000000000000
; File C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
CONST   SEGMENT
__real@8@400ff120000000000000 DQ 040fe240000000000r ; 123456
CONST   ENDS
;   COMDAT _main
_TEXT   SEGMENT
_main   PROC NEAR                   ; COMDAT
; File C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
; Line 4
    push    ebp
    mov     ebp, esp
    sub     esp, 72                 ; 00000048H
    push    ebx
    push    esi
    push    edi
    lea     edi, DWORD PTR [ebp-72]
    mov     ecx, 18                 ; 00000012H
    mov     eax, -858993460         ; ccccccccH
    rep stosd
; Line 5
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    add     esp, 16                 ; 00000010H
    fdivr   QWORD PTR __real@8@400ff120000000000000
    fstp    QWORD PTR -8+[ebp]
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    add     esp, 16                 ; 00000010H
    fdivr   QWORD PTR __real@8@400ff120000000000000
    fld     QWORD PTR -8+[ebp]
    fcompp
; Line 7
    pop     edi
    pop     esi
    pop     ebx
    add     esp, 72                 ; 00000048H
    cmp     ebp, esp
    call    __chkesp
    mov     esp, ebp
    pop     ebp
    ret     0
_main   ENDP
_TEXT   ENDS
END
-drl
|
Post #73,013
1/6/03 4:54:50 PM
1/6/03 4:57:34 PM
|
Trimming it down...
Just wondering whether it printed the "hello" - i.e. did the equality test as true or false? (Note: see [link|http://msdn.microsoft.com/library/default.asp?url=/library/en-us/vccore/html/_core_.2f.op.asp|/Op Option] for a description of the settings for VC6).

STEPA:
    push 1074266112             -- 3.0
    push 0
    push 1076101120             -- 10.0
    push 0
                                -- cpu stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) 0
                                --    4(esp) 10.0
                                --    0(esp) 0
    call _pow                   -- pow(10.0, 3.0)
                                -- result left in FPU ST(0)
    add esp, 16                 -- reclaim cpu stack space
    fdivr QWORD PTR __real@8@400ff120000000000000  -- 123456.0 / pow(10.0, 3.0)
    fstp QWORD PTR -8+[ebp]     -- store result in local variable
                                -- not sure if this rounds to 64 bits
                                -- QWORD is quad-word - 8 bytes in length

STEPB:
    push 1074266112             -- 3.0
    push 0
    push 1076101120             -- 10.0
    push 0
                                -- cpu stack should look something like
                                --   12(esp) 3.0
                                --    8(esp) 0
                                --    4(esp) 10.0
                                --    0(esp) 0
    call _pow                   -- pow(10.0, 3.0)
    add esp, 16                 -- reclaim cpu stack space
    fdivr QWORD PTR __real@8@400ff120000000000000  -- 123456.0 / pow(10.0, 3.0)

    fld QWORD PTR -8+[ebp]      -- load STEPA result into fpu ST(0)
                                --   ST(0) = STEPA result
                                --   ST(1) = STEPB result
    fcompp                      -- compare ST(0) and ST(1)
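For anyone puzzled by the "-- 3.0" and "-- 10.0" annotations on those integer pushes: each 8-byte double goes onto the stack as two 32-bit halves. Here is a small C sketch of the split. The helper names are made up, and it assumes the usual IEEE 754 64-bit double.

```c
#include <stdint.h>
#include <string.h>

/* Upper 32 bits of a double's bit pattern (sign, exponent, top of mantissa). */
uint32_t double_high_word(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* reinterpret bytes without aliasing tricks */
    return (uint32_t)(bits >> 32);
}

/* Lower 32 bits of a double's bit pattern (rest of the mantissa). */
uint32_t double_low_word(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return (uint32_t)(bits & 0xFFFFFFFFu);
}
```

double_high_word(3.0) is 1074266112 (40080000H) and double_high_word(10.0) is 1076101120 (40240000H), each with a low word of 0, which matches the push pairs in the listing.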
Edited by ChrisR
Jan. 6, 2003, 04:57:34 PM EST
|
Post #73,015
1/6/03 4:57:55 PM
|
Re: Trimming it down...
Yes, it did the braces.
/Od BTW (no opt).
-drl
|
Post #73,025
1/6/03 6:07:55 PM
|
Scratches head...
Kind of goes against my theory. :-(
From looking at the ASM again, it doesn't look like you put any instructions within the conditional brackets. The ASM does a float compare but it doesn't actually use the result for any branch or jump. Could you try it with a simple print within the conditionals?
|
Post #73,062
1/6/03 8:51:33 PM
|
Re: Scratches head...
Oh sure. It won't really be essentially different, but here goes:
    TITLE   C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
    .386P
include listing.inc
if @Version gt 510
.model FLAT
else
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
_DATA   SEGMENT DWORD USE32 PUBLIC 'DATA'
_DATA   ENDS
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
_BSS    SEGMENT DWORD USE32 PUBLIC 'BSS'
_BSS    ENDS
$$SYMBOLS   SEGMENT BYTE USE32 'DEBSYM'
$$SYMBOLS   ENDS
$$TYPES SEGMENT BYTE USE32 'DEBTYP'
$$TYPES ENDS
_TLS    SEGMENT DWORD USE32 PUBLIC 'TLS'
_TLS    ENDS
;   COMDAT ??_C@_02ELOP@42?$AA@
CONST   SEGMENT DWORD USE32 PUBLIC 'CONST'
CONST   ENDS
;   COMDAT _main
_TEXT   SEGMENT PARA USE32 PUBLIC 'CODE'
_TEXT   ENDS
FLAT    GROUP _DATA, CONST, _BSS
    ASSUME  CS: FLAT, DS: FLAT, SS: FLAT
endif
PUBLIC  _main
PUBLIC  ??_C@_02ELOP@42?$AA@        ; `string'
PUBLIC  __real@8@400ff120000000000000
EXTRN   _pow:NEAR
EXTRN   _puts:NEAR
EXTRN   __chkesp:NEAR
EXTRN   __fltused:NEAR
;   COMDAT ??_C@_02ELOP@42?$AA@
; File C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
CONST   SEGMENT
??_C@_02ELOP@42?$AA@ DB '42', 00H   ; `string'
CONST   ENDS
;   COMDAT __real@8@400ff120000000000000
CONST   SEGMENT
__real@8@400ff120000000000000 DQ 040fe240000000000r ; 123456
CONST   ENDS
;   COMDAT _main
_TEXT   SEGMENT
_main   PROC NEAR                   ; COMDAT
; File C:\My Documents\Visual Studio Projects\Junk1\Junk.cpp
; Line 5
    push    ebp
    mov     ebp, esp
    sub     esp, 72                 ; 00000048H
    push    ebx
    push    esi
    push    edi
    lea     edi, DWORD PTR [ebp-72]
    mov     ecx, 18                 ; 00000012H
    mov     eax, -858993460         ; ccccccccH
    rep stosd
; Line 6
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    add     esp, 16                 ; 00000010H
    fdivr   QWORD PTR __real@8@400ff120000000000000
    fstp    QWORD PTR -8+[ebp]
    push    1074266112              ; 40080000H
    push    0
    push    1076101120              ; 40240000H
    push    0
    call    _pow
    add     esp, 16                 ; 00000010H
    fdivr   QWORD PTR __real@8@400ff120000000000000
    fcomp   QWORD PTR -8+[ebp]
    fnstsw  ax
    test    ah, 64                  ; 00000040H
    je      SHORT $L928
    push    OFFSET FLAT:??_C@_02ELOP@42?$AA@    ; `string'
    call    _puts
    add     esp, 4
$L928:
; Line 8
    pop     edi
    pop     esi
    pop     ebx
    add     esp, 72                 ; 00000048H
    cmp     ebp, esp
    call    __chkesp
    mov     esp, ebp
    pop     ebp
    ret     0
_main   ENDP
_TEXT   ENDS
END
-drl
|
Post #73,070
1/6/03 9:27:16 PM
|
Thanks...
...I was just wanting to see the instructions that immediately followed the float compare. In this case it was:
    fnstsw ax
    test ah, 64
    je SHORT $L928
Which is strange compared to my experience with the MC68881, where there are direct floating point branch instructions (at least that's the way I remember it).
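For what it's worth, those three instructions copy the x87 status word into AX and then test a single condition-code bit by hand. A rough C model is below. The enum and function names are mine; the bit positions are the x87 ones (C0 is bit 8, C2 is bit 10, C3 is bit 14 of the status word).

```c
#include <stdint.h>

/* x87 condition-code bits within the 16-bit status word. */
enum { X87_C0 = 0x0100, X87_C2 = 0x0400, X87_C3 = 0x4000 };

/* Models: fnstsw ax / test ah, 64 / je skip.
 * Returns 1 when the body of the if would run, i.e. when C3 is set. */
int if_body_runs(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);  /* AH is the high byte of AX */
    return (ah & 64) != 0;                     /* 64 = 40H = C3 as seen in AH */
}
```

After an fcomp, C3 is set when the operands are equal, so if_body_runs(X87_C3) is 1 and a plain "less than" result (only C0 set) gives 0. Note that an unordered compare (a NaN operand) sets C3, C2 and C0 together, so this particular test would also let that case through.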
|
Post #73,071
1/6/03 9:29:58 PM
|
BTW
Did you figure out what I put between the brackets? :)
-drl
|
Post #73,091
1/6/03 10:10:48 PM
|
Hmmmm
Well, I see the instructions:
    push OFFSET FLAT:??_C@_02ELOP@42?$AA@ ; `string'
    call _puts
And I know that _puts is most likely the printf function, but I can't really decode the constant that's pushed on the stack there. The assembler comment `string' is pretty useless - yes I know it's a string - tell me something I didn't know. But I can't make out the rest of the encryption (something with ELOP in it). :-)
|
Post #73,096
1/6/03 10:21:20 PM
|
puts("42");
-drl
|
Post #73,029
1/6/03 6:13:49 PM
|
gcc 2.95.26
Just as another data point, I tried this version under cygwin and it also fails to give an equality.

    fldl LC0
    subl $8,%esp
    fstpl (%esp)
    fldl LC1
    subl $8,%esp
    fstpl (%esp)
    call _pow
    addl $16,%esp
    fldl LC2
    fdivp %st,%st(1)
    fstpl -8(%ebp)
    fldl LC0
    subl $8,%esp
    fstpl (%esp)
    fldl LC1
    subl $8,%esp
    fstpl (%esp)
    call _pow
    addl $16,%esp
    fldl LC2
    fdivp %st,%st(1)
    fldl -8(%ebp)
    fucompp
    fnstsw %ax
    andb $68,%ah
    xorb $64,%ah
    jne L3
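The last four instructions of that listing are gcc's way of turning the FPU compare into a branch. Here is a rough C model of the and/xor trick. The function name is made up, and I am assuming L3 is the label just past the body of the if.

```c
#include <stdint.h>

/* Models: fnstsw %ax / andb $68,%ah / xorb $64,%ah / jne L3.
 * Returns 1 when the jne is taken, i.e. when the if body is skipped. */
int jne_taken(uint16_t status_word)
{
    uint8_t ah = (uint8_t)(status_word >> 8);  /* AH is the high byte of AX */
    ah &= 68;   /* keep only C3 (40H) and C2 (04H) */
    ah ^= 64;   /* becomes zero only when C3 = 1 and C2 = 0 */
    return ah != 0;
}
```

So the branch falls through only for a clean "equal" result: jne_taken(0x4000) is 0, while "greater" (0x0000), "less" (0x0100) and "unordered" (0x4500) all give 1. Unlike a single-bit test on C3, this sequence treats a NaN compare as not equal.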
|