[Info-vax] VMS Cobol - GnuCOBOL

Dan Cross cross at spitfire.i.gajendra.net
Wed Mar 1 09:54:26 EST 2023


In article <ttnk0m$lg5$2 at news.misty.com>,
Johnny Billquist  <bqt at softjar.se> wrote:
>On 2023-02-28 20:11, Dan Cross wrote:
>> In article <k66d38Ftd0kU10 at mid.individual.net>,
>> bill  <bill.gunshannon at gmail.com> wrote:
>>>> Of course, BUT I feel other languages have fewer places where undefined
>>>> behaviour can occur, and so the problem is not so severe.
>>>>
>>>
>>> And then you have the case (it's been a long while but at one time
>>> was very common in my experience) where turning off optimization
>>> "fixes" the problem.
>> 
>> Indeed; this is quite common.  In the context of C and C++, this
>> is often a result of misunderstanding how "undefined behavior"
>> works.  What UB actually means is that the language _imposes no
>> requirements_ on such behavior (C11 sec 3.4.3).
>> So the compiler is free to implement any behavior it chooses.
>> 
>> The issue is that compiler writers have begun taking aggressive
>> advantage of this to use UB to push occasionally surprising
>> optimizations.
>> 
>> Consider, for example, this segment of code:
>> 
>> unsigned short
>> mul(unsigned short a, unsigned short b)
>> {
>> 	return a * b;
>> }
>> 
>> Is this always well-defined?  Well, no, depending on the
>> platform.  But why?  It looks innocuous enough.
>> 
>> To understand this, consider a platform with 32-bit ints and
>> 16-bit shorts.  According to pretty much every version of the C
>> standard, before the multiplication is performed, the operands
>> will be subject to the "usual arithmetic conversions" (C11 sec
>> 6.3.1.8).  In this case, the `unsigned short` operands have
>> lesser "rank" than `int`, and an `int` can represent the full
>> range of values representable in `unsigned short`, so the
>> "integer promotions" apply and the operands will be converted to
>> type `int` (C11 sec 6.3.1.1, para 2).  The multiplication will
>> then be performed using _signed_ integer arithmetic.  But note
>> that there exist unsigned 16-bit short values 'c' and 'd' such
>> that 'c * d' overflows a signed 32-bit int, and non-atomic
>> _signed_ integer overflow is undefined behavior in C (C11 sec
>> 6.5 para 5; in this case, the product is not in the range of
>> representable values for type `int`).
>> 
>> The result is that the compiler is free to do whatever it wants
>> here.  In practical terms, most compilers will produce code that
>> behaves as expected, but if a compiler decided, for whatever
>> reason, to emit (say) a saturating multiplication instruction
>> instead of a normal MUL, it would be entirely within its
>> rights to do so.  Caveat emptor.
>> 
>> An issue with C code in particular is that it's almost
>> impossible to write a non-trivial program that doesn't have
>> _some_ UB in it, often hidden in ways that are not obvious to
>> the programmer (e.g., inside of a macro perhaps).  So as
>> compilers evolve and become more aggressive, code that seems to
>> have worked for _years_ all of a sudden seems to spontaneously
>> break.  Again, caveat emptor.
>> 
>> So yeah.  Often turning off the optimizer appears to "fix"
>> programs.  Heisenbugs indeed!
>
>You overcomplicated the whole explanation.

Hmm.

>Basically, if you multiply two numbers, and the result would be bigger 
>than the types involved, the result is undefined.

Well...no.  _unsigned_ integer overflow in C is well-defined (it
has modular wrapping semantics; C11 sec 6.2.5 para 9).
Similarly, overflow of signed atomics is well-defined (C11 sec
7.17.7.5 para 3), so this is not always true.
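
To make the unsigned case concrete, here is a minimal sketch; the
names `wrapping_add` and `check_wrapping` are made up for
illustration.  It only exercises plain unsigned arithmetic, not
atomics.

```c
#include <assert.h>
#include <limits.h>

/* Unsigned arithmetic is reduced modulo UINT_MAX + 1, so it cannot overflow. */
unsigned int
wrapping_add(unsigned int a, unsigned int b)
{
	return a + b;
}

/* Identities that follow from the modular semantics, whatever the width. */
void
check_wrapping(void)
{
	assert(wrapping_add(UINT_MAX, 1u) == 0u);
	assert(wrapping_add(UINT_MAX, UINT_MAX) == UINT_MAX - 1u);
	assert(0u - 1u == UINT_MAX);
}
```

None of these assertions depend on how wide `unsigned int` is.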

>In this case, the type 
>is unsigned short. If the multiplication causes anything that doesn't fit
>into an unsigned short, then the result is undefined.
>
>And it's fairly easy to find two unsigned short numbers that when 
>multiplied will give a result that doesn't fit into an unsigned short.

The range of unsigned short has little to do with it, and
truncation of the result is fine too (again, defined to use
modular wrapping for unsigned types; C11 sec 6.3.1.3).  The
problem is entirely due to the promotion to _signed_ int prior
to the multiplication.  The fix, incidentally, is easy:

unsigned short
mul(unsigned short a, unsigned short b)
{
	unsigned int aa = a, bb = b;
	return aa * bb;
}

This is well-defined, even if the range of `unsigned short` is
the same as `unsigned int`, which is permitted by the standard.
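
As a quick sanity check of that claim, here is a sketch that
repeats the function and asserts a few results; `check_mul` is a
name invented for this example.  The expected values follow from
modular arithmetic alone, so they should hold for any width of
`unsigned short`.

```c
#include <assert.h>
#include <limits.h>

unsigned short
mul(unsigned short a, unsigned short b)
{
	/* Force unsigned arithmetic so that the product wraps instead of overflowing. */
	unsigned int aa = a, bb = b;
	return aa * bb;
}

void
check_mul(void)
{
	assert(mul(3, 5) == 15);
	/* (2^n - 1)^2 = 2^(2n) - 2^(n+1) + 1, which is 1 modulo 2^n. */
	assert(mul(USHRT_MAX, USHRT_MAX) == 1);
	/* 2 * (2^n - 1) = 2^(n+1) - 2, which is 2^n - 2 modulo 2^n. */
	assert(mul(USHRT_MAX, 2) == USHRT_MAX - 1);
}
```

With 16-bit shorts and 32-bit ints, the second assertion is
exactly the case where the original version multiplies
65535 * 65535 as `int` and overflows.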

>It's really not that different from any other language. Some languages 
>will throw an exception, others will just give some result that might or 
>might not be meaningful. But there is no guarantee in almost any 
>language of some kind of specific meaningful result. Try it in FORTRAN 
>and see what you get there, or Cobol (or BASIC). :-)

The issue, and the reason for the complex initial explanation,
is the subtle interaction between the implicit type promotion
rules, arithmetic, undefined behavior, and the freedom that UB
gives to compiler writers, a combination that is fairly unique to
C.  That is why optimizing compilers often _appear_ to introduce
bugs when in fact they're performing perfectly legal
transformations, and why turning off the optimizer can appear to
"fix" the problem.
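
A commonly cited illustration of this effect (not taken from this
thread; the function names are made up) is an overflow check
that itself depends on signed overflow.  An optimizer is allowed
to assume the overflow never happens, so it may fold the fragile
test to a constant, while an unoptimized build of the same code
may happen to "work".  Treat it as a sketch, not a statement
about what any particular compiler does.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Fragile: if x == INT_MAX, x + 1 overflows, which is UB.  An optimizer
 * may assume that never happens and reduce the whole test to "false".
 */
bool
next_overflows_fragile(int x)
{
	return x + 1 < x;
}

/* Well-defined: compare against the limit before doing any arithmetic. */
bool
next_overflows(int x)
{
	return x == INT_MAX;
}

void
check_next_overflows(void)
{
	assert(next_overflows(INT_MAX));
	assert(!next_overflows(0));
	assert(!next_overflows(INT_MIN));
	/* Only inputs that stay in range are safe to pass to the fragile form. */
	assert(!next_overflows_fragile(41));
}
```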

	- Dan C.



