may_alias (thread separated from another one)

DSP, Plug-in and Host development discussion.
stratum
KVRAF
2242 posts since 29 May, 2012

Post Tue Feb 19, 2019 12:58 pm

https://www.ibm.com/support/knowledgece ... alias.html

If you look at this code (from the link above), the fact that an optimizer is allowed to make such modifications looks pretty annoying and, honestly, looks like the result of wishful thinking about the evolution of the C++ language and what it may eventually become.


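#define __attribute__(x)     // Invalidates all __attribute__ declarations (drops may_alias)
typedef long __attribute__((__may_alias__)) t_long;  // so this is now just: typedef long t_long;

int main (void){
  int i = 42;
  t_long *pa = (t_long *) &i;  // long* pointing at an int: violates the aliasing rules
  *pa = 0;                     // writes sizeof(long) bytes; assumes sizeof(long) == sizeof(int)
  if (i == 42)                 // the optimizer may assume *pa did not touch i...
    return 1;                  // ...and fold this branch to "return 1"
  return 0;
}
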
What the standard says about the subject is beside the point; the fact is that the variable i is being modified, and if compilers cannot practically detect it, then as far as I'm concerned both the compiler and the standard itself are buggy. This is not some crazy code in which a const variable is carelessly cast to non-const and then modified. Such type casts are very common and intentional, and they have worked correctly for decades, until the "language lawyers" (for lack of a better term) on the C++ standards committee decided that they no longer need to work the way they used to.

What do you think?
~stratum~
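For comparison, here is a minimal sketch of the same test with the store done through std::memcpy instead of a t_long* write. std::memcpy may read and write the bytes of any object, so the compiler has to assume that i changed. The function name reset_via_memcpy is hypothetical, and copying exactly sizeof(int) bytes also avoids the size mismatch that the original t_long store has where long is wider than int.

```cpp
#include <cassert>  // assert, for the usage check below
#include <cstring>  // std::memcpy

// Hypothetical helper: the IBM test, but the "store" is a byte copy.
// std::memcpy may access any object's bytes, so the compiler cannot
// assume that `i` still holds 42 after the copy.
int reset_via_memcpy() {
    int i = 42;
    const int zero = 0;
    std::memcpy(&i, &zero, sizeof i);  // copies exactly sizeof(int) bytes
    if (i == 42)
        return 1;
    return 0;
}
```

Usage: `assert(reset_via_memcpy() == 0);` holds on any conforming compiler, with or without optimization.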

stratum
KVRAF
2242 posts since 29 May, 2012

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 1:43 pm

Here is another crazy rule:
From:
https://www.ibm.com/support/knowledgece ... talias.htm
In C/C++ 11, the typeless memory returned by malloc etc. receives the "effective type" of the first access to it. For example, if the address returned by malloc is cast to an int* pointer and that is dereferenced to store an initial value, then that memory's effective type becomes int, and only pointers compatible with int can be used to access it.
Looks like the whole C++ standards committee needs to be fired for making unrealistic assumptions, and freed to design another language that follows their pipe dreams, leaving C++ alone.
~stratum~
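The "effective type" wording in the quote comes from C. In C++ the corresponding idea is expressed through object lifetimes, and one well-defined way to give malloc'd storage a different type is to create a new object in it with placement new. A minimal sketch, assuming a hypothetical helper reuse_storage_as_float that uses one block first as an int and then as a float:

```cpp
#include <cassert>  // assert, for the usage check below
#include <cstddef>  // std::size_t
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

// Large enough for either type; malloc's result is suitably aligned for both.
constexpr std::size_t kStorageSize =
    sizeof(int) > sizeof(float) ? sizeof(int) : sizeof(float);

// Hypothetical helper: each use of the storage starts with placement new,
// so every read goes through a pointer to the object actually living there.
float reuse_storage_as_float() {
    void* raw = std::malloc(kStorageSize);
    if (raw == nullptr)
        return 0.0f;                        // allocation failed
    int* ip = new (raw) int(42);            // storage now holds an int
    const int n = *ip;                      // read it as an int
    float* fp = new (raw) float(n / 2.0f);  // reusing the storage ends the int's lifetime
    const float result = *fp;               // read it as a float
    std::free(raw);                         // int and float are trivially destructible
    return result;
}
```

Usage: `reuse_storage_as_float()` returns 21.0f. Reading the block through a float* without the second placement new is the case the quoted rule forbids.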

Z1202
KVRian
1023 posts since 12 Apr, 2002

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 2:39 pm

Well, that obviously depends on the goals the C++ committee is pursuing (if there is such a thing as a consistent set of goals; I just don't know). However, I would agree that the C++ standard strongly disregards the needs of low-level programming: the language pretends to offer features for it, but on closer inspection seems to do everything to make the low-level options as inaccessible, cumbersome and unreliable as possible. It seems that, although the fundamental part of the language is oriented toward low-level programming, the standard has tried to make a high-level language out of it and render the low-level features next to unusable.

In that regard I would happily abandon C++ and jump to a better alternative as soon as one becomes available. Rust seemed like a potentially good candidate, although I haven't taken a good look at that language, especially its latest versions.

Edit: I'd say this might have begun with the C++ style convention of putting the star next to the type rather than next to the identifier, which clearly and unambiguously contradicts the language's syntax (inherited from C, where, at least in the Kernighan-Ritchie convention, the star is attached to the identifier, consistently with the syntax). I hope this remark won't start a flame war.
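The syntactic point can be shown in a few lines: in a declaration the star belongs to each declarator, not to the type, so the type-side spacing suggests something the grammar does not do. A minimal sketch with hypothetical variable names:

```cpp
#include <type_traits>  // std::is_same

// The '*' binds to the declarator, not to the type specifier.
int* a, b;   // a is int*, but b is a plain int
int *c, *d;  // K&R-style spacing makes both pointers explicit

static_assert(std::is_same<decltype(a), int*>::value, "a is a pointer");
static_assert(std::is_same<decltype(b), int>::value, "b is NOT a pointer");
static_assert(std::is_same<decltype(d), int*>::value, "d is a pointer");
```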

Z1202
KVRian
1023 posts since 12 Apr, 2002

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 2:56 pm

Another problem, high-level or not, is that the C++ standard, rather than being a formal expression of what is already intuitively expected from the language, is a thick manual (1000 pages?) listing tons of unintuitive exceptions, which, in principle, a C++ developer would need to know by heart to stay in control of their own code. This is becoming completely impractical.

And, of course, there is undefined behavior where one would usually expect platform-specific behavior (the former is absolutely not the same as the latter, even though one might initially think so). But that belongs more to the previous post (regarding low-level features).

stratum
KVRAF
2242 posts since 29 May, 2012

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:01 pm

Who would use C++ if it is no longer useful for low-level programming, I don't know. There are already plenty of high-level languages that are totally useless for accessing low-level details of the hardware. C++ used to be the high-level assembler; that's its sole reason for existence.

I used to think that unexpected optimizer behavior was either a compiler bug that shows up infrequently, or a bug in my own code that is difficult to find among thousands of lines of code.

Now I'm not so sure anymore, as C++ is no longer the language I have been using for about two decades. It looks like code written in the past may now have a different meaning, and this is not just about code you have written yourself but also about any open source library you might be using. Even Boost (which is supposedly 'expertly written' and as portable as possible) warns at compile time that "this particular version was not tested with this specific compiler, please run all the unit tests and report the result".
Last edited by stratum on Tue Feb 19, 2019 3:03 pm, edited 2 times in total.
~stratum~

Max M.
KVRist
321 posts since 20 Apr, 2005 from Moscow, Russian Federation

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:02 pm

the whole C++ standards committee needs to be fired for making unrealistic assumptions and be freed

This is all for the sake of performance (specifically, mostly to allow a compiler to make certain assumptions so it can generate code that is as efficient as possible while retaining wide platform portability). It was never really a problem to eradicate all these aliasing problems and UBs by making them "fully defined", i.e. by dictating to the compiler exactly what to do in each case, regardless of how awful the predefined behavior might turn out to be on the target architecture.

Well, so much has been written since the adoption of the first C and C++ standards that you can find out why they decided things one way or another. (I'm afraid I'd fail if I tried to summarize this ton of papers in a single post.)
They are not ideal, but it's a bit naive to decide what exactly is "buggy" and what is not by considering a single synthetic, streamlined snippet while ignoring the zillions of complex use cases it potentially affects.
(Note that anyone can propose changes to the C++ committee, so if you're brave enough: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/. The committee itself is more about managing in the first place than deciding (though obviously they have to decide in the end if there's no consensus on a certain feature in the community).)
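A typical illustration of the performance argument, as a sketch with a hypothetical function fill_values: because float and int may not alias, a compiler applying the aliasing rules may assume that the stores through out never modify *count, so it can load *count once and keep it in a register (and possibly vectorize the loop) instead of re-reading it after every store. Whether a given compiler actually does this depends on its optimizer.

```cpp
#include <cassert>  // assert, for the usage check below

// Hypothetical example: under strict aliasing, a store to a float cannot
// modify the int *count, so the loop bound may be loaded just once.
// If `out` were a char* instead, the compiler would have to assume that
// each store might change *count, since char may alias any object.
void fill_values(float* out, const int* count, float value) {
    for (int k = 0; k < *count; ++k) {
        out[k] = value;
    }
}
```

Usage: with `float buf[4] = {}` and `int n = 3`, `fill_values(buf, &n, 2.0f)` sets the first three elements and leaves `buf[3]` at 0.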

Z1202
KVRian
1023 posts since 12 Apr, 2002

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:09 pm

Max M. wrote:
Tue Feb 19, 2019 3:02 pm
This all is for the sake of performance (specifically mostly allowing a compiler to make certain assumptions to generate as efficient code as possible and in the same time retain wide platform portability). It never was really a problem to eradicate all these aliasing problems and UBs by making them "fully defined" via dictating a compiler what exactly to do in this or that case regardless of how awful the predefined behavior may end at the target architecture.
The problem is that there is a very strong bias toward making things easy for the compiler and extremely difficult for a developer who wishes to access low-level features. To me it seemed as if there was no care at all for striking a balance between the two. Aliasing rules are one example (I think one could have worked out different, more intuitive and usable definitions). Another beautiful example is the definition of the offsetof macro (I guess that's a recent change), which is so restrictive that compilers (e.g. clang) provide special options to bypass it, while at the same time making it impossible to define it as a macro anymore. Union field access is yet another one. Then there are the initially nonexistent and later cumbersome ways of precisely controlling memory allocation in the STL (which AFAIK wasn't originally designed to be part of the standard; it was just a "nice library" that was later incorporated into the standard "as is"). Etc.
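For reference, the case offsetof is guaranteed to handle is a standard-layout class; since C++17, using it with other types is "conditionally-supported", i.e. up to the compiler. A minimal sketch with a hypothetical VoiceState struct that checks the precondition at compile time:

```cpp
#include <cassert>      // assert, for the usage check below
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

// Hypothetical struct: all members public, no virtuals, no bases,
// so it is standard-layout and offsetof is fully supported.
struct VoiceState {
    float phase;
    float freq;
    int   note;
};
static_assert(std::is_standard_layout<VoiceState>::value,
              "offsetof is only guaranteed for standard-layout types");

constexpr std::size_t kPhaseOffset = offsetof(VoiceState, phase);  // first member: 0
constexpr std::size_t kNoteOffset  = offsetof(VoiceState, note);   // after phase and freq
```

Adding, say, a virtual function or a base class with data members to VoiceState would make the static_assert fire, which is exactly the restriction complained about above.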

Max M.
KVRist
321 posts since 20 Apr, 2005 from Moscow, Russian Federation

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:38 pm

Z1202

Well, I have my own list of complaints (specifically, I always use printf instead of std::cout and all that crap, and in general I'm not very enthusiastic about the "high-level" parts of the C++ standard library (incl. STL) because of my embedded systems background and so on. The list is endless, to be honest).

But when it comes to the specific "aliasing" problem, I feel like you guys exaggerate things a bit; the problem is as old as C itself. I would never attack something based on just "uh-oh, this rule does not fit this specific snippet I wrote using my 'common sense gut feeling' about how memory and compilers should work" without at least trying to find out why the rule exists in the first place (and this seems to be what stratum does, e.g. "In C/C++ 11, the typeless memory returned by malloc etc. receives the "effective type" of the first access to it". Do you really think they came up with it just because they had nothing else to do and their only desire is to make your life harder? :) ).

mystran
KVRAF
5213 posts since 12 Feb, 2006 from Helsinki, Finland

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:42 pm

The problem with C and C++ is that they have a ridiculous definition of "undefined behaviour" that basically allows the compiler to delete the whole program if it ever invokes any UB anywhere. Yet at the same time, the languages fail to provide standard constructs for things that programmers everywhere rely on every day. You literally can't even implement the C standard library in the language itself without invoking UB! Just try writing something like malloc() without doing integer arithmetic with pointers (which is UB!) and you'll see what I mean.

Production compilers will typically let you do this anyway, but then you have silly things like signed overflow being UB; I have some code with an average of 2-3 casts per line between signed and unsigned integers just so I can get efficient machine code (e.g. taking advantage of wraparound, which is defined for unsigned integers) without invoking UB.
If you'd like Signaldust to return, please ask Katinka Tuisku to resign.
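Both workarounds mentioned above can be sketched in a few lines; the helper names wrapping_add and align_up are hypothetical. The addition is done in uint32_t, where wraparound is defined; converting the out-of-range result back to int32_t is implementation-defined before C++20 (it wraps on mainstream compilers) and modular by definition since C++20. The alignment helper rounds a pointer up the way an allocator would; the mapping between pointers and uintptr_t (and thus the meaning of arithmetic on it) is implementation-defined, which is the gray zone described above. The standard library's std::align in <memory> is the portable alternative for the alignment case.

```cpp
#include <cassert>  // assert, for the usage check below
#include <cstddef>  // std::size_t
#include <cstdint>  // std::int32_t, std::uint32_t, std::uintptr_t

// Two's-complement wrapping add without signed overflow:
// unsigned arithmetic wraps modulo 2^32 by definition.
std::int32_t wrapping_add(std::int32_t a, std::int32_t b) {
    return static_cast<std::int32_t>(static_cast<std::uint32_t>(a) +
                                     static_cast<std::uint32_t>(b));
}

// Round p up to the next multiple of `alignment` (must be a power of two).
// Relies on the implementation-defined pointer <-> integer mapping being
// a plain address, as it is on mainstream platforms.
void* align_up(void* p, std::size_t alignment) {
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    return reinterpret_cast<void*>((addr + mask) & ~mask);
}
```

Usage: `wrapping_add(INT32_MAX, 1)` yields INT32_MIN, and for `alignas(16) char buf[64]`, `align_up(buf + 1, 16)` points at `buf + 16`.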

stratum
KVRAF
2242 posts since 29 May, 2012

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:44 pm

Max M. wrote:
Tue Feb 19, 2019 3:38 pm
"In C/C++ 11, the typeless memory returned by malloc etc. receives the "effective type" of the first access to it" - do you really think they come up with it just because they have nothing else to do and their only desire is to make your life harder? :) ).
Well, yes, I do think exactly that, because there are two problems with that rule: (1) it is not explicitly visible as a syntactic feature; one has to read that 1000-page standard document to see it, or somehow run into it; (2) it's useless in practice unless one also checks the length of the array on every array access.
~stratum~

Max M.
KVRist
321 posts since 20 Apr, 2005 from Moscow, Russian Federation

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 3:58 pm

mystran

Okay, how many languages do you know where you can write a malloc() at all? I'd be happy to agree if I have anything to compare with :)
(Well, let's exclude platform specific assemblers of course).

Z1202

Same for you. (Honestly I'm absolutely unfamiliar with Rust, I've only read the wikipedia article). Are you sure Rust will allow you to cast double* to int64* and vice-versa? (i.e. will it allow you to write these 2^x functions like in the sister thread at all?) What language will?
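For context, in C++ the cast-free way to build a double from its bits is std::memcpy, which compilers typically lower to a plain register move. A minimal sketch of a hypothetical integer-exponent pow2_int (not the sister thread's code), assuming IEEE-754 binary64 doubles and n in [-1022, 1023] so the result is a normal number:

```cpp
#include <cassert>  // assert, for the usage check below
#include <cstdint>  // std::uint64_t
#include <cstring>  // std::memcpy
#include <limits>   // std::numeric_limits

static_assert(std::numeric_limits<double>::is_iec559, "assumes IEEE-754 double");
static_assert(sizeof(double) == sizeof(std::uint64_t), "assumes 64-bit double");

// Hypothetical helper: 2^n for integer n by writing the biased exponent
// (n + 1023) into bits 52..62 and leaving sign and mantissa at zero.
// Valid for n in [-1022, 1023]; no range checking is done.
double pow2_int(int n) {
    const std::uint64_t bits = static_cast<std::uint64_t>(n + 1023) << 52;
    double result;
    std::memcpy(&result, &bits, sizeof result);  // well-defined type pun
    return result;
}
```

Usage: `pow2_int(3)` is 8.0 and `pow2_int(-1)` is 0.5.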

mystran
KVRAF
5213 posts since 12 Feb, 2006 from Helsinki, Finland

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 4:24 pm

Also, just about 99.99% of the problems with the current state of C/C++ could be solved by redefining "undefined behaviour" such that any given implementation must treat it either as "implementation specific" (which implies it's specified, one way or another) or as a compilation error... but this is not going to happen, because compiler writers don't care whether the code their compiler produces actually does something sensible, as long as it runs as fast as possible, and it's a lot easier to write optimisations when you can cut corners whenever the standard fails to specify something.
If you'd like Signaldust to return, please ask Katinka Tuisku to resign.

Z1202
KVRian
1023 posts since 12 Apr, 2002

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 11:09 pm

@Max M.
In regard to aliasing, I strongly suspect a more sensible set of rules could have been developed for the assumptions the compiler may make when analysing aliasing (possibly with some modifications of the language, e.g. along the lines of the "restrict" keyword, although I hope smarter solutions exist). Currently it seems that only a compiler extension (may_alias) allows type punning in a 100% guaranteed way (this is also what I get from the CppCon talk posted above, where the speaker was not even 100% sure about std::memcpy; plus we become really dependent on how well a compiler can optimize a call to memcpy, as this must include register optimizations of a non-trivial nature). That is, read my lips: C++ doesn't allow type punning. Period.

Note that people have been doing type punning for years and decades. So rather than accommodating a common use case one way or another, the standard forbids it (and does so in an "under the carpet" manner; most people are not even aware that they are writing forbidden code).
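For reference, this is roughly what the may_alias route looks like when the attribute is actually enabled (GCC and Clang; both define __GNUC__). A minimal sketch with hypothetical names u32_may_alias and float_bits, falling back to std::memcpy on other compilers and assuming a 32-bit float:

```cpp
#include <cassert>  // assert, for the usage check below
#include <cstdint>  // std::uint32_t
#include <cstring>  // std::memcpy

#if defined(__GNUC__)
// GCC/Clang extension: accesses through this type may alias any object,
// the same way char accesses may.
typedef std::uint32_t __attribute__((__may_alias__)) u32_may_alias;
#endif

// Hypothetical helper: return the raw bit pattern of a float.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
#if defined(__GNUC__)
    return *reinterpret_cast<const u32_may_alias*>(&f);  // extension-sanctioned pun
#else
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);                       // portable fallback
    return u;
#endif
}
```

Usage: `float_bits(1.0f)` is 0x3F800000 for IEEE-754 single precision.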

Similarly, offsetof is pretty much unsupported by the standard (only simple cases). And pointers to members (which offsetof is mostly used to replace) are more or less a joke when it comes to realtime code (and again, some compilers, e.g. MSVC, attempt extensions of the standard). Not being able to dereference a NULL or unallocated pointer is a joke IMHO as well. First, in a custom offsetof macro I need to dereference a NULL pointer, but I'm not accessing the memory. Why is that forbidden at all? Second, there can be systems where 0 is a perfectly valid memory address. Why can't I program for those systems in C++?
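For readers who haven't used them, a pointer to data member names a field without any address arithmetic or NULL dereference; on common ABIs it is typically stored as an offset. A minimal sketch with a hypothetical Voice struct and set_field helper:

```cpp
#include <cassert>  // assert, for the usage check below

// Hypothetical struct for the example.
struct Voice {
    float gain;
    float pan;
};

// Set the same field on every voice; `field` selects which member,
// e.g. &Voice::gain or &Voice::pan.
void set_field(Voice* voices, int count, float Voice::* field, float value) {
    for (int k = 0; k < count; ++k) {
        voices[k].*field = value;
    }
}
```

Usage: `set_field(v, 2, &Voice::pan, 0.5f)` sets `pan` on both voices and leaves `gain` untouched.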

Small things, like not being able to static_cast within a multiple inheritance tree without the compiler generating NULL checks. Etc.
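Where the check comes from, as a sketch with hypothetical types: converting Both* to Right* must shift the pointer to the Right subobject, and a null pointer must stay null, so the compiler guards the adjustment with a test. Converting a reference instead (a reference cannot be null) typically lets GCC and Clang drop the test, but that is an observation about common compilers, not a guarantee, and it makes passing nullptr undefined.

```cpp
#include <cassert>  // assert, for the usage check below

struct Left  { int l = 1; };
struct Right { int r = 2; };
struct Both : Left, Right {};

// Pointer conversion: adjusts the address to the Right subobject
// and must map nullptr to nullptr, hence the generated null check.
Right* as_right(Both* p) {
    return static_cast<Right*>(p);
}

// Reference conversion: no null case to handle, so the check can go.
// Precondition: p must not be null (dereferencing nullptr is UB).
Right* as_right_nonnull(Both* p) {
    return &static_cast<Right&>(*p);
}
```

Usage: for `Both b;`, both helpers return the same pointer, and `as_right(nullptr)` is nullptr.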

And worst of all, I doubt a single person (except for a C++ compiler specialist) can remember all the exceptions and special cases of the language.

Otherwise, mystran pretty much summed up my point.

Edit: I wouldn't be making all these complaints about C++ if it were just a high-level language. However, C++ is fundamentally based on low-level language features, and at the same time almost none of those features officially work when used at a low level. It looks like a sadistic joke to me: why design a low-level language when it can't be used for low-level programming?
Last edited by Z1202 on Tue Feb 19, 2019 11:54 pm, edited 3 times in total.

Z1202
KVRian
1023 posts since 12 Apr, 2002

Re: may_alias (thread separated from another one)

Post Tue Feb 19, 2019 11:22 pm

Max M. wrote:
Tue Feb 19, 2019 3:58 pm
Z1202

Same for you. (Honestly I'm absolutely unfamiliar with Rust, I've only read the wikipedia article). Are you sure Rust will allow you to cast double* to int64* and vice-versa? (i.e. will it allow you to write these 2^x functions like in the sister thread at all?) What language will?
No, I'm not really familiar with Rust. It was just my impression of the spirit of the language (which I checked out quite a few years back): its author is actually aware of and cares about efficient code generation and low-level programming, not just "efficient code generation for some code at the cost of making other code impossible to write", which seems to be the case with C++.

Max M.
KVRist
321 posts since 20 Apr, 2005 from Moscow, Russian Federation

Re: may_alias (thread separated from another one)

Post Wed Feb 20, 2019 12:40 am

Z1202

Instead of arguing pointlessly about particular cases, I'd rather stress another important thing mentioned by mystran:
... but this is not going to happen, because compiler writers ...

That is, when we say C++ in the sense of the C++ Standard, do we actually remember that the standard was from its very beginning nothing but an attempt to reconcile various compiler manufacturers and bring at least some order? And since then none of the compilers has bothered to actually implement any of the standard versions 100%... What do you think they'd tell you if you came to them with "Psst! Guys, no more UBs from now on?" ;)

So a perfect language without UBs in a perfect world? Okay. In the real world this would just tear it (any non-VM language) apart into endless incompatible dialects (and then different languages) like "MSC++#", "GNU (read Intel) C++$", "Apple/BSD/Whatever C++©" etc., growing more incompatible with each other every day, so that by today you would not even be able to compile your code for each platform (same CPU! I'm not even talking about "exotics" like GPUs/ARM/endless DSPs and MCUs) without rewriting half of it first (and before you tell me you'd be fine doing that now with just one of them (e.g. GCC), I'll reply: "think about whether it could ever have gotten to that 'now' via that 'other way'").

I was not going to discuss anything but "aliasing", and I'm afraid by now the thread has gone too deep into fantasy to take anything seriously (just like at viewtopic.php?f=33&t=481683: yeah, everybody knows what a perfect language should look like, but nobody ever gets one... surprise, surprise. The very fact that in our area of activity we still have to rely heavily on something developed 50 years ago, with all the outdated weirdness of that era's visions and priorities, should ring a bell for any dreamer).

Second, there can be systems where 0 is a perfectly valid memory address. Why can't I program for those systems in C++?

Oh, come on... Losing 1 byte at the beginning of the memory space (in real systems these regions are usually reserved for some "internal hardware/system use" anyway and/or mapped to virtual address values) is not the same as "I can't program for those systems".

- - -
P.S. In summary, my position is that I'm absolutely fine with UBs if that's the price of being able to use the same language to develop for several CPU/DSP/MCU systems all these years (and UBs were not even in the top 10 of the problems I encountered while doing that).
