Looks like there is already a Fortran compiler emitting MLIR IR, with support for a few OpenMP constructs and two SPEC CPU 2017 benchmarks running: https://github.com/compiler-tree-technologies/fc
If you're shipping binaries, you don't know the exact architecture in advance (there are many extensions to x86, and you don't know whether the end user is running a new enough processor to use all of them). If you don't use them, you are likely leaving performance on the table. So you want to select the fastest option supported by the processor you happen to be running on. You can do this with fairly minimal performance impact by linking in different versions of the function at runtime, but this requires some support from your compiler and runtime environment.
1. A portable binary where only individual SIMD operations are optimized for all targets.
2. Building the optimized binary for every target architecture when needed (either by the user or by the binary distributor).
The concern with (1) is that as the number of dynamically dispatched functions (or if-else nests deciding between versions) increases, the quality of the generated code degrades for every architecture. Basically, the compiler is left with opaque, unrecognizable function calls, which restricts even the target-independent optimizations (like GVN, CSE, constant propagation, etc.).
Say a user writes a SIMD program that is full of dynamically dispatched functions (opaque to the compiler); doesn't that impact performance heavily?
Isn't compiler support for optimizing SIMD operations necessary, rather than writing wrapper libraries? For example, lowering SIMD operation calls to existing vectorized math libraries that compilers recognize (e.g. sin(), cos(), pow() in libm).
Phoronix article: https://www.phoronix.com/scan.php?page=news_item&px=FC-LLVM-...