How to disable vectorization in clang++?

Keywords: vectorization, clang++      Updated: 2023-10-16

Consider the following small search function:

#include <cstdint>

template <uint32_t N>
int32_t countsearch(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
    #pragma clang loop vectorize(disable)
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return count;
}

At -O2 or higher, clang vectorizes this search, e.g. generating code like this (for 10 elements):

int countsearch<10u>(unsigned int const*, unsigned int):            # @int countsearch<10u>(unsigned int const*, unsigned int)
        vmovd   xmm0, esi
        vpbroadcastd    ymm0, xmm0
        vpbroadcastd    ymm1, dword ptr [rip + .LCPI0_0] # ymm1 = [2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648,2147483648]
        vpxor   ymm2, ymm1, ymmword ptr [rdi]
        vpxor   ymm0, ymm0, ymm1
        vpcmpgtd        ymm0, ymm0, ymm2
        cmp     dword ptr [rdi + 32], esi
        vpsrld  ymm1, ymm0, 31
        vextracti128    xmm1, ymm1, 1
        vpsubd  ymm0, ymm1, ymm0
        vpshufd xmm1, xmm0, 78          # xmm1 = xmm0[2,3,0,1]
        vpaddd  ymm0, ymm0, ymm1
        vphaddd ymm0, ymm0, ymm0
        vmovd   eax, xmm0
        adc     eax, 0
        cmp     dword ptr [rdi + 36], esi
        adc     eax, 0
        vzeroupper
        ret

How can I disable this vectorization, either on the command line or with a #pragma in the code?

I tried the following command-line arguments, but none of them prevented the vectorization:

-disable-loop-vectorization 
-disable-vectorization
-fno-vectorize 
-fno-tree-vectorize

As you can see in the code above, I also tried #pragma clang loop vectorize(disable) above the loop, but without success.

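Turn off SLP vectorization. The flags and the pragma you tried target the loop vectorizer; here the vector code most likely comes from the SLP vectorizer, which packs the straight-line code left after the fixed-length loop is fully unrolled: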

clang++ -O2 -fno-slp-vectorize
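
For the "in code" half of the question, one possible sketch is to ask clang not to unroll the loop either. This assumes the vector code really does come from SLP vectorization of the fully unrolled loop; if so, a loop that stays rolled should give the SLP vectorizer nothing to pack. The name countsearch_scalar and the self_check helper are ours, not from the question, and the effect has not been verified against the compiler version used above, so check the generated assembly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical variant of countsearch from the question (the name is ours).
template <uint32_t N>
int32_t countsearch_scalar(const uint32_t *base, uint32_t needle) {
    uint32_t count = 0;
#if defined(__clang__)
// Ask clang not to vectorize, interleave, or unroll this loop. Keeping the
// loop rolled is intended to leave no unrolled straight-line code for the
// SLP vectorizer; this is an assumption, not a guarantee.
#pragma clang loop vectorize(disable) interleave(disable) unroll(disable)
#endif
    for (const uint32_t *probe = base; probe < base + N; probe++) {
        if (*probe < needle)
            count++;
    }
    return static_cast<int32_t>(count);
}

// Sanity check: the pragma should affect code generation only, not the result.
inline void self_check() {
    static const uint32_t data[10] = {1, 3, 5, 7, 9, 11, 13, 15, 17, 19};
    assert(countsearch_scalar<10>(data, 10) == 5);  // 1, 3, 5, 7, 9 are below 10
    assert(countsearch_scalar<10>(data, 0) == 0);   // nothing is below 0
}
```

Compile at -O2 with -S and look for ymm registers in the output to see whether the function is still vectorized. If it is, the command-line flag above remains the more direct fix.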
