vectorization - Conditional instructions in AVX2
Can you give me a list of the conditional instructions available in AVX2? So far I've found the following:
_mm256_blendv_* for selecting from a, b based on mask c
Are there things like conditional multiply, conditional add, etc.?
Also, if instructions taking an imm8 count (like _mm256_blend_*), could you explain how to get that imm8 after a vector comparison?
AVX512 introduces optional zero-masking and merge-masking for almost all instructions.
Before that, to do a conditional add, mask one operand (with vandps, or vandnps for the inverse) before the add (instead of using vblendvps on the result). This is why packed-compare instructions/intrinsics produce all-zero or all-one elements.
0.0 is the additive identity element, so adding it is a no-op. (Except for IEEE semantics of -0.0 and +0.0; I forget how that works exactly.)
Masking a constant input instead of blending the result avoids making the critical path longer, for something like conditionally adding 1.0.
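A minimal sketch of the masked-operand approach, assuming the condition is "element greater than a threshold" and the conditional addend is the constant 1.0. The function name, the AVX2_FN macro, and the multiple-of-8 length restriction are assumptions made for illustration, not part of the original answer:

```c
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC/Clang target attribute, so this builds without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: dst[i] = a[i] + (a[i] > threshold ? 1.0f : 0.0f).
   Only full vectors of 8 floats are processed; any tail is left untouched. */
AVX2_FN
void add_one_where_greater(float *dst, const float *a, float threshold, size_t n)
{
    const __m256 one = _mm256_set1_ps(1.0f);
    const __m256 thr = _mm256_set1_ps(threshold);

    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(a + i);
        /* Compare yields all-one bits where v > thr, all-zero bits elsewhere. */
        __m256 mask = _mm256_cmp_ps(v, thr, _CMP_GT_OQ);
        /* vandps: keep 1.0 where the mask is set, 0.0 where it is clear. */
        __m256 addend = _mm256_and_ps(mask, one);
        /* Adding 0.0 leaves the other elements unchanged. */
        _mm256_storeu_ps(dst + i, _mm256_add_ps(v, addend));
    }
}
```

The AND only touches the constant, so it sits off the dependency chain through v. One caveat on the signed-zero remark: in the default rounding mode -0.0 + 0.0 gives +0.0, so an unselected element holding -0.0 comes out as +0.0.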
Conditional multiply is more cumbersome because 0.0 is not the multiplicative identity. You need to multiply by 1.0 to keep a value unchanged, and you can't easily produce that with an AND or ANDN of a compare result. You can blendv an input, or you can do the multiply and blendv the output.
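As a rough sketch of the "blendv an input" variant: build a per-element multiplier that is either the factor or 1.0, then do a normal multiply. The condition (element is negative), the function name, and the AVX2_FN macro are assumptions chosen for the example:

```c
#include <immintrin.h>
#include <stddef.h>

/* Assumption: GCC/Clang target attribute, so this builds without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: dst[i] = a[i] < 0 ? a[i] * factor : a[i].
   Only full vectors of 8 floats are processed; any tail is left untouched. */
AVX2_FN
void scale_where_negative(float *dst, const float *a, float factor, size_t n)
{
    const __m256 one  = _mm256_set1_ps(1.0f);
    const __m256 fac  = _mm256_set1_ps(factor);
    const __m256 zero = _mm256_setzero_ps();

    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(a + i);
        __m256 mask = _mm256_cmp_ps(v, zero, _CMP_LT_OQ);
        /* vblendvps takes the second source where the mask's sign bit is set:
           multiplier = factor for selected elements, 1.0 for the rest. */
        __m256 mul = _mm256_blendv_ps(one, fac, mask);
        _mm256_storeu_ps(dst + i, _mm256_mul_ps(v, mul));
    }
}
```

Blending the constant multiplier rather than the product keeps the blend off the chain through the multiply result, at the cost of the same number of instructions.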
The alternative to blendv is at least 3 booleans, like AND/ANDN/OR, and that's usually not worth it. Although note that Haswell runs vblendvps and vpblendvb as 2 uops for port 5, so it's a potential bottleneck compared to using integer booleans that can run on any port. Skylake runs them as 2 uops for any port. It could make sense to avoid having blendv on the critical path, though.
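For comparison, here is roughly what the AND/ANDN/OR replacement for a variable blend looks like with integer booleans. The condition (cond[i] > 0), the function name, and the AVX2_FN macro are assumptions for illustration; whether this beats vpblendvb depends on the surrounding code and the microarchitecture:

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC/Clang target attribute, so this builds without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: dst[i] = cond[i] > 0 ? x[i] : y[i], using three
   booleans instead of a blendv. Only full vectors of 8 ints are processed. */
AVX2_FN
void select_epi32_bool(int32_t *dst, const int32_t *cond,
                       const int32_t *x, const int32_t *y, size_t n)
{
    const __m256i zero = _mm256_setzero_si256();

    for (size_t i = 0; i + 8 <= n; i += 8) {
        __m256i c  = _mm256_loadu_si256((const __m256i *)(cond + i));
        __m256i vx = _mm256_loadu_si256((const __m256i *)(x + i));
        __m256i vy = _mm256_loadu_si256((const __m256i *)(y + i));
        /* All-one bits where cond > 0, all-zero bits elsewhere. */
        __m256i mask = _mm256_cmpgt_epi32(c, zero);
        __m256i from_x = _mm256_and_si256(mask, vx);     /* mask & x  */
        __m256i from_y = _mm256_andnot_si256(mask, vy);  /* ~mask & y */
        _mm256_storeu_si256((__m256i *)(dst + i),
                            _mm256_or_si256(from_x, from_y));
    }
}
```

This only works because the compare result is all-ones or all-zeros per element, which is the same property the masked add relies on.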
Masking an input operand or blending the result is generally how you do branchless SIMD conditionals.
BLENDV is usually at least 2 uops, so it's slower than an AND.
Immediate blends are much more efficient, but you can't use them here, because the imm8 blend control has to be a compile-time constant embedded in the instruction's machine code. That's what immediate means in an assembly-language context.
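To make the immediate restriction concrete, here is a small sketch of a blend whose pattern is known at compile time. The function name, the chosen pattern 0xF0, and the AVX2_FN macro are assumptions for illustration:

```c
#include <immintrin.h>

/* Assumption: GCC/Clang target attribute, so this builds without -mavx2. */
#define AVX2_FN __attribute__((target("avx2")))

/* Hypothetical helper: take elements 0..3 from a and elements 4..7 from b.
   Bit i of the imm8 selects element i from the second source. */
AVX2_FN
void blend_upper_half(float *dst, const float *a, const float *b)
{
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    /* 0xF0 is a literal: the compiler encodes it into the vblendps
       instruction itself. A value computed at run time, such as the result
       of a vector compare, cannot be passed here; use _mm256_blendv_ps or
       a masked operand for that. */
    _mm256_storeu_ps(dst, _mm256_blend_ps(va, vb, 0xF0));
}
```

So an immediate blend only fits when the selection pattern is fixed in the source, not when it comes out of a comparison.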