vectorization - Conditional instructions in AVX2
Can you give the list of conditional instructions available in AVX2? So far I've found the following:
- _mm256_blendv_* for selection from a and b based on mask c
Is there something like conditional multiply, conditional add, etc.?
Also, for instructions that take an imm8 (like _mm256_blend_*), could you explain how to get that imm8 after a vector comparison?
AVX512 introduces optional zero-masking and merge-masking for almost all instructions.
Before that, to do a conditional add, mask one operand (with vandps, or vandnps for the inverse) before the add, instead of using vblendvps on the result. This is why packed-compare instructions/intrinsics produce all-zero or all-one elements.
0.0 is the additive identity element, so adding it is a no-op. (Except for the IEEE semantics of -0.0 and +0.0; I forget how that works exactly.)
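On the signed-zero caveat: under the default round-to-nearest mode, adding +0.0 changes exactly one input, -0.0, which becomes +0.0. Adding -0.0 leaves every value unchanged, including both zeros. A minimal scalar sketch that checks this (the helper name add_preserves is made up for illustration, and it assumes the default rounding mode and no -ffast-math):

```c
#include <math.h>

/* Hypothetical helper: returns 1 if x + addend has the same value
   and the same sign bit as x, i.e. the add was a true no-op. */
int add_preserves(float x, float addend)
{
    volatile float sum = x + addend;  /* volatile: keep the add at run time */
    return sum == x && (signbit(sum) != 0) == (signbit(x) != 0);
}
```

For example, add_preserves(-0.0f, 0.0f) is 0 because the sign of zero flips, while add_preserves(-0.0f, -0.0f) is 1. Since an AND with a compare mask yields +0.0 in the masked-off lanes, this only matters if the sign of a zero result matters to you.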
Masking a constant input instead of blending the result avoids making the critical path longer, for something like conditionally adding 1.0.
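A minimal sketch of that masked add with intrinsics, assuming 8 floats per call and a "greater than limit" condition. The function name cond_add_one, the AVX2_FN macro, and the choice of condition are illustrative, not from any library:

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang this attribute lets the function compile
   without passing -mavx2; other compilers need their own AVX2 switch. */
#if defined(__GNUC__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* out[i] = x[i] + (x[i] > limit ? 1.0f : 0.0f), for 8 floats. */
AVX2_FN void cond_add_one(const float *x, float limit, float *out)
{
    __m256 v    = _mm256_loadu_ps(x);
    /* Compare result: all-ones where x > limit, all-zero elsewhere. */
    __m256 mask = _mm256_cmp_ps(v, _mm256_set1_ps(limit), _CMP_GT_OQ);
    /* AND the constant with the mask: 1.0f where true, +0.0f where false. */
    __m256 inc  = _mm256_and_ps(mask, _mm256_set1_ps(1.0f));
    _mm256_storeu_ps(out, _mm256_add_ps(v, inc));
}
```

The AND only depends on the compare and a constant, so the dependency chain through x is just the compare feeding one add.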
Conditional multiply is more cumbersome because 0.0 is not the multiplicative identity. You need to multiply by 1.0 to keep a value unchanged, and you can't easily produce that with an AND or ANDN of a compare result. You can blendv an input, or you can do the multiply and blendv the output.
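Both options can be sketched as follows, assuming the condition is "element is negative" and a single scale factor. The function names and the AVX2_FN macro are made up for this example:

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang this attribute lets the functions compile
   without passing -mavx2; other compilers need their own AVX2 switch. */
#if defined(__GNUC__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* Option 1: blend the multiplier (factor or 1.0f), then multiply. */
AVX2_FN void cond_scale_blend_input(const float *x, float factor, float *out)
{
    __m256 v    = _mm256_loadu_ps(x);
    __m256 mask = _mm256_cmp_ps(v, _mm256_setzero_ps(), _CMP_LT_OQ);
    /* blendv takes the second source where the mask's sign bit is set. */
    __m256 mul  = _mm256_blendv_ps(_mm256_set1_ps(1.0f),
                                   _mm256_set1_ps(factor), mask);
    _mm256_storeu_ps(out, _mm256_mul_ps(v, mul));
}

/* Option 2: multiply unconditionally, then blend the result with the input. */
AVX2_FN void cond_scale_blend_output(const float *x, float factor, float *out)
{
    __m256 v    = _mm256_loadu_ps(x);
    __m256 mask = _mm256_cmp_ps(v, _mm256_setzero_ps(), _CMP_LT_OQ);
    __m256 prod = _mm256_mul_ps(v, _mm256_set1_ps(factor));
    _mm256_storeu_ps(out, _mm256_blendv_ps(v, prod, mask));
}
```

In option 1 the blend sits in front of the multiply; in option 2 it sits after it, while the multiply can run in parallel with the compare. Which one is better depends on what the mask and the data each depend on in your loop.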
The alternative to blendv is at least 3 booleans, like AND/ANDN/OR, but that's usually not worth it. Note, though, that Haswell runs vblendvps and vpblendvb as 2 uops for port 5, so it's a potential bottleneck compared to using integer booleans that can run on any port. Skylake runs vblendvps as 2 uops for any port. It can make sense to avoid having blendv on the critical path, though.
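For reference, the three-boolean version of a select looks roughly like this. It is a sketch with an invented function name and an arbitrary "key > 0" condition, written with the float-domain boolean intrinsics:

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang this attribute lets the function compile
   without passing -mavx2; other compilers need their own AVX2 switch. */
#if defined(__GNUC__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* out[i] = key[i] > 0 ? a[i] : b[i], using AND/ANDN/OR instead of blendv. */
AVX2_FN void select_and_andn_or(const float *key, const float *a,
                                const float *b, float *out)
{
    __m256 mask   = _mm256_cmp_ps(_mm256_loadu_ps(key),
                                  _mm256_setzero_ps(), _CMP_GT_OQ);
    __m256 from_a = _mm256_and_ps(mask, _mm256_loadu_ps(a));
    /* andnot computes (~mask) & b, keeping b where the mask is clear. */
    __m256 from_b = _mm256_andnot_ps(mask, _mm256_loadu_ps(b));
    _mm256_storeu_ps(out, _mm256_or_ps(from_a, from_b));
}
```

This matches blendv only because a compare result has every bit of an element set or clear. blendv looks at just the sign bit of each mask element, while the booleans use all the bits.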
Masking an input operand or blending the result is generally how you do branchless SIMD conditionals.
BLENDV is usually at least 2 uops, so it's slower than an AND.
Immediate blends are much more efficient, but you can't use them here, because the imm8 blend control has to be a compile-time constant embedded into the instruction's machine code. That's what immediate means in an assembly-language context.
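To make the difference concrete, here is a sketch contrasting the two kinds of blend. The function names, the AVX2_FN macro, and the 0xF0 pattern are picked only for illustration:

```c
#include <immintrin.h>

/* Assumption: on GCC/Clang this attribute lets the functions compile
   without passing -mavx2; other compilers need their own AVX2 switch. */
#if defined(__GNUC__)
#define AVX2_FN __attribute__((target("avx2")))
#else
#define AVX2_FN
#endif

/* Lane choice fixed at compile time: 0xF0 is encoded in the instruction.
   Bit i set means element i comes from b, so elements 4..7 come from b. */
AVX2_FN void blend_fixed_upper(const float *a, const float *b, float *out)
{
    __m256 r = _mm256_blend_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b), 0xF0);
    _mm256_storeu_ps(out, r);
}

/* Lane choice known only at run time (a compare result): the mask is a
   vector, so this needs the variable blend, not the immediate one. */
AVX2_FN void blend_where_greater(const float *a, const float *b, float *out)
{
    __m256 va   = _mm256_loadu_ps(a);
    __m256 vb   = _mm256_loadu_ps(b);
    __m256 mask = _mm256_cmp_ps(vb, va, _CMP_GT_OQ);
    _mm256_storeu_ps(out, _mm256_blendv_ps(va, vb, mask));
}
```

Passing a run-time variable as the third argument of _mm256_blend_ps is rejected by the compiler, which is the intrinsic-level view of the same restriction.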