I used Félix Cloutier's repository for years, but when it was down I was kinda stuck, so I decided to provide a backup with a different view of the index table, where all the information I need is accessible at first sight.
  
An online assembler and disassembler for multiple architectures are also available.

| Instruction | Opcode | CPU Ext. | Description | 
|---|---|---|---|
| AAA | 37 | ASCII adjust AL after addition. | |
| AAD | D5 0A | ASCII adjust AX before division. | |
| AAD imm8 | D5 ib | Adjust AX before division to number base imm8. | |
| AAM | D4 0A | ASCII adjust AX after multiply. | |
| AAM imm8 | D4 ib | Adjust AX after multiply to number base imm8. | |
| AAS | 3F | ASCII adjust AL after subtraction. | |
| ADC AL, imm8 | 14 ib | Add with carry imm8 to AL. | |
| ADC AX, imm16 | 15 iw | Add with carry imm16 to AX. | |
| ADC EAX, imm32 | 15 id | Add with carry imm32 to EAX. | |
| ADC RAX, imm32 | REX.W + 15 id | Add with carry imm32 sign extended to 64-bits to RAX. | |
| ADC r/m8, imm8 | 80 /2 ib | Add with carry imm8 to r/m8. | |
| ADC r/m8, imm8 | REX + 80 /2 ib | Add with carry imm8 to r/m8. | |
| ADC r/m16, imm16 | 81 /2 iw | Add with carry imm16 to r/m16. | |
| ADC r/m32, imm32 | 81 /2 id | Add with CF imm32 to r/m32. | |
| ADC r/m64, imm32 | REX.W + 81 /2 id | Add with CF imm32 sign extended to 64-bits to r/m64. | |
| ADC r/m16, imm8 | 83 /2 ib | Add with CF sign-extended imm8 to r/m16. | |
| ADC r/m32, imm8 | 83 /2 ib | Add with CF sign-extended imm8 into r/m32. | |
| ADC r/m64, imm8 | REX.W + 83 /2 ib | Add with CF sign-extended imm8 into r/m64. | |
| ADC r/m8, r8 | 10 /r | Add with carry byte register to r/m8. | |
| ADC r/m8, r8 | REX + 10 /r | Add with carry byte register to r/m8. | |
| ADC r/m16, r16 | 11 /r | Add with carry r16 to r/m16. | |
| ADC r/m32, r32 | 11 /r | Add with CF r32 to r/m32. | |
| ADC r/m64, r64 | REX.W + 11 /r | Add with CF r64 to r/m64. | |
| ADC r8, r/m8 | 12 /r | Add with carry r/m8 to byte register. | |
| ADC r8, r/m8 | REX + 12 /r | Add with carry r/m8 to byte register. | |
| ADC r16, r/m16 | 13 /r | Add with carry r/m16 to r16. | |
| ADC r32, r/m32 | 13 /r | Add with CF r/m32 to r32. | |
| ADC r64, r/m64 | REX.W + 13 /r | Add with CF r/m64 to r64. | |
| ADCX r32, r/m32 | 66 0F 38 F6 /r | adx | Unsigned addition of r32 with CF, r/m32 to r32, writes CF. | 
| ADCX r64, r/m64 | 66 REX.W 0F 38 F6 /r | adx | Unsigned addition of r64 with CF, r/m64 to r64, writes CF. | 
| ADD AL, imm8 | 04 ib | Add imm8 to AL. | |
| ADD AX, imm16 | 05 iw | Add imm16 to AX. | |
| ADD EAX, imm32 | 05 id | Add imm32 to EAX. | |
| ADD RAX, imm32 | REX.W + 05 id | Add imm32 sign-extended to 64-bits to RAX. | |
| ADD r/m8, imm8 | 80 /0 ib | Add imm8 to r/m8. | |
| ADD r/m8, imm8 | REX + 80 /0 ib | Add imm8 to r/m8. | |
| ADD r/m16, imm16 | 81 /0 iw | Add imm16 to r/m16. | |
| ADD r/m32, imm32 | 81 /0 id | Add imm32 to r/m32. | |
| ADD r/m64, imm32 | REX.W + 81 /0 id | Add imm32 sign-extended to 64-bits to r/m64. | |
| ADD r/m16, imm8 | 83 /0 ib | Add sign-extended imm8 to r/m16. | |
| ADD r/m32, imm8 | 83 /0 ib | Add sign-extended imm8 to r/m32. | |
| ADD r/m64, imm8 | REX.W + 83 /0 ib | Add sign-extended imm8 to r/m64. | |
| ADD r/m8, r8 | 00 /r | Add r8 to r/m8. | |
| ADD r/m8, r8 | REX + 00 /r | Add r8 to r/m8. | |
| ADD r/m16, r16 | 01 /r | Add r16 to r/m16. | |
| ADD r/m32, r32 | 01 /r | Add r32 to r/m32. | |
| ADD r/m64, r64 | REX.W + 01 /r | Add r64 to r/m64. | |
| ADD r8, r/m8 | 02 /r | Add r/m8 to r8. | |
| ADD r8, r/m8 | REX + 02 /r | Add r/m8 to r8. | |
| ADD r16, r/m16 | 03 /r | Add r/m16 to r16. | |
| ADD r32, r/m32 | 03 /r | Add r/m32 to r32. | |
| ADD r64, r/m64 | REX.W + 03 /r | Add r/m64 to r64. | |
| ADDPD xmm1, xmm2/m128 | 66 0F 58 /r | sse2 | Add packed double-precision floating-point values from xmm2/mem to xmm1 and store result in xmm1. | 
| VADDPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 58 /r | avx | Add packed double-precision floating-point values from xmm3/mem to xmm2 and store result in xmm1. | 
| VADDPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 58 /r | avx | Add packed double-precision floating-point values from ymm3/mem to ymm2 and store result in ymm1. | 
| VADDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 58 /r | avx512 | Add packed double-precision floating-point values from xmm3/m128/m64bcst to xmm2 and store result in xmm1 with writemask k1. | 
| VADDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 58 /r | avx512 | Add packed double-precision floating-point values from ymm3/m256/m64bcst to ymm2 and store result in ymm1 with writemask k1. | 
| VADDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F.W1 58 /r | avx512 | Add packed double-precision floating-point values from zmm3/m512/m64bcst to zmm2 and store result in zmm1 with writemask k1. | 
| ADDPS xmm1, xmm2/m128 | 0F 58 /r | sse | Add packed single-precision floating-point values from xmm2/m128 to xmm1 and store result in xmm1. | 
| VADDPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 58 /r | avx | Add packed single-precision floating-point values from xmm3/m128 to xmm2 and store result in xmm1. | 
| VADDPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 58 /r | avx | Add packed single-precision floating-point values from ymm3/m256 to ymm2 and store result in ymm1. | 
| VADDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 58 /r | avx512 | Add packed single-precision floating-point values from xmm3/m128/m32bcst to xmm2 and store result in xmm1 with writemask k1. | 
| VADDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 58 /r | avx512 | Add packed single-precision floating-point values from ymm3/m256/m32bcst to ymm2 and store result in ymm1 with writemask k1. | 
| VADDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.0F.W0 58 /r | avx512 | Add packed single-precision floating-point values from zmm3/m512/m32bcst to zmm2 and store result in zmm1 with writemask k1. | 
| ADDSD xmm1, xmm2/m64 | F2 0F 58 /r | sse2 | Add the low double-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1. | 
| VADDSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 58 /r | avx | Add the low double-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1. | 
| VADDSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.F2.0F.W1 58 /r | avx512 | Add the low double-precision floating-point value from xmm3/m64 to xmm2 and store the result in xmm1 with writemask k1. | 
| ADDSS xmm1, xmm2/m32 | F3 0F 58 /r | sse | Add the low single-precision floating-point value from xmm2/mem to xmm1 and store the result in xmm1. | 
| VADDSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 58 /r | avx | Add the low single-precision floating-point value from xmm3/mem to xmm2 and store the result in xmm1. | 
| VADDSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.NDS.LIG.F3.0F.W0 58 /r | avx512 | Add the low single-precision floating-point value from xmm3/m32 to xmm2 and store the result in xmm1 with writemask k1. | 
| ADDSUBPD xmm1, xmm2/m128 | 66 0F D0 /r | sse3 | Add/subtract double-precision floating-point values from xmm2/m128 to xmm1. | 
| VADDSUBPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D0 /r | avx | Add/subtract packed double-precision floating-point values from xmm3/mem to xmm2 and store the result in xmm1. | 
| VADDSUBPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG D0 /r | avx | Add/subtract packed double-precision floating-point values from ymm3/mem to ymm2 and store the result in ymm1. | 
| ADDSUBPS xmm1, xmm2/m128 | F2 0F D0 /r | sse3 | Add/subtract single-precision floating-point values from xmm2/m128 to xmm1. | 
| VADDSUBPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.F2.0F.WIG D0 /r | avx | Add/subtract single-precision floating-point values from xmm3/mem to xmm2 and store the result in xmm1. | 
| VADDSUBPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.F2.0F.WIG D0 /r | avx | Add/subtract single-precision floating-point values from ymm3/mem to ymm2 and store the result in ymm1. | 
| ADOX r32, r/m32 | F3 0F 38 F6 /r | adx | Unsigned addition of r32 with OF, r/m32 to r32, writes OF. | 
| ADOX r64, r/m64 | F3 REX.W 0F 38 F6 /r | adx | Unsigned addition of r64 with OF, r/m64 to r64, writes OF. | 
| AESDEC xmm1, xmm2/m128 | 66 0F 38 DE /r | aes | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. | 
| VAESDEC xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG DE /r | aes | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. | 
| AESDECLAST xmm1, xmm2/m128 | 66 0F 38 DF /r | aes | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. | 
| VAESDECLAST xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG DF /r | aes | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. | 
| AESENC xmm1, xmm2/m128 | 66 0F 38 DC /r | aes | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. | 
| VAESENC xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG DC /r | aes | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from the xmm3/m128; store the result in xmm1. | 
| AESENCLAST xmm1, xmm2/m128 | 66 0F 38 DD /r | aes | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. | 
| VAESENCLAST xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG DD /r | aes | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. | 
| AESIMC xmm1, xmm2/m128 | 66 0F 38 DB /r | aes | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1. | 
| VAESIMC xmm1, xmm2/m128 | VEX.128.66.0F38.WIG DB /r | aes | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1. | 
| AESKEYGENASSIST xmm1, xmm2/m128, imm8 | 66 0F 3A DF /r ib | aes | Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128, and store the result in xmm1. | 
| VAESKEYGENASSIST xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG DF /r ib | aes | Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128, and store the result in xmm1. | 
| AND AL, imm8 | 24 ib | AL AND imm8. | |
| AND AX, imm16 | 25 iw | AX AND imm16. | |
| AND EAX, imm32 | 25 id | EAX AND imm32. | |
| AND RAX, imm32 | REX.W + 25 id | RAX AND imm32 sign-extended to 64-bits. | |
| AND r/m8, imm8 | 80 /4 ib | r/m8 AND imm8. | |
| AND r/m8, imm8 | REX + 80 /4 ib | r/m8 AND imm8. | |
| AND r/m16, imm16 | 81 /4 iw | r/m16 AND imm16. | |
| AND r/m32, imm32 | 81 /4 id | r/m32 AND imm32. | |
| AND r/m64, imm32 | REX.W + 81 /4 id | r/m64 AND imm32 sign extended to 64-bits. | |
| AND r/m16, imm8 | 83 /4 ib | r/m16 AND imm8 (sign-extended). | |
| AND r/m32, imm8 | 83 /4 ib | r/m32 AND imm8 (sign-extended). | |
| AND r/m64, imm8 | REX.W + 83 /4 ib | r/m64 AND imm8 (sign-extended). | |
| AND r/m8, r8 | 20 /r | r/m8 AND r8. | |
| AND r/m8, r8 | REX + 20 /r | r/m8 AND r8. | |
| AND r/m16, r16 | 21 /r | r/m16 AND r16. | |
| AND r/m32, r32 | 21 /r | r/m32 AND r32. | |
| AND r/m64, r64 | REX.W + 21 /r | r/m64 AND r64. | |
| AND r8, r/m8 | 22 /r | r8 AND r/m8. | |
| AND r8, r/m8 | REX + 22 /r | r8 AND r/m8. | |
| AND r16, r/m16 | 23 /r | r16 AND r/m16. | |
| AND r32, r/m32 | 23 /r | r32 AND r/m32. | |
| AND r64, r/m64 | REX.W + 23 /r | r64 AND r/m64. | |
| ANDN r32a, r32b, r/m32 | VEX.NDS.LZ.0F38.W0 F2 /r | bmi1 | Bitwise AND of inverted r32b with r/m32, store result in r32a. | 
| ANDN r64a, r64b, r/m64 | VEX.NDS.LZ.0F38.W1 F2 /r | bmi1 | Bitwise AND of inverted r64b with r/m64, store result in r64a. | 
| ANDNPD xmm1, xmm2/m128 | 66 0F 55 /r | sse2 | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm1 and xmm2/mem. | 
| VANDNPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F 55 /r | avx | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/mem. | 
| VANDNPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F 55 /r | avx | Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/mem. | 
| VANDNPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 55 /r | avx512 | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1. | 
| VANDNPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 55 /r | avx512 | Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1. | 
| VANDNPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 55 /r | avx512 | Return the bitwise logical AND NOT of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1. | 
| ANDNPS xmm1, xmm2/m128 | 0F 55 /r | sse | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm1 and xmm2/mem. | 
| VANDNPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F 55 /r | avx | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/mem. | 
| VANDNPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F 55 /r | avx | Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/mem. | 
| VANDNPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 55 /r | avx512 | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1. | 
| VANDNPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 55 /r | avx512 | Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1. | 
| VANDNPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.0F.W0 55 /r | avx512 | Return the bitwise logical AND NOT of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1. | 
| ANDPD xmm1, xmm2/m128 | 66 0F 54 /r | sse2 | Return the bitwise logical AND of packed double-precision floating-point values in xmm1 and xmm2/mem. | 
| VANDPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F 54 /r | avx | Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/mem. | 
| VANDPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F 54 /r | avx | Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/mem. | 
| VANDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 54 /r | avx512 | Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1. | 
| VANDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 54 /r | avx512 | Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1. | 
| VANDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 54 /r | avx512 | Return the bitwise logical AND of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1. | 
| ANDPS xmm1, xmm2/m128 | 0F 54 /r | sse | Return the bitwise logical AND of packed single-precision floating-point values in xmm1 and xmm2/mem. | 
| VANDPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F 54 /r | avx | Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/mem. | 
| VANDPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F 54 /r | avx | Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/mem. | 
| VANDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 54 /r | avx512 | Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1. | 
| VANDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 54 /r | avx512 | Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1. | 
| VANDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.0F.W0 54 /r | avx512 | Return the bitwise logical AND of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1. | 
| ARPL r/m16, r16 | 63 /r | Adjust RPL of r/m16 to not less than RPL of r16. | |
| BEXTR r32a, r/m32, r32b | VEX.NDS.LZ.0F38.W0 F7 /r | bmi1 | Contiguous bitwise extract from r/m32 using r32b as control; store result in r32a. | 
| BEXTR r64a, r/m64, r64b | VEX.NDS.LZ.0F38.W1 F7 /r | bmi1 | Contiguous bitwise extract from r/m64 using r64b as control; store result in r64a. | 
| BLENDPD xmm1, xmm2/m128, imm8 | 66 0F 3A 0D /r ib | sse4.1 | Select packed double-precision floating-point values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1. | 
| VBLENDPD xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 0D /r ib | avx | Select packed double-precision floating-point Values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1. | 
| VBLENDPD ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 0D /r ib | avx | Select packed double-precision floating-point Values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1. | 
| BLENDPS xmm1, xmm2/m128, imm8 | 66 0F 3A 0C /r ib | sse4.1 | Select packed single-precision floating-point values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1. | 
| VBLENDPS xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 0C /r ib | avx | Select packed single-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1. | 
| VBLENDPS ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 0C /r ib | avx | Select packed single-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1. | 
| BLENDVPD xmm1, xmm2/m128, &lt;XMM0&gt; | 66 0F 38 15 /r | sse4.1 | Select packed double-precision floating-point values from xmm1 and xmm2/m128 from mask specified in XMM0 and store the values in xmm1. | 
| VBLENDVPD xmm1, xmm2, xmm3/m128, xmm4 | VEX.NDS.128.66.0F3A.W0 4B /r /is4 | avx | Conditionally copy double-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the mask operand, xmm4. | 
| VBLENDVPD ymm1, ymm2, ymm3/m256, ymm4 | VEX.NDS.256.66.0F3A.W0 4B /r /is4 | avx | Conditionally copy double-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the mask operand, ymm4. | 
| BLENDVPS xmm1, xmm2/m128, &lt;XMM0&gt; | 66 0F 38 14 /r | sse4.1 | Select packed single-precision floating-point values from xmm1 and xmm2/m128 from mask specified in XMM0 and store the values into xmm1. | 
| VBLENDVPS xmm1, xmm2, xmm3/m128, xmm4 | VEX.NDS.128.66.0F3A.W0 4A /r /is4 | avx | Conditionally copy single-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the specified mask operand, xmm4. | 
| VBLENDVPS ymm1, ymm2, ymm3/m256, ymm4 | VEX.NDS.256.66.0F3A.W0 4A /r /is4 | avx | Conditionally copy single-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the specified mask register, ymm4. | 
| BLSI r32, r/m32 | VEX.NDD.LZ.0F38.W0 F3 /3 | bmi1 | Extract lowest set bit from r/m32 and set that bit in r32. | 
| BLSI r64, r/m64 | VEX.NDD.LZ.0F38.W1 F3 /3 | bmi1 | Extract lowest set bit from r/m64, and set that bit in r64. | 
| BLSMSK r32, r/m32 | VEX.NDD.LZ.0F38.W0 F3 /2 | bmi1 | Set all lower bits in r32 to “1” starting from bit 0 to lowest set bit in r/m32. | 
| BLSMSK r64, r/m64 | VEX.NDD.LZ.0F38.W1 F3 /2 | bmi1 | Set all lower bits in r64 to “1” starting from bit 0 to lowest set bit in r/m64. | 
| BLSR r32, r/m32 | VEX.NDD.LZ.0F38.W0 F3 /1 | bmi1 | Reset lowest set bit of r/m32, keep all other bits of r/m32 and write result to r32. | 
| BLSR r64, r/m64 | VEX.NDD.LZ.0F38.W1 F3 /1 | bmi1 | Reset lowest set bit of r/m64, keep all other bits of r/m64 and write result to r64. | 
| BNDCL bnd, r/m32 | F3 0F 1A /r | mpx | Generate a #BR if the address in r/m32 is lower than the lower bound in bnd.LB. | 
| BNDCL bnd, r/m64 | F3 0F 1A /r | mpx | Generate a #BR if the address in r/m64 is lower than the lower bound in bnd.LB. | 
| BNDCU bnd, r/m32 | F2 0F 1A /r | mpx | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form). | 
| BNDCU bnd, r/m64 | F2 0F 1A /r | mpx | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form). | 
| BNDCN bnd, r/m32 | F2 0F 1B /r | mpx | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form). | 
| BNDCN bnd, r/m64 | F2 0F 1B /r | mpx | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form). | 
| BNDLDX bnd, mib | 0F 1A /r | mpx | Load the bounds stored in a bound table entry (BTE) into bnd with address translation using the base of mib and conditional on the index of mib matching the pointer value in the BTE. | 
| BNDMK bnd, m32 | F3 0F 1B /r | mpx | Make lower and upper bounds from m32 and store them in bnd. | 
| BNDMK bnd, m64 | F3 0F 1B /r | mpx | Make lower and upper bounds from m64 and store them in bnd. | 
| BNDMOV bnd1, bnd2/m64 | 66 0F 1A /r | mpx | Move lower and upper bound from bnd2/m64 to bound register bnd1. | 
| BNDMOV bnd1, bnd2/m128 | 66 0F 1A /r | mpx | Move lower and upper bound from bnd2/m128 to bound register bnd1. | 
| BNDMOV bnd1/m64, bnd2 | 66 0F 1B /r | mpx | Move lower and upper bound from bnd2 to bnd1/m64. | 
| BNDMOV bnd1/m128, bnd2 | 66 0F 1B /r | mpx | Move lower and upper bound from bnd2 to bound register bnd1/m128. | 
| BNDSTX mib, bnd | 0F 1B /r | mpx | Store the bounds in bnd and the pointer value in the index register of mib to a bound table entry (BTE) with address translation using the base of mib. | 
| BOUND r16, m16&16 | 62 /r | Check if r16 (array index) is within bounds specified by m16&16. | |
| BOUND r32, m32&32 | 62 /r | Check if r32 (array index) is within bounds specified by m32&32. | |
| BSF r16, r/m16 | 0F BC /r | Bit scan forward on r/m16. | |
| BSF r32, r/m32 | 0F BC /r | Bit scan forward on r/m32. | |
| BSF r64, r/m64 | REX.W + 0F BC /r | Bit scan forward on r/m64. | |
| BSR r16, r/m16 | 0F BD /r | Bit scan reverse on r/m16. | |
| BSR r32, r/m32 | 0F BD /r | Bit scan reverse on r/m32. | |
| BSR r64, r/m64 | REX.W + 0F BD /r | Bit scan reverse on r/m64. | |
| BSWAP r32 | 0F C8+rd | Reverses the byte order of a 32-bit register. | |
| BSWAP r64 | REX.W + 0F C8+rd | Reverses the byte order of a 64-bit register. | |
| BT r/m16, r16 | 0F A3 /r | Store selected bit in CF flag. | |
| BT r/m32, r32 | 0F A3 /r | Store selected bit in CF flag. | |
| BT r/m64, r64 | REX.W + 0F A3 /r | Store selected bit in CF flag. | |
| BT r/m16, imm8 | 0F BA /4 ib | Store selected bit in CF flag. | |
| BT r/m32, imm8 | 0F BA /4 ib | Store selected bit in CF flag. | |
| BT r/m64, imm8 | REX.W + 0F BA /4 ib | Store selected bit in CF flag. | |
| BTC r/m16, r16 | 0F BB /r | Store selected bit in CF flag and complement. | |
| BTC r/m32, r32 | 0F BB /r | Store selected bit in CF flag and complement. | |
| BTC r/m64, r64 | REX.W + 0F BB /r | Store selected bit in CF flag and complement. | |
| BTC r/m16, imm8 | 0F BA /7 ib | Store selected bit in CF flag and complement. | |
| BTC r/m32, imm8 | 0F BA /7 ib | Store selected bit in CF flag and complement. | |
| BTC r/m64, imm8 | REX.W + 0F BA /7 ib | Store selected bit in CF flag and complement. | |
| BTR r/m16, r16 | 0F B3 /r | Store selected bit in CF flag and clear. | |
| BTR r/m32, r32 | 0F B3 /r | Store selected bit in CF flag and clear. | |
| BTR r/m64, r64 | REX.W + 0F B3 /r | Store selected bit in CF flag and clear. | |
| BTR r/m16, imm8 | 0F BA /6 ib | Store selected bit in CF flag and clear. | |
| BTR r/m32, imm8 | 0F BA /6 ib | Store selected bit in CF flag and clear. | |
| BTR r/m64, imm8 | REX.W + 0F BA /6 ib | Store selected bit in CF flag and clear. | |
| BTS r/m16, r16 | 0F AB /r | Store selected bit in CF flag and set. | |
| BTS r/m32, r32 | 0F AB /r | Store selected bit in CF flag and set. | |
| BTS r/m64, r64 | REX.W + 0F AB /r | Store selected bit in CF flag and set. | |
| BTS r/m16, imm8 | 0F BA /5 ib | Store selected bit in CF flag and set. | |
| BTS r/m32, imm8 | 0F BA /5 ib | Store selected bit in CF flag and set. | |
| BTS r/m64, imm8 | REX.W + 0F BA /5 ib | Store selected bit in CF flag and set. | |
| BZHI r32a, r/m32, r32b | VEX.NDS.LZ.0F38.W0 F5 /r | bmi2 | Zero bits in r/m32 starting with the position in r32b, write result to r32a. | 
| BZHI r64a, r/m64, r64b | VEX.NDS.LZ.0F38.W1 F5 /r | bmi2 | Zero bits in r/m64 starting with the position in r64b, write result to r64a. | 
| CALL rel16 | E8 cw | Call near, relative, displacement relative to next instruction. | |
| CALL rel32 | E8 cd | Call near, relative, displacement relative to next instruction. 32-bit displacement sign extended to 64-bits in 64-bit mode. | |
| CALL r/m16 | FF /2 | Call near, absolute indirect, address given in r/m16. | |
| CALL r/m32 | FF /2 | Call near, absolute indirect, address given in r/m32. | |
| CALL r/m64 | FF /2 | Call near, absolute indirect, address given in r/m64. | |
| CALL ptr16:16 | 9A cd | Call far, absolute, address given in operand. | |
| CALL ptr16:32 | 9A cp | Call far, absolute, address given in operand. | |
| CALL m16:16 | FF /3 | Call far, absolute indirect address given in m16:16. In 32-bit mode: if selector points to a gate, then RIP = 32-bit zero extended displacement taken from gate; else RIP = zero extended 16-bit offset from far pointer referenced in the instruction. | |
| CALL m16:32 | FF /3 | In 64-bit mode: If selector points to a gate, then RIP = 64-bit displacement taken from gate; else RIP = zero extended 32-bit offset from far pointer referenced in the instruction. | |
| CALL m16:64 | REX.W + FF /3 | In 64-bit mode: If selector points to a gate, then RIP = 64-bit displacement taken from gate; else RIP = 64-bit offset from far pointer referenced in the instruction. | |
| CBW | 98 | AX ← sign-extend of AL. | |
| CWDE | 98 | EAX ← sign-extend of AX. | |
| CDQE | REX.W + 98 | RAX ← sign-extend of EAX. | |
| CLAC | 0F 01 CA | Clear the AC flag in the EFLAGS register. | |
| CLC | F8 | Clear CF flag. | |
| CLD | FC | Clear DF flag. | |
| CLFLUSH m8 | 0F AE /7 | Flushes cache line containing m8. | |
| CLFLUSHOPT m8 | 66 0F AE /7 | Flushes cache line containing m8. | |
| CLI | FA | Clear interrupt flag; interrupts disabled when interrupt flag cleared. | |
| CLTS | 0F 06 | Clears TS flag in CR0. | |
| CMC | F5 | Complement CF flag. | |
| CMOVA r16, r/m16 | 0F 47 /r | Move if above (CF=0 and ZF=0). | |
| CMOVA r32, r/m32 | 0F 47 /r | Move if above (CF=0 and ZF=0). | |
| CMOVA r64, r/m64 | REX.W + 0F 47 /r | Move if above (CF=0 and ZF=0). | |
| CMOVAE r16, r/m16 | 0F 43 /r | Move if above or equal (CF=0). | |
| CMOVAE r32, r/m32 | 0F 43 /r | Move if above or equal (CF=0). | |
| CMOVAE r64, r/m64 | REX.W + 0F 43 /r | Move if above or equal (CF=0). | |
| CMOVB r16, r/m16 | 0F 42 /r | Move if below (CF=1). | |
| CMOVB r32, r/m32 | 0F 42 /r | Move if below (CF=1). | |
| CMOVB r64, r/m64 | REX.W + 0F 42 /r | Move if below (CF=1). | |
| CMOVBE r16, r/m16 | 0F 46 /r | Move if below or equal (CF=1 or ZF=1). | |
| CMOVBE r32, r/m32 | 0F 46 /r | Move if below or equal (CF=1 or ZF=1). | |
| CMOVBE r64, r/m64 | REX.W + 0F 46 /r | Move if below or equal (CF=1 or ZF=1). | |
| CMOVC r16, r/m16 | 0F 42 /r | Move if carry (CF=1). | |
| CMOVC r32, r/m32 | 0F 42 /r | Move if carry (CF=1). | |
| CMOVC r64, r/m64 | REX.W + 0F 42 /r | Move if carry (CF=1). | |
| CMOVE r16, r/m16 | 0F 44 /r | Move if equal (ZF=1). | |
| CMOVE r32, r/m32 | 0F 44 /r | Move if equal (ZF=1). | |
| CMOVE r64, r/m64 | REX.W + 0F 44 /r | Move if equal (ZF=1). | |
| CMOVG r16, r/m16 | 0F 4F /r | Move if greater (ZF=0 and SF=OF). | |
| CMOVG r32, r/m32 | 0F 4F /r | Move if greater (ZF=0 and SF=OF). | |
| CMOVG r64, r/m64 | REX.W + 0F 4F /r | Move if greater (ZF=0 and SF=OF). | |
| CMOVGE r16, r/m16 | 0F 4D /r | Move if greater or equal (SF=OF). | |
| CMOVGE r32, r/m32 | 0F 4D /r | Move if greater or equal (SF=OF). | |
| CMOVGE r64, r/m64 | REX.W + 0F 4D /r | Move if greater or equal (SF=OF). | |
| CMOVL r16, r/m16 | 0F 4C /r | Move if less (SF≠OF). | |
| CMOVL r32, r/m32 | 0F 4C /r | Move if less (SF≠OF). | |
| CMOVL r64, r/m64 | REX.W + 0F 4C /r | Move if less (SF≠OF). | |
| CMOVLE r16, r/m16 | 0F 4E /r | Move if less or equal (ZF=1 or SF≠OF). | |
| CMOVLE r32, r/m32 | 0F 4E /r | Move if less or equal (ZF=1 or SF≠OF). | |
| CMOVLE r64, r/m64 | REX.W + 0F 4E /r | Move if less or equal (ZF=1 or SF≠OF). | |
| CMOVNA r16, r/m16 | 0F 46 /r | Move if not above (CF=1 or ZF=1). | |
| CMOVNA r32, r/m32 | 0F 46 /r | Move if not above (CF=1 or ZF=1). | |
| CMOVNA r64, r/m64 | REX.W + 0F 46 /r | Move if not above (CF=1 or ZF=1). | |
| CMOVNAE r16, r/m16 | 0F 42 /r | Move if not above or equal (CF=1). | |
| CMOVNAE r32, r/m32 | 0F 42 /r | Move if not above or equal (CF=1). | |
| CMOVNAE r64, r/m64 | REX.W + 0F 42 /r | Move if not above or equal (CF=1). | |
| CMOVNB r16, r/m16 | 0F 43 /r | Move if not below (CF=0). | |
| CMOVNB r32, r/m32 | 0F 43 /r | Move if not below (CF=0). | |
| CMOVNB r64, r/m64 | REX.W + 0F 43 /r | Move if not below (CF=0). | |
| CMOVNBE r16, r/m16 | 0F 47 /r | Move if not below or equal (CF=0 and ZF=0). | |
| CMOVNBE r32, r/m32 | 0F 47 /r | Move if not below or equal (CF=0 and ZF=0). | |
| CMOVNBE r64, r/m64 | REX.W + 0F 47 /r | Move if not below or equal (CF=0 and ZF=0). | |
| CMOVNC r16, r/m16 | 0F 43 /r | Move if not carry (CF=0). | |
| CMOVNC r32, r/m32 | 0F 43 /r | Move if not carry (CF=0). | |
| CMOVNC r64, r/m64 | REX.W + 0F 43 /r | Move if not carry (CF=0). | |
| CMOVNE r16, r/m16 | 0F 45 /r | Move if not equal (ZF=0). | |
| CMOVNE r32, r/m32 | 0F 45 /r | Move if not equal (ZF=0). | |
| CMOVNE r64, r/m64 | REX.W + 0F 45 /r | Move if not equal (ZF=0). | |
| CMOVNG r16, r/m16 | 0F 4E /r | Move if not greater (ZF=1 or SF≠OF). | |
| CMOVNG r32, r/m32 | 0F 4E /r | Move if not greater (ZF=1 or SF≠OF). | |
| CMOVNG r64, r/m64 | REX.W + 0F 4E /r | Move if not greater (ZF=1 or SF≠OF). | |
| CMOVNGE r16, r/m16 | 0F 4C /r | Move if not greater or equal (SF≠OF). | |
| CMOVNGE r32, r/m32 | 0F 4C /r | Move if not greater or equal (SF≠OF). | |
| CMOVNGE r64, r/m64 | REX.W + 0F 4C /r | Move if not greater or equal (SF≠OF). | |
| CMOVNL r16, r/m16 | 0F 4D /r | Move if not less (SF=OF). | |
| CMOVNL r32, r/m32 | 0F 4D /r | Move if not less (SF=OF). | |
| CMOVNL r64, r/m64 | REX.W + 0F 4D /r | Move if not less (SF=OF). | |
| CMOVNLE r16, r/m16 | 0F 4F /r | Move if not less or equal (ZF=0 and SF=OF). | |
| CMOVNLE r32, r/m32 | 0F 4F /r | Move if not less or equal (ZF=0 and SF=OF). | |
| CMOVNLE r64, r/m64 | REX.W + 0F 4F /r | Move if not less or equal (ZF=0 and SF=OF). | |
| CMOVNO r16, r/m16 | 0F 41 /r | Move if not overflow (OF=0). | |
| CMOVNO r32, r/m32 | 0F 41 /r | Move if not overflow (OF=0). | |
| CMOVNO r64, r/m64 | REX.W + 0F 41 /r | Move if not overflow (OF=0). | |
| CMOVNP r16, r/m16 | 0F 4B /r | Move if not parity (PF=0). | |
| CMOVNP r32, r/m32 | 0F 4B /r | Move if not parity (PF=0). | |
| CMOVNP r64, r/m64 | REX.W + 0F 4B /r | Move if not parity (PF=0). | |
| CMOVNS r16, r/m16 | 0F 49 /r | Move if not sign (SF=0). | |
| CMOVNS r32, r/m32 | 0F 49 /r | Move if not sign (SF=0). | |
| CMOVNS r64, r/m64 | REX.W + 0F 49 /r | Move if not sign (SF=0). | |
| CMOVNZ r16, r/m16 | 0F 45 /r | Move if not zero (ZF=0). | |
| CMOVNZ r32, r/m32 | 0F 45 /r | Move if not zero (ZF=0). | |
| CMOVNZ r64, r/m64 | REX.W + 0F 45 /r | Move if not zero (ZF=0). | |
| CMOVO r16, r/m16 | 0F 40 /r | Move if overflow (OF=1). | |
| CMOVO r32, r/m32 | 0F 40 /r | Move if overflow (OF=1). | |
| CMOVO r64, r/m64 | REX.W + 0F 40 /r | Move if overflow (OF=1). | |
| CMOVP r16, r/m16 | 0F 4A /r | Move if parity (PF=1). | |
| CMOVP r32, r/m32 | 0F 4A /r | Move if parity (PF=1). | |
| CMOVP r64, r/m64 | REX.W + 0F 4A /r | Move if parity (PF=1). | |
| CMOVPE r16, r/m16 | 0F 4A /r | Move if parity even (PF=1). | |
| CMOVPE r32, r/m32 | 0F 4A /r | Move if parity even (PF=1). | |
| CMOVPE r64, r/m64 | REX.W + 0F 4A /r | Move if parity even (PF=1). | |
| CMOVPO r16, r/m16 | 0F 4B /r | Move if parity odd (PF=0). | |
| CMOVPO r32, r/m32 | 0F 4B /r | Move if parity odd (PF=0). | |
| CMOVPO r64, r/m64 | REX.W + 0F 4B /r | Move if parity odd (PF=0). | |
| CMOVS r16, r/m16 | 0F 48 /r | Move if sign (SF=1). | |
| CMOVS r32, r/m32 | 0F 48 /r | Move if sign (SF=1). | |
| CMOVS r64, r/m64 | REX.W + 0F 48 /r | Move if sign (SF=1). | |
| CMOVZ r16, r/m16 | 0F 44 /r | Move if zero (ZF=1). | |
| CMOVZ r32, r/m32 | 0F 44 /r | Move if zero (ZF=1). | |
| CMOVZ r64, r/m64 | REX.W + 0F 44 /r | Move if zero (ZF=1). | |
| CMP AL, imm8 | 3C ib | Compare imm8 with AL. | |
| CMP AX, imm16 | 3D iw | Compare imm16 with AX. | |
| CMP EAX, imm32 | 3D id | Compare imm32 with EAX. | |
| CMP RAX, imm32 | REX.W + 3D id | Compare imm32 sign-extended to 64-bits with RAX. | |
| CMP r/m8, imm8 | 80 /7 ib | Compare imm8 with r/m8. | |
| CMP r/m8, imm8 | REX + 80 /7 ib | Compare imm8 with r/m8. | |
| CMP r/m16, imm16 | 81 /7 iw | Compare imm16 with r/m16. | |
| CMP r/m32, imm32 | 81 /7 id | Compare imm32 with r/m32. | |
| CMP r/m64, imm32 | REX.W + 81 /7 id | Compare imm32 sign-extended to 64-bits with r/m64. | |
| CMP r/m16, imm8 | 83 /7 ib | Compare imm8 with r/m16. | |
| CMP r/m32, imm8 | 83 /7 ib | Compare imm8 with r/m32. | |
| CMP r/m64, imm8 | REX.W + 83 /7 ib | Compare imm8 with r/m64. | |
| CMP r/m8, r8 | 38 /r | Compare r8 with r/m8. | |
| CMP r/m8, r8 | REX + 38 /r | Compare r8 with r/m8. | |
| CMP r/m16, r16 | 39 /r | Compare r16 with r/m16. | |
| CMP r/m32, r32 | 39 /r | Compare r32 with r/m32. | |
| CMP r/m64, r64 | REX.W + 39 /r | Compare r64 with r/m64. | |
| CMP r8, r/m8 | 3A /r | Compare r/m8 with r8. | |
| CMP r8, r/m8 | REX + 3A /r | Compare r/m8 with r8. | |
| CMP r16, r/m16 | 3B /r | Compare r/m16 with r16. | |
| CMP r32, r/m32 | 3B /r | Compare r/m32 with r32. | |
| CMP r64, r/m64 | REX.W + 3B /r | Compare r/m64 with r64. | |
| CMPPD xmm1, xmm2/m128, imm8 | 66 0F C2 /r ib | sse2 | Compare packed double-precision floating-point values in xmm2/m128 and xmm1 using bits 2:0 of imm8 as a comparison predicate. | 
| VCMPPD xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F.WIG C2 /r ib | avx | Compare packed double-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate. | 
| VCMPPD ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F.WIG C2 /r ib | avx | Compare packed double-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate. | 
| VCMPPD k1 {k2}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F.W1 C2 /r ib | avx512 | Compare packed double-precision floating-point values in xmm3/m128/m64bcst and xmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VCMPPD k1 {k2}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F.W1 C2 /r ib | avx512 | Compare packed double-precision floating-point values in ymm3/m256/m64bcst and ymm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VCMPPD k1 {k2}, zmm2, zmm3/m512/m64bcst{sae}, imm8 | EVEX.NDS.512.66.0F.W1 C2 /r ib | avx512 | Compare packed double-precision floating-point values in zmm3/m512/m64bcst and zmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| CMPPS xmm1, xmm2/m128, imm8 | 0F C2 /r ib | sse | Compare packed single-precision floating-point values in xmm2/m128 and xmm1 using bits 2:0 of imm8 as a comparison predicate. | 
| VCMPPS xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.0F.WIG C2 /r ib | avx | Compare packed single-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate. | 
| VCMPPS ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.0F.WIG C2 /r ib | avx | Compare packed single-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate. | 
| VCMPPS k1 {k2}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.0F.W0 C2 /r ib | avx512 | Compare packed single-precision floating-point values in xmm3/m128/m32bcst and xmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VCMPPS k1 {k2}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.0F.W0 C2 /r ib | avx512 | Compare packed single-precision floating-point values in ymm3/m256/m32bcst and ymm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VCMPPS k1 {k2}, zmm2, zmm3/m512/m32bcst{sae}, imm8 | EVEX.NDS.512.0F.W0 C2 /r ib | avx512 | Compare packed single-precision floating-point values in zmm3/m512/m32bcst and zmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| CMPSD xmm1, xmm2/m64, imm8 | F2 0F C2 /r ib | sse2 | Compare low double-precision floating-point value in xmm2/m64 and xmm1 using bits 2:0 of imm8 as comparison predicate. | 
| VCMPSD xmm1, xmm2, xmm3/m64, imm8 | VEX.NDS.128.F2.0F.WIG C2 /r ib | avx | Compare low double-precision floating-point value in xmm3/m64 and xmm2 using bits 4:0 of imm8 as comparison predicate. | 
| VCMPSD k1 {k2}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.F2.0F.W1 C2 /r ib | avx512 | Compare low double-precision floating-point value in xmm3/m64 and xmm2 using bits 4:0 of imm8 as comparison predicate with writemask k2 and leave the result in mask register k1. | 
| CMPSS xmm1, xmm2/m32, imm8 | F3 0F C2 /r ib | sse | Compare low single-precision floating-point value in xmm2/m32 and xmm1 using bits 2:0 of imm8 as comparison predicate. | 
| VCMPSS xmm1, xmm2, xmm3/m32, imm8 | VEX.NDS.128.F3.0F.WIG C2 /r ib | avx | Compare low single-precision floating-point value in xmm3/m32 and xmm2 using bits 4:0 of imm8 as comparison predicate. | 
| VCMPSS k1 {k2}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.F3.0F.W0 C2 /r ib | avx512 | Compare low single-precision floating-point value in xmm3/m32 and xmm2 using bits 4:0 of imm8 as comparison predicate with writemask k2 and leave the result in mask register k1. | 
| CMPS m8, m8 | A6 | For legacy mode, compare byte at address DS:(E)SI with byte at address ES:(E)DI; for 64-bit mode, compare byte at address (R\|E)SI with byte at address (R\|E)DI. The status flags are set accordingly. | |
| CMPS m16, m16 | A7 | For legacy mode, compare word at address DS:(E)SI with word at address ES:(E)DI; for 64-bit mode, compare word at address (R\|E)SI with word at address (R\|E)DI. The status flags are set accordingly. | |
| CMPS m32, m32 | A7 | For legacy mode, compare dword at address DS:(E)SI with dword at address ES:(E)DI; for 64-bit mode, compare dword at address (R\|E)SI with dword at address (R\|E)DI. The status flags are set accordingly. | |
| CMPS m64, m64 | REX.W + A7 | Compare quadword at address (R\|E)SI with quadword at address (R\|E)DI and set the status flags accordingly. | |
| CMPSB | A6 | For legacy mode, compare byte at address DS:(E)SI with byte at address ES:(E)DI; for 64-bit mode, compare byte at address (R\|E)SI with byte at address (R\|E)DI. The status flags are set accordingly. | |
| CMPSW | A7 | For legacy mode, compare word at address DS:(E)SI with word at address ES:(E)DI; for 64-bit mode, compare word at address (R\|E)SI with word at address (R\|E)DI. The status flags are set accordingly. | |
| CMPSD | A7 | For legacy mode, compare dword at address DS:(E)SI with dword at address ES:(E)DI; for 64-bit mode, compare dword at address (R\|E)SI with dword at address (R\|E)DI. The status flags are set accordingly. | |
| CMPSQ | REX.W + A7 | Compare quadword at address (R\|E)SI with quadword at address (R\|E)DI and set the status flags accordingly. | |
| CMPXCHG r/m8, r8 | 0F B0 /r | Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL. | |
| CMPXCHG r/m8, r8 | REX + 0F B0 /r | Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL. | |
| CMPXCHG r/m16, r16 | 0F B1 /r | Compare AX with r/m16. If equal, ZF is set and r16 is loaded into r/m16. Else, clear ZF and load r/m16 into AX. | |
| CMPXCHG r/m32, r32 | 0F B1 /r | Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX. | |
| CMPXCHG r/m64, r64 | REX.W + 0F B1 /r | Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64. Else, clear ZF and load r/m64 into RAX. | |
| CMPXCHG8B m64 | 0F C7 /1 | Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. | |
| CMPXCHG16B m128 | REX.W + 0F C7 /1 | Compare RDX:RAX with m128. If equal, set ZF and load RCX:RBX into m128. Else, clear ZF and load m128 into RDX:RAX. | |
| COMISD xmm1, xmm2/m64 | 66 0F 2F /r | sse2 | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | 
| VCOMISD xmm1, xmm2/m64 | VEX.128.66.0F.WIG 2F /r | avx | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | 
| VCOMISD xmm1, xmm2/m64{sae} | EVEX.LIG.66.0F.W1 2F /r | avx512 | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | 
| COMISS xmm1, xmm2/m32 | 0F 2F /r | sse | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| VCOMISS xmm1, xmm2/m32 | VEX.128.0F.WIG 2F /r | avx | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| VCOMISS xmm1, xmm2/m32{sae} | EVEX.LIG.0F.W0 2F /r | avx512 | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| CPUID | 0F A2 | Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, as determined by input entered in EAX (in some cases, ECX as well). | |
| CRC32 r32, r/m8 | F2 0F 38 F0 /r | Accumulate CRC32 on r/m8. | |
| CRC32 r32, r/m8 | F2 REX 0F 38 F0 /r | Accumulate CRC32 on r/m8. | |
| CRC32 r32, r/m16 | F2 0F 38 F1 /r | Accumulate CRC32 on r/m16. | |
| CRC32 r32, r/m32 | F2 0F 38 F1 /r | Accumulate CRC32 on r/m32. | |
| CRC32 r64, r/m8 | F2 REX.W 0F 38 F0 /r | Accumulate CRC32 on r/m8. | |
| CRC32 r64, r/m64 | F2 REX.W 0F 38 F1 /r | Accumulate CRC32 on r/m64. | |
| CVTDQ2PD xmm1, xmm2/m64 | F3 0F E6 /r | sse2 | Convert two packed signed doubleword integers from xmm2/mem to two packed double-precision floating-point values in xmm1. | 
| VCVTDQ2PD xmm1, xmm2/m64 | VEX.128.F3.0F.WIG E6 /r | avx | Convert two packed signed doubleword integers from xmm2/mem to two packed double-precision floating-point values in xmm1. | 
| VCVTDQ2PD ymm1, xmm2/m128 | VEX.256.F3.0F.WIG E6 /r | avx | Convert four packed signed doubleword integers from xmm2/mem to four packed double-precision floating-point values in ymm1. | 
| VCVTDQ2PD xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.F3.0F.W0 E6 /r | avx512 | Convert two packed signed doubleword integers from xmm2/m128/m32bcst to two packed double-precision floating-point values in xmm1 with writemask k1. | 
| VCVTDQ2PD ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.F3.0F.W0 E6 /r | avx512 | Convert four packed signed doubleword integers from xmm2/m128/m32bcst to four packed double-precision floating-point values in ymm1 with writemask k1. | 
| VCVTDQ2PD zmm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.512.F3.0F.W0 E6 /r | avx512 | Convert eight packed signed doubleword integers from ymm2/m256/m32bcst to eight packed double-precision floating-point values in zmm1 with writemask k1. | 
| CVTDQ2PS xmm1, xmm2/m128 | 0F 5B /r | sse2 | Convert four packed signed doubleword integers from xmm2/mem to four packed single-precision floating-point values in xmm1. | 
| VCVTDQ2PS xmm1, xmm2/m128 | VEX.128.0F.WIG 5B /r | avx | Convert four packed signed doubleword integers from xmm2/mem to four packed single-precision floating-point values in xmm1. | 
| VCVTDQ2PS ymm1, ymm2/m256 | VEX.256.0F.WIG 5B /r | avx | Convert eight packed signed doubleword integers from ymm2/mem to eight packed single-precision floating-point values in ymm1. | 
| VCVTDQ2PS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.0F.W0 5B /r | avx512 | Convert four packed signed doubleword integers from xmm2/m128/m32bcst to four packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTDQ2PS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.0F.W0 5B /r | avx512 | Convert eight packed signed doubleword integers from ymm2/m256/m32bcst to eight packed single-precision floating-point values in ymm1 with writemask k1. | 
| VCVTDQ2PS zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | EVEX.512.0F.W0 5B /r | avx512 | Convert sixteen packed signed doubleword integers from zmm2/m512/m32bcst to sixteen packed single-precision floating-point values in zmm1 with writemask k1. | 
| CVTPD2DQ xmm1, xmm2/m128 | F2 0F E6 /r | sse2 | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1. | 
| VCVTPD2DQ xmm1, xmm2/m128 | VEX.128.F2.0F.WIG E6 /r | avx | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1. | 
| VCVTPD2DQ xmm1, ymm2/m256 | VEX.256.F2.0F.WIG E6 /r | avx | Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1. | 
| VCVTPD2DQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.F2.0F.W1 E6 /r | avx512 | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two signed doubleword integers in xmm1 subject to writemask k1. | 
| VCVTPD2DQ xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.F2.0F.W1 E6 /r | avx512 | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four signed doubleword integers in xmm1 subject to writemask k1. | 
| VCVTPD2DQ ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.F2.0F.W1 E6 /r | avx512 | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight signed doubleword integers in ymm1 subject to writemask k1. | 
| CVTPD2PI mm, xmm/m128 | 66 0F 2D /r | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm. | |
| CVTPD2PS xmm1, xmm2/m128 | 66 0F 5A /r | sse2 | Convert two packed double-precision floating-point values in xmm2/mem to two single-precision floating-point values in xmm1. | 
| VCVTPD2PS xmm1, xmm2/m128 | VEX.128.66.0F.WIG 5A /r | avx | Convert two packed double-precision floating-point values in xmm2/mem to two single-precision floating-point values in xmm1. | 
| VCVTPD2PS xmm1, ymm2/m256 | VEX.256.66.0F.WIG 5A /r | avx | Convert four packed double-precision floating-point values in ymm2/mem to four single-precision floating-point values in xmm1. | 
| VCVTPD2PS xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 5A /r | avx512 | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTPD2PS xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 5A /r | avx512 | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTPD2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.66.0F.W1 5A /r | avx512 | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight single-precision floating-point values in ymm1 with writemask k1. | 
| CVTPI2PD xmm, mm/m64 | 66 0F 2A /r | Convert two packed signed doubleword integers from mm/mem64 to two packed double-precision floating-point values in xmm. | |
| CVTPI2PS xmm, mm/m64 | 0F 2A /r | Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm. | |
| CVTPS2DQ xmm1, xmm2/m128 | 66 0F 5B /r | sse2 | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1. | 
| VCVTPS2DQ xmm1, xmm2/m128 | VEX.128.66.0F.WIG 5B /r | avx | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1. | 
| VCVTPS2DQ ymm1, ymm2/m256 | VEX.256.66.0F.WIG 5B /r | avx | Convert eight packed single-precision floating-point values from ymm2/mem to eight packed signed doubleword values in ymm1. | 
| VCVTPS2DQ xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F.W0 5B /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed signed doubleword values in xmm1 subject to writemask k1. | 
| VCVTPS2DQ ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F.W0 5B /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed signed doubleword values in ymm1 subject to writemask k1. | 
| VCVTPS2DQ zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | EVEX.512.66.0F.W0 5B /r | avx512 | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed signed doubleword values in zmm1 subject to writemask k1. | 
| CVTPS2PD xmm1, xmm2/m64 | 0F 5A /r | sse2 | Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1. | 
| VCVTPS2PD xmm1, xmm2/m64 | VEX.128.0F.WIG 5A /r | avx | Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1. | 
| VCVTPS2PD ymm1, xmm2/m128 | VEX.256.0F.WIG 5A /r | avx | Convert four packed single-precision floating-point values in xmm2/m128 to four packed double-precision floating-point values in ymm1. | 
| VCVTPS2PD xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.0F.W0 5A /r | avx512 | Convert two packed single-precision floating-point values in xmm2/m64/m32bcst to packed double-precision floating-point values in xmm1 with writemask k1. | 
| VCVTPS2PD ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.0F.W0 5A /r | avx512 | Convert four packed single-precision floating-point values in xmm2/m128/m32bcst to packed double-precision floating-point values in ymm1 with writemask k1. | 
| VCVTPS2PD zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | EVEX.512.0F.W0 5A /r | avx512 | Convert eight packed single-precision floating-point values in ymm2/m256/m32bcst to eight packed double-precision floating-point values in zmm1 with writemask k1. | 
| CVTPS2PI mm, xmm/m64 | 0F 2D /r | Convert two packed single-precision floating-point values from xmm/m64 to two packed signed doubleword integers in mm. | |
| CVTSD2SI r32, xmm1/m64 | F2 0F 2D /r | sse2 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32. | 
| CVTSD2SI r64, xmm1/m64 | F2 REX.W 0F 2D /r | sse2 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64. | 
| VCVTSD2SI r32, xmm1/m64 | VEX.128.F2.0F.W0 2D /r | avx | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32. | 
| VCVTSD2SI r64, xmm1/m64 | VEX.128.F2.0F.W1 2D /r | avx | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64. | 
| VCVTSD2SI r32, xmm1/m64{er} | EVEX.LIG.F2.0F.W0 2D /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32. | 
| VCVTSD2SI r64, xmm1/m64{er} | EVEX.LIG.F2.0F.W1 2D /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64. | 
| CVTSD2SS xmm1, xmm2/m64 | F2 0F 5A /r | sse2 | Convert one double-precision floating-point value in xmm2/m64 to one single-precision floating-point value in xmm1. | 
| VCVTSD2SS xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 5A /r | avx | Convert one double-precision floating-point value in xmm3/m64 to one single-precision floating-point value and merge with high bits in xmm2. | 
| VCVTSD2SS xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.F2.0F.W1 5A /r | avx512 | Convert one double-precision floating-point value in xmm3/m64 to one single-precision floating-point value and merge with high bits in xmm2 under writemask k1. | 
| CVTSI2SD xmm1, r32/m32 | F2 0F 2A /r | sse2 | Convert one signed doubleword integer from r32/m32 to one double-precision floating-point value in xmm1. | 
| CVTSI2SD xmm1, r/m64 | F2 REX.W 0F 2A /r | sse2 | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1. | 
| VCVTSI2SD xmm1, xmm2, r/m32 | VEX.NDS.128.F2.0F.W0 2A /r | avx | Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm1. | 
| VCVTSI2SD xmm1, xmm2, r/m64 | VEX.NDS.128.F2.0F.W1 2A /r | avx | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1. | 
| VCVTSI2SD xmm1, xmm2, r/m32 | EVEX.NDS.LIG.F2.0F.W0 2A /r | avx512 | Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm1. | 
| VCVTSI2SD xmm1, xmm2, r/m64{er} | EVEX.NDS.LIG.F2.0F.W1 2A /r | avx512 | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1. | 
| CVTSI2SS xmm1, r/m32 | F3 0F 2A /r | sse | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1. | 
| CVTSI2SS xmm1, r/m64 | F3 REX.W 0F 2A /r | sse | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1. | 
| VCVTSI2SS xmm1, xmm2, r/m32 | VEX.NDS.128.F3.0F.W0 2A /r | avx | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1. | 
| VCVTSI2SS xmm1, xmm2, r/m64 | VEX.NDS.128.F3.0F.W1 2A /r | avx | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1. | 
| VCVTSI2SS xmm1, xmm2, r/m32{er} | EVEX.NDS.LIG.F3.0F.W0 2A /r | avx512 | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1. | 
| VCVTSI2SS xmm1, xmm2, r/m64{er} | EVEX.NDS.LIG.F3.0F.W1 2A /r | avx512 | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1. | 
| CVTSS2SD xmm1, xmm2/m32 | F3 0F 5A /r | sse2 | Convert one single-precision floating-point value in xmm2/m32 to one double-precision floating-point value in xmm1. | 
| VCVTSS2SD xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 5A /r | avx | Convert one single-precision floating-point value in xmm3/m32 to one double-precision floating-point value and merge with high bits of xmm2. | 
| VCVTSS2SD xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | EVEX.NDS.LIG.F3.0F.W0 5A /r | avx512 | Convert one single-precision floating-point value in xmm3/m32 to one double-precision floating-point value and merge with high bits of xmm2 under writemask k1. | 
| CVTSS2SI r32, xmm1/m32 | F3 0F 2D /r | sse | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32. | 
| CVTSS2SI r64, xmm1/m32 | F3 REX.W 0F 2D /r | sse | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64. | 
| VCVTSS2SI r32, xmm1/m32 | VEX.128.F3.0F.W0 2D /r | avx | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32. | 
| VCVTSS2SI r64, xmm1/m32 | VEX.128.F3.0F.W1 2D /r | avx | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64. | 
| VCVTSS2SI r32, xmm1/m32{er} | EVEX.LIG.F3.0F.W0 2D /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32. | 
| VCVTSS2SI r64, xmm1/m32{er} | EVEX.LIG.F3.0F.W1 2D /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64. | 
| CVTTPD2DQ xmm1, xmm2/m128 | 66 0F E6 /r | sse2 | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1 using truncation. | 
| VCVTTPD2DQ xmm1, xmm2/m128 | VEX.128.66.0F.WIG E6 /r | avx | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1 using truncation. | 
| VCVTTPD2DQ xmm1, ymm2/m256 | VEX.256.66.0F.WIG E6 /r | avx | Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1 using truncation. | 
| VCVTTPD2DQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 E6 /r | avx512 | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two signed doubleword integers in xmm1 using truncation subject to writemask k1. | 
| VCVTTPD2DQ xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 E6 /r | avx512 | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four signed doubleword integers in xmm1 using truncation subject to writemask k1. | 
| VCVTTPD2DQ ymm1 {k1}{z}, zmm2/m512/m64bcst{sae} | EVEX.512.66.0F.W1 E6 /r | avx512 | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight signed doubleword integers in ymm1 using truncation subject to writemask k1. | 
| CVTTPD2PI mm, xmm/m128 | 66 0F 2C /r | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm using truncation. | |
| CVTTPS2DQ xmm1, xmm2/m128 | F3 0F 5B /r | sse2 | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1 using truncation. | 
| VCVTTPS2DQ xmm1, xmm2/m128 | VEX.128.F3.0F.WIG 5B /r | avx | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1 using truncation. | 
| VCVTTPS2DQ ymm1, ymm2/m256 | VEX.256.F3.0F.WIG 5B /r | avx | Convert eight packed single-precision floating-point values from ymm2/mem to eight packed signed doubleword values in ymm1 using truncation. | 
| VCVTTPS2DQ xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.F3.0F.W0 5B /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed signed doubleword values in xmm1 using truncation subject to writemask k1. | 
| VCVTTPS2DQ ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.F3.0F.W0 5B /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed signed doubleword values in ymm1 using truncation subject to writemask k1. | 
| VCVTTPS2DQ zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | EVEX.512.F3.0F.W0 5B /r | avx512 | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed signed doubleword values in zmm1 using truncation subject to writemask k1. | 
| CVTTPS2PI mm, xmm/m64 | 0F 2C /r | Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation. | |
| CVTTSD2SI r32, xmm1/m64 | F2 0F 2C /r | sse2 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation. | 
| CVTTSD2SI r64, xmm1/m64 | F2 REX.W 0F 2C /r | sse2 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation. | 
| VCVTTSD2SI r32, xmm1/m64 | VEX.128.F2.0F.W0 2C /r | avx | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation. | 
| VCVTTSD2SI r64, xmm1/m64 | VEX.128.F2.0F.W1 2C /r | avx | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation. | 
| VCVTTSD2SI r32, xmm1/m64{sae} | EVEX.LIG.F2.0F.W0 2C /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation. | 
| VCVTTSD2SI r64, xmm1/m64{sae} | EVEX.LIG.F2.0F.W1 2C /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation. | 
| CVTTSS2SI r32, xmm1/m32 | F3 0F 2C /r | sse | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation. | 
| CVTTSS2SI r64, xmm1/m32 | F3 REX.W 0F 2C /r | sse | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation. | 
| VCVTTSS2SI r32, xmm1/m32 | VEX.128.F3.0F.W0 2C /r | avx | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation. | 
| VCVTTSS2SI r64, xmm1/m32 | VEX.128.F3.0F.W1 2C /r | avx | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation. | 
| VCVTTSS2SI r32, xmm1/m32{sae} | EVEX.LIG.F3.0F.W0 2C /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation. | 
| VCVTTSS2SI r64, xmm1/m32{sae} | EVEX.LIG.F3.0F.W1 2C /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation. | 
| CWD | 99 | DX:AX ← sign-extend of AX. | |
| CDQ | 99 | EDX:EAX ← sign-extend of EAX. | |
| CQO | REX.W + 99 | RDX:RAX ← sign-extend of RAX. | |
| DAA | 27 | Decimal adjust AL after addition. | |
| DAS | 2F | Decimal adjust AL after subtraction. | |
| DEC r/m8 | FE /1 | Decrement r/m8 by 1. | |
| DEC r/m8 | REX + FE /1 | Decrement r/m8 by 1. | |
| DEC r/m16 | FF /1 | Decrement r/m16 by 1. | |
| DEC r/m32 | FF /1 | Decrement r/m32 by 1. | |
| DEC r/m64 | REX.W + FF /1 | Decrement r/m64 by 1. | |
| DEC r16 | 48+rw | Decrement r16 by 1. | |
| DEC r32 | 48+rd | Decrement r32 by 1. | |
| DIV r/m8 | F6 /6 | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder. | |
| DIV r/m8 | REX + F6 /6 | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder. | |
| DIV r/m16 | F7 /6 | Unsigned divide DX:AX by r/m16, with result stored in AX ← Quotient, DX ← Remainder. | |
| DIV r/m32 | F7 /6 | Unsigned divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder. | |
| DIV r/m64 | REX.W + F7 /6 | Unsigned divide RDX:RAX by r/m64, with result stored in RAX ← Quotient, RDX ← Remainder. | |
| DIVPD xmm1, xmm2/m128 | 66 0F 5E /r | sse2 | Divide packed double-precision floating-point values in xmm1 by packed double-precision floating-point values in xmm2/mem. | 
| VDIVPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 5E /r | avx | Divide packed double-precision floating-point values in xmm2 by packed double-precision floating-point values in xmm3/mem. | 
| VDIVPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 5E /r | avx | Divide packed double-precision floating-point values in ymm2 by packed double-precision floating-point values in ymm3/mem. | 
| VDIVPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 5E /r | avx512 | Divide packed double-precision floating-point values in xmm2 by packed double-precision floating-point values in xmm3/m128/m64bcst and write results to xmm1 subject to writemask k1. | 
| VDIVPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 5E /r | avx512 | Divide packed double-precision floating-point values in ymm2 by packed double-precision floating-point values in ymm3/m256/m64bcst and write results to ymm1 subject to writemask k1. | 
| VDIVPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F.W1 5E /r | avx512 | Divide packed double-precision floating-point values in zmm2 by packed double-precision FP values in zmm3/m512/m64bcst and write results to zmm1 subject to writemask k1. | 
| DIVPS xmm1, xmm2/m128 | 0F 5E /r | sse | Divide packed single-precision floating-point values in xmm1 by packed single-precision floating-point values in xmm2/mem. | 
| VDIVPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 5E /r | avx | Divide packed single-precision floating-point values in xmm2 by packed single-precision floating-point values in xmm3/mem. | 
| VDIVPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 5E /r | avx | Divide packed single-precision floating-point values in ymm2 by packed single-precision floating-point values in ymm3/mem. | 
| VDIVPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 5E /r | avx512 | Divide packed single-precision floating-point values in xmm2 by packed single-precision floating-point values in xmm3/m128/m32bcst and write results to xmm1 subject to writemask k1. | 
| VDIVPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 5E /r | avx512 | Divide packed single-precision floating-point values in ymm2 by packed single-precision floating-point values in ymm3/m256/m32bcst and write results to ymm1 subject to writemask k1. | 
| VDIVPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.0F.W0 5E /r | avx512 | Divide packed single-precision floating-point values in zmm2 by packed single-precision floating-point values in zmm3/m512/m32bcst and write results to zmm1 subject to writemask k1. | 
| DIVSD xmm1, xmm2/m64 | F2 0F 5E /r | sse2 | Divide low double-precision floating-point value in xmm1 by low double-precision floating-point value in xmm2/m64. | 
| VDIVSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 5E /r | avx | Divide low double-precision floating-point value in xmm2 by low double-precision floating-point value in xmm3/m64. | 
| VDIVSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.F2.0F.W1 5E /r | avx512 | Divide low double-precision floating-point value in xmm2 by low double-precision floating-point value in xmm3/m64. | 
| DIVSS xmm1, xmm2/m32 | F3 0F 5E /r | sse | Divide low single-precision floating-point value in xmm1 by low single-precision floating-point value in xmm2/m32. | 
| VDIVSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 5E /r | avx | Divide low single-precision floating-point value in xmm2 by low single-precision floating-point value in xmm3/m32. | 
| VDIVSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.NDS.LIG.F3.0F.W0 5E /r | avx512 | Divide low single-precision floating-point value in xmm2 by low single-precision floating-point value in xmm3/m32. | 
| DPPD xmm1, xmm2/m128, imm8 | 66 0F 3A 41 /r ib | sse4.1 | Selectively multiply packed DP floating-point values from xmm1 with packed DP floating-point values from xmm2, add and selectively store the packed DP floating-point values to xmm1. | 
| VDPPD xmm1,xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 41 /r ib | avx | Selectively multiply packed DP floating-point values from xmm2 with packed DP floating-point values from xmm3, add and selectively store the packed DP floating-point values to xmm1. | 
| DPPS xmm1, xmm2/m128, imm8 | 66 0F 3A 40 /r ib | sse4.1 | Selectively multiply packed SP floating-point values from xmm1 with packed SP floating-point values from xmm2, add and selectively store the packed SP floating-point values or zero values to xmm1. | 
| VDPPS xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 40 /r ib | avx | Multiply packed SP floating-point values from xmm2 with packed SP floating-point values from xmm3/mem, selectively add and store to xmm1. | 
| VDPPS ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 40 /r ib | avx | Multiply packed single-precision floating-point values from ymm2 with packed SP floating point values from ymm3/mem, selectively add pairs of elements and store to ymm1. | 
| EMMS | 0F 77 | Set the x87 FPU tag word to empty. | |
| ENTER imm16, 0 | C8 iw 00 | Create a stack frame for a procedure. | |
| ENTER imm16, 1 | C8 iw 01 | Create a stack frame with a nested pointer for a procedure. | |
| ENTER imm16, imm8 | C8 iw ib | Create a stack frame with nested pointers for a procedure. | |
| EXTRACTPS reg/m32, xmm1, imm8 | 66 0F 3A 17 /r ib | sse4.1 | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable. | 
| VEXTRACTPS reg/m32, xmm1, imm8 | VEX.128.66.0F3A.WIG 17 /r ib | avx | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable. | 
| VEXTRACTPS reg/m32, xmm1, imm8 | EVEX.128.66.0F3A.WIG 17 /r ib | avx512 | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable. | 
| F2XM1 | D9 F0 | Replace ST(0) with (2^ST(0) – 1). | |
| FABS | D9 E1 | Replace ST with its absolute value. | |
| FADD m32fp | D8 /0 | Add m32fp to ST(0) and store result in ST(0). | |
| FADD m64fp | DC /0 | Add m64fp to ST(0) and store result in ST(0). | |
| FADD ST(0), ST(i) | D8 C0+i | Add ST(0) to ST(i) and store result in ST(0). | |
| FADD ST(i), ST(0) | DC C0+i | Add ST(i) to ST(0) and store result in ST(i). | |
| FADDP ST(i), ST(0) | DE C0+i | Add ST(0) to ST(i), store result in ST(i), and pop the register stack. | |
| FADDP | DE C1 | Add ST(0) to ST(1), store result in ST(1), and pop the register stack. | |
| FIADD m32int | DA /0 | Add m32int to ST(0) and store result in ST(0). | |
| FIADD m16int | DE /0 | Add m16int to ST(0) and store result in ST(0). | |
| FBLD m80dec | DF /4 | Convert BCD value to floating-point and push onto the FPU stack. | |
| FBSTP m80bcd | DF /6 | Store ST(0) in m80bcd and pop ST(0). | |
| FCHS | D9 E0 | Complements sign of ST(0). | |
| FCLEX | 9B DB E2 | Clear floating-point exception flags after checking for pending unmasked floating-point exceptions. | |
| FNCLEX | DB E2 | Clear floating-point exception flags without checking for pending unmasked floating-point exceptions. | |
| FCMOVB ST(0), ST(i) | DA C0+i | Move if below (CF=1). | |
| FCMOVE ST(0), ST(i) | DA C8+i | Move if equal (ZF=1). | |
| FCMOVBE ST(0), ST(i) | DA D0+i | Move if below or equal (CF=1 or ZF=1). | |
| FCMOVU ST(0), ST(i) | DA D8+i | Move if unordered (PF=1). | |
| FCMOVNB ST(0), ST(i) | DB C0+i | Move if not below (CF=0). | |
| FCMOVNE ST(0), ST(i) | DB C8+i | Move if not equal (ZF=0). | |
| FCMOVNBE ST(0), ST(i) | DB D0+i | Move if not below or equal (CF=0 and ZF=0). | |
| FCMOVNU ST(0), ST(i) | DB D8+i | Move if not unordered (PF=0). | |
| FCOMI ST, ST(i) | DB F0+i | Compare ST(0) with ST(i) and set status flags accordingly. | |
| FCOMIP ST, ST(i) | DF F0+i | Compare ST(0) with ST(i), set status flags accordingly, and pop register stack. | |
| FUCOMI ST, ST(i) | DB E8+i | Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly. | |
| FUCOMIP ST, ST(i) | DF E8+i | Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack. | |
| FCOM m32fp | D8 /2 | Compare ST(0) with m32fp. | |
| FCOM m64fp | DC /2 | Compare ST(0) with m64fp. | |
| FCOM ST(i) | D8 D0+i | Compare ST(0) with ST(i). | |
| FCOM | D8 D1 | Compare ST(0) with ST(1). | |
| FCOMP m32fp | D8 /3 | Compare ST(0) with m32fp and pop register stack. | |
| FCOMP m64fp | DC /3 | Compare ST(0) with m64fp and pop register stack. | |
| FCOMP ST(i) | D8 D8+i | Compare ST(0) with ST(i) and pop register stack. | |
| FCOMP | D8 D9 | Compare ST(0) with ST(1) and pop register stack. | |
| FCOMPP | DE D9 | Compare ST(0) with ST(1) and pop register stack twice. | |
| FCOS | D9 FF | Replace ST(0) with its approximate cosine. | |
| FDECSTP | D9 F6 | Decrement TOP field in FPU status word. | |
| FDIVR m32fp | D8 /7 | Divide m32fp by ST(0) and store result in ST(0). | |
| FDIVR m64fp | DC /7 | Divide m64fp by ST(0) and store result in ST(0). | |
| FDIVR ST(0), ST(i) | D8 F8+i | Divide ST(i) by ST(0) and store result in ST(0). | |
| FDIVR ST(i), ST(0) | DC F0+i | Divide ST(0) by ST(i) and store result in ST(i). | |
| FDIVRP ST(i), ST(0) | DE F0+i | Divide ST(0) by ST(i), store result in ST(i), and pop the register stack. | |
| FDIVRP | DE F1 | Divide ST(0) by ST(1), store result in ST(1), and pop the register stack. | |
| FIDIVR m32int | DA /7 | Divide m32int by ST(0) and store result in ST(0). | |
| FIDIVR m16int | DE /7 | Divide m16int by ST(0) and store result in ST(0). | |
| FDIV m32fp | D8 /6 | Divide ST(0) by m32fp and store result in ST(0). | |
| FDIV m64fp | DC /6 | Divide ST(0) by m64fp and store result in ST(0). | |
| FDIV ST(0), ST(i) | D8 F0+i | Divide ST(0) by ST(i) and store result in ST(0). | |
| FDIV ST(i), ST(0) | DC F8+i | Divide ST(i) by ST(0) and store result in ST(i). | |
| FDIVP ST(i), ST(0) | DE F8+i | Divide ST(i) by ST(0), store result in ST(i), and pop the register stack. | |
| FDIVP | DE F9 | Divide ST(1) by ST(0), store result in ST(1), and pop the register stack. | |
| FIDIV m32int | DA /6 | Divide ST(0) by m32int and store result in ST(0). | |
| FIDIV m16int | DE /6 | Divide ST(0) by m16int and store result in ST(0). | |
| FFREE ST(i) | DD C0+i | Sets tag for ST(i) to empty. | |
| FICOM m16int | DE /2 | Compare ST(0) with m16int. | |
| FICOM m32int | DA /2 | Compare ST(0) with m32int. | |
| FICOMP m16int | DE /3 | Compare ST(0) with m16int and pop stack register. | |
| FICOMP m32int | DA /3 | Compare ST(0) with m32int and pop stack register. | |
| FILD m16int | DF /0 | Push m16int onto the FPU register stack. | |
| FILD m32int | DB /0 | Push m32int onto the FPU register stack. | |
| FILD m64int | DF /5 | Push m64int onto the FPU register stack. | |
| FINCSTP | D9 F7 | Increment the TOP field in the FPU status register. | |
| FINIT | 9B DB E3 | Initialize FPU after checking for pending unmasked floating-point exceptions. | |
| FNINIT | DB E3 | Initialize FPU without checking for pending unmasked floating-point exceptions. | |
| FISTTP m16int | DF /1 | Store ST(0) in m16int with truncation. | |
| FISTTP m32int | DB /1 | Store ST(0) in m32int with truncation. | |
| FISTTP m64int | DD /1 | Store ST(0) in m64int with truncation. | |
| FIST m16int | DF /2 | Store ST(0) in m16int. | |
| FIST m32int | DB /2 | Store ST(0) in m32int. | |
| FISTP m16int | DF /3 | Store ST(0) in m16int and pop register stack. | |
| FISTP m32int | DB /3 | Store ST(0) in m32int and pop register stack. | |
| FISTP m64int | DF /7 | Store ST(0) in m64int and pop register stack. | |
| FLD m32fp | D9 /0 | Push m32fp onto the FPU register stack. | |
| FLD m64fp | DD /0 | Push m64fp onto the FPU register stack. | |
| FLD m80fp | DB /5 | Push m80fp onto the FPU register stack. | |
| FLD ST(i) | D9 C0+i | Push ST(i) onto the FPU register stack. | |
| FLD1 | D9 E8 | Push +1.0 onto the FPU register stack. | |
| FLDL2T | D9 E9 | Push log₂10 onto the FPU register stack. | |
| FLDL2E | D9 EA | Push log₂e onto the FPU register stack. | |
| FLDPI | D9 EB | Push π onto the FPU register stack. | |
| FLDLG2 | D9 EC | Push log₁₀2 onto the FPU register stack. | |
| FLDLN2 | D9 ED | Push logₑ2 onto the FPU register stack. | |
| FLDZ | D9 EE | Push +0.0 onto the FPU register stack. | |
| FLDCW m2byte | D9 /5 | Load FPU control word from m2byte. | |
| FLDENV m14/28byte | D9 /4 | Load FPU environment from m14byte or m28byte. | |
| FMUL m32fp | D8 /1 | Multiply ST(0) by m32fp and store result in ST(0). | |
| FMUL m64fp | DC /1 | Multiply ST(0) by m64fp and store result in ST(0). | |
| FMUL ST(0), ST(i) | D8 C8+i | Multiply ST(0) by ST(i) and store result in ST(0). | |
| FMUL ST(i), ST(0) | DC C8+i | Multiply ST(i) by ST(0) and store result in ST(i). | |
| FMULP ST(i), ST(0) | DE C8+i | Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack. | |
| FMULP | DE C9 | Multiply ST(1) by ST(0), store result in ST(1), and pop the register stack. | |
| FIMUL m32int | DA /1 | Multiply ST(0) by m32int and store result in ST(0). | |
| FIMUL m16int | DE /1 | Multiply ST(0) by m16int and store result in ST(0). | |
| FNOP | D9 D0 | No operation is performed. | |
| FPATAN | D9 F3 | Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack. | |
| FPREM | D9 F8 | Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1). | |
| FPREM1 | D9 F5 | Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1). | |
| FPTAN | D9 F2 | Replace ST(0) with its approximate tangent and push 1 onto the FPU stack. | |
| FRNDINT | D9 FC | Round ST(0) to an integer. | |
| FRSTOR m94/108byte | DD /4 | Load FPU state from m94byte or m108byte. | |
| FSAVE m94/108byte | 9B DD /6 | Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
| FNSAVE m94/108byte | DD /6 | Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
| FSCALE | D9 FD | Scale ST(0) by ST(1). | |
| FSIN | D9 FE | Replace ST(0) with its approximate sine. | |
| FSINCOS | D9 FB | Compute the sine and cosine of ST(0); replace ST(0) with the approximate sine, and push the approximate cosine onto the register stack. | |
| FSQRT | D9 FA | Computes square root of ST(0) and stores the result in ST(0). | |
| FSTCW m2byte | 9B D9 /7 | Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions. | |
| FNSTCW m2byte | D9 /7 | Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions. | |
| FSTENV m14/28byte | 9B D9 /6 | Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
| FNSTENV m14/28byte | D9 /6 | Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
| FSTSW m2byte | 9B DD /7 | Store FPU status word at m2byte after checking for pending unmasked floating-point exceptions. | |
| FSTSW AX | 9B DF E0 | Store FPU status word in AX register after checking for pending unmasked floating-point exceptions. | |
| FNSTSW m2byte | DD /7 | Store FPU status word at m2byte without checking for pending unmasked floating-point exceptions. | |
| FNSTSW AX | DF E0 | Store FPU status word in AX register without checking for pending unmasked floating-point exceptions. | |
| FST m32fp | D9 /2 | Copy ST(0) to m32fp. | |
| FST m64fp | DD /2 | Copy ST(0) to m64fp. | |
| FST ST(i) | DD D0+i | Copy ST(0) to ST(i). | |
| FSTP m32fp | D9 /3 | Copy ST(0) to m32fp and pop register stack. | |
| FSTP m64fp | DD /3 | Copy ST(0) to m64fp and pop register stack. | |
| FSTP m80fp | DB /7 | Copy ST(0) to m80fp and pop register stack. | |
| FSTP ST(i) | DD D8+i | Copy ST(0) to ST(i) and pop register stack. | |
| FSUBR m32fp | D8 /5 | Subtract ST(0) from m32fp and store result in ST(0). | |
| FSUBR m64fp | DC /5 | Subtract ST(0) from m64fp and store result in ST(0). | |
| FSUBR ST(0), ST(i) | D8 E8+i | Subtract ST(0) from ST(i) and store result in ST(0). | |
| FSUBR ST(i), ST(0) | DC E0+i | Subtract ST(i) from ST(0) and store result in ST(i). | |
| FSUBRP ST(i), ST(0) | DE E0+i | Subtract ST(i) from ST(0), store result in ST(i), and pop register stack. | |
| FSUBRP | DE E1 | Subtract ST(1) from ST(0), store result in ST(1), and pop register stack. | |
| FISUBR m32int | DA /5 | Subtract ST(0) from m32int and store result in ST(0). | |
| FISUBR m16int | DE /5 | Subtract ST(0) from m16int and store result in ST(0). | |
| FSUB m32fp | D8 /4 | Subtract m32fp from ST(0) and store result in ST(0). | |
| FSUB m64fp | DC /4 | Subtract m64fp from ST(0) and store result in ST(0). | |
| FSUB ST(0), ST(i) | D8 E0+i | Subtract ST(i) from ST(0) and store result in ST(0). | |
| FSUB ST(i), ST(0) | DC E8+i | Subtract ST(0) from ST(i) and store result in ST(i). | |
| FSUBP ST(i), ST(0) | DE E8+i | Subtract ST(0) from ST(i), store result in ST(i), and pop register stack. | |
| FSUBP | DE E9 | Subtract ST(0) from ST(1), store result in ST(1), and pop register stack. | |
| FISUB m32int | DA /4 | Subtract m32int from ST(0) and store result in ST(0). | |
| FISUB m16int | DE /4 | Subtract m16int from ST(0) and store result in ST(0). | |
| FTST | D9 E4 | Compare ST(0) with 0.0. | |
| FUCOM ST(i) | DD E0+i | Compare ST(0) with ST(i). | |
| FUCOM | DD E1 | Compare ST(0) with ST(1). | |
| FUCOMP ST(i) | DD E8+i | Compare ST(0) with ST(i) and pop register stack. | |
| FUCOMP | DD E9 | Compare ST(0) with ST(1) and pop register stack. | |
| FUCOMPP | DA E9 | Compare ST(0) with ST(1) and pop register stack twice. | |
| FXAM | D9 E5 | Classify value or number in ST(0). | |
| FXCH ST(i) | D9 C8+i | Exchange the contents of ST(0) and ST(i). | |
| FXCH | D9 C9 | Exchange the contents of ST(0) and ST(1). | |
| FXRSTOR m512byte | 0F AE /1 | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte. | |
| FXRSTOR64 m512byte | REX.W + 0F AE /1 | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte. | |
| FXSAVE m512byte | 0F AE /0 | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte. | |
| FXSAVE64 m512byte | REX.W + 0F AE /0 | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte. | |
| FXTRACT | D9 F4 | Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack. | |
| FYL2X | D9 F1 | Replace ST(1) with (ST(1) ∗ log₂(ST(0))) and pop the register stack. | |
| FYL2XP1 | D9 F9 | Replace ST(1) with ST(1) ∗ log₂(ST(0) + 1.0) and pop the register stack. | |
| HADDPD xmm1, xmm2/m128 | 66 0F 7C /r | sse3 | Horizontal add packed double-precision floating-point values from xmm2/m128 to xmm1. | 
| VHADDPD xmm1,xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 7C /r | avx | Horizontal add packed double-precision floating-point values from xmm2 and xmm3/mem. | 
| VHADDPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 7C /r | avx | Horizontal add packed double-precision floating-point values from ymm2 and ymm3/mem. | 
| HADDPS xmm1, xmm2/m128 | F2 0F 7C /r | sse3 | Horizontal add packed single-precision floating-point values from xmm2/m128 to xmm1. | 
| VHADDPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.F2.0F.WIG 7C /r | avx | Horizontal add packed single-precision floating-point values from xmm2 and xmm3/mem. | 
| VHADDPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.F2.0F.WIG 7C /r | avx | Horizontal add packed single-precision floating-point values from ymm2 and ymm3/mem. | 
| HLT | F4 | Halt | |
| HSUBPD xmm1, xmm2/m128 | 66 0F 7D /r | sse3 | Horizontal subtract packed double-precision floating-point values from xmm2/m128 to xmm1. | 
| VHSUBPD xmm1,xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 7D /r | avx | Horizontal subtract packed double-precision floating-point values from xmm2 and xmm3/mem. | 
| VHSUBPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 7D /r | avx | Horizontal subtract packed double-precision floating-point values from ymm2 and ymm3/mem. | 
| HSUBPS xmm1, xmm2/m128 | F2 0F 7D /r | sse3 | Horizontal subtract packed single-precision floating-point values from xmm2/m128 to xmm1. | 
| VHSUBPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.F2.0F.WIG 7D /r | avx | Horizontal subtract packed single-precision floating-point values from xmm2 and xmm3/mem. | 
| VHSUBPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.F2.0F.WIG 7D /r | avx | Horizontal subtract packed single-precision floating-point values from ymm2 and ymm3/mem. | 
| IDIV r/m8 | F6 /7 | Signed divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder. | |
| IDIV r/m8 | REX + F6 /7 | Signed divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder. | |
| IDIV r/m16 | F7 /7 | Signed divide DX:AX by r/m16, with result stored in AX ← Quotient, DX ← Remainder. | |
| IDIV r/m32 | F7 /7 | Signed divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder. | |
| IDIV r/m64 | REX.W + F7 /7 | Signed divide RDX:RAX by r/m64, with result stored in RAX ← Quotient, RDX ← Remainder. | |
| IMUL r/m8 | F6 /5 | AX ← AL ∗ r/m byte. | |
| IMUL r/m16 | F7 /5 | DX:AX ← AX ∗ r/m word. | |
| IMUL r/m32 | F7 /5 | EDX:EAX ← EAX ∗ r/m32. | |
| IMUL r/m64 | REX.W + F7 /5 | RDX:RAX ← RAX ∗ r/m64. | |
| IMUL r16, r/m16 | 0F AF /r | word register ← word register ∗ r/m16. | |
| IMUL r32, r/m32 | 0F AF /r | doubleword register ← doubleword register ∗ r/m32. | |
| IMUL r64, r/m64 | REX.W + 0F AF /r | Quadword register ← Quadword register ∗ r/m64. | |
| IMUL r16, r/m16, imm8 | 6B /r ib | word register ← r/m16 ∗ sign-extended immediate byte. | |
| IMUL r32, r/m32, imm8 | 6B /r ib | doubleword register ← r/m32 ∗ sign-extended immediate byte. | |
| IMUL r64, r/m64, imm8 | REX.W + 6B /r ib | Quadword register ← r/m64 ∗ sign-extended immediate byte. | |
| IMUL r16, r/m16, imm16 | 69 /r iw | word register ← r/m16 ∗ immediate word. | |
| IMUL r32, r/m32, imm32 | 69 /r id | doubleword register ← r/m32 ∗ immediate doubleword. | |
| IMUL r64, r/m64, imm32 | REX.W + 69 /r id | Quadword register ← r/m64 ∗ immediate doubleword. | |
| IN AL, imm8 | E4 ib | Input byte from imm8 I/O port address into AL. | |
| IN AX, imm8 | E5 ib | Input word from imm8 I/O port address into AX. | |
| IN EAX, imm8 | E5 ib | Input dword from imm8 I/O port address into EAX. | |
| IN AL,DX | EC | Input byte from I/O port in DX into AL. | |
| IN AX,DX | ED | Input word from I/O port in DX into AX. | |
| IN EAX,DX | ED | Input doubleword from I/O port in DX into EAX. | |
| INC r/m8 | FE /0 | Increment r/m byte by 1. | |
| INC r/m8 | REX + FE /0 | Increment r/m byte by 1. | |
| INC r/m16 | FF /0 | Increment r/m word by 1. | |
| INC r/m32 | FF /0 | Increment r/m doubleword by 1. | |
| INC r/m64 | REX.W + FF /0 | Increment r/m quadword by 1. | |
| INC r16 | 40+ rw | Increment word register by 1. | |
| INC r32 | 40+ rd | Increment doubleword register by 1. | |
| INSERTPS xmm1, xmm2/m32, imm8 | 66 0F 3A 21 /r ib | sse4.1 | Insert a single-precision floating-point value selected by imm8 from xmm2/m32 into xmm1 at the specified destination element specified by imm8 and zero out destination elements in xmm1 as indicated in imm8. | 
| VINSERTPS xmm1, xmm2, xmm3/m32, imm8 | VEX.NDS.128.66.0F3A.WIG 21 /r ib | avx | Insert a single-precision floating-point value selected by imm8 from xmm3/m32 and merge with values in xmm2 at the specified destination element specified by imm8 and write out the result and zero out destination elements in xmm1 as indicated in imm8. | 
| VINSERTPS xmm1, xmm2, xmm3/m32, imm8 | EVEX.NDS.128.66.0F3A.W0 21 /r ib | avx512 | Insert a single-precision floating-point value selected by imm8 from xmm3/m32 and merge with values in xmm2 at the specified destination element specified by imm8 and write out the result and zero out destination elements in xmm1 as indicated in imm8. | 
| INS m8, DX | 6C | Input byte from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. | |
| INS m16, DX | 6D | Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. | |
| INS m32, DX | 6D | Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. | |
| INSB | 6C | Input byte from I/O port specified in DX into memory location specified with ES:(E)DI or RDI. | |
| INSW | 6D | Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. | |
| INSD | 6D | Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI. | |
| INT 3 | CC | Interrupt 3—trap to debugger. | |
| INT imm8 | CD ib | Interrupt vector specified by immediate byte. | |
| INTO | CE | Interrupt 4—if overflow flag is 1. | |
| INVD | 0F 08 | Flush internal caches; initiate flushing of external caches. | |
| INVLPG m | 0F 01 /7 | Invalidate TLB entries for page containing m. | |
| INVPCID r32, m128 | 66 0F 38 82 /r | invpcid | Invalidates entries in the TLBs and paging-structure caches based on invalidation type in r32 and descriptor in m128. | 
| INVPCID r64, m128 | 66 0F 38 82 /r | invpcid | Invalidates entries in the TLBs and paging-structure caches based on invalidation type in r64 and descriptor in m128. | 
| IRET | CF | Interrupt return (16-bit operand size). | |
| IRETD | CF | Interrupt return (32-bit operand size). | |
| IRETQ | REX.W + CF | Interrupt return (64-bit operand size). | |
| JMP rel8 | EB cb | Jump short, RIP = RIP + 8-bit displacement sign extended to 64-bits | |
| JMP rel16 | E9 cw | Jump near, relative, displacement relative to next instruction. Not supported in 64-bit mode. | |
| JMP rel32 | E9 cd | Jump near, relative, RIP = RIP + 32-bit displacement sign extended to 64-bits | |
| JMP r/m16 | FF /4 | Jump near, absolute indirect, address = zero-extended r/m16. Not supported in 64-bit mode. | |
| JMP r/m32 | FF /4 | Jump near, absolute indirect, address given in r/m32. Not supported in 64-bit mode. | |
| JMP r/m64 | FF /4 | Jump near, absolute indirect, RIP = 64-Bit offset from register or memory | |
| JMP ptr16:16 | EA cd | Jump far, absolute, address given in operand | |
| JMP ptr16:32 | EA cp | Jump far, absolute, address given in operand | |
| JMP m16:16 | FF /5 | Jump far, absolute indirect, address given in m16:16 | |
| JMP m16:32 | FF /5 | Jump far, absolute indirect, address given in m16:32. | |
| JMP m16:64 | REX.W + FF /5 | Jump far, absolute indirect, address given in m16:64. | |
| JA rel8 | 77 cb | Jump short if above (CF=0 and ZF=0). | |
| JAE rel8 | 73 cb | Jump short if above or equal (CF=0). | |
| JB rel8 | 72 cb | Jump short if below (CF=1). | |
| JBE rel8 | 76 cb | Jump short if below or equal (CF=1 or ZF=1). | |
| JC rel8 | 72 cb | Jump short if carry (CF=1). | |
| JCXZ rel8 | E3 cb | Jump short if CX register is 0. | |
| JECXZ rel8 | E3 cb | Jump short if ECX register is 0. | |
| JRCXZ rel8 | E3 cb | Jump short if RCX register is 0. | |
| JE rel8 | 74 cb | Jump short if equal (ZF=1). | |
| JG rel8 | 7F cb | Jump short if greater (ZF=0 and SF=OF). | |
| JGE rel8 | 7D cb | Jump short if greater or equal (SF=OF). | |
| JL rel8 | 7C cb | Jump short if less (SF≠ OF). | |
| JLE rel8 | 7E cb | Jump short if less or equal (ZF=1 or SF≠ OF). | |
| JNA rel8 | 76 cb | Jump short if not above (CF=1 or ZF=1). | |
| JNAE rel8 | 72 cb | Jump short if not above or equal (CF=1). | |
| JNB rel8 | 73 cb | Jump short if not below (CF=0). | |
| JNBE rel8 | 77 cb | Jump short if not below or equal (CF=0 and ZF=0). | |
| JNC rel8 | 73 cb | Jump short if not carry (CF=0). | |
| JNE rel8 | 75 cb | Jump short if not equal (ZF=0). | |
| JNG rel8 | 7E cb | Jump short if not greater (ZF=1 or SF≠ OF). | |
| JNGE rel8 | 7C cb | Jump short if not greater or equal (SF≠ OF). | |
| JNL rel8 | 7D cb | Jump short if not less (SF=OF). | |
| JNLE rel8 | 7F cb | Jump short if not less or equal (ZF=0 and SF=OF). | |
| JNO rel8 | 71 cb | Jump short if not overflow (OF=0). | |
| JNP rel8 | 7B cb | Jump short if not parity (PF=0). | |
| JNS rel8 | 79 cb | Jump short if not sign (SF=0). | |
| JNZ rel8 | 75 cb | Jump short if not zero (ZF=0). | |
| JO rel8 | 70 cb | Jump short if overflow (OF=1). | |
| JP rel8 | 7A cb | Jump short if parity (PF=1). | |
| JPE rel8 | 7A cb | Jump short if parity even (PF=1). | |
| JPO rel8 | 7B cb | Jump short if parity odd (PF=0). | |
| JS rel8 | 78 cb | Jump short if sign (SF=1). | |
| JZ rel8 | 74 cb | Jump short if zero (ZF = 1). | |
| JA rel16 | 0F 87 cw | Jump near if above (CF=0 and ZF=0). Not supported in 64-bit mode. | |
| JA rel32 | 0F 87 cd | Jump near if above (CF=0 and ZF=0). | |
| JAE rel16 | 0F 83 cw | Jump near if above or equal (CF=0). Not supported in 64-bit mode. | |
| JAE rel32 | 0F 83 cd | Jump near if above or equal (CF=0). | |
| JB rel16 | 0F 82 cw | Jump near if below (CF=1). Not supported in 64-bit mode. | |
| JB rel32 | 0F 82 cd | Jump near if below (CF=1). | |
| JBE rel16 | 0F 86 cw | Jump near if below or equal (CF=1 or ZF=1). Not supported in 64-bit mode. | |
| JBE rel32 | 0F 86 cd | Jump near if below or equal (CF=1 or ZF=1). | |
| JC rel16 | 0F 82 cw | Jump near if carry (CF=1). Not supported in 64-bit mode. | |
| JC rel32 | 0F 82 cd | Jump near if carry (CF=1). | |
| JE rel16 | 0F 84 cw | Jump near if equal (ZF=1). Not supported in 64-bit mode. | |
| JE rel32 | 0F 84 cd | Jump near if equal (ZF=1). | |
| JZ rel16 | 0F 84 cw | Jump near if 0 (ZF=1). Not supported in 64-bit mode. | |
| JZ rel32 | 0F 84 cd | Jump near if 0 (ZF=1). | |
| JG rel16 | 0F 8F cw | Jump near if greater (ZF=0 and SF=OF). Not supported in 64-bit mode. | |
| JG rel32 | 0F 8F cd | Jump near if greater (ZF=0 and SF=OF). | |
| JGE rel16 | 0F 8D cw | Jump near if greater or equal (SF=OF). Not supported in 64-bit mode. | |
| JGE rel32 | 0F 8D cd | Jump near if greater or equal (SF=OF). | |
| JL rel16 | 0F 8C cw | Jump near if less (SF≠ OF). Not supported in 64-bit mode. | |
| JL rel32 | 0F 8C cd | Jump near if less (SF≠ OF). | |
| JLE rel16 | 0F 8E cw | Jump near if less or equal (ZF=1 or SF≠ OF). Not supported in 64-bit mode. | |
| JLE rel32 | 0F 8E cd | Jump near if less or equal (ZF=1 or SF≠ OF). | |
| JNA rel16 | 0F 86 cw | Jump near if not above (CF=1 or ZF=1). Not supported in 64-bit mode. | |
| JNA rel32 | 0F 86 cd | Jump near if not above (CF=1 or ZF=1). | |
| JNAE rel16 | 0F 82 cw | Jump near if not above or equal (CF=1). Not supported in 64-bit mode. | |
| JNAE rel32 | 0F 82 cd | Jump near if not above or equal (CF=1). | |
| JNB rel16 | 0F 83 cw | Jump near if not below (CF=0). Not supported in 64-bit mode. | |
| JNB rel32 | 0F 83 cd | Jump near if not below (CF=0). | |
| JNBE rel16 | 0F 87 cw | Jump near if not below or equal (CF=0 and ZF=0). Not supported in 64-bit mode. | |
| JNBE rel32 | 0F 87 cd | Jump near if not below or equal (CF=0 and ZF=0). | |
| JNC rel16 | 0F 83 cw | Jump near if not carry (CF=0). Not supported in 64-bit mode. | |
| JNC rel32 | 0F 83 cd | Jump near if not carry (CF=0). | |
| JNE rel16 | 0F 85 cw | Jump near if not equal (ZF=0). Not supported in 64-bit mode. | |
| JNE rel32 | 0F 85 cd | Jump near if not equal (ZF=0). | |
| JNG rel16 | 0F 8E cw | Jump near if not greater (ZF=1 or SF≠ OF). Not supported in 64-bit mode. | |
| JNG rel32 | 0F 8E cd | Jump near if not greater (ZF=1 or SF≠ OF). | |
| JNGE rel16 | 0F 8C cw | Jump near if not greater or equal (SF≠ OF). Not supported in 64-bit mode. | |
| JNGE rel32 | 0F 8C cd | Jump near if not greater or equal (SF≠ OF). | |
| JNL rel16 | 0F 8D cw | Jump near if not less (SF=OF). Not supported in 64-bit mode. | |
| JNL rel32 | 0F 8D cd | Jump near if not less (SF=OF). | |
| JNLE rel16 | 0F 8F cw | Jump near if not less or equal (ZF=0 and SF=OF). Not supported in 64-bit mode. | |
| JNLE rel32 | 0F 8F cd | Jump near if not less or equal (ZF=0 and SF=OF). | |
| JNO rel16 | 0F 81 cw | Jump near if not overflow (OF=0). Not supported in 64-bit mode. | |
| JNO rel32 | 0F 81 cd | Jump near if not overflow (OF=0). | |
| JNP rel16 | 0F 8B cw | Jump near if not parity (PF=0). Not supported in 64-bit mode. | |
| JNP rel32 | 0F 8B cd | Jump near if not parity (PF=0). | |
| JNS rel16 | 0F 89 cw | Jump near if not sign (SF=0). Not supported in 64-bit mode. | |
| JNS rel32 | 0F 89 cd | Jump near if not sign (SF=0). | |
| JNZ rel16 | 0F 85 cw | Jump near if not zero (ZF=0). Not supported in 64-bit mode. | |
| JNZ rel32 | 0F 85 cd | Jump near if not zero (ZF=0). | |
| JO rel16 | 0F 80 cw | Jump near if overflow (OF=1). Not supported in 64-bit mode. | |
| JO rel32 | 0F 80 cd | Jump near if overflow (OF=1). | |
| JP rel16 | 0F 8A cw | Jump near if parity (PF=1). Not supported in 64-bit mode. | |
| JP rel32 | 0F 8A cd | Jump near if parity (PF=1). | |
| JPE rel16 | 0F 8A cw | Jump near if parity even (PF=1). Not supported in 64-bit mode. | |
| JPE rel32 | 0F 8A cd | Jump near if parity even (PF=1). | |
| JPO rel16 | 0F 8B cw | Jump near if parity odd (PF=0). Not supported in 64-bit mode. | |
| JPO rel32 | 0F 8B cd | Jump near if parity odd (PF=0). | |
| JS rel16 | 0F 88 cw | Jump near if sign (SF=1). Not supported in 64-bit mode. | |
| JS rel32 | 0F 88 cd | Jump near if sign (SF=1). | |
| JZ rel16 | 0F 84 cw | Jump near if 0 (ZF=1). Not supported in 64-bit mode. | |
| JZ rel32 | 0F 84 cd | Jump near if 0 (ZF=1). | |
| KADDW k1, k2, k3 | VEX.L1.0F.W0 4A /r | avx512 | Add 16 bits masks in k2 and k3 and place result in k1. | 
| KADDB k1, k2, k3 | VEX.L1.66.0F.W0 4A /r | avx512 | Add 8 bits masks in k2 and k3 and place result in k1. | 
| KADDQ k1, k2, k3 | VEX.L1.0F.W1 4A /r | avx512 | Add 64 bits masks in k2 and k3 and place result in k1. | 
| KADDD k1, k2, k3 | VEX.L1.66.0F.W1 4A /r | avx512 | Add 32 bits masks in k2 and k3 and place result in k1. | 
| KANDNW k1, k2, k3 | VEX.NDS.L1.0F.W0 42 /r | avx512 | Bitwise AND NOT 16 bits masks k2 and k3 and place result in k1. | 
| KANDNB k1, k2, k3 | VEX.L1.66.0F.W0 42 /r | avx512 | Bitwise AND NOT 8 bits masks k2 and k3 and place result in k1. | 
| KANDNQ k1, k2, k3 | VEX.L1.0F.W1 42 /r | avx512 | Bitwise AND NOT 64 bits masks k2 and k3 and place result in k1. | 
| KANDND k1, k2, k3 | VEX.L1.66.0F.W1 42 /r | avx512 | Bitwise AND NOT 32 bits masks k2 and k3 and place result in k1. | 
| KANDW k1, k2, k3 | VEX.NDS.L1.0F.W0 41 /r | avx512 | Bitwise AND 16 bits masks k2 and k3 and place result in k1. | 
| KANDB k1, k2, k3 | VEX.L1.66.0F.W0 41 /r | avx512 | Bitwise AND 8 bits masks k2 and k3 and place result in k1. | 
| KANDQ k1, k2, k3 | VEX.L1.0F.W1 41 /r | avx512 | Bitwise AND 64 bits masks k2 and k3 and place result in k1. | 
| KANDD k1, k2, k3 | VEX.L1.66.0F.W1 41 /r | avx512 | Bitwise AND 32 bits masks k2 and k3 and place result in k1. | 
| KMOVW k1, k2/m16 | VEX.L0.0F.W0 90 /r | avx512 | Move 16 bits mask from k2/m16 and store the result in k1. | 
| KMOVB k1, k2/m8 | VEX.L0.66.0F.W0 90 /r | avx512 | Move 8 bits mask from k2/m8 and store the result in k1. | 
| KMOVQ k1, k2/m64 | VEX.L0.0F.W1 90 /r | avx512 | Move 64 bits mask from k2/m64 and store the result in k1. | 
| KMOVD k1, k2/m32 | VEX.L0.66.0F.W1 90 /r | avx512 | Move 32 bits mask from k2/m32 and store the result in k1. | 
| KMOVW m16, k1 | VEX.L0.0F.W0 91 /r | avx512 | Move 16 bits mask from k1 and store the result in m16. | 
| KMOVB m8, k1 | VEX.L0.66.0F.W0 91 /r | avx512 | Move 8 bits mask from k1 and store the result in m8. | 
| KMOVQ m64, k1 | VEX.L0.0F.W1 91 /r | avx512 | Move 64 bits mask from k1 and store the result in m64. | 
| KMOVD m32, k1 | VEX.L0.66.0F.W1 91 /r | avx512 | Move 32 bits mask from k1 and store the result in m32. | 
| KMOVW k1, r32 | VEX.L0.0F.W0 92 /r | avx512 | Move 16 bits mask from r32 to k1. | 
| KMOVB k1, r32 | VEX.L0.66.0F.W0 92 /r | avx512 | Move 8 bits mask from r32 to k1. | 
| KMOVQ k1, r64 | VEX.L0.F2.0F.W1 92 /r | avx512 | Move 64 bits mask from r64 to k1. | 
| KMOVD k1, r32 | VEX.L0.F2.0F.W0 92 /r | avx512 | Move 32 bits mask from r32 to k1. | 
| KMOVW r32, k1 | VEX.L0.0F.W0 93 /r | avx512 | Move 16 bits mask from k1 to r32. | 
| KMOVB r32, k1 | VEX.L0.66.0F.W0 93 /r | avx512 | Move 8 bits mask from k1 to r32. | 
| KMOVQ r64, k1 | VEX.L0.F2.0F.W1 93 /r | avx512 | Move 64 bits mask from k1 to r64. | 
| KMOVD r32, k1 | VEX.L0.F2.0F.W0 93 /r | avx512 | Move 32 bits mask from k1 to r32. | 
| KNOTW k1, k2 | VEX.L0.0F.W0 44 /r | avx512 | Bitwise NOT of 16 bits mask k2. | 
| KNOTB k1, k2 | VEX.L0.66.0F.W0 44 /r | avx512 | Bitwise NOT of 8 bits mask k2. | 
| KNOTQ k1, k2 | VEX.L0.0F.W1 44 /r | avx512 | Bitwise NOT of 64 bits mask k2. | 
| KNOTD k1, k2 | VEX.L0.66.0F.W1 44 /r | avx512 | Bitwise NOT of 32 bits mask k2. | 
| KORTESTW k1, k2 | VEX.L0.0F.W0 98 /r | avx512 | Bitwise OR 16 bits masks k1 and k2 and update ZF and CF accordingly. | 
| KORTESTB k1, k2 | VEX.L0.66.0F.W0 98 /r | avx512 | Bitwise OR 8 bits masks k1 and k2 and update ZF and CF accordingly. | 
| KORTESTQ k1, k2 | VEX.L0.0F.W1 98 /r | avx512 | Bitwise OR 64 bits masks k1 and k2 and update ZF and CF accordingly. | 
| KORTESTD k1, k2 | VEX.L0.66.0F.W1 98 /r | avx512 | Bitwise OR 32 bits masks k1 and k2 and update ZF and CF accordingly. | 
| KORW k1, k2, k3 | VEX.NDS.L1.0F.W0 45 /r | avx512 | Bitwise OR 16 bits masks k2 and k3 and place result in k1. | 
| KORB k1, k2, k3 | VEX.L1.66.0F.W0 45 /r | avx512 | Bitwise OR 8 bits masks k2 and k3 and place result in k1. | 
| KORQ k1, k2, k3 | VEX.L1.0F.W1 45 /r | avx512 | Bitwise OR 64 bits masks k2 and k3 and place result in k1. | 
| KORD k1, k2, k3 | VEX.L1.66.0F.W1 45 /r | avx512 | Bitwise OR 32 bits masks k2 and k3 and place result in k1. | 
| KSHIFTLW k1, k2, imm8 | VEX.L0.66.0F3A.W1 32 /r | avx512 | Shift left 16 bits in k2 by immediate and write result in k1. | 
| KSHIFTLB k1, k2, imm8 | VEX.L0.66.0F3A.W0 32 /r | avx512 | Shift left 8 bits in k2 by immediate and write result in k1. | 
| KSHIFTLQ k1, k2, imm8 | VEX.L0.66.0F3A.W1 33 /r | avx512 | Shift left 64 bits in k2 by immediate and write result in k1. | 
| KSHIFTLD k1, k2, imm8 | VEX.L0.66.0F3A.W0 33 /r | avx512 | Shift left 32 bits in k2 by immediate and write result in k1. | 
| KSHIFTRW k1, k2, imm8 | VEX.L0.66.0F3A.W1 30 /r | avx512 | Shift right 16 bits in k2 by immediate and write result in k1. | 
| KSHIFTRB k1, k2, imm8 | VEX.L0.66.0F3A.W0 30 /r | avx512 | Shift right 8 bits in k2 by immediate and write result in k1. | 
| KSHIFTRQ k1, k2, imm8 | VEX.L0.66.0F3A.W1 31 /r | avx512 | Shift right 64 bits in k2 by immediate and write result in k1. | 
| KSHIFTRD k1, k2, imm8 | VEX.L0.66.0F3A.W0 31 /r | avx512 | Shift right 32 bits in k2 by immediate and write result in k1. | 
| KTESTW k1, k2 | VEX.L0.0F.W0 99 /r | avx512 | Set ZF and CF depending on sign bit AND and ANDN of 16 bits mask register sources. | 
| KTESTB k1, k2 | VEX.L0.66.0F.W0 99 /r | avx512 | Set ZF and CF depending on sign bit AND and ANDN of 8 bits mask register sources. | 
| KTESTQ k1, k2 | VEX.L0.0F.W1 99 /r | avx512 | Set ZF and CF depending on sign bit AND and ANDN of 64 bits mask register sources. | 
| KTESTD k1, k2 | VEX.L0.66.0F.W1 99 /r | avx512 | Set ZF and CF depending on sign bit AND and ANDN of 32 bits mask register sources. | 
| KUNPCKBW k1, k2, k3 | VEX.NDS.L1.66.0F.W0 4B /r | avx512 | Unpack and interleave 8 bits masks in k2 and k3 and write word result in k1. | 
| KUNPCKWD k1, k2, k3 | VEX.NDS.L1.0F.W0 4B /r | avx512 | Unpack and interleave 16 bits masks in k2 and k3 and write double-word result in k1. | 
| KUNPCKDQ k1, k2, k3 | VEX.NDS.L1.0F.W1 4B /r | avx512 | Unpack and interleave 32 bits masks in k2 and k3 and write quadword result in k1. | 
| KXNORW k1, k2, k3 | VEX.NDS.L1.0F.W0 46 /r | avx512 | Bitwise XNOR 16 bits masks k2 and k3 and place result in k1. | 
| KXNORB k1, k2, k3 | VEX.L1.66.0F.W0 46 /r | avx512 | Bitwise XNOR 8 bits masks k2 and k3 and place result in k1. | 
| KXNORQ k1, k2, k3 | VEX.L1.0F.W1 46 /r | avx512 | Bitwise XNOR 64 bits masks k2 and k3 and place result in k1. | 
| KXNORD k1, k2, k3 | VEX.L1.66.0F.W1 46 /r | avx512 | Bitwise XNOR 32 bits masks k2 and k3 and place result in k1. | 
| KXORW k1, k2, k3 | VEX.NDS.L1.0F.W0 47 /r | avx512 | Bitwise XOR 16 bits masks k2 and k3 and place result in k1. | 
| KXORB k1, k2, k3 | VEX.L1.66.0F.W0 47 /r | avx512 | Bitwise XOR 8 bits masks k2 and k3 and place result in k1. | 
| KXORQ k1, k2, k3 | VEX.L1.0F.W1 47 /r | avx512 | Bitwise XOR 64 bits masks k2 and k3 and place result in k1. | 
| KXORD k1, k2, k3 | VEX.L1.66.0F.W1 47 /r | avx512 | Bitwise XOR 32 bits masks k2 and k3 and place result in k1. | 
| LAHF | 9F | Load: AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF). | |
| LAR r16, r16/m16 | 0F 02 /r | r16 ← access rights referenced by r16/m16 | |
| LAR reg, r32/m16 | 0F 02 /r | reg ← access rights referenced by r32/m16 | |
| LDDQU xmm1, mem | F2 0F F0 /r | sse3 | Load unaligned data from mem and return double quadword in xmm1. | 
| VLDDQU xmm1, m128 | VEX.128.F2.0F.WIG F0 /r | avx | Load unaligned packed integer values from mem to xmm1. | 
| VLDDQU ymm1, m256 | VEX.256.F2.0F.WIG F0 /r | avx | Load unaligned packed integer values from mem to ymm1. | 
| LDMXCSR m32 | 0F AE /2 | sse | Load MXCSR register from m32. | 
| VLDMXCSR m32 | VEX.LZ.0F.WIG AE /2 | avx | Load MXCSR register from m32. | 
| LDS r16,m16:16 | C5 /r | Load DS:r16 with far pointer from memory. | |
| LDS r32,m16:32 | C5 /r | Load DS:r32 with far pointer from memory. | |
| LSS r16,m16:16 | 0F B2 /r | Load SS:r16 with far pointer from memory. | |
| LSS r32,m16:32 | 0F B2 /r | Load SS:r32 with far pointer from memory. | |
| LSS r64,m16:64 | REX + 0F B2 /r | Load SS:r64 with far pointer from memory. | |
| LES r16,m16:16 | C4 /r | Load ES:r16 with far pointer from memory. | |
| LES r32,m16:32 | C4 /r | Load ES:r32 with far pointer from memory. | |
| LFS r16,m16:16 | 0F B4 /r | Load FS:r16 with far pointer from memory. | |
| LFS r32,m16:32 | 0F B4 /r | Load FS:r32 with far pointer from memory. | |
| LFS r64,m16:64 | REX + 0F B4 /r | Load FS:r64 with far pointer from memory. | |
| LGS r16,m16:16 | 0F B5 /r | Load GS:r16 with far pointer from memory. | |
| LGS r32,m16:32 | 0F B5 /r | Load GS:r32 with far pointer from memory. | |
| LGS r64,m16:64 | REX + 0F B5 /r | Load GS:r64 with far pointer from memory. | |
| LEA r16,m | 8D /r | Store effective address for m in register r16. | |
| LEA r32,m | 8D /r | Store effective address for m in register r32. | |
| LEA r64,m | REX.W + 8D /r | Store effective address for m in register r64. | |
| LEAVE | C9 | Set SP to BP, then pop BP. | |
| LEAVE | C9 | Set ESP to EBP, then pop EBP. | |
| LEAVE | C9 | Set RSP to RBP, then pop RBP. | |
| LFENCE | 0F AE E8 | Serializes load operations. | |
| LGDT m16&32 | 0F 01 /2 | Load m into GDTR. | |
| LIDT m16&32 | 0F 01 /3 | Load m into IDTR. | |
| LGDT m16&64 | 0F 01 /2 | Load m into GDTR. | |
| LIDT m16&64 | 0F 01 /3 | Load m into IDTR. | |
| LLDT r/m16 | 0F 00 /2 | Load segment selector r/m16 into LDTR. | |
| LMSW r/m16 | 0F 01 /6 | Loads r/m16 in machine status word of CR0. | |
| LOCK | F0 | Asserts LOCK# signal for duration of the accompanying instruction. | |
| LODS m8 | AC | For legacy mode, load byte at address DS:(E)SI into AL. For 64-bit mode, load byte at address (R)SI into AL. | |
| LODS m16 | AD | For legacy mode, load word at address DS:(E)SI into AX. For 64-bit mode, load word at address (R)SI into AX. | |
| LODS m32 | AD | For legacy mode, load dword at address DS:(E)SI into EAX. For 64-bit mode, load dword at address (R)SI into EAX. | |
| LODS m64 | REX.W + AD | Load qword at address (R)SI into RAX. | |
| LODSB | AC | For legacy mode, load byte at address DS:(E)SI into AL. For 64-bit mode, load byte at address (R)SI into AL. | |
| LODSW | AD | For legacy mode, load word at address DS:(E)SI into AX. For 64-bit mode, load word at address (R)SI into AX. | |
| LODSD | AD | For legacy mode, load dword at address DS:(E)SI into EAX. For 64-bit mode, load dword at address (R)SI into EAX. | |
| LODSQ | REX.W + AD | Load qword at address (R)SI into RAX. | |
| LOOP rel8 | E2 cb | Decrement count; jump short if count ≠ 0. | |
| LOOPE rel8 | E1 cb | Decrement count; jump short if count ≠ 0 and ZF = 1. | |
| LOOPNE rel8 | E0 cb | Decrement count; jump short if count ≠ 0 and ZF = 0. | |
| LSL r16, r16/m16 | 0F 03 /r | Load: r16 ← segment limit, selector r16/m16. | |
| LSL r32, r32/m16 | 0F 03 /r | Load: r32 ← segment limit, selector r32/m16. | |
| LSL r64, r32/m16 | REX.W + 0F 03 /r | Load: r64 ← segment limit, selector r32/m16. | |
| LTR r/m16 | 0F 00 /3 | Load r/m16 into task register. | |
| LZCNT r16, r/m16 | F3 0F BD /r | lzcnt | Count the number of leading zero bits in r/m16, return result in r16. | 
| LZCNT r32, r/m32 | F3 0F BD /r | lzcnt | Count the number of leading zero bits in r/m32, return result in r32. | 
| LZCNT r64, r/m64 | F3 REX.W 0F BD /r | lzcnt | Count the number of leading zero bits in r/m64, return result in r64. | 
| MASKMOVDQU xmm1, xmm2 | 66 0F F7 /r | sse2 | Selectively write bytes from xmm1 to memory location using the byte mask in xmm2. The default memory location is specified by DS:DI/EDI/RDI. | 
| VMASKMOVDQU xmm1, xmm2 | VEX.128.66.0F.WIG F7 /r | avx | Selectively write bytes from xmm1 to memory location using the byte mask in xmm2. The default memory location is specified by DS:DI/EDI/RDI. | 
| MASKMOVQ mm1, mm2 | 0F F7 /r | Selectively write bytes from mm1 to memory location using the byte mask in mm2. The default memory location is specified by DS:DI/EDI/RDI. | |
| MAXPD xmm1, xmm2/m128 | 66 0F 5F /r | sse2 | Return the maximum double-precision floating-point values between xmm1 and xmm2/m128. | 
| VMAXPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 5F /r | avx | Return the maximum double-precision floating-point values between xmm2 and xmm3/m128. | 
| VMAXPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 5F /r | avx | Return the maximum packed double-precision floating-point values between ymm2 and ymm3/m256. | 
| VMAXPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 5F /r | avx512 | Return the maximum packed double-precision floating-point values between xmm2 and xmm3/m128/m64bcst and store result in xmm1 subject to writemask k1. | 
| VMAXPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 5F /r | avx512 | Return the maximum packed double-precision floating-point values between ymm2 and ymm3/m256/m64bcst and store result in ymm1 subject to writemask k1. | 
| VMAXPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae} | EVEX.NDS.512.66.0F.W1 5F /r | avx512 | Return the maximum packed double-precision floating-point values between zmm2 and zmm3/m512/m64bcst and store result in zmm1 subject to writemask k1. | 
| MAXPS xmm1, xmm2/m128 | 0F 5F /r | sse | Return the maximum single-precision floating-point values between xmm1 and xmm2/mem. | 
| VMAXPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 5F /r | avx | Return the maximum single-precision floating-point values between xmm2 and xmm3/mem. | 
| VMAXPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 5F /r | avx | Return the maximum single-precision floating-point values between ymm2 and ymm3/mem. | 
| VMAXPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 5F /r | avx512 | Return the maximum packed single-precision floating-point values between xmm2 and xmm3/m128/m32bcst and store result in xmm1 subject to writemask k1. | 
| VMAXPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 5F /r | avx512 | Return the maximum packed single-precision floating-point values between ymm2 and ymm3/m256/m32bcst and store result in ymm1 subject to writemask k1. | 
| VMAXPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{sae} | EVEX.NDS.512.0F.W0 5F /r | avx512 | Return the maximum packed single-precision floating-point values between zmm2 and zmm3/m512/m32bcst and store result in zmm1 subject to writemask k1. | 
| MAXSD xmm1, xmm2/m64 | F2 0F 5F /r | sse2 | Return the maximum scalar double-precision floating-point value between xmm2/m64 and xmm1. | 
| VMAXSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 5F /r | avx | Return the maximum scalar double-precision floating-point value between xmm3/m64 and xmm2. | 
| VMAXSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae} | EVEX.NDS.LIG.F2.0F.W1 5F /r | avx512 | Return the maximum scalar double-precision floating-point value between xmm3/m64 and xmm2. | 
| MAXSS xmm1, xmm2/m32 | F3 0F 5F /r | sse | Return the maximum scalar single-precision floating-point value between xmm2/m32 and xmm1. | 
| VMAXSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 5F /r | avx | Return the maximum scalar single-precision floating-point value between xmm3/m32 and xmm2. | 
| VMAXSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | EVEX.NDS.LIG.F3.0F.W0 5F /r | avx512 | Return the maximum scalar single-precision floating-point value between xmm3/m32 and xmm2. | 
| MFENCE | 0F AE F0 | Serializes load and store operations. | |
| MINPD xmm1, xmm2/m128 | 66 0F 5D /r | sse2 | Return the minimum double-precision floating-point values between xmm1 and xmm2/mem. | 
| VMINPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 5D /r | avx | Return the minimum double-precision floating-point values between xmm2 and xmm3/mem. | 
| VMINPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 5D /r | avx | Return the minimum packed double-precision floating-point values between ymm2 and ymm3/mem. | 
| VMINPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 5D /r | avx512 | Return the minimum packed double-precision floating-point values between xmm2 and xmm3/m128/m64bcst and store result in xmm1 subject to writemask k1. | 
| VMINPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 5D /r | avx512 | Return the minimum packed double-precision floating-point values between ymm2 and ymm3/m256/m64bcst and store result in ymm1 subject to writemask k1. | 
| VMINPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae} | EVEX.NDS.512.66.0F.W1 5D /r | avx512 | Return the minimum packed double-precision floating-point values between zmm2 and zmm3/m512/m64bcst and store result in zmm1 subject to writemask k1. | 
| MINPS xmm1, xmm2/m128 | 0F 5D /r | sse | Return the minimum single-precision floating-point values between xmm1 and xmm2/mem. | 
| VMINPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 5D /r | avx | Return the minimum single-precision floating-point values between xmm2 and xmm3/mem. | 
| VMINPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 5D /r | avx | Return the minimum packed single-precision floating-point values between ymm2 and ymm3/mem. | 
| VMINPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 5D /r | avx512 | Return the minimum packed single-precision floating-point values between xmm2 and xmm3/m128/m32bcst and store result in xmm1 subject to writemask k1. | 
| VMINPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 5D /r | avx512 | Return the minimum packed single-precision floating-point values between ymm2 and ymm3/m256/m32bcst and store result in ymm1 subject to writemask k1. | 
| VMINPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{sae} | EVEX.NDS.512.0F.W0 5D /r | avx512 | Return the minimum packed single-precision floating-point values between zmm2 and zmm3/m512/m32bcst and store result in zmm1 subject to writemask k1. | 
| MINSD xmm1, xmm2/m64 | F2 0F 5D /r | sse2 | Return the minimum scalar double-precision floating-point value between xmm2/m64 and xmm1. | 
| VMINSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 5D /r | avx | Return the minimum scalar double-precision floating-point value between xmm3/m64 and xmm2. | 
| VMINSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae} | EVEX.NDS.LIG.F2.0F.W1 5D /r | avx512 | Return the minimum scalar double-precision floating-point value between xmm3/m64 and xmm2. | 
| MINSS xmm1, xmm2/m32 | F3 0F 5D /r | sse | Return the minimum scalar single-precision floating-point value between xmm2/m32 and xmm1. | 
| VMINSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 5D /r | avx | Return the minimum scalar single-precision floating-point value between xmm3/m32 and xmm2. | 
| VMINSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | EVEX.NDS.LIG.F3.0F.W0 5D /r | avx512 | Return the minimum scalar single-precision floating-point value between xmm3/m32 and xmm2. | 
| MONITOR | 0F 01 C8 | Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be a write-back memory caching type. The address is DS:EAX (DS:RAX in 64-bit mode). | |
| MOV r32, CR0–CR7 | 0F 20 /r | Move control register to r32. | |
| MOV r64, CR0–CR7 | 0F 20 /r | Move extended control register to r64. | |
| MOV r64, CR8 | REX.R + 0F 20 /0 | Move extended CR8 to r64. | |
| MOV CR0–CR7, r32 | 0F 22 /r | Move r32 to control register. | |
| MOV CR0–CR7, r64 | 0F 22 /r | Move r64 to extended control register. | |
| MOV CR8, r64 | REX.R + 0F 22 /0 | Move r64 to extended CR8. | |
| MOV r32, DR0–DR7 | 0F 21 /r | Move debug register to r32. | |
| MOV r64, DR0–DR7 | 0F 21 /r | Move extended debug register to r64. | |
| MOV DR0–DR7, r32 | 0F 23 /r | Move r32 to debug register. | |
| MOV DR0–DR7, r64 | 0F 23 /r | Move r64 to extended debug register. | |
| MOV r/m8, r8 | 88 /r | Move r8 to r/m8. | |
| MOV r/m8, r8 | REX + 88 /r | Move r8 to r/m8. | |
| MOV r/m16, r16 | 89 /r | Move r16 to r/m16. | |
| MOV r/m32, r32 | 89 /r | Move r32 to r/m32. | |
| MOV r/m64, r64 | REX.W + 89 /r | Move r64 to r/m64. | |
| MOV r8, r/m8 | 8A /r | Move r/m8 to r8. | |
| MOV r8, r/m8 | REX + 8A /r | Move r/m8 to r8. | |
| MOV r16, r/m16 | 8B /r | Move r/m16 to r16. | |
| MOV r32, r/m32 | 8B /r | Move r/m32 to r32. | |
| MOV r64, r/m64 | REX.W + 8B /r | Move r/m64 to r64. | |
| MOV r/m16, Sreg | 8C /r | Move segment register to r/m16. | |
| MOV r/m64, Sreg | REX.W + 8C /r | Move zero extended 16-bit segment register to r/m64. | |
| MOV Sreg, r/m16 | 8E /r | Move r/m16 to segment register. | |
| MOV Sreg, r/m64 | REX.W + 8E /r | Move lower 16 bits of r/m64 to segment register. | |
| MOV AL, moffs8 | A0 | Move byte at (seg:offset) to AL. | |
| MOV AL, moffs8 | REX.W + A0 | Move byte at (offset) to AL. | |
| MOV AX, moffs16 | A1 | Move word at (seg:offset) to AX. | |
| MOV EAX, moffs32 | A1 | Move doubleword at (seg:offset) to EAX. | |
| MOV RAX, moffs64 | REX.W + A1 | Move quadword at (offset) to RAX. | |
| MOV moffs8, AL | A2 | Move AL to (seg:offset). | |
| MOV moffs8, AL | REX.W + A2 | Move AL to (offset). | |
| MOV moffs16, AX | A3 | Move AX to (seg:offset). | |
| MOV moffs32, EAX | A3 | Move EAX to (seg:offset). | |
| MOV moffs64, RAX | REX.W + A3 | Move RAX to (offset). | |
| MOV r8, imm8 | B0+ rb ib | Move imm8 to r8. | |
| MOV r8, imm8 | REX + B0+ rb ib | Move imm8 to r8. | |
| MOV r16, imm16 | B8+ rw iw | Move imm16 to r16. | |
| MOV r32, imm32 | B8+ rd id | Move imm32 to r32. | |
| MOV r64, imm64 | REX.W + B8+ rd io | Move imm64 to r64. | |
| MOV r/m8, imm8 | C6 /0 ib | Move imm8 to r/m8. | |
| MOV r/m8, imm8 | REX + C6 /0 ib | Move imm8 to r/m8. | |
| MOV r/m16, imm16 | C7 /0 iw | Move imm16 to r/m16. | |
| MOV r/m32, imm32 | C7 /0 id | Move imm32 to r/m32. | |
| MOV r/m64, imm32 | REX.W + C7 /0 id | Move imm32 sign extended to 64-bits to r/m64. | |
| MOVAPD xmm1, xmm2/m128 | 66 0F 28 /r | sse2 | Move aligned packed double-precision floating-point values from xmm2/mem to xmm1. | 
| MOVAPD xmm2/m128, xmm1 | 66 0F 29 /r | sse2 | Move aligned packed double-precision floating-point values from xmm1 to xmm2/mem. | 
| VMOVAPD xmm1, xmm2/m128 | VEX.128.66.0F.WIG 28 /r | avx | Move aligned packed double-precision floating-point values from xmm2/mem to xmm1. | 
| VMOVAPD xmm2/m128, xmm1 | VEX.128.66.0F.WIG 29 /r | avx | Move aligned packed double-precision floating-point values from xmm1 to xmm2/mem. | 
| VMOVAPD ymm1, ymm2/m256 | VEX.256.66.0F.WIG 28 /r | avx | Move aligned packed double-precision floating-point values from ymm2/mem to ymm1. | 
| VMOVAPD ymm2/m256, ymm1 | VEX.256.66.0F.WIG 29 /r | avx | Move aligned packed double-precision floating-point values from ymm1 to ymm2/mem. | 
| VMOVAPD xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F.W1 28 /r | avx512 | Move aligned packed double-precision floating-point values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVAPD ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F.W1 28 /r | avx512 | Move aligned packed double-precision floating-point values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVAPD zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F.W1 28 /r | avx512 | Move aligned packed double-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVAPD xmm2/m128 {k1}{z}, xmm1 | EVEX.128.66.0F.W1 29 /r | avx512 | Move aligned packed double-precision floating-point values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVAPD ymm2/m256 {k1}{z}, ymm1 | EVEX.256.66.0F.W1 29 /r | avx512 | Move aligned packed double-precision floating-point values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVAPD zmm2/m512 {k1}{z}, zmm1 | EVEX.512.66.0F.W1 29 /r | avx512 | Move aligned packed double-precision floating-point values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVAPS xmm1, xmm2/m128 | 0F 28 /r | sse | Move aligned packed single-precision floating-point values from xmm2/mem to xmm1. | 
| MOVAPS xmm2/m128, xmm1 | 0F 29 /r | sse | Move aligned packed single-precision floating-point values from xmm1 to xmm2/mem. | 
| VMOVAPS xmm1, xmm2/m128 | VEX.128.0F.WIG 28 /r | avx | Move aligned packed single-precision floating-point values from xmm2/mem to xmm1. | 
| VMOVAPS xmm2/m128, xmm1 | VEX.128.0F.WIG 29 /r | avx | Move aligned packed single-precision floating-point values from xmm1 to xmm2/mem. | 
| VMOVAPS ymm1, ymm2/m256 | VEX.256.0F.WIG 28 /r | avx | Move aligned packed single-precision floating-point values from ymm2/mem to ymm1. | 
| VMOVAPS ymm2/m256, ymm1 | VEX.256.0F.WIG 29 /r | avx | Move aligned packed single-precision floating-point values from ymm1 to ymm2/mem. | 
| VMOVAPS xmm1 {k1}{z}, xmm2/m128 | EVEX.128.0F.W0 28 /r | avx512 | Move aligned packed single-precision floating-point values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVAPS ymm1 {k1}{z}, ymm2/m256 | EVEX.256.0F.W0 28 /r | avx512 | Move aligned packed single-precision floating-point values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVAPS zmm1 {k1}{z}, zmm2/m512 | EVEX.512.0F.W0 28 /r | avx512 | Move aligned packed single-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVAPS xmm2/m128 {k1}{z}, xmm1 | EVEX.128.0F.W0 29 /r | avx512 | Move aligned packed single-precision floating-point values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVAPS ymm2/m256 {k1}{z}, ymm1 | EVEX.256.0F.W0 29 /r | avx512 | Move aligned packed single-precision floating-point values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVAPS zmm2/m512 {k1}{z}, zmm1 | EVEX.512.0F.W0 29 /r | avx512 | Move aligned packed single-precision floating-point values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVBE r16, m16 | 0F 38 F0 /r | Reverse byte order in m16 and move to r16. | |
| MOVBE r32, m32 | 0F 38 F0 /r | Reverse byte order in m32 and move to r32. | |
| MOVBE r64, m64 | REX.W + 0F 38 F0 /r | Reverse byte order in m64 and move to r64. | |
| MOVBE m16, r16 | 0F 38 F1 /r | Reverse byte order in r16 and move to m16. | |
| MOVBE m32, r32 | 0F 38 F1 /r | Reverse byte order in r32 and move to m32. | |
| MOVBE m64, r64 | REX.W + 0F 38 F1 /r | Reverse byte order in r64 and move to m64. | |
| MOVDDUP xmm1, xmm2/m64 | F2 0F 12 /r | sse3 | Move double-precision floating-point value from xmm2/m64 and duplicate into xmm1. | 
| VMOVDDUP xmm1, xmm2/m64 | VEX.128.F2.0F.WIG 12 /r | avx | Move double-precision floating-point value from xmm2/m64 and duplicate into xmm1. | 
| VMOVDDUP ymm1, ymm2/m256 | VEX.256.F2.0F.WIG 12 /r | avx | Move even index double-precision floating-point values from ymm2/mem and duplicate each element into ymm1. | 
| VMOVDDUP xmm1 {k1}{z}, xmm2/m64 | EVEX.128.F2.0F.W1 12 /r | avx512 | Move double-precision floating-point value from xmm2/m64 and duplicate each element into xmm1 subject to writemask k1. | 
| VMOVDDUP ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F2.0F.W1 12 /r | avx512 | Move even index double-precision floating-point values from ymm2/m256 and duplicate each element into ymm1 subject to writemask k1. | 
| VMOVDDUP zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F2.0F.W1 12 /r | avx512 | Move even index double-precision floating-point values from zmm2/m512 and duplicate each element into zmm1 subject to writemask k1. | 
| MOVDQ2Q mm, xmm | F2 0F D6 /r | Move low quadword from xmm to mmx register. | |
| MOVDQA xmm1, xmm2/m128 | 66 0F 6F /r | sse2 | Move aligned packed integer values from xmm2/mem to xmm1. | 
| MOVDQA xmm2/m128, xmm1 | 66 0F 7F /r | sse2 | Move aligned packed integer values from xmm1 to xmm2/mem. | 
| VMOVDQA xmm1, xmm2/m128 | VEX.128.66.0F.WIG 6F /r | avx | Move aligned packed integer values from xmm2/mem to xmm1. | 
| VMOVDQA xmm2/m128, xmm1 | VEX.128.66.0F.WIG 7F /r | avx | Move aligned packed integer values from xmm1 to xmm2/mem. | 
| VMOVDQA ymm1, ymm2/m256 | VEX.256.66.0F.WIG 6F /r | avx | Move aligned packed integer values from ymm2/mem to ymm1. | 
| VMOVDQA ymm2/m256, ymm1 | VEX.256.66.0F.WIG 7F /r | avx | Move aligned packed integer values from ymm1 to ymm2/mem. | 
| VMOVDQA32 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F.W0 6F /r | avx512 | Move aligned packed doubleword integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQA32 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F.W0 6F /r | avx512 | Move aligned packed doubleword integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQA32 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F.W0 6F /r | avx512 | Move aligned packed doubleword integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQA32 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.66.0F.W0 7F /r | avx512 | Move aligned packed doubleword integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQA32 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.66.0F.W0 7F /r | avx512 | Move aligned packed doubleword integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQA32 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.66.0F.W0 7F /r | avx512 | Move aligned packed doubleword integer values from zmm1 to zmm2/m512 using writemask k1. | 
| VMOVDQA64 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F.W1 6F /r | avx512 | Move aligned packed quadword integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQA64 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F.W1 6F /r | avx512 | Move aligned packed quadword integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQA64 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F.W1 6F /r | avx512 | Move aligned packed quadword integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQA64 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.66.0F.W1 7F /r | avx512 | Move aligned packed quadword integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQA64 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.66.0F.W1 7F /r | avx512 | Move aligned packed quadword integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQA64 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.66.0F.W1 7F /r | avx512 | Move aligned packed quadword integer values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVDQU xmm1, xmm2/m128 | F3 0F 6F /r | sse2 | Move unaligned packed integer values from xmm2/m128 to xmm1. | 
| MOVDQU xmm2/m128, xmm1 | F3 0F 7F /r | sse2 | Move unaligned packed integer values from xmm1 to xmm2/m128. | 
| VMOVDQU xmm1, xmm2/m128 | VEX.128.F3.0F.WIG 6F /r | avx | Move unaligned packed integer values from xmm2/m128 to xmm1. | 
| VMOVDQU xmm2/m128, xmm1 | VEX.128.F3.0F.WIG 7F /r | avx | Move unaligned packed integer values from xmm1 to xmm2/m128. | 
| VMOVDQU ymm1, ymm2/m256 | VEX.256.F3.0F.WIG 6F /r | avx | Move unaligned packed integer values from ymm2/m256 to ymm1. | 
| VMOVDQU ymm2/m256, ymm1 | VEX.256.F3.0F.WIG 7F /r | avx | Move unaligned packed integer values from ymm1 to ymm2/m256. | 
| VMOVDQU8 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F2.0F.W0 6F /r | avx512 | Move unaligned packed byte integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQU8 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F2.0F.W0 6F /r | avx512 | Move unaligned packed byte integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQU8 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F2.0F.W0 6F /r | avx512 | Move unaligned packed byte integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQU8 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.F2.0F.W0 7F /r | avx512 | Move unaligned packed byte integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQU8 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.F2.0F.W0 7F /r | avx512 | Move unaligned packed byte integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQU8 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.F2.0F.W0 7F /r | avx512 | Move unaligned packed byte integer values from zmm1 to zmm2/m512 using writemask k1. | 
| VMOVDQU16 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F2.0F.W1 6F /r | avx512 | Move unaligned packed word integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQU16 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F2.0F.W1 6F /r | avx512 | Move unaligned packed word integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQU16 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F2.0F.W1 6F /r | avx512 | Move unaligned packed word integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQU16 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.F2.0F.W1 7F /r | avx512 | Move unaligned packed word integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQU16 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.F2.0F.W1 7F /r | avx512 | Move unaligned packed word integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQU16 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.F2.0F.W1 7F /r | avx512 | Move unaligned packed word integer values from zmm1 to zmm2/m512 using writemask k1. | 
| VMOVDQU32 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F3.0F.W0 6F /r | avx512 | Move unaligned packed doubleword integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQU32 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F3.0F.W0 6F /r | avx512 | Move unaligned packed doubleword integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQU32 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F3.0F.W0 6F /r | avx512 | Move unaligned packed doubleword integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQU32 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.F3.0F.W0 7F /r | avx512 | Move unaligned packed doubleword integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQU32 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.F3.0F.W0 7F /r | avx512 | Move unaligned packed doubleword integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQU32 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.F3.0F.W0 7F /r | avx512 | Move unaligned packed doubleword integer values from zmm1 to zmm2/m512 using writemask k1. | 
| VMOVDQU64 xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F3.0F.W1 6F /r | avx512 | Move unaligned packed quadword integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVDQU64 ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F3.0F.W1 6F /r | avx512 | Move unaligned packed quadword integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVDQU64 zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F3.0F.W1 6F /r | avx512 | Move unaligned packed quadword integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVDQU64 xmm2/m128 {k1}{z}, xmm1 | EVEX.128.F3.0F.W1 7F /r | avx512 | Move unaligned packed quadword integer values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVDQU64 ymm2/m256 {k1}{z}, ymm1 | EVEX.256.F3.0F.W1 7F /r | avx512 | Move unaligned packed quadword integer values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVDQU64 zmm2/m512 {k1}{z}, zmm1 | EVEX.512.F3.0F.W1 7F /r | avx512 | Move unaligned packed quadword integer values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVD mm, r/m32 | 0F 6E /r | mmx | Move doubleword from r/m32 to mm. | 
| MOVQ mm, r/m64 | REX.W + 0F 6E /r | mmx | Move quadword from r/m64 to mm. | 
| MOVD r/m32, mm | 0F 7E /r | mmx | Move doubleword from mm to r/m32. | 
| MOVQ r/m64, mm | REX.W + 0F 7E /r | mmx | Move quadword from mm to r/m64. | 
| MOVD xmm, r/m32 | 66 0F 6E /r | sse2 | Move doubleword from r/m32 to xmm. | 
| MOVQ xmm, r/m64 | 66 REX.W 0F 6E /r | sse2 | Move quadword from r/m64 to xmm. | 
| MOVD r/m32, xmm | 66 0F 7E /r | sse2 | Move doubleword from xmm register to r/m32. | 
| MOVQ r/m64, xmm | 66 REX.W 0F 7E /r | sse2 | Move quadword from xmm register to r/m64. | 
| VMOVD xmm1, r32/m32 | VEX.128.66.0F.W0 6E /r | avx | Move doubleword from r/m32 to xmm1. | 
| VMOVQ xmm1, r64/m64 | VEX.128.66.0F.W1 6E /r | avx | Move quadword from r/m64 to xmm1. | 
| VMOVD r32/m32, xmm1 | VEX.128.66.0F.W0 7E /r | avx | Move doubleword from xmm1 register to r/m32. | 
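| VMOVQ r64/m64, xmm1 | VEX.128.66.0F.W1 7E /r | avx | Move quadword from xmm1 register to r/m64. | 
| VMOVD xmm1, r32/m32 | EVEX.128.66.0F.W0 6E /r | avx512 | Move doubleword from r/m32 to xmm1. | 
| VMOVQ xmm1, r64/m64 | EVEX.128.66.0F.W1 6E /r | avx512 | Move quadword from r/m64 to xmm1. | 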
| VMOVD r32/m32, xmm1 | EVEX.128.66.0F.W0 7E /r | avx512 | Move doubleword from xmm1 register to r/m32. | 
| VMOVQ r64/m64, xmm1 | EVEX.128.66.0F.W1 7E /r | avx512 | Move quadword from xmm1 register to r/m64. | 
| MOVHLPS xmm1, xmm2 | 0F 12 /r | sse | Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of xmm1. | 
| VMOVHLPS xmm1, xmm2, xmm3 | VEX.NDS.128.0F.WIG 12 /r | avx | Merge two packed single-precision floating-point values from high quadword of xmm3 and low quadword of xmm2. | 
| VMOVHLPS xmm1, xmm2, xmm3 | EVEX.NDS.128.0F.W0 12 /r | avx512 | Merge two packed single-precision floating-point values from high quadword of xmm3 and low quadword of xmm2. | 
| MOVHPD xmm1, m64 | 66 0F 16 /r | sse2 | Move double-precision floating-point value from m64 to high quadword of xmm1. | 
| VMOVHPD xmm2, xmm1, m64 | VEX.NDS.128.66.0F.WIG 16 /r | avx | Merge double-precision floating-point value from m64 and the low quadword of xmm1. | 
| VMOVHPD xmm2, xmm1, m64 | EVEX.NDS.128.66.0F.W1 16 /r | avx512 | Merge double-precision floating-point value from m64 and the low quadword of xmm1. | 
| MOVHPD m64, xmm1 | 66 0F 17 /r | sse2 | Move double-precision floating-point value from high quadword of xmm1 to m64. | 
| VMOVHPD m64, xmm1 | VEX.128.66.0F.WIG 17 /r | avx | Move double-precision floating-point value from high quadword of xmm1 to m64. | 
| VMOVHPD m64, xmm1 | EVEX.128.66.0F.W1 17 /r | avx512 | Move double-precision floating-point value from high quadword of xmm1 to m64. | 
| MOVHPS xmm1, m64 | 0F 16 /r | sse | Move two packed single-precision floating-point values from m64 to high quadword of xmm1. | 
| VMOVHPS xmm2, xmm1, m64 | VEX.NDS.128.0F.WIG 16 /r | avx | Merge two packed single-precision floating-point values from m64 and the low quadword of xmm1. | 
| VMOVHPS xmm2, xmm1, m64 | EVEX.NDS.128.0F.W0 16 /r | avx512 | Merge two packed single-precision floating-point values from m64 and the low quadword of xmm1. | 
| MOVHPS m64, xmm1 | 0F 17 /r | sse | Move two packed single-precision floating-point values from high quadword of xmm1 to m64. | 
| VMOVHPS m64, xmm1 | VEX.128.0F.WIG 17 /r | avx | Move two packed single-precision floating-point values from high quadword of xmm1 to m64. | 
| VMOVHPS m64, xmm1 | EVEX.128.0F.W0 17 /r | avx512 | Move two packed single-precision floating-point values from high quadword of xmm1 to m64. | 
| MOVLHPS xmm1, xmm2 | 0F 16 /r | sse | Move two packed single-precision floating-point values from low quadword of xmm2 to high quadword of xmm1. | 
| VMOVLHPS xmm1, xmm2, xmm3 | VEX.NDS.128.0F.WIG 16 /r | avx | Merge two packed single-precision floating-point values from low quadword of xmm3 and low quadword of xmm2. | 
| VMOVLHPS xmm1, xmm2, xmm3 | EVEX.NDS.128.0F.W0 16 /r | avx512 | Merge two packed single-precision floating-point values from low quadword of xmm3 and low quadword of xmm2. | 
| MOVLPD xmm1, m64 | 66 0F 12 /r | sse2 | Move double-precision floating-point value from m64 to low quadword of xmm1. | 
| VMOVLPD xmm2, xmm1, m64 | VEX.NDS.128.66.0F.WIG 12 /r | avx | Merge double-precision floating-point value from m64 and the high quadword of xmm1. | 
| VMOVLPD xmm2, xmm1, m64 | EVEX.NDS.128.66.0F.W1 12 /r | avx512 | Merge double-precision floating-point value from m64 and the high quadword of xmm1. | 
| MOVLPD m64, xmm1 | 66 0F 13 /r | sse2 | Move double-precision floating-point value from low quadword of xmm1 to m64. | 
| VMOVLPD m64, xmm1 | VEX.128.66.0F.WIG 13 /r | avx | Move double-precision floating-point value from low quadword of xmm1 to m64. | 
| VMOVLPD m64, xmm1 | EVEX.128.66.0F.W1 13 /r | avx512 | Move double-precision floating-point value from low quadword of xmm1 to m64. | 
| MOVLPS xmm1, m64 | 0F 12 /r | sse | Move two packed single-precision floating-point values from m64 to low quadword of xmm1. | 
| VMOVLPS xmm2, xmm1, m64 | VEX.NDS.128.0F.WIG 12 /r | avx | Merge two packed single-precision floating-point values from m64 and the high quadword of xmm1. | 
| VMOVLPS xmm2, xmm1, m64 | EVEX.NDS.128.0F.W0 12 /r | avx512 | Merge two packed single-precision floating-point values from m64 and the high quadword of xmm1. | 
| MOVLPS m64, xmm1 | 0F 13 /r | sse | Move two packed single-precision floating-point values from low quadword of xmm1 to m64. | 
| VMOVLPS m64, xmm1 | VEX.128.0F.WIG 13 /r | avx | Move two packed single-precision floating-point values from low quadword of xmm1 to m64. | 
| VMOVLPS m64, xmm1 | EVEX.128.0F.W0 13 /r | avx512 | Move two packed single-precision floating-point values from low quadword of xmm1 to m64. | 
| MOVMSKPD reg, xmm | 66 0F 50 /r | sse2 | Extract 2-bit sign mask from xmm and store in reg. The upper bits of r32 or r64 are filled with zeros. | 
| VMOVMSKPD reg, xmm2 | VEX.128.66.0F.WIG 50 /r | avx | Extract 2-bit sign mask from xmm2 and store in reg. The upper bits of r32 or r64 are zeroed. | 
| VMOVMSKPD reg, ymm2 | VEX.256.66.0F.WIG 50 /r | avx | Extract 4-bit sign mask from ymm2 and store in reg. The upper bits of r32 or r64 are zeroed. | 
| MOVMSKPS reg, xmm | 0F 50 /r | sse | Extract 4-bit sign mask from xmm and store in reg. The upper bits of r32 or r64 are filled with zeros. | 
| VMOVMSKPS reg, xmm2 | VEX.128.0F.WIG 50 /r | avx | Extract 4-bit sign mask from xmm2 and store in reg. The upper bits of r32 or r64 are zeroed. | 
| VMOVMSKPS reg, ymm2 | VEX.256.0F.WIG 50 /r | avx | Extract 8-bit sign mask from ymm2 and store in reg. The upper bits of r32 or r64 are zeroed. | 
| MOVNTDQ m128, xmm1 | 66 0F E7 /r | sse2 | Move packed integer values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTDQ m128, xmm1 | VEX.128.66.0F.WIG E7 /r | avx | Move packed integer values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTDQ m256, ymm1 | VEX.256.66.0F.WIG E7 /r | avx | Move packed integer values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTDQ m128, xmm1 | EVEX.128.66.0F.W0 E7 /r | avx512 | Move packed integer values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTDQ m256, ymm1 | EVEX.256.66.0F.W0 E7 /r | avx512 | Move packed integer values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTDQ m512, zmm1 | EVEX.512.66.0F.W0 E7 /r | avx512 | Move packed integer values in zmm1 to m512 using non-temporal hint. | 
| MOVNTDQA xmm1, m128 | 66 0F 38 2A /r | sse4.1 | Move double quadword from m128 to xmm1 using non-temporal hint if WC memory type. | 
| VMOVNTDQA xmm1, m128 | VEX.128.66.0F38.WIG 2A /r | avx | Move double quadword from m128 to xmm using non-temporal hint if WC memory type. | 
| VMOVNTDQA ymm1, m256 | VEX.256.66.0F38.WIG 2A /r | avx2 | Move 256-bit data from m256 to ymm using non-temporal hint if WC memory type. | 
| VMOVNTDQA xmm1, m128 | EVEX.128.66.0F38.W0 2A /r | avx512 | Move 128-bit data from m128 to xmm using non-temporal hint if WC memory type. | 
| VMOVNTDQA ymm1, m256 | EVEX.256.66.0F38.W0 2A /r | avx512 | Move 256-bit data from m256 to ymm using non-temporal hint if WC memory type. | 
| VMOVNTDQA zmm1, m512 | EVEX.512.66.0F38.W0 2A /r | avx512 | Move 512-bit data from m512 to zmm using non-temporal hint if WC memory type. | 
| MOVNTI m32, r32 | 0F C3 /r | Move doubleword from r32 to m32 using non-temporal hint. | |
| MOVNTI m64, r64 | REX.W + 0F C3 /r | Move quadword from r64 to m64 using non-temporal hint. | |
| MOVNTPD m128, xmm1 | 66 0F 2B /r | sse2 | Move packed double-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPD m128, xmm1 | VEX.128.66.0F.WIG 2B /r | avx | Move packed double-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPD m256, ymm1 | VEX.256.66.0F.WIG 2B /r | avx | Move packed double-precision values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTPD m128, xmm1 | EVEX.128.66.0F.W1 2B /r | avx512 | Move packed double-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPD m256, ymm1 | EVEX.256.66.0F.W1 2B /r | avx512 | Move packed double-precision values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTPD m512, zmm1 | EVEX.512.66.0F.W1 2B /r | avx512 | Move packed double-precision values in zmm1 to m512 using non-temporal hint. | 
| MOVNTPS m128, xmm1 | 0F 2B /r | sse | Move packed single-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPS m128, xmm1 | VEX.128.0F.WIG 2B /r | avx | Move packed single-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPS m256, ymm1 | VEX.256.0F.WIG 2B /r | avx | Move packed single-precision values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTPS m128, xmm1 | EVEX.128.0F.W0 2B /r | avx512 | Move packed single-precision values in xmm1 to m128 using non-temporal hint. | 
| VMOVNTPS m256, ymm1 | EVEX.256.0F.W0 2B /r | avx512 | Move packed single-precision values in ymm1 to m256 using non-temporal hint. | 
| VMOVNTPS m512, zmm1 | EVEX.512.0F.W0 2B /r | avx512 | Move packed single-precision values in zmm1 to m512 using non-temporal hint. | 
| MOVNTQ m64, mm | 0F E7 /r | Move quadword from mm to m64 using non-temporal hint. | |
| MOVQ mm, mm/m64 | 0F 6F /r | mmx | Move quadword from mm/m64 to mm. | 
| MOVQ mm/m64, mm | 0F 7F /r | mmx | Move quadword from mm to mm/m64. | 
| MOVQ xmm1, xmm2/m64 | F3 0F 7E /r | sse2 | Move quadword from xmm2/mem64 to xmm1. | 
| VMOVQ xmm1, xmm2/m64 | VEX.128.F3.0F.WIG 7E /r | avx | Move quadword from xmm2/m64 to xmm1. | 
| VMOVQ xmm1, xmm2/m64 | EVEX.128.F3.0F.W1 7E /r | avx512 | Move quadword from xmm2/m64 to xmm1. | 
| MOVQ xmm2/m64, xmm1 | 66 0F D6 /r | sse2 | Move quadword from xmm1 to xmm2/mem64. | 
| VMOVQ xmm1/m64, xmm2 | VEX.128.66.0F.WIG D6 /r | avx | Move quadword from xmm2 register to xmm1/m64. | 
| VMOVQ xmm1/m64, xmm2 | EVEX.128.66.0F.W1 D6 /r | avx512 | Move quadword from xmm2 register to xmm1/m64. | 
| MOVQ2DQ xmm, mm | F3 0F D6 /r | Move quadword from mmx to low quadword of xmm. | |
| MOVSD xmm1, xmm2 | F2 0F 10 /r | sse2 | Move scalar double-precision floating-point value from xmm2 to xmm1 register. | 
| MOVSD xmm1, m64 | F2 0F 10 /r | sse2 | Load scalar double-precision floating-point value from m64 to xmm1 register. | 
| MOVSD xmm1/m64, xmm2 | F2 0F 11 /r | sse2 | Move scalar double-precision floating-point value from xmm2 register to xmm1/m64. | 
| VMOVSD xmm1, xmm2, xmm3 | VEX.NDS.LIG.F2.0F.WIG 10 /r | avx | Merge scalar double-precision floating-point value from xmm2 and xmm3 to xmm1 register. | 
| VMOVSD xmm1, m64 | VEX.LIG.F2.0F.WIG 10 /r | avx | Load scalar double-precision floating-point value from m64 to xmm1 register. | 
| VMOVSD xmm1, xmm2, xmm3 | VEX.NDS.LIG.F2.0F.WIG 11 /r | avx | Merge scalar double-precision floating-point value from xmm2 and xmm3 registers to xmm1. | 
| VMOVSD m64, xmm1 | VEX.LIG.F2.0F.WIG 11 /r | avx | Store scalar double-precision floating-point value from xmm1 register to m64. | 
| VMOVSD xmm1 {k1}{z}, xmm2, xmm3 | EVEX.NDS.LIG.F2.0F.W1 10 /r | avx512 | Merge scalar double-precision floating-point value from xmm2 and xmm3 registers to xmm1 under writemask k1. | 
| VMOVSD xmm1 {k1}{z}, m64 | EVEX.LIG.F2.0F.W1 10 /r | avx512 | Load scalar double-precision floating-point value from m64 to xmm1 register under writemask k1. | 
| VMOVSD xmm1 {k1}{z}, xmm2, xmm3 | EVEX.NDS.LIG.F2.0F.W1 11 /r | avx512 | Merge scalar double-precision floating-point value from xmm2 and xmm3 registers to xmm1 under writemask k1. | 
| VMOVSD m64 {k1}, xmm1 | EVEX.LIG.F2.0F.W1 11 /r | avx512 | Store scalar double-precision floating-point value from xmm1 register to m64 under writemask k1. | 
| MOVSHDUP xmm1, xmm2/m128 | F3 0F 16 /r | sse3 | Move odd index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1. | 
| VMOVSHDUP xmm1, xmm2/m128 | VEX.128.F3.0F.WIG 16 /r | avx | Move odd index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1. | 
| VMOVSHDUP ymm1, ymm2/m256 | VEX.256.F3.0F.WIG 16 /r | avx | Move odd index single-precision floating-point values from ymm2/mem and duplicate each element into ymm1. | 
| VMOVSHDUP xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F3.0F.W0 16 /r | avx512 | Move odd index single-precision floating-point values from xmm2/m128 and duplicate each element into xmm1 under writemask. | 
| VMOVSHDUP ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F3.0F.W0 16 /r | avx512 | Move odd index single-precision floating-point values from ymm2/m256 and duplicate each element into ymm1 under writemask. | 
| VMOVSHDUP zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F3.0F.W0 16 /r | avx512 | Move odd index single-precision floating-point values from zmm2/m512 and duplicate each element into zmm1 under writemask. | 
| MOVSLDUP xmm1, xmm2/m128 | F3 0F 12 /r | sse3 | Move even index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1. | 
| VMOVSLDUP xmm1, xmm2/m128 | VEX.128.F3.0F.WIG 12 /r | avx | Move even index single-precision floating-point values from xmm2/mem and duplicate each element into xmm1. | 
| VMOVSLDUP ymm1, ymm2/m256 | VEX.256.F3.0F.WIG 12 /r | avx | Move even index single-precision floating-point values from ymm2/mem and duplicate each element into ymm1. | 
| VMOVSLDUP xmm1 {k1}{z}, xmm2/m128 | EVEX.128.F3.0F.W0 12 /r | avx512 | Move even index single-precision floating-point values from xmm2/m128 and duplicate each element into xmm1 under writemask. | 
| VMOVSLDUP ymm1 {k1}{z}, ymm2/m256 | EVEX.256.F3.0F.W0 12 /r | avx512 | Move even index single-precision floating-point values from ymm2/m256 and duplicate each element into ymm1 under writemask. | 
| VMOVSLDUP zmm1 {k1}{z}, zmm2/m512 | EVEX.512.F3.0F.W0 12 /r | avx512 | Move even index single-precision floating-point values from zmm2/m512 and duplicate each element into zmm1 under writemask. | 
| MOVSS xmm1, xmm2 | F3 0F 10 /r | sse | Merge scalar single-precision floating-point value from xmm2 to xmm1 register. | 
| MOVSS xmm1, m32 | F3 0F 10 /r | sse | Load scalar single-precision floating-point value from m32 to xmm1 register. | 
| VMOVSS xmm1, xmm2, xmm3 | VEX.NDS.LIG.F3.0F.WIG 10 /r | avx | Merge scalar single-precision floating-point value from xmm2 and xmm3 to xmm1 register. | 
| VMOVSS xmm1, m32 | VEX.LIG.F3.0F.WIG 10 /r | avx | Load scalar single-precision floating-point value from m32 to xmm1 register. | 
| MOVSS xmm2/m32, xmm1 | F3 0F 11 /r | sse | Move scalar single-precision floating-point value from xmm1 register to xmm2/m32. | 
| VMOVSS xmm1, xmm2, xmm3 | VEX.NDS.LIG.F3.0F.WIG 11 /r | avx | Move scalar single-precision floating-point value from xmm2 and xmm3 to xmm1 register. | 
| VMOVSS m32, xmm1 | VEX.LIG.F3.0F.WIG 11 /r | avx | Move scalar single-precision floating-point value from xmm1 register to m32. | 
| VMOVSS xmm1 {k1}{z}, xmm2, xmm3 | EVEX.NDS.LIG.F3.0F.W0 10 /r | avx512 | Move scalar single-precision floating-point value from xmm2 and xmm3 to xmm1 register under writemask k1. | 
| VMOVSS xmm1 {k1}{z}, m32 | EVEX.LIG.F3.0F.W0 10 /r | avx512 | Move scalar single-precision floating-point values from m32 to xmm1 under writemask k1. | 
| VMOVSS xmm1 {k1}{z}, xmm2, xmm3 | EVEX.NDS.LIG.F3.0F.W0 11 /r | avx512 | Move scalar single-precision floating-point value from xmm2 and xmm3 to xmm1 register under writemask k1. | 
| VMOVSS m32 {k1}, xmm1 | EVEX.LIG.F3.0F.W0 11 /r | avx512 | Move scalar single-precision floating-point values from xmm1 to m32 under writemask k1. | 
| MOVSX r16, r/m8 | 0F BE /r | Move byte to word with sign-extension. | |
| MOVSX r32, r/m8 | 0F BE /r | Move byte to doubleword with sign-extension. | |
| MOVSX r64, r/m8 | REX.W + 0F BE /r | Move byte to quadword with sign-extension. | |
| MOVSX r32, r/m16 | 0F BF /r | Move word to doubleword with sign-extension. | |
| MOVSX r64, r/m16 | REX.W + 0F BF /r | Move word to quadword with sign-extension. | |
| MOVSXD r64, r/m32 | REX.W + 63 /r | Move doubleword to quadword with sign-extension. | |
| MOVS m8, m8 | A4 | For legacy mode, move byte from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move byte from address (R\|E)SI to (R\|E)DI. | |
| MOVS m16, m16 | A5 | For legacy mode, move word from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move word from address (R\|E)SI to (R\|E)DI. | |
| MOVS m32, m32 | A5 | For legacy mode, move dword from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move dword from address (R\|E)SI to (R\|E)DI. | |
| MOVS m64, m64 | REX.W + A5 | Move qword from address (R\|E)SI to (R\|E)DI. | |
| MOVSB | A4 | For legacy mode, move byte from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move byte from address (R\|E)SI to (R\|E)DI. | |
| MOVSW | A5 | For legacy mode, move word from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move word from address (R\|E)SI to (R\|E)DI. | |
| MOVSD | A5 | For legacy mode, move dword from address DS:(E)SI to ES:(E)DI. For 64-bit mode, move dword from address (R\|E)SI to (R\|E)DI. | |
| MOVSQ | REX.W + A5 | Move qword from address (R\|E)SI to (R\|E)DI. | |
| MOVUPD xmm1, xmm2/m128 | 66 0F 10 /r | sse2 | Move unaligned packed double-precision floating-point from xmm2/mem to xmm1. | 
| MOVUPD xmm2/m128, xmm1 | 66 0F 11 /r | sse2 | Move unaligned packed double-precision floating-point from xmm1 to xmm2/mem. | 
| VMOVUPD xmm1, xmm2/m128 | VEX.128.66.0F.WIG 10 /r | avx | Move unaligned packed double-precision floating-point from xmm2/mem to xmm1. | 
| VMOVUPD xmm2/m128, xmm1 | VEX.128.66.0F.WIG 11 /r | avx | Move unaligned packed double-precision floating-point from xmm1 to xmm2/mem. | 
| VMOVUPD ymm1, ymm2/m256 | VEX.256.66.0F.WIG 10 /r | avx | Move unaligned packed double-precision floating-point from ymm2/mem to ymm1. | 
| VMOVUPD ymm2/m256, ymm1 | VEX.256.66.0F.WIG 11 /r | avx | Move unaligned packed double-precision floating-point from ymm1 to ymm2/mem. | 
| VMOVUPD xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F.W1 10 /r | avx512 | Move unaligned packed double-precision floating-point from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVUPD xmm2/m128 {k1}{z}, xmm1 | EVEX.128.66.0F.W1 11 /r | avx512 | Move unaligned packed double-precision floating-point from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVUPD ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F.W1 10 /r | avx512 | Move unaligned packed double-precision floating-point from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVUPD ymm2/m256 {k1}{z}, ymm1 | EVEX.256.66.0F.W1 11 /r | avx512 | Move unaligned packed double-precision floating-point from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVUPD zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F.W1 10 /r | avx512 | Move unaligned packed double-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVUPD zmm2/m512 {k1}{z}, zmm1 | EVEX.512.66.0F.W1 11 /r | avx512 | Move unaligned packed double-precision floating-point values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVUPS xmm1, xmm2/m128 | 0F 10 /r | sse | Move unaligned packed single-precision floating-point from xmm2/mem to xmm1. | 
| MOVUPS xmm2/m128, xmm1 | 0F 11 /r | sse | Move unaligned packed single-precision floating-point from xmm1 to xmm2/mem. | 
| VMOVUPS xmm1, xmm2/m128 | VEX.128.0F.WIG 10 /r | avx | Move unaligned packed single-precision floating-point from xmm2/mem to xmm1. | 
| VMOVUPS xmm2/m128, xmm1 | VEX.128.0F.WIG 11 /r | avx | Move unaligned packed single-precision floating-point from xmm1 to xmm2/mem. | 
| VMOVUPS ymm1, ymm2/m256 | VEX.256.0F.WIG 10 /r | avx | Move unaligned packed single-precision floating-point from ymm2/mem to ymm1. | 
| VMOVUPS ymm2/m256, ymm1 | VEX.256.0F.WIG 11 /r | avx | Move unaligned packed single-precision floating-point from ymm1 to ymm2/mem. | 
| VMOVUPS xmm1 {k1}{z}, xmm2/m128 | EVEX.128.0F.W0 10 /r | avx512 | Move unaligned packed single-precision floating-point values from xmm2/m128 to xmm1 using writemask k1. | 
| VMOVUPS ymm1 {k1}{z}, ymm2/m256 | EVEX.256.0F.W0 10 /r | avx512 | Move unaligned packed single-precision floating-point values from ymm2/m256 to ymm1 using writemask k1. | 
| VMOVUPS zmm1 {k1}{z}, zmm2/m512 | EVEX.512.0F.W0 10 /r | avx512 | Move unaligned packed single-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VMOVUPS xmm2/m128 {k1}{z}, xmm1 | EVEX.128.0F.W0 11 /r | avx512 | Move unaligned packed single-precision floating-point values from xmm1 to xmm2/m128 using writemask k1. | 
| VMOVUPS ymm2/m256 {k1}{z}, ymm1 | EVEX.256.0F.W0 11 /r | avx512 | Move unaligned packed single-precision floating-point values from ymm1 to ymm2/m256 using writemask k1. | 
| VMOVUPS zmm2/m512 {k1}{z}, zmm1 | EVEX.512.0F.W0 11 /r | avx512 | Move unaligned packed single-precision floating-point values from zmm1 to zmm2/m512 using writemask k1. | 
| MOVZX r16, r/m8 | 0F B6 /r | Move byte to word with zero-extension. | |
| MOVZX r32, r/m8 | 0F B6 /r | Move byte to doubleword, zero-extension. | |
| MOVZX r64, r/m8 | REX.W + 0F B6 /r | Move byte to quadword, zero-extension. | |
| MOVZX r32, r/m16 | 0F B7 /r | Move word to doubleword, zero-extension. | |
| MOVZX r64, r/m16 | REX.W + 0F B7 /r | Move word to quadword, zero-extension. | |
| MPSADBW xmm1, xmm2/m128, imm8 | 66 0F 3A 42 /r ib | sse4.1 | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in xmm1 and xmm2/m128 and writes the results in xmm1. Starting offsets within xmm1 and xmm2/m128 are determined by imm8. | 
| VMPSADBW xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 42 /r ib | avx | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in xmm2 and xmm3/m128 and writes the results in xmm1. Starting offsets within xmm2 and xmm3/m128 are determined by imm8. | 
| VMPSADBW ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 42 /r ib | avx2 | Sums absolute 8-bit integer difference of adjacent groups of 4 byte integers in ymm2 and ymm3/m256 and writes the results in ymm1. Starting offsets within ymm2 and ymm3/m256 are determined by imm8. | 
| MUL r/m8 | F6 /4 | Unsigned multiply (AX ← AL ∗ r/m8). | |
| MUL r/m8 | REX + F6 /4 | Unsigned multiply (AX ← AL ∗ r/m8). | |
| MUL r/m16 | F7 /4 | Unsigned multiply (DX:AX ← AX ∗ r/m16). | |
| MUL r/m32 | F7 /4 | Unsigned multiply (EDX:EAX ← EAX ∗ r/m32). | |
| MUL r/m64 | REX.W + F7 /4 | Unsigned multiply (RDX:RAX ← RAX ∗ r/m64). | |
| MULPD xmm1, xmm2/m128 | 66 0F 59 /r | sse2 | Multiply packed double-precision floating-point values in xmm2/m128 with xmm1 and store result in xmm1. | 
| VMULPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 59 /r | avx | Multiply packed double-precision floating-point values in xmm3/m128 with xmm2 and store result in xmm1. | 
| VMULPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 59 /r | avx | Multiply packed double-precision floating-point values in ymm3/m256 with ymm2 and store result in ymm1. | 
| VMULPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 59 /r | avx512 | Multiply packed double-precision floating-point values in xmm3/m128/m64bcst with xmm2 and store result in xmm1. | 
| VMULPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 59 /r | avx512 | Multiply packed double-precision floating-point values in ymm3/m256/m64bcst with ymm2 and store result in ymm1. | 
| VMULPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst {er} | EVEX.NDS.512.66.0F.W1 59 /r | avx512 | Multiply packed double-precision floating-point values in zmm3/m512/m64bcst with zmm2 and store result in zmm1. | 
| MULPS xmm1, xmm2/m128 | 0F 59 /r | sse | Multiply packed single-precision floating-point values in xmm2/m128 with xmm1 and store result in xmm1. | 
| VMULPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 59 /r | avx | Multiply packed single-precision floating-point values in xmm3/m128 with xmm2 and store result in xmm1. | 
| VMULPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 59 /r | avx | Multiply packed single-precision floating-point values in ymm3/m256 with ymm2 and store result in ymm1. | 
| VMULPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 59 /r | avx512 | Multiply packed single-precision floating-point values in xmm3/m128/m32bcst with xmm2 and store result in xmm1. | 
| VMULPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 59 /r | avx512 | Multiply packed single-precision floating-point values in ymm3/m256/m32bcst with ymm2 and store result in ymm1. | 
| VMULPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst {er} | EVEX.NDS.512.0F.W0 59 /r | avx512 | Multiply packed single-precision floating-point values in zmm3/m512/m32bcst with zmm2 and store result in zmm1. | 
| MULSD xmm1, xmm2/m64 | F2 0F 59 /r | sse2 | Multiply the low double-precision floating-point value in xmm2/m64 by low double-precision floating-point value in xmm1. | 
| VMULSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 59 /r | avx | Multiply the low double-precision floating-point value in xmm3/m64 by low double-precision floating-point value in xmm2. | 
| VMULSD xmm1 {k1}{z}, xmm2, xmm3/m64 {er} | EVEX.NDS.LIG.F2.0F.W1 59 /r | avx512 | Multiply the low double-precision floating-point value in xmm3/m64 by low double-precision floating-point value in xmm2. | 
| MULSS xmm1, xmm2/m32 | F3 0F 59 /r | sse | Multiply the low single-precision floating-point value in xmm2/m32 by the low single-precision floating-point value in xmm1. | 
| VMULSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 59 /r | avx | Multiply the low single-precision floating-point value in xmm3/m32 by the low single-precision floating-point value in xmm2. | 
| VMULSS xmm1 {k1}{z}, xmm2, xmm3/m32 {er} | EVEX.NDS.LIG.F3.0F.W0 59 /r | avx512 | Multiply the low single-precision floating-point value in xmm3/m32 by the low single-precision floating-point value in xmm2. | 
| MULX r32a, r32b, r/m32 | VEX.NDD.LZ.F2.0F38.W0 F6 /r | bmi2 | Unsigned multiply of r/m32 with EDX without affecting arithmetic flags. | 
| MULX r64a, r64b, r/m64 | VEX.NDD.LZ.F2.0F38.W1 F6 /r | bmi2 | Unsigned multiply of r/m64 with RDX without affecting arithmetic flags. | 
| MWAIT | 0F 01 C9 | A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. | |
| NEG r/m8 | F6 /3 | | Two's complement negate r/m8. |
| NEG r/m8 | REX + F6 /3 | | Two's complement negate r/m8. |
| NEG r/m16 | F7 /3 | | Two's complement negate r/m16. |
| NEG r/m32 | F7 /3 | | Two's complement negate r/m32. |
| NEG r/m64 | REX.W + F7 /3 | | Two's complement negate r/m64. |
| NOP | 90 | | One byte no-operation instruction. |
| NOP r/m16 | 0F 1F /0 | | Multi-byte no-operation instruction. |
| NOP r/m32 | 0F 1F /0 | | Multi-byte no-operation instruction. |
| NOT r/m8 | F6 /2 | | Reverse each bit of r/m8. |
| NOT r/m8 | REX + F6 /2 | | Reverse each bit of r/m8. |
| NOT r/m16 | F7 /2 | | Reverse each bit of r/m16. |
| NOT r/m32 | F7 /2 | | Reverse each bit of r/m32. |
| NOT r/m64 | REX.W + F7 /2 | | Reverse each bit of r/m64. |
| OR AL, imm8 | 0C ib | | AL OR imm8. |
| OR AX, imm16 | 0D iw | | AX OR imm16. |
| OR EAX, imm32 | 0D id | | EAX OR imm32. |
| OR RAX, imm32 | REX.W + 0D id | | RAX OR imm32 (sign-extended). |
| OR r/m8, imm8 | 80 /1 ib | | r/m8 OR imm8. |
| OR r/m8, imm8 | REX + 80 /1 ib | | r/m8 OR imm8. |
| OR r/m16, imm16 | 81 /1 iw | | r/m16 OR imm16. |
| OR r/m32, imm32 | 81 /1 id | | r/m32 OR imm32. |
| OR r/m64, imm32 | REX.W + 81 /1 id | | r/m64 OR imm32 (sign-extended). |
| OR r/m16, imm8 | 83 /1 ib | | r/m16 OR imm8 (sign-extended). |
| OR r/m32, imm8 | 83 /1 ib | | r/m32 OR imm8 (sign-extended). |
| OR r/m64, imm8 | REX.W + 83 /1 ib | | r/m64 OR imm8 (sign-extended). |
| OR r/m8, r8 | 08 /r | | r/m8 OR r8. |
| OR r/m8, r8 | REX + 08 /r | | r/m8 OR r8. |
| OR r/m16, r16 | 09 /r | | r/m16 OR r16. |
| OR r/m32, r32 | 09 /r | | r/m32 OR r32. |
| OR r/m64, r64 | REX.W + 09 /r | | r/m64 OR r64. |
| OR r8, r/m8 | 0A /r | | r8 OR r/m8. |
| OR r8, r/m8 | REX + 0A /r | | r8 OR r/m8. |
| OR r16, r/m16 | 0B /r | | r16 OR r/m16. |
| OR r32, r/m32 | 0B /r | | r32 OR r/m32. |
| OR r64, r/m64 | REX.W + 0B /r | | r64 OR r/m64. |
| ORPD xmm1, xmm2/m128 | 66 0F 56 /r | sse2 | Return the bitwise logical OR of packed double-precision floating-point values in xmm1 and xmm2/mem. |
| VORPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F 56 /r | avx | Return the bitwise logical OR of packed double-precision floating-point values in xmm2 and xmm3/mem. |
| VORPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F 56 /r | avx | Return the bitwise logical OR of packed double-precision floating-point values in ymm2 and ymm3/mem. |
| VORPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 56 /r | avx512 | Return the bitwise logical OR of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1. | 
| VORPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 56 /r | avx512 | Return the bitwise logical OR of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1. | 
| VORPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 56 /r | avx512 | Return the bitwise logical OR of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1. | 
| ORPS xmm1, xmm2/m128 | 0F 56 /r | sse | Return the bitwise logical OR of packed single-precision floating-point values in xmm1 and xmm2/mem. |
| VORPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F 56 /r | avx | Return the bitwise logical OR of packed single-precision floating-point values in xmm2 and xmm3/mem. |
| VORPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F 56 /r | avx | Return the bitwise logical OR of packed single-precision floating-point values in ymm2 and ymm3/mem. |
| VORPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 56 /r | avx512 | Return the bitwise logical OR of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1. | 
| VORPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 56 /r | avx512 | Return the bitwise logical OR of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1. | 
| VORPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.0F.W0 56 /r | avx512 | Return the bitwise logical OR of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1. | 
| OUT imm8, AL | E6 ib | | Output byte in AL to I/O port address imm8. |
| OUT imm8, AX | E7 ib | | Output word in AX to I/O port address imm8. |
| OUT imm8, EAX | E7 ib | | Output doubleword in EAX to I/O port address imm8. |
| OUT DX, AL | EE | | Output byte in AL to I/O port address in DX. |
| OUT DX, AX | EF | | Output word in AX to I/O port address in DX. |
| OUT DX, EAX | EF | | Output doubleword in EAX to I/O port address in DX. |
| OUTS DX, m8 | 6E | | Output byte from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| OUTS DX, m16 | 6F | | Output word from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| OUTS DX, m32 | 6F | | Output doubleword from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| OUTSB | 6E | | Output byte from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| OUTSW | 6F | | Output word from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| OUTSD | 6F | | Output doubleword from memory location specified in DS:(E)SI or RSI to I/O port specified in DX. |
| PABSB mm1, mm2/m64 | 0F 38 1C /r | ssse3 | Compute the absolute value of bytes in mm2/m64 and store UNSIGNED result in mm1. | 
| PABSB xmm1, xmm2/m128 | 66 0F 38 1C /r | ssse3 | Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1. | 
| PABSW mm1, mm2/m64 | 0F 38 1D /r | ssse3 | Compute the absolute value of 16-bit integers in mm2/m64 and store UNSIGNED result in mm1. | 
| PABSW xmm1, xmm2/m128 | 66 0F 38 1D /r | ssse3 | Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. | 
| PABSD mm1, mm2/m64 | 0F 38 1E /r | ssse3 | Compute the absolute value of 32-bit integers in mm2/m64 and store UNSIGNED result in mm1. | 
| PABSD xmm1, xmm2/m128 | 66 0F 38 1E /r | ssse3 | Compute the absolute value of 32-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. | 
| VPABSB xmm1, xmm2/m128 | VEX.128.66.0F38.WIG 1C /r | avx | Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1. | 
| VPABSW xmm1, xmm2/m128 | VEX.128.66.0F38.WIG 1D /r | avx | Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| VPABSD xmm1, xmm2/m128 | VEX.128.66.0F38.WIG 1E /r | avx | Compute the absolute value of 32-bit integers in xmm2/m128 and store UNSIGNED result in xmm1. |
| VPABSB ymm1, ymm2/m256 | VEX.256.66.0F38.WIG 1C /r | avx2 | Compute the absolute value of bytes in ymm2/m256 and store UNSIGNED result in ymm1. | 
| VPABSW ymm1, ymm2/m256 | VEX.256.66.0F38.WIG 1D /r | avx2 | Compute the absolute value of 16-bit integers in ymm2/m256 and store UNSIGNED result in ymm1. | 
| VPABSD ymm1, ymm2/m256 | VEX.256.66.0F38.WIG 1E /r | avx2 | Compute the absolute value of 32-bit integers in ymm2/m256 and store UNSIGNED result in ymm1. | 
| VPABSB xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.WIG 1C /r | avx512 | Compute the absolute value of bytes in xmm2/m128 and store UNSIGNED result in xmm1 using writemask k1. | 
| VPABSB ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F38.WIG 1C /r | avx512 | Compute the absolute value of bytes in ymm2/m256 and store UNSIGNED result in ymm1 using writemask k1. | 
| VPABSB zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F38.WIG 1C /r | avx512 | Compute the absolute value of bytes in zmm2/m512 and store UNSIGNED result in zmm1 using writemask k1. | 
| VPABSW xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.WIG 1D /r | avx512 | Compute the absolute value of 16-bit integers in xmm2/m128 and store UNSIGNED result in xmm1 using writemask k1. | 
| PACKSSWB mm1, mm2/m64 | 0F 63 /r | mmx | Converts 4 packed signed word integers from mm1 and from mm2/m64 into 8 packed signed byte integers in mm1 using signed saturation. | 
| PACKSSWB xmm1, xmm2/m128 | 66 0F 63 /r | sse2 | Converts 8 packed signed word integers from xmm1 and from xmm2/m128 into 16 packed signed byte integers in xmm1 using signed saturation. |
| PACKSSDW mm1, mm2/m64 | 0F 6B /r | mmx | Converts 2 packed signed doubleword integers from mm1 and from mm2/m64 into 4 packed signed word integers in mm1 using signed saturation. | 
| PACKSSDW xmm1, xmm2/m128 | 66 0F 6B /r | sse2 | Converts 4 packed signed doubleword integers from xmm1 and from xmm2/m128 into 8 packed signed word integers in xmm1 using signed saturation. |
| VPACKSSWB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 63 /r | avx | Converts 8 packed signed word integers from xmm2 and from xmm3/m128 into 16 packed signed byte integers in xmm1 using signed saturation. |
| VPACKSSDW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 6B /r | avx | Converts 4 packed signed doubleword integers from xmm2 and from xmm3/m128 into 8 packed signed word integers in xmm1 using signed saturation. |
| VPACKSSWB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 63 /r | avx2 | Converts 16 packed signed word integers from ymm2 and from ymm3/m256 into 32 packed signed byte integers in ymm1 using signed saturation. | 
| VPACKSSDW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 6B /r | avx2 | Converts 8 packed signed doubleword integers from ymm2 and from ymm3/m256 into 16 packed signed word integers in ymm1 using signed saturation. |
| VPACKSSWB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 63 /r | avx512 | Converts packed signed word integers from xmm2 and from xmm3/m128 into packed signed byte integers in xmm1 using signed saturation under writemask k1. | 
| VPACKSSWB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG 63 /r | avx512 | Converts packed signed word integers from ymm2 and from ymm3/m256 into packed signed byte integers in ymm1 using signed saturation under writemask k1. | 
| VPACKSSWB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG 63 /r | avx512 | Converts packed signed word integers from zmm2 and from zmm3/m512 into packed signed byte integers in zmm1 using signed saturation under writemask k1. | 
| VPACKSSDW xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 6B /r | avx512 | Converts packed signed doubleword integers from xmm2 and from xmm3/m128/m32bcst into packed signed word integers in xmm1 using signed saturation under writemask k1. | 
| PACKUSDW xmm1, xmm2/m128 | 66 0F 38 2B /r | sse4.1 | Convert 4 packed signed doubleword integers from xmm1 and 4 packed signed doubleword integers from xmm2/m128 into 8 packed unsigned word integers in xmm1 using unsigned saturation. | 
| VPACKUSDW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38 2B /r | avx | Convert 4 packed signed doubleword integers from xmm2 and 4 packed signed doubleword integers from xmm3/m128 into 8 packed unsigned word integers in xmm1 using unsigned saturation. |
| VPACKUSDW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38 2B /r | avx2 | Convert 8 packed signed doubleword integers from ymm2 and 8 packed signed doubleword integers from ymm3/m256 into 16 packed unsigned word integers in ymm1 using unsigned saturation. | 
| VPACKUSDW xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 2B /r | avx512 | Convert packed signed doubleword integers from xmm2 and packed signed doubleword integers from xmm3/m128/m32bcst into packed unsigned word integers in xmm1 using unsigned saturation under writemask k1. | 
| VPACKUSDW ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 2B /r | avx512 | Convert packed signed doubleword integers from ymm2 and packed signed doubleword integers from ymm3/m256/m32bcst into packed unsigned word integers in ymm1 using unsigned saturation under writemask k1. | 
| VPACKUSDW zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 2B /r | avx512 | Convert packed signed doubleword integers from zmm2 and packed signed doubleword integers from zmm3/m512/m32bcst into packed unsigned word integers in zmm1 using unsigned saturation under writemask k1. | 
| PACKUSWB mm, mm/m64 | 0F 67 /r | mmx | Converts 4 signed word integers from mm and 4 signed word integers from mm/m64 into 8 unsigned byte integers in mm using unsigned saturation. | 
| PACKUSWB xmm1, xmm2/m128 | 66 0F 67 /r | sse2 | Converts 8 signed word integers from xmm1 and 8 signed word integers from xmm2/m128 into 16 unsigned byte integers in xmm1 using unsigned saturation. | 
| VPACKUSWB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 67 /r | avx | Converts 8 signed word integers from xmm2 and 8 signed word integers from xmm3/m128 into 16 unsigned byte integers in xmm1 using unsigned saturation. | 
| VPACKUSWB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 67 /r | avx2 | Converts 16 signed word integers from ymm2 and 16 signed word integers from ymm3/m256 into 32 unsigned byte integers in ymm1 using unsigned saturation. |
| VPACKUSWB xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 67 /r | avx512 | Converts signed word integers from xmm2 and signed word integers from xmm3/m128 into unsigned byte integers in xmm1 using unsigned saturation under writemask k1. | 
| VPACKUSWB ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG 67 /r | avx512 | Converts signed word integers from ymm2 and signed word integers from ymm3/m256 into unsigned byte integers in ymm1 using unsigned saturation under writemask k1. | 
| VPACKUSWB zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG 67 /r | avx512 | Converts signed word integers from zmm2 and signed word integers from zmm3/m512 into unsigned byte integers in zmm1 using unsigned saturation under writemask k1. | 
| PADDB mm, mm/m64 | 0F FC /r | mmx | Add packed byte integers from mm/m64 and mm. | 
| PADDW mm, mm/m64 | 0F FD /r | mmx | Add packed word integers from mm/m64 and mm. | 
| PADDB xmm1, xmm2/m128 | 66 0F FC /r | sse2 | Add packed byte integers from xmm2/m128 and xmm1. | 
| PADDW xmm1, xmm2/m128 | 66 0F FD /r | sse2 | Add packed word integers from xmm2/m128 and xmm1. | 
| PADDD xmm1, xmm2/m128 | 66 0F FE /r | sse2 | Add packed doubleword integers from xmm2/m128 and xmm1. | 
| PADDQ xmm1, xmm2/m128 | 66 0F D4 /r | sse2 | Add packed quadword integers from xmm2/m128 and xmm1. | 
| VPADDB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG FC /r | avx | Add packed byte integers from xmm2, and xmm3/m128 and store in xmm1. | 
| VPADDW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG FD /r | avx | Add packed word integers from xmm2, xmm3/m128 and store in xmm1. | 
| VPADDD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG FE /r | avx | Add packed doubleword integers from xmm2, xmm3/m128 and store in xmm1. | 
| VPADDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D4 /r | avx | Add packed quadword integers from xmm2, xmm3/m128 and store in xmm1. | 
| VPADDB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG FC /r | avx2 | Add packed byte integers from ymm2, and ymm3/m256 and store in ymm1. | 
| VPADDW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG FD /r | avx2 | Add packed word integers from ymm2, ymm3/m256 and store in ymm1. | 
| VPADDD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG FE /r | avx2 | Add packed doubleword integers from ymm2, ymm3/m256 and store in ymm1. | 
| VPADDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG D4 /r | avx2 | Add packed quadword integers from ymm2, ymm3/m256 and store in ymm1. | 
| VPADDB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG FC /r | avx512 | Add packed byte integers from xmm2, and xmm3/m128 and store in xmm1 using writemask k1. | 
| VPADDW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG FD /r | avx512 | Add packed word integers from xmm2, and xmm3/m128 and store in xmm1 using writemask k1. | 
| VPADDD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 FE /r | avx512 | Add packed doubleword integers from xmm2, and xmm3/m128/m32bcst and store in xmm1 using writemask k1. | 
| VPADDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 D4 /r | avx512 | Add packed quadword integers from xmm2, and xmm3/m128/m64bcst and store in xmm1 using writemask k1. | 
| VPADDB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG FC /r | avx512 | Add packed byte integers from ymm2, and ymm3/m256 and store in ymm1 using writemask k1. | 
| VPADDW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG FD /r | avx512 | Add packed word integers from ymm2, and ymm3/m256 and store in ymm1 using writemask k1. | 
| VPADDD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 FE /r | avx512 | Add packed doubleword integers from ymm2, ymm3/m256/m32bcst and store in ymm1 using writemask k1. | 
| VPADDQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 D4 /r | avx512 | Add packed quadword integers from ymm2, ymm3/m256/m64bcst and store in ymm1 using writemask k1. | 
| VPADDB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG FC /r | avx512 | Add packed byte integers from zmm2, and zmm3/m512 and store in zmm1 using writemask k1. | 
| VPADDW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG FD /r | avx512 | Add packed word integers from zmm2, and zmm3/m512 and store in zmm1 using writemask k1. | 
| VPADDD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 FE /r | avx512 | Add packed doubleword integers from zmm2, zmm3/m512/m32bcst and store in zmm1 using writemask k1. | 
| VPADDQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 D4 /r | avx512 | Add packed quadword integers from zmm2, zmm3/m512/m64bcst and store in zmm1 using writemask k1. | 
| PADDSB mm, mm/m64 | 0F EC /r | mmx | Add packed signed byte integers from mm/m64 and mm and saturate the results. | 
| PADDSB xmm1, xmm2/m128 | 66 0F EC /r | sse2 | Add packed signed byte integers from xmm2/m128 and xmm1 and saturate the results. |
| PADDSW mm, mm/m64 | 0F ED /r | mmx | Add packed signed word integers from mm/m64 and mm and saturate the results. | 
| PADDSW xmm1, xmm2/m128 | 66 0F ED /r | sse2 | Add packed signed word integers from xmm2/m128 and xmm1 and saturate the results. | 
| VPADDSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG EC /r | avx | Add packed signed byte integers from xmm3/m128 and xmm2 and saturate the results. |
| VPADDSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG ED /r | avx | Add packed signed word integers from xmm3/m128 and xmm2 and saturate the results. | 
| VPADDSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG EC /r | avx2 | Add packed signed byte integers from ymm2, and ymm3/m256 and store the saturated results in ymm1. | 
| VPADDSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG ED /r | avx2 | Add packed signed word integers from ymm2, and ymm3/m256 and store the saturated results in ymm1. | 
| VPADDSB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG EC /r | avx512 | Add packed signed byte integers from xmm2, and xmm3/m128 and store the saturated results in xmm1 under writemask k1. | 
| VPADDSB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG EC /r | avx512 | Add packed signed byte integers from ymm2, and ymm3/m256 and store the saturated results in ymm1 under writemask k1. | 
| VPADDSB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG EC /r | avx512 | Add packed signed byte integers from zmm2, and zmm3/m512 and store the saturated results in zmm1 under writemask k1. | 
| VPADDSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG ED /r | avx512 | Add packed signed word integers from xmm2, and xmm3/m128 and store the saturated results in xmm1 under writemask k1. | 
| VPADDSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG ED /r | avx512 | Add packed signed word integers from ymm2, and ymm3/m256 and store the saturated results in ymm1 under writemask k1. | 
| VPADDSW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG ED /r | avx512 | Add packed signed word integers from zmm2, and zmm3/m512 and store the saturated results in zmm1 under writemask k1. | 
| PADDUSB mm, mm/m64 | 0F DC /r | mmx | Add packed unsigned byte integers from mm/m64 and mm and saturate the results. | 
| PADDUSB xmm1, xmm2/m128 | 66 0F DC /r | sse2 | Add packed unsigned byte integers from xmm2/m128 and xmm1 and saturate the results. |
| PADDUSW mm, mm/m64 | 0F DD /r | mmx | Add packed unsigned word integers from mm/m64 and mm and saturate the results. | 
| PADDUSW xmm1, xmm2/m128 | 66 0F DD /r | sse2 | Add packed unsigned word integers from xmm2/m128 to xmm1 and saturate the results. | 
| VPADDUSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG DC /r | avx | Add packed unsigned byte integers from xmm3/m128 to xmm2 and saturate the results. |
| VPADDUSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG DD /r | avx | Add packed unsigned word integers from xmm3/m128 to xmm2 and saturate the results. | 
| VPADDUSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG DC /r | avx2 | Add packed unsigned byte integers from ymm2, and ymm3/m256 and store the saturated results in ymm1. | 
| VPADDUSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG DD /r | avx2 | Add packed unsigned word integers from ymm2, and ymm3/m256 and store the saturated results in ymm1. | 
| VPADDUSB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG DC /r | avx512 | Add packed unsigned byte integers from xmm2, and xmm3/m128 and store the saturated results in xmm1 under writemask k1. | 
| VPADDUSB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG DC /r | avx512 | Add packed unsigned byte integers from ymm2, and ymm3/m256 and store the saturated results in ymm1 under writemask k1. | 
| VPADDUSB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG DC /r | avx512 | Add packed unsigned byte integers from zmm2, and zmm3/m512 and store the saturated results in zmm1 under writemask k1. | 
| VPADDUSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG DD /r | avx512 | Add packed unsigned word integers from xmm2, and xmm3/m128 and store the saturated results in xmm1 under writemask k1. | 
| VPADDUSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG DD /r | avx512 | Add packed unsigned word integers from ymm2, and ymm3/m256 and store the saturated results in ymm1 under writemask k1. | 
| PALIGNR mm1, mm2/m64, imm8 | 0F 3A 0F /r ib | ssse3 | Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into mm1. | 
| PALIGNR xmm1, xmm2/m128, imm8 | 66 0F 3A 0F /r ib | ssse3 | Concatenate destination and source operands, extract byte-aligned result shifted to the right by constant value in imm8 into xmm1. | 
| VPALIGNR xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 0F /r ib | avx | Concatenate xmm2 and xmm3/m128, extract byte aligned result shifted to the right by constant value in imm8 and result is stored in xmm1. | 
| VPALIGNR ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 0F /r ib | avx2 | Concatenate pairs of 16 bytes in ymm2 and ymm3/m256 into 32-byte intermediate result, extract byte-aligned, 16-byte result shifted to the right by constant values in imm8 from each intermediate result, and two 16-byte results are stored in ymm1. | 
| VPALIGNR xmm1 {k1}{z}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.WIG 0F /r ib | avx512 | Concatenate xmm2 and xmm3/m128 into a 32-byte intermediate result, extract byte aligned result shifted to the right by constant value in imm8 and result is stored in xmm1. | 
| VPALIGNR ymm1 {k1}{z}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.WIG 0F /r ib | avx512 | Concatenate pairs of 16 bytes in ymm2 and ymm3/m256 into 32-byte intermediate result, extract byte-aligned, 16-byte result shifted to the right by constant values in imm8 from each intermediate result, and two 16-byte results are stored in ymm1. | 
| VPALIGNR zmm1 {k1}{z}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.WIG 0F /r ib | avx512 | Concatenate pairs of 16 bytes in zmm2 and zmm3/m512 into 32-byte intermediate result, extract byte-aligned, 16-byte result shifted to the right by constant values in imm8 from each intermediate result, and four 16-byte results are stored in zmm1. |
| PAND mm, mm/m64 | 0F DB /r | mmx | Bitwise AND mm/m64 and mm. | 
| PAND xmm1, xmm2/m128 | 66 0F DB /r | sse2 | Bitwise AND of xmm2/m128 and xmm1. | 
| VPAND xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG DB /r | avx | Bitwise AND of xmm3/m128 and xmm2. |
| VPAND ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG DB /r | avx2 | Bitwise AND of ymm2, and ymm3/m256 and store result in ymm1. |
| VPANDD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 DB /r | avx512 | Bitwise AND of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and store result in xmm1 using writemask k1. | 
| VPANDD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 DB /r | avx512 | Bitwise AND of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and store result in ymm1 using writemask k1. | 
| VPANDD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 DB /r | avx512 | Bitwise AND of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and store result in zmm1 using writemask k1. | 
| VPANDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 DB /r | avx512 | Bitwise AND of packed quadword integers in xmm2 and xmm3/m128/m64bcst and store result in xmm1 using writemask k1. | 
| VPANDQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 DB /r | avx512 | Bitwise AND of packed quadword integers in ymm2 and ymm3/m256/m64bcst and store result in ymm1 using writemask k1. | 
| VPANDQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 DB /r | avx512 | Bitwise AND of packed quadword integers in zmm2 and zmm3/m512/m64bcst and store result in zmm1 using writemask k1. | 
| PANDN mm, mm/m64 | 0F DF /r | mmx | Bitwise AND NOT of mm/m64 and mm. | 
| PANDN xmm1, xmm2/m128 | 66 0F DF /r | sse2 | Bitwise AND NOT of xmm2/m128 and xmm1. | 
| VPANDN xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG DF /r | avx | Bitwise AND NOT of xmm3/m128 and xmm2. | 
| VPANDN ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG DF /r | avx2 | Bitwise AND NOT of ymm2, and ymm3/m256 and store result in ymm1. | 
| VPANDND xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 DF /r | avx512 | Bitwise AND NOT of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and store result in xmm1 using writemask k1. | 
| VPANDND ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 DF /r | avx512 | Bitwise AND NOT of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and store result in ymm1 using writemask k1. | 
| VPANDND zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 DF /r | avx512 | Bitwise AND NOT of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and store result in zmm1 using writemask k1. | 
| VPANDNQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 DF /r | avx512 | Bitwise AND NOT of packed quadword integers in xmm2 and xmm3/m128/m64bcst and store result in xmm1 using writemask k1. | 
| VPANDNQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 DF /r | avx512 | Bitwise AND NOT of packed quadword integers in ymm2 and ymm3/m256/m64bcst and store result in ymm1 using writemask k1. | 
| VPANDNQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 DF /r | avx512 | Bitwise AND NOT of packed quadword integers in zmm2 and zmm3/m512/m64bcst and store result in zmm1 using writemask k1. | 
| PAUSE | F3 90 | | Gives hint to processor that improves performance of spin-wait loops. |
| PAVGB mm1, mm2/m64 | 0F E0 /r | sse | Average packed unsigned byte integers from mm2/m64 and mm1 with rounding. | 
| PAVGB xmm1, xmm2/m128 | 66 0F E0 /r | sse2 | Average packed unsigned byte integers from xmm2/m128 and xmm1 with rounding. |
| PAVGW mm1, mm2/m64 | 0F E3 /r | sse | Average packed unsigned word integers from mm2/m64 and mm1 with rounding. | 
| PAVGW xmm1, xmm2/m128 | 66 0F E3 /r | sse2 | Average packed unsigned word integers from xmm2/m128 and xmm1 with rounding. | 
| VPAVGB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E0 /r | avx | Average packed unsigned byte integers from xmm3/m128 and xmm2 with rounding. | 
| VPAVGW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E3 /r | avx | Average packed unsigned word integers from xmm3/m128 and xmm2 with rounding. | 
| VPAVGB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E0 /r | avx2 | Average packed unsigned byte integers from ymm2, and ymm3/m256 with rounding and store to ymm1. | 
| VPAVGW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E3 /r | avx2 | Average packed unsigned word integers from ymm2, ymm3/m256 with rounding to ymm1. | 
| VPAVGB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E0 /r | avx512 | Average packed unsigned byte integers from xmm2, and xmm3/m128 with rounding and store to xmm1 under writemask k1. | 
| VPAVGB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E0 /r | avx512 | Average packed unsigned byte integers from ymm2, and ymm3/m256 with rounding and store to ymm1 under writemask k1. | 
| VPAVGB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG E0 /r | avx512 | Average packed unsigned byte integers from zmm2, and zmm3/m512 with rounding and store to zmm1 under writemask k1. | 
| VPAVGW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E3 /r | avx512 | Average packed unsigned word integers from xmm2, xmm3/m128 with rounding to xmm1 under writemask k1. | 
| VPAVGW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E3 /r | avx512 | Average packed unsigned word integers from ymm2, ymm3/m256 with rounding to ymm1 under writemask k1. | 
| VPAVGW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG E3 /r | avx512 | Average packed unsigned word integers from zmm2, zmm3/m512 with rounding to zmm1 under writemask k1. | 
| PBLENDVB xmm1, xmm2/m128, <XMM0> | 66 0F 38 10 /r | sse4.1 | Select byte values from xmm1 and xmm2/m128 from mask specified in the high bit of each byte in XMM0 and store the values into xmm1. | 
| VPBLENDVB xmm1, xmm2, xmm3/m128, xmm4 | VEX.NDS.128.66.0F3A.W0 4C /r /is4 | avx | Select byte values from xmm2 and xmm3/m128 using mask bits in the specified mask register, xmm4, and store the values into xmm1. | 
| VPBLENDVB ymm1, ymm2, ymm3/m256, ymm4 | VEX.NDS.256.66.0F3A.W0 4C /r /is4 | avx2 | Select byte values from ymm2 and ymm3/m256 from mask specified in the high bit of each byte in ymm4 and store the values into ymm1. | 
| PBLENDW xmm1, xmm2/m128, imm8 | 66 0F 3A 0E /r ib | sse4.1 | Select words from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1. | 
| VPBLENDW xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 0E /r ib | avx | Select words from xmm2 and xmm3/m128 from mask specified in imm8 and store the values into xmm1. | 
| VPBLENDW ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.WIG 0E /r ib | avx2 | Select words from ymm2 and ymm3/m256 from mask specified in imm8 and store the values into ymm1. | 
| PCLMULQDQ xmm1, xmm2/m128, imm8 | 66 0F 3A 44 /r ib | clmul | Carry-less multiplication of one quadword of xmm1 by one quadword of xmm2/m128, stores the 128-bit result in xmm1. The immediate is used to determine which quadwords of xmm1 and xmm2/m128 should be used. |
| VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.WIG 44 /r ib | avx | Carry-less multiplication of one quadword of xmm2 by one quadword of xmm3/m128, stores the 128-bit result in xmm1. The immediate is used to determine which quadwords of xmm2 and xmm3/m128 should be used. |
| PCMPEQB mm, mm/m64 | 0F 74 /r | mmx | Compare packed bytes in mm/m64 and mm for equality. | 
| PCMPEQB xmm1, xmm2/m128 | 66 0F 74 /r | sse2 | Compare packed bytes in xmm2/m128 and xmm1 for equality. | 
| PCMPEQW mm, mm/m64 | 0F 75 /r | mmx | Compare packed words in mm/m64 and mm for equality. | 
| PCMPEQW xmm1, xmm2/m128 | 66 0F 75 /r | sse2 | Compare packed words in xmm2/m128 and xmm1 for equality. | 
| PCMPEQD mm, mm/m64 | 0F 76 /r | mmx | Compare packed doublewords in mm/m64 and mm for equality. | 
| PCMPEQD xmm1, xmm2/m128 | 66 0F 76 /r | sse2 | Compare packed doublewords in xmm2/m128 and xmm1 for equality. | 
| VPCMPEQB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 74 /r | avx | Compare packed bytes in xmm3/m128 and xmm2 for equality. | 
| VPCMPEQW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 75 /r | avx | Compare packed words in xmm3/m128 and xmm2 for equality. | 
| VPCMPEQD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 76 /r | avx | Compare packed doublewords in xmm3/m128 and xmm2 for equality. | 
| VPCMPEQB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 74 /r | avx2 | Compare packed bytes in ymm3/m256 and ymm2 for equality. |
| VPCMPEQW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 75 /r | avx2 | Compare packed words in ymm3/m256 and ymm2 for equality. |
| VPCMPEQD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 76 /r | avx2 | Compare packed doublewords in ymm3/m256 and ymm2 for equality. |
| VPCMPEQD k1 {k2}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 76 /r | avx512 | Compare Equal between int32 vector xmm2 and int32 vector xmm3/m128/m32bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPEQD k1 {k2}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 76 /r | avx512 | Compare Equal between int32 vector ymm2 and int32 vector ymm3/m256/m32bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPEQD k1 {k2}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 76 /r | avx512 | Compare Equal between int32 vectors in zmm2 and zmm3/m512/m32bcst, and set destination k1 according to the comparison results under writemask k2. |
| VPCMPEQB k1 {k2}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 74 /r | avx512 | Compare packed bytes in xmm3/m128 and xmm2 for equality and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. |
| PCMPEQQ xmm1, xmm2/m128 | 66 0F 38 29 /r | sse4.1 | Compare packed qwords in xmm2/m128 and xmm1 for equality. | 
| VPCMPEQQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 29 /r | avx | Compare packed quadwords in xmm3/m128 and xmm2 for equality. | 
| VPCMPEQQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 29 /r | avx2 | Compare packed quadwords in ymm3/m256 and ymm2 for equality. | 
| VPCMPEQQ k1 {k2}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 29 /r | avx512 | Compare Equal between int64 vector xmm2 and int64 vector xmm3/m128/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPEQQ k1 {k2}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 29 /r | avx512 | Compare Equal between int64 vector ymm2 and int64 vector ymm3/m256/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPEQQ k1 {k2}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 29 /r | avx512 | Compare Equal between int64 vector zmm2 and int64 vector zmm3/m512/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| PCMPESTRI xmm1, xmm2/m128, imm8 | 66 0F 3A 61 /r imm8 | sse4.2 | Perform a packed comparison of string data with explicit lengths, generating an index, and storing the result in ECX. | 
| VPCMPESTRI xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 61 /r ib | avx | Perform a packed comparison of string data with explicit lengths, generating an index, and storing the result in ECX. | 
| PCMPESTRM xmm1, xmm2/m128, imm8 | 66 0F 3A 60 /r imm8 | sse4.2 | Perform a packed comparison of string data with explicit lengths, generating a mask, and storing the result in XMM0. | 
| VPCMPESTRM xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 60 /r ib | avx | Perform a packed comparison of string data with explicit lengths, generating a mask, and storing the result in XMM0. | 
| PCMPGTB mm, mm/m64 | 0F 64 /r | mmx | Compare packed signed byte integers in mm and mm/m64 for greater than. | 
| PCMPGTB xmm1, xmm2/m128 | 66 0F 64 /r | sse2 | Compare packed signed byte integers in xmm1 and xmm2/m128 for greater than. | 
| PCMPGTW mm, mm/m64 | 0F 65 /r | mmx | Compare packed signed word integers in mm and mm/m64 for greater than. | 
| PCMPGTW xmm1, xmm2/m128 | 66 0F 65 /r | sse2 | Compare packed signed word integers in xmm1 and xmm2/m128 for greater than. | 
| PCMPGTD mm, mm/m64 | 0F 66 /r | mmx | Compare packed signed doubleword integers in mm and mm/m64 for greater than. | 
| PCMPGTD xmm1, xmm2/m128 | 66 0F 66 /r | sse2 | Compare packed signed doubleword integers in xmm1 and xmm2/m128 for greater than. | 
| VPCMPGTB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 64 /r | avx | Compare packed signed byte integers in xmm2 and xmm3/m128 for greater than. | 
| VPCMPGTW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 65 /r | avx | Compare packed signed word integers in xmm2 and xmm3/m128 for greater than. | 
| VPCMPGTD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 66 /r | avx | Compare packed signed doubleword integers in xmm2 and xmm3/m128 for greater than. | 
| VPCMPGTB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 64 /r | avx2 | Compare packed signed byte integers in ymm2 and ymm3/m256 for greater than. | 
| VPCMPGTW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 65 /r | avx2 | Compare packed signed word integers in ymm2 and ymm3/m256 for greater than. | 
| VPCMPGTD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 66 /r | avx2 | Compare packed signed doubleword integers in ymm2 and ymm3/m256 for greater than. | 
| VPCMPGTD k1 {k2}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 66 /r | avx512 | Compare Greater between int32 vector xmm2 and int32 vector xmm3/m128/m32bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPGTD k1 {k2}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 66 /r | avx512 | Compare Greater between int32 vector ymm2 and int32 vector ymm3/m256/m32bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPGTD k1 {k2}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 66 /r | avx512 | Compare Greater between int32 elements in zmm2 and zmm3/m512/m32bcst, and set destination k1 according to the comparison results under writemask k2. | 
| VPCMPGTB k1 {k2}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 64 /r | avx512 | Compare packed signed byte integers in xmm2 and xmm3/m128 for greater than, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPGTB k1 {k2}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG 64 /r | avx512 | Compare packed signed byte integers in ymm2 and ymm3/m256 for greater than, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| PCMPGTQ xmm1, xmm2/m128 | 66 0F 38 37 /r | sse4.2 | Compare packed signed qwords in xmm2/m128 and xmm1 for greater than. | 
| VPCMPGTQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 37 /r | avx | Compare packed signed qwords in xmm2 and xmm3/m128 for greater than. | 
| VPCMPGTQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 37 /r | avx2 | Compare packed signed qwords in ymm2 and ymm3/m256 for greater than. | 
| VPCMPGTQ k1 {k2}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 37 /r | avx512 | Compare Greater between int64 vector xmm2 and int64 vector xmm3/m128/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPGTQ k1 {k2}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 37 /r | avx512 | Compare Greater between int64 vector ymm2 and int64 vector ymm3/m256/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| VPCMPGTQ k1 {k2}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 37 /r | avx512 | Compare Greater between int64 vector zmm2 and int64 vector zmm3/m512/m64bcst, and set vector mask k1 to reflect the zero/nonzero status of each element of the result, under writemask. | 
| PCMPISTRI xmm1, xmm2/m128, imm8 | 66 0F 3A 63 /r imm8 | sse4.2 | Perform a packed comparison of string data with implicit lengths, generating an index, and storing the result in ECX. | 
| VPCMPISTRI xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 63 /r ib | avx | Perform a packed comparison of string data with implicit lengths, generating an index, and storing the result in ECX. | 
| PCMPISTRM xmm1, xmm2/m128, imm8 | 66 0F 3A 62 /r imm8 | sse4.2 | Perform a packed comparison of string data with implicit lengths, generating a mask, and storing the result in XMM0. | 
| VPCMPISTRM xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 62 /r ib | avx | Perform a packed comparison of string data with implicit lengths, generating a mask, and storing the result in XMM0. | 
| PDEP r32a, r32b, r/m32 | VEX.NDS.LZ.F2.0F38.W0 F5 /r | bmi2 | Parallel deposit of bits from r32b using mask in r/m32, result is written to r32a. | 
| PDEP r64a, r64b, r/m64 | VEX.NDS.LZ.F2.0F38.W1 F5 /r | bmi2 | Parallel deposit of bits from r64b using mask in r/m64, result is written to r64a. | 
| PEXT r32a, r32b, r/m32 | VEX.NDS.LZ.F3.0F38.W0 F5 /r | bmi2 | Parallel extract of bits from r32b using mask in r/m32, result is written to r32a. | 
| PEXT r64a, r64b, r/m64 | VEX.NDS.LZ.F3.0F38.W1 F5 /r | bmi2 | Parallel extract of bits from r64b using mask in r/m64, result is written to r64a. | 
| PEXTRB reg/m8, xmm2, imm8 | 66 0F 3A 14 /r ib | sse4.1 | Extract a byte integer value from xmm2 at the source byte offset specified by imm8 into reg or m8. The upper bits of r32 or r64 are zeroed. | 
| PEXTRD r/m32, xmm2, imm8 | 66 0F 3A 16 /r ib | sse4.1 | Extract a dword integer value from xmm2 at the source dword offset specified by imm8 into r/m32. | 
| PEXTRQ r/m64, xmm2, imm8 | 66 REX.W 0F 3A 16 /r ib | sse4.1 | Extract a qword integer value from xmm2 at the source qword offset specified by imm8 into r/m64. | 
| VPEXTRB reg/m8, xmm2, imm8 | VEX.128.66.0F3A.W0 14 /r ib | avx | Extract a byte integer value from xmm2 at the source byte offset specified by imm8 into reg or m8. The upper bits of r64/r32 are filled with zeros. | 
| VPEXTRD r32/m32, xmm2, imm8 | VEX.128.66.0F3A.W0 16 /r ib | avx | Extract a dword integer value from xmm2 at the source dword offset specified by imm8 into r32/m32. | 
| VPEXTRQ r64/m64, xmm2, imm8 | VEX.128.66.0F3A.W1 16 /r ib | avx | Extract a qword integer value from xmm2 at the source qword offset specified by imm8 into r64/m64. | 
| VPEXTRB reg/m8, xmm2, imm8 | EVEX.128.66.0F3A.WIG 14 /r ib | avx512 | Extract a byte integer value from xmm2 at the source byte offset specified by imm8 into reg or m8. The upper bits of r64/r32 are filled with zeros. | 
| VPEXTRD r32/m32, xmm2, imm8 | EVEX.128.66.0F3A.W0 16 /r ib | avx512 | Extract a dword integer value from xmm2 at the source dword offset specified by imm8 into r32/m32. | 
| VPEXTRQ r64/m64, xmm2, imm8 | EVEX.128.66.0F3A.W1 16 /r ib | avx512 | Extract a qword integer value from xmm2 at the source qword offset specified by imm8 into r64/m64. | 
| PEXTRW reg, mm, imm8 | 0F C5 /r ib | sse | Extract the word specified by imm8 from mm and move it to reg, bits 15-0. The upper bits of r32 or r64 are zeroed. | 
| PEXTRW reg, xmm, imm8 | 66 0F C5 /r ib | sse2 | Extract the word specified by imm8 from xmm and move it to reg, bits 15-0. The upper bits of r32 or r64 are zeroed. | 
| PEXTRW reg/m16, xmm, imm8 | 66 0F 3A 15 /r ib | sse4.1 | Extract the word specified by imm8 from xmm and copy it to lowest 16 bits of reg or m16. Zero-extend the result in the destination, r32 or r64. | 
| VPEXTRW reg, xmm1, imm8 | VEX.128.66.0F.W0 C5 /r ib | avx | Extract the word specified by imm8 from xmm1 and move it to reg, bits 15:0. Zero-extend the result. The upper bits of r64/r32 are filled with zeros. | 
| VPEXTRW reg/m16, xmm2, imm8 | VEX.128.66.0F3A.W0 15 /r ib | avx | Extract a word integer value from xmm2 at the source word offset specified by imm8 into reg or m16. The upper bits of r64/r32 are filled with zeros. | 
| VPEXTRW reg, xmm1, imm8 | EVEX.128.66.0F.WIG C5 /r ib | avx512 | Extract the word specified by imm8 from xmm1 and move it to reg, bits 15:0. Zero-extend the result. The upper bits of r64/r32 are filled with zeros. | 
| VPEXTRW reg/m16, xmm2, imm8 | EVEX.128.66.0F3A.WIG 15 /r ib | avx512 | Extract a word integer value from xmm2 at the source word offset specified by imm8 into reg or m16. The upper bits of r64/r32 are filled with zeros. | 
| PHADDSW mm1, mm2/m64 | 0F 38 03 /r | ssse3 | Add 16-bit signed integers horizontally, pack saturated integers to mm1. | 
| PHADDSW xmm1, xmm2/m128 | 66 0F 38 03 /r | ssse3 | Add 16-bit signed integers horizontally, pack saturated integers to xmm1. | 
| VPHADDSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 03 /r | avx | Add 16-bit signed integers horizontally, pack saturated integers to xmm1. | 
| VPHADDSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 03 /r | avx2 | Add 16-bit signed integers horizontally, pack saturated integers to ymm1. | 
| PHADDW mm1, mm2/m64 | 0F 38 01 /r | ssse3 | Add 16-bit integers horizontally, pack to mm1. | 
| PHADDW xmm1, xmm2/m128 | 66 0F 38 01 /r | ssse3 | Add 16-bit integers horizontally, pack to xmm1. | 
| PHADDD mm1, mm2/m64 | 0F 38 02 /r | ssse3 | Add 32-bit integers horizontally, pack to mm1. | 
| PHADDD xmm1, xmm2/m128 | 66 0F 38 02 /r | ssse3 | Add 32-bit integers horizontally, pack to xmm1. | 
| VPHADDW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 01 /r | avx | Add 16-bit integers horizontally, pack to xmm1. | 
| VPHADDD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 02 /r | avx | Add 32-bit integers horizontally, pack to xmm1. | 
| VPHADDW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 01 /r | avx2 | Add 16-bit signed integers horizontally, pack to ymm1. | 
| VPHADDD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 02 /r | avx2 | Add 32-bit signed integers horizontally, pack to ymm1. | 
| PHMINPOSUW xmm1, xmm2/m128 | 66 0F 38 41 /r | sse4.1 | Find the minimum unsigned word in xmm2/m128 and place its value in the low word of xmm1 and its index in the second-lowest word of xmm1. | 
| VPHMINPOSUW xmm1, xmm2/m128 | VEX.128.66.0F38.WIG 41 /r | avx | Find the minimum unsigned word in xmm2/m128 and place its value in the low word of xmm1 and its index in the second-lowest word of xmm1. | 
| PHSUBSW mm1, mm2/m64 | 0F 38 07 /r | ssse3 | Subtract 16-bit signed integers horizontally, pack saturated integers to mm1. | 
| PHSUBSW xmm1, xmm2/m128 | 66 0F 38 07 /r | ssse3 | Subtract 16-bit signed integers horizontally, pack saturated integers to xmm1. | 
| VPHSUBSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 07 /r | avx | Subtract 16-bit signed integers horizontally, pack saturated integers to xmm1. | 
| VPHSUBSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 07 /r | avx2 | Subtract 16-bit signed integers horizontally, pack saturated integers to ymm1. | 
| PHSUBW mm1, mm2/m64 | 0F 38 05 /r | ssse3 | Subtract 16-bit signed integers horizontally, pack to mm1. | 
| PHSUBW xmm1, xmm2/m128 | 66 0F 38 05 /r | ssse3 | Subtract 16-bit signed integers horizontally, pack to xmm1. | 
| PHSUBD mm1, mm2/m64 | 0F 38 06 /r | ssse3 | Subtract 32-bit signed integers horizontally, pack to mm1. | 
| PHSUBD xmm1, xmm2/m128 | 66 0F 38 06 /r | ssse3 | Subtract 32-bit signed integers horizontally, pack to xmm1. | 
| VPHSUBW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 05 /r | avx | Subtract 16-bit signed integers horizontally, pack to xmm1. | 
| VPHSUBD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 06 /r | avx | Subtract 32-bit signed integers horizontally, pack to xmm1. | 
| VPHSUBW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 05 /r | avx2 | Subtract 16-bit signed integers horizontally, pack to ymm1. | 
| VPHSUBD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 06 /r | avx2 | Subtract 32-bit signed integers horizontally, pack to ymm1. | 
| PINSRB xmm1, r32/m8, imm8 | 66 0F 3A 20 /r ib | sse4.1 | Insert a byte integer value from r32/m8 into xmm1 at the destination element in xmm1 specified by imm8. | 
| PINSRD xmm1, r/m32, imm8 | 66 0F 3A 22 /r ib | sse4.1 | Insert a dword integer value from r/m32 into the xmm1 at the destination element specified by imm8. | 
| PINSRQ xmm1, r/m64, imm8 | 66 REX.W 0F 3A 22 /r ib | sse4.1 | Insert a qword integer value from r/m64 into the xmm1 at the destination element specified by imm8. | 
| VPINSRB xmm1, xmm2, r32/m8, imm8 | VEX.NDS.128.66.0F3A.W0 20 /r ib | avx | Merge a byte integer value from r32/m8 and rest from xmm2 into xmm1 at the byte offset in imm8. | 
| VPINSRD xmm1, xmm2, r/m32, imm8 | VEX.NDS.128.66.0F3A.W0 22 /r ib | avx | Insert a dword integer value from r32/m32 and rest from xmm2 into xmm1 at the dword offset in imm8. | 
| VPINSRQ xmm1, xmm2, r/m64, imm8 | VEX.NDS.128.66.0F3A.W1 22 /r ib | avx | Insert a qword integer value from r64/m64 and rest from xmm2 into xmm1 at the qword offset in imm8. | 
| VPINSRB xmm1, xmm2, r32/m8, imm8 | EVEX.NDS.128.66.0F3A.WIG 20 /r ib | avx512 | Merge a byte integer value from r32/m8 and rest from xmm2 into xmm1 at the byte offset in imm8. | 
| VPINSRD xmm1, xmm2, r32/m32, imm8 | EVEX.NDS.128.66.0F3A.W0 22 /r ib | avx512 | Insert a dword integer value from r32/m32 and rest from xmm2 into xmm1 at the dword offset in imm8. | 
| VPINSRQ xmm1, xmm2, r64/m64, imm8 | EVEX.NDS.128.66.0F3A.W1 22 /r ib | avx512 | Insert a qword integer value from r64/m64 and rest from xmm2 into xmm1 at the qword offset in imm8. | 
| PINSRW mm, r32/m16, imm8 | 0F C4 /r ib | sse | Insert the low word from r32 or from m16 into mm at the word position specified by imm8. | 
| PINSRW xmm, r32/m16, imm8 | 66 0F C4 /r ib | sse2 | Move the low word of r32 or from m16 into xmm at the word position specified by imm8. | 
| VPINSRW xmm1, xmm2, r32/m16, imm8 | VEX.NDS.128.66.0F.W0 C4 /r ib | avx | Insert a word integer value from r32/m16 and rest from xmm2 into xmm1 at the word offset in imm8. | 
| VPINSRW xmm1, xmm2, r32/m16, imm8 | EVEX.NDS.128.66.0F.WIG C4 /r ib | avx512 | Insert a word integer value from r32/m16 and rest from xmm2 into xmm1 at the word offset in imm8. | 
| PMADDUBSW mm1, mm2/m64 | 0F 38 04 /r | ssse3 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to mm1. | 
| PMADDUBSW xmm1, xmm2/m128 | 66 0F 38 04 /r | ssse3 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to xmm1. | 
| VPMADDUBSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 04 /r | avx | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to xmm1. | 
| VPMADDUBSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 04 /r | avx2 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to ymm1. | 
| VPMADDUBSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 04 /r | avx512 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to xmm1 under writemask k1. | 
| VPMADDUBSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 04 /r | avx512 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to ymm1 under writemask k1. | 
| VPMADDUBSW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 04 /r | avx512 | Multiply signed and unsigned bytes, add horizontal pair of signed words, pack saturated signed-words to zmm1 under writemask k1. | 
| PMADDWD mm, mm/m64 | 0F F5 /r | mmx | Multiply the packed words in mm by the packed words in mm/m64, add adjacent doubleword results, and store in mm. | 
| PMADDWD xmm1, xmm2/m128 | 66 0F F5 /r | sse2 | Multiply the packed word integers in xmm1 by the packed word integers in xmm2/m128, add adjacent doubleword results, and store in xmm1. | 
| VPMADDWD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F5 /r | avx | Multiply the packed word integers in xmm2 by the packed word integers in xmm3/m128, add adjacent doubleword results, and store in xmm1. | 
| VPMADDWD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG F5 /r | avx2 | Multiply the packed word integers in ymm2 by the packed word integers in ymm3/m256, add adjacent doubleword results, and store in ymm1. | 
| VPMADDWD xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG F5 /r | avx512 | Multiply the packed word integers in xmm2 by the packed word integers in xmm3/m128, add adjacent doubleword results, and store in xmm1 under writemask k1. | 
| VPMADDWD ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG F5 /r | avx512 | Multiply the packed word integers in ymm2 by the packed word integers in ymm3/m256, add adjacent doubleword results, and store in ymm1 under writemask k1. | 
| VPMADDWD zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG F5 /r | avx512 | Multiply the packed word integers in zmm2 by the packed word integers in zmm3/m512, add adjacent doubleword results, and store in zmm1 under writemask k1. | 
| PMAXSW mm1, mm2/m64 | 0F EE /r | sse | Compare signed word integers in mm2/m64 and mm1 and return maximum values. | 
| PMAXSB xmm1, xmm2/m128 | 66 0F 38 3C /r | sse4.1 | Compare packed signed byte integers in xmm1 and xmm2/m128 and store packed maximum values in xmm1. | 
| PMAXSW xmm1, xmm2/m128 | 66 0F EE /r | sse2 | Compare packed signed word integers in xmm2/m128 and xmm1 and store maximum packed values in xmm1. | 
| PMAXSD xmm1, xmm2/m128 | 66 0F 38 3D /r | sse4.1 | Compare packed signed dword integers in xmm1 and xmm2/m128 and store packed maximum values in xmm1. | 
| VPMAXSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 3C /r | avx | Compare packed signed byte integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1. | 
| VPMAXSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG EE /r | avx | Compare packed signed word integers in xmm3/m128 and xmm2 and store packed maximum values in xmm1. | 
| VPMAXSD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 3D /r | avx | Compare packed signed dword integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1. | 
| VPMAXSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 3C /r | avx2 | Compare packed signed byte integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1. | 
| VPMAXSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG EE /r | avx2 | Compare packed signed word integers in ymm3/m256 and ymm2 and store packed maximum values in ymm1. | 
| VPMAXSD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 3D /r | avx2 | Compare packed signed dword integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1. | 
| VPMAXSB xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 3C /r | avx512 | Compare packed signed byte integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXSB ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 3C /r | avx512 | Compare packed signed byte integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXSB zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 3C /r | avx512 | Compare packed signed byte integers in zmm2 and zmm3/m512 and store packed maximum values in zmm1 under writemask k1. | 
| VPMAXSW xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG EE /r | avx512 | Compare packed signed word integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXSW ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG EE /r | avx512 | Compare packed signed word integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXSW zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG EE /r | avx512 | Compare packed signed word integers in zmm2 and zmm3/m512 and store packed maximum values in zmm1 under writemask k1. | 
| VPMAXSD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 3D /r | avx512 | Compare packed signed dword integers in xmm2 and xmm3/m128/m32bcst and store packed maximum values in xmm1 using writemask k1. | 
| VPMAXSD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 3D /r | avx512 | Compare packed signed dword integers in ymm2 and ymm3/m256/m32bcst and store packed maximum values in ymm1 using writemask k1. | 
| VPMAXSD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 3D /r | avx512 | Compare packed signed dword integers in zmm2 and zmm3/m512/m32bcst and store packed maximum values in zmm1 using writemask k1. | 
| VPMAXSQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 3D /r | avx512 | Compare packed signed qword integers in xmm2 and xmm3/m128/m64bcst and store packed maximum values in xmm1 using writemask k1. | 
| VPMAXSQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 3D /r | avx512 | Compare packed signed qword integers in ymm2 and ymm3/m256/m64bcst and store packed maximum values in ymm1 using writemask k1. | 
| VPMAXSQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 3D /r | avx512 | Compare packed signed qword integers in zmm2 and zmm3/m512/m64bcst and store packed maximum values in zmm1 using writemask k1. | 
| PMAXUB mm1, mm2/m64 | 0F DE /r | sse | Compare unsigned byte integers in mm2/m64 and mm1 and return maximum values. | 
| PMAXUB xmm1, xmm2/m128 | 66 0F DE /r | sse2 | Compare packed unsigned byte integers in xmm1 and xmm2/m128 and store packed maximum values in xmm1. | 
| PMAXUW xmm1, xmm2/m128 | 66 0F 38 3E /r | sse4.1 | Compare packed unsigned word integers in xmm2/m128 and xmm1 and store maximum packed values in xmm1. | 
| VPMAXUB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F DE /r | avx | Compare packed unsigned byte integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1. | 
| VPMAXUW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38 3E /r | avx | Compare packed unsigned word integers in xmm3/m128 and xmm2 and store maximum packed values in xmm1. | 
| VPMAXUB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F DE /r | avx2 | Compare packed unsigned byte integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1. | 
| VPMAXUW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38 3E /r | avx2 | Compare packed unsigned word integers in ymm3/m256 and ymm2 and store maximum packed values in ymm1. | 
| VPMAXUB xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG DE /r | avx512 | Compare packed unsigned byte integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXUB ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG DE /r | avx512 | Compare packed unsigned byte integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXUB zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG DE /r | avx512 | Compare packed unsigned byte integers in zmm2 and zmm3/m512 and store packed maximum values in zmm1 under writemask k1. | 
| VPMAXUW xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 3E /r | avx512 | Compare packed unsigned word integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXUW ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 3E /r | avx512 | Compare packed unsigned word integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXUW zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 3E /r | avx512 | Compare packed unsigned word integers in zmm2 and zmm3/m512 and store packed maximum values in zmm1 under writemask k1. | 
| PMAXUD xmm1, xmm2/m128 | 66 0F 38 3F /r | sse4.1 | Compare packed unsigned dword integers in xmm1 and xmm2/m128 and store packed maximum values in xmm1. | 
| VPMAXUD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 3F /r | avx | Compare packed unsigned dword integers in xmm2 and xmm3/m128 and store packed maximum values in xmm1. | 
| VPMAXUD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 3F /r | avx2 | Compare packed unsigned dword integers in ymm2 and ymm3/m256 and store packed maximum values in ymm1. | 
| VPMAXUD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 3F /r | avx512 | Compare packed unsigned dword integers in xmm2 and xmm3/m128/m32bcst and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXUD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 3F /r | avx512 | Compare packed unsigned dword integers in ymm2 and ymm3/m256/m32bcst and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXUD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 3F /r | avx512 | Compare packed unsigned dword integers in zmm2 and zmm3/m512/m32bcst and store packed maximum values in zmm1 under writemask k1. | 
| VPMAXUQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 3F /r | avx512 | Compare packed unsigned qword integers in xmm2 and xmm3/m128/m64bcst and store packed maximum values in xmm1 under writemask k1. | 
| VPMAXUQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 3F /r | avx512 | Compare packed unsigned qword integers in ymm2 and ymm3/m256/m64bcst and store packed maximum values in ymm1 under writemask k1. | 
| VPMAXUQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 3F /r | avx512 | Compare packed unsigned qword integers in zmm2 and zmm3/m512/m64bcst and store packed maximum values in zmm1 under writemask k1. | 
| PMINSW mm1, mm2/m64 | 0F EA /r | sse | Compare signed word integers in mm2/m64 and mm1 and return minimum values. | 
| PMINSB xmm1, xmm2/m128 | 66 0F 38 38 /r | sse4.1 | Compare packed signed byte integers in xmm1 and xmm2/m128 and store packed minimum values in xmm1. | 
| PMINSW xmm1, xmm2/m128 | 66 0F EA /r | sse2 | Compare packed signed word integers in xmm2/m128 and xmm1 and store packed minimum values in xmm1. | 
| VPMINSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38 38 /r | avx | Compare packed signed byte integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1. | 
| VPMINSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F EA /r | avx | Compare packed signed word integers in xmm3/m128 and xmm2 and return packed minimum values in xmm1. | 
| VPMINSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38 38 /r | avx2 | Compare packed signed byte integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1. | 
| VPMINSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F EA /r | avx2 | Compare packed signed word integers in ymm3/m256 and ymm2 and return packed minimum values in ymm1. | 
| VPMINSB xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 38 /r | avx512 | Compare packed signed byte integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1 under writemask k1. | 
| VPMINSB ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 38 /r | avx512 | Compare packed signed byte integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1 under writemask k1. | 
| VPMINSB zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 38 /r | avx512 | Compare packed signed byte integers in zmm2 and zmm3/m512 and store packed minimum values in zmm1 under writemask k1. | 
| VPMINSW xmm1{k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG EA /r | avx512 | Compare packed signed word integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1 under writemask k1. | 
| VPMINSW ymm1{k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG EA /r | avx512 | Compare packed signed word integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1 under writemask k1. | 
| VPMINSW zmm1{k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG EA /r | avx512 | Compare packed signed word integers in zmm2 and zmm3/m512 and store packed minimum values in zmm1 under writemask k1. | 
| PMINSD xmm1, xmm2/m128 | 66 0F 38 39 /r | sse4.1 | Compare packed signed dword integers in xmm1 and xmm2/m128 and store packed minimum values in xmm1. | 
| VPMINSD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 39 /r | avx | Compare packed signed dword integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1. | 
| VPMINSD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 39 /r | avx2 | Compare packed signed dword integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1. | 
| VPMINSD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 39 /r | avx512 | Compare packed signed dword integers in xmm2 and xmm3/m128/m32bcst and store packed minimum values in xmm1 under writemask k1. | 
| VPMINSD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 39 /r | avx512 | Compare packed signed dword integers in ymm2 and ymm3/m256/m32bcst and store packed minimum values in ymm1 under writemask k1. | 
| VPMINSD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 39 /r | avx512 | Compare packed signed dword integers in zmm2 and zmm3/m512/m32bcst and store packed minimum values in zmm1 under writemask k1. | 
| VPMINSQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 39 /r | avx512 | Compare packed signed qword integers in xmm2 and xmm3/m128/m64bcst and store packed minimum values in xmm1 under writemask k1. | 
| VPMINSQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 39 /r | avx512 | Compare packed signed qword integers in ymm2 and ymm3/m256/m64bcst and store packed minimum values in ymm1 under writemask k1. | 
| VPMINSQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 39 /r | avx512 | Compare packed signed qword integers in zmm2 and zmm3/m512/m64bcst and store packed minimum values in zmm1 under writemask k1. | 
| PMINUB mm1, mm2/m64 | 0F DA /r | sse | Compare unsigned byte integers in mm2/m64 and mm1 and return minimum values. | 
| PMINUB xmm1, xmm2/m128 | 66 0F DA /r | sse2 | Compare packed unsigned byte integers in xmm1 and xmm2/m128 and store packed minimum values in xmm1. | 
| PMINUW xmm1, xmm2/m128 | 66 0F 38 3A /r | sse4.1 | Compare packed unsigned word integers in xmm2/m128 and xmm1 and store packed minimum values in xmm1. | 
| VPMINUB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F DA /r | avx | Compare packed unsigned byte integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1. | 
| VPMINUW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38 3A /r | avx | Compare packed unsigned word integers in xmm3/m128 and xmm2 and return packed minimum values in xmm1. |
| VPMINUB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F DA /r | avx2 | Compare packed unsigned byte integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1. | 
| VPMINUW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38 3A /r | avx2 | Compare packed unsigned word integers in ymm3/m256 and ymm2 and return packed minimum values in ymm1. |
| VPMINUB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F DA /r | avx512 | Compare packed unsigned byte integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1 under writemask k1. | 
| VPMINUB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F DA /r | avx512 | Compare packed unsigned byte integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1 under writemask k1. | 
| VPMINUB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F DA /r | avx512 | Compare packed unsigned byte integers in zmm2 and zmm3/m512 and store packed minimum values in zmm1 under writemask k1. | 
| VPMINUW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38 3A /r | avx512 | Compare packed unsigned word integers in xmm3/m128 and xmm2 and return packed minimum values in xmm1 under writemask k1. |
| VPMINUW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38 3A /r | avx512 | Compare packed unsigned word integers in ymm3/m256 and ymm2 and return packed minimum values in ymm1 under writemask k1. |
| VPMINUW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38 3A /r | avx512 | Compare packed unsigned word integers in zmm3/m512 and zmm2 and return packed minimum values in zmm1 under writemask k1. |
| PMINUD xmm1, xmm2/m128 | 66 0F 38 3B /r | sse4.1 | Compare packed unsigned dword integers in xmm1 and xmm2/m128 and store packed minimum values in xmm1. | 
| VPMINUD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 3B /r | avx | Compare packed unsigned dword integers in xmm2 and xmm3/m128 and store packed minimum values in xmm1. | 
| VPMINUD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 3B /r | avx2 | Compare packed unsigned dword integers in ymm2 and ymm3/m256 and store packed minimum values in ymm1. | 
| VPMINUD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 3B /r | avx512 | Compare packed unsigned dword integers in xmm2 and xmm3/m128/m32bcst and store packed minimum values in xmm1 under writemask k1. | 
| VPMINUD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 3B /r | avx512 | Compare packed unsigned dword integers in ymm2 and ymm3/m256/m32bcst and store packed minimum values in ymm1 under writemask k1. | 
| VPMINUD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 3B /r | avx512 | Compare packed unsigned dword integers in zmm2 and zmm3/m512/m32bcst and store packed minimum values in zmm1 under writemask k1. | 
| VPMINUQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 3B /r | avx512 | Compare packed unsigned qword integers in xmm2 and xmm3/m128/m64bcst and store packed minimum values in xmm1 under writemask k1. | 
| VPMINUQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 3B /r | avx512 | Compare packed unsigned qword integers in ymm2 and ymm3/m256/m64bcst and store packed minimum values in ymm1 under writemask k1. | 
| VPMINUQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 3B /r | avx512 | Compare packed unsigned qword integers in zmm2 and zmm3/m512/m64bcst and store packed minimum values in zmm1 under writemask k1. | 
| PMOVMSKB reg, mm | 0F D7 /r | sse | Move a byte mask of mm to reg. The upper bits of r32 or r64 are zeroed. |
| PMOVMSKB reg, xmm | 66 0F D7 /r | sse2 | Move a byte mask of xmm to reg. The upper bits of r32 or r64 are zeroed. |
| VPMOVMSKB reg, xmm1 | VEX.128.66.0F.WIG D7 /r | avx | Move a byte mask of xmm1 to reg. The upper bits of r32 or r64 are filled with zeros. | 
| VPMOVMSKB reg, ymm1 | VEX.256.66.0F.WIG D7 /r | avx2 | Move a 32-bit mask of ymm1 to reg. The upper bits of r64 are filled with zeros. | 
| PMOVSXBW xmm1, xmm2/m64 | 66 0F 38 20 /r | sse4.1 | Sign extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 16-bit integers in xmm1. |
| PMOVSXBD xmm1, xmm2/m32 | 66 0F 38 21 /r | sse4.1 | Sign extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 32-bit integers in xmm1. |
| PMOVSXBQ xmm1, xmm2/m16 | 66 0F 38 22 /r | sse4.1 | Sign extend 2 packed 8-bit integers in the low 2 bytes of xmm2/m16 to 2 packed 64-bit integers in xmm1. |
| PMOVSXWD xmm1, xmm2/m64 | 66 0F 38 23 /r | sse4.1 | Sign extend 4 packed 16-bit integers in the low 8 bytes of xmm2/m64 to 4 packed 32-bit integers in xmm1. |
| PMOVSXWQ xmm1, xmm2/m32 | 66 0F 38 24 /r | sse4.1 | Sign extend 2 packed 16-bit integers in the low 4 bytes of xmm2/m32 to 2 packed 64-bit integers in xmm1. |
| PMOVSXDQ xmm1, xmm2/m64 | 66 0F 38 25 /r | sse4.1 | Sign extend 2 packed 32-bit integers in the low 8 bytes of xmm2/m64 to 2 packed 64-bit integers in xmm1. |
| VPMOVSXBW xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 20 /r | avx | Sign extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 16-bit integers in xmm1. | 
| VPMOVSXBD xmm1, xmm2/m32 | VEX.128.66.0F38.WIG 21 /r | avx | Sign extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 32-bit integers in xmm1. | 
| VPMOVSXBQ xmm1, xmm2/m16 | VEX.128.66.0F38.WIG 22 /r | avx | Sign extend 2 packed 8-bit integers in the low 2 bytes of xmm2/m16 to 2 packed 64-bit integers in xmm1. | 
| VPMOVSXWD xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 23 /r | avx | Sign extend 4 packed 16-bit integers in the low 8 bytes of xmm2/m64 to 4 packed 32-bit integers in xmm1. | 
| VPMOVSXWQ xmm1, xmm2/m32 | VEX.128.66.0F38.WIG 24 /r | avx | Sign extend 2 packed 16-bit integers in the low 4 bytes of xmm2/m32 to 2 packed 64-bit integers in xmm1. | 
| VPMOVSXDQ xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 25 /r | avx | Sign extend 2 packed 32-bit integers in the low 8 bytes of xmm2/m64 to 2 packed 64-bit integers in xmm1. | 
| VPMOVSXBW ymm1, xmm2/m128 | VEX.256.66.0F38.WIG 20 /r | avx2 | Sign extend 16 packed 8-bit integers in xmm2/m128 to 16 packed 16-bit integers in ymm1. | 
| VPMOVSXBD ymm1, xmm2/m64 | VEX.256.66.0F38.WIG 21 /r | avx2 | Sign extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 32-bit integers in ymm1. | 
| VPMOVSXBQ ymm1, xmm2/m32 | VEX.256.66.0F38.WIG 22 /r | avx2 | Sign extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 64-bit integers in ymm1. | 
| VPMOVSXWD ymm1, xmm2/m128 | VEX.256.66.0F38.WIG 23 /r | avx2 | Sign extend 8 packed 16-bit integers in the low 16 bytes of xmm2/m128 to 8 packed 32-bit integers in ymm1. | 
| PMOVZXBW xmm1, xmm2/m64 | 66 0F 38 30 /r | sse4.1 | Zero extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 16-bit integers in xmm1. |
| PMOVZXBD xmm1, xmm2/m32 | 66 0F 38 31 /r | sse4.1 | Zero extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 32-bit integers in xmm1. |
| PMOVZXBQ xmm1, xmm2/m16 | 66 0F 38 32 /r | sse4.1 | Zero extend 2 packed 8-bit integers in the low 2 bytes of xmm2/m16 to 2 packed 64-bit integers in xmm1. |
| PMOVZXWD xmm1, xmm2/m64 | 66 0F 38 33 /r | sse4.1 | Zero extend 4 packed 16-bit integers in the low 8 bytes of xmm2/m64 to 4 packed 32-bit integers in xmm1. |
| PMOVZXWQ xmm1, xmm2/m32 | 66 0F 38 34 /r | sse4.1 | Zero extend 2 packed 16-bit integers in the low 4 bytes of xmm2/m32 to 2 packed 64-bit integers in xmm1. |
| PMOVZXDQ xmm1, xmm2/m64 | 66 0F 38 35 /r | sse4.1 | Zero extend 2 packed 32-bit integers in the low 8 bytes of xmm2/m64 to 2 packed 64-bit integers in xmm1. |
| VPMOVZXBW xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 30 /r | avx | Zero extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 16-bit integers in xmm1. | 
| VPMOVZXBD xmm1, xmm2/m32 | VEX.128.66.0F38.WIG 31 /r | avx | Zero extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 32-bit integers in xmm1. | 
| VPMOVZXBQ xmm1, xmm2/m16 | VEX.128.66.0F38.WIG 32 /r | avx | Zero extend 2 packed 8-bit integers in the low 2 bytes of xmm2/m16 to 2 packed 64-bit integers in xmm1. | 
| VPMOVZXWD xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 33 /r | avx | Zero extend 4 packed 16-bit integers in the low 8 bytes of xmm2/m64 to 4 packed 32-bit integers in xmm1. | 
| VPMOVZXWQ xmm1, xmm2/m32 | VEX.128.66.0F38.WIG 34 /r | avx | Zero extend 2 packed 16-bit integers in the low 4 bytes of xmm2/m32 to 2 packed 64-bit integers in xmm1. | 
| VPMOVZXDQ xmm1, xmm2/m64 | VEX.128.66.0F38.WIG 35 /r | avx | Zero extend 2 packed 32-bit integers in the low 8 bytes of xmm2/m64 to 2 packed 64-bit integers in xmm1. |
| VPMOVZXBW ymm1, xmm2/m128 | VEX.256.66.0F38.WIG 30 /r | avx2 | Zero extend 16 packed 8-bit integers in xmm2/m128 to 16 packed 16-bit integers in ymm1. | 
| VPMOVZXBD ymm1, xmm2/m64 | VEX.256.66.0F38.WIG 31 /r | avx2 | Zero extend 8 packed 8-bit integers in the low 8 bytes of xmm2/m64 to 8 packed 32-bit integers in ymm1. | 
| VPMOVZXBQ ymm1, xmm2/m32 | VEX.256.66.0F38.WIG 32 /r | avx2 | Zero extend 4 packed 8-bit integers in the low 4 bytes of xmm2/m32 to 4 packed 64-bit integers in ymm1. | 
| VPMOVZXWD ymm1, xmm2/m128 | VEX.256.66.0F38.WIG 33 /r | avx2 | Zero extend 8 packed 16-bit integers in xmm2/m128 to 8 packed 32-bit integers in ymm1. |
| PMULDQ xmm1, xmm2/m128 | 66 0F 38 28 /r | sse4.1 | Multiply packed signed doubleword integers in xmm1 by packed signed doubleword integers in xmm2/m128, and store the quadword results in xmm1. | 
| VPMULDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 28 /r | avx | Multiply packed signed doubleword integers in xmm2 by packed signed doubleword integers in xmm3/m128, and store the quadword results in xmm1. | 
| VPMULDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 28 /r | avx2 | Multiply packed signed doubleword integers in ymm2 by packed signed doubleword integers in ymm3/m256, and store the quadword results in ymm1. | 
| VPMULDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 28 /r | avx512 | Multiply packed signed doubleword integers in xmm2 by packed signed doubleword integers in xmm3/m128/m64bcst, and store the quadword results in xmm1 using writemask k1. | 
| VPMULDQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 28 /r | avx512 | Multiply packed signed doubleword integers in ymm2 by packed signed doubleword integers in ymm3/m256/m64bcst, and store the quadword results in ymm1 using writemask k1. | 
| VPMULDQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 28 /r | avx512 | Multiply packed signed doubleword integers in zmm2 by packed signed doubleword integers in zmm3/m512/m64bcst, and store the quadword results in zmm1 using writemask k1. | 
| PMULHRSW mm1, mm2/m64 | 0F 38 0B /r | ssse3 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to mm1. | 
| PMULHRSW xmm1, xmm2/m128 | 66 0F 38 0B /r | ssse3 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1. | 
| VPMULHRSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 0B /r | avx | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1. | 
| VPMULHRSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 0B /r | avx2 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to ymm1. | 
| VPMULHRSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 0B /r | avx512 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to xmm1 under writemask k1. | 
| VPMULHRSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 0B /r | avx512 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to ymm1 under writemask k1. | 
| VPMULHRSW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 0B /r | avx512 | Multiply 16-bit signed words, scale and round signed doublewords, pack high 16 bits to zmm1 under writemask k1. | 
| PMULHUW mm1, mm2/m64 | 0F E4 /r | sse | Multiply the packed unsigned word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. | 
| PMULHUW xmm1, xmm2/m128 | 66 0F E4 /r | sse2 | Multiply the packed unsigned word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. | 
| VPMULHUW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E4 /r | avx | Multiply the packed unsigned word integers in xmm2 and xmm3/m128, and store the high 16 bits of the results in xmm1. | 
| VPMULHUW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E4 /r | avx2 | Multiply the packed unsigned word integers in ymm2 and ymm3/m256, and store the high 16 bits of the results in ymm1. | 
| VPMULHUW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E4 /r | avx512 | Multiply the packed unsigned word integers in xmm2 and xmm3/m128, and store the high 16 bits of the results in xmm1 under writemask k1. | 
| VPMULHUW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E4 /r | avx512 | Multiply the packed unsigned word integers in ymm2 and ymm3/m256, and store the high 16 bits of the results in ymm1 under writemask k1. | 
| VPMULHUW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG E4 /r | avx512 | Multiply the packed unsigned word integers in zmm2 and zmm3/m512, and store the high 16 bits of the results in zmm1 under writemask k1. | 
| PMULHW mm, mm/m64 | 0F E5 /r | mmx | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the high 16 bits of the results in mm1. | 
| PMULHW xmm1, xmm2/m128 | 66 0F E5 /r | sse2 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the high 16 bits of the results in xmm1. | 
| VPMULHW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E5 /r | avx | Multiply the packed signed word integers in xmm2 and xmm3/m128, and store the high 16 bits of the results in xmm1. | 
| VPMULHW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E5 /r | avx2 | Multiply the packed signed word integers in ymm2 and ymm3/m256, and store the high 16 bits of the results in ymm1. | 
| VPMULHW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E5 /r | avx512 | Multiply the packed signed word integers in xmm2 and xmm3/m128, and store the high 16 bits of the results in xmm1 under writemask k1. | 
| VPMULHW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E5 /r | avx512 | Multiply the packed signed word integers in ymm2 and ymm3/m256, and store the high 16 bits of the results in ymm1 under writemask k1. | 
| VPMULHW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG E5 /r | avx512 | Multiply the packed signed word integers in zmm2 and zmm3/m512, and store the high 16 bits of the results in zmm1 under writemask k1. | 
| PMULLD xmm1, xmm2/m128 | 66 0F 38 40 /r | sse4.1 | Multiply the packed dword signed integers in xmm1 and xmm2/m128 and store the low 32 bits of each product in xmm1. | 
| VPMULLD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 40 /r | avx | Multiply the packed dword signed integers in xmm2 and xmm3/m128 and store the low 32 bits of each product in xmm1. | 
| VPMULLD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 40 /r | avx2 | Multiply the packed dword signed integers in ymm2 and ymm3/m256 and store the low 32 bits of each product in ymm1. | 
| VPMULLD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 40 /r | avx512 | Multiply the packed dword signed integers in xmm2 and xmm3/m128/m32bcst and store the low 32 bits of each product in xmm1 under writemask k1. | 
| VPMULLD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 40 /r | avx512 | Multiply the packed dword signed integers in ymm2 and ymm3/m256/m32bcst and store the low 32 bits of each product in ymm1 under writemask k1. | 
| VPMULLD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 40 /r | avx512 | Multiply the packed dword signed integers in zmm2 and zmm3/m512/m32bcst and store the low 32 bits of each product in zmm1 under writemask k1. | 
| VPMULLQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 40 /r | avx512 | Multiply the packed qword signed integers in xmm2 and xmm3/m128/m64bcst and store the low 64 bits of each product in xmm1 under writemask k1. | 
| VPMULLQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 40 /r | avx512 | Multiply the packed qword signed integers in ymm2 and ymm3/m256/m64bcst and store the low 64 bits of each product in ymm1 under writemask k1. | 
| VPMULLQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 40 /r | avx512 | Multiply the packed qword signed integers in zmm2 and zmm3/m512/m64bcst and store the low 64 bits of each product in zmm1 under writemask k1. | 
| PMULLW mm, mm/m64 | 0F D5 /r | mmx | Multiply the packed signed word integers in mm1 register and mm2/m64, and store the low 16 bits of the results in mm1. | 
| PMULLW xmm1, xmm2/m128 | 66 0F D5 /r | sse2 | Multiply the packed signed word integers in xmm1 and xmm2/m128, and store the low 16 bits of the results in xmm1. | 
| VPMULLW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D5 /r | avx | Multiply the packed signed word integers in xmm2 and xmm3/m128, and store the low 16 bits of the results in xmm1. |
| VPMULLW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG D5 /r | avx2 | Multiply the packed signed word integers in ymm2 and ymm3/m256, and store the low 16 bits of the results in ymm1. | 
| VPMULLW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG D5 /r | avx512 | Multiply the packed signed word integers in xmm2 and xmm3/m128, and store the low 16 bits of the results in xmm1 under writemask k1. | 
| VPMULLW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG D5 /r | avx512 | Multiply the packed signed word integers in ymm2 and ymm3/m256, and store the low 16 bits of the results in ymm1 under writemask k1. | 
| VPMULLW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG D5 /r | avx512 | Multiply the packed signed word integers in zmm2 and zmm3/m512, and store the low 16 bits of the results in zmm1 under writemask k1. | 
| PMULUDQ mm1, mm2/m64 | 0F F4 /r | sse2 | Multiply unsigned doubleword integer in mm1 by unsigned doubleword integer in mm2/m64, and store the quadword result in mm1. | 
| PMULUDQ xmm1, xmm2/m128 | 66 0F F4 /r | sse2 | Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1. | 
| VPMULUDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F4 /r | avx | Multiply packed unsigned doubleword integers in xmm2 by packed unsigned doubleword integers in xmm3/m128, and store the quadword results in xmm1. | 
| VPMULUDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG F4 /r | avx2 | Multiply packed unsigned doubleword integers in ymm2 by packed unsigned doubleword integers in ymm3/m256, and store the quadword results in ymm1. | 
| VPMULUDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 F4 /r | avx512 | Multiply packed unsigned doubleword integers in xmm2 by packed unsigned doubleword integers in xmm3/m128/m64bcst, and store the quadword results in xmm1 under writemask k1. | 
| VPMULUDQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 F4 /r | avx512 | Multiply packed unsigned doubleword integers in ymm2 by packed unsigned doubleword integers in ymm3/m256/m64bcst, and store the quadword results in ymm1 under writemask k1. | 
| VPMULUDQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 F4 /r | avx512 | Multiply packed unsigned doubleword integers in zmm2 by packed unsigned doubleword integers in zmm3/m512/m64bcst, and store the quadword results in zmm1 under writemask k1. | 
| POP r/m16 | 8F /0 | | Pop top of stack into m16; increment stack pointer. |
| POP r/m32 | 8F /0 | | Pop top of stack into m32; increment stack pointer. |
| POP r/m64 | 8F /0 | | Pop top of stack into m64; increment stack pointer. Cannot encode 32-bit operand size. |
| POP r16 | 58+ rw | | Pop top of stack into r16; increment stack pointer. |
| POP r32 | 58+ rd | | Pop top of stack into r32; increment stack pointer. |
| POP r64 | 58+ rd | | Pop top of stack into r64; increment stack pointer. Cannot encode 32-bit operand size. |
| POP DS | 1F | | Pop top of stack into DS; increment stack pointer. |
| POP ES | 07 | | Pop top of stack into ES; increment stack pointer. |
| POP SS | 17 | | Pop top of stack into SS; increment stack pointer. |
| POP FS | 0F A1 | | Pop top of stack into FS; increment stack pointer by 16 bits. |
| POP FS | 0F A1 | | Pop top of stack into FS; increment stack pointer by 32 bits. |
| POP FS | 0F A1 | | Pop top of stack into FS; increment stack pointer by 64 bits. |
| POP GS | 0F A9 | | Pop top of stack into GS; increment stack pointer by 16 bits. |
| POP GS | 0F A9 | | Pop top of stack into GS; increment stack pointer by 32 bits. |
| POP GS | 0F A9 | | Pop top of stack into GS; increment stack pointer by 64 bits. |
| POPA | 61 | | Pop DI, SI, BP, BX, DX, CX, and AX. |
| POPAD | 61 | | Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX. |
| POPCNT r16, r/m16 | F3 0F B8 /r | | POPCNT on r/m16. |
| POPCNT r32, r/m32 | F3 0F B8 /r | | POPCNT on r/m32. |
| POPCNT r64, r/m64 | F3 REX.W 0F B8 /r | | POPCNT on r/m64. |
| POPF | 9D | | Pop top of stack into lower 16 bits of EFLAGS. |
| POPFD | 9D | | Pop top of stack into EFLAGS. |
| POPFQ | 9D | | Pop top of stack and zero-extend into RFLAGS. |
| POR mm, mm/m64 | 0F EB /r | mmx | Bitwise OR of mm/m64 and mm. | 
| POR xmm1, xmm2/m128 | 66 0F EB /r | sse2 | Bitwise OR of xmm2/m128 and xmm1. | 
| VPOR xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG EB /r | avx | Bitwise OR of xmm2 and xmm3/m128. |
| VPOR ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG EB /r | avx2 | Bitwise OR of ymm2 and ymm3/m256. |
| VPORD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 EB /r | avx512 | Bitwise OR of packed doubleword integers in xmm2 and xmm3/m128/m32bcst using writemask k1. | 
| VPORD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 EB /r | avx512 | Bitwise OR of packed doubleword integers in ymm2 and ymm3/m256/m32bcst using writemask k1. | 
| VPORD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 EB /r | avx512 | Bitwise OR of packed doubleword integers in zmm2 and zmm3/m512/m32bcst using writemask k1. | 
| VPORQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 EB /r | avx512 | Bitwise OR of packed quadword integers in xmm2 and xmm3/m128/m64bcst using writemask k1. | 
| VPORQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 EB /r | avx512 | Bitwise OR of packed quadword integers in ymm2 and ymm3/m256/m64bcst using writemask k1. | 
| VPORQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 EB /r | avx512 | Bitwise OR of packed quadword integers in zmm2 and zmm3/m512/m64bcst using writemask k1. | 
| PREFETCHW m8 | 0F 0D /1 | prfchw | Move data from m8 closer to the processor in anticipation of a write. | 
| PREFETCHWT1 m8 | 0F 0D /2 | prefetchwt1 | Move data from m8 closer to the processor using T1 hint with intent to write. | 
| PREFETCHT0 m8 | 0F 18 /1 | | Move data from m8 closer to the processor using T0 hint. |
| PREFETCHT1 m8 | 0F 18 /2 | | Move data from m8 closer to the processor using T1 hint. |
| PREFETCHT2 m8 | 0F 18 /3 | | Move data from m8 closer to the processor using T2 hint. |
| PREFETCHNTA m8 | 0F 18 /0 | | Move data from m8 closer to the processor using NTA hint. |
| VPROLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 15 /r | avx512 | Rotate doublewords in xmm2 left by count in the corresponding element of xmm3/m128/m32bcst. Result written to xmm1 under writemask k1. | 
| VPROLD xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.NDD.128.66.0F.W0 72 /1 ib | avx512 | Rotate doublewords in xmm2/m128/m32bcst left by imm8. Result written to xmm1 using writemask k1. | 
| VPROLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 15 /r | avx512 | Rotate quadwords in xmm2 left by count in the corresponding element of xmm3/m128/m64bcst. Result written to xmm1 under writemask k1. | 
| VPROLQ xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.NDD.128.66.0F.W1 72 /1 ib | avx512 | Rotate quadwords in xmm2/m128/m64bcst left by imm8. Result written to xmm1 using writemask k1. | 
| VPROLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 15 /r | avx512 | Rotate doublewords in ymm2 left by count in the corresponding element of ymm3/m256/m32bcst. Result written to ymm1 under writemask k1. | 
| VPROLD ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.NDD.256.66.0F.W0 72 /1 ib | avx512 | Rotate doublewords in ymm2/m256/m32bcst left by imm8. Result written to ymm1 using writemask k1. | 
| VPROLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 15 /r | avx512 | Rotate quadwords in ymm2 left by count in the corresponding element of ymm3/m256/m64bcst. Result written to ymm1 under writemask k1. | 
| VPROLQ ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.NDD.256.66.0F.W1 72 /1 ib | avx512 | Rotate quadwords in ymm2/m256/m64bcst left by imm8. Result written to ymm1 using writemask k1. | 
| VPROLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 15 /r | avx512 | Rotate doublewords in zmm2 left by count in the corresponding element of zmm3/m512/m32bcst. Result written to zmm1 using writemask k1. |
| VPROLD zmm1 {k1}{z}, zmm2/m512/m32bcst, imm8 | EVEX.NDD.512.66.0F.W0 72 /1 ib | avx512 | Rotate doublewords in zmm2/m512/m32bcst left by imm8. Result written to zmm1 using writemask k1. |
| VPROLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 15 /r | avx512 | Rotate quadwords in zmm2 left by count in the corresponding element of zmm3/m512/m64bcst. Result written to zmm1 under writemask k1. |
| VPROLQ zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | EVEX.NDD.512.66.0F.W1 72 /1 ib | avx512 | Rotate quadwords in zmm2/m512/m64bcst left by imm8. Result written to zmm1 using writemask k1. | 
| VPRORVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 14 /r | avx512 | Rotate doublewords in xmm2 right by count in the corresponding element of xmm3/m128/m32bcst, store result using writemask k1. | 
| VPRORD xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.NDD.128.66.0F.W0 72 /0 ib | avx512 | Rotate doublewords in xmm2/m128/m32bcst right by imm8, store result using writemask k1. | 
| VPRORVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 14 /r | avx512 | Rotate quadwords in xmm2 right by count in the corresponding element of xmm3/m128/m64bcst, store result using writemask k1. | 
| VPRORQ xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.NDD.128.66.0F.W1 72 /0 ib | avx512 | Rotate quadwords in xmm2/m128/m64bcst right by imm8, store result using writemask k1. | 
| VPRORVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 14 /r | avx512 | Rotate doublewords in ymm2 right by count in the corresponding element of ymm3/m256/m32bcst, store result using writemask k1. |
| VPRORD ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.NDD.256.66.0F.W0 72 /0 ib | avx512 | Rotate doublewords in ymm2/m256/m32bcst right by imm8, store result using writemask k1. | 
| VPRORVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 14 /r | avx512 | Rotate quadwords in ymm2 right by count in the corresponding element of ymm3/m256/m64bcst, store result using writemask k1. | 
| VPRORQ ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.NDD.256.66.0F.W1 72 /0 ib | avx512 | Rotate quadwords in ymm2/m256/m64bcst right by imm8, store result using writemask k1. | 
| VPRORVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 14 /r | avx512 | Rotate doublewords in zmm2 right by count in the corresponding element of zmm3/m512/m32bcst, store result using writemask k1. | 
| VPRORD zmm1 {k1}{z}, zmm2/m512/m32bcst, imm8 | EVEX.NDD.512.66.0F.W0 72 /0 ib | avx512 | Rotate doublewords in zmm2/m512/m32bcst right by imm8, store result using writemask k1. | 
| VPRORVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 14 /r | avx512 | Rotate quadwords in zmm2 right by count in the corresponding element of zmm3/m512/m64bcst, store result using writemask k1. | 
| VPRORQ zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | EVEX.NDD.512.66.0F.W1 72 /0 ib | avx512 | Rotate quadwords in zmm2/m512/m64bcst right by imm8, store result using writemask k1. | 
| PSADBW mm1, mm2/m64 | 0F F6 /r | sse | Computes the absolute differences of the packed unsigned byte integers from mm2/m64 and mm1; differences are then summed to produce an unsigned word integer result. |
| PSADBW xmm1, xmm2/m128 | 66 0F F6 /r | sse2 | Computes the absolute differences of the packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results. |
| VPSADBW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F6 /r | avx | Computes the absolute differences of the packed unsigned byte integers from xmm3/m128 and xmm2; the 8 low differences and 8 high differences are then summed separately to produce two unsigned word integer results. |
| VPSADBW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG F6 /r | avx2 | Computes the absolute differences of the packed unsigned byte integers from ymm3/m256 and ymm2; then each consecutive 8 differences are summed separately to produce four unsigned word integer results. |
| VPSADBW xmm1, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG F6 /r | avx512 | Computes the absolute differences of the packed unsigned byte integers from xmm3/m128 and xmm2; then each consecutive 8 differences are summed separately to produce two unsigned word integer results. |
| VPSADBW ymm1, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG F6 /r | avx512 | Computes the absolute differences of the packed unsigned byte integers from ymm3/m256 and ymm2; then each consecutive 8 differences are summed separately to produce four unsigned word integer results. |
| VPSADBW zmm1, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG F6 /r | avx512 | Computes the absolute differences of the packed unsigned byte integers from zmm3/m512 and zmm2; then each consecutive 8 differences are summed separately to produce eight unsigned word integer results. |
| PSHUFB mm1, mm2/m64 | 0F 38 00 /r | ssse3 | Shuffle bytes in mm1 according to contents of mm2/m64. | 
| PSHUFB xmm1, xmm2/m128 | 66 0F 38 00 /r | ssse3 | Shuffle bytes in xmm1 according to contents of xmm2/m128. | 
| VPSHUFB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 00 /r | avx | Shuffle bytes in xmm2 according to contents of xmm3/m128. | 
| VPSHUFB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 00 /r | avx2 | Shuffle bytes in ymm2 according to contents of ymm3/m256. | 
| VPSHUFB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.WIG 00 /r | avx512 | Shuffle bytes in xmm2 according to contents of xmm3/m128 under write mask k1. | 
| VPSHUFB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.WIG 00 /r | avx512 | Shuffle bytes in ymm2 according to contents of ymm3/m256 under write mask k1. | 
| VPSHUFB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.WIG 00 /r | avx512 | Shuffle bytes in zmm2 according to contents of zmm3/m512 under write mask k1. |
| PSHUFD xmm1, xmm2/m128, imm8 | 66 0F 70 /r ib | sse2 | Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFD xmm1, xmm2/m128, imm8 | VEX.128.66.0F.WIG 70 /r ib | avx | Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFD ymm1, ymm2/m256, imm8 | VEX.256.66.0F.WIG 70 /r ib | avx2 | Shuffle the doublewords in ymm2/m256 based on the encoding in imm8 and store the result in ymm1. | 
| VPSHUFD xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F.W0 70 /r ib | avx512 | Shuffle the doublewords in xmm2/m128/m32bcst based on the encoding in imm8 and store the result in xmm1 using writemask k1. | 
| VPSHUFD ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F.W0 70 /r ib | avx512 | Shuffle the doublewords in ymm2/m256/m32bcst based on the encoding in imm8 and store the result in ymm1 using writemask k1. | 
| VPSHUFD zmm1 {k1}{z}, zmm2/m512/m32bcst, imm8 | EVEX.512.66.0F.W0 70 /r ib | avx512 | Shuffle the doublewords in zmm2/m512/m32bcst based on the encoding in imm8 and store the result in zmm1 using writemask k1. | 
| PSHUFHW xmm1, xmm2/m128, imm8 | F3 0F 70 /r ib | sse2 | Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFHW xmm1, xmm2/m128, imm8 | VEX.128.F3.0F.WIG 70 /r ib | avx | Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFHW ymm1, ymm2/m256, imm8 | VEX.256.F3.0F.WIG 70 /r ib | avx2 | Shuffle the high words in ymm2/m256 based on the encoding in imm8 and store the result in ymm1. | 
| VPSHUFHW xmm1 {k1}{z}, xmm2/m128, imm8 | EVEX.128.F3.0F.WIG 70 /r ib | avx512 | Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1 under write mask k1. | 
| VPSHUFHW ymm1 {k1}{z}, ymm2/m256, imm8 | EVEX.256.F3.0F.WIG 70 /r ib | avx512 | Shuffle the high words in ymm2/m256 based on the encoding in imm8 and store the result in ymm1 under write mask k1. | 
| VPSHUFHW zmm1 {k1}{z}, zmm2/m512, imm8 | EVEX.512.F3.0F.WIG 70 /r ib | avx512 | Shuffle the high words in zmm2/m512 based on the encoding in imm8 and store the result in zmm1 under write mask k1. | 
| PSHUFLW xmm1, xmm2/m128, imm8 | F2 0F 70 /r ib | sse2 | Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFLW xmm1, xmm2/m128, imm8 | VEX.128.F2.0F.WIG 70 /r ib | avx | Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | 
| VPSHUFLW ymm1, ymm2/m256, imm8 | VEX.256.F2.0F.WIG 70 /r ib | avx2 | Shuffle the low words in ymm2/m256 based on the encoding in imm8 and store the result in ymm1. | 
| VPSHUFLW xmm1 {k1}{z}, xmm2/m128, imm8 | EVEX.128.F2.0F.WIG 70 /r ib | avx512 | Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1 under write mask k1. | 
| VPSHUFLW ymm1 {k1}{z}, ymm2/m256, imm8 | EVEX.256.F2.0F.WIG 70 /r ib | avx512 | Shuffle the low words in ymm2/m256 based on the encoding in imm8 and store the result in ymm1 under write mask k1. | 
| VPSHUFLW zmm1 {k1}{z}, zmm2/m512, imm8 | EVEX.512.F2.0F.WIG 70 /r ib | avx512 | Shuffle the low words in zmm2/m512 based on the encoding in imm8 and store the result in zmm1 under write mask k1. | 
| PSHUFW mm1, mm2/m64, imm8 | 0F 70 /r ib | | Shuffle the words in mm2/m64 based on the encoding in imm8 and store the result in mm1. | 
| PSIGNB mm1, mm2/m64 | 0F 38 08 /r | ssse3 | Negate/zero/preserve packed byte integers in mm1 depending on the corresponding sign in mm2/m64. | 
| PSIGNB xmm1, xmm2/m128 | 66 0F 38 08 /r | ssse3 | Negate/zero/preserve packed byte integers in xmm1 depending on the corresponding sign in xmm2/m128. | 
| PSIGNW mm1, mm2/m64 | 0F 38 09 /r | ssse3 | Negate/zero/preserve packed word integers in mm1 depending on the corresponding sign in mm2/m64. | 
| PSIGNW xmm1, xmm2/m128 | 66 0F 38 09 /r | ssse3 | Negate/zero/preserve packed word integers in xmm1 depending on the corresponding sign in xmm2/m128. | 
| PSIGND mm1, mm2/m64 | 0F 38 0A /r | ssse3 | Negate/zero/preserve packed doubleword integers in mm1 depending on the corresponding sign in mm2/m64. | 
| PSIGND xmm1, xmm2/m128 | 66 0F 38 0A /r | ssse3 | Negate/zero/preserve packed doubleword integers in xmm1 depending on the corresponding sign in xmm2/m128. | 
| VPSIGNB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 08 /r | avx | Negate/zero/preserve packed byte integers in xmm2 depending on the corresponding sign in xmm3/m128. | 
| VPSIGNW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 09 /r | avx | Negate/zero/preserve packed word integers in xmm2 depending on the corresponding sign in xmm3/m128. | 
| VPSIGND xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.WIG 0A /r | avx | Negate/zero/preserve packed doubleword integers in xmm2 depending on the corresponding sign in xmm3/m128. | 
| VPSIGNB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 08 /r | avx2 | Negate/zero/preserve packed byte integers in ymm2 depending on the corresponding sign in ymm3/m256. | 
| VPSIGNW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 09 /r | avx2 | Negate/zero/preserve packed word integers in ymm2 depending on the corresponding sign in ymm3/m256. | 
| VPSIGND ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.WIG 0A /r | avx2 | Negate/zero/preserve packed doubleword integers in ymm2 depending on the corresponding sign in ymm3/m256. | 
| PSLLDQ xmm1, imm8 | 66 0F 73 /7 ib | sse2 | Shift xmm1 left by imm8 bytes while shifting in 0s. | 
| VPSLLDQ xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 73 /7 ib | avx | Shift xmm2 left by imm8 bytes while shifting in 0s and store result in xmm1. | 
| VPSLLDQ ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 73 /7 ib | avx2 | Shift ymm2 left by imm8 bytes while shifting in 0s and store result in ymm1. | 
| VPSLLDQ xmm1, xmm2/m128, imm8 | EVEX.NDD.128.66.0F.WIG 73 /7 ib | avx512 | Shift xmm2/m128 left by imm8 bytes while shifting in 0s and store result in xmm1. | 
| VPSLLDQ ymm1, ymm2/m256, imm8 | EVEX.NDD.256.66.0F.WIG 73 /7 ib | avx512 | Shift ymm2/m256 left by imm8 bytes while shifting in 0s and store result in ymm1. | 
| VPSLLDQ zmm1, zmm2/m512, imm8 | EVEX.NDD.512.66.0F.WIG 73 /7 ib | avx512 | Shift zmm2/m512 left by imm8 bytes while shifting in 0s and store result in zmm1. | 
| PSLLW mm, mm/m64 | 0F F1 /r | mmx | Shift words in mm left by mm/m64 while shifting in 0s. | 
| PSLLW xmm1, xmm2/m128 | 66 0F F1 /r | sse2 | Shift words in xmm1 left by xmm2/m128 while shifting in 0s. | 
| PSLLW mm, imm8 | 0F 71 /6 ib | mmx | Shift words in mm left by imm8 while shifting in 0s. | 
| PSLLW xmm1, imm8 | 66 0F 71 /6 ib | sse2 | Shift words in xmm1 left by imm8 while shifting in 0s. | 
| PSLLD mm, mm/m64 | 0F F2 /r | mmx | Shift doublewords in mm left by mm/m64 while shifting in 0s. | 
| PSLLD xmm1, xmm2/m128 | 66 0F F2 /r | sse2 | Shift doublewords in xmm1 left by xmm2/m128 while shifting in 0s. | 
| PSLLD mm, imm8 | 0F 72 /6 ib | mmx | Shift doublewords in mm left by imm8 while shifting in 0s. | 
| PSLLD xmm1, imm8 | 66 0F 72 /6 ib | sse2 | Shift doublewords in xmm1 left by imm8 while shifting in 0s. | 
| PSLLQ mm, mm/m64 | 0F F3 /r | mmx | Shift quadword in mm left by mm/m64 while shifting in 0s. | 
| PSLLQ xmm1, xmm2/m128 | 66 0F F3 /r | sse2 | Shift quadwords in xmm1 left by xmm2/m128 while shifting in 0s. | 
| PSLLQ mm, imm8 | 0F 73 /6 ib | mmx | Shift quadword in mm left by imm8 while shifting in 0s. | 
| PSLLQ xmm1, imm8 | 66 0F 73 /6 ib | sse2 | Shift quadwords in xmm1 left by imm8 while shifting in 0s. | 
| VPSLLW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F1 /r | avx | Shift words in xmm2 left by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSLLW xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 71 /6 ib | avx | Shift words in xmm2 left by imm8 while shifting in 0s. | 
| VPSLLD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F2 /r | avx | Shift doublewords in xmm2 left by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSLLD xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 72 /6 ib | avx | Shift doublewords in xmm2 left by imm8 while shifting in 0s. | 
| VPSLLQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F3 /r | avx | Shift quadwords in xmm2 left by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSLLQ xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 73 /6 ib | avx | Shift quadwords in xmm2 left by imm8 while shifting in 0s. | 
| VPSLLW ymm1, ymm2, xmm3/m128 | VEX.NDS.256.66.0F.WIG F1 /r | avx2 | Shift words in ymm2 left by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSLLW ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 71 /6 ib | avx2 | Shift words in ymm2 left by imm8 while shifting in 0s. | 
| PSRAW mm, mm/m64 | 0F E1 /r | mmx | Shift words in mm right by mm/m64 while shifting in sign bits. | 
| PSRAW xmm1, xmm2/m128 | 66 0F E1 /r | sse2 | Shift words in xmm1 right by xmm2/m128 while shifting in sign bits. | 
| PSRAW mm, imm8 | 0F 71 /4 ib | mmx | Shift words in mm right by imm8 while shifting in sign bits. | 
| PSRAW xmm1, imm8 | 66 0F 71 /4 ib | sse2 | Shift words in xmm1 right by imm8 while shifting in sign bits. | 
| PSRAD mm, mm/m64 | 0F E2 /r | mmx | Shift doublewords in mm right by mm/m64 while shifting in sign bits. | 
| PSRAD xmm1, xmm2/m128 | 66 0F E2 /r | sse2 | Shift doublewords in xmm1 right by xmm2/m128 while shifting in sign bits. | 
| PSRAD mm, imm8 | 0F 72 /4 ib | mmx | Shift doublewords in mm right by imm8 while shifting in sign bits. | 
| PSRAD xmm1, imm8 | 66 0F 72 /4 ib | sse2 | Shift doublewords in xmm1 right by imm8 while shifting in sign bits. | 
| VPSRAW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E1 /r | avx | Shift words in xmm2 right by amount specified in xmm3/m128 while shifting in sign bits. | 
| VPSRAW xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 71 /4 ib | avx | Shift words in xmm2 right by imm8 while shifting in sign bits. | 
| VPSRAD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E2 /r | avx | Shift doublewords in xmm2 right by amount specified in xmm3/m128 while shifting in sign bits. | 
| VPSRAD xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 72 /4 ib | avx | Shift doublewords in xmm2 right by imm8 while shifting in sign bits. | 
| VPSRAW ymm1, ymm2, xmm3/m128 | VEX.NDS.256.66.0F.WIG E1 /r | avx2 | Shift words in ymm2 right by amount specified in xmm3/m128 while shifting in sign bits. | 
| VPSRAW ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 71 /4 ib | avx2 | Shift words in ymm2 right by imm8 while shifting in sign bits. | 
| VPSRAD ymm1, ymm2, xmm3/m128 | VEX.NDS.256.66.0F.WIG E2 /r | avx2 | Shift doublewords in ymm2 right by amount specified in xmm3/m128 while shifting in sign bits. | 
| VPSRAD ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 72 /4 ib | avx2 | Shift doublewords in ymm2 right by imm8 while shifting in sign bits. | 
| VPSRAW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E1 /r | avx512 | Shift words in xmm2 right by amount specified in xmm3/m128 while shifting in sign bits using writemask k1. | 
| VPSRAW ymm1 {k1}{z}, ymm2, xmm3/m128 | EVEX.NDS.256.66.0F.WIG E1 /r | avx512 | Shift words in ymm2 right by amount specified in xmm3/m128 while shifting in sign bits using writemask k1. | 
| PSRLDQ xmm1, imm8 | 66 0F 73 /3 ib | sse2 | Shift xmm1 right by imm8 bytes while shifting in 0s. | 
| VPSRLDQ xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 73 /3 ib | avx | Shift xmm2 right by imm8 bytes while shifting in 0s. | 
| VPSRLDQ ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 73 /3 ib | avx2 | Shift ymm2 right by imm8 bytes while shifting in 0s and store result in ymm1. | 
| VPSRLDQ xmm1, xmm2/m128, imm8 | EVEX.NDD.128.66.0F.WIG 73 /3 ib | avx512 | Shift xmm2/m128 right by imm8 bytes while shifting in 0s and store result in xmm1. | 
| VPSRLDQ ymm1, ymm2/m256, imm8 | EVEX.NDD.256.66.0F.WIG 73 /3 ib | avx512 | Shift ymm2/m256 right by imm8 bytes while shifting in 0s and store result in ymm1. | 
| VPSRLDQ zmm1, zmm2/m512, imm8 | EVEX.NDD.512.66.0F.WIG 73 /3 ib | avx512 | Shift zmm2/m512 right by imm8 bytes while shifting in 0s and store result in zmm1. | 
| PSRLW mm, mm/m64 | 0F D1 /r | mmx | Shift words in mm right by amount specified in mm/m64 while shifting in 0s. | 
| PSRLW xmm1, xmm2/m128 | 66 0F D1 /r | sse2 | Shift words in xmm1 right by amount specified in xmm2/m128 while shifting in 0s. | 
| PSRLW mm, imm8 | 0F 71 /2 ib | mmx | Shift words in mm right by imm8 while shifting in 0s. | 
| PSRLW xmm1, imm8 | 66 0F 71 /2 ib | sse2 | Shift words in xmm1 right by imm8 while shifting in 0s. | 
| PSRLD mm, mm/m64 | 0F D2 /r | mmx | Shift doublewords in mm right by amount specified in mm/m64 while shifting in 0s. | 
| PSRLD xmm1, xmm2/m128 | 66 0F D2 /r | sse2 | Shift doublewords in xmm1 right by amount specified in xmm2/m128 while shifting in 0s. | 
| PSRLD mm, imm8 | 0F 72 /2 ib | mmx | Shift doublewords in mm right by imm8 while shifting in 0s. | 
| PSRLD xmm1, imm8 | 66 0F 72 /2 ib | sse2 | Shift doublewords in xmm1 right by imm8 while shifting in 0s. | 
| PSRLQ mm, mm/m64 | 0F D3 /r | mmx | Shift mm right by amount specified in mm/m64 while shifting in 0s. | 
| PSRLQ xmm1, xmm2/m128 | 66 0F D3 /r | sse2 | Shift quadwords in xmm1 right by amount specified in xmm2/m128 while shifting in 0s. | 
| PSRLQ mm, imm8 | 0F 73 /2 ib | mmx | Shift mm right by imm8 while shifting in 0s. | 
| PSRLQ xmm1, imm8 | 66 0F 73 /2 ib | sse2 | Shift quadwords in xmm1 right by imm8 while shifting in 0s. | 
| VPSRLW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D1 /r | avx | Shift words in xmm2 right by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSRLW xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 71 /2 ib | avx | Shift words in xmm2 right by imm8 while shifting in 0s. | 
| VPSRLD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D2 /r | avx | Shift doublewords in xmm2 right by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSRLD xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 72 /2 ib | avx | Shift doublewords in xmm2 right by imm8 while shifting in 0s. | 
| VPSRLQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D3 /r | avx | Shift quadwords in xmm2 right by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSRLQ xmm1, xmm2, imm8 | VEX.NDD.128.66.0F.WIG 73 /2 ib | avx | Shift quadwords in xmm2 right by imm8 while shifting in 0s. | 
| VPSRLW ymm1, ymm2, xmm3/m128 | VEX.NDS.256.66.0F.WIG D1 /r | avx2 | Shift words in ymm2 right by amount specified in xmm3/m128 while shifting in 0s. | 
| VPSRLW ymm1, ymm2, imm8 | VEX.NDD.256.66.0F.WIG 71 /2 ib | avx2 | Shift words in ymm2 right by imm8 while shifting in 0s. | 
| PSUBB mm, mm/m64 | 0F F8 /r | mmx | Subtract packed byte integers in mm/m64 from packed byte integers in mm. | 
| PSUBB xmm1, xmm2/m128 | 66 0F F8 /r | sse2 | Subtract packed byte integers in xmm2/m128 from packed byte integers in xmm1. | 
| PSUBW mm, mm/m64 | 0F F9 /r | mmx | Subtract packed word integers in mm/m64 from packed word integers in mm. | 
| PSUBW xmm1, xmm2/m128 | 66 0F F9 /r | sse2 | Subtract packed word integers in xmm2/m128 from packed word integers in xmm1. | 
| PSUBD mm, mm/m64 | 0F FA /r | mmx | Subtract packed doubleword integers in mm/m64 from packed doubleword integers in mm. | 
| PSUBD xmm1, xmm2/m128 | 66 0F FA /r | sse2 | Subtract packed doubleword integers in xmm2/m128 from packed doubleword integers in xmm1. | 
| VPSUBB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F8 /r | avx | Subtract packed byte integers in xmm3/m128 from xmm2. | 
| VPSUBW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG F9 /r | avx | Subtract packed word integers in xmm3/m128 from xmm2. | 
| VPSUBD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG FA /r | avx | Subtract packed doubleword integers in xmm3/m128 from xmm2. | 
| VPSUBB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG F8 /r | avx2 | Subtract packed byte integers in ymm3/m256 from ymm2. | 
| VPSUBW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG F9 /r | avx2 | Subtract packed word integers in ymm3/m256 from ymm2. | 
| VPSUBD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG FA /r | avx2 | Subtract packed doubleword integers in ymm3/m256 from ymm2. | 
| VPSUBB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG F8 /r | avx512 | Subtract packed byte integers in xmm3/m128 from xmm2 and store in xmm1 using writemask k1. | 
| VPSUBB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG F8 /r | avx512 | Subtract packed byte integers in ymm3/m256 from ymm2 and store in ymm1 using writemask k1. | 
| VPSUBB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG F8 /r | avx512 | Subtract packed byte integers in zmm3/m512 from zmm2 and store in zmm1 using writemask k1. | 
| VPSUBW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG F9 /r | avx512 | Subtract packed word integers in xmm3/m128 from xmm2 and store in xmm1 using writemask k1. | 
| VPSUBW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG F9 /r | avx512 | Subtract packed word integers in ymm3/m256 from ymm2 and store in ymm1 using writemask k1. | 
| VPSUBW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG F9 /r | avx512 | Subtract packed word integers in zmm3/m512 from zmm2 and store in zmm1 using writemask k1. | 
| PSUBQ mm1, mm2/m64 | 0F FB /r | sse2 | Subtract quadword integer in mm2/m64 from mm1. | 
| PSUBQ xmm1, xmm2/m128 | 66 0F FB /r | sse2 | Subtract packed quadword integers in xmm2/m128 from xmm1. | 
| VPSUBQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG FB /r | avx | Subtract packed quadword integers in xmm3/m128 from xmm2. | 
| VPSUBQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG FB /r | avx2 | Subtract packed quadword integers in ymm3/m256 from ymm2. | 
| VPSUBQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 FB /r | avx512 | Subtract packed quadword integers in xmm3/m128/m64bcst from xmm2 and store in xmm1 using writemask k1. | 
| VPSUBQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 FB /r | avx512 | Subtract packed quadword integers in ymm3/m256/m64bcst from ymm2 and store in ymm1 using writemask k1. | 
| VPSUBQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 FB /r | avx512 | Subtract packed quadword integers in zmm3/m512/m64bcst from zmm2 and store in zmm1 using writemask k1. | 
| PSUBSB mm, mm/m64 | 0F E8 /r | mmx | Subtract signed packed bytes in mm/m64 from signed packed bytes in mm and saturate results. | 
| PSUBSB xmm1, xmm2/m128 | 66 0F E8 /r | sse2 | Subtract packed signed byte integers in xmm2/m128 from packed signed byte integers in xmm1 and saturate results. | 
| PSUBSW mm, mm/m64 | 0F E9 /r | mmx | Subtract signed packed words in mm/m64 from signed packed words in mm and saturate results. | 
| PSUBSW xmm1, xmm2/m128 | 66 0F E9 /r | sse2 | Subtract packed signed word integers in xmm2/m128 from packed signed word integers in xmm1 and saturate results. | 
| VPSUBSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E8 /r | avx | Subtract packed signed byte integers in xmm3/m128 from packed signed byte integers in xmm2 and saturate results. | 
| VPSUBSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG E9 /r | avx | Subtract packed signed word integers in xmm3/m128 from packed signed word integers in xmm2 and saturate results. | 
| VPSUBSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E8 /r | avx2 | Subtract packed signed byte integers in ymm3/m256 from packed signed byte integers in ymm2 and saturate results. | 
| VPSUBSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG E9 /r | avx2 | Subtract packed signed word integers in ymm3/m256 from packed signed word integers in ymm2 and saturate results. | 
| VPSUBSB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E8 /r | avx512 | Subtract packed signed byte integers in xmm3/m128 from packed signed byte integers in xmm2 and saturate results and store in xmm1 using writemask k1. | 
| VPSUBSB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E8 /r | avx512 | Subtract packed signed byte integers in ymm3/m256 from packed signed byte integers in ymm2 and saturate results and store in ymm1 using writemask k1. | 
| VPSUBSB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG E8 /r | avx512 | Subtract packed signed byte integers in zmm3/m512 from packed signed byte integers in zmm2 and saturate results and store in zmm1 using writemask k1. | 
| VPSUBSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG E9 /r | avx512 | Subtract packed signed word integers in xmm3/m128 from packed signed word integers in xmm2 and saturate results and store in xmm1 using writemask k1. | 
| VPSUBSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG E9 /r | avx512 | Subtract packed signed word integers in ymm3/m256 from packed signed word integers in ymm2 and saturate results and store in ymm1 using writemask k1. | 
| PSUBUSB mm, mm/m64 | 0F D8 /r | mmx | Subtract unsigned packed bytes in mm/m64 from unsigned packed bytes in mm and saturate result. | 
| PSUBUSB xmm1, xmm2/m128 | 66 0F D8 /r | sse2 | Subtract packed unsigned byte integers in xmm2/m128 from packed unsigned byte integers in xmm1 and saturate result. | 
| PSUBUSW mm, mm/m64 | 0F D9 /r | mmx | Subtract unsigned packed words in mm/m64 from unsigned packed words in mm and saturate result. | 
| PSUBUSW xmm1, xmm2/m128 | 66 0F D9 /r | sse2 | Subtract packed unsigned word integers in xmm2/m128 from packed unsigned word integers in xmm1 and saturate result. | 
| VPSUBUSB xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D8 /r | avx | Subtract packed unsigned byte integers in xmm3/m128 from packed unsigned byte integers in xmm2 and saturate result. | 
| VPSUBUSW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG D9 /r | avx | Subtract packed unsigned word integers in xmm3/m128 from packed unsigned word integers in xmm2 and saturate result. | 
| VPSUBUSB ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG D8 /r | avx2 | Subtract packed unsigned byte integers in ymm3/m256 from packed unsigned byte integers in ymm2 and saturate result. | 
| VPSUBUSW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG D9 /r | avx2 | Subtract packed unsigned word integers in ymm3/m256 from packed unsigned word integers in ymm2 and saturate result. | 
| VPSUBUSB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG D8 /r | avx512 | Subtract packed unsigned byte integers in xmm3/m128 from packed unsigned byte integers in xmm2, saturate results and store in xmm1 using writemask k1. | 
| VPSUBUSB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG D8 /r | avx512 | Subtract packed unsigned byte integers in ymm3/m256 from packed unsigned byte integers in ymm2, saturate results and store in ymm1 using writemask k1. | 
| VPSUBUSB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F.WIG D8 /r | avx512 | Subtract packed unsigned byte integers in zmm3/m512 from packed unsigned byte integers in zmm2, saturate results and store in zmm1 using writemask k1. | 
| VPSUBUSW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG D9 /r | avx512 | Subtract packed unsigned word integers in xmm3/m128 from packed unsigned word integers in xmm2 and saturate results and store in xmm1 using writemask k1. | 
| VPSUBUSW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F.WIG D9 /r | avx512 | Subtract packed unsigned word integers in ymm3/m256 from packed unsigned word integers in ymm2, saturate results and store in ymm1 using writemask k1. | 
| PTEST xmm1, xmm2/m128 | 66 0F 38 17 /r | sse4.1 | Set ZF if xmm2/m128 AND xmm1 result is all 0s. Set CF if xmm2/m128 AND NOT xmm1 result is all 0s. | 
| VPTEST xmm1, xmm2/m128 | VEX.128.66.0F38.WIG 17 /r | avx | Set ZF and CF depending on bitwise AND and ANDN of sources. | 
| VPTEST ymm1, ymm2/m256 | VEX.256.66.0F38.WIG 17 /r | avx | Set ZF and CF depending on bitwise AND and ANDN of sources. | 
| PTWRITE r64/m64 | F3 REX.W 0F AE /4 | | Reads the data from r64/m64 to encode into a PTW packet if dependencies are met (see details below). | 
| PTWRITE r32/m32 | F3 0F AE /4 | | Reads the data from r32/m32 to encode into a PTW packet if dependencies are met (see details below). | 
| PUNPCKHBW mm, mm/m64 | 0F 68 /r | mmx | Unpack and interleave high-order bytes from mm and mm/m64 into mm. | 
| PUNPCKHBW xmm1, xmm2/m128 | 66 0F 68 /r | sse2 | Unpack and interleave high-order bytes from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKHWD mm, mm/m64 | 0F 69 /r | mmx | Unpack and interleave high-order words from mm and mm/m64 into mm. | 
| PUNPCKHWD xmm1, xmm2/m128 | 66 0F 69 /r | sse2 | Unpack and interleave high-order words from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKHDQ mm, mm/m64 | 0F 6A /r | mmx | Unpack and interleave high-order doublewords from mm and mm/m64 into mm. | 
| PUNPCKHDQ xmm1, xmm2/m128 | 66 0F 6A /r | sse2 | Unpack and interleave high-order doublewords from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKHQDQ xmm1, xmm2/m128 | 66 0F 6D /r | sse2 | Unpack and interleave high-order quadwords from xmm1 and xmm2/m128 into xmm1. | 
| VPUNPCKHBW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 68 /r | avx | Interleave high-order bytes from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKHWD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 69 /r | avx | Interleave high-order words from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKHDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 6A /r | avx | Interleave high-order doublewords from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKHQDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 6D /r | avx | Interleave high-order quadword from xmm2 and xmm3/m128 into xmm1 register. | 
| VPUNPCKHBW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 68 /r | avx2 | Interleave high-order bytes from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKHWD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 69 /r | avx2 | Interleave high-order words from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKHDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 6A /r | avx2 | Interleave high-order doublewords from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKHQDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 6D /r | avx2 | Interleave high-order quadword from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKHBW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 68 /r | avx512 | Interleave high-order bytes from xmm2 and xmm3/m128 into xmm1 register using k1 write mask. | 
| VPUNPCKHWD xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 69 /r | avx512 | Interleave high-order words from xmm2 and xmm3/m128 into xmm1 register using k1 write mask. | 
| VPUNPCKHDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 6A /r | avx512 | Interleave high-order doublewords from xmm2 and xmm3/m128/m32bcst into xmm1 register using k1 write mask. | 
| PUNPCKLBW mm, mm/m32 | 0F 60 /r | mmx | Interleave low-order bytes from mm and mm/m32 into mm. | 
| PUNPCKLBW xmm1, xmm2/m128 | 66 0F 60 /r | sse2 | Interleave low-order bytes from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKLWD mm, mm/m32 | 0F 61 /r | mmx | Interleave low-order words from mm and mm/m32 into mm. | 
| PUNPCKLWD xmm1, xmm2/m128 | 66 0F 61 /r | sse2 | Interleave low-order words from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKLDQ mm, mm/m32 | 0F 62 /r | mmx | Interleave low-order doublewords from mm and mm/m32 into mm. | 
| PUNPCKLDQ xmm1, xmm2/m128 | 66 0F 62 /r | sse2 | Interleave low-order doublewords from xmm1 and xmm2/m128 into xmm1. | 
| PUNPCKLQDQ xmm1, xmm2/m128 | 66 0F 6C /r | sse2 | Interleave low-order quadword from xmm1 and xmm2/m128 into xmm1 register. | 
| VPUNPCKLBW xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 60 /r | avx | Interleave low-order bytes from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKLWD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 61 /r | avx | Interleave low-order words from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKLDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 62 /r | avx | Interleave low-order doublewords from xmm2 and xmm3/m128 into xmm1. | 
| VPUNPCKLQDQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 6C /r | avx | Interleave low-order quadword from xmm2 and xmm3/m128 into xmm1 register. | 
| VPUNPCKLBW ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 60 /r | avx2 | Interleave low-order bytes from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKLWD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 61 /r | avx2 | Interleave low-order words from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKLDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 62 /r | avx2 | Interleave low-order doublewords from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKLQDQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 6C /r | avx2 | Interleave low-order quadword from ymm2 and ymm3/m256 into ymm1 register. | 
| VPUNPCKLBW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 60 /r | avx512 | Interleave low-order bytes from xmm2 and xmm3/m128 into xmm1 register subject to write mask k1. | 
| VPUNPCKLWD xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F.WIG 61 /r | avx512 | Interleave low-order words from xmm2 and xmm3/m128 into xmm1 register subject to write mask k1. | 
| VPUNPCKLDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 62 /r | avx512 | Interleave low-order doublewords from xmm2 and xmm3/m128/m32bcst into xmm1 register subject to write mask k1. | 
| VPUNPCKLQDQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 6C /r | avx512 | Interleave low-order quadword from xmm2 and xmm3/m128/m64bcst into xmm1 register subject to write mask k1. | 
| PUSH r/m16 | FF /6 | | Push r/m16. | 
| PUSH r/m32 | FF /6 | | Push r/m32. | 
| PUSH r/m64 | FF /6 | | Push r/m64. | 
| PUSH r16 | 50+rw | | Push r16. | 
| PUSH r32 | 50+rd | | Push r32. | 
| PUSH r64 | 50+rd | | Push r64. | 
| PUSH imm8 | 6A ib | | Push imm8. | 
| PUSH imm16 | 68 iw | | Push imm16. | 
| PUSH imm32 | 68 id | | Push imm32. | 
| PUSH CS | 0E | | Push CS. | 
| PUSH SS | 16 | | Push SS. | 
| PUSH DS | 1E | | Push DS. | 
| PUSH ES | 06 | | Push ES. | 
| PUSH FS | 0F A0 | | Push FS. | 
| PUSH GS | 0F A8 | | Push GS. | 
| PUSHA | 60 | | Push AX, CX, DX, BX, original SP, BP, SI, and DI. | 
| PUSHAD | 60 | | Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI. | 
| PUSHF | 9C | | Push lower 16 bits of EFLAGS. | 
| PUSHFD | 9C | | Push EFLAGS. | 
| PUSHFQ | 9C | | Push RFLAGS. | 
| PXOR mm, mm/m64 | 0F EF /r | mmx | Bitwise XOR of mm/m64 and mm. | 
| PXOR xmm1, xmm2/m128 | 66 0F EF /r | sse2 | Bitwise XOR of xmm2/m128 and xmm1. | 
| VPXOR xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG EF /r | avx | Bitwise XOR of xmm3/m128 and xmm2. | 
| VPXOR ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG EF /r | avx2 | Bitwise XOR of ymm3/m256 and ymm2. | 
| VPXORD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F.W0 EF /r | avx512 | Bitwise XOR of packed doubleword integers in xmm2 and xmm3/m128/m32bcst using writemask k1. | 
| VPXORD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F.W0 EF /r | avx512 | Bitwise XOR of packed doubleword integers in ymm2 and ymm3/m256/m32bcst using writemask k1. | 
| VPXORD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F.W0 EF /r | avx512 | Bitwise XOR of packed doubleword integers in zmm2 and zmm3/m512/m32bcst using writemask k1. | 
| VPXORQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 EF /r | avx512 | Bitwise XOR of packed quadword integers in xmm2 and xmm3/m128/m64bcst using writemask k1. | 
| VPXORQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 EF /r | avx512 | Bitwise XOR of packed quadword integers in ymm2 and ymm3/m256/m64bcst using writemask k1. | 
| VPXORQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F.W1 EF /r | avx512 | Bitwise XOR of packed quadword integers in zmm2 and zmm3/m512/m64bcst using writemask k1. | 
| RCL r/m8, 1 | D0 /2 | | Rotate 9 bits (CF, r/m8) left once. | 
| RCL r/m8, 1 | REX + D0 /2 | | Rotate 9 bits (CF, r/m8) left once. | 
| RCL r/m8, CL | D2 /2 | | Rotate 9 bits (CF, r/m8) left CL times. | 
| RCL r/m8, CL | REX + D2 /2 | | Rotate 9 bits (CF, r/m8) left CL times. | 
| RCL r/m8, imm8 | C0 /2 ib | | Rotate 9 bits (CF, r/m8) left imm8 times. | 
| RCL r/m8, imm8 | REX + C0 /2 ib | | Rotate 9 bits (CF, r/m8) left imm8 times. | 
| RCL r/m16, 1 | D1 /2 | | Rotate 17 bits (CF, r/m16) left once. | 
| RCL r/m16, CL | D3 /2 | | Rotate 17 bits (CF, r/m16) left CL times. | 
| RCL r/m16, imm8 | C1 /2 ib | | Rotate 17 bits (CF, r/m16) left imm8 times. | 
| RCL r/m32, 1 | D1 /2 | | Rotate 33 bits (CF, r/m32) left once. | 
| RCL r/m64, 1 | REX.W + D1 /2 | | Rotate 65 bits (CF, r/m64) left once. Uses a 6-bit count. | 
| RCL r/m32, CL | D3 /2 | | Rotate 33 bits (CF, r/m32) left CL times. | 
| RCL r/m64, CL | REX.W + D3 /2 | | Rotate 65 bits (CF, r/m64) left CL times. Uses a 6-bit count. | 
| RCL r/m32, imm8 | C1 /2 ib | | Rotate 33 bits (CF, r/m32) left imm8 times. | 
| RCL r/m64, imm8 | REX.W + C1 /2 ib | | Rotate 65 bits (CF, r/m64) left imm8 times. Uses a 6-bit count. | 
| RCR r/m8, 1 | D0 /3 | Rotate 9 bits (CF, r/m8) right once. | |
| RCR r/m8*, 1 | REX + D0 /3 | Rotate 9 bits (CF, r/m8) right once. | |
| RCR r/m8, CL | D2 /3 | Rotate 9 bits (CF, r/m8) right CL times. | |
| RCR r/m8*, CL | REX + D2 /3 | Rotate 9 bits (CF, r/m8) right CL times. | |
| RCR r/m8, imm8 | C0 /3 ib | Rotate 9 bits (CF, r/m8) right imm8 times. | |
| RCR r/m8*, imm8 | REX + C0 /3 ib | Rotate 9 bits (CF, r/m8) right imm8 times. | |
| RCR r/m16, 1 | D1 /3 | Rotate 17 bits (CF, r/m16) right once. | |
| RCR r/m16, CL | D3 /3 | Rotate 17 bits (CF, r/m16) right CL times. | |
| RCR r/m16, imm8 | C1 /3 ib | Rotate 17 bits (CF, r/m16) right imm8 times. | |
| RCR r/m32, 1 | D1 /3 | Rotate 33 bits (CF, r/m32) right once. | |
| RCR r/m64, 1 | REX.W + D1 /3 | Rotate 65 bits (CF, r/m64) right once. Uses a 6 bit count. | |
| RCR r/m32, CL | D3 /3 | Rotate 33 bits (CF, r/m32) right CL times. | |
| RCR r/m64, CL | REX.W + D3 /3 | Rotate 65 bits (CF, r/m64) right CL times. Uses a 6 bit count. | |
| RCR r/m32, imm8 | C1 /3 ib | Rotate 33 bits (CF, r/m32) right imm8 times. | |
| RCR r/m64, imm8 | REX.W + C1 /3 ib | Rotate 65 bits (CF, r/m64) right imm8 times. Uses a 6 bit count. | |
| ROL r/m8, 1 | D0 /0 | Rotate 8 bits r/m8 left once. | |
| ROL r/m8, 1 | REX + D0 /0 | Rotate 8 bits r/m8 left once. | |
| ROL r/m8, CL | D2 /0 | Rotate 8 bits r/m8 left CL times. | |
| ROL r/m8, CL | REX + D2 /0 | Rotate 8 bits r/m8 left CL times. | |
| ROL r/m8, imm8 | C0 /0 ib | Rotate 8 bits r/m8 left imm8 times. | |
| ROL r/m8, imm8 | REX + C0 /0 ib | Rotate 8 bits r/m8 left imm8 times. | |
| ROL r/m16, 1 | D1 /0 | Rotate 16 bits r/m16 left once. | |
| ROL r/m16, CL | D3 /0 | Rotate 16 bits r/m16 left CL times. | |
| ROL r/m16, imm8 | C1 /0 ib | Rotate 16 bits r/m16 left imm8 times. | |
| ROL r/m32, 1 | D1 /0 | Rotate 32 bits r/m32 left once. | |
| ROL r/m64, 1 | REX.W + D1 /0 | Rotate 64 bits r/m64 left once. Uses a 6 bit count. | |
| ROL r/m32, CL | D3 /0 | Rotate 32 bits r/m32 left CL times. | |
| ROL r/m64, CL | REX.W + D3 /0 | Rotate 64 bits r/m64 left CL times. Uses a 6 bit count. | |
| ROL r/m32, imm8 | C1 /0 ib | Rotate 32 bits r/m32 left imm8 times. | |
| ROL r/m64, imm8 | REX.W + C1 /0 ib | Rotate 64 bits r/m64 left imm8 times. Uses a 6 bit count. | |
| ROR r/m8, 1 | D0 /1 | Rotate 8 bits r/m8 right once. | |
| ROR r/m8, 1 | REX + D0 /1 | Rotate 8 bits r/m8 right once. | |
| ROR r/m8, CL | D2 /1 | Rotate 8 bits r/m8 right CL times. | |
| ROR r/m8, CL | REX + D2 /1 | Rotate 8 bits r/m8 right CL times. | |
| ROR r/m8, imm8 | C0 /1 ib | Rotate 8 bits r/m8 right imm8 times. | |
| ROR r/m8, imm8 | REX + C0 /1 ib | Rotate 8 bits r/m8 right imm8 times. | |
| ROR r/m16, 1 | D1 /1 | Rotate 16 bits r/m16 right once. | |
| ROR r/m16, CL | D3 /1 | Rotate 16 bits r/m16 right CL times. | |
| ROR r/m16, imm8 | C1 /1 ib | Rotate 16 bits r/m16 right imm8 times. | |
| ROR r/m32, 1 | D1 /1 | Rotate 32 bits r/m32 right once. | |
| ROR r/m64, 1 | REX.W + D1 /1 | Rotate 64 bits r/m64 right once. Uses a 6 bit count. | |
| ROR r/m32, CL | D3 /1 | Rotate 32 bits r/m32 right CL times. | |
| ROR r/m64, CL | REX.W + D3 /1 | Rotate 64 bits r/m64 right CL times. Uses a 6 bit count. | |
| ROR r/m32, imm8 | C1 /1 ib | Rotate 32 bits r/m32 right imm8 times. | |
| ROR r/m64, imm8 | REX.W + C1 /1 ib | Rotate 64 bits r/m64 right imm8 times. Uses a 6 bit count. | |
| RCPPS xmm1, xmm2/m128 | 0F 53 /r | sse | Computes the approximate reciprocals of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1. | 
| VRCPPS xmm1, xmm2/m128 | VEX.128.0F.WIG 53 /r | avx | Computes the approximate reciprocals of packed single-precision values in xmm2/mem and stores the results in xmm1. | 
| VRCPPS ymm1, ymm2/m256 | VEX.256.0F.WIG 53 /r | avx | Computes the approximate reciprocals of packed single-precision values in ymm2/mem and stores the results in ymm1. | 
| RCPSS xmm1, xmm2/m32 | F3 0F 53 /r | sse | Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm2/m32 and stores the result in xmm1. | 
| VRCPSS xmm1, xmm2, xmm3/m32 | VEX.NDS.LIG.F3.0F.WIG 53 /r | avx | Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm3/m32 and stores the result in xmm1. Also, upper single precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| RDFSBASE r32 | F3 0F AE /0 | fsgsbase | Load the 32-bit destination register with the FS base address. | 
| RDFSBASE r64 | F3 REX.W 0F AE /0 | fsgsbase | Load the 64-bit destination register with the FS base address. | 
| RDGSBASE r32 | F3 0F AE /1 | fsgsbase | Load the 32-bit destination register with the GS base address. | 
| RDGSBASE r64 | F3 REX.W 0F AE /1 | fsgsbase | Load the 64-bit destination register with the GS base address. | 
| RDMSR | 0F 32 | Read MSR specified by ECX into EDX:EAX. | |
| RDPID r32 | F3 0F C7 /7 | rdpid | Read IA32_TSC_AUX into r32. | 
| RDPID r64 | F3 0F C7 /7 | rdpid | Read IA32_TSC_AUX into r64. | 
| RDPKRU | 0F 01 EE | ospke | Reads PKRU into EAX. | 
| RDPMC | 0F 33 | Read performance-monitoring counter specified by ECX into EDX:EAX. | |
| RDRAND r16 | 0F C7 /6 | rdrand | Read a 16-bit random number and store in the destination register. | 
| RDRAND r32 | 0F C7 /6 | rdrand | Read a 32-bit random number and store in the destination register. | 
| RDRAND r64 | REX.W + 0F C7 /6 | rdrand | Read a 64-bit random number and store in the destination register. | 
| RDSEED r16 | 0F C7 /7 | rdseed | Read a 16-bit NIST SP800-90B & C compliant random value and store in the destination register. | 
| RDSEED r32 | 0F C7 /7 | rdseed | Read a 32-bit NIST SP800-90B & C compliant random value and store in the destination register. | 
| RDSEED r64 | REX.W + 0F C7 /7 | rdseed | Read a 64-bit NIST SP800-90B & C compliant random value and store in the destination register. | 
| RDTSC | 0F 31 | Read time-stamp counter into EDX:EAX. | |
| RDTSCP | 0F 01 F9 | Read 64-bit time-stamp counter and IA32_TSC_AUX value into EDX:EAX and ECX. | |
| REP INS m8, DX | F3 6C | Input (E)CX bytes from port DX into ES:[(E)DI]. | |
| REP INS m8, DX | F3 6C | Input RCX bytes from port DX into [RDI]. | |
| REP INS m16, DX | F3 6D | Input (E)CX words from port DX into ES:[(E)DI]. | |
| REP INS m32, DX | F3 6D | Input (E)CX doublewords from port DX into ES:[(E)DI]. | |
| REP INS r/m32, DX | F3 6D | Input RCX default size from port DX into [RDI]. | |
| REP MOVS m8, m8 | F3 A4 | Move (E)CX bytes from DS:[(E)SI] to ES:[(E)DI]. | |
| REP MOVS m8, m8 | F3 REX.W A4 | Move RCX bytes from [RSI] to [RDI]. | |
| REP MOVS m16, m16 | F3 A5 | Move (E)CX words from DS:[(E)SI] to ES:[(E)DI]. | |
| REP MOVS m32, m32 | F3 A5 | Move (E)CX doublewords from DS:[(E)SI] to ES:[(E)DI]. | |
| REP MOVS m64, m64 | F3 REX.W A5 | Move RCX quadwords from [RSI] to [RDI]. | |
| REP OUTS DX, r/m8 | F3 6E | Output (E)CX bytes from DS:[(E)SI] to port DX. | |
| REP OUTS DX, r/m8* | F3 REX.W 6E | Output RCX bytes from [RSI] to port DX. | |
| REP OUTS DX, r/m16 | F3 6F | Output (E)CX words from DS:[(E)SI] to port DX. | |
| REP OUTS DX, r/m32 | F3 6F | Output (E)CX doublewords from DS:[(E)SI] to port DX. | |
| REP OUTS DX, r/m32 | F3 REX.W 6F | Output RCX default size from [RSI] to port DX. | |
| REP LODS AL | F3 AC | Load (E)CX bytes from DS:[(E)SI] to AL. | |
| REP LODS AL | F3 REX.W AC | Load RCX bytes from [RSI] to AL. | |
| REP LODS AX | F3 AD | Load (E)CX words from DS:[(E)SI] to AX. | |
| REP LODS EAX | F3 AD | Load (E)CX doublewords from DS:[(E)SI] to EAX. | |
| REP LODS RAX | F3 REX.W AD | Load RCX quadwords from [RSI] to RAX. | |
| REP STOS m8 | F3 AA | Fill (E)CX bytes at ES:[(E)DI] with AL. | |
| REP STOS m8 | F3 REX.W AA | Fill RCX bytes at [RDI] with AL. | |
| REP STOS m16 | F3 AB | Fill (E)CX words at ES:[(E)DI] with AX. | |
| REP STOS m32 | F3 AB | Fill (E)CX doublewords at ES:[(E)DI] with EAX. | |
| REP STOS m64 | F3 REX.W AB | Fill RCX quadwords at [RDI] with RAX. | |
| REPE CMPS m8, m8 | F3 A6 | Find non-matching bytes in ES:[(E)DI] and DS:[(E)SI]. | |
| REPE CMPS m8, m8 | F3 REX.W A6 | Find non-matching bytes in [RDI] and [RSI]. | |
| REPE CMPS m16, m16 | F3 A7 | Find non-matching words in ES:[(E)DI] and DS:[(E)SI]. | |
| REPE CMPS m32, m32 | F3 A7 | Find non-matching doublewords in ES:[(E)DI] and DS:[(E)SI]. | |
| REPE CMPS m64, m64 | F3 REX.W A7 | Find non-matching quadwords in [RDI] and [RSI]. | |
| REPE SCAS m8 | F3 AE | Find non-AL byte starting at ES:[(E)DI]. | |
| REPE SCAS m8 | F3 REX.W AE | Find non-AL byte starting at [RDI]. | |
| REPE SCAS m16 | F3 AF | Find non-AX word starting at ES:[(E)DI]. | |
| REPE SCAS m32 | F3 AF | Find non-EAX doubleword starting at ES:[(E)DI]. | |
| REPE SCAS m64 | F3 REX.W AF | Find non-RAX quadword starting at [RDI]. | |
| REPNE CMPS m8, m8 | F2 A6 | Find matching bytes in ES:[(E)DI] and DS:[(E)SI]. | |
| REPNE CMPS m8, m8 | F2 REX.W A6 | Find matching bytes in [RDI] and [RSI]. | |
| REPNE CMPS m16, m16 | F2 A7 | Find matching words in ES:[(E)DI] and DS:[(E)SI]. | |
| REPNE CMPS m32, m32 | F2 A7 | Find matching doublewords in ES:[(E)DI] and DS:[(E)SI]. | |
| REPNE CMPS m64, m64 | F2 REX.W A7 | Find matching quadwords in [RDI] and [RSI]. | |
| REPNE SCAS m8 | F2 AE | Find AL, starting at ES:[(E)DI]. | |
| REPNE SCAS m8 | F2 REX.W AE | Find AL, starting at [RDI]. | |
| REPNE SCAS m16 | F2 AF | Find AX, starting at ES:[(E)DI]. | |
| REPNE SCAS m32 | F2 AF | Find EAX, starting at ES:[(E)DI]. | |
| REPNE SCAS m64 | F2 REX.W AF | Find RAX, starting at [RDI]. | |
| RET | C3 | Near return to calling procedure. | |
| RET | CB | Far return to calling procedure. | |
| RET imm16 | C2 iw | Near return to calling procedure and pop imm16 bytes from stack. | |
| RET imm16 | CA iw | Far return to calling procedure and pop imm16 bytes from stack. | |
| RORX r32, r/m32, imm8 | VEX.LZ.F2.0F3A.W0 F0 /r ib | bmi2 | Rotate 32-bit r/m32 right imm8 times without affecting arithmetic flags. | 
| RORX r64, r/m64, imm8 | VEX.LZ.F2.0F3A.W1 F0 /r ib | bmi2 | Rotate 64-bit r/m64 right imm8 times without affecting arithmetic flags. | 
| ROUNDPD xmm1, xmm2/m128, imm8 | 66 0F 3A 09 /r ib | sse4.1 | Round packed double-precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDPD xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 09 /r ib | avx | Round packed double-precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDPD ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.WIG 09 /r ib | avx | Round packed double-precision floating-point values in ymm2/m256 and place the result in ymm1. The rounding mode is determined by imm8. | 
| ROUNDPS xmm1, xmm2/m128, imm8 | 66 0F 3A 08 /r ib | sse4.1 | Round packed single-precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDPS xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.WIG 08 /r ib | avx | Round packed single-precision floating-point values in xmm2/m128 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDPS ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.WIG 08 /r ib | avx | Round packed single-precision floating-point values in ymm2/m256 and place the result in ymm1. The rounding mode is determined by imm8. | 
| ROUNDSD xmm1, xmm2/m64, imm8 | 66 0F 3A 0B /r ib | sse4.1 | Round the low packed double precision floating-point value in xmm2/m64 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDSD xmm1, xmm2, xmm3/m64, imm8 | VEX.NDS.LIG.66.0F3A.WIG 0B /r ib | avx | Round the low packed double precision floating-point value in xmm3/m64 and place the result in xmm1. The rounding mode is determined by imm8. Upper packed double precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| ROUNDSS xmm1, xmm2/m32, imm8 | 66 0F 3A 0A /r ib | sse4.1 | Round the low packed single precision floating-point value in xmm2/m32 and place the result in xmm1. The rounding mode is determined by imm8. | 
| VROUNDSS xmm1, xmm2, xmm3/m32, imm8 | VEX.NDS.LIG.66.0F3A.WIG 0A /r ib | avx | Round the low packed single precision floating-point value in xmm3/m32 and place the result in xmm1. The rounding mode is determined by imm8. Also, upper packed single precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| RSM | 0F AA | Resume operation of interrupted program. | |
| RSQRTPS xmm1, xmm2/m128 | 0F 52 /r | sse | Computes the approximate reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1. | 
| VRSQRTPS xmm1, xmm2/m128 | VEX.128.0F.WIG 52 /r | avx | Computes the approximate reciprocals of the square roots of packed single-precision values in xmm2/mem and stores the results in xmm1. | 
| VRSQRTPS ymm1, ymm2/m256 | VEX.256.0F.WIG 52 /r | avx | Computes the approximate reciprocals of the square roots of packed single-precision values in ymm2/mem and stores the results in ymm1. | 
| RSQRTSS xmm1, xmm2/m32 | F3 0F 52 /r | sse | Computes the approximate reciprocal of the square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1. | 
| VRSQRTSS xmm1, xmm2, xmm3/m32 | VEX.NDS.LIG.F3.0F.WIG 52 /r | avx | Computes the approximate reciprocal of the square root of the low single precision floating-point value in xmm3/m32 and stores the results in xmm1. Also, upper single precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| SAHF | 9E | Loads SF, ZF, AF, PF, and CF from AH into EFLAGS register. | |
| SAL r/m8, 1 | D0 /4 | Multiply r/m8 by 2, once. | |
| SAL r/m8, 1 | REX + D0 /4 | Multiply r/m8 by 2, once. | |
| SAL r/m8, CL | D2 /4 | Multiply r/m8 by 2, CL times. | |
| SAL r/m8, CL | REX + D2 /4 | Multiply r/m8 by 2, CL times. | |
| SAL r/m8, imm8 | C0 /4 ib | Multiply r/m8 by 2, imm8 times. | |
| SAL r/m8, imm8 | REX + C0 /4 ib | Multiply r/m8 by 2, imm8 times. | |
| SAL r/m16, 1 | D1 /4 | Multiply r/m16 by 2, once. | |
| SAL r/m16, CL | D3 /4 | Multiply r/m16 by 2, CL times. | |
| SAL r/m16, imm8 | C1 /4 ib | Multiply r/m16 by 2, imm8 times. | |
| SAL r/m32, 1 | D1 /4 | Multiply r/m32 by 2, once. | |
| SAL r/m64, 1 | REX.W + D1 /4 | Multiply r/m64 by 2, once. | |
| SAL r/m32, CL | D3 /4 | Multiply r/m32 by 2, CL times. | |
| SAL r/m64, CL | REX.W + D3 /4 | Multiply r/m64 by 2, CL times. | |
| SAL r/m32, imm8 | C1 /4 ib | Multiply r/m32 by 2, imm8 times. | |
| SAL r/m64, imm8 | REX.W + C1 /4 ib | Multiply r/m64 by 2, imm8 times. | |
| SAR r/m8, 1 | D0 /7 | Signed divide* r/m8 by 2, once. | |
| SAR r/m8, 1 | REX + D0 /7 | Signed divide* r/m8 by 2, once. | |
| SAR r/m8, CL | D2 /7 | Signed divide* r/m8 by 2, CL times. | |
| SAR r/m8, CL | REX + D2 /7 | Signed divide* r/m8 by 2, CL times. | |
| SAR r/m8, imm8 | C0 /7 ib | Signed divide* r/m8 by 2, imm8 times. | |
| SAR r/m8, imm8 | REX + C0 /7 ib | Signed divide* r/m8 by 2, imm8 times. | |
| SAR r/m16, 1 | D1 /7 | Signed divide* r/m16 by 2, once. | |
| SAR r/m16, CL | D3 /7 | Signed divide* r/m16 by 2, CL times. | |
| SAR r/m16, imm8 | C1 /7 ib | Signed divide* r/m16 by 2, imm8 times. | |
| SAR r/m32, 1 | D1 /7 | Signed divide* r/m32 by 2, once. | |
| SAR r/m64, 1 | REX.W + D1 /7 | Signed divide* r/m64 by 2, once. | |
| SAR r/m32, CL | D3 /7 | Signed divide* r/m32 by 2, CL times. | |
| SAR r/m64, CL | REX.W + D3 /7 | Signed divide* r/m64 by 2, CL times. | |
| SAR r/m32, imm8 | C1 /7 ib | Signed divide* r/m32 by 2, imm8 times. | |
| SAR r/m64, imm8 | REX.W + C1 /7 ib | Signed divide* r/m64 by 2, imm8 times. | |
| SHL r/m8, 1 | D0 /4 | Multiply r/m8 by 2, once. | |
| SHL r/m8, 1 | REX + D0 /4 | Multiply r/m8 by 2, once. | |
| SHL r/m8, CL | D2 /4 | Multiply r/m8 by 2, CL times. | |
| SHL r/m8, CL | REX + D2 /4 | Multiply r/m8 by 2, CL times. | |
| SHL r/m8, imm8 | C0 /4 ib | Multiply r/m8 by 2, imm8 times. | |
| SHL r/m8, imm8 | REX + C0 /4 ib | Multiply r/m8 by 2, imm8 times. | |
| SHL r/m16, 1 | D1 /4 | Multiply r/m16 by 2, once. | |
| SHL r/m16, CL | D3 /4 | Multiply r/m16 by 2, CL times. | |
| SHL r/m16, imm8 | C1 /4 ib | Multiply r/m16 by 2, imm8 times. | |
| SHL r/m32, 1 | D1 /4 | Multiply r/m32 by 2, once. | |
| SHL r/m64, 1 | REX.W + D1 /4 | Multiply r/m64 by 2, once. | |
| SHL r/m32, CL | D3 /4 | Multiply r/m32 by 2, CL times. | |
| SHL r/m64, CL | REX.W + D3 /4 | Multiply r/m64 by 2, CL times. | |
| SHL r/m32, imm8 | C1 /4 ib | Multiply r/m32 by 2, imm8 times. | |
| SHL r/m64, imm8 | REX.W + C1 /4 ib | Multiply r/m64 by 2, imm8 times. | |
| SHR r/m8, 1 | D0 /5 | Unsigned divide r/m8 by 2, once. | |
| SHR r/m8, 1 | REX + D0 /5 | Unsigned divide r/m8 by 2, once. | |
| SHR r/m8, CL | D2 /5 | Unsigned divide r/m8 by 2, CL times. | |
| SHR r/m8, CL | REX + D2 /5 | Unsigned divide r/m8 by 2, CL times. | |
| SHR r/m8, imm8 | C0 /5 ib | Unsigned divide r/m8 by 2, imm8 times. | |
| SHR r/m8, imm8 | REX + C0 /5 ib | Unsigned divide r/m8 by 2, imm8 times. | |
| SHR r/m16, 1 | D1 /5 | Unsigned divide r/m16 by 2, once. | |
| SHR r/m16, CL | D3 /5 | Unsigned divide r/m16 by 2, CL times. | |
| SHR r/m16, imm8 | C1 /5 ib | Unsigned divide r/m16 by 2, imm8 times. | |
| SHR r/m32, 1 | D1 /5 | Unsigned divide r/m32 by 2, once. | |
| SHR r/m64, 1 | REX.W + D1 /5 | Unsigned divide r/m64 by 2, once. | |
| SHR r/m32, CL | D3 /5 | Unsigned divide r/m32 by 2, CL times. | |
| SHR r/m64, CL | REX.W + D3 /5 | Unsigned divide r/m64 by 2, CL times. | |
| SHR r/m32, imm8 | C1 /5 ib | Unsigned divide r/m32 by 2, imm8 times. | |
| SHR r/m64, imm8 | REX.W + C1 /5 ib | Unsigned divide r/m64 by 2, imm8 times. | |
| SARX r32a, r/m32, r32b | VEX.NDS.LZ.F3.0F38.W0 F7 /r | bmi2 | Shift r/m32 arithmetically right with count specified in r32b. | 
| SHLX r32a, r/m32, r32b | VEX.NDS.LZ.66.0F38.W0 F7 /r | bmi2 | Shift r/m32 logically left with count specified in r32b. | 
| SHRX r32a, r/m32, r32b | VEX.NDS.LZ.F2.0F38.W0 F7 /r | bmi2 | Shift r/m32 logically right with count specified in r32b. | 
| SARX r64a, r/m64, r64b | VEX.NDS.LZ.F3.0F38.W1 F7 /r | bmi2 | Shift r/m64 arithmetically right with count specified in r64b. | 
| SHLX r64a, r/m64, r64b | VEX.NDS.LZ.66.0F38.W1 F7 /r | bmi2 | Shift r/m64 logically left with count specified in r64b. | 
| SHRX r64a, r/m64, r64b | VEX.NDS.LZ.F2.0F38.W1 F7 /r | bmi2 | Shift r/m64 logically right with count specified in r64b. | 
| SBB AL, imm8 | 1C ib | Subtract with borrow imm8 from AL. | |
| SBB AX, imm16 | 1D iw | Subtract with borrow imm16 from AX. | |
| SBB EAX, imm32 | 1D id | Subtract with borrow imm32 from EAX. | |
| SBB RAX, imm32 | REX.W + 1D id | Subtract with borrow sign-extended imm32 to 64-bits from RAX. | |
| SBB r/m8, imm8 | 80 /3 ib | Subtract with borrow imm8 from r/m8. | |
| SBB r/m8, imm8 | REX + 80 /3 ib | Subtract with borrow imm8 from r/m8. | |
| SBB r/m16, imm16 | 81 /3 iw | Subtract with borrow imm16 from r/m16. | |
| SBB r/m32, imm32 | 81 /3 id | Subtract with borrow imm32 from r/m32. | |
| SBB r/m64, imm32 | REX.W + 81 /3 id | Subtract with borrow sign-extended imm32 to 64-bits from r/m64. | |
| SBB r/m16, imm8 | 83 /3 ib | Subtract with borrow sign-extended imm8 from r/m16. | |
| SBB r/m32, imm8 | 83 /3 ib | Subtract with borrow sign-extended imm8 from r/m32. | |
| SBB r/m64, imm8 | REX.W + 83 /3 ib | Subtract with borrow sign-extended imm8 from r/m64. | |
| SBB r/m8, r8 | 18 /r | Subtract with borrow r8 from r/m8. | |
| SBB r/m8, r8 | REX + 18 /r | Subtract with borrow r8 from r/m8. | |
| SBB r/m16, r16 | 19 /r | Subtract with borrow r16 from r/m16. | |
| SBB r/m32, r32 | 19 /r | Subtract with borrow r32 from r/m32. | |
| SBB r/m64, r64 | REX.W + 19 /r | Subtract with borrow r64 from r/m64. | |
| SBB r8, r/m8 | 1A /r | Subtract with borrow r/m8 from r8. | |
| SBB r8, r/m8 | REX + 1A /r | Subtract with borrow r/m8 from r8. | |
| SBB r16, r/m16 | 1B /r | Subtract with borrow r/m16 from r16. | |
| SBB r32, r/m32 | 1B /r | Subtract with borrow r/m32 from r32. | |
| SBB r64, r/m64 | REX.W + 1B /r | Subtract with borrow r/m64 from r64. | |
| SCAS m8 | AE | Compare AL with byte at ES:(E)DI or RDI, then set status flags. | |
| SCAS m16 | AF | Compare AX with word at ES:(E)DI or RDI, then set status flags. | |
| SCAS m32 | AF | Compare EAX with doubleword at ES:(E)DI or RDI, then set status flags. | |
| SCAS m64 | REX.W + AF | Compare RAX with quadword at RDI or EDI then set status flags. | |
| SCASB | AE | Compare AL with byte at ES:(E)DI or RDI then set status flags. | |
| SCASW | AF | Compare AX with word at ES:(E)DI or RDI then set status flags. | |
| SCASD | AF | Compare EAX with doubleword at ES:(E)DI or RDI then set status flags. | |
| SCASQ | REX.W + AF | Compare RAX with quadword at RDI or EDI then set status flags. | |
| SETA r/m8 | 0F 97 | Set byte if above (CF=0 and ZF=0). | |
| SETA r/m8 | REX + 0F 97 | Set byte if above (CF=0 and ZF=0). | |
| SETAE r/m8 | 0F 93 | Set byte if above or equal (CF=0). | |
| SETAE r/m8 | REX + 0F 93 | Set byte if above or equal (CF=0). | |
| SETB r/m8 | 0F 92 | Set byte if below (CF=1). | |
| SETB r/m8 | REX + 0F 92 | Set byte if below (CF=1). | |
| SETBE r/m8 | 0F 96 | Set byte if below or equal (CF=1 or ZF=1). | |
| SETBE r/m8 | REX + 0F 96 | Set byte if below or equal (CF=1 or ZF=1). | |
| SETC r/m8 | 0F 92 | Set byte if carry (CF=1). | |
| SETC r/m8 | REX + 0F 92 | Set byte if carry (CF=1). | |
| SETE r/m8 | 0F 94 | Set byte if equal (ZF=1). | |
| SETE r/m8 | REX + 0F 94 | Set byte if equal (ZF=1). | |
| SETG r/m8 | 0F 9F | Set byte if greater (ZF=0 and SF=OF). | |
| SETG r/m8 | REX + 0F 9F | Set byte if greater (ZF=0 and SF=OF). | |
| SETGE r/m8 | 0F 9D | Set byte if greater or equal (SF=OF). | |
| SETGE r/m8 | REX + 0F 9D | Set byte if greater or equal (SF=OF). | |
| SETL r/m8 | 0F 9C | Set byte if less (SF≠OF). | |
| SETL r/m8 | REX + 0F 9C | Set byte if less (SF≠OF). | |
| SETLE r/m8 | 0F 9E | Set byte if less or equal (ZF=1 or SF≠OF). | |
| SETLE r/m8 | REX + 0F 9E | Set byte if less or equal (ZF=1 or SF≠OF). | |
| SETNA r/m8 | 0F 96 | Set byte if not above (CF=1 or ZF=1). | |
| SETNA r/m8 | REX + 0F 96 | Set byte if not above (CF=1 or ZF=1). | |
| SETNAE r/m8 | 0F 92 | Set byte if not above or equal (CF=1). | |
| SETNAE r/m8 | REX + 0F 92 | Set byte if not above or equal (CF=1). | |
| SETNB r/m8 | 0F 93 | Set byte if not below (CF=0). | |
| SETNB r/m8 | REX + 0F 93 | Set byte if not below (CF=0). | |
| SETNBE r/m8 | 0F 97 | Set byte if not below or equal (CF=0 and ZF=0). | |
| SETNBE r/m8 | REX + 0F 97 | Set byte if not below or equal (CF=0 and ZF=0). | |
| SETNC r/m8 | 0F 93 | Set byte if not carry (CF=0). | |
| SETNC r/m8 | REX + 0F 93 | Set byte if not carry (CF=0). | |
| SETNE r/m8 | 0F 95 | Set byte if not equal (ZF=0). | |
| SETNE r/m8 | REX + 0F 95 | Set byte if not equal (ZF=0). | |
| SETNG r/m8 | 0F 9E | Set byte if not greater (ZF=1 or SF≠OF). | |
| SETNG r/m8 | REX + 0F 9E | Set byte if not greater (ZF=1 or SF≠OF). | |
| SETNGE r/m8 | 0F 9C | Set byte if not greater or equal (SF≠OF). | |
| SETNGE r/m8 | REX + 0F 9C | Set byte if not greater or equal (SF≠OF). | |
| SETNL r/m8 | 0F 9D | Set byte if not less (SF=OF). | |
| SETNL r/m8 | REX + 0F 9D | Set byte if not less (SF=OF). | |
| SETNLE r/m8 | 0F 9F | Set byte if not less or equal (ZF=0 and SF=OF). | |
| SETNLE r/m8 | REX + 0F 9F | Set byte if not less or equal (ZF=0 and SF=OF). | |
| SETNO r/m8 | 0F 91 | Set byte if not overflow (OF=0). | |
| SETNO r/m8 | REX + 0F 91 | Set byte if not overflow (OF=0). | |
| SETNP r/m8 | 0F 9B | Set byte if not parity (PF=0). | |
| SETNP r/m8 | REX + 0F 9B | Set byte if not parity (PF=0). | |
| SETNS r/m8 | 0F 99 | Set byte if not sign (SF=0). | |
| SETNS r/m8 | REX + 0F 99 | Set byte if not sign (SF=0). | |
| SETNZ r/m8 | 0F 95 | Set byte if not zero (ZF=0). | |
| SETNZ r/m8 | REX + 0F 95 | Set byte if not zero (ZF=0). | |
| SETO r/m8 | 0F 90 | Set byte if overflow (OF=1). | |
| SETO r/m8 | REX + 0F 90 | Set byte if overflow (OF=1). | |
| SETP r/m8 | 0F 9A | Set byte if parity (PF=1). | |
| SETP r/m8 | REX + 0F 9A | Set byte if parity (PF=1). | |
| SETPE r/m8 | 0F 9A | Set byte if parity even (PF=1). | |
| SETPE r/m8 | REX + 0F 9A | Set byte if parity even (PF=1). | |
| SETPO r/m8 | 0F 9B | Set byte if parity odd (PF=0). | |
| SETPO r/m8 | REX + 0F 9B | Set byte if parity odd (PF=0). | |
| SETS r/m8 | 0F 98 | Set byte if sign (SF=1). | |
| SETS r/m8 | REX + 0F 98 | Set byte if sign (SF=1). | |
| SETZ r/m8 | 0F 94 | Set byte if zero (ZF=1). | |
| SETZ r/m8 | REX + 0F 94 | Set byte if zero (ZF=1). | |
| SFENCE | 0F AE F8 | Serializes store operations. | |
| SGDT m | 0F 01 /0 | Store GDTR to m. | |
| SHA1MSG1 xmm1, xmm2/m128 | 0F 38 C9 /r | sha | Performs an intermediate calculation for the next four SHA1 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. | 
| SHA1MSG2 xmm1, xmm2/m128 | 0F 38 CA /r | sha | Performs the final calculation for the next four SHA1 message dwords using intermediate results from xmm1 and the previous message dwords from xmm2/m128, storing the result in xmm1. | 
| SHA1NEXTE xmm1, xmm2/m128 | 0F 38 C8 /r | sha | Calculates SHA1 state variable E after four rounds of operation from the current SHA1 state variable A in xmm1. The calculated value of the SHA1 state variable E is added to the scheduled dwords in xmm2/m128, and stored with some of the scheduled dwords in xmm1. | 
| SHA1RNDS4 xmm1, xmm2/m128, imm8 | 0F 3A CC /r ib | sha | Performs four rounds of SHA1 operation operating on SHA1 state (A,B,C,D) from xmm1, with a pre-computed sum of the next 4 round message dwords and state variable E from xmm2/m128. The immediate byte controls logic functions and round constants. | 
| SHA256MSG1 xmm1, xmm2/m128 | 0F 38 CC /r | sha | Performs an intermediate calculation for the next four SHA256 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. | 
| SHA256MSG2 xmm1, xmm2/m128 | 0F 38 CD /r | sha | Performs the final calculation for the next four SHA256 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. | 
| SHA256RNDS2 xmm1, xmm2/m128, <XMM0> | 0F 38 CB /r | sha | Perform 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from xmm1, an initial SHA256 state (A,B,E,F) from xmm2/m128, and a pre-computed sum of the next 2 round message dwords and the corresponding round constants from the implicit operand XMM0, storing the updated SHA256 state (A,B,E,F) result in xmm1. | 
| SHLD r/m16, r16, imm8 | 0F A4 /r ib | Shift r/m16 to left imm8 places while shifting bits from r16 in from the right. | |
| SHLD r/m16, r16, CL | 0F A5 /r | Shift r/m16 to left CL places while shifting bits from r16 in from the right. | |
| SHLD r/m32, r32, imm8 | 0F A4 /r ib | Shift r/m32 to left imm8 places while shifting bits from r32 in from the right. | |
| SHLD r/m64, r64, imm8 | REX.W + 0F A4 /r ib | Shift r/m64 to left imm8 places while shifting bits from r64 in from the right. | |
| SHLD r/m32, r32, CL | 0F A5 /r | Shift r/m32 to left CL places while shifting bits from r32 in from the right. | |
| SHLD r/m64, r64, CL | REX.W + 0F A5 /r | Shift r/m64 to left CL places while shifting bits from r64 in from the right. | |
| SHRD r/m16, r16, imm8 | 0F AC /r ib | Shift r/m16 to right imm8 places while shifting bits from r16 in from the left. | |
| SHRD r/m16, r16, CL | 0F AD /r | Shift r/m16 to right CL places while shifting bits from r16 in from the left. | |
| SHRD r/m32, r32, imm8 | 0F AC /r ib | Shift r/m32 to right imm8 places while shifting bits from r32 in from the left. | |
| SHRD r/m64, r64, imm8 | REX.W + 0F AC /r ib | Shift r/m64 to right imm8 places while shifting bits from r64 in from the left. | |
| SHRD r/m32, r32, CL | 0F AD /r | Shift r/m32 to right CL places while shifting bits from r32 in from the left. | |
| SHRD r/m64, r64, CL | REX.W + 0F AD /r | Shift r/m64 to right CL places while shifting bits from r64 in from the left. | |
| SHUFPD xmm1, xmm2/m128, imm8 | 66 0F C6 /r ib | sse2 | Shuffle two pairs of double-precision floating-point values from xmm1 and xmm2/m128 using imm8 to select from each pair, interleaved result is stored in xmm1. | 
| VSHUFPD xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F.WIG C6 /r ib | avx | Shuffle two pairs of double-precision floating-point values from xmm2 and xmm3/m128 using imm8 to select from each pair, interleaved result is stored in xmm1. | 
| VSHUFPD ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F.WIG C6 /r ib | avx | Shuffle four pairs of double-precision floating-point values from ymm2 and ymm3/m256 using imm8 to select from each pair, interleaved result is stored in ymm1. | 
| VSHUFPD xmm1{k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F.W1 C6 /r ib | avx512 | Shuffle two pairs of double-precision floating-point values from xmm2 and xmm3/m128/m64bcst using imm8 to select from each pair. Store interleaved results in xmm1 subject to writemask k1. | 
| VSHUFPD ymm1{k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F.W1 C6 /r ib | avx512 | Shuffle four pairs of double-precision floating-point values from ymm2 and ymm3/m256/m64bcst using imm8 to select from each pair. Store interleaved results in ymm1 subject to writemask k1. | 
| VSHUFPD zmm1{k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F.W1 C6 /r ib | avx512 | Shuffle eight pairs of double-precision floating-point values from zmm2 and zmm3/m512/m64bcst using imm8 to select from each pair. Store interleaved results in zmm1 subject to writemask k1. | 
| SHUFPS xmm1, xmm2/m128, imm8 | 0F C6 /r ib | sse | Select from quadruplet of single-precision floating-point values in xmm1 and xmm2/m128 using imm8, interleaved result pairs are stored in xmm1. | 
| VSHUFPS xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.0F.WIG C6 /r ib | avx | Select from quadruplet of single-precision floating-point values in xmm2 and xmm3/m128 using imm8, interleaved result pairs are stored in xmm1. | 
| VSHUFPS ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.0F.WIG C6 /r ib | avx | Select from quadruplet of single-precision floating-point values in ymm2 and ymm3/m256 using imm8, interleaved result pairs are stored in ymm1. | 
| VSHUFPS xmm1{k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.0F.W0 C6 /r ib | avx512 | Select from quadruplet of single-precision floating-point values in xmm2 and xmm3/m128 using imm8, interleaved result pairs are stored in xmm1, subject to writemask k1. | 
| VSHUFPS ymm1{k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.0F.W0 C6 /r ib | avx512 | Select from quadruplet of single-precision floating-point values in ymm2 and ymm3/m256 using imm8, interleaved result pairs are stored in ymm1, subject to writemask k1. | 
| VSHUFPS zmm1{k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.0F.W0 C6 /r ib | avx512 | Select from quadruplet of single-precision floating-point values in zmm2 and zmm3/m512 using imm8, interleaved result pairs are stored in zmm1, subject to writemask k1. | 
| SIDT m | 0F 01 /1 | Store IDTR to m. | |
| SLDT r/m16 | 0F 00 /0 | Stores segment selector from LDTR in r/m16. | |
| SLDT r64/m16 | REX.W + 0F 00 /0 | Stores segment selector from LDTR in r64/m16. | |
| SMSW r/m16 | 0F 01 /4 | Store machine status word to r/m16. | |
| SMSW r32/m16 | 0F 01 /4 | Store machine status word in low-order 16 bits of r32/m16; high-order 16 bits of r32 are undefined. | |
| SMSW r64/m16 | REX.W + 0F 01 /4 | Store machine status word in low-order 16 bits of r64/m16; high-order 16 bits of r32 are undefined. | |
| SQRTPD xmm1, xmm2/m128 | 66 0F 51 /r | sse2 | Computes Square Roots of the packed double-precision floating-point values in xmm2/m128 and stores the result in xmm1. | 
| VSQRTPD xmm1, xmm2/m128 | VEX.128.66.0F.WIG 51 /r | avx | Computes Square Roots of the packed double-precision floating-point values in xmm2/m128 and stores the result in xmm1. | 
| VSQRTPD ymm1, ymm2/m256 | VEX.256.66.0F.WIG 51 /r | avx | Computes Square Roots of the packed double-precision floating-point values in ymm2/m256 and stores the result in ymm1. | 
| VSQRTPD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 51 /r | avx512 | Computes Square Roots of the packed double-precision floating-point values in xmm2/m128/m64bcst and stores the result in xmm1 subject to writemask k1. | 
| VSQRTPD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 51 /r | avx512 | Computes Square Roots of the packed double-precision floating-point values in ymm2/m256/m64bcst and stores the result in ymm1 subject to writemask k1. | 
| VSQRTPD zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.66.0F.W1 51 /r | avx512 | Computes Square Roots of the packed double-precision floating-point values in zmm2/m512/m64bcst and stores the result in zmm1 subject to writemask k1. | 
| SQRTPS xmm1, xmm2/m128 | 0F 51 /r | sse | Computes Square Roots of the packed single-precision floating-point values in xmm2/m128 and stores the result in xmm1. | 
| VSQRTPS xmm1, xmm2/m128 | VEX.128.0F.WIG 51 /r | avx | Computes Square Roots of the packed single-precision floating-point values in xmm2/m128 and stores the result in xmm1. | 
| VSQRTPS ymm1, ymm2/m256 | VEX.256.0F.WIG 51 /r | avx | Computes Square Roots of the packed single-precision floating-point values in ymm2/m256 and stores the result in ymm1. | 
| VSQRTPS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.0F.W0 51 /r | avx512 | Computes Square Roots of the packed single-precision floating-point values in xmm2/m128/m32bcst and stores the result in xmm1 subject to writemask k1. | 
| VSQRTPS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.0F.W0 51 /r | avx512 | Computes Square Roots of the packed single-precision floating-point values in ymm2/m256/m32bcst and stores the result in ymm1 subject to writemask k1. | 
| VSQRTPS zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | EVEX.512.0F.W0 51 /r | avx512 | Computes Square Roots of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the result in zmm1 subject to writemask k1. | 
| SQRTSD xmm1, xmm2/m64 | F2 0F 51 /r | sse2 | Computes square root of the low double-precision floating-point value in xmm2/m64 and stores the results in xmm1. | 
| VSQRTSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 51 /r | avx | Computes square root of the low double-precision floating-point value in xmm3/m64 and stores the results in xmm1. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| VSQRTSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.F2.0F.W1 51 /r | avx512 | Computes square root of the low double-precision floating-point value in xmm3/m64 and stores the results in xmm1 under writemask k1. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| SQRTSS xmm1, xmm2/m32 | F3 0F 51 /r | sse | Computes square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1. | 
| VSQRTSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 51 /r | avx | Computes square root of the low single-precision floating-point value in xmm3/m32 and stores the results in xmm1. Also, upper single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| VSQRTSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.NDS.LIG.F3.0F.W0 51 /r | avx512 | Computes square root of the low single-precision floating-point value in xmm3/m32 and stores the results in xmm1 under writemask k1. Also, upper single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| STAC | 0F 01 CB | Set the AC flag in the EFLAGS register. | |
| STC | F9 | Set CF flag. | |
| STD | FD | Set DF flag. | |
| STI | FB | Set interrupt flag; external, maskable interrupts enabled at the end of the next instruction. | |
| STMXCSR m32 | 0F AE /3 | sse | Store contents of MXCSR register to m32. | 
| VSTMXCSR m32 | VEX.LZ.0F.WIG AE /3 | avx | Store contents of MXCSR register to m32. | 
| STOS m8 | AA | For legacy mode, store AL at address ES:(E)DI; For 64-bit mode store AL at address RDI or EDI. | |
| STOS m16 | AB | For legacy mode, store AX at address ES:(E)DI; For 64-bit mode store AX at address RDI or EDI. | |
| STOS m32 | AB | For legacy mode, store EAX at address ES:(E)DI; For 64-bit mode store EAX at address RDI or EDI. | |
| STOS m64 | REX.W + AB | Store RAX at address RDI or EDI. | |
| STOSB | AA | For legacy mode, store AL at address ES:(E)DI; For 64-bit mode store AL at address RDI or EDI. | |
| STOSW | AB | For legacy mode, store AX at address ES:(E)DI; For 64-bit mode store AX at address RDI or EDI. | |
| STOSD | AB | For legacy mode, store EAX at address ES:(E)DI; For 64-bit mode store EAX at address RDI or EDI. | |
| STOSQ | REX.W + AB | Store RAX at address RDI or EDI. | |
| STR r/m16 | 0F 00 /1 | Stores segment selector from TR in r/m16. | |
| SUB AL, imm8 | 2C ib | Subtract imm8 from AL. | |
| SUB AX, imm16 | 2D iw | Subtract imm16 from AX. | |
| SUB EAX, imm32 | 2D id | Subtract imm32 from EAX. | |
| SUB RAX, imm32 | REX.W + 2D id | Subtract imm32 sign-extended to 64-bits from RAX. | |
| SUB r/m8, imm8 | 80 /5 ib | Subtract imm8 from r/m8. | |
| SUB r/m8, imm8 | REX + 80 /5 ib | Subtract imm8 from r/m8. | |
| SUB r/m16, imm16 | 81 /5 iw | Subtract imm16 from r/m16. | |
| SUB r/m32, imm32 | 81 /5 id | Subtract imm32 from r/m32. | |
| SUB r/m64, imm32 | REX.W + 81 /5 id | Subtract imm32 sign-extended to 64-bits from r/m64. | |
| SUB r/m16, imm8 | 83 /5 ib | Subtract sign-extended imm8 from r/m16. | |
| SUB r/m32, imm8 | 83 /5 ib | Subtract sign-extended imm8 from r/m32. | |
| SUB r/m64, imm8 | REX.W + 83 /5 ib | Subtract sign-extended imm8 from r/m64. | |
| SUB r/m8, r8 | 28 /r | Subtract r8 from r/m8. | |
| SUB r/m8, r8 | REX + 28 /r | Subtract r8 from r/m8. | |
| SUB r/m16, r16 | 29 /r | Subtract r16 from r/m16. | |
| SUB r/m32, r32 | 29 /r | Subtract r32 from r/m32. | |
| SUB r/m64, r64 | REX.W + 29 /r | Subtract r64 from r/m64. | |
| SUB r8, r/m8 | 2A /r | Subtract r/m8 from r8. | |
| SUB r8, r/m8 | REX + 2A /r | Subtract r/m8 from r8. | |
| SUB r16, r/m16 | 2B /r | Subtract r/m16 from r16. | |
| SUB r32, r/m32 | 2B /r | Subtract r/m32 from r32. | |
| SUB r64, r/m64 | REX.W + 2B /r | Subtract r/m64 from r64. | |
| SUBPD xmm1, xmm2/m128 | 66 0F 5C /r | sse2 | Subtract packed double-precision floating-point values in xmm2/mem from xmm1 and store result in xmm1. | 
| VSUBPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F.WIG 5C /r | avx | Subtract packed double-precision floating-point values in xmm3/mem from xmm2 and store result in xmm1. | 
| VSUBPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F.WIG 5C /r | avx | Subtract packed double-precision floating-point values in ymm3/mem from ymm2 and store result in ymm1. | 
| VSUBPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F.W1 5C /r | avx512 | Subtract packed double-precision floating-point values in xmm3/m128/m64bcst from xmm2 and store result in xmm1 with writemask k1. | 
| VSUBPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F.W1 5C /r | avx512 | Subtract packed double-precision floating-point values in ymm3/m256/m64bcst from ymm2 and store result in ymm1 with writemask k1. | 
| VSUBPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F.W1 5C /r | avx512 | Subtract packed double-precision floating-point values in zmm3/m512/m64bcst from zmm2 and store result in zmm1 with writemask k1. | 
| SUBPS xmm1, xmm2/m128 | 0F 5C /r | sse | Subtract packed single-precision floating-point values in xmm2/mem from xmm1 and store result in xmm1. | 
| VSUBPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 5C /r | avx | Subtract packed single-precision floating-point values in xmm3/mem from xmm2 and store result in xmm1. | 
| VSUBPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 5C /r | avx | Subtract packed single-precision floating-point values in ymm3/mem from ymm2 and store result in ymm1. | 
| VSUBPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 5C /r | avx512 | Subtract packed single-precision floating-point values in xmm3/m128/m32bcst from xmm2 and store result in xmm1 with writemask k1. | 
| VSUBPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 5C /r | avx512 | Subtract packed single-precision floating-point values in ymm3/m256/m32bcst from ymm2 and store result in ymm1 with writemask k1. | 
| VSUBPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.0F.W0 5C /r | avx512 | Subtract packed single-precision floating-point values in zmm3/m512/m32bcst from zmm2 and store result in zmm1 with writemask k1. | 
| SUBSD xmm1, xmm2/m64 | F2 0F 5C /r | sse2 | Subtract the low double-precision floating-point value in xmm2/m64 from xmm1 and store the result in xmm1. | 
| VSUBSD xmm1, xmm2, xmm3/m64 | VEX.NDS.128.F2.0F.WIG 5C /r | avx | Subtract the low double-precision floating-point value in xmm3/m64 from xmm2 and store the result in xmm1. | 
| VSUBSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.F2.0F.W1 5C /r | avx512 | Subtract the low double-precision floating-point value in xmm3/m64 from xmm2 and store the result in xmm1 under writemask k1. | 
| SUBSS xmm1, xmm2/m32 | F3 0F 5C /r | sse | Subtract the low single-precision floating-point value in xmm2/m32 from xmm1 and store the result in xmm1. | 
| VSUBSS xmm1, xmm2, xmm3/m32 | VEX.NDS.128.F3.0F.WIG 5C /r | avx | Subtract the low single-precision floating-point value in xmm3/m32 from xmm2 and store the result in xmm1. | 
| VSUBSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.NDS.LIG.F3.0F.W0 5C /r | avx512 | Subtract the low single-precision floating-point value in xmm3/m32 from xmm2 and store the result in xmm1 under writemask k1. | 
| SWAPGS | 0F 01 F8 | Exchanges the current GS base register value with the value contained in MSR address C0000102H. | |
| SYSCALL | 0F 05 | Fast call to privilege level 0 system procedures. | |
| SYSENTER | 0F 34 | Fast call to privilege level 0 system procedures. | |
| SYSEXIT | 0F 35 | Fast return to privilege level 3 user code. | |
| SYSEXIT | REX.W + 0F 35 | Fast return to 64-bit mode privilege level 3 user code. | |
| SYSRET | 0F 07 | Return to compatibility mode from fast system call. | |
| SYSRET | REX.W + 0F 07 | Return to 64-bit mode from fast system call. | |
| TEST AL, imm8 | A8 ib | AND imm8 with AL; set SF, ZF, PF according to result. | |
| TEST AX, imm16 | A9 iw | AND imm16 with AX; set SF, ZF, PF according to result. | |
| TEST EAX, imm32 | A9 id | AND imm32 with EAX; set SF, ZF, PF according to result. | |
| TEST RAX, imm32 | REX.W + A9 id | AND imm32 sign-extended to 64-bits with RAX; set SF, ZF, PF according to result. | |
| TEST r/m8, imm8 | F6 /0 ib | AND imm8 with r/m8; set SF, ZF, PF according to result. | |
| TEST r/m8, imm8 | REX + F6 /0 ib | AND imm8 with r/m8; set SF, ZF, PF according to result. | |
| TEST r/m16, imm16 | F7 /0 iw | AND imm16 with r/m16; set SF, ZF, PF according to result. | |
| TEST r/m32, imm32 | F7 /0 id | AND imm32 with r/m32; set SF, ZF, PF according to result. | |
| TEST r/m64, imm32 | REX.W + F7 /0 id | AND imm32 sign-extended to 64-bits with r/m64; set SF, ZF, PF according to result. | |
| TEST r/m8, r8 | 84 /r | AND r8 with r/m8; set SF, ZF, PF according to result. | |
| TEST r/m8, r8 | REX + 84 /r | AND r8 with r/m8; set SF, ZF, PF according to result. | |
| TEST r/m16, r16 | 85 /r | AND r16 with r/m16; set SF, ZF, PF according to result. | |
| TEST r/m32, r32 | 85 /r | AND r32 with r/m32; set SF, ZF, PF according to result. | |
| TEST r/m64, r64 | REX.W + 85 /r | AND r64 with r/m64; set SF, ZF, PF according to result. | |
| TZCNT r16, r/m16 | F3 0F BC /r | bmi1 | Count the number of trailing zero bits in r/m16, return result in r16. | 
| TZCNT r32, r/m32 | F3 0F BC /r | bmi1 | Count the number of trailing zero bits in r/m32, return result in r32. | 
| TZCNT r64, r/m64 | F3 REX.W 0F BC /r | bmi1 | Count the number of trailing zero bits in r/m64, return result in r64. | 
| UCOMISD xmm1, xmm2/m64 | 66 0F 2E /r | sse2 | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | 
| VUCOMISD xmm1, xmm2/m64 | VEX.128.66.0F.WIG 2E /r | avx | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | 
| VUCOMISD xmm1, xmm2/m64{sae} | EVEX.LIG.66.0F.W1 2E /r | avx512 | Compare low double-precision floating-point values in xmm1 and xmm2/m64 and set the EFLAGS flags accordingly. | 
| UCOMISS xmm1, xmm2/m32 | 0F 2E /r | sse | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| VUCOMISS xmm1, xmm2/m32 | VEX.128.0F.WIG 2E /r | avx | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| VUCOMISS xmm1, xmm2/m32{sae} | EVEX.LIG.0F.W0 2E /r | avx512 | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | 
| UD2 | 0F 0B | Raise invalid opcode exception. | |
| UNPCKHPD xmm1, xmm2/m128 | 66 0F 15 /r | sse2 | Unpacks and Interleaves double-precision floating-point values from high quadwords of xmm1 and xmm2/m128. | 
| VUNPCKHPD xmm1, xmm2, xmm3/m128 | VEX.128.66.0F.WIG 15 /r | avx | Unpacks and Interleaves double-precision floating-point values from high quadwords of xmm2 and xmm3/m128. | 
| VUNPCKHPD ymm1, ymm2, ymm3/m256 | VEX.256.66.0F.WIG 15 /r | avx | Unpacks and Interleaves double-precision floating-point values from high quadwords of ymm2 and ymm3/m256. | 
| VUNPCKHPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.128.66.0F.W1 15 /r | avx512 | Unpacks and Interleaves double precision floating-point values from high quadwords of xmm2 and xmm3/m128/m64bcst subject to writemask k1. | 
| VUNPCKHPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.256.66.0F.W1 15 /r | avx512 | Unpacks and Interleaves double precision floating-point values from high quadwords of ymm2 and ymm3/m256/m64bcst subject to writemask k1. | 
| VUNPCKHPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.512.66.0F.W1 15 /r | avx512 | Unpacks and Interleaves double-precision floating-point values from high quadwords of zmm2 and zmm3/m512/m64bcst subject to writemask k1. | 
| UNPCKHPS xmm1, xmm2/m128 | NP 0F 15 /r | sse | Unpacks and Interleaves single-precision floating-point values from high quadwords of xmm1 and xmm2/m128. | 
| VUNPCKHPS xmm1, xmm2, xmm3/m128 | VEX.128.0F.WIG 15 /r | avx | Unpacks and Interleaves single-precision floating-point values from high quadwords of xmm2 and xmm3/m128. | 
| VUNPCKHPS ymm1, ymm2, ymm3/m256 | VEX.256.0F.WIG 15 /r | avx | Unpacks and Interleaves single-precision floating-point values from high quadwords of ymm2 and ymm3/m256. | 
| VUNPCKHPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.128.0F.W0 15 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from high quadwords of xmm2 and xmm3/m128/m32bcst and write result to xmm1 subject to writemask k1. | 
| VUNPCKHPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.256.0F.W0 15 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from high quadwords of ymm2 and ymm3/m256/m32bcst and write result to ymm1 subject to writemask k1. | 
| VUNPCKHPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.512.0F.W0 15 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from high quadwords of zmm2 and zmm3/m512/m32bcst and write result to zmm1 subject to writemask k1. | 
| UNPCKLPD xmm1, xmm2/m128 | 66 0F 14 /r | sse2 | Unpacks and Interleaves double-precision floating-point values from low quadwords of xmm1 and xmm2/m128. | 
| VUNPCKLPD xmm1, xmm2, xmm3/m128 | VEX.128.66.0F.WIG 14 /r | avx | Unpacks and Interleaves double-precision floating-point values from low quadwords of xmm2 and xmm3/m128. | 
| VUNPCKLPD ymm1, ymm2, ymm3/m256 | VEX.256.66.0F.WIG 14 /r | avx | Unpacks and Interleaves double-precision floating-point values from low quadwords of ymm2 and ymm3/m256. | 
| VUNPCKLPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.128.66.0F.W1 14 /r | avx512 | Unpacks and Interleaves double precision floating-point values from low quadwords of xmm2 and xmm3/m128/m64bcst subject to write mask k1. | 
| VUNPCKLPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.256.66.0F.W1 14 /r | avx512 | Unpacks and Interleaves double precision floating-point values from low quadwords of ymm2 and ymm3/m256/m64bcst subject to write mask k1. | 
| VUNPCKLPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.512.66.0F.W1 14 /r | avx512 | Unpacks and Interleaves double-precision floating-point values from low quadwords of zmm2 and zmm3/m512/m64bcst subject to write mask k1. | 
| UNPCKLPS xmm1, xmm2/m128 | NP 0F 14 /r | sse | Unpacks and Interleaves single-precision floating-point values from low quadwords of xmm1 and xmm2/m128. | 
| VUNPCKLPS xmm1, xmm2, xmm3/m128 | VEX.128.0F.WIG 14 /r | avx | Unpacks and Interleaves single-precision floating-point values from low quadwords of xmm2 and xmm3/m128. | 
| VUNPCKLPS ymm1, ymm2, ymm3/m256 | VEX.256.0F.WIG 14 /r | avx | Unpacks and Interleaves single-precision floating-point values from low quadwords of ymm2 and ymm3/m256. | 
| VUNPCKLPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.128.0F.W0 14 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from low quadwords of xmm2 and xmm3/mem and write result to xmm1 subject to write mask k1. | 
| VUNPCKLPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.256.0F.W0 14 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from low quadwords of ymm2 and ymm3/mem and write result to ymm1 subject to write mask k1. | 
| VUNPCKLPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.512.0F.W0 14 /r | avx512 | Unpacks and Interleaves single-precision floating-point values from low quadwords of zmm2 and zmm3/m512/m32bcst and write result to zmm1 subject to write mask k1. | 
| VALIGND xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.66.0F3A.W0 03 /r ib | avx512 | Shift right and merge vectors xmm2 and xmm3/m128/m32bcst with double-word granularity using imm8 as number of elements to shift, and store the final result in xmm1, under writemask. | 
| VALIGNQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F3A.W1 03 /r ib | avx512 | Shift right and merge vectors xmm2 and xmm3/m128/m64bcst with quad-word granularity using imm8 as number of elements to shift, and store the final result in xmm1, under writemask. | 
| VALIGND ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 03 /r ib | avx512 | Shift right and merge vectors ymm2 and ymm3/m256/m32bcst with double-word granularity using imm8 as number of elements to shift, and store the final result in ymm1, under writemask. | 
| VALIGNQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 03 /r ib | avx512 | Shift right and merge vectors ymm2 and ymm3/m256/m64bcst with quad-word granularity using imm8 as number of elements to shift, and store the final result in ymm1, under writemask. | 
| VALIGND zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.66.0F3A.W0 03 /r ib | avx512 | Shift right and merge vectors zmm2 and zmm3/m512/m32bcst with double-word granularity using imm8 as number of elements to shift, and store the final result in zmm1, under writemask. | 
| VALIGNQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F3A.W1 03 /r ib | avx512 | Shift right and merge vectors zmm2 and zmm3/m512/m64bcst with quad-word granularity using imm8 as number of elements to shift, and store the final result in zmm1, under writemask. | 
| VBLENDMPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 65 /r | avx512 | Blend double-precision vector xmm2 and double-precision vector xmm3/m128/m64bcst and store the result in xmm1, under control mask. | 
| VBLENDMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 65 /r | avx512 | Blend double-precision vector ymm2 and double-precision vector ymm3/m256/m64bcst and store the result in ymm1, under control mask. | 
| VBLENDMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 65 /r | avx512 | Blend double-precision vector zmm2 and double-precision vector zmm3/m512/m64bcst and store the result in zmm1, under control mask. | 
| VBLENDMPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 65 /r | avx512 | Blend single-precision vector xmm2 and single-precision vector xmm3/m128/m32bcst and store the result in xmm1, under control mask. | 
| VBLENDMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 65 /r | avx512 | Blend single-precision vector ymm2 and single-precision vector ymm3/m256/m32bcst and store the result in ymm1, under control mask. | 
| VBLENDMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 65 /r | avx512 | Blend single-precision vector zmm2 and single-precision vector zmm3/m512/m32bcst using k1 as select control and store the result in zmm1. | 
| VBROADCASTSS xmm1, m32 | VEX.128.66.0F38.W0 18 /r | avx | Broadcast single-precision floating-point element in mem to four locations in xmm1. | 
| VBROADCASTSS ymm1, m32 | VEX.256.66.0F38.W0 18 /r | avx | Broadcast single-precision floating-point element in mem to eight locations in ymm1. | 
| VBROADCASTSD ymm1, m64 | VEX.256.66.0F38.W0 19 /r | avx | Broadcast double-precision floating-point element in mem to four locations in ymm1. | 
| VBROADCASTF128 ymm1, m128 | VEX.256.66.0F38.W0 1A /r | avx | Broadcast 128 bits of floating-point data in mem to low and high 128-bits in ymm1. | 
| VBROADCASTSD ymm1 {k1}{z}, xmm2/m64 | EVEX.256.66.0F38.W1 19 /r | avx512 | Broadcast low double-precision floating-point element in xmm2/m64 to four locations in ymm1 using writemask k1. | 
| VBROADCASTSD zmm1 {k1}{z}, xmm2/m64 | EVEX.512.66.0F38.W1 19 /r | avx512 | Broadcast low double-precision floating-point element in xmm2/m64 to eight locations in zmm1 using writemask k1. | 
| VBROADCASTF32X2 ymm1 {k1}{z}, xmm2/m64 | EVEX.256.66.0F38.W0 19 /r | avx512 | Broadcast two single-precision floating-point elements in xmm2/m64 to locations in ymm1 using writemask k1. | 
| VBROADCASTF32X2 zmm1 {k1}{z}, xmm2/m64 | EVEX.512.66.0F38.W0 19 /r | avx512 | Broadcast two single-precision floating-point elements in xmm2/m64 to locations in zmm1 using writemask k1. | 
| VBROADCASTSS xmm1 {k1}{z}, xmm2/m32 | EVEX.128.66.0F38.W0 18 /r | avx512 | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in xmm1 using writemask k1. | 
| VBROADCASTSS ymm1 {k1}{z}, xmm2/m32 | EVEX.256.66.0F38.W0 18 /r | avx512 | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in ymm1 using writemask k1. | 
| VBROADCASTSS zmm1 {k1}{z}, xmm2/m32 | EVEX.512.66.0F38.W0 18 /r | avx512 | Broadcast low single-precision floating-point element in xmm2/m32 to all locations in zmm1 using writemask k1. | 
| VBROADCASTF32X4 ymm1 {k1}{z}, m128 | EVEX.256.66.0F38.W0 1A /r | avx512 | Broadcast 128 bits of 4 single-precision floating-point data in mem to locations in ymm1 using writemask k1. | 
| VBROADCASTF32X4 zmm1 {k1}{z}, m128 | EVEX.512.66.0F38.W0 1A /r | avx512 | Broadcast 128 bits of 4 single-precision floating-point data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTF64X2 ymm1 {k1}{z}, m128 | EVEX.256.66.0F38.W1 1A /r | avx512 | Broadcast 128 bits of 2 double-precision floating-point data in mem to locations in ymm1 using writemask k1. | 
| VBROADCASTF64X2 zmm1 {k1}{z}, m128 | EVEX.512.66.0F38.W1 1A /r | avx512 | Broadcast 128 bits of 2 double-precision floating-point data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTF32X8 zmm1 {k1}{z}, m256 | EVEX.512.66.0F38.W0 1B /r | avx512 | Broadcast 256 bits of 8 single-precision floating-point data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTF64X4 zmm1 {k1}{z}, m256 | EVEX.512.66.0F38.W1 1B /r | avx512 | Broadcast 256 bits of 4 double-precision floating-point data in mem to locations in zmm1 using writemask k1. | 
| VCOMPRESSPD xmm1/m128 {k1}{z}, xmm2 | EVEX.128.66.0F38.W1 8A /r | avx512 | Compress packed double-precision floating-point values from xmm2 to xmm1/m128 using writemask k1. | 
| VCOMPRESSPD ymm1/m256 {k1}{z}, ymm2 | EVEX.256.66.0F38.W1 8A /r | avx512 | Compress packed double-precision floating-point values from ymm2 to ymm1/m256 using writemask k1. | 
| VCOMPRESSPD zmm1/m512 {k1}{z}, zmm2 | EVEX.512.66.0F38.W1 8A /r | avx512 | Compress packed double-precision floating-point values from zmm2 using control mask k1 to zmm1/m512. | 
| VCOMPRESSPS xmm1/m128 {k1}{z}, xmm2 | EVEX.128.66.0F38.W0 8A /r | avx512 | Compress packed single-precision floating-point values from xmm2 to xmm1/m128 using writemask k1. | 
| VCOMPRESSPS ymm1/m256 {k1}{z}, ymm2 | EVEX.256.66.0F38.W0 8A /r | avx512 | Compress packed single-precision floating-point values from ymm2 to ymm1/m256 using writemask k1. | 
| VCOMPRESSPS zmm1/m512 {k1}{z}, zmm2 | EVEX.512.66.0F38.W0 8A /r | avx512 | Compress packed single-precision floating-point values from zmm2 using control mask k1 to zmm1/m512. | 
| VCVTPD2QQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 7B /r | avx512 | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed quadword integers in xmm1 with writemask k1. | 
| VCVTPD2QQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 7B /r | avx512 | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed quadword integers in ymm1 with writemask k1. | 
| VCVTPD2QQ zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.66.0F.W1 7B /r | avx512 | Convert eight packed double-precision floating-point values from zmm2/m512/m64bcst to eight packed quadword integers in zmm1 with writemask k1. | 
| VCVTPD2UDQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.0F.W1 79 /r | avx512 | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two unsigned doubleword integers in xmm1 subject to writemask k1. | 
| VCVTPD2UDQ xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.0F.W1 79 /r | avx512 | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four unsigned doubleword integers in xmm1 subject to writemask k1. | 
| VCVTPD2UDQ ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.0F.W1 79 /r | avx512 | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight unsigned doubleword integers in ymm1 subject to writemask k1. | 
| VCVTPD2UQQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 79 /r | avx512 | Convert two packed double-precision floating-point values from xmm2/mem to two packed unsigned quadword integers in xmm1 with writemask k1. | 
| VCVTPD2UQQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 79 /r | avx512 | Convert four packed double-precision floating-point values from ymm2/mem to four packed unsigned quadword integers in ymm1 with writemask k1. | 
| VCVTPD2UQQ zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.66.0F.W1 79 /r | avx512 | Convert eight packed double-precision floating-point values from zmm2/mem to eight packed unsigned quadword integers in zmm1 with writemask k1. | 
| VCVTPH2PS xmm1, xmm2/m64 | VEX.128.66.0F38.W0 13 /r | f16c | Convert four packed half precision (16-bit) floating-point values in xmm2/m64 to packed single-precision floating-point values in xmm1. | 
| VCVTPH2PS ymm1, xmm2/m128 | VEX.256.66.0F38.W0 13 /r | f16c | Convert eight packed half precision (16-bit) floating-point values in xmm2/m128 to packed single-precision floating-point values in ymm1. | 
| VCVTPH2PS xmm1 {k1}{z}, xmm2/m64 | EVEX.128.66.0F38.W0 13 /r | avx512 | Convert four packed half precision (16-bit) floating-point values in xmm2/m64 to packed single-precision floating-point values in xmm1. | 
| VCVTPH2PS ymm1 {k1}{z}, xmm2/m128 | EVEX.256.66.0F38.W0 13 /r | avx512 | Convert eight packed half precision (16-bit) floating-point values in xmm2/m128 to packed single-precision floating-point values in ymm1. | 
| VCVTPH2PS zmm1 {k1}{z}, ymm2/m256 {sae} | EVEX.512.66.0F38.W0 13 /r | avx512 | Convert sixteen packed half precision (16-bit) floating-point values in ymm2/m256 to packed single-precision floating-point values in zmm1. | 
| VCVTPS2PH xmm1/m64, xmm2, imm8 | VEX.128.66.0F3A.W0 1D /r ib | f16c | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls. | 
| VCVTPS2PH xmm1/m128, ymm2, imm8 | VEX.256.66.0F3A.W0 1D /r ib | f16c | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls. | 
| VCVTPS2PH xmm1/m64 {k1}{z}, xmm2, imm8 | EVEX.128.66.0F3A.W0 1D /r ib | avx512 | Convert four packed single-precision floating-point values in xmm2 to packed half-precision (16-bit) floating-point values in xmm1/m64. Imm8 provides rounding controls. | 
| VCVTPS2PH xmm1/m128 {k1}{z}, ymm2, imm8 | EVEX.256.66.0F3A.W0 1D /r ib | avx512 | Convert eight packed single-precision floating-point values in ymm2 to packed half-precision (16-bit) floating-point values in xmm1/m128. Imm8 provides rounding controls. | 
| VCVTPS2PH ymm1/m256 {k1}{z}, zmm2{sae}, imm8 | EVEX.512.66.0F3A.W0 1D /r ib | avx512 | Convert sixteen packed single-precision floating-point values in zmm2 to packed half-precision (16-bit) floating-point values in ymm1/m256. Imm8 provides rounding controls. | 
| VCVTPS2QQ xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.66.0F.W0 7B /r | avx512 | Convert two packed single precision floating-point values from xmm2/m64/m32bcst to two packed signed quadword values in xmm1 subject to writemask k1. | 
| VCVTPS2QQ ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.66.0F.W0 7B /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed signed quadword values in ymm1 subject to writemask k1. | 
| VCVTPS2QQ zmm1 {k1}{z}, ymm2/m256/m32bcst{er} | EVEX.512.66.0F.W0 7B /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed signed quadword values in zmm1 subject to writemask k1. | 
| VCVTPS2UDQ xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.0F.W0 79 /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed unsigned doubleword values in xmm1 subject to writemask k1. | 
| VCVTPS2UDQ ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.0F.W0 79 /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned doubleword values in ymm1 subject to writemask k1. | 
| VCVTPS2UDQ zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | EVEX.512.0F.W0 79 /r | avx512 | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed unsigned doubleword values in zmm1 subject to writemask k1. | 
| VCVTPS2UQQ xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.66.0F.W0 79 /r | avx512 | Convert two packed single precision floating-point values from xmm2/m64/m32bcst to two packed unsigned quadword values in xmm1 subject to writemask k1. | 
| VCVTPS2UQQ ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.66.0F.W0 79 /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed unsigned quadword values in ymm1 subject to writemask k1. | 
| VCVTPS2UQQ zmm1 {k1}{z}, ymm2/m256/m32bcst{er} | EVEX.512.66.0F.W0 79 /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned quadword values in zmm1 subject to writemask k1. | 
| VCVTQQ2PD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.F3.0F.W1 E6 /r | avx512 | Convert two packed quadword integers from xmm2/m128/m64bcst to packed double-precision floating-point values in xmm1 with writemask k1. | 
| VCVTQQ2PD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.F3.0F.W1 E6 /r | avx512 | Convert four packed quadword integers from ymm2/m256/m64bcst to packed double-precision floating-point values in ymm1 with writemask k1. | 
| VCVTQQ2PD zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.F3.0F.W1 E6 /r | avx512 | Convert eight packed quadword integers from zmm2/m512/m64bcst to eight packed double-precision floating-point values in zmm1 with writemask k1. | 
| VCVTQQ2PS xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.0F.W1 5B /r | avx512 | Convert two packed quadword integers from xmm2/mem to packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTQQ2PS xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.0F.W1 5B /r | avx512 | Convert four packed quadword integers from ymm2/mem to packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTQQ2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.0F.W1 5B /r | avx512 | Convert eight packed quadword integers from zmm2/mem to eight packed single-precision floating-point values in ymm1 with writemask k1. | 
| VCVTSD2USI r32, xmm1/m64{er} | EVEX.LIG.F2.0F.W0 79 /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one unsigned doubleword integer r32. | 
| VCVTSD2USI r64, xmm1/m64{er} | EVEX.LIG.F2.0F.W1 79 /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one unsigned quadword integer zero-extended into r64. | 
| VCVTSS2USI r32, xmm1/m32{er} | EVEX.LIG.F3.0F.W0 79 /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one unsigned doubleword integer in r32. | 
| VCVTSS2USI r64, xmm1/m32{er} | EVEX.LIG.F3.0F.W1 79 /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one unsigned quadword integer in r64. | 
| VCVTTPD2QQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 7A /r | avx512 | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed quadword integers in xmm1 using truncation with writemask k1. | 
| VCVTTPD2QQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 7A /r | avx512 | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed quadword integers in ymm1 using truncation with writemask k1. | 
| VCVTTPD2QQ zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | EVEX.512.66.0F.W1 7A /r | avx512 | Convert eight packed double-precision floating-point values from zmm2/m512 to eight packed quadword integers in zmm1 using truncation with writemask k1. | 
| VCVTTPD2UDQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.0F.W1 78 /r | avx512 | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two unsigned doubleword integers in xmm1 using truncation subject to writemask k1. | 
| VCVTTPD2UDQ xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.0F.W1 78 /r | avx512 | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four unsigned doubleword integers in xmm1 using truncation subject to writemask k1. | 
| VCVTTPD2UDQ ymm1 {k1}{z}, zmm2/m512/m64bcst{sae} | EVEX.512.0F.W1 78 /r | avx512 | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight unsigned doubleword integers in ymm1 using truncation subject to writemask k1. | 
| VCVTTPD2UQQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F.W1 78 /r | avx512 | Convert two packed double-precision floating-point values from xmm2/m128/m64bcst to two packed unsigned quadword integers in xmm1 using truncation with writemask k1. | 
| VCVTTPD2UQQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F.W1 78 /r | avx512 | Convert four packed double-precision floating-point values from ymm2/m256/m64bcst to four packed unsigned quadword integers in ymm1 using truncation with writemask k1. | 
| VCVTTPD2UQQ zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | EVEX.512.66.0F.W1 78 /r | avx512 | Convert eight packed double-precision floating-point values from zmm2/mem to eight packed unsigned quadword integers in zmm1 using truncation with writemask k1. | 
| VCVTTPS2QQ xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.66.0F.W0 7A /r | avx512 | Convert two packed single precision floating-point values from xmm2/m64/m32bcst to two packed signed quadword values in xmm1 using truncation subject to writemask k1. | 
| VCVTTPS2QQ ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.66.0F.W0 7A /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed signed quadword values in ymm1 using truncation subject to writemask k1. | 
| VCVTTPS2QQ zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | EVEX.512.66.0F.W0 7A /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed signed quadword values in zmm1 using truncation subject to writemask k1. | 
| VCVTTPS2UDQ xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.0F.W0 78 /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed unsigned doubleword values in xmm1 using truncation subject to writemask k1. | 
| VCVTTPS2UDQ ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.0F.W0 78 /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned doubleword values in ymm1 using truncation subject to writemask k1. | 
| VCVTTPS2UDQ zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | EVEX.512.0F.W0 78 /r | avx512 | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed unsigned doubleword values in zmm1 using truncation subject to writemask k1. | 
| VCVTTPS2UQQ xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.66.0F.W0 78 /r | avx512 | Convert two packed single precision floating-point values from xmm2/m64/m32bcst to two packed unsigned quadword values in xmm1 using truncation subject to writemask k1. | 
| VCVTTPS2UQQ ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.66.0F.W0 78 /r | avx512 | Convert four packed single precision floating-point values from xmm2/m128/m32bcst to four packed unsigned quadword values in ymm1 using truncation subject to writemask k1. | 
| VCVTTPS2UQQ zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | EVEX.512.66.0F.W0 78 /r | avx512 | Convert eight packed single precision floating-point values from ymm2/m256/m32bcst to eight packed unsigned quadword values in zmm1 using truncation subject to writemask k1. | 
| VCVTTSD2USI r32, xmm1/m64{sae} | EVEX.LIG.F2.0F.W0 78 /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one unsigned doubleword integer r32 using truncation. | 
| VCVTTSD2USI r64, xmm1/m64{sae} | EVEX.LIG.F2.0F.W1 78 /r | avx512 | Convert one double-precision floating-point value from xmm1/m64 to one unsigned quadword integer zero-extended into r64 using truncation. | 
| VCVTTSS2USI r32, xmm1/m32{sae} | EVEX.LIG.F3.0F.W0 78 /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one unsigned doubleword integer in r32 using truncation. | 
| VCVTTSS2USI r64, xmm1/m32{sae} | EVEX.LIG.F3.0F.W1 78 /r | avx512 | Convert one single-precision floating-point value from xmm1/m32 to one unsigned quadword integer in r64 using truncation. | 
| VCVTUDQ2PD xmm1 {k1}{z}, xmm2/m64/m32bcst | EVEX.128.F3.0F.W0 7A /r | avx512 | Convert two packed unsigned doubleword integers from xmm2/m64/m32bcst to packed double-precision floating-point values in xmm1 with writemask k1. | 
| VCVTUDQ2PD ymm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.256.F3.0F.W0 7A /r | avx512 | Convert four packed unsigned doubleword integers from xmm2/m128/m32bcst to packed double-precision floating-point values in ymm1 with writemask k1. | 
| VCVTUDQ2PD zmm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.512.F3.0F.W0 7A /r | avx512 | Convert eight packed unsigned doubleword integers from ymm2/m256/m32bcst to eight packed double-precision floating-point values in zmm1 with writemask k1. | 
| VCVTUDQ2PS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.F2.0F.W0 7A /r | avx512 | Convert four packed unsigned doubleword integers from xmm2/m128/m32bcst to packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTUDQ2PS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.F2.0F.W0 7A /r | avx512 | Convert eight packed unsigned doubleword integers from ymm2/m256/m32bcst to packed single-precision floating-point values in ymm1 with writemask k1. | 
| VCVTUDQ2PS zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | EVEX.512.F2.0F.W0 7A /r | avx512 | Convert sixteen packed unsigned doubleword integers from zmm2/m512/m32bcst to sixteen packed single-precision floating-point values in zmm1 with writemask k1. | 
| VCVTUQQ2PD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.F3.0F.W1 7A /r | avx512 | Convert two packed unsigned quadword integers from xmm2/m128/m64bcst to two packed double-precision floating-point values in xmm1 with writemask k1. | 
| VCVTUQQ2PD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.F3.0F.W1 7A /r | avx512 | Convert four packed unsigned quadword integers from ymm2/m256/m64bcst to packed double-precision floating-point values in ymm1 with writemask k1. | 
| VCVTUQQ2PD zmm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.F3.0F.W1 7A /r | avx512 | Convert eight packed unsigned quadword integers from zmm2/m512/m64bcst to eight packed double-precision floating-point values in zmm1 with writemask k1. | 
| VCVTUQQ2PS xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.F2.0F.W1 7A /r | avx512 | Convert two packed unsigned quadword integers from xmm2/m128/m64bcst to packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTUQQ2PS xmm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.F2.0F.W1 7A /r | avx512 | Convert four packed unsigned quadword integers from ymm2/m256/m64bcst to packed single-precision floating-point values in xmm1 with writemask k1. | 
| VCVTUQQ2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | EVEX.512.F2.0F.W1 7A /r | avx512 | Convert eight packed unsigned quadword integers from zmm2/m512/m64bcst to eight packed single-precision floating-point values in ymm1 with writemask k1. | 
| VCVTUSI2SD xmm1, xmm2, r/m32 | EVEX.NDS.LIG.F2.0F.W0 7B /r | avx512 | Convert one unsigned doubleword integer from r/m32 to one double-precision floating-point value in xmm1. | 
| VCVTUSI2SD xmm1, xmm2, r/m64{er} | EVEX.NDS.LIG.F2.0F.W1 7B /r | avx512 | Convert one unsigned quadword integer from r/m64 to one double-precision floating-point value in xmm1. | 
| VCVTUSI2SS xmm1, xmm2, r/m32{er} | EVEX.NDS.LIG.F3.0F.W0 7B /r | avx512 | Convert one unsigned doubleword integer from r/m32 to one single-precision floating-point value in xmm1. | 
| VCVTUSI2SS xmm1, xmm2, r/m64{er} | EVEX.NDS.LIG.F3.0F.W1 7B /r | avx512 | Convert one unsigned quadword integer from r/m64 to one single-precision floating-point value in xmm1. | 
| VDBPSADBW xmm1 {k1}{z}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.W0 42 /r ib | avx512 | Compute packed SAD word results of unsigned bytes in dword block from xmm2 with unsigned bytes of dword blocks transformed from xmm3/m128 using the shuffle controls in imm8. Results are written to xmm1 under the writemask k1. | 
| VDBPSADBW ymm1 {k1}{z}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.W0 42 /r ib | avx512 | Compute packed SAD word results of unsigned bytes in dword block from ymm2 with unsigned bytes of dword blocks transformed from ymm3/m256 using the shuffle controls in imm8. Results are written to ymm1 under the writemask k1. | 
| VDBPSADBW zmm1 {k1}{z}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.W0 42 /r ib | avx512 | Compute packed SAD word results of unsigned bytes in dword block from zmm2 with unsigned bytes of dword blocks transformed from zmm3/m512 using the shuffle controls in imm8. Results are written to zmm1 under the writemask k1. | 
| VERR r/m16 | 0F 00 /4 | | Set ZF=1 if segment specified with r/m16 can be read. | 
| VERW r/m16 | 0F 00 /5 | | Set ZF=1 if segment specified with r/m16 can be written. | 
| VEXP2PD zmm1 {k1}{z}, zmm2/m512/m64bcst {sae} | EVEX.512.66.0F38.W1 C8 /r | avx512 | Computes approximations to the exponential 2^x (with less than 2^-23 of maximum relative error) of the packed double-precision floating-point values from zmm2/m512/m64bcst and stores the floating-point result in zmm1 with writemask k1. | 
| VEXP2PS zmm1 {k1}{z}, zmm2/m512/m32bcst {sae} | EVEX.512.66.0F38.W0 C8 /r | avx512 | Computes approximations to the exponential 2^x (with less than 2^-23 of maximum relative error) of the packed single-precision floating-point values from zmm2/m512/m32bcst and stores the floating-point result in zmm1 with writemask k1. | 
| VEXPANDPD xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.W1 88 /r | avx512 | Expand packed double-precision floating-point values from xmm2/m128 to xmm1 using writemask k1. | 
| VEXPANDPD ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F38.W1 88 /r | avx512 | Expand packed double-precision floating-point values from ymm2/m256 to ymm1 using writemask k1. | 
| VEXPANDPD zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F38.W1 88 /r | avx512 | Expand packed double-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VEXPANDPS xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.W0 88 /r | avx512 | Expand packed single-precision floating-point values from xmm2/m128 to xmm1 using writemask k1. | 
| VEXPANDPS ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F38.W0 88 /r | avx512 | Expand packed single-precision floating-point values from ymm2/m256 to ymm1 using writemask k1. | 
| VEXPANDPS zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F38.W0 88 /r | avx512 | Expand packed single-precision floating-point values from zmm2/m512 to zmm1 using writemask k1. | 
| VEXTRACTF128 xmm1/m128, ymm2, imm8 | VEX.256.66.0F3A.W0 19 /r ib | avx | Extract 128 bits of packed floating-point values from ymm2 and store results in xmm1/m128. | 
| VEXTRACTF32X4 xmm1/m128 {k1}{z}, ymm2, imm8 | EVEX.256.66.0F3A.W0 19 /r ib | avx512 | Extract 128 bits of packed single-precision floating-point values from ymm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTF32X4 xmm1/m128 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W0 19 /r ib | avx512 | Extract 128 bits of packed single-precision floating-point values from zmm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTF64X2 xmm1/m128 {k1}{z}, ymm2, imm8 | EVEX.256.66.0F3A.W1 19 /r ib | avx512 | Extract 128 bits of packed double-precision floating-point values from ymm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTF64X2 xmm1/m128 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W1 19 /r ib | avx512 | Extract 128 bits of packed double-precision floating-point values from zmm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTF32X8 ymm1/m256 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W0 1B /r ib | avx512 | Extract 256 bits of packed single-precision floating-point values from zmm2 and store results in ymm1/m256 subject to writemask k1. | 
| VEXTRACTF64X4 ymm1/m256 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W1 1B /r ib | avx512 | Extract 256 bits of packed double-precision floating-point values from zmm2 and store results in ymm1/m256 subject to writemask k1. | 
| VEXTRACTI128 xmm1/m128, ymm2, imm8 | VEX.256.66.0F3A.W0 39 /r ib | avx2 | Extract 128 bits of integer data from ymm2 and store results in xmm1/m128. | 
| VEXTRACTI32X4 xmm1/m128 {k1}{z}, ymm2, imm8 | EVEX.256.66.0F3A.W0 39 /r ib | avx512 | Extract 128 bits of double-word integer values from ymm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTI32X4 xmm1/m128 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W0 39 /r ib | avx512 | Extract 128 bits of double-word integer values from zmm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTI64X2 xmm1/m128 {k1}{z}, ymm2, imm8 | EVEX.256.66.0F3A.W1 39 /r ib | avx512 | Extract 128 bits of quad-word integer values from ymm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTI64X2 xmm1/m128 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W1 39 /r ib | avx512 | Extract 128 bits of quad-word integer values from zmm2 and store results in xmm1/m128 subject to writemask k1. | 
| VEXTRACTI32X8 ymm1/m256 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W0 3B /r ib | avx512 | Extract 256 bits of double-word integer values from zmm2 and store results in ymm1/m256 subject to writemask k1. | 
| VEXTRACTI64X4 ymm1/m256 {k1}{z}, zmm2, imm8 | EVEX.512.66.0F3A.W1 3B /r ib | avx512 | Extract 256 bits of quad-word integer values from zmm2 and store results in ymm1/m256 subject to writemask k1. | 
| VFIXUPIMMPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F3A.W1 54 /r ib | avx512 | Fix up special numbers in float64 vector xmm1, float64 vector xmm2 and int64 vector xmm3/m128/m64bcst and store the result in xmm1, under writemask. | 
| VFIXUPIMMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 54 /r ib | avx512 | Fix up special numbers in float64 vector ymm1, float64 vector ymm2 and int64 vector ymm3/m256/m64bcst and store the result in ymm1, under writemask. | 
| VFIXUPIMMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}, imm8 | EVEX.NDS.512.66.0F3A.W1 54 /r ib | avx512 | Fix up elements of float64 vector in zmm2 using int64 vector table in zmm3/m512/m64bcst, combine with preserved elements from zmm1, and store the result in zmm1. | 
| VFIXUPIMMPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.66.0F3A.W0 54 /r ib | avx512 | Fix up special numbers in float32 vector xmm1, float32 vector xmm2 and int32 vector xmm3/m128/m32bcst and store the result in xmm1, under writemask. | 
| VFIXUPIMMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 54 /r ib | avx512 | Fix up special numbers in float32 vector ymm1, float32 vector ymm2 and int32 vector ymm3/m256/m32bcst and store the result in ymm1, under writemask. | 
| VFIXUPIMMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{sae}, imm8 | EVEX.NDS.512.66.0F3A.W0 54 /r ib | avx512 | Fix up elements of float32 vector in zmm2 using int32 vector table in zmm3/m512/m32bcst, combine with preserved elements from zmm1, and store the result in zmm1. | 
| VFIXUPIMMSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W1 55 /r ib | avx512 | Fix up a float64 number in the low quadword element of xmm2 using scalar int32 table in xmm3/m64 and store the result in xmm1. | 
| VFIXUPIMMSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W0 55 /r ib | avx512 | Fix up a float32 number in the low doubleword element in xmm2 using scalar int32 table in xmm3/m32 and store the result in xmm1. | 
| VFMADD132PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 98 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1. | 
| VFMADD213PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 A8 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, add to xmm3/mem and put result in xmm1. | 
| VFMADD231PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 B8 /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, add to xmm1 and put result in xmm1. | 
| VFMADD132PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 98 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, add to ymm2 and put result in ymm1. | 
| VFMADD213PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 A8 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, add to ymm3/mem and put result in ymm1. | 
| VFMADD231PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 B8 /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, add to ymm1 and put result in ymm1. | 
| VFMADD132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 98 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, add to xmm2 and put result in xmm1. | 
| VFMADD213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 A8 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, add to xmm3/m128/m64bcst and put result in xmm1. | 
| VFMADD231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 B8 /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, add to xmm1 and put result in xmm1. | 
| VFMADD132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 98 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, add to ymm2 and put result in ymm1. | 
| VFMADD213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 A8 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, add to ymm3/m256/m64bcst and put result in ymm1. | 
| VFMADD231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 B8 /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, add to ymm1 and put result in ymm1. | 
| VFMADD132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 98 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, add to zmm2 and put result in zmm1. | 
| VFMADD213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 A8 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, add to zmm3/m512/m64bcst and put result in zmm1. | 
| VFMADD231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 B8 /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, add to zmm1 and put result in zmm1. | 
| VFMADD132PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 98 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, add to xmm2 and put result in xmm1. | 
| VFMADD213PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 A8 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, add to xmm3/mem and put result in xmm1. | 
| VFMADD231PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 B8 /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, add to xmm1 and put result in xmm1. | 
| VFMADD132PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 98 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, add to ymm2 and put result in ymm1. | 
| VFMADD213PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 A8 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, add to ymm3/mem and put result in ymm1. | 
| VFMADD231PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 B8 /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, add to ymm1 and put result in ymm1. | 
| VFMADD132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 98 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, add to xmm2 and put result in xmm1. | 
| VFMADD213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 A8 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, add to xmm3/m128/m32bcst and put result in xmm1. | 
| VFMADD231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 B8 /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, add to xmm1 and put result in xmm1. | 
| VFMADD132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 98 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, add to ymm2 and put result in ymm1. | 
| VFMADD213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 A8 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, add to ymm3/m256/m32bcst and put result in ymm1. | 
| VFMADD231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 B8 /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, add to ymm1 and put result in ymm1. | 
| VFMADD132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 98 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, add to zmm2 and put result in zmm1. | 
| VFMADD213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 A8 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, add to zmm3/m512/m32bcst and put result in zmm1. | 
| VFMADD231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 B8 /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, add to zmm1 and put result in zmm1. | 
| VFMADD132SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 99 /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, add to xmm2 and put result in xmm1. | 
| VFMADD213SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 A9 /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm2, add to xmm3/m64 and put result in xmm1. | 
| VFMADD231SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 B9 /r | fma | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, add to xmm1 and put result in xmm1. | 
| VFMADD132SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 99 /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, add to xmm2 and put result in xmm1. | 
| VFMADD213SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 A9 /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm2, add to xmm3/m64 and put result in xmm1. | 
| VFMADD231SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 B9 /r | avx512 | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, add to xmm1 and put result in xmm1. | 
| VFMADD132SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 99 /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, add to xmm2 and put result in xmm1. | 
| VFMADD213SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 A9 /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm2, add to xmm3/m32 and put result in xmm1. | 
| VFMADD231SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 B9 /r | fma | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, add to xmm1 and put result in xmm1. | 
| VFMADD132SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 99 /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, add to xmm2 and put result in xmm1. | 
| VFMADD213SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 A9 /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm2, add to xmm3/m32 and put result in xmm1. | 
| VFMADD231SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 B9 /r | avx512 | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, add to xmm1 and put result in xmm1. | 
| VFMADDSUB132PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 96 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, add/subtract elements in xmm2 and put result in xmm1. | 
| VFMADDSUB213PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 A6 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, add/subtract elements in xmm3/mem and put result in xmm1. | 
| VFMADDSUB231PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 B6 /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, add/subtract elements in xmm1 and put result in xmm1. | 
| VFMADDSUB132PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 96 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, add/subtract elements in ymm2 and put result in ymm1. | 
| VFMADDSUB213PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 A6 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, add/subtract elements in ymm3/mem and put result in ymm1. | 
| VFMADDSUB231PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 B6 /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, add/subtract elements in ymm1 and put result in ymm1. | 
| VFMADDSUB213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 A6 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, add/subtract elements in xmm3/m128/m64bcst and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 B6 /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, add/subtract elements in xmm1 and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 96 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, add/subtract elements in xmm2 and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 A6 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, add/subtract elements in ymm3/m256/m64bcst and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 B6 /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, add/subtract elements in ymm1 and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 96 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, add/subtract elements in ymm2 and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 A6 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, add/subtract elements in zmm3/m512/m64bcst and put result in zmm1 subject to writemask k1. | 
| VFMADDSUB231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 B6 /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, add/subtract elements in zmm1 and put result in zmm1 subject to writemask k1. | 
| VFMADDSUB132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 96 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, add/subtract elements in zmm2 and put result in zmm1 subject to writemask k1. | 
| VFMADDSUB132PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 96 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, add/subtract elements in xmm2 and put result in xmm1. | 
| VFMADDSUB213PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 A6 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, add/subtract elements in xmm3/mem and put result in xmm1. | 
| VFMADDSUB231PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 B6 /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, add/subtract elements in xmm1 and put result in xmm1. | 
| VFMADDSUB132PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 96 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, add/subtract elements in ymm2 and put result in ymm1. | 
| VFMADDSUB213PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 A6 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, add/subtract elements in ymm3/mem and put result in ymm1. | 
| VFMADDSUB231PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 B6 /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, add/subtract elements in ymm1 and put result in ymm1. | 
| VFMADDSUB213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 A6 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, add/subtract elements in xmm3/m128/m32bcst and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 B6 /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, add/subtract elements in xmm1 and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 96 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, add/subtract elements in xmm2 and put result in xmm1 subject to writemask k1. | 
| VFMADDSUB213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 A6 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, add/subtract elements in ymm3/m256/m32bcst and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 B6 /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, add/subtract elements in ymm1 and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 96 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, add/subtract elements in ymm2 and put result in ymm1 subject to writemask k1. | 
| VFMADDSUB213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 A6 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, add/subtract elements in zmm3/m512/m32bcst and put result in zmm1 subject to writemask k1. | 
| VFMADDSUB231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 B6 /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, add/subtract elements in zmm1 and put result in zmm1 subject to writemask k1. | 
| VFMADDSUB132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 96 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, add/subtract elements in zmm2 and put result in zmm1 subject to writemask k1. | 
| VFMSUB132PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 9A /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, subtract xmm2 and put result in xmm1. | 
| VFMSUB213PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 AA /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract xmm3/mem and put result in xmm1. | 
| VFMSUB231PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 BA /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, subtract xmm1 and put result in xmm1. | 
| VFMSUB132PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 9A /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, subtract ymm2 and put result in ymm1. | 
| VFMSUB213PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 AA /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract ymm3/mem and put result in ymm1. | 
| VFMSUB231PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 BA /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, subtract ymm1 and put result in ymm1. | 
| VFMSUB132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 9A /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, subtract xmm2 and put result in xmm1 subject to writemask k1. | 
| VFMSUB213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 AA /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract xmm3/m128/m64bcst and put result in xmm1 subject to writemask k1. | 
| VFMSUB231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 BA /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, subtract xmm1 and put result in xmm1 subject to writemask k1. | 
| VFMSUB132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 9A /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, subtract ymm2 and put result in ymm1 subject to writemask k1. | 
| VFMSUB213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 AA /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract ymm3/m256/m64bcst and put result in ymm1 subject to writemask k1. | 
| VFMSUB231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 BA /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, subtract ymm1 and put result in ymm1 subject to writemask k1. | 
| VFMSUB132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 9A /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, subtract zmm2 and put result in zmm1 subject to writemask k1. | 
| VFMSUB213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 AA /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, subtract zmm3/m512/m64bcst and put result in zmm1 subject to writemask k1. | 
| VFMSUB231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 BA /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, subtract zmm1 and put result in zmm1 subject to writemask k1. | 
| VFMSUB132PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 9A /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, subtract xmm2 and put result in xmm1. | 
| VFMSUB213PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 AA /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract xmm3/mem and put result in xmm1. | 
| VFMSUB231PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 BA /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, subtract xmm1 and put result in xmm1. | 
| VFMSUB132PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 9A /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, subtract ymm2 and put result in ymm1. | 
| VFMSUB213PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 AA /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract ymm3/mem and put result in ymm1. | 
| VFMSUB231PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 BA /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, subtract ymm1 and put result in ymm1. | 
| VFMSUB132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 9A /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, subtract xmm2 and put result in xmm1. | 
| VFMSUB213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 AA /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract xmm3/m128/m32bcst and put result in xmm1. | 
| VFMSUB231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 BA /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, subtract xmm1 and put result in xmm1. | 
| VFMSUB132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 9A /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, subtract ymm2 and put result in ymm1. | 
| VFMSUB213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 AA /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract ymm3/m256/m32bcst and put result in ymm1. | 
| VFMSUB231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 BA /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, subtract ymm1 and put result in ymm1. | 
| VFMSUB132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 9A /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, subtract zmm2 and put result in zmm1. | 
| VFMSUB213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 AA /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, subtract zmm3/m512/m32bcst and put result in zmm1. | 
| VFMSUB231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 BA /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, subtract zmm1 and put result in zmm1. | 
| VFMSUB132SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 9B /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, subtract xmm2 and put result in xmm1. | 
| VFMSUB213SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 AB /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm2, subtract xmm3/m64 and put result in xmm1. | 
| VFMSUB231SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 BB /r | fma | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, subtract xmm1 and put result in xmm1. | 
| VFMSUB132SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 9B /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, subtract xmm2 and put result in xmm1. | 
| VFMSUB213SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 AB /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm2, subtract xmm3/m64 and put result in xmm1. | 
| VFMSUB231SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 BB /r | avx512 | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, subtract xmm1 and put result in xmm1. | 
| VFMSUB132SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 9B /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, subtract xmm2 and put result in xmm1. | 
| VFMSUB213SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 AB /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm2, subtract xmm3/m32 and put result in xmm1. | 
| VFMSUB231SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 BB /r | fma | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, subtract xmm1 and put result in xmm1. | 
| VFMSUB132SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 9B /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, subtract xmm2 and put result in xmm1. | 
| VFMSUB213SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 AB /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm2, subtract xmm3/m32 and put result in xmm1. | 
| VFMSUB231SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 BB /r | avx512 | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, subtract xmm1 and put result in xmm1. | 
| VFMSUBADD132PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 97 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, subtract/add elements in xmm2 and put result in xmm1. | 
| VFMSUBADD213PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 A7 /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/mem and put result in xmm1. | 
| VFMSUBADD231PD xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W1 B7 /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, subtract/add elements in xmm1 and put result in xmm1. | 
| VFMSUBADD132PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 97 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, subtract/add elements in ymm2 and put result in ymm1. | 
| VFMSUBADD213PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 A7 /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/mem and put result in ymm1. | 
| VFMSUBADD231PD ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W1 B7 /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, subtract/add elements in ymm1 and put result in ymm1. | 
| VFMSUBADD132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 97 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, subtract/add elements in xmm2 and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 A7 /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/m128/m64bcst and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 B7 /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, subtract/add elements in xmm1 and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 97 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, subtract/add elements in ymm2 and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 A7 /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/m256/m64bcst and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 B7 /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, subtract/add elements in ymm1 and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 97 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, subtract/add elements in zmm2 and put result in zmm1 subject to writemask k1. | 
| VFMSUBADD213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 A7 /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, subtract/add elements in zmm3/m512/m64bcst and put result in zmm1 subject to writemask k1. | 
| VFMSUBADD231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.DDS.512.66.0F38.W1 B7 /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, subtract/add elements in zmm1 and put result in zmm1 subject to writemask k1. | 
| VFMSUBADD132PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 97 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, subtract/add elements in xmm2 and put result in xmm1. | 
| VFMSUBADD213PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 A7 /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/mem and put result in xmm1. | 
| VFMSUBADD231PS xmm1, xmm2, xmm3/m128 | VEX.DDS.128.66.0F38.W0 B7 /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, subtract/add elements in xmm1 and put result in xmm1. | 
| VFMSUBADD132PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 97 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, subtract/add elements in ymm2 and put result in ymm1. | 
| VFMSUBADD213PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 A7 /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/mem and put result in ymm1. | 
| VFMSUBADD231PS ymm1, ymm2, ymm3/m256 | VEX.DDS.256.66.0F38.W0 B7 /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, subtract/add elements in ymm1 and put result in ymm1. | 
| VFMSUBADD132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 97 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, subtract/add elements in xmm2 and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 A7 /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, subtract/add elements in xmm3/m128/m32bcst and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 B7 /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, subtract/add elements in xmm1 and put result in xmm1 subject to writemask k1. | 
| VFMSUBADD132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 97 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, subtract/add elements in ymm2 and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 A7 /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, subtract/add elements in ymm3/m256/m32bcst and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 B7 /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, subtract/add elements in ymm1 and put result in ymm1 subject to writemask k1. | 
| VFMSUBADD132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 97 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, subtract/add elements in zmm2 and put result in zmm1 subject to writemask k1. | 
| VFMSUBADD213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 A7 /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, subtract/add elements in zmm3/m512/m32bcst and put result in zmm1 subject to writemask k1. | 
| VFMSUBADD231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.DDS.512.66.0F38.W0 B7 /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, subtract/add elements in zmm1 and put result in zmm1 subject to writemask k1. | 
| VFNMADD132PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 9C /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 AC /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and add to xmm3/mem and put result in xmm1. | 
| VFNMADD231PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 BC /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 9C /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and add to ymm2 and put result in ymm1. | 
| VFNMADD213PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 AC /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and add to ymm3/mem and put result in ymm1. | 
| VFNMADD231PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 BC /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and add to ymm1 and put result in ymm1. | 
| VFNMADD132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 9C /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 AC /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and add to xmm3/m128/m64bcst and put result in xmm1. | 
| VFNMADD231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 BC /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 9C /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, negate the multiplication result and add to ymm2 and put result in ymm1. | 
| VFNMADD213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 AC /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and add to ymm3/m256/m64bcst and put result in ymm1. | 
| VFNMADD231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 BC /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, negate the multiplication result and add to ymm1 and put result in ymm1. | 
| VFNMADD132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 9C /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, negate the multiplication result and add to zmm2 and put result in zmm1. | 
| VFNMADD213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 AC /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, negate the multiplication result and add to zmm3/m512/m64bcst and put result in zmm1. | 
| VFNMADD231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 BC /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, negate the multiplication result and add to zmm1 and put result in zmm1. | 
| VFNMADD132PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 9C /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 AC /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and add to xmm3/mem and put result in xmm1. | 
| VFNMADD231PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 BC /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 9C /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and add to ymm2 and put result in ymm1. | 
| VFNMADD213PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 AC /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and add to ymm3/mem and put result in ymm1. | 
| VFNMADD231PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 BC /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and add to ymm1 and put result in ymm1. | 
| VFNMADD132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 9C /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 AC /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and add to xmm3/m128/m32bcst and put result in xmm1. | 
| VFNMADD231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 BC /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 9C /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, negate the multiplication result and add to ymm2 and put result in ymm1. | 
| VFNMADD213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 AC /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and add to ymm3/m256/m32bcst and put result in ymm1. | 
| VFNMADD231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 BC /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, negate the multiplication result and add to ymm1 and put result in ymm1. | 
| VFNMADD132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 9C /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, negate the multiplication result and add to zmm2 and put result in zmm1. | 
| VFNMADD213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 AC /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, negate the multiplication result and add to zmm3/m512/m32bcst and put result in zmm1. | 
| VFNMADD231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 BC /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, negate the multiplication result and add to zmm1 and put result in zmm1. | 
| VFNMADD132SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 9D /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm3/mem, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 AD /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and add to xmm3/mem and put result in xmm1. | 
| VFNMADD231SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 BD /r | fma | Multiply scalar double-precision floating-point value from xmm2 and xmm3/mem, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 9D /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 AD /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and add to xmm3/m64 and put result in xmm1. | 
| VFNMADD231SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 BD /r | avx512 | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 9D /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 AD /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and add to xmm3/m32 and put result in xmm1. | 
| VFNMADD231SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 BD /r | fma | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMADD132SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 9D /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and add to xmm2 and put result in xmm1. | 
| VFNMADD213SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 AD /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and add to xmm3/m32 and put result in xmm1. | 
| VFNMADD231SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 BD /r | avx512 | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and add to xmm1 and put result in xmm1. | 
| VFNMSUB132PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 9E /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 AE /r | fma | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1. | 
| VFNMSUB231PD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 BE /r | fma | Multiply packed double-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 9E /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and subtract ymm2 and put result in ymm1. | 
| VFNMSUB213PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 AE /r | fma | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/mem and put result in ymm1. | 
| VFNMSUB231PD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 BE /r | fma | Multiply packed double-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and subtract ymm1 and put result in ymm1. | 
| VFNMSUB132PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 9E /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm3/m128/m64bcst, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 AE /r | avx512 | Multiply packed double-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m128/m64bcst and put result in xmm1. | 
| VFNMSUB231PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 BE /r | avx512 | Multiply packed double-precision floating-point values from xmm2 and xmm3/m128/m64bcst, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 9E /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm3/m256/m64bcst, negate the multiplication result and subtract ymm2 and put result in ymm1. | 
| VFNMSUB213PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 AE /r | avx512 | Multiply packed double-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/m256/m64bcst and put result in ymm1. | 
| VFNMSUB231PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 BE /r | avx512 | Multiply packed double-precision floating-point values from ymm2 and ymm3/m256/m64bcst, negate the multiplication result and subtract ymm1 and put result in ymm1. | 
| VFNMSUB132PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 9E /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm3/m512/m64bcst, negate the multiplication result and subtract zmm2 and put result in zmm1. | 
| VFNMSUB213PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 AE /r | avx512 | Multiply packed double-precision floating-point values from zmm1 and zmm2, negate the multiplication result and subtract zmm3/m512/m64bcst and put result in zmm1. | 
| VFNMSUB231PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 BE /r | avx512 | Multiply packed double-precision floating-point values from zmm2 and zmm3/m512/m64bcst, negate the multiplication result and subtract zmm1 and put result in zmm1. | 
| VFNMSUB132PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 9E /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 AE /r | fma | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1. | 
| VFNMSUB231PS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 BE /r | fma | Multiply packed single-precision floating-point values from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 9E /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm3/mem, negate the multiplication result and subtract ymm2 and put result in ymm1. | 
| VFNMSUB213PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 AE /r | fma | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/mem and put result in ymm1. | 
| VFNMSUB231PS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 BE /r | fma | Multiply packed single-precision floating-point values from ymm2 and ymm3/mem, negate the multiplication result and subtract ymm1 and put result in ymm1. | 
| VFNMSUB132PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 9E /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm3/m128/m32bcst, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 AE /r | avx512 | Multiply packed single-precision floating-point values from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m128/m32bcst and put result in xmm1. | 
| VFNMSUB231PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 BE /r | avx512 | Multiply packed single-precision floating-point values from xmm2 and xmm3/m128/m32bcst, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 9E /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm3/m256/m32bcst, negate the multiplication result and subtract ymm2 and put result in ymm1. | 
| VFNMSUB213PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 AE /r | avx512 | Multiply packed single-precision floating-point values from ymm1 and ymm2, negate the multiplication result and subtract ymm3/m256/m32bcst and put result in ymm1. | 
| VFNMSUB231PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 BE /r | avx512 | Multiply packed single-precision floating-point values from ymm2 and ymm3/m256/m32bcst, negate the multiplication result and subtract ymm1 and put result in ymm1. | 
| VFNMSUB132PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 9E /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm3/m512/m32bcst, negate the multiplication result and subtract zmm2 and put result in zmm1. | 
| VFNMSUB213PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 AE /r | avx512 | Multiply packed single-precision floating-point values from zmm1 and zmm2, negate the multiplication result and subtract zmm3/m512/m32bcst and put result in zmm1. | 
| VFNMSUB231PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 BE /r | avx512 | Multiply packed single-precision floating-point values from zmm2 and zmm3/m512/m32bcst, negate the multiplication result and subtract zmm1 and put result in zmm1. | 
| VFNMSUB132SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 9F /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm3/mem, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 AF /r | fma | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/mem and put result in xmm1. | 
| VFNMSUB231SD xmm1, xmm2, xmm3/m64 | VEX.DDS.LIG.66.0F38.W1 BF /r | fma | Multiply scalar double-precision floating-point value from xmm2 and xmm3/mem, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 9F /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm3/m64, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 AF /r | avx512 | Multiply scalar double-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m64 and put result in xmm1. | 
| VFNMSUB231SD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.DDS.LIG.66.0F38.W1 BF /r | avx512 | Multiply scalar double-precision floating-point value from xmm2 and xmm3/m64, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 9F /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 AF /r | fma | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m32 and put result in xmm1. | 
| VFNMSUB231SS xmm1, xmm2, xmm3/m32 | VEX.DDS.LIG.66.0F38.W0 BF /r | fma | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFNMSUB132SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 9F /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm3/m32, negate the multiplication result and subtract xmm2 and put result in xmm1. | 
| VFNMSUB213SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 AF /r | avx512 | Multiply scalar single-precision floating-point value from xmm1 and xmm2, negate the multiplication result and subtract xmm3/m32 and put result in xmm1. | 
| VFNMSUB231SS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.DDS.LIG.66.0F38.W0 BF /r | avx512 | Multiply scalar single-precision floating-point value from xmm2 and xmm3/m32, negate the multiplication result and subtract xmm1 and put result in xmm1. | 
| VFPCLASSPD k2 {k1}, xmm2/m128/m64bcst, imm8 | EVEX.128.66.0F3A.W1 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSPD k2 {k1}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSPD k2 {k1}, zmm2/m512/m64bcst, imm8 | EVEX.512.66.0F3A.W1 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSPS k2 {k1}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F3A.W0 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSPS k2 {k1}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F3A.W0 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSPS k2 {k1}, zmm2/m512/m32bcst, imm8 | EVEX.512.66.0F3A.W0 66 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSSD k2 {k1}, xmm2/m64, imm8 | EVEX.LIG.66.0F3A.W1 67 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VFPCLASSSS k2 {k1}, xmm2/m32, imm8 | EVEX.LIG.66.0F3A.W0 67 /r ib | avx512 | Tests the input for the following categories: NaN, +0, -0, +Infinity, -Infinity, denormal, finite negative. The immediate field provides a mask bit for each of these category tests. The masked test results are OR-ed together to form a mask result. | 
| VGATHERDPD xmm1, vm32x, xmm2 | VEX.DDS.128.66.0F38.W1 92 /r | avx2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VGATHERQPD xmm1, vm64x, xmm2 | VEX.DDS.128.66.0F38.W1 93 /r | avx2 | Using qword indices specified in vm64x, gather double-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VGATHERDPD ymm1, vm32x, ymm2 | VEX.DDS.256.66.0F38.W1 92 /r | avx2 | Using dword indices specified in vm32x, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VGATHERQPD ymm1, vm64y, ymm2 | VEX.DDS.256.66.0F38.W1 93 /r | avx2 | Using qword indices specified in vm64y, gather double-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VGATHERDPS xmm1 {k1}, vm32x | EVEX.128.66.0F38.W0 92 /vsib | avx512 | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERDPS ymm1 {k1}, vm32y | EVEX.256.66.0F38.W0 92 /vsib | avx512 | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERDPS zmm1 {k1}, vm32z | EVEX.512.66.0F38.W0 92 /vsib | avx512 | Using signed dword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERDPD xmm1 {k1}, vm32x | EVEX.128.66.0F38.W1 92 /vsib | avx512 | Using signed dword indices, gather float64 vector into float64 vector xmm1 using k1 as completion mask. | 
| VGATHERDPD ymm1 {k1}, vm32x | EVEX.256.66.0F38.W1 92 /vsib | avx512 | Using signed dword indices, gather float64 vector into float64 vector ymm1 using k1 as completion mask. | 
| VGATHERDPD zmm1 {k1}, vm32y | EVEX.512.66.0F38.W1 92 /vsib | avx512 | Using signed dword indices, gather float64 vector into float64 vector zmm1 using k1 as completion mask. | 
| VGATHERDPS xmm1, vm32x, xmm2 | VEX.DDS.128.66.0F38.W0 92 /r | avx2 | Using dword indices specified in vm32x, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VGATHERQPS xmm1, vm64x, xmm2 | VEX.DDS.128.66.0F38.W0 93 /r | avx2 | Using qword indices specified in vm64x, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VGATHERDPS ymm1, vm32y, ymm2 | VEX.DDS.256.66.0F38.W0 92 /r | avx2 | Using dword indices specified in vm32y, gather single-precision FP values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VGATHERQPS xmm1, vm64y, xmm2 | VEX.DDS.256.66.0F38.W0 93 /r | avx2 | Using qword indices specified in vm64y, gather single-precision FP values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VGATHERPF0DPS vm32z {k1} | EVEX.512.66.0F38.W0 C6 /1 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing single-precision data using opmask k1 and T0 hint. | 
| VGATHERPF0QPS vm64z {k1} | EVEX.512.66.0F38.W0 C7 /1 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing single-precision data using opmask k1 and T0 hint. | 
| VGATHERPF0DPD vm32y {k1} | EVEX.512.66.0F38.W1 C6 /1 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing double-precision data using opmask k1 and T0 hint. | 
| VGATHERPF0QPD vm64z {k1} | EVEX.512.66.0F38.W1 C7 /1 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing double-precision data using opmask k1 and T0 hint. | 
| VGATHERPF1DPS vm32z {k1} | EVEX.512.66.0F38.W0 C6 /2 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing single-precision data using opmask k1 and T1 hint. | 
| VGATHERPF1QPS vm64z {k1} | EVEX.512.66.0F38.W0 C7 /2 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing single-precision data using opmask k1 and T1 hint. | 
| VGATHERPF1DPD vm32y {k1} | EVEX.512.66.0F38.W1 C6 /2 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing double-precision data using opmask k1 and T1 hint. | 
| VGATHERPF1QPD vm64z {k1} | EVEX.512.66.0F38.W1 C7 /2 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing double-precision data using opmask k1 and T1 hint. | 
| VGATHERQPS xmm1 {k1}, vm64x | EVEX.128.66.0F38.W0 93 /vsib | avx512 | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERQPS xmm1 {k1}, vm64y | EVEX.256.66.0F38.W0 93 /vsib | avx512 | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERQPS ymm1 {k1}, vm64z | EVEX.512.66.0F38.W0 93 /vsib | avx512 | Using signed qword indices, gather single-precision floating-point values from memory using k1 as completion mask. | 
| VGATHERQPD xmm1 {k1}, vm64x | EVEX.128.66.0F38.W1 93 /vsib | avx512 | Using signed qword indices, gather float64 vector into float64 vector xmm1 using k1 as completion mask. | 
| VGATHERQPD ymm1 {k1}, vm64y | EVEX.256.66.0F38.W1 93 /vsib | avx512 | Using signed qword indices, gather float64 vector into float64 vector ymm1 using k1 as completion mask. | 
| VGATHERQPD zmm1 {k1}, vm64z | EVEX.512.66.0F38.W1 93 /vsib | avx512 | Using signed qword indices, gather float64 vector into float64 vector zmm1 using k1 as completion mask. | 
| VGETEXPPD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F38.W1 42 /r | avx512 | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and store the results in the destination register. | 
| VGETEXPPD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F38.W1 42 /r | avx512 | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and store the results in the destination register. | 
| VGETEXPPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae} | EVEX.512.66.0F38.W1 42 /r | avx512 | Convert the exponent of packed double-precision floating-point values in the source operand to DP FP results representing unbiased integer exponents and store the results in the destination under writemask k1. | 
| VGETEXPPS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F38.W0 42 /r | avx512 | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and store the results in the destination register. | 
| VGETEXPPS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F38.W0 42 /r | avx512 | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and store the results in the destination register. | 
| VGETEXPPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | EVEX.512.66.0F38.W0 42 /r | avx512 | Convert the exponent of packed single-precision floating-point values in the source operand to SP FP results representing unbiased integer exponents and store the results in the destination register. | 
| VGETEXPSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae} | EVEX.NDS.LIG.66.0F38.W1 43 /r | avx512 | Convert the biased exponent (bits 62:52) of the low double-precision floating-point value in xmm3/m64 to a DP FP value representing unbiased integer exponent. Store the result to the low 64 bits of xmm1 under the writemask k1 and merge with the other elements of xmm2. | 
| VGETEXPSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | EVEX.NDS.LIG.66.0F38.W0 43 /r | avx512 | Convert the biased exponent (bits 30:23) of the low single-precision floating-point value in xmm3/m32 to a SP FP value representing unbiased integer exponent. Store the result to xmm1 under the writemask k1 and merge with the other elements of xmm2. | 
| VGETMANTPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.128.66.0F3A.W1 26 /r ib | avx512 | Get Normalized Mantissa from float64 vector xmm2/m128/m64bcst and store the result in xmm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 26 /r ib | avx512 | Get Normalized Mantissa from float64 vector ymm2/m256/m64bcst and store the result in ymm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8 | EVEX.512.66.0F3A.W1 26 /r ib | avx512 | Get Normalized Mantissa from float64 vector zmm2/m512/m64bcst and store the result in zmm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F3A.W0 26 /r ib | avx512 | Get normalized mantissa from float32 vector xmm2/m128/m32bcst and store the result in xmm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F3A.W0 26 /r ib | avx512 | Get normalized mantissa from float32 vector ymm2/m256/m32bcst and store the result in ymm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | EVEX.512.66.0F3A.W0 26 /r ib | avx512 | Get normalized mantissa from float32 vector zmm2/m512/m32bcst and store the result in zmm1, using imm8 for sign control and mantissa interval normalization, under writemask. | 
| VGETMANTSD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W1 27 /r ib | avx512 | Extract the normalized mantissa of the low float64 element in xmm3/m64 using imm8 for sign control and mantissa interval normalization. Store the mantissa to xmm1 under the writemask k1 and merge with the other elements of xmm2. | 
| VGETMANTSS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W0 27 /r ib | avx512 | Extract the normalized mantissa from the low float32 element of xmm3/m32 using imm8 for sign control and mantissa interval normalization, store the mantissa to xmm1 under the writemask k1 and merge with the other elements of xmm2. | 
| VINSERTF128 ymm1, ymm2, xmm3/m128, imm8 | VEX.NDS.256.66.0F3A.W0 18 /r ib | avx | Insert 128 bits of packed floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1. | 
| VINSERTF32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | EVEX.NDS.256.66.0F3A.W0 18 /r ib | avx512 | Insert 128 bits of packed single-precision floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1. | 
| VINSERTF32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | EVEX.NDS.512.66.0F3A.W0 18 /r ib | avx512 | Insert 128 bits of packed single-precision floating-point values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTF64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | EVEX.NDS.256.66.0F3A.W1 18 /r ib | avx512 | Insert 128 bits of packed double-precision floating-point values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1. | 
| VINSERTF64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | EVEX.NDS.512.66.0F3A.W1 18 /r ib | avx512 | Insert 128 bits of packed double-precision floating-point values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTF32X8 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | EVEX.NDS.512.66.0F3A.W0 1A /r ib | avx512 | Insert 256 bits of packed single-precision floating-point values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTF64X4 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | EVEX.NDS.512.66.0F3A.W1 1A /r ib | avx512 | Insert 256 bits of packed double-precision floating-point values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTI128 ymm1, ymm2, xmm3/m128, imm8 | VEX.NDS.256.66.0F3A.W0 38 /r ib | avx2 | Insert 128 bits of integer data from xmm3/m128 and the remaining values from ymm2 into ymm1. | 
| VINSERTI32X4 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | EVEX.NDS.256.66.0F3A.W0 38 /r ib | avx512 | Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1. | 
| VINSERTI32X4 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | EVEX.NDS.512.66.0F3A.W0 38 /r ib | avx512 | Insert 128 bits of packed doubleword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTI64X2 ymm1 {k1}{z}, ymm2, xmm3/m128, imm8 | EVEX.NDS.256.66.0F3A.W1 38 /r ib | avx512 | Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from ymm2 into ymm1 under writemask k1. | 
| VINSERTI64X2 zmm1 {k1}{z}, zmm2, xmm3/m128, imm8 | EVEX.NDS.512.66.0F3A.W1 38 /r ib | avx512 | Insert 128 bits of packed quadword integer values from xmm3/m128 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTI32X8 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | EVEX.NDS.512.66.0F3A.W0 3A /r ib | avx512 | Insert 256 bits of packed doubleword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VINSERTI64X4 zmm1 {k1}{z}, zmm2, ymm3/m256, imm8 | EVEX.NDS.512.66.0F3A.W1 3A /r ib | avx512 | Insert 256 bits of packed quadword integer values from ymm3/m256 and the remaining values from zmm2 into zmm1 under writemask k1. | 
| VMASKMOVPS xmm1, xmm2, m128 | VEX.NDS.128.66.0F38.W0 2C /r | avx | Conditionally load packed single-precision values from m128 using mask in xmm2 and store in xmm1. | 
| VMASKMOVPS ymm1, ymm2, m256 | VEX.NDS.256.66.0F38.W0 2C /r | avx | Conditionally load packed single-precision values from m256 using mask in ymm2 and store in ymm1. | 
| VMASKMOVPD xmm1, xmm2, m128 | VEX.NDS.128.66.0F38.W0 2D /r | avx | Conditionally load packed double-precision values from m128 using mask in xmm2 and store in xmm1. | 
| VMASKMOVPD ymm1, ymm2, m256 | VEX.NDS.256.66.0F38.W0 2D /r | avx | Conditionally load packed double-precision values from m256 using mask in ymm2 and store in ymm1. | 
| VMASKMOVPS m128, xmm1, xmm2 | VEX.NDS.128.66.0F38.W0 2E /r | avx | Conditionally store packed single-precision values from xmm2 using mask in xmm1. | 
| VMASKMOVPS m256, ymm1, ymm2 | VEX.NDS.256.66.0F38.W0 2E /r | avx | Conditionally store packed single-precision values from ymm2 using mask in ymm1. | 
| VMASKMOVPD m128, xmm1, xmm2 | VEX.NDS.128.66.0F38.W0 2F /r | avx | Conditionally store packed double-precision values from xmm2 using mask in xmm1. | 
| VMASKMOVPD m256, ymm1, ymm2 | VEX.NDS.256.66.0F38.W0 2F /r | avx | Conditionally store packed double-precision values from ymm2 using mask in ymm1. | 
| VPBLENDD xmm1, xmm2, xmm3/m128, imm8 | VEX.NDS.128.66.0F3A.W0 02 /r ib | avx2 | Select dwords from xmm2 and xmm3/m128 from mask specified in imm8 and store the values into xmm1. | 
| VPBLENDD ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.W0 02 /r ib | avx2 | Select dwords from ymm2 and ymm3/m256 from mask specified in imm8 and store the values into ymm1. | 
| VPBLENDMB xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W0 66 /r | avx512 | Blend byte integer vector xmm2 and byte vector xmm3/m128 and store the result in xmm1, under control mask. | 
| VPBLENDMB ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W0 66 /r | avx512 | Blend byte integer vector ymm2 and byte vector ymm3/m256 and store the result in ymm1, under control mask. | 
| VPBLENDMB zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W0 66 /r | avx512 | Blend byte integer vector zmm2 and byte vector zmm3/m512 and store the result in zmm1, under control mask. | 
| VPBLENDMW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 66 /r | avx512 | Blend word integer vector xmm2 and word vector xmm3/m128 and store the result in xmm1, under control mask. | 
| VPBLENDMW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 66 /r | avx512 | Blend word integer vector ymm2 and word vector ymm3/m256 and store the result in ymm1, under control mask. | 
| VPBLENDMW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 66 /r | avx512 | Blend word integer vector zmm2 and word vector zmm3/m512 and store the result in zmm1, under control mask. | 
| VPBLENDMD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 64 /r | avx512 | Blend doubleword integer vector xmm2 and doubleword vector xmm3/m128/m32bcst and store the result in xmm1, under control mask. | 
| VPBLENDMD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 64 /r | avx512 | Blend doubleword integer vector ymm2 and doubleword vector ymm3/m256/m32bcst and store the result in ymm1, under control mask. | 
| VPBLENDMD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 64 /r | avx512 | Blend doubleword integer vector zmm2 and doubleword vector zmm3/m512/m32bcst and store the result in zmm1, under control mask. | 
| VPBLENDMQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 64 /r | avx512 | Blend quadword integer vector xmm2 and quadword vector xmm3/m128/m64bcst and store the result in xmm1, under control mask. | 
| VPBLENDMQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 64 /r | avx512 | Blend quadword integer vector ymm2 and quadword vector ymm3/m256/m64bcst and store the result in ymm1, under control mask. | 
| VPBLENDMQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 64 /r | avx512 | Blend quadword integer vector zmm2 and quadword vector zmm3/m512/m64bcst and store the result in zmm1, under control mask. | 
| VPBROADCASTB xmm1, xmm2/m8 | VEX.128.66.0F38.W0 78 /r | avx2 | Broadcast a byte integer in the source operand to sixteen locations in xmm1. | 
| VPBROADCASTB ymm1, xmm2/m8 | VEX.256.66.0F38.W0 78 /r | avx2 | Broadcast a byte integer in the source operand to thirty-two locations in ymm1. | 
| VPBROADCASTB xmm1{k1}{z}, xmm2/m8 | EVEX.128.66.0F38.W0 78 /r | avx512 | Broadcast a byte integer in the source operand to locations in xmm1 subject to writemask k1. | 
| VPBROADCASTB ymm1{k1}{z}, xmm2/m8 | EVEX.256.66.0F38.W0 78 /r | avx512 | Broadcast a byte integer in the source operand to locations in ymm1 subject to writemask k1. | 
| VPBROADCASTB zmm1{k1}{z}, xmm2/m8 | EVEX.512.66.0F38.W0 78 /r | avx512 | Broadcast a byte integer in the source operand to 64 locations in zmm1 subject to writemask k1. | 
| VPBROADCASTW xmm1, xmm2/m16 | VEX.128.66.0F38.W0 79 /r | avx2 | Broadcast a word integer in the source operand to eight locations in xmm1. | 
| VPBROADCASTW ymm1, xmm2/m16 | VEX.256.66.0F38.W0 79 /r | avx2 | Broadcast a word integer in the source operand to sixteen locations in ymm1. | 
| VPBROADCASTW xmm1{k1}{z}, xmm2/m16 | EVEX.128.66.0F38.W0 79 /r | avx512 | Broadcast a word integer in the source operand to locations in xmm1 subject to writemask k1. | 
| VPBROADCASTW ymm1{k1}{z}, xmm2/m16 | EVEX.256.66.0F38.W0 79 /r | avx512 | Broadcast a word integer in the source operand to locations in ymm1 subject to writemask k1. | 
| VPBROADCASTW zmm1{k1}{z}, xmm2/m16 | EVEX.512.66.0F38.W0 79 /r | avx512 | Broadcast a word integer in the source operand to 32 locations in zmm1 subject to writemask k1. | 
| VPBROADCASTD xmm1, xmm2/m32 | VEX.128.66.0F38.W0 58 /r | avx2 | Broadcast a dword integer in the source operand to four locations in xmm1. | 
| VPBROADCASTD ymm1, xmm2/m32 | VEX.256.66.0F38.W0 58 /r | avx2 | Broadcast a dword integer in the source operand to eight locations in ymm1. | 
| VPBROADCASTD xmm1 {k1}{z}, xmm2/m32 | EVEX.128.66.0F38.W0 58 /r | avx512 | Broadcast a dword integer in the source operand to locations in xmm1 subject to writemask k1. | 
| VPBROADCASTD ymm1 {k1}{z}, xmm2/m32 | EVEX.256.66.0F38.W0 58 /r | avx512 | Broadcast a dword integer in the source operand to locations in ymm1 subject to writemask k1. | 
| VPBROADCASTD zmm1 {k1}{z}, xmm2/m32 | EVEX.512.66.0F38.W0 58 /r | avx512 | Broadcast a dword integer in the source operand to locations in zmm1 subject to writemask k1. | 
| VPBROADCASTQ xmm1, xmm2/m64 | VEX.128.66.0F38.W0 59 /r | avx2 | Broadcast a qword element in source operand to two locations in xmm1. | 
| VPBROADCASTQ ymm1, xmm2/m64 | VEX.256.66.0F38.W0 59 /r | avx2 | Broadcast a qword element in source operand to four locations in ymm1. | 
| VPBROADCASTQ xmm1 {k1}{z}, xmm2/m64 | EVEX.128.66.0F38.W1 59 /r | avx512 | Broadcast a qword element in source operand to locations in xmm1 subject to writemask k1. | 
| VPBROADCASTQ ymm1 {k1}{z}, xmm2/m64 | EVEX.256.66.0F38.W1 59 /r | avx512 | Broadcast a qword element in source operand to locations in ymm1 subject to writemask k1. | 
| VPBROADCASTQ zmm1 {k1}{z}, xmm2/m64 | EVEX.512.66.0F38.W1 59 /r | avx512 | Broadcast a qword element in source operand to locations in zmm1 subject to writemask k1. | 
| VBROADCASTI32x2 xmm1 {k1}{z}, xmm2/m64 | EVEX.128.66.0F38.W0 59 /r | avx512 | Broadcast two dword elements in source operand to locations in xmm1 subject to writemask k1. | 
| VBROADCASTI32x2 ymm1 {k1}{z}, xmm2/m64 | EVEX.256.66.0F38.W0 59 /r | avx512 | Broadcast two dword elements in source operand to locations in ymm1 subject to writemask k1. | 
| VBROADCASTI32x2 zmm1 {k1}{z}, xmm2/m64 | EVEX.512.66.0F38.W0 59 /r | avx512 | Broadcast two dword elements in source operand to locations in zmm1 subject to writemask k1. | 
| VBROADCASTI128 ymm1, m128 | VEX.256.66.0F38.W0 5A /r | avx2 | Broadcast 128 bits of integer data in mem to low and high 128-bits in ymm1. | 
| VBROADCASTI32X4 ymm1 {k1}{z}, m128 | EVEX.256.66.0F38.W0 5A /r | avx512 | Broadcast 128 bits of 4 doubleword integer data in mem to locations in ymm1 using writemask k1. | 
| VBROADCASTI32X4 zmm1 {k1}{z}, m128 | EVEX.512.66.0F38.W0 5A /r | avx512 | Broadcast 128 bits of 4 doubleword integer data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTI64X2 ymm1 {k1}{z}, m128 | EVEX.256.66.0F38.W1 5A /r | avx512 | Broadcast 128 bits of 2 quadword integer data in mem to locations in ymm1 using writemask k1. | 
| VBROADCASTI64X2 zmm1 {k1}{z}, m128 | EVEX.512.66.0F38.W1 5A /r | avx512 | Broadcast 128 bits of 2 quadword integer data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTI32X8 zmm1 {k1}{z}, m256 | EVEX.512.66.0F38.W0 5B /r | avx512 | Broadcast 256 bits of 8 doubleword integer data in mem to locations in zmm1 using writemask k1. | 
| VBROADCASTI64X4 zmm1 {k1}{z}, m256 | EVEX.512.66.0F38.W1 5B /r | avx512 | Broadcast 256 bits of 4 quadword integer data in mem to locations in zmm1 using writemask k1. | 
| VPBROADCASTB xmm1 {k1}{z}, reg | EVEX.128.66.0F38.W0 7A /r | avx512 | Broadcast an 8-bit value from a GPR to all bytes in the 128-bit destination subject to writemask k1. | 
| VPBROADCASTB ymm1 {k1}{z}, reg | EVEX.256.66.0F38.W0 7A /r | avx512 | Broadcast an 8-bit value from a GPR to all bytes in the 256-bit destination subject to writemask k1. | 
| VPBROADCASTB zmm1 {k1}{z}, reg | EVEX.512.66.0F38.W0 7A /r | avx512 | Broadcast an 8-bit value from a GPR to all bytes in the 512-bit destination subject to writemask k1. | 
| VPBROADCASTW xmm1 {k1}{z}, reg | EVEX.128.66.0F38.W0 7B /r | avx512 | Broadcast a 16-bit value from a GPR to all words in the 128-bit destination subject to writemask k1. | 
| VPBROADCASTW ymm1 {k1}{z}, reg | EVEX.256.66.0F38.W0 7B /r | avx512 | Broadcast a 16-bit value from a GPR to all words in the 256-bit destination subject to writemask k1. | 
| VPBROADCASTW zmm1 {k1}{z}, reg | EVEX.512.66.0F38.W0 7B /r | avx512 | Broadcast a 16-bit value from a GPR to all words in the 512-bit destination subject to writemask k1. | 
| VPBROADCASTD xmm1 {k1}{z}, r32 | EVEX.128.66.0F38.W0 7C /r | avx512 | Broadcast a 32-bit value from a GPR to all double-words in the 128-bit destination subject to writemask k1. | 
| VPBROADCASTD ymm1 {k1}{z}, r32 | EVEX.256.66.0F38.W0 7C /r | avx512 | Broadcast a 32-bit value from a GPR to all double-words in the 256-bit destination subject to writemask k1. | 
| VPBROADCASTD zmm1 {k1}{z}, r32 | EVEX.512.66.0F38.W0 7C /r | avx512 | Broadcast a 32-bit value from a GPR to all double-words in the 512-bit destination subject to writemask k1. | 
| VPBROADCASTQ xmm1 {k1}{z}, r64 | EVEX.128.66.0F38.W1 7C /r | avx512 | Broadcast a 64-bit value from a GPR to all quad-words in the 128-bit destination subject to writemask k1. | 
| VPBROADCASTQ ymm1 {k1}{z}, r64 | EVEX.256.66.0F38.W1 7C /r | avx512 | Broadcast a 64-bit value from a GPR to all quad-words in the 256-bit destination subject to writemask k1. | 
| VPBROADCASTQ zmm1 {k1}{z}, r64 | EVEX.512.66.0F38.W1 7C /r | avx512 | Broadcast a 64-bit value from a GPR to all quad-words in the 512-bit destination subject to writemask k1. | 
| VPBROADCASTMB2Q xmm1, k1 | EVEX.128.F3.0F38.W1 2A /r | avx512 | Broadcast low byte value in k1 to two locations in xmm1. | 
| VPBROADCASTMB2Q ymm1, k1 | EVEX.256.F3.0F38.W1 2A /r | avx512 | Broadcast low byte value in k1 to four locations in ymm1. | 
| VPBROADCASTMB2Q zmm1, k1 | EVEX.512.F3.0F38.W1 2A /r | avx512 | Broadcast low byte value in k1 to eight locations in zmm1. | 
| VPBROADCASTMW2D xmm1, k1 | EVEX.128.F3.0F38.W0 3A /r | avx512 | Broadcast low word value in k1 to four locations in xmm1. | 
| VPBROADCASTMW2D ymm1, k1 | EVEX.256.F3.0F38.W0 3A /r | avx512 | Broadcast low word value in k1 to eight locations in ymm1. | 
| VPBROADCASTMW2D zmm1, k1 | EVEX.512.F3.0F38.W0 3A /r | avx512 | Broadcast low word value in k1 to sixteen locations in zmm1. | 
| VPCMPB k1 {k2}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.W0 3F /r ib | avx512 | Compare packed signed byte values in xmm3/m128 and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPB k1 {k2}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.W0 3F /r ib | avx512 | Compare packed signed byte values in ymm3/m256 and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPB k1 {k2}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.W0 3F /r ib | avx512 | Compare packed signed byte values in zmm3/m512 and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUB k1 {k2}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.W0 3E /r ib | avx512 | Compare packed unsigned byte values in xmm3/m128 and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUB k1 {k2}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.W0 3E /r ib | avx512 | Compare packed unsigned byte values in ymm3/m256 and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUB k1 {k2}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.W0 3E /r ib | avx512 | Compare packed unsigned byte values in zmm3/m512 and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPD k1 {k2}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.66.0F3A.W0 1F /r ib | avx512 | Compare packed signed doubleword integer values in xmm3/m128/m32bcst and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPD k1 {k2}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 1F /r ib | avx512 | Compare packed signed doubleword integer values in ymm3/m256/m32bcst and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPD k1 {k2}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.66.0F3A.W0 1F /r ib | avx512 | Compare packed signed doubleword integer values in zmm2 and zmm3/m512/m32bcst using bits 2:0 of imm8 as a comparison predicate. The comparison results are written to the destination k1 under writemask k2. | 
| VPCMPUD k1 {k2}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.66.0F3A.W0 1E /r ib | avx512 | Compare packed unsigned doubleword integer values in xmm3/m128/m32bcst and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUD k1 {k2}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 1E /r ib | avx512 | Compare packed unsigned doubleword integer values in ymm3/m256/m32bcst and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUD k1 {k2}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.66.0F3A.W0 1E /r ib | avx512 | Compare packed unsigned doubleword integer values in zmm2 and zmm3/m512/m32bcst using bits 2:0 of imm8 as a comparison predicate. The comparison results are written to the destination k1 under writemask k2. | 
| VPCMPQ k1 {k2}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F3A.W1 1F /r ib | avx512 | Compare packed signed quadword integer values in xmm3/m128/m64bcst and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPQ k1 {k2}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 1F /r ib | avx512 | Compare packed signed quadword integer values in ymm3/m256/m64bcst and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPQ k1 {k2}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F3A.W1 1F /r ib | avx512 | Compare packed signed quadword integer values in zmm3/m512/m64bcst and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUQ k1 {k2}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F3A.W1 1E /r ib | avx512 | Compare packed unsigned quadword integer values in xmm3/m128/m64bcst and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUQ k1 {k2}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 1E /r ib | avx512 | Compare packed unsigned quadword integer values in ymm3/m256/m64bcst and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUQ k1 {k2}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F3A.W1 1E /r ib | avx512 | Compare packed unsigned quadword integer values in zmm3/m512/m64bcst and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPW k1 {k2}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.W1 3F /r ib | avx512 | Compare packed signed word integers in xmm3/m128 and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPW k1 {k2}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.W1 3F /r ib | avx512 | Compare packed signed word integers in ymm3/m256 and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPW k1 {k2}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.W1 3F /r ib | avx512 | Compare packed signed word integers in zmm3/m512 and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUW k1 {k2}, xmm2, xmm3/m128, imm8 | EVEX.NDS.128.66.0F3A.W1 3E /r ib | avx512 | Compare packed unsigned word integers in xmm3/m128 and xmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUW k1 {k2}, ymm2, ymm3/m256, imm8 | EVEX.NDS.256.66.0F3A.W1 3E /r ib | avx512 | Compare packed unsigned word integers in ymm3/m256 and ymm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCMPUW k1 {k2}, zmm2, zmm3/m512, imm8 | EVEX.NDS.512.66.0F3A.W1 3E /r ib | avx512 | Compare packed unsigned word integers in zmm3/m512 and zmm2 using bits 2:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1. | 
| VPCOMPRESSD xmm1/m128 {k1}{z}, xmm2 | EVEX.128.66.0F38.W0 8B /r | avx512 | Compress packed doubleword integer values from xmm2 to xmm1/m128 using controlmask k1. | 
| VPCOMPRESSD ymm1/m256 {k1}{z}, ymm2 | EVEX.256.66.0F38.W0 8B /r | avx512 | Compress packed doubleword integer values from ymm2 to ymm1/m256 using controlmask k1. | 
| VPCOMPRESSD zmm1/m512 {k1}{z}, zmm2 | EVEX.512.66.0F38.W0 8B /r | avx512 | Compress packed doubleword integer values from zmm2 to zmm1/m512 using controlmask k1. | 
| VPCOMPRESSQ xmm1/m128 {k1}{z}, xmm2 | EVEX.128.66.0F38.W1 8B /r | avx512 | Compress packed quadword integer values from xmm2 to xmm1/m128 using controlmask k1. | 
| VPCOMPRESSQ ymm1/m256 {k1}{z}, ymm2 | EVEX.256.66.0F38.W1 8B /r | avx512 | Compress packed quadword integer values from ymm2 to ymm1/m256 using controlmask k1. | 
| VPCOMPRESSQ zmm1/m512 {k1}{z}, zmm2 | EVEX.512.66.0F38.W1 8B /r | avx512 | Compress packed quadword integer values from zmm2 to zmm1/m512 using controlmask k1. | 
| VPCONFLICTD xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F38.W0 C4 /r | avx512 | Detect duplicate double-word values in xmm2/m128/m32bcst using writemask k1. | 
| VPCONFLICTD ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F38.W0 C4 /r | avx512 | Detect duplicate double-word values in ymm2/m256/m32bcst using writemask k1. | 
| VPCONFLICTD zmm1 {k1}{z}, zmm2/m512/m32bcst | EVEX.512.66.0F38.W0 C4 /r | avx512 | Detect duplicate double-word values in zmm2/m512/m32bcst using writemask k1. | 
| VPCONFLICTQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F38.W1 C4 /r | avx512 | Detect duplicate quad-word values in xmm2/m128/m64bcst using writemask k1. | 
| VPCONFLICTQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F38.W1 C4 /r | avx512 | Detect duplicate quad-word values in ymm2/m256/m64bcst using writemask k1. | 
| VPCONFLICTQ zmm1 {k1}{z}, zmm2/m512/m64bcst | EVEX.512.66.0F38.W1 C4 /r | avx512 | Detect duplicate quad-word values in zmm2/m512/m64bcst using writemask k1. | 
| VPERM2F128 ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.W0 06 /r ib | avx | Permute 128-bit floating-point fields in ymm2 and ymm3/mem using controls from imm8 and store result in ymm1. | 
| VPERM2I128 ymm1, ymm2, ymm3/m256, imm8 | VEX.NDS.256.66.0F3A.W0 46 /r ib | avx2 | Permute 128-bit integer data in ymm2 and ymm3/mem using controls from imm8 and store result in ymm1. | 
| VPERMD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 36 /r | avx2 | Permute doublewords in ymm3/m256 using indices in ymm2 and store the result in ymm1. | 
| VPERMD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 36 /r | avx512 | Permute doublewords in ymm3/m256/m32bcst using indexes in ymm2 and store the result in ymm1 using writemask k1. | 
| VPERMD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 36 /r | avx512 | Permute doublewords in zmm3/m512/m32bcst using indices in zmm2 and store the result in zmm1 using writemask k1. | 
| VPERMW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 8D /r | avx512 | Permute word integers in xmm3/m128 using indexes in xmm2 and store the result in xmm1 using writemask k1. | 
| VPERMW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 8D /r | avx512 | Permute word integers in ymm3/m256 using indexes in ymm2 and store the result in ymm1 using writemask k1. | 
| VPERMW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 8D /r | avx512 | Permute word integers in zmm3/m512 using indexes in zmm2 and store the result in zmm1 using writemask k1. | 
| VPERMI2W xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.DDS.128.66.0F38.W1 75 /r | avx512 | Permute word integers from two tables in xmm3/m128 and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1. | 
| VPERMI2W ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.DDS.256.66.0F38.W1 75 /r | avx512 | Permute word integers from two tables in ymm3/m256 and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1. | 
| VPERMI2W zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.DDS.512.66.0F38.W1 75 /r | avx512 | Permute word integers from two tables in zmm3/m512 and zmm2 using indexes in zmm1 and store the result in zmm1 using writemask k1. | 
| VPERMI2D xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 76 /r | avx512 | Permute double-words from two tables in xmm3/m128/m32bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1. | 
| VPERMI2D ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 76 /r | avx512 | Permute double-words from two tables in ymm3/m256/m32bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1. | 
| VPERMI2D zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.DDS.512.66.0F38.W0 76 /r | avx512 | Permute double-words from two tables in zmm3/m512/m32bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1. | 
| VPERMI2Q xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 76 /r | avx512 | Permute quad-words from two tables in xmm3/m128/m64bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1. | 
| VPERMI2Q ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 76 /r | avx512 | Permute quad-words from two tables in ymm3/m256/m64bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1. | 
| VPERMI2Q zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.DDS.512.66.0F38.W1 76 /r | avx512 | Permute quad-words from two tables in zmm3/m512/m64bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1. | 
| VPERMI2PS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.DDS.128.66.0F38.W0 77 /r | avx512 | Permute single-precision FP values from two tables in xmm3/m128/m32bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1. | 
| VPERMI2PS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.DDS.256.66.0F38.W0 77 /r | avx512 | Permute single-precision FP values from two tables in ymm3/m256/m32bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1. | 
| VPERMI2PS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.DDS.512.66.0F38.W0 77 /r | avx512 | Permute single-precision FP values from two tables in zmm3/m512/m32bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1. | 
| VPERMI2PD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.DDS.128.66.0F38.W1 77 /r | avx512 | Permute double-precision FP values from two tables in xmm3/m128/m64bcst and xmm2 using indexes in xmm1 and store the result in xmm1 using writemask k1. | 
| VPERMI2PD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.DDS.256.66.0F38.W1 77 /r | avx512 | Permute double-precision FP values from two tables in ymm3/m256/m64bcst and ymm2 using indexes in ymm1 and store the result in ymm1 using writemask k1. | 
| VPERMI2PD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.DDS.512.66.0F38.W1 77 /r | avx512 | Permute double-precision FP values from two tables in zmm3/m512/m64bcst and zmm2 using indices in zmm1 and store the result in zmm1 using writemask k1. | 
| VPERMILPD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 0D /r | avx | Permute double-precision floating-point values in xmm2 using controls from xmm3/m128 and store result in xmm1. | 
| VPERMILPD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 0D /r | avx | Permute double-precision floating-point values in ymm2 using controls from ymm3/m256 and store result in ymm1. | 
| VPERMILPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 0D /r | avx512 | Permute double-precision floating-point values in xmm2 using control from xmm3/m128/m64bcst and store the result in xmm1 using writemask k1. | 
| VPERMILPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 0D /r | avx512 | Permute double-precision floating-point values in ymm2 using control from ymm3/m256/m64bcst and store the result in ymm1 using writemask k1. | 
| VPERMILPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 0D /r | avx512 | Permute double-precision floating-point values in zmm2 using control from zmm3/m512/m64bcst and store the result in zmm1 using writemask k1. | 
| VPERMILPD xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.W0 05 /r ib | avx | Permute double-precision floating-point values in xmm2/m128 using controls from imm8. | 
| VPERMILPD ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.W0 05 /r ib | avx | Permute double-precision floating-point values in ymm2/m256 using controls from imm8. | 
| VPERMILPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.128.66.0F3A.W1 05 /r ib | avx512 | Permute double-precision floating-point values in xmm2/m128/m64bcst using controls from imm8 and store the result in xmm1 using writemask k1. | 
| VPERMILPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 05 /r ib | avx512 | Permute double-precision floating-point values in ymm2/m256/m64bcst using controls from imm8 and store the result in ymm1 using writemask k1. | 
| VPERMILPD zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | EVEX.512.66.0F3A.W1 05 /r ib | avx512 | Permute double-precision floating-point values in zmm2/m512/m64bcst using controls from imm8 and store the result in zmm1 using writemask k1. | 
| VPERMILPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 0C /r | avx | Permute single-precision floating-point values in xmm2 using controls from xmm3/m128 and store result in xmm1. | 
| VPERMILPS xmm1, xmm2/m128, imm8 | VEX.128.66.0F3A.W0 04 /r ib | avx | Permute single-precision floating-point values in xmm2/m128 using controls from imm8 and store result in xmm1. | 
| VPERMILPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 0C /r | avx | Permute single-precision floating-point values in ymm2 using controls from ymm3/m256 and store result in ymm1. | 
| VPERMILPS ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.W0 04 /r ib | avx | Permute single-precision floating-point values in ymm2/m256 using controls from imm8 and store result in ymm1. | 
| VPERMILPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 0C /r | avx512 | Permute single-precision floating-point values in xmm2 using control from xmm3/m128/m32bcst and store the result in xmm1 using writemask k1. | 
| VPERMILPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 0C /r | avx512 | Permute single-precision floating-point values in ymm2 using control from ymm3/m256/m32bcst and store the result in ymm1 using writemask k1. | 
| VPERMILPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 0C /r | avx512 | Permute single-precision floating-point values in zmm2 using control from zmm3/m512/m32bcst and store the result in zmm1 using writemask k1. | 
| VPERMILPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F3A.W0 04 /r ib | avx512 | Permute single-precision floating-point values in xmm2/m128/m32bcst using controls from imm8 and store the result in xmm1 using writemask k1. | 
| VPERMILPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F3A.W0 04 /r ib | avx512 | Permute single-precision floating-point values in ymm2/m256/m32bcst using controls from imm8 and store the result in ymm1 using writemask k1. | 
| VPERMILPS zmm1 {k1}{z}, zmm2/m512/m32bcst, imm8 | EVEX.512.66.0F3A.W0 04 /r ib | avx512 | Permute single-precision floating-point values in zmm2/m512/m32bcst using controls from imm8 and store the result in zmm1 using writemask k1. | 
| VPERMPD ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.W1 01 /r ib | avx2 | Permute double-precision floating-point elements in ymm2/m256 using indices in imm8 and store the result in ymm1. | 
| VPERMPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 01 /r ib | avx512 | Permute double-precision floating-point elements in ymm2/m256/m64bcst using indexes in imm8 and store the result in ymm1 subject to writemask k1. | 
| VPERMPD zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | EVEX.512.66.0F3A.W1 01 /r ib | avx512 | Permute double-precision floating-point elements in zmm2/m512/m64bcst using indices in imm8 and store the result in zmm1 subject to writemask k1. | 
| VPERMPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 16 /r | avx512 | Permute double-precision floating-point elements in ymm3/m256/m64bcst using indexes in ymm2 and store the result in ymm1 subject to writemask k1. | 
| VPERMPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 16 /r | avx512 | Permute double-precision floating-point elements in zmm3/m512/m64bcst using indices in zmm2 and store the result in zmm1 subject to writemask k1. | 
| VPERMPS ymm1, ymm2, ymm3/m256 | VEX.256.66.0F38.W0 16 /r | avx2 | Permute single-precision floating-point elements in ymm3/m256 using indices in ymm2 and store the result in ymm1. | 
| VPERMPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 16 /r | avx512 | Permute single-precision floating-point elements in ymm3/m256/m32bcst using indexes in ymm2 and store the result in ymm1 subject to writemask k1. | 
| VPERMPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 16 /r | avx512 | Permute single-precision floating-point values in zmm3/m512/m32bcst using indices in zmm2 and store the result in zmm1 subject to writemask k1. | 
| VPERMQ ymm1, ymm2/m256, imm8 | VEX.256.66.0F3A.W1 00 /r ib | avx2 | Permute qwords in ymm2/m256 using indices in imm8 and store the result in ymm1. | 
| VPERMQ ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 00 /r ib | avx512 | Permute qwords in ymm2/m256/m64bcst using indexes in imm8 and store the result in ymm1 subject to writemask k1. | 
| VPERMQ zmm1 {k1}{z}, zmm2/m512/m64bcst, imm8 | EVEX.512.66.0F3A.W1 00 /r ib | avx512 | Permute qwords in zmm2/m512/m64bcst using indices in imm8 and store the result in zmm1 subject to writemask k1. | 
| VPERMQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 36 /r | avx512 | Permute qwords in ymm3/m256/m64bcst using indexes in ymm2 and store the result in ymm1 subject to writemask k1. | 
| VPERMQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 36 /r | avx512 | Permute qwords in zmm3/m512/m64bcst using indices in zmm2 and store the result in zmm1 subject to writemask k1. | 
| VPEXPANDD xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.W0 89 /r | avx512 | Expand packed double-word integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VPEXPANDD ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F38.W0 89 /r | avx512 | Expand packed double-word integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VPEXPANDD zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F38.W0 89 /r | avx512 | Expand packed double-word integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VPEXPANDQ xmm1 {k1}{z}, xmm2/m128 | EVEX.128.66.0F38.W1 89 /r | avx512 | Expand packed quad-word integer values from xmm2/m128 to xmm1 using writemask k1. | 
| VPEXPANDQ ymm1 {k1}{z}, ymm2/m256 | EVEX.256.66.0F38.W1 89 /r | avx512 | Expand packed quad-word integer values from ymm2/m256 to ymm1 using writemask k1. | 
| VPEXPANDQ zmm1 {k1}{z}, zmm2/m512 | EVEX.512.66.0F38.W1 89 /r | avx512 | Expand packed quad-word integer values from zmm2/m512 to zmm1 using writemask k1. | 
| VPGATHERDD xmm1 {k1}, vm32x | EVEX.128.66.0F38.W0 90 /vsib | avx512 | Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDD ymm1 {k1}, vm32y | EVEX.256.66.0F38.W0 90 /vsib | avx512 | Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDD zmm1 {k1}, vm32z | EVEX.512.66.0F38.W0 90 /vsib | avx512 | Using signed dword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDQ xmm1 {k1}, vm32x | EVEX.128.66.0F38.W1 90 /vsib | avx512 | Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDQ ymm1 {k1}, vm32x | EVEX.256.66.0F38.W1 90 /vsib | avx512 | Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDQ zmm1 {k1}, vm32y | EVEX.512.66.0F38.W1 90 /vsib | avx512 | Using signed dword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPGATHERDD xmm1, vm32x, xmm2 | VEX.DDS.128.66.0F38.W0 90 /r | avx2 | Using dword indices specified in vm32x, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VPGATHERQD xmm1, vm64x, xmm2 | VEX.DDS.128.66.0F38.W0 91 /r | avx2 | Using qword indices specified in vm64x, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VPGATHERDD ymm1, vm32y, ymm2 | VEX.DDS.256.66.0F38.W0 90 /r | avx2 | Using dword indices specified in vm32y, gather dword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VPGATHERQD xmm1, vm64y, xmm2 | VEX.DDS.256.66.0F38.W0 91 /r | avx2 | Using qword indices specified in vm64y, gather dword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VPGATHERDQ xmm1, vm32x, xmm2 | VEX.DDS.128.66.0F38.W1 90 /r | avx2 | Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VPGATHERQQ xmm1, vm64x, xmm2 | VEX.DDS.128.66.0F38.W1 91 /r | avx2 | Using qword indices specified in vm64x, gather qword values from memory conditioned on mask specified by xmm2. Conditionally gathered elements are merged into xmm1. | 
| VPGATHERDQ ymm1, vm32x, ymm2 | VEX.DDS.256.66.0F38.W1 90 /r | avx2 | Using dword indices specified in vm32x, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VPGATHERQQ ymm1, vm64y, ymm2 | VEX.DDS.256.66.0F38.W1 91 /r | avx2 | Using qword indices specified in vm64y, gather qword values from memory conditioned on mask specified by ymm2. Conditionally gathered elements are merged into ymm1. | 
| VPGATHERQD xmm1 {k1}, vm64x | EVEX.128.66.0F38.W0 91 /vsib | avx512 | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERQD xmm1 {k1}, vm64y | EVEX.256.66.0F38.W0 91 /vsib | avx512 | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERQD ymm1 {k1}, vm64z | EVEX.512.66.0F38.W0 91 /vsib | avx512 | Using signed qword indices, gather dword values from memory using writemask k1 for merging-masking. | 
| VPGATHERQQ xmm1 {k1}, vm64x | EVEX.128.66.0F38.W1 91 /vsib | avx512 | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPGATHERQQ ymm1 {k1}, vm64y | EVEX.256.66.0F38.W1 91 /vsib | avx512 | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPGATHERQQ zmm1 {k1}, vm64z | EVEX.512.66.0F38.W1 91 /vsib | avx512 | Using signed qword indices, gather quadword values from memory using writemask k1 for merging-masking. | 
| VPLZCNTD xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F38.W0 44 /r | avx512 | Count the number of leading zero bits in each dword element of xmm2/m128/m32bcst using writemask k1. | 
| VPLZCNTD ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F38.W0 44 /r | avx512 | Count the number of leading zero bits in each dword element of ymm2/m256/m32bcst using writemask k1. | 
| VPLZCNTD zmm1 {k1}{z}, zmm2/m512/m32bcst | EVEX.512.66.0F38.W0 44 /r | avx512 | Count the number of leading zero bits in each dword element of zmm2/m512/m32bcst using writemask k1. | 
| VPLZCNTQ xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F38.W1 44 /r | avx512 | Count the number of leading zero bits in each qword element of xmm2/m128/m64bcst using writemask k1. | 
| VPLZCNTQ ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F38.W1 44 /r | avx512 | Count the number of leading zero bits in each qword element of ymm2/m256/m64bcst using writemask k1. | 
| VPLZCNTQ zmm1 {k1}{z}, zmm2/m512/m64bcst | EVEX.512.66.0F38.W1 44 /r | avx512 | Count the number of leading zero bits in each qword element of zmm2/m512/m64bcst using writemask k1. | 
| VPMASKMOVD xmm1, xmm2, m128 | VEX.NDS.128.66.0F38.W0 8C /r | avx2 | Conditionally load dword values from m128 using mask in xmm2 and store in xmm1. | 
| VPMASKMOVD ymm1, ymm2, m256 | VEX.NDS.256.66.0F38.W0 8C /r | avx2 | Conditionally load dword values from m256 using mask in ymm2 and store in ymm1. | 
| VPMASKMOVQ xmm1, xmm2, m128 | VEX.NDS.128.66.0F38.W1 8C /r | avx2 | Conditionally load qword values from m128 using mask in xmm2 and store in xmm1. | 
| VPMASKMOVQ ymm1, ymm2, m256 | VEX.NDS.256.66.0F38.W1 8C /r | avx2 | Conditionally load qword values from m256 using mask in ymm2 and store in ymm1. | 
| VPMASKMOVD m128, xmm1, xmm2 | VEX.NDS.128.66.0F38.W0 8E /r | avx2 | Conditionally store dword values from xmm2 using mask in xmm1. | 
| VPMASKMOVD m256, ymm1, ymm2 | VEX.NDS.256.66.0F38.W0 8E /r | avx2 | Conditionally store dword values from ymm2 using mask in ymm1. | 
| VPMASKMOVQ m128, xmm1, xmm2 | VEX.NDS.128.66.0F38.W1 8E /r | avx2 | Conditionally store qword values from xmm2 using mask in xmm1. | 
| VPMASKMOVQ m256, ymm1, ymm2 | VEX.NDS.256.66.0F38.W1 8E /r | avx2 | Conditionally store qword values from ymm2 using mask in ymm1. | 
| VPMOVB2M k1, xmm1 | EVEX.128.F3.0F38.W0 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in XMM1. | 
| VPMOVB2M k1, ymm1 | EVEX.256.F3.0F38.W0 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in YMM1. | 
| VPMOVB2M k1, zmm1 | EVEX.512.F3.0F38.W0 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding byte in ZMM1. | 
| VPMOVW2M k1, xmm1 | EVEX.128.F3.0F38.W1 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in XMM1. | 
| VPMOVW2M k1, ymm1 | EVEX.256.F3.0F38.W1 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in YMM1. | 
| VPMOVW2M k1, zmm1 | EVEX.512.F3.0F38.W1 29 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding word in ZMM1. | 
| VPMOVD2M k1, xmm1 | EVEX.128.F3.0F38.W0 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in XMM1. | 
| VPMOVD2M k1, ymm1 | EVEX.256.F3.0F38.W0 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in YMM1. | 
| VPMOVD2M k1, zmm1 | EVEX.512.F3.0F38.W0 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding doubleword in ZMM1. | 
| VPMOVQ2M k1, xmm1 | EVEX.128.F3.0F38.W1 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in XMM1. | 
| VPMOVQ2M k1, ymm1 | EVEX.256.F3.0F38.W1 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in YMM1. | 
| VPMOVQ2M k1, zmm1 | EVEX.512.F3.0F38.W1 39 /r | avx512 | Sets each bit in k1 to 1 or 0 based on the value of the most significant bit of the corresponding quadword in ZMM1. | 
| VPMOVDB xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 31 /r | avx512 | Converts 4 packed double-word integers from xmm2 into 4 packed byte integers in xmm1/m32 with truncation under writemask k1. | 
| VPMOVSDB xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 21 /r | avx512 | Converts 4 packed signed double-word integers from xmm2 into 4 packed signed byte integers in xmm1/m32 using signed saturation under writemask k1. | 
| VPMOVUSDB xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 11 /r | avx512 | Converts 4 packed unsigned double-word integers from xmm2 into 4 packed unsigned byte integers in xmm1/m32 using unsigned saturation under writemask k1. | 
| VPMOVDB xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 31 /r | avx512 | Converts 8 packed double-word integers from ymm2 into 8 packed byte integers in xmm1/m64 with truncation under writemask k1. | 
| VPMOVSDB xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 21 /r | avx512 | Converts 8 packed signed double-word integers from ymm2 into 8 packed signed byte integers in xmm1/m64 using signed saturation under writemask k1. | 
| VPMOVUSDB xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 11 /r | avx512 | Converts 8 packed unsigned double-word integers from ymm2 into 8 packed unsigned byte integers in xmm1/m64 using unsigned saturation under writemask k1. | 
| VPMOVDB xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 31 /r | avx512 | Converts 16 packed double-word integers from zmm2 into 16 packed byte integers in xmm1/m128 with truncation under writemask k1. | 
| VPMOVSDB xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 21 /r | avx512 | Converts 16 packed signed double-word integers from zmm2 into 16 packed signed byte integers in xmm1/m128 using signed saturation under writemask k1. | 
| VPMOVUSDB xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 11 /r | avx512 | Converts 16 packed unsigned double-word integers from zmm2 into 16 packed unsigned byte integers in xmm1/m128 using unsigned saturation under writemask k1. | 
| VPMOVDW xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 33 /r | avx512 | Converts 4 packed double-word integers from xmm2 into 4 packed word integers in xmm1/m64 with truncation under writemask k1. | 
| VPMOVSDW xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 23 /r | avx512 | Converts 4 packed signed double-word integers from xmm2 into 4 packed signed word integers in xmm1/m64 using signed saturation under writemask k1. | 
| VPMOVUSDW xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 13 /r | avx512 | Converts 4 packed unsigned double-word integers from xmm2 into 4 packed unsigned word integers in xmm1/m64 using unsigned saturation under writemask k1. | 
| VPMOVDW xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 33 /r | avx512 | Converts 8 packed double-word integers from ymm2 into 8 packed word integers in xmm1/m128 with truncation under writemask k1. | 
| VPMOVSDW xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 23 /r | avx512 | Converts 8 packed signed double-word integers from ymm2 into 8 packed signed word integers in xmm1/m128 using signed saturation under writemask k1. | 
| VPMOVUSDW xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 13 /r | avx512 | Converts 8 packed unsigned double-word integers from ymm2 into 8 packed unsigned word integers in xmm1/m128 using unsigned saturation under writemask k1. | 
| VPMOVDW ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 33 /r | avx512 | Converts 16 packed double-word integers from zmm2 into 16 packed word integers in ymm1/m256 with truncation under writemask k1. | 
| VPMOVSDW ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 23 /r | avx512 | Converts 16 packed signed double-word integers from zmm2 into 16 packed signed word integers in ymm1/m256 using signed saturation under writemask k1. | 
| VPMOVUSDW ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 13 /r | avx512 | Converts 16 packed unsigned double-word integers from zmm2 into 16 packed unsigned word integers in ymm1/m256 using unsigned saturation under writemask k1. | 
| VPMOVM2B xmm1, k1 | EVEX.128.F3.0F38.W0 28 /r | avx512 | Sets each byte in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2B ymm1, k1 | EVEX.256.F3.0F38.W0 28 /r | avx512 | Sets each byte in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2B zmm1, k1 | EVEX.512.F3.0F38.W0 28 /r | avx512 | Sets each byte in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2W xmm1, k1 | EVEX.128.F3.0F38.W1 28 /r | avx512 | Sets each word in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2W ymm1, k1 | EVEX.256.F3.0F38.W1 28 /r | avx512 | Sets each word in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2W zmm1, k1 | EVEX.512.F3.0F38.W1 28 /r | avx512 | Sets each word in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2D xmm1, k1 | EVEX.128.F3.0F38.W0 38 /r | avx512 | Sets each doubleword in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2D ymm1, k1 | EVEX.256.F3.0F38.W0 38 /r | avx512 | Sets each doubleword in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2D zmm1, k1 | EVEX.512.F3.0F38.W0 38 /r | avx512 | Sets each doubleword in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2Q xmm1, k1 | EVEX.128.F3.0F38.W1 38 /r | avx512 | Sets each quadword in XMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2Q ymm1, k1 | EVEX.256.F3.0F38.W1 38 /r | avx512 | Sets each quadword in YMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVM2Q zmm1, k1 | EVEX.512.F3.0F38.W1 38 /r | avx512 | Sets each quadword in ZMM1 to all 1’s or all 0’s based on the value of the corresponding bit in k1. | 
| VPMOVQB xmm1/m16 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 32 /r | avx512 | Converts 2 packed quad-word integers from xmm2 into 2 packed byte integers in xmm1/m16 with truncation under writemask k1. | 
| VPMOVSQB xmm1/m16 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 22 /r | avx512 | Converts 2 packed signed quad-word integers from xmm2 into 2 packed signed byte integers in xmm1/m16 using signed saturation under writemask k1. | 
| VPMOVUSQB xmm1/m16 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 12 /r | avx512 | Converts 2 packed unsigned quad-word integers from xmm2 into 2 packed unsigned byte integers in xmm1/m16 using unsigned saturation under writemask k1. | 
| VPMOVQB xmm1/m32 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 32 /r | avx512 | Converts 4 packed quad-word integers from ymm2 into 4 packed byte integers in xmm1/m32 with truncation under writemask k1. | 
| VPMOVSQB xmm1/m32 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 22 /r | avx512 | Converts 4 packed signed quad-word integers from ymm2 into 4 packed signed byte integers in xmm1/m32 using signed saturation under writemask k1. | 
| VPMOVUSQB xmm1/m32 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 12 /r | avx512 | Converts 4 packed unsigned quad-word integers from ymm2 into 4 packed unsigned byte integers in xmm1/m32 using unsigned saturation under writemask k1. | 
| VPMOVQB xmm1/m64 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 32 /r | avx512 | Converts 8 packed quad-word integers from zmm2 into 8 packed byte integers in xmm1/m64 with truncation under writemask k1. | 
| VPMOVSQB xmm1/m64 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 22 /r | avx512 | Converts 8 packed signed quad-word integers from zmm2 into 8 packed signed byte integers in xmm1/m64 using signed saturation under writemask k1. | 
| VPMOVUSQB xmm1/m64 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 12 /r | avx512 | Converts 8 packed unsigned quad-word integers from zmm2 into 8 packed unsigned byte integers in xmm1/m64 using unsigned saturation under writemask k1. | 
| VPMOVQD xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 35 /r | avx512 | Converts 2 packed quad-word integers from xmm2 into 2 packed double-word integers in xmm1/m64 with truncation subject to writemask k1. | 
| VPMOVSQD xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 25 /r | avx512 | Converts 2 packed signed quad-word integers from xmm2 into 2 packed signed double-word integers in xmm1/m64 using signed saturation subject to writemask k1. | 
| VPMOVUSQD xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 15 /r | avx512 | Converts 2 packed unsigned quad-word integers from xmm2 into 2 packed unsigned double-word integers in xmm1/m64 using unsigned saturation subject to writemask k1. | 
| VPMOVQD xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 35 /r | avx512 | Converts 4 packed quad-word integers from ymm2 into 4 packed double-word integers in xmm1/m128 with truncation subject to writemask k1. | 
| VPMOVSQD xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 25 /r | avx512 | Converts 4 packed signed quad-word integers from ymm2 into 4 packed signed double-word integers in xmm1/m128 using signed saturation subject to writemask k1. | 
| VPMOVUSQD xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 15 /r | avx512 | Converts 4 packed unsigned quad-word integers from ymm2 into 4 packed unsigned double-word integers in xmm1/m128 using unsigned saturation subject to writemask k1. | 
| VPMOVQD ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 35 /r | avx512 | Converts 8 packed quad-word integers from zmm2 into 8 packed double-word integers in ymm1/m256 with truncation subject to writemask k1. | 
| VPMOVSQD ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 25 /r | avx512 | Converts 8 packed signed quad-word integers from zmm2 into 8 packed signed double-word integers in ymm1/m256 using signed saturation subject to writemask k1. | 
| VPMOVUSQD ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 15 /r | avx512 | Converts 8 packed unsigned quad-word integers from zmm2 into 8 packed unsigned double-word integers in ymm1/m256 using unsigned saturation subject to writemask k1. | 
| VPMOVQW xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 34 /r | avx512 | Converts 2 packed quad-word integers from xmm2 into 2 packed word integers in xmm1/m32 with truncation under writemask k1. | 
| VPMOVSQW xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 24 /r | avx512 | Converts 2 packed signed quad-word integers from xmm2 into 2 packed signed word integers in xmm1/m32 using signed saturation under writemask k1. | 
| VPMOVUSQW xmm1/m32 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 14 /r | avx512 | Converts 2 packed unsigned quad-word integers from xmm2 into 2 packed unsigned word integers in xmm1/m32 using unsigned saturation under writemask k1. | 
| VPMOVQW xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 34 /r | avx512 | Converts 4 packed quad-word integers from ymm2 into 4 packed word integers in xmm1/m64 with truncation under writemask k1. | 
| VPMOVSQW xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 24 /r | avx512 | Converts 4 packed signed quad-word integers from ymm2 into 4 packed signed word integers in xmm1/m64 using signed saturation under writemask k1. | 
| VPMOVUSQW xmm1/m64 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 14 /r | avx512 | Converts 4 packed unsigned quad-word integers from ymm2 into 4 packed unsigned word integers in xmm1/m64 using unsigned saturation under writemask k1. | 
| VPMOVQW xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 34 /r | avx512 | Converts 8 packed quad-word integers from zmm2 into 8 packed word integers in xmm1/m128 with truncation under writemask k1. | 
| VPMOVSQW xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 24 /r | avx512 | Converts 8 packed signed quad-word integers from zmm2 into 8 packed signed word integers in xmm1/m128 using signed saturation under writemask k1. | 
| VPMOVUSQW xmm1/m128 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 14 /r | avx512 | Converts 8 packed unsigned quad-word integers from zmm2 into 8 packed unsigned word integers in xmm1/m128 using unsigned saturation under writemask k1. | 
| VPMOVWB xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 30 /r | avx512 | Converts 8 packed word integers from xmm2 into 8 packed bytes in xmm1/m64 with truncation under writemask k1. | 
| VPMOVSWB xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 20 /r | avx512 | Converts 8 packed signed word integers from xmm2 into 8 packed signed bytes in xmm1/m64 using signed saturation under writemask k1. | 
| VPMOVUSWB xmm1/m64 {k1}{z}, xmm2 | EVEX.128.F3.0F38.W0 10 /r | avx512 | Converts 8 packed unsigned word integers from xmm2 into 8 packed unsigned bytes in xmm1/m64 using unsigned saturation under writemask k1. | 
| VPMOVWB xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 30 /r | avx512 | Converts 16 packed word integers from ymm2 into 16 packed bytes in xmm1/m128 with truncation under writemask k1. | 
| VPMOVSWB xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 20 /r | avx512 | Converts 16 packed signed word integers from ymm2 into 16 packed signed bytes in xmm1/m128 using signed saturation under writemask k1. | 
| VPMOVUSWB xmm1/m128 {k1}{z}, ymm2 | EVEX.256.F3.0F38.W0 10 /r | avx512 | Converts 16 packed unsigned word integers from ymm2 into 16 packed unsigned bytes in xmm1/m128 using unsigned saturation under writemask k1. | 
| VPMOVWB ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 30 /r | avx512 | Converts 32 packed word integers from zmm2 into 32 packed bytes in ymm1/m256 with truncation under writemask k1. | 
| VPMOVSWB ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 20 /r | avx512 | Converts 32 packed signed word integers from zmm2 into 32 packed signed bytes in ymm1/m256 using signed saturation under writemask k1. | 
| VPMOVUSWB ymm1/m256 {k1}{z}, zmm2 | EVEX.512.F3.0F38.W0 10 /r | avx512 | Converts 32 packed unsigned word integers from zmm2 into 32 packed unsigned bytes in ymm1/m256 using unsigned saturation under writemask k1. | 
| VPSCATTERDD vm32x {k1}, xmm1 | EVEX.128.66.0F38.W0 A0 /vsib | avx512 | Using signed dword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERDD vm32y {k1}, ymm1 | EVEX.256.66.0F38.W0 A0 /vsib | avx512 | Using signed dword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERDD vm32z {k1}, zmm1 | EVEX.512.66.0F38.W0 A0 /vsib | avx512 | Using signed dword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERDQ vm32x {k1}, xmm1 | EVEX.128.66.0F38.W1 A0 /vsib | avx512 | Using signed dword indices, scatter qword values to memory using writemask k1. | 
| VPSCATTERDQ vm32x {k1}, ymm1 | EVEX.256.66.0F38.W1 A0 /vsib | avx512 | Using signed dword indices, scatter qword values to memory using writemask k1. | 
| VPSCATTERDQ vm32y {k1}, zmm1 | EVEX.512.66.0F38.W1 A0 /vsib | avx512 | Using signed dword indices, scatter qword values to memory using writemask k1. | 
| VPSCATTERQD vm64x {k1}, xmm1 | EVEX.128.66.0F38.W0 A1 /vsib | avx512 | Using signed qword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERQD vm64y {k1}, xmm1 | EVEX.256.66.0F38.W0 A1 /vsib | avx512 | Using signed qword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERQD vm64z {k1}, ymm1 | EVEX.512.66.0F38.W0 A1 /vsib | avx512 | Using signed qword indices, scatter dword values to memory using writemask k1. | 
| VPSCATTERQQ vm64x {k1}, xmm1 | EVEX.128.66.0F38.W1 A1 /vsib | avx512 | Using signed qword indices, scatter qword values to memory using writemask k1. | 
| VPSCATTERQQ vm64y {k1}, ymm1 | EVEX.256.66.0F38.W1 A1 /vsib | avx512 | Using signed qword indices, scatter qword values to memory using writemask k1. | 
| VPSCATTERQQ vm64z {k1}, zmm1 | EVEX.512.66.0F38.W1 A1 /vsib | avx512 | Using signed qword indices, scatter qword values to memory using writemask k1. | 
| VPSLLVD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 47 /r | avx2 | Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s. | 
| VPSLLVQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 47 /r | avx2 | Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s. | 
| VPSLLVD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 47 /r | avx2 | Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s. | 
| VPSLLVQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 47 /r | avx2 | Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s. | 
| VPSLLVW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 12 /r | avx512 | Shift words in xmm2 left by amount specified in the corresponding element of xmm3/m128 while shifting in 0s using writemask k1. | 
| VPSLLVW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 12 /r | avx512 | Shift words in ymm2 left by amount specified in the corresponding element of ymm3/m256 while shifting in 0s using writemask k1. | 
| VPSLLVW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 12 /r | avx512 | Shift words in zmm2 left by amount specified in the corresponding element of zmm3/m512 while shifting in 0s using writemask k1. | 
| VPSLLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 47 /r | avx512 | Shift doublewords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in 0s using writemask k1. | 
| VPSLLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 47 /r | avx512 | Shift doublewords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in 0s using writemask k1. | 
| VPSLLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 47 /r | avx512 | Shift doublewords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in 0s using writemask k1. | 
| VPSLLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 47 /r | avx512 | Shift quadwords in xmm2 left by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in 0s using writemask k1. | 
| VPSLLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 47 /r | avx512 | Shift quadwords in ymm2 left by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in 0s using writemask k1. | 
| VPSLLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 47 /r | avx512 | Shift quadwords in zmm2 left by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in 0s using writemask k1. | 
| VPSRAVD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 46 /r | avx2 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in sign bits. | 
| VPSRAVD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 46 /r | avx2 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in sign bits. | 
| VPSRAVW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 11 /r | avx512 | Shift words in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in sign bits using writemask k1. | 
| VPSRAVW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 11 /r | avx512 | Shift words in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in sign bits using writemask k1. | 
| VPSRAVW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 11 /r | avx512 | Shift words in zmm2 right by amount specified in the corresponding element of zmm3/m512 while shifting in sign bits using writemask k1. | 
| VPSRAVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 46 /r | avx512 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in sign bits using writemask k1. | 
| VPSRAVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 46 /r | avx512 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in sign bits using writemask k1. | 
| VPSRAVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 46 /r | avx512 | Shift doublewords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in sign bits using writemask k1. | 
| VPSRAVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 46 /r | avx512 | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in sign bits using writemask k1. | 
| VPSRAVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 46 /r | avx512 | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in sign bits using writemask k1. | 
| VPSRAVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 46 /r | avx512 | Shift quadwords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in sign bits using writemask k1. | 
| VPSRLVD xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W0 45 /r | avx2 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s. | 
| VPSRLVQ xmm1, xmm2, xmm3/m128 | VEX.NDS.128.66.0F38.W1 45 /r | avx2 | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s. | 
| VPSRLVD ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W0 45 /r | avx2 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s. | 
| VPSRLVQ ymm1, ymm2, ymm3/m256 | VEX.NDS.256.66.0F38.W1 45 /r | avx2 | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s. | 
| VPSRLVW xmm1 {k1}{z}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 10 /r | avx512 | Shift words in xmm2 right by amount specified in the corresponding element of xmm3/m128 while shifting in 0s using writemask k1. | 
| VPSRLVW ymm1 {k1}{z}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 10 /r | avx512 | Shift words in ymm2 right by amount specified in the corresponding element of ymm3/m256 while shifting in 0s using writemask k1. | 
| VPSRLVW zmm1 {k1}{z}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 10 /r | avx512 | Shift words in zmm2 right by amount specified in the corresponding element of zmm3/m512 while shifting in 0s using writemask k1. | 
| VPSRLVD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 45 /r | avx512 | Shift doublewords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m32bcst while shifting in 0s using writemask k1. | 
| VPSRLVD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 45 /r | avx512 | Shift doublewords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m32bcst while shifting in 0s using writemask k1. | 
| VPSRLVD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 45 /r | avx512 | Shift doublewords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m32bcst while shifting in 0s using writemask k1. | 
| VPSRLVQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 45 /r | avx512 | Shift quadwords in xmm2 right by amount specified in the corresponding element of xmm3/m128/m64bcst while shifting in 0s using writemask k1. | 
| VPSRLVQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 45 /r | avx512 | Shift quadwords in ymm2 right by amount specified in the corresponding element of ymm3/m256/m64bcst while shifting in 0s using writemask k1. | 
| VPSRLVQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 45 /r | avx512 | Shift quadwords in zmm2 right by amount specified in the corresponding element of zmm3/m512/m64bcst while shifting in 0s using writemask k1. | 
| VPTERNLOGD xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.DDS.128.66.0F3A.W0 25 /r ib | avx512 | Bitwise ternary logic taking xmm1, xmm2 and xmm3/m128/m32bcst as source operands and writing the result to xmm1 under writemask k1 with dword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTERNLOGD ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.DDS.256.66.0F3A.W0 25 /r ib | avx512 | Bitwise ternary logic taking ymm1, ymm2 and ymm3/m256/m32bcst as source operands and writing the result to ymm1 under writemask k1 with dword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTERNLOGD zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.DDS.512.66.0F3A.W0 25 /r ib | avx512 | Bitwise ternary logic taking zmm1, zmm2 and zmm3/m512/m32bcst as source operands and writing the result to zmm1 under writemask k1 with dword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTERNLOGQ xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.DDS.128.66.0F3A.W1 25 /r ib | avx512 | Bitwise ternary logic taking xmm1, xmm2 and xmm3/m128/m64bcst as source operands and writing the result to xmm1 under writemask k1 with qword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTERNLOGQ ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.DDS.256.66.0F3A.W1 25 /r ib | avx512 | Bitwise ternary logic taking ymm1, ymm2 and ymm3/m256/m64bcst as source operands and writing the result to ymm1 under writemask k1 with qword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTERNLOGQ zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.DDS.512.66.0F3A.W1 25 /r ib | avx512 | Bitwise ternary logic taking zmm1, zmm2 and zmm3/m512/m64bcst as source operands and writing the result to zmm1 under writemask k1 with qword granularity. The immediate value determines the specific three-input Boolean function being implemented. | 
| VPTESTMB k2 {k1}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W0 26 /r | avx512 | Bitwise AND of packed byte integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMB k2 {k1}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W0 26 /r | avx512 | Bitwise AND of packed byte integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMB k2 {k1}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W0 26 /r | avx512 | Bitwise AND of packed byte integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMW k2 {k1}, xmm2, xmm3/m128 | EVEX.NDS.128.66.0F38.W1 26 /r | avx512 | Bitwise AND of packed word integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMW k2 {k1}, ymm2, ymm3/m256 | EVEX.NDS.256.66.0F38.W1 26 /r | avx512 | Bitwise AND of packed word integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMW k2 {k1}, zmm2, zmm3/m512 | EVEX.NDS.512.66.0F38.W1 26 /r | avx512 | Bitwise AND of packed word integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMD k2 {k1}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 27 /r | avx512 | Bitwise AND of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMD k2 {k1}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 27 /r | avx512 | Bitwise AND of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMD k2 {k1}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.66.0F38.W0 27 /r | avx512 | Bitwise AND of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMQ k2 {k1}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 27 /r | avx512 | Bitwise AND of packed quadword integers in xmm2 and xmm3/m128/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMQ k2 {k1}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 27 /r | avx512 | Bitwise AND of packed quadword integers in ymm2 and ymm3/m256/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTMQ k2 {k1}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.66.0F38.W1 27 /r | avx512 | Bitwise AND of packed quadword integers in zmm2 and zmm3/m512/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMB k2 {k1}, xmm2, xmm3/m128 | EVEX.NDS.128.F3.0F38.W0 26 /r | avx512 | Bitwise NAND of packed byte integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMB k2 {k1}, ymm2, ymm3/m256 | EVEX.NDS.256.F3.0F38.W0 26 /r | avx512 | Bitwise NAND of packed byte integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMB k2 {k1}, zmm2, zmm3/m512 | EVEX.NDS.512.F3.0F38.W0 26 /r | avx512 | Bitwise NAND of packed byte integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMW k2 {k1}, xmm2, xmm3/m128 | EVEX.NDS.128.F3.0F38.W1 26 /r | avx512 | Bitwise NAND of packed word integers in xmm2 and xmm3/m128 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMW k2 {k1}, ymm2, ymm3/m256 | EVEX.NDS.256.F3.0F38.W1 26 /r | avx512 | Bitwise NAND of packed word integers in ymm2 and ymm3/m256 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMW k2 {k1}, zmm2, zmm3/m512 | EVEX.NDS.512.F3.0F38.W1 26 /r | avx512 | Bitwise NAND of packed word integers in zmm2 and zmm3/m512 and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMD k2 {k1}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.F3.0F38.W0 27 /r | avx512 | Bitwise NAND of packed doubleword integers in xmm2 and xmm3/m128/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMD k2 {k1}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.F3.0F38.W0 27 /r | avx512 | Bitwise NAND of packed doubleword integers in ymm2 and ymm3/m256/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMD k2 {k1}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.F3.0F38.W0 27 /r | avx512 | Bitwise NAND of packed doubleword integers in zmm2 and zmm3/m512/m32bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMQ k2 {k1}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.F3.0F38.W1 27 /r | avx512 | Bitwise NAND of packed quadword integers in xmm2 and xmm3/m128/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMQ k2 {k1}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.F3.0F38.W1 27 /r | avx512 | Bitwise NAND of packed quadword integers in ymm2 and ymm3/m256/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VPTESTNMQ k2 {k1}, zmm2, zmm3/m512/m64bcst | EVEX.NDS.512.F3.0F38.W1 27 /r | avx512 | Bitwise NAND of packed quadword integers in zmm2 and zmm3/m512/m64bcst and set mask k2 to reflect the zero/non-zero status of each element of the result, under writemask k1. | 
| VRANGEPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst, imm8 | EVEX.NDS.128.66.0F3A.W1 50 /r ib | avx512 | Calculate two RANGE operation output values from 2 pairs of double-precision floating-point values in xmm2 and xmm3/m128/m64bcst, store the results to xmm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGEPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 50 /r ib | avx512 | Calculate four RANGE operation output values from 4 pairs of double-precision floating-point values in ymm2 and ymm3/m256/m64bcst, store the results to ymm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGEPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}, imm8 | EVEX.NDS.512.66.0F3A.W1 50 /r ib | avx512 | Calculate eight RANGE operation output values from 8 pairs of double-precision floating-point values in zmm2 and zmm3/m512/m64bcst, store the results to zmm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGEPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst, imm8 | EVEX.NDS.128.66.0F3A.W0 50 /r ib | avx512 | Calculate four RANGE operation output values from 4 pairs of single-precision floating-point values in xmm2 and xmm3/m128/m32bcst, store the results to xmm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGEPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 50 /r ib | avx512 | Calculate eight RANGE operation output values from 8 pairs of single-precision floating-point values in ymm2 and ymm3/m256/m32bcst, store the results to ymm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGEPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{sae}, imm8 | EVEX.NDS.512.66.0F3A.W0 50 /r ib | avx512 | Calculate 16 RANGE operation output values from 16 pairs of single-precision floating-point values in zmm2 and zmm3/m512/m32bcst, store the results to zmm1 under the writemask k1. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGESD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W1 51 /r ib | avx512 | Calculate a RANGE operation output value from 2 double-precision floating-point values in xmm2 and xmm3/m64, store the output to xmm1 under writemask. Imm8 specifies the comparison and sign of the range operation. | 
| VRANGESS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W0 51 /r ib | avx512 | Calculate a RANGE operation output value from 2 single-precision floating-point values in xmm2 and xmm3/m32, store the output to xmm1 under writemask. Imm8 specifies the comparison and sign of the range operation. | 
| VRCP14PD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F38.W1 4C /r | avx512 | Computes the approximate reciprocals of the packed double-precision floating-point values in xmm2/m128/m64bcst and stores the results in xmm1. Under writemask. | 
| VRCP14PD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F38.W1 4C /r | avx512 | Computes the approximate reciprocals of the packed double-precision floating-point values in ymm2/m256/m64bcst and stores the results in ymm1. Under writemask. | 
| VRCP14PD zmm1 {k1}{z}, zmm2/m512/m64bcst | EVEX.512.66.0F38.W1 4C /r | avx512 | Computes the approximate reciprocals of the packed double-precision floating-point values in zmm2/m512/m64bcst and stores the results in zmm1. Under writemask. | 
| VRCP14PS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F38.W0 4C /r | avx512 | Computes the approximate reciprocals of the packed single-precision floating-point values in xmm2/m128/m32bcst and stores the results in xmm1. Under writemask. | 
| VRCP14PS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F38.W0 4C /r | avx512 | Computes the approximate reciprocals of the packed single-precision floating-point values in ymm2/m256/m32bcst and stores the results in ymm1. Under writemask. | 
| VRCP14PS zmm1 {k1}{z}, zmm2/m512/m32bcst | EVEX.512.66.0F38.W0 4C /r | avx512 | Computes the approximate reciprocals of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the results in zmm1. Under writemask. | 
| VRCP14SD xmm1 {k1}{z}, xmm2, xmm3/m64 | EVEX.NDS.LIG.66.0F38.W1 4D /r | avx512 | Computes the approximate reciprocal of the scalar double-precision floating-point value in xmm3/m64 and stores the result in xmm1 using writemask k1. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| VRCP14SS xmm1 {k1}{z}, xmm2, xmm3/m32 | EVEX.NDS.LIG.66.0F38.W0 4D /r | avx512 | Computes the approximate reciprocal of the scalar single-precision floating-point value in xmm3/m32 and stores the result in xmm1 using writemask k1. Also, upper 3 single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| VRCP28PD zmm1 {k1}{z}, zmm2/m512/m64bcst {sae} | EVEX.512.66.0F38.W1 CA /r | avx512 | Computes the approximate reciprocals ( < 2^-28 relative error) of the packed double-precision floating-point values in zmm2/m512/m64bcst and stores the results in zmm1. Under writemask. | 
| VRCP28PS zmm1 {k1}{z}, zmm2/m512/m32bcst {sae} | EVEX.512.66.0F38.W0 CA /r | avx512 | Computes the approximate reciprocals ( < 2^-28 relative error) of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the results in zmm1. Under writemask. | 
| VRCP28SD xmm1 {k1}{z}, xmm2, xmm3/m64 {sae} | EVEX.NDS.LIG.66.0F38.W1 CB /r | avx512 | Computes the approximate reciprocal ( < 2^-28 relative error) of the scalar double-precision floating-point value in xmm3/m64 and stores the results in xmm1. Under writemask. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| VRCP28SS xmm1 {k1}{z}, xmm2, xmm3/m32 {sae} | EVEX.NDS.LIG.66.0F38.W0 CB /r | avx512 | Computes the approximate reciprocal ( < 2^-28 relative error) of the scalar single-precision floating-point value in xmm3/m32 and stores the result in xmm1. Under writemask. Also, upper 3 single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| VREDUCEPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.128.66.0F3A.W1 56 /r ib | avx512 | Perform reduction transformation on packed double-precision floating point values in xmm2/m128/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register under writemask k1. | 
| VREDUCEPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 56 /r ib | avx512 | Perform reduction transformation on packed double-precision floating point values in ymm2/m256/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register under writemask k1. | 
| VREDUCEPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8 | EVEX.512.66.0F3A.W1 56 /r ib | avx512 | Perform reduction transformation on packed double-precision floating point values in zmm2/m512/m64bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register under writemask k1. | 
| VREDUCEPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F3A.W0 56 /r ib | avx512 | Perform reduction transformation on packed single-precision floating point values in xmm2/m128/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register under writemask k1. | 
| VREDUCEPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F3A.W0 56 /r ib | avx512 | Perform reduction transformation on packed single-precision floating point values in ymm2/m256/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register under writemask k1. | 
| VREDUCEPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | EVEX.512.66.0F3A.W0 56 /r ib | avx512 | Perform reduction transformation on packed single-precision floating point values in zmm2/m512/m32bcst by subtracting a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register under writemask k1. | 
| VREDUCESD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W1 57 /r ib | avx512 | Perform a reduction transformation on a scalar double-precision floating point value in xmm3/m64 by subtracting a number of fraction bits specified by the imm8 field. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. Stores the result in xmm1 register. | 
| VREDUCESS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W0 57 /r ib | avx512 | Perform a reduction transformation on a scalar single-precision floating point value in xmm3/m32 by subtracting a number of fraction bits specified by the imm8 field. Also, upper 3 single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. Stores the result in xmm1 register. | 
| VRNDSCALEPD xmm1 {k1}{z}, xmm2/m128/m64bcst, imm8 | EVEX.128.66.0F3A.W1 09 /r ib | avx512 | Rounds packed double-precision floating point values in xmm2/m128/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register. Under writemask. | 
| VRNDSCALEPD ymm1 {k1}{z}, ymm2/m256/m64bcst, imm8 | EVEX.256.66.0F3A.W1 09 /r ib | avx512 | Rounds packed double-precision floating point values in ymm2/m256/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register. Under writemask. | 
| VRNDSCALEPD zmm1 {k1}{z}, zmm2/m512/m64bcst{sae}, imm8 | EVEX.512.66.0F3A.W1 09 /r ib | avx512 | Rounds packed double-precision floating-point values in zmm2/m512/m64bcst to a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register using writemask k1. | 
| VRNDSCALEPS xmm1 {k1}{z}, xmm2/m128/m32bcst, imm8 | EVEX.128.66.0F3A.W0 08 /r ib | avx512 | Rounds packed single-precision floating point values in xmm2/m128/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register. Under writemask. | 
| VRNDSCALEPS ymm1 {k1}{z}, ymm2/m256/m32bcst, imm8 | EVEX.256.66.0F3A.W0 08 /r ib | avx512 | Rounds packed single-precision floating point values in ymm2/m256/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in ymm1 register. Under writemask. | 
| VRNDSCALEPS zmm1 {k1}{z}, zmm2/m512/m32bcst{sae}, imm8 | EVEX.512.66.0F3A.W0 08 /r ib | avx512 | Rounds packed single-precision floating-point values in zmm2/m512/m32bcst to a number of fraction bits specified by the imm8 field. Stores the result in zmm1 register using writemask. | 
| VRNDSCALESD xmm1 {k1}{z}, xmm2, xmm3/m64{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W1 0B /r ib | avx512 | Rounds scalar double-precision floating-point value in xmm3/m64 to a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register. | 
| VRNDSCALESS xmm1 {k1}{z}, xmm2, xmm3/m32{sae}, imm8 | EVEX.NDS.LIG.66.0F3A.W0 0A /r ib | avx512 | Rounds scalar single-precision floating-point value in xmm3/m32 to a number of fraction bits specified by the imm8 field. Stores the result in xmm1 register under writemask. | 
| VRSQRT14PD xmm1 {k1}{z}, xmm2/m128/m64bcst | EVEX.128.66.0F38.W1 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in xmm2/m128/m64bcst and stores the results in xmm1. Under writemask. | 
| VRSQRT14PD ymm1 {k1}{z}, ymm2/m256/m64bcst | EVEX.256.66.0F38.W1 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in ymm2/m256/m64bcst and stores the results in ymm1. Under writemask. | 
| VRSQRT14PD zmm1 {k1}{z}, zmm2/m512/m64bcst | EVEX.512.66.0F38.W1 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed double-precision floating-point values in zmm2/m512/m64bcst and stores the results in zmm1 under writemask. | 
| VRSQRT14PS xmm1 {k1}{z}, xmm2/m128/m32bcst | EVEX.128.66.0F38.W0 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in xmm2/m128/m32bcst and stores the results in xmm1. Under writemask. | 
| VRSQRT14PS ymm1 {k1}{z}, ymm2/m256/m32bcst | EVEX.256.66.0F38.W0 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in ymm2/m256/m32bcst and stores the results in ymm1. Under writemask. | 
| VRSQRT14PS zmm1 {k1}{z}, zmm2/m512/m32bcst | EVEX.512.66.0F38.W0 4E /r | avx512 | Computes the approximate reciprocal square roots of the packed single-precision floating-point values in zmm2/m512/m32bcst and stores the results in zmm1. Under writemask. | 
| VRSQRT14SD xmm1 {k1}{z}, xmm2, xmm3/m64 | EVEX.NDS.LIG.66.0F38.W1 4F /r | avx512 | Computes the approximate reciprocal square root of the scalar double-precision floating-point value in xmm3/m64 and stores the result in the low quadword element of xmm1 using writemask k1. Bits[127:64] of xmm2 are copied to xmm1[127:64]. | 
| VRSQRT14SS xmm1 {k1}{z}, xmm2, xmm3/m32 | EVEX.NDS.LIG.66.0F38.W0 4F /r | avx512 | Computes the approximate reciprocal square root of the scalar single-precision floating-point value in xmm3/m32 and stores the result in the low doubleword element of xmm1 using writemask k1. Bits[127:32] of xmm2 are copied to xmm1[127:32]. | 
| VRSQRT28PD zmm1 {k1}{z}, zmm2/m512/m64bcst {sae} | EVEX.512.66.0F38.W1 CC /r | avx512 | Computes approximations to the reciprocal square root (<2^-28 relative error) of the packed double-precision floating-point values from zmm2/m512/m64bcst and stores the result in zmm1 with writemask k1. | 
| VRSQRT28PS zmm1 {k1}{z}, zmm2/m512/m32bcst {sae} | EVEX.512.66.0F38.W0 CC /r | avx512 | Computes approximations to the reciprocal square root (<2^-28 relative error) of the packed single-precision floating-point values from zmm2/m512/m32bcst and stores the result in zmm1 with writemask k1. | 
| VRSQRT28SD xmm1 {k1}{z}, xmm2, xmm3/m64 {sae} | EVEX.NDS.LIG.66.0F38.W1 CD /r | avx512 | Computes the approximate reciprocal square root (<2^-28 relative error) of the scalar double-precision floating-point value from xmm3/m64 and stores the result in xmm1 with writemask k1. Also, upper double-precision floating-point value (bits[127:64]) from xmm2 is copied to xmm1[127:64]. | 
| VRSQRT28SS xmm1 {k1}{z}, xmm2, xmm3/m32 {sae} | EVEX.NDS.LIG.66.0F38.W0 CD /r | avx512 | Computes the approximate reciprocal square root (<2^-28 relative error) of the scalar single-precision floating-point value from xmm3/m32 and stores the result in xmm1 with writemask k1. Also, upper 3 single-precision floating-point values (bits[127:32]) from xmm2 are copied to xmm1[127:32]. | 
| VSCALEFPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.NDS.128.66.0F38.W1 2C /r | avx512 | Scale the packed double-precision floating-point values in xmm2 using values from xmm3/m128/m64bcst. Under writemask k1. | 
| VSCALEFPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.NDS.256.66.0F38.W1 2C /r | avx512 | Scale the packed double-precision floating-point values in ymm2 using values from ymm3/m256/m64bcst. Under writemask k1. | 
| VSCALEFPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | EVEX.NDS.512.66.0F38.W1 2C /r | avx512 | Scale the packed double-precision floating-point values in zmm2 using values from zmm3/m512/m64bcst. Under writemask k1. | 
| VSCALEFPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.66.0F38.W0 2C /r | avx512 | Scale the packed single-precision floating-point values in xmm2 using values from xmm3/m128/m32bcst. Under writemask k1. | 
| VSCALEFPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.66.0F38.W0 2C /r | avx512 | Scale the packed single-precision floating-point values in ymm2 using values from ymm3/m256/m32bcst. Under writemask k1. | 
| VSCALEFPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | EVEX.NDS.512.66.0F38.W0 2C /r | avx512 | Scale the packed single-precision floating-point values in zmm2 using floating-point values from zmm3/m512/m32bcst. Under writemask k1. | 
| VSCALEFSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | EVEX.NDS.LIG.66.0F38.W1 2D /r | avx512 | Scale the scalar double-precision floating-point value in xmm2 using the value from xmm3/m64. Under writemask k1. | 
| VSCALEFSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | EVEX.NDS.LIG.66.0F38.W0 2D /r | avx512 | Scale the scalar single-precision floating-point value in xmm2 using floating-point value from xmm3/m32. Under writemask k1. | 
| VSCATTERDPS vm32x {k1}, xmm1 | EVEX.128.66.0F38.W0 A2 /vsib | avx512 | Using signed dword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERDPS vm32y {k1}, ymm1 | EVEX.256.66.0F38.W0 A2 /vsib | avx512 | Using signed dword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERDPS vm32z {k1}, zmm1 | EVEX.512.66.0F38.W0 A2 /vsib | avx512 | Using signed dword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERDPD vm32x {k1}, xmm1 | EVEX.128.66.0F38.W1 A2 /vsib | avx512 | Using signed dword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERDPD vm32x {k1}, ymm1 | EVEX.256.66.0F38.W1 A2 /vsib | avx512 | Using signed dword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERDPD vm32y {k1}, zmm1 | EVEX.512.66.0F38.W1 A2 /vsib | avx512 | Using signed dword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPS vm64x {k1}, xmm1 | EVEX.128.66.0F38.W0 A3 /vsib | avx512 | Using signed qword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPS vm64y {k1}, xmm1 | EVEX.256.66.0F38.W0 A3 /vsib | avx512 | Using signed qword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPS vm64z {k1}, ymm1 | EVEX.512.66.0F38.W0 A3 /vsib | avx512 | Using signed qword indices, scatter single-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPD vm64x {k1}, xmm1 | EVEX.128.66.0F38.W1 A3 /vsib | avx512 | Using signed qword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPD vm64y {k1}, ymm1 | EVEX.256.66.0F38.W1 A3 /vsib | avx512 | Using signed qword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERQPD vm64z {k1}, zmm1 | EVEX.512.66.0F38.W1 A3 /vsib | avx512 | Using signed qword indices, scatter double-precision floating-point values to memory using writemask k1. | 
| VSCATTERPF0DPS vm32z {k1} | EVEX.512.66.0F38.W0 C6 /5 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing single-precision data using writemask k1 and T0 hint with intent to write. | 
| VSCATTERPF0QPS vm64z {k1} | EVEX.512.66.0F38.W0 C7 /5 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing single-precision data using writemask k1 and T0 hint with intent to write. | 
| VSCATTERPF0DPD vm32y {k1} | EVEX.512.66.0F38.W1 C6 /5 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing double-precision data using writemask k1 and T0 hint with intent to write. | 
| VSCATTERPF0QPD vm64z {k1} | EVEX.512.66.0F38.W1 C7 /5 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing double-precision data using writemask k1 and T0 hint with intent to write. | 
| VSCATTERPF1DPS vm32z {k1} | EVEX.512.66.0F38.W0 C6 /6 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing single-precision data using writemask k1 and T1 hint with intent to write. | 
| VSCATTERPF1QPS vm64z {k1} | EVEX.512.66.0F38.W0 C7 /6 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing single-precision data using writemask k1 and T1 hint with intent to write. | 
| VSCATTERPF1DPD vm32y {k1} | EVEX.512.66.0F38.W1 C6 /6 /vsib | avx512 | Using signed dword indices, prefetch sparse byte memory locations containing double-precision data using writemask k1 and T1 hint with intent to write. | 
| VSCATTERPF1QPD vm64z {k1} | EVEX.512.66.0F38.W1 C7 /6 /vsib | avx512 | Using signed qword indices, prefetch sparse byte memory locations containing double-precision data using writemask k1 and T1 hint with intent to write. | 
| VSHUFF32X4 ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 23 /r ib | avx512 | Shuffle 128-bit packed single-precision floating-point values selected by imm8 from ymm2 and ymm3/m256/m32bcst and place results in ymm1 subject to writemask k1. | 
| VSHUFF32X4 zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.66.0F3A.W0 23 /r ib | avx512 | Shuffle 128-bit packed single-precision floating-point values selected by imm8 from zmm2 and zmm3/m512/m32bcst and place results in zmm1 subject to writemask k1. | 
| VSHUFF64X2 ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 23 /r ib | avx512 | Shuffle 128-bit packed double-precision floating-point values selected by imm8 from ymm2 and ymm3/m256/m64bcst and place results in ymm1 subject to writemask k1. | 
| VSHUFF64X2 zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F3A.W1 23 /r ib | avx512 | Shuffle 128-bit packed double-precision floating-point values selected by imm8 from zmm2 and zmm3/m512/m64bcst and place results in zmm1 subject to writemask k1. | 
| VSHUFI32X4 ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst, imm8 | EVEX.NDS.256.66.0F3A.W0 43 /r ib | avx512 | Shuffle 128-bit packed doubleword values selected by imm8 from ymm2 and ymm3/m256/m32bcst and place results in ymm1 subject to writemask k1. | 
| VSHUFI32X4 zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst, imm8 | EVEX.NDS.512.66.0F3A.W0 43 /r ib | avx512 | Shuffle 128-bit packed doubleword values selected by imm8 from zmm2 and zmm3/m512/m32bcst and place results in zmm1 subject to writemask k1. | 
| VSHUFI64X2 ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst, imm8 | EVEX.NDS.256.66.0F3A.W1 43 /r ib | avx512 | Shuffle 128-bit packed quadword values selected by imm8 from ymm2 and ymm3/m256/m64bcst and place results in ymm1 subject to writemask k1. | 
| VSHUFI64X2 zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst, imm8 | EVEX.NDS.512.66.0F3A.W1 43 /r ib | avx512 | Shuffle 128-bit packed quadword values selected by imm8 from zmm2 and zmm3/m512/m64bcst and place results in zmm1 subject to writemask k1. | 
| VTESTPS xmm1, xmm2/m128 | VEX.128.66.0F38.W0 0E /r | avx | Set ZF and CF depending on sign bit AND and ANDN of packed single-precision floating-point sources. | 
| VTESTPS ymm1, ymm2/m256 | VEX.256.66.0F38.W0 0E /r | avx | Set ZF and CF depending on sign bit AND and ANDN of packed single-precision floating-point sources. | 
| VTESTPD xmm1, xmm2/m128 | VEX.128.66.0F38.W0 0F /r | avx | Set ZF and CF depending on sign bit AND and ANDN of packed double-precision floating-point sources. | 
| VTESTPD ymm1, ymm2/m256 | VEX.256.66.0F38.W0 0F /r | avx | Set ZF and CF depending on sign bit AND and ANDN of packed double-precision floating-point sources. | 
| VZEROALL | VEX.256.0F.WIG 77 | avx | Zero all YMM registers. | 
| VZEROUPPER | VEX.128.0F.WIG 77 | avx | Zero upper 128 bits of all YMM registers. | 
| WAIT | 9B | Check pending unmasked floating-point exceptions. | |
| FWAIT | 9B | Check pending unmasked floating-point exceptions. | |
| WBINVD | 0F 09 | Write back and flush internal caches; initiate writing-back and flushing of external caches. | |
| WRFSBASE r32 | F3 0F AE /2 | fsgsbase | Load the FS base address with the 32-bit value in the source register. | 
| WRFSBASE r64 | F3 REX.W 0F AE /2 | fsgsbase | Load the FS base address with the 64-bit value in the source register. | 
| WRGSBASE r32 | F3 0F AE /3 | fsgsbase | Load the GS base address with the 32-bit value in the source register. | 
| WRGSBASE r64 | F3 REX.W 0F AE /3 | fsgsbase | Load the GS base address with the 64-bit value in the source register. | 
| WRMSR | 0F 30 | Write the value in EDX:EAX to MSR specified by ECX. | |
| WRPKRU | 0F 01 EF | ospke | Writes EAX into PKRU. | 
| XABORT imm8 | C6 F8 ib | rtm | Causes an RTM abort if in RTM execution. | 
| XACQUIRE | F2 | hle | A hint used with an “XACQUIRE-enabled” instruction to start lock elision on the instruction memory operand address. | 
| XRELEASE | F3 | hle | A hint used with an “XRELEASE-enabled” instruction to end lock elision on the instruction memory operand address. | 
| XADD r/m8, r8 | 0F C0 /r | Exchange r8 and r/m8; load sum into r/m8. | |
| XADD r/m8, r8 | REX + 0F C0 /r | Exchange r8 and r/m8; load sum into r/m8. | |
| XADD r/m16, r16 | 0F C1 /r | Exchange r16 and r/m16; load sum into r/m16. | |
| XADD r/m32, r32 | 0F C1 /r | Exchange r32 and r/m32; load sum into r/m32. | |
| XADD r/m64, r64 | REX.W + 0F C1 /r | Exchange r64 and r/m64; load sum into r/m64. | |
| XBEGIN rel16 | C7 F8 | rtm | Specifies the start of an RTM region. Provides a 16-bit relative offset to compute the fallback instruction address at which execution resumes following an RTM abort. | 
| XBEGIN rel32 | C7 F8 | rtm | Specifies the start of an RTM region. Provides a 32-bit relative offset to compute the fallback instruction address at which execution resumes following an RTM abort. | 
| XCHG AX, r16 | 90+rw | Exchange r16 with AX. | |
| XCHG r16, AX | 90+rw | Exchange AX with r16. | |
| XCHG EAX, r32 | 90+rd | Exchange r32 with EAX. | |
| XCHG RAX, r64 | REX.W + 90+rd | Exchange r64 with RAX. | |
| XCHG r32, EAX | 90+rd | Exchange EAX with r32. | |
| XCHG r64, RAX | REX.W + 90+rd | Exchange RAX with r64. | |
| XCHG r/m8, r8 | 86 /r | Exchange r8 (byte register) with byte from r/m8. | |
| XCHG r/m8, r8 | REX + 86 /r | Exchange r8 (byte register) with byte from r/m8. | |
| XCHG r8, r/m8 | 86 /r | Exchange byte from r/m8 with r8 (byte register). | |
| XCHG r8, r/m8 | REX + 86 /r | Exchange byte from r/m8 with r8 (byte register). | |
| XCHG r/m16, r16 | 87 /r | Exchange r16 with word from r/m16. | |
| XCHG r16, r/m16 | 87 /r | Exchange word from r/m16 with r16. | |
| XCHG r/m32, r32 | 87 /r | Exchange r32 with doubleword from r/m32. | |
| XCHG r/m64, r64 | REX.W + 87 /r | Exchange r64 with quadword from r/m64. | |
| XCHG r32, r/m32 | 87 /r | Exchange doubleword from r/m32 with r32. | |
| XCHG r64, r/m64 | REX.W + 87 /r | Exchange quadword from r/m64 with r64. | |
| XEND | 0F 01 D5 | rtm | Specifies the end of an RTM code region. | 
| XGETBV | 0F 01 D0 | Reads an XCR specified by ECX into EDX:EAX. | |
| XLAT m8 | D7 | Set AL to memory byte DS:[(E)BX + unsigned AL]. | |
| XLATB | D7 | Set AL to memory byte DS:[(E)BX + unsigned AL]. | |
| XLATB | REX.W + D7 | Set AL to memory byte [RBX + unsigned AL]. | |
| XOR AL, imm8 | 34 ib | AL XOR imm8. | |
| XOR AX, imm16 | 35 iw | AX XOR imm16. | |
| XOR EAX, imm32 | 35 id | EAX XOR imm32. | |
| XOR RAX, imm32 | REX.W + 35 id | RAX XOR imm32 (sign-extended). | |
| XOR r/m8, imm8 | 80 /6 ib | r/m8 XOR imm8. | |
| XOR r/m8, imm8 | REX + 80 /6 ib | r/m8 XOR imm8. | |
| XOR r/m16, imm16 | 81 /6 iw | r/m16 XOR imm16. | |
| XOR r/m32, imm32 | 81 /6 id | r/m32 XOR imm32. | |
| XOR r/m64, imm32 | REX.W + 81 /6 id | r/m64 XOR imm32 (sign-extended). | |
| XOR r/m16, imm8 | 83 /6 ib | r/m16 XOR imm8 (sign-extended). | |
| XOR r/m32, imm8 | 83 /6 ib | r/m32 XOR imm8 (sign-extended). | |
| XOR r/m64, imm8 | REX.W + 83 /6 ib | r/m64 XOR imm8 (sign-extended). | |
| XOR r/m8, r8 | 30 /r | r/m8 XOR r8. | |
| XOR r/m8, r8 | REX + 30 /r | r/m8 XOR r8. | |
| XOR r/m16, r16 | 31 /r | r/m16 XOR r16. | |
| XOR r/m32, r32 | 31 /r | r/m32 XOR r32. | |
| XOR r/m64, r64 | REX.W + 31 /r | r/m64 XOR r64. | |
| XOR r8, r/m8 | 32 /r | r8 XOR r/m8. | |
| XOR r8, r/m8 | REX + 32 /r | r8 XOR r/m8. | |
| XOR r16, r/m16 | 33 /r | r16 XOR r/m16. | |
| XOR r32, r/m32 | 33 /r | r32 XOR r/m32. | |
| XOR r64, r/m64 | REX.W + 33 /r | r64 XOR r/m64. | |
| XORPD xmm1, xmm2/m128 | 66 0F 57 /r | sse2 | Return the bitwise logical XOR of packed double-precision floating-point values in xmm1 and xmm2/mem. | 
| VXORPD xmm1, xmm2, xmm3/m128 | VEX.128.66.0F.WIG 57 /r | avx | Return the bitwise logical XOR of packed double-precision floating-point values in xmm2 and xmm3/mem. | 
| VXORPD ymm1, ymm2, ymm3/m256 | VEX.256.66.0F.WIG 57 /r | avx | Return the bitwise logical XOR of packed double-precision floating-point values in ymm2 and ymm3/mem. | 
| VXORPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | EVEX.128.66.0F.W1 57 /r | avx512 | Return the bitwise logical XOR of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1. | 
| VXORPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | EVEX.256.66.0F.W1 57 /r | avx512 | Return the bitwise logical XOR of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1. | 
| VXORPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | EVEX.512.66.0F.W1 57 /r | avx512 | Return the bitwise logical XOR of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1. | 
| XORPS xmm1, xmm2/m128 | 0F 57 /r | sse | Return the bitwise logical XOR of packed single-precision floating-point values in xmm1 and xmm2/mem. | 
| VXORPS xmm1, xmm2, xmm3/m128 | VEX.NDS.128.0F.WIG 57 /r | avx | Return the bitwise logical XOR of packed single-precision floating-point values in xmm2 and xmm3/mem. | 
| VXORPS ymm1, ymm2, ymm3/m256 | VEX.NDS.256.0F.WIG 57 /r | avx | Return the bitwise logical XOR of packed single-precision floating-point values in ymm2 and ymm3/mem. | 
| VXORPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | EVEX.NDS.128.0F.W0 57 /r | avx512 | Return the bitwise logical XOR of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1. | 
| VXORPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | EVEX.NDS.256.0F.W0 57 /r | avx512 | Return the bitwise logical XOR of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1. | 
| VXORPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | EVEX.NDS.512.0F.W0 57 /r | avx512 | Return the bitwise logical XOR of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1. | 
| XRSTOR mem | 0F AE /5 | Restore state components specified by EDX:EAX from mem. | |
| XRSTOR64 mem | REX.W + 0F AE /5 | Restore state components specified by EDX:EAX from mem. | |
| XRSTORS mem | 0F C7 /3 | Restore state components specified by EDX:EAX from mem. | |
| XRSTORS64 mem | REX.W + 0F C7 /3 | Restore state components specified by EDX:EAX from mem. | |
| XSAVE mem | 0F AE /4 | Save state components specified by EDX:EAX to mem. | |
| XSAVE64 mem | REX.W + 0F AE /4 | Save state components specified by EDX:EAX to mem. | |
| XSAVEC mem | 0F C7 /4 | Save state components specified by EDX:EAX to mem with compaction. | |
| XSAVEC64 mem | REX.W + 0F C7 /4 | Save state components specified by EDX:EAX to mem with compaction. | |
| XSAVEOPT mem | 0F AE /6 | xsaveopt | Save state components specified by EDX:EAX to mem, optimizing if possible. | 
| XSAVEOPT64 mem | REX.W + 0F AE /6 | xsaveopt | Save state components specified by EDX:EAX to mem, optimizing if possible. | 
| XSAVES mem | 0F C7 /5 | Save state components specified by EDX:EAX to mem with compaction, optimizing if possible. | |
| XSAVES64 mem | REX.W + 0F C7 /5 | Save state components specified by EDX:EAX to mem with compaction, optimizing if possible. | |
| XSETBV | 0F 01 D1 | Write the value in EDX:EAX to the XCR specified by ECX. | |
| XTEST | 0F 01 D6 | hle | Test if executing in a transactional region. |
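
The Opcode column uses a compact notation (`/6`, `ib`, `+rd`, `REX.W +`) that can be hard to read at first. As a rough illustration, the Python sketch below assembles three entries from the table by hand: `XOR r/m32, imm8` (`83 /6 ib`), its 64-bit form (`REX.W + 83 /6 ib`), and `XCHG EAX, r32` (`90+rd`). The helper names (`modrm_direct`, `xor_rm_imm8`, `xchg_eax_r32`) and the `REG` table are made up for this example. It assumes register-direct operands only and covers only the eight legacy registers, so it never needs the REX.B/REX.R extension bits.

```python
# Register numbers for the eight legacy general-purpose registers
# (the same numbers apply to their 64-bit counterparts rax, rcx, ...).
REG = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
       "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def modrm_direct(digit, rm):
    """Build a ModRM byte with mod=11 (register-direct).

    `digit` is the /digit opcode extension stored in the reg field,
    `rm` is the register number stored in the r/m field.
    """
    return 0xC0 | (digit << 3) | rm


def xor_rm_imm8(reg, imm8, rex_w=False):
    """Encode `XOR r/m32, imm8` (83 /6 ib) for a register operand.

    With rex_w=True, a bare REX.W prefix (0x48) is prepended, giving the
    `REX.W + 83 /6 ib` form that operates on the 64-bit register.
    """
    prefix = [0x48] if rex_w else []
    return bytes(prefix + [0x83, modrm_direct(6, REG[reg]), imm8 & 0xFF])


def xchg_eax_r32(reg):
    """Encode `XCHG EAX, r32` (90+rd): the register number is added to 0x90."""
    return bytes([0x90 + REG[reg]])


print(xor_rm_imm8("ecx", 0x10).hex())              # xor ecx, 0x10
print(xor_rm_imm8("ecx", 0x10, rex_w=True).hex())  # xor rcx, 0x10
print(xchg_eax_r32("ebx").hex())                   # xchg eax, ebx
```

Memory operands, SIB bytes, displacements, and the VEX/EVEX prefixes used by the vector instructions are deliberately left out. The online assembler mentioned at the top remains the practical way to check real encodings.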