Matrix multiplication template for architectures with SSE2 or higher and compilers that support C++ intrinsics for access to SSE instructions.
static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1, int const offset_A=0, int const offset_B=0, int const offset_C=0)
    Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.

template<int T_offset_A, int T_offset_B, int T_offset_C>
static void exec (real const *const *const A, real const *const *const B, real *const C, int const i=1)

template<typename T_real, typename T_reg, int T_M, int T_N, int T_K>
class MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >
Choice of template parameters:
- T_M and T_N should be chosen so that the T_M x T_N matrix C fits in registers. For example, T_M == T_N == 4.
- T_K should be chosen so that the generated code fits in the L1 instruction cache. For example, T_K == 128.
- T_real and T_reg must be chosen as a matching pair. Examples:
  - <T_real, T_reg> == <double, __m128d>
  - <T_real, T_reg> == <float, __m128>
The public typedefs and static members specify how the matrices must be stored.
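To make the register argument concrete, here is a minimal sketch of a register-blocked kernel for <double, __m128d> with T_M == T_N == 4. It is not this class's implementation: the function name mm_4x4_sse2_sketch and the storage layout (C and A column-major, B row-major, unaligned loads) are assumptions made for illustration only. The 4 x 4 tile of C occupies 8 __m128d values (two doubles each), which leaves room for the operands of A and B among the 16 XMM registers available on x86-64. Whether the compiler actually keeps every accumulator in a register is up to the compiler.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics (__m128d, _mm_*_pd)

// Hypothetical sketch, not the library's kernel.
// Computes C += A * B for doubles, where (assumed layout):
//   C is 4 x 4, column-major:  C(i, j) at C[4 * j + i]
//   A is 4 x K, column-major:  A(i, k) at A[4 * k + i]
//   B is K x 4, row-major:     B(k, j) at B[4 * k + j]
template <int K>
void mm_4x4_sse2_sketch(double const* A, double const* B, double* C) {
    // c[j][h] holds rows 2h and 2h+1 of column j of C: 8 accumulators in total.
    __m128d c[4][2];
    for (int j = 0; j < 4; ++j) {
        c[j][0] = _mm_loadu_pd(C + 4 * j);
        c[j][1] = _mm_loadu_pd(C + 4 * j + 2);
    }
    for (int k = 0; k < K; ++k) {
        __m128d const a0 = _mm_loadu_pd(A + 4 * k);      // A(0..1, k)
        __m128d const a1 = _mm_loadu_pd(A + 4 * k + 2);  // A(2..3, k)
        for (int j = 0; j < 4; ++j) {
            __m128d const b = _mm_set1_pd(B[4 * k + j]);  // broadcast B(k, j)
            c[j][0] = _mm_add_pd(c[j][0], _mm_mul_pd(a0, b));
            c[j][1] = _mm_add_pd(c[j][1], _mm_mul_pd(a1, b));
        }
    }
    // Write the accumulated tile back to memory once, after the k loop.
    for (int j = 0; j < 4; ++j) {
        _mm_storeu_pd(C + 4 * j, c[j][0]);
        _mm_storeu_pd(C + 4 * j + 2, c[j][1]);
    }
}
```

C is loaded once before the k loop and stored once after it, so the inner loop reads only A and B from memory. A larger T_M x T_N tile would need more accumulators than there are registers and would force spills to the stack.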
template<typename real, typename T_reg, int T_M, int T_N, int T_K>
void MM_kernel_inner_sse2_A< real, T_reg, T_M, T_N, T_K >::exec (
    real const *const *const A,
    real const *const *const B,
    real *const C,
    int const i = 1,
    int const offset_A = 0,
    int const offset_B = 0,
    int const offset_C = 0 ) [static]
Executes the matrix-matrix multiply C += A B with the three matrices A, B, and C stored according to the static members and typedefs of this class.
References A, B, floats_per_register, MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Loop< T_loop_index, T_end >::outer(), and STATIC_ASSERT_DEBUG.
Referenced by MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack< M, K, Ordering_col_wise, 1 >::pack(), and MM_kernel_inner_sse2_A< T_real, T_reg, T_M, T_N, T_K >::Pack< M, K, Ordering_col_wise, 1 >::unpack().
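The reference to Loop< T_loop_index, T_end >::outer() suggests that the loops are unrolled at compile time through template recursion, which would explain why T_K has to be small enough for the generated code to fit in the L1 instruction cache. The sketch below shows that general technique only. Unroll_sketch and Sum_indices are hypothetical names, and nothing here is taken from the actual Loop template.

```cpp
// Hypothetical illustration of compile-time loop unrolling by template
// recursion; the names below are not part of MM_kernel_inner_sse2_A.
template <int T_index, int T_end>
struct Unroll_sketch {
    template <typename F>
    static inline void run(F& f) {
        f(T_index);                                  // body of iteration T_index
        Unroll_sketch<T_index + 1, T_end>::run(f);   // next iteration, instantiated at compile time
    }
};

// Partial specialization: the recursion stops when T_index reaches T_end.
template <int T_end>
struct Unroll_sketch<T_end, T_end> {
    template <typename F>
    static inline void run(F&) {}
};

// Example loop body: accumulates the indices it is called with.
struct Sum_indices {
    int total;
    void operator()(int i) { total += i; }
};
```

Calling Unroll_sketch<0, 4>::run(s) on a Sum_indices s instantiates four copies of the loop body, one per index. Code size therefore grows with the trip count, which is the trade-off that a parameter such as T_K would control under this assumption.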