Created on 2024-08-08.00:00:00 last changed 3 months ago
Proposed resolution:
This wording is relative to N4988.
Modify [linalg.algs.blas2.gemv] as indicated:
[Drafting note: This change is needed because the matrix A does not need to be square. x.extents(0) must equal A.extents(1), while y.extents(0) must equal A.extents(0).]
-3- Mandates:
(3.1) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and
(3.2) — possibly-addable<decltype(y
x), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.-4- Preconditions:
(4.1) — multipliable(A, x, y) is true, and
(4.2) — addable(y
x, y, z) is true for those overloads that take a z parameter.-5- Complexity: 𝒪(A
x.extent(0) × xA.extent(01)).
Modify [linalg.algs.blas2.symv] as indicated:
-3- Mandates:
(3.1) — […]
(3.2) — […]
(3.3) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and
(3.4) — possibly-addable<decltype(y
x), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.-4- Preconditions:
(4.1) — A.extent(0) equals A.extent(1),
(4.2) — multipliable(A, x, y) is true, and
(4.3) — addable(y
x, y, z) is true for those overloads that take a z parameter.-5- Complexity: 𝒪(A
x.extent(0) × xA.extent(01)).
Modify [linalg.algs.blas2.hemv] as indicated:
-3- Mandates:
(3.1) — […]
(3.2) — […]
(3.3) — possibly-multipliable<decltype(A), decltype(x), decltype(y)>() is true, and
(3.4) — possibly-addable<decltype(y
x), decltype(y), decltype(z)>() is true for those overloads that take a z parameter.-4- Preconditions:
(4.1) — A.extent(0) equals A.extent(1),
(4.2) — multipliable(A, x, y) is true, and
(4.3) — addable(y
x, y, z) is true for those overloads that take a z parameter.-5- Complexity: 𝒪(A
x.extent(0) × xA.extent(01)).
Modify [linalg.algs.blas2.trmv] as indicated:
[Drafting note: The extents compatibility conditions are expressed differently than in the above matrix-vector multiply sections, perhaps more for consistency with the TRSV section below. They look correct here. The original Complexity elements adjusted below are technically correct, since is square, but changing this would improve consistency with [linalg.algs.blas2.gemv]]
template<in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec, out-vector OutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec, out-vector OutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y);-5- […]
-6- Effects: Computes . -5- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).template<in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InOutVec y); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, inout-vector InOutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InOutVec y);-8- […]
-9- Effects: […] -10- Complexity: 𝒪(Ay.extent(0) × yA.extent(01)).template<in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec1, in-vector InVec2, out-vector OutVec> void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec1 x, InVec2 y, OutVec z); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, in-vector InVec1, in-vector InVec2, out-vector OutVec> void triangular_matrix_vector_product(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InVec1 x, InVec2 y, OutVec z);-11- […]
-12- Effects: Computes . -13- Complexity: 𝒪(Ax.extent(0) × xA.extent(01)).
Modify [linalg.algs.blas3.rankk] as indicated:
[Drafting note: P3371R0, to be submitted in the August 15 mailing for LEWG review, contains the same wording changes to [linalg.algs.blas3.rankk] and [linalg.algs.blas3.rank2k] as proposed here, with additional changes corresponding to that proposal. Please apply this LWG issue's changes first, before P3371 merges]
-3- Mandates:
(3.1) — If InOutMat has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument; and
(3.2) — possibly-multipliable<decltype(A), decltype(transposed(A)), decltype(C)>
compatible-static-extents<decltype(A), decltype(A)>(0, 1)is true.;
(3.3) — compatible-static-extents<decltype(C), decltype(C)>(0, 1) is true; and
(3.4) — compatible-static-extents<decltype(A), decltype(C)>(0, 0) is true.-4- Preconditions: multipliable(A, transposed(A), C) is true.
(4.1) — A.extent(0) equals A.extent(1),
(4.2) — C.extent(0) equals C.extent(1), and
(4.3) — A.extent(0) equals C.extent(0).-5- Complexity: 𝒪(A.extent(0) × A.extent(1) × A
C.extent(0)).
Modify [linalg.algs.blas3.rank2k] as indicated:
-3- Mandates:
(3.1) — If InOutMat has layout_blas_packed layout, then the layout's Triangle template argument has the same type as the function's Triangle template argument;
(3.2) — possibly-multipliable<decltype(A), decltype(transposed(B)), decltype(C)>()
possibly-addable<decltype(A), decltype(B), decltype(C)>()is true; and(3.3) — possibly-multipliable<decltype(B), decltype(transposed(A)), decltype(C)>(0, 1)
compatible-static-extents<decltype(A), decltype(A)>(0, 1)is true.-4- Preconditions:
(4.1) — multipliable(A, transposed(B), C)
addable(A, B, C)is true, and(4.2) — multipliable(B, transposed(A), C) is true
A.extent(0) equals A.extent(1).-5- Complexity: 𝒪(A.extent(0) × A.extent(1) × B
C.extent(0)).
Modify [linalg.algs.blas3.trsm] as indicated:
[Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]
template<in-matrix InMat1, class Triangle, class DiagonalStorage, in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp> void triangular_matrix_matrix_left_solve(InMat1 A, Triangle t, DiagonalStorage d, InMat2 B, OutMat X, BinaryDivideOp divide); template<class ExecutionPolicy, in-matrix InMat1, class Triangle, class DiagonalStorage, in-matrix InMat2, out-matrix OutMat, class BinaryDivideOp> void triangular_matrix_matrix_left_solve(ExecutionPolicy&& exec, InMat1 A, Triangle t, DiagonalStorage d, InMat2 B, OutMat X, BinaryDivideOp divide);[…]
-6- Complexity: 𝒪(A.extent(0) × BX.extent(1) × BX.extent(1)).
Modify [linalg.algs.blas3.inplacetrsm] as indicated:
[Drafting note: Nothing is wrong here, but it's nice to make the complexity clauses depend only on input if possible]
template<in-matrix InMat, class Triangle, class DiagonalStorage, inout-matrix InOutMat, class BinaryDivideOp> void triangular_matrix_matrix_right_solve(InMat A, Triangle t, DiagonalStorage d, InOutMat B, BinaryDivideOp divide); template<class ExecutionPolicy, in-matrix InMat, class Triangle, class DiagonalStorage, inout-matrix InOutMat, class BinaryDivideOp> void triangular_matrix_matrix_right_solve(ExecutionPolicy&& exec, InMat A, Triangle t, DiagonalStorage d, InOutMat B, BinaryDivideOp divide);[…]
-13- Complexity: 𝒪(BA.extent(0) × A.extent(01) × AB.extent(1)).
As pointed out by Raffaele Solcà (CSCS Swiss National Supercomputing Centre), some of the Mandates, Preconditions, and Complexity elements of some BLAS 2 and BLAS 3 algorithms in [linalg] are incorrect.
History | |||
---|---|---|---|
Date | User | Action | Args |
2024-08-11 10:43:44 | admin | set | messages: + msg14320 |
2024-08-08 00:00:00 | admin | create |