Issue 2985: std::reverse should be permitted to be vectorized

Title: std::reverse should be permitted to be vectorized
Status: lewg
Section: [alg.reverse]
Submitter: Billy O'Neal III

Created on 2017-06-24.00:00:00 last changed 88 months ago

Messages

msg9796 (view)

Date: 2018-04-03.18:48:52

Proposed resolution:

This wording is relative to N4659.

Edit [alg.reverse] as indicated:
```
template<class BidirectionalIterator>
  void reverse(BidirectionalIterator first, BidirectionalIterator last);
template<class ExecutionPolicy, class BidirectionalIterator>
  void reverse(ExecutionPolicy&& exec,
               BidirectionalIterator first, BidirectionalIterator last);
```
-1- Requires: *first shall be swappable ([swappable.requirements]).
-2- Effects: For each non-negative integer i < (last - first) / 2, applies iter_swap to all pairs of iterators first + i, (last - i) - 1. If is_trivially_copyable_v<typename iterator_traits<BidirectionalIterator>::value_type> is true, an implementation may permute the elements by making temporary copies, rather than by calling iter_swap. [Note: this allows the implementation to be vectorized. — end note]

msg9348 (view)

Date: 2018-04-15.00:00:00

[ 2018-04-02, Billy comments ]

This issue should be resolved by P0551, because it prohibits user specialization of std::swap and std::iter_swap, which means the proposed vectorization optimization for pointers-to-trivially-copyable is now implementable without changes to reverse's specification (We can detect if the user has provided an alternate swap in their own namespace, but not if they explicitly specialized swap or iter_swap).

msg9279 (view)

Date: 2017-07-12.01:30:31

[ 2017-07 Toronto Monday issue prioritization ]

Status to LEWG; this is similar to 2973

msg9278 (view)

Date: 2017-06-24.00:00:00

The fine folks on our backend team suggested that we special case std::reverse of 1/2/4/8 to take advantage of vector units. Unfortunately, at present std::reverse says it does N/2 iter_swaps, which doesn't permit our vector implementation even if the iterator inputs are pointers to trivially copyable Ts.

The vectorized version for pointers to shorts is ~8x faster on Skylake than the serial version, and about 7x faster for unsigned long longs; and users don't actually care whether or not we call swap here.

History
Date	User	Action	Args
2018-04-03 18:48:52	admin	set	messages: + msg9796
2017-07-12 01:30:31	admin	set	messages: + msg9348
2017-07-12 01:30:31	admin	set	status: new -> lewg
2017-06-24 22:07:33	admin	set	messages: + msg9279
2017-06-24 00:00:00	admin	create