Title
std::reverse should be permitted to be vectorized
Status
lewg
Section
[alg.reverse]
Submitter
Billy O'Neal III

Created on 2017-06-24.00:00:00 last changed 73 months ago

Messages

Date: 2018-04-03.18:48:52

Proposed resolution:

This wording is relative to N4659.

  1. Edit [alg.reverse] as indicated:

    template<class BidirectionalIterator>
      void reverse(BidirectionalIterator first, BidirectionalIterator last);
    template<class ExecutionPolicy, class BidirectionalIterator>
      void reverse(ExecutionPolicy&& exec,
                   BidirectionalIterator first, BidirectionalIterator last);
    

    -1- Requires: *first shall be swappable ([swappable.requirements]).

    -2- Effects: For each non-negative integer i < (last - first) / 2, applies iter_swap to all pairs of iterators first + i, (last - i) - 1. If is_trivially_copyable_v<typename iterator_traits<BidirectionalIterator>::value_type> is true, an implementation may permute the elements by making temporary copies, rather than by calling iter_swap. [Note: this allows the implementation to be vectorized. — end note]

Date: 2018-04-15.00:00:00

[ 2018-04-02, Billy comments ]

This issue should be resolved by P0551, because it prohibits user specialization of std::swap and std::iter_swap, which means the proposed vectorization optimization for pointers-to-trivially-copyable is now implementable without changes to reverse's specification (We can detect if the user has provided an alternate swap in their own namespace, but not if they explicitly specialized swap or iter_swap).

Date: 2017-07-12.01:30:31

[ 2017-07 Toronto Monday issue prioritization ]

Status to LEWG; this is similar to 2973

Date: 2017-06-24.00:00:00

The fine folks on our backend team suggested that we special case std::reverse of 1/2/4/8 to take advantage of vector units. Unfortunately, at present std::reverse says it does N/2 iter_swaps, which doesn't permit our vector implementation even if the iterator inputs are pointers to trivially copyable Ts.

The vectorized version for pointers to shorts is ~8x faster on Skylake than the serial version, and about 7x faster for unsigned long longs; and users don't actually care whether or not we call swap here.

History
Date User Action Args
2018-04-03 18:48:52adminsetmessages: + msg9796
2017-07-12 01:30:31adminsetmessages: + msg9348
2017-07-12 01:30:31adminsetstatus: new -> lewg
2017-06-24 22:07:33adminsetmessages: + msg9279
2017-06-24 00:00:00admincreate