Parallel for loop over range of array indices in C++17

Keywords: range, loop, index, array, for, C++17, parallel. Updated: 2023-10-16

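I need to update a 100M-element array and would like to do it in parallel. std::for_each(std::execution::par, ...) seems great for this, except that the update needs to access elements of other arrays, depending on the index I am updating. A minimal serial working example of the kind of thing I am trying to parallelize might look like this: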

for (size_t i = 0; i < 100'000'000; i++)
    d[i] = combine(d[i], s[2*i], s[2*i+1]);

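Of course I could spawn threads manually, but that is a lot more code than std::for_each, so it would be great to find an elegant way to do this with the standard library. So far I have found some not-so-elegant ways of using for_each, such as: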

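Computing the index by using pointer arithmetic on the address of the array element.
Implementing my own pseudo-iterator in the spirit of Boost's counting_range.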

Is there a better way to do this?

std::ranges should be able to help. If you have access to C++20, you can iterate over the indexes rather than the data:

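#include <algorithm>
#include <cstddef>
#include <execution>
#include <iostream>
#include <ranges>
#include <vector>

int main() {
    std::vector<int> d(100);
    // View over the indexes 0 .. d.size()-1; no storage is allocated for them.
    std::ranges::iota_view indexes((size_t)0, d.size());
    std::for_each(std::execution::par, indexes.begin(), indexes.end(), [&d](size_t i)
    {
        // Output from different threads may interleave.
        std::cout << i << "," << d[i] << "\n";
    });
    return 0;
}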

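You should be able to iterate over the indexes rather than the items. I think C++20 std::ranges gives you an easy way to do this, or you can use one of the Boost range methods. I am not sure why you would consider rolling your own in the spirit of Boost counting_range when you could just, well, use Boost :-)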

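Having said that, I have actually opted for the roll-your-own approach here, simply so that the code needs neither C++20 nor Boost. Feel free to replace paxrange with one of the other methods, depending on your needs: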

#include <algorithm>
#include <iostream>
#include <vector>

// Seriously, just use Boost :-)
class paxrange {
public:
    class iterator {
        friend class paxrange;
    public:
        long int operator *() const { return value; }
        const iterator &operator ++() { ++value; return *this; }
        iterator operator ++(int) { iterator copy(*this); ++value; return copy; }
        bool operator ==(const iterator &other) const { return value == other.value; }
        bool operator !=(const iterator &other) const { return value != other.value; }
    protected:
        iterator(long int start) : value (start) { }
    private:
        long int value; // same type as the one operator*() returns
    };

    iterator begin() const { return beginVal; }
    iterator end() const { return endVal; }
    paxrange(long int begin, long int end) : beginVal(begin), endVal(end) {}
private:
    iterator beginVal;
    iterator endVal;
};

int main() {
    // Create a source and destination collection.
    std::vector<int> s;
    s.push_back(42); s.push_back(77); s.push_back(144);
    s.push_back(12); s.push_back(6);
    std::vector<int> d(5);

    // Shows how to use indexes with multiple collections sharing index.
    auto process = [&s, &d](const int idx) { d[idx] = s[idx] + idx; };
    paxrange x(0, d.size());
    std::for_each(x.begin(), x.end(), process); // add parallelism later.

    // Debug output.
    for (const auto &item: s) std::cout << "< " << item << '\n';
    std::cout << "=====\n";
    for (const auto &item: d) std::cout << "> " << item << '\n';
}
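
The "meat" of the solution is the three lines in the middle of main(), where you set up a callback function that takes the index rather than the item itself.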

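Within that function, you use the index plus as many collections as you need to set up the destination collection, which is very similar to what you want.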

In my case, I simply want the output vector to be the input vector with the index added to each element, as the output shows:

< 42
< 77
< 144
< 12
< 6
=====
> 42
> 78
> 146
> 15
> 10

There is a simple header-only library on GitHub that might help you.

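Your minimal example can be parallelized as shown below. However, possibly because of cache effects, the runtime will not scale down linearly with the number of cores.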

#include <algorithm>
#include <cmath>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>
#include "Lazy.h"

double combine(double a, double b, double c)
{
    if (b > 0.5 && c < 0.4)
        return a + std::exp(b * c + 1);
    else if (b*c < 0.2)
        return a * 0.8 + (1-c) * (1-b);
    else
        return std::exp(1.0 / a) + b + c;
}

// Generate index split for parallel tasks
auto getIndexPairs(std::size_t N, std::size_t numSplits)
{
    std::vector<std::pair<std::size_t, std::size_t>> vecPairs(numSplits);
    double dFrom = 0, dTo = 0;
    for (std::size_t i = 0; i < numSplits; ++i) {
        dFrom = dTo;
        dTo += N / double(numSplits);
        vecPairs[i] = {std::size_t(dFrom), std::min(std::size_t(dTo), N)};
    }
    vecPairs[numSplits-1].second = N; // make sure the last range ends at N
    return vecPairs;
}

int main() {
    const std::size_t N = 100000000;
    // Number of parallel threads; hardware_concurrency() may return 0 when unknown.
    const std::size_t C = std::max<std::size_t>(1, std::thread::hardware_concurrency());
    std::vector<double> d(N);
    std::vector<double> s(2*N);

    // Fill d and s with some values
    for (std::size_t i = 0; i < N; ++i) {
        s[i] = double(i) / N;
        s[i + N] = double(i + N) / N;
        d[i] = N - i;
    }

    // Run combine(...) in parallel in C threads
    Lazy::runForAll(getIndexPairs(N, C), [&](auto pr) {
        for (std::size_t i = pr.first; i < pr.second; ++i)
            d[i] = combine(d[i], s[2*i], s[2*i+1]);
        return nullptr; // Dummy return value
    });
}

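@Alan Birtles's answer does not work with the parallel execution policies, because it fails to compile with the error: static_assert failed: "Parallel algorithms require forward iterators or stronger."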

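One potential alternative is to build a vector of indexes, although it will not be as space-efficient (about 800 MB for 100M indexes, assuming an 8-byte std::size_t).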

// Needs <algorithm>, <execution>, <iostream>, <numeric> and <vector>.
std::vector<std::size_t> indexes(d.size());
std::iota(indexes.begin(), indexes.end(), std::size_t{0});
std::for_each(std::execution::par, indexes.begin(), indexes.end(), [&](size_t i) {
    std::cout << i << ',' << d[i] << '\n';
});