Summing and gathering elements of an array element-wise in MPI

Keywords: sum, array elements, MPI. Updated: 2023-10-16

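After a computation that multiplies a matrix by a vector using a Cartesian topology, I ended up with the following processes, shown with their ranks and vectors.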

P0 (process with rank = 0) = [2, 9]
P1 (process with rank = 1) = [2, 3]
P2 (process with rank = 2) = [1, 9]
P3 (process with rank = 3) = [4, 6]

Now I need to sum the elements of the even-rank processes and of the odd-rank processes separately, like this:

temp1 = [3, 18]
temp2 = [6, 9]

Then I need to gather the results into another vector, like this:

result = [3, 18, 6, 9]
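For reference, the intended result can be written down as a serial computation, which is handy for checking whatever the MPI version produces. The following is only a sketch in plain C with no MPI calls; the function name parity_sum_gather and the flat row-major layout of the input are assumptions made for this illustration.

```c
/*
 * Serial reference (no MPI). vectors holds nprocs vectors of n doubles each,
 * stored back to back, so the vector of rank r starts at vectors[r * n].
 * result must have room for 2 * n doubles: the first n entries receive the
 * element-wise sum over even ranks, the next n the sum over odd ranks.
 * parity_sum_gather is a hypothetical name used only for this illustration.
 */
void parity_sum_gather(const double *vectors, int nprocs, int n, double *result)
{
    for (int i = 0; i < 2 * n; i++)
        result[i] = 0.0;

    for (int r = 0; r < nprocs; r++) {
        /* even ranks accumulate into the first half, odd ranks into the second */
        double *dst = result + (r % 2) * n;
        for (int i = 0; i < n; i++)
            dst[i] += vectors[r * n + i];
    }
}
```

Called with the four vectors above (nprocs = 4, n = 2), this fills result with the even-rank sums followed by the odd-rank sums.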

My attempt was to use MPI_Reduce followed by MPI_Gather, like this:

// Previous code
double *temp1, *temp2;
if (myrank % 2 == 0) {
    BOOLEAN flag = Allocate_vector(&temp1, local_m); // function to allocate space for vectors
    MPI_Reduce(local_y, temp1, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp1, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp1);
}
else {
    Allocate_vector(&temp2, local_m);
    MPI_Reduce(local_y, temp2, local_n, MPI_DOUBLE, MPI_SUM, 0, comm);
    MPI_Gather(temp2, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, comm);
    free(temp2);
}

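But the answer is incorrect. It seems the code sums the elements of the even and odd processes all together and then gives a segmentation fault: Wrong_result = [21 15 0 0], along with this error: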

*** Error in `./test': double free or corruption (fasttop): 0x00000000013c7510 ***
*** Error in `./test': double free or corruption (fasttop): 0x0000000001605b60 ***

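It won't work the way you are trying to do it. To perform a reduction over the elements of a subset of processes, you have to create a subcommunicator for them. In your case, the odd and the even processes share the same comm, so every rank takes part in the same MPI_Reduce and the same MPI_Gather. The operations therefore run over the combined group, not over two separate groups of processes.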

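You should use MPI_Comm_split to perform the split, perform the reduction in the two new subcommunicators, and finally have rank 0 of each subcommunicator (let's call those the leaders) take part in the gather over another subcommunicator that contains just those two: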

// Make sure rank is set accordingly
MPI_Comm_rank(comm, &rank);
// Split even and odd ranks in separate subcommunicators
MPI_Comm subcomm;
MPI_Comm_split(comm, rank % 2, 0, &subcomm);
// Perform the reduction in each separate group
double *temp;
Allocate_vector(&temp, local_n);
MPI_Reduce(local_y, temp, local_n , MPI_DOUBLE, MPI_SUM, 0, subcomm);
// Find out our rank in subcomm
int subrank;
MPI_Comm_rank(subcomm, &subrank);
// At this point, we no longer need subcomm. Free it and reuse the variable.
MPI_Comm_free(&subcomm);
// Separate both group leaders (rank 0) into their own subcommunicator
MPI_Comm_split(comm, subrank == 0 ? 0 : MPI_UNDEFINED, 0, &subcomm);
if (subcomm != MPI_COMM_NULL) {
    MPI_Gather(temp, local_n, MPI_DOUBLE, gResult, local_n, MPI_DOUBLE, 0, subcomm);
    MPI_Comm_free(&subcomm);
}
// Free resources
free(temp);

The result will be in gResult on rank 0 of the latter subcomm, which happens to be rank 0 in comm because of the way the splits are performed.
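To see why the gather root ends up on rank 0 of comm, it helps to model how MPI_Comm_split numbers ranks: within one color, processes are ordered by key, and ties are broken by their rank in the parent communicator. The following is a serial sketch of that rule in plain C, not MPI code; split_rank and its parameters are hypothetical names, and the undefined argument merely stands in for MPI_UNDEFINED.

```c
/*
 * Serial model of the rank numbering done by MPI_Comm_split.
 * color[r] and key[r] are the values process r would pass to the split.
 * Returns the new rank of process `rank` inside its color group, or -1 if
 * its color equals `undefined` (modelling MPI_UNDEFINED, which yields
 * MPI_COMM_NULL). split_rank is a hypothetical name for illustration only.
 */
int split_rank(const int *color, const int *key, int nprocs, int rank, int undefined)
{
    if (color[rank] == undefined)
        return -1;

    int new_rank = 0;
    for (int r = 0; r < nprocs; r++) {
        if (r == rank || color[r] != color[rank])
            continue;
        /* r comes first if its key is smaller, or equal with a lower parent rank */
        if (key[r] < key[rank] || (key[r] == key[rank] && r < rank))
            new_rank++;
    }
    return new_rank;
}
```

With four processes, color = rank % 2 and key = 0, ranks 0 and 1 of comm each get subrank 0 in their group and become the leaders. In the second split only those two share a color, so they are numbered 0 and 1 in that order. The gather root is therefore rank 0 of comm, and the even-rank sums come before the odd-rank sums in gResult.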

Not as simple as expected, I guess, but that is the price of having convenient collective operations in MPI.

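On a side note, in the code shown you allocate temp1 and temp2 with length local_m, while in all collective calls the length is specified as local_n. If it happens that local_n > local_m, heap corruption will occur.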
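One way to rule out this kind of mismatch is to size the buffer from the very same count that is later passed to the collective call. A minimal sketch, assuming a hypothetical helper alloc_collective_buffer that is not part of the original code (the question uses its own Allocate_vector, whose implementation is not shown):

```c
#include <stdlib.h>

/*
 * Hypothetical helper: allocate a zero-initialised buffer of exactly `count`
 * doubles, where `count` is the same value later passed as the element count
 * to MPI_Reduce or MPI_Gather. Returns NULL if count <= 0 or on allocation
 * failure. The caller releases the buffer with free().
 */
double *alloc_collective_buffer(int count)
{
    if (count <= 0)
        return NULL;
    return calloc((size_t)count, sizeof(double));
}
```

In the question's code this would mean allocating with local_n, not local_m, since local_n is the count given to both MPI_Reduce and MPI_Gather.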