MPI 가 fft 를 실현 하 는 교체 알고리즘 은 병렬 계산 - 구조 에서 비롯 된다.계산법.프로 그래 밍 중 의사 코드

8948 단어 프로 그래 밍 c 알고리즘 struct include fft

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define T 0

typedef struct {
	double real;
	double img;
} com;

double PI;
//don't use the omega array,not every node needs a whole copy of omega,not efficient
static inline void cp(com f,com *t);//copy complex
static inline void add(com a,com b,com* r);
static inline void mul(com a,com b,com* r);
static inline void sub(com a,com b,com* r);
int br(int src,int size);//bit reverse
void show(com a) {
	printf("%.4f %.4f 
",a.real,a.img);
}
int main(int argc,char *argv[]) {
	PI=atan(1)*4;
	int k,n;//process id and total number of processes
	MPI_Init(&argc,&argv);
	MPI_Comm_rank(MPI_COMM_WORLD,&k);
	MPI_Comm_size(MPI_COMM_WORLD,&n);
	//mpi communication obj
	MPI_Request isReq;
	MPI_Request reReq;
	MPI_Status s;
	//data
	com a;//input
	a.real=k+1;
	a.img=0;
	com c;//temp
	cp(a,&c);
	com w;//omega	
	com r;//recieve

	int l=log(n)/log(2);

	for(int h=l-1;h>=0;--h) {
		int p=pow(2,h);
		int q=n/p;
		if(k%p==k%(2*p)) {
			MPI_Issend(&c,2,MPI_DOUBLE,k+p,T,MPI_COMM_WORLD,&isReq);
			MPI_Irecv(&r,2,MPI_DOUBLE,k+p,T,MPI_COMM_WORLD,&reReq);
			int time=p*(br(k,l)%q);//compute while recieving and sending
			w.real=cos(2*PI*time/n);
			w.img=sin(2*PI*time/n);
			MPI_Wait(&reReq,&s);
			mul(r,w,&r);
			MPI_Wait(&isReq,&s);
			add(c,r,&c);
		} else {
			MPI_Issend(&c,2,MPI_DOUBLE,k-p,T,MPI_COMM_WORLD,&isReq);
			MPI_Irecv(&r,2,MPI_DOUBLE,k-p,T,MPI_COMM_WORLD,&reReq);
			int time=p*(br(k-p,l)%q);//compute while recieving and sending
			w.real=cos(2*PI*time/n);
			w.img=sin(2*PI*time/n);
			MPI_Wait(&reReq,&s);
			MPI_Wait(&isReq,&s);
			mul(c,w,&c);//can't modify until sending comes to an end
			sub(r,c,&c);
		}
		MPI_Barrier(MPI_COMM_WORLD);
	}

	MPI_Issend(&c,2,MPI_DOUBLE,br(k,l),T,MPI_COMM_WORLD,&isReq);
	MPI_Recv(&a,2,MPI_DOUBLE,MPI_ANY_SOURCE,T,MPI_COMM_WORLD,&s);
	printf("b%d:%.4f %.4f 
",k,a.real,a.img);
	MPI_Wait(&isReq,&s);
	MPI_Finalize();
	return 0;
}

void cp(com f,com *t) {
	t->real=f.real;
	t->img=f.img;
}
void add(com a,com b,com *c)   
{
	c->real=a.real+b.real;   
	c->img=a.img+b.img;   
}   
    
void mul(com a,com b,com *c)   
{   
	c->real=a.real*b.real-a.img*b.img;   
	c->img=a.real*b.img+a.img*b.real;   
}
void sub(com a,com b,com *c)   
{   
	c->real=a.real-b.real;   
	c->img=a.img-b.img;   
}   
int br(int src,int size) {
	int tmp=src;
	int des=0;
	for(int i=size-1;i>=0;--i) {
		des=((tmp&1)<<i)|des;
		tmp=tmp>>1;
	}
	return des;
}

계산 과정 과 이 책의 직렬 알고리즘 은 어느 정도 차이 가 있 고 노드 의 계 산 량 을 균형 시 켜 교묘 하 다.실현 과정 에서 가능 한 한 통신 과 계산 시간 을 중첩 시 켜 계산 시간 을 단축 시킨다.본인 의 다른 글 직렬 fft 를 고 쳐 씁 니 다.
오 메 가 배열 을 사용 하지 않 고 직접 사용 할 때 계산 합 니 다.각 노드 는 log (n) 개 omega 를 최대 로 사용 하기 때문에 계산 배열 은 n 개 를 계산 해 야 합 니 다.
접시 형 에서 상하 두 복수 의 계산 과 통신 순 서 는 약간의 차이 가 있 는데, 읽 고 쓰 는 오류 가 발생 할 수 있 기 때문이다.
데이터 크기 ds 와 프로 세 스 수 ps 가 0 = = ds% ps 를 보장 하기 만 하면 아래 의 개선 알고리즘 을 사용 할 수 있 습 니 다
전체 통신 알고리즘 은 매번 전역 적 으로 데 이 터 를 업데이트 하여 통신 overhead 가 너무 클 수 있 습 니 다.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <sys/stat.h>
#include <memory.h>

typedef struct {
	double real;
	double img;
} com;

double PI;

int readBinary(char* file,void *buf,int fsize);//if named read causes override and miscall
int writeBinary(char *file,com *array,int size);
//don't use the omega array,not every node needs a whole copy of omega,not efficient
static inline void cp(com f,com *t);//copy complex
static inline void add(com a,com b,com* r);
static inline void mul(com a,com b,com* r);
static inline void sub(com a,com b,com* r);
int br(int src,int size);//bit reverse
void show(com a) {
	printf("%.4f %.4f 
",a.real,a.img);
}
int main(int argc,char *argv[]) {
	if(argc<3) {
		printf("wtf
");
		return 1;
	}
	PI=atan(1)*4;
	int self,size;//process id and total number of processes
	MPI_Init(&argc,&argv);
	MPI_Comm_rank(MPI_COMM_WORLD,&self);
	MPI_Comm_size(MPI_COMM_WORLD,&size);
	int fsize,n;
	void *buf;
	com *in;
	if(0==self) {
		printf("start 
");
		struct stat fstat;
		stat(argv[1],&fstat);
		fsize=fstat.st_size;
		buf=malloc(fsize);
		n=readBinary(argv[1],buf,fsize)/2;//n stands for total complex number

		in=(com*)malloc(n*sizeof(com));
		memcpy(in,((int*)buf+1),n*sizeof(com));
	}

	MPI_Bcast(&n,1,MPI_INT,0,MPI_COMM_WORLD);//every thread should know the total size
	
	if(0==self) {
		
	}
	
	if(-1==n) {
		printf("error reading 
");
		MPI_Finalize();
	}
	
	int psize=n/size;
	//mpi communication obj
	MPI_Request isReq[psize];
	MPI_Request reReq[psize];
	MPI_Status s[psize];
	//data
	com a[psize];//input
	com w;//omega
	com r[psize];//recieve

	int l=log(n)/log(2);
	
//	void *buffer=malloc(fsize);
//	MPI_Buffer_attach(buffer,fsize);

	com c[psize];//temp
	
	
	for(int h=l-1;h>=0;--h) {
		//initialize data each round
		MPI_Scatter(in,psize*2,MPI_DOUBLE,a,psize*2,MPI_DOUBLE,0,MPI_COMM_WORLD);
		MPI_Barrier(MPI_COMM_WORLD);
		memcpy(c,a,psize*sizeof(com));
		//calculate
		int p=pow(2,h);
		int q=n/p;
		int k;
		for(k=self*psize;k<self*psize+psize;++k) {//post all comunications first
			if(k%p==k%(2*p)) {//tag is always the sending row number
			//printf("self %d row %d dest row %d dest %d 
",self,k+self,k+p,(k+p)/psize);
				MPI_Issend(&c[k-self*psize],2,MPI_DOUBLE,(k+p)/psize,k,MPI_COMM_WORLD,&isReq[k-self*psize]);
				MPI_Irecv(&r[k-self*psize],2,MPI_DOUBLE,(k+p)/psize,k+p,MPI_COMM_WORLD,&reReq[k-self*psize]);
			} else {
				MPI_Issend(&c[k-self*psize],2,MPI_DOUBLE,(k-p)/psize,k,MPI_COMM_WORLD,&isReq[k-self*psize]);
				MPI_Irecv(&r[k-self*psize],2,MPI_DOUBLE,(k-p)/psize,k-p,MPI_COMM_WORLD,&reReq[k-self*psize]);
			}
		}
		MPI_Barrier(MPI_COMM_WORLD);
		for(k=self*psize;k<self*psize+psize;++k) {//wait and calculate
			if(k%p==k%(2*p)) {
				int time=p*(br(k,l)%q);//compute while recieving and sending
				w.real=cos(2*PI*time/n);
				w.img=sin(2*PI*time/n);
				MPI_Wait(&reReq[k-self*psize],&s[k-self*psize]);
				mul(r[k-self*psize],w,&r[k-self*psize]);
				MPI_Wait(&isReq[k-self*psize],&s[k-self*psize]);
				add(c[k-self*psize],r[k-self*psize],&c[k-self*psize]);
			} else {
				int time=p*(br(k-p,l)%q);//compute while recieving and sending
				w.real=cos(2*PI*time/n);
				w.img=sin(2*PI*time/n);
				MPI_Wait(&reReq[k-self*psize],&s[k-self*psize]);
				MPI_Wait(&isReq[k-self*psize],&s[k-self*psize]);
				mul(c[k-self*psize],w,&c[k-self*psize]);//can't modify until sending comes to an end
				sub(r[k-self*psize],c[k-self*psize],&c[k-self*psize]);
			}
		}
		MPI_Gather(c,2*psize,MPI_DOUBLE,in,2*psize,MPI_DOUBLE,0,MPI_COMM_WORLD);
	}
	//reverse all data
	for(int k=self*psize;k<self*psize+psize;++k) {//post all comunications first
		//tag is always the sending row number
		int brv=br(k,l);
		MPI_Issend(&c[k-self*psize],2,MPI_DOUBLE,brv/psize,k,MPI_COMM_WORLD,&isReq[k-self*psize]);
		MPI_Irecv(&r[k-self*psize],2,MPI_DOUBLE,brv/psize,brv,MPI_COMM_WORLD,&reReq[k-self*psize]);
	}
	MPI_Waitall(psize,isReq,s);
	MPI_Waitall(psize,reReq,s);
	MPI_Gather(r,psize*2,MPI_DOUBLE,in,psize*2,MPI_DOUBLE,0,MPI_COMM_WORLD);
	if(0==self) {
		for(int i=0;i<n;++i) {
			printf("b%d:%.4f %.4f 
",i,in[i].real,in[i].img);
		}
		writeBinary(argv[2],in,n);
	}
//	MPI_Buffer_detach(buffer,&fsize);
//	free(buffer);
	if(0==self) {
		free(buf);
	}
	MPI_Finalize();
	return 0;
}

int readBinary(char* file,void *buf,int fsize) {
	FILE *in;
	if(!(in=fopen(file,"r"))) {
		printf("can't open 
");
		return -1;
	}

	fread(buf,sizeof(char),fsize,in);
	int size=((int*)buf)[0];
	fclose(in);
	return size;
}

int writeBinary(char *file,com *array,int size) {
	FILE *out;
	if(!(out=fopen(file,"w"))) {
		printf("can't open 
");
		return -1;
	}
	int bsize=sizeof(int)+size*sizeof(com);
	void *buf=malloc(bsize);
	((int*)buf)[0]=2*size;
	memcpy(((int*)buf+1),array,size*sizeof(com));
	fwrite(buf,sizeof(char),bsize,out);
	free(buf);
	fclose(out);
	return 0;
}
void cp(com f,com *t) {
	t->real=f.real;
	t->img=f.img;
}
void add(com a,com b,com *c)   
{
	c->real=a.real+b.real;   
	c->img=a.img+b.img;   
}   
    
void mul(com a,com b,com *c)   
{   
	c->real=a.real*b.real-a.img*b.img;   
	c->img=a.real*b.img+a.img*b.real;   
}
void sub(com a,com b,com *c)   
{   
	c->real=a.real-b.real;   
	c->img=a.img-b.img;   
}   
int br(int src,int size) {
	int tmp=src;
	int des=0;
	for(int i=size-1;i>=0;--i) {
		des=((tmp&1)<<i)|des;
		tmp=tmp>>1;
	}
	return des;
}

이 내용에 흥미가 있습니까?

현재 기사가 여러분의 문제를 해결하지 못하는 경우 AI 엔진은 머신러닝 분석(스마트 모델이 방금 만들어져 부정확한 경우가 있을 수 있음)을 통해 가장 유사한 기사를 추천합니다:

Linux Shell 프로 그래 밍 - 텍스트 처리 grep, sed

사용자 가 지정 한 '모드' 에 따라 대상 텍스트 를 일치 하 게 검사 하고 일치 하 는 줄 을 인쇄 합 니 다. ##포함 되 지 않 음, 역방향 일치 \ ##키워드 앞 뒤 가 맞지 않 고 키워드 만 일치 합 니 다...

텍스트를 자유롭게 공유하거나 복사할 수 있습니다.하지만 이 문서의 URL은 참조 URL로 남겨 두십시오.

CC BY-SA 2.5, CC BY-SA 3.0 및 CC BY-SA 4.0에 따라 라이센스가 부여됩니다.

스 레 드 의 상호작용

프로그래밍 없이 데이터 SQL을 MYSQL로 내보내는 방법

좋은 웹페이지 즐겨찾기

개발자 우수 사이트 수집

개발자가 알아야 할 필수 사이트 100선 추천 우리는 당신을 위해 100개의 자주 사용하는 개발자 학습 사이트를 정리했습니다