MPI Programming

     Hi niners, I have a different kind of question today; it relates to my master's work rather than my job. Not sure if anyone here is familiar with MPI, but here goes. I have this:

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    if(my_rank == 0) {
      for(i=1; i<=n; ++i)
        a[i-1] = i-1;
      // Distribute
      MPI_Send(a, n, MPI_INT, my_rank+1, n, MPI_COMM_WORLD);
    } else {
      // Receive and distribute initial values
      MPI_Recv(a, n, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
      k = status.MPI_TAG/2;
      if(k!=0) {
        MPI_Send(a+1, k, MPI_INT, my_rank+1, k, MPI_COMM_WORLD); // pass sub-array [1..k]
        MPI_Send(a+k+1, k, MPI_INT, k+my_rank+1, k, MPI_COMM_WORLD); // pass sub-array [k+1..2k]
      }
      partial = a[0];
      printf("rank: %d\tvalue: %d\n", my_rank, partial);
      // Carry out Prefix Sum algorithm : for 1 to log2(n)   - THIS IS WHERE IT FAILS
      j = (int)(log((float)n+1)/log((float)2));
      //printf("Log(n) = %d\n", j);
      for(i=0; i<j; ++i) {
        k = (int)(pow((float)2,(float)i));
        printf("**rank: %d, k=%d\n", my_rank, k);
        if(my_rank+k<=n) MPI_Send(&partial, 1, MPI_INT, my_rank+k, 0, MPI_COMM_WORLD/*, &request*/);
        if(my_rank-k>=1) {
          MPI_Recv(&value, 1, MPI_INT, my_rank-k, 0, MPI_COMM_WORLD, &status);
          partial += value;
        }
      }
      printf("rank: %d\tpartial: %d\n", my_rank, partial);
    }

    And it's deadlocking. It works about 80% of the time and fails the rest of the time. I've noticed that when it fails, a process has usually sent its messages and ended before the receiver has received them. Does anyone know if an MPI process can send a message and finalize before the message is received?

    I can't see why it's deadlocking. Any ideas?

     Yay, I figured out the problem. I'll leave it up for any curious coders that want to spot the error. If you're able to spot it, treat yourself to a cookie.

    Didn't really look too much into your code, and I haven't done MPI in a while, but usually a deadlock is the result of bad send and receive ordering, using the wrong pairing of send and receive (one blocking, the other non-blocking), or using non-blocking calls without accounting for where the slave or master is in the process.
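
    One way to sanity-check the send/receive pairing in the scan loop from the question, without launching MPI at all, is to replay it serially. The sketch below is not the original poster's code: floor_log2, simulate_prefix_sum and MAX_RANKS are hypothetical names, and it assumes the setup the question seems to imply (worker ranks 1..n with rank 0 only distributing, n of the form 2^m - 1, and rank r holding the value r - 1 once the distribution phase is done). In the step with offset k, rank r's send to rank r + k is matched by rank r + k's receive from rank r, so every rank with a left neighbour k places away adds that neighbour's partial. The sketch also swaps the floating-point log()/pow() for integer shifts, so the step count does not depend on rounding.

```c
#include <assert.h>

#define MAX_RANKS 64 /* hypothetical upper bound for this sketch */

/* Integer floor(log2(x)) for x >= 1, using shifts instead of log()/pow(). */
int floor_log2(unsigned x)
{
    int steps = 0;
    assert(x >= 1);
    while (x >>= 1)
        ++steps;
    return steps;
}

/*
 * Serially replay the scan loop for worker ranks 1..n.
 * Assumption: after distribution, rank r holds the value r - 1.
 * partial[0] is unused because rank 0 only distributes.
 * The step count floor(log2(n + 1)) mirrors the loop bound in the question.
 */
void simulate_prefix_sum(int n, int partial[])
{
    int next[MAX_RANKS + 1];
    int steps, i, r, k;

    assert(n >= 1 && n <= MAX_RANKS);
    for (r = 1; r <= n; ++r)
        partial[r] = r - 1;

    steps = floor_log2((unsigned)n + 1);
    for (i = 0; i < steps; ++i) {
        k = 1 << i; /* offset 1, 2, 4, ... */
        for (r = 1; r <= n; ++r) {
            /* Rank r receives from rank r - k only if that rank exists;
               rank r - k's send guard (r - k) + k <= n holds automatically. */
            next[r] = partial[r];
            if (r - k >= 1)
                next[r] += partial[r - k];
        }
        /* Commit the step only after every rank has read the old values. */
        for (r = 1; r <= n; ++r)
            partial[r] = next[r];
    }
}
```

    Calling simulate_prefix_sum(7, p) with an array of at least 8 ints leaves p[1]..p[7] holding 0, 1, 3, 6, 10, 15, 21. If the assumptions above match the real program, those are the "partial" values a completed run should print, which gives something concrete to compare against when a run does not hang.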

