Honestly, is there any better way to spend a day (or two) on Pi Day (March 14 = “3.14”) than to write some code to compute the value of Pi?!
You may remember from your math coursework Newton’s Formula, which is easy to implement:
Or, you may have seen the Monte Carlo method. You could “speed this up” using a CUDA GPU implementation, e.g., http://cacs.usc.edu/education/cs596/src/cuda/pi.cu:
// Using CUDA device to calculate pi #include <stdio.h> #include <cuda.h> #define NBIN 10000000 // Number of bins #define NUM_BLOCK 30 // Number of thread blocks #define NUM_THREAD 8 // Number of threads per block int tid; float pi = 0; // Kernel that executes on the CUDA device __global__ void cal_pi(float *sum, int nbin, float step, int nthreads, int nblocks) { int i; float x; int idx = blockIdx.x*blockDim.x+threadIdx.x; // Sequential thread index across the blocks for (i=idx; i< nbin; i+=nthreads*nblocks) { x = (i+0.5)*step; sum[idx] += 4.0/(1.0+x*x); } } // Main routine that executes on the host int main(void) { dim3 dimGrid(NUM_BLOCK,1,1); // Grid dimensions dim3 dimBlock(NUM_THREAD,1,1); // Block dimensions float *sumHost, *sumDev; // Pointer to host & device arrays float step = 1.0/NBIN; // Step size size_t size = NUM_BLOCK*NUM_THREAD*sizeof(float); //Array memory size sumHost = (float *)malloc(size); // Allocate array on host cudaMalloc((void **) &sumDev, size); // Allocate array on device // Initialize array in device to 0 cudaMemset(sumDev, 0, size); // Do calculation on device cal_pi <<<dimGrid, dimBlock>>> (sumDev, NBIN, step, NUM_THREAD, NUM_BLOCK); // call CUDA kernel // Retrieve result from device and store it in host array cudaMemcpy(sumHost, sumDev, size, cudaMemcpyDeviceToHost); for(tid=0; tid<NUM_THREAD*NUM_BLOCK; tid++) pi += sumHost[tid]; pi *= step; // Print results printf("PI = %f\n",pi); // Cleanup free(sumHost); cudaFree(sumDev); return 0; }
Unfortunately, while these are easy to implement, they are terrible algorithms, converging on the value of Pi very, very slowly indeed–even using the parallelism of GPUs. Although I’ve read about Chudnovsky’s Formula, I never tried it, probably because I had my priorities wrong. Beside, all of these require an arbitrary precision math package, which I didn’t have.
But, it turns out there are some: MPIR (Multiple Precision Integers and Rationals), MPIR.NET, and LongCalc
So, I wish everybody a belated Pi Day with a simple C# program that uses LongCalc (Mpir.NET/MPIR).
namespace Pi { using System; using LongCalc; class Pi_ChudnovskyFormula { public static bf Compute(int times) { // https://en.wikipedia.org/wiki/Pi#Rapidly_convergent_series bf bigK = 6; bf bigM = 1; bf bigL = 13591409; bf bigX = 1; bf bigS = 13591409; bf mul = new bf("-262537412640768000"); System.Console.WriteLine("iterations " + times); for (int k = 1; k < times; ++k) { bf bfk = k; if (k % 1000 == 0) System.Console.Write(" " + k); bf bfk3 = bfk * bfk * bfk; bf bigK3 = bigK * bigK * bigK; bf bigK4 = (bigK * 16); bigM = (bigK3 - bigK4) * bigM / bfk3; bigL += 545140134; bigX = bigX.Multiply(mul); bf ml = bigM * bigL; bigS += ml / bigX; bigK += 12; } System.Console.WriteLine(); bf sq = new bf(10005).Sqrt(); bf pi = 426880 * sq / bigS; return pi; } } class Program { static void Main(string[] args) { int digits = 50000; LongCalc.bf.GlobalPrecision = (int)(digits * Math.Log(10, 2) * 2); // bits int times = LongCalc.bf.GlobalPrecision / 14; // approximate accuracy added to significand per iteration. bf r2 = Pi_ChudnovskyFormula.Compute(times); System.Console.WriteLine(r2.toString(10, false, digits)); } } }
3.1415926535897932384626433...