Monday, December 26, 2011

Graph convolution is a natural fit for SMP

Wikipedia has a good summary:
Mesh architectures avoid these bottlenecks, and provide nearly linear scalability to much higher processor counts at the sacrifice of programmability:
Serious programming challenges remain with this kind of architecture because it requires two distinct modes of programming, one for the CPUs themselves and one for the interconnect between the CPUs. A single programming language would have to be able to not only partition the workload, but also comprehend the memory locality, which is severe in a mesh-based architecture.[1]
A computer system that uses symmetric multiprocessing is called a symmetric multiprocessor or symmetric multiprocessor system (SMP system).[2][3] SMP systems allow any processor to work on any task no matter where the data for that task are located in memory, provided that each task in the system is not in execution on two or more processors at the same time; with proper operating system support, SMP systems can easily move tasks between processors to balance the workload efficiently.

Graphs in self-directed nested form, with forward pointers, fit the SMP architecture. So we can conceive of an ideal graph layer: read a whole burst of nested form into shared memory. It does not matter whether it comes from one graph or a zillion graphs, as long as the graphs meet the syntax.
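To make the nested form concrete, here is a minimal sketch in Python. The layout is an assumption of mine, not a fixed format: each slot is a hypothetical (label, span) pair stored in preorder, where span counts the slots in that node's subtree, so adding span to a slot's index is the forward pointer to the next sibling. The names flatten and well_formed are made up for illustration.

```python
from typing import List, Optional, Tuple

# Assumed slot layout: (label, span). `span` is the number of slots in the
# node's subtree, itself included, so index + span points at the next sibling.
Slot = Tuple[str, int]

def flatten(tree, out: Optional[List[Slot]] = None) -> List[Slot]:
    """Flatten a nested (label, [children]) tree into one preorder list."""
    if out is None:
        out = []
    label, children = tree
    pos = len(out)
    out.append((label, 0))               # placeholder until the span is known
    for child in children:
        flatten(child, out)
    out[pos] = (label, len(out) - pos)   # relative forward pointer
    return out

def well_formed(store: List[Slot], start: int = 0, end: Optional[int] = None) -> bool:
    """Syntax check: every span must stay inside its parent's span."""
    if end is None:
        end = len(store)
    i = start
    while i < end:
        span = store[i][1]
        if span < 1 or i + span > end:
            return False
        if not well_formed(store, i + 1, i + span):
            return False
        i += span
    return True

# Because the pointers are relative, blocks from different graphs can simply
# be concatenated into one shared store.
graph_a = flatten(("a", [("b", []), ("c", [("d", [])])]))
graph_b = flatten(("x", [("y", [])]))
store = graph_a + graph_b
```

With relative spans, the store does not care whether its slots came from one graph or many; it only has to pass the syntax check.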

Then get some microprocessors, it doesn't matter how many, and give all of them access to the shared memory. Each microprocessor executes the convolution function at the request of the data. The only thing we need to take care of is making sure the threads of convolution coalesce into the proper output segments, which is more of an ordering problem than a shared memory problem.
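The ordering point can be sketched in a few lines of Python. Everything here is an assumption for illustration: the store is a hypothetical preorder list of (label, span) slots, where span is the slot count of a node's subtree, and convolve is a stand-in that just joins labels, not the real convolution. Workers read the shared store freely, and the output segments are collected in input order no matter which worker finishes first.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared store: three top-level subtrees in preorder.
STORE = [("a", 3), ("b", 1), ("c", 1), ("d", 2), ("e", 1), ("f", 1)]

def top_level_segments(store):
    """Yield (start, end) bounds for each top-level subtree."""
    i = 0
    while i < len(store):
        span = store[i][1]
        yield (i, i + span)
        i += span

def convolve(bounds):
    """Stand-in for the real convolution: join the labels of one segment."""
    start, end = bounds
    return "".join(label for label, _ in STORE[start:end])

def run(workers=4):
    segments = list(top_level_segments(STORE))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so the outputs coalesce
        # into the proper segments regardless of completion order.
        return list(pool.map(convolve, segments))

print(run(4))
```

The result is the same for any worker count, which is the property that matters. In standard CPython builds the threads share memory but do not run CPU-bound work in parallel, so this shows the coordination pattern only, not the speedup.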

I think we are looking for SMP systems with four, maybe eight, independent processors doing convolutions into and out of a huge chunk of RAM. Every so often, when a huge chunk of RAM has become free, we load in a huge block of nested store.

Multiprocessing raises no special issue for self-directed graph convolutions. With RAM bandwidth at about 10,000 times the disk I/O bandwidth, the natural setup is one disk per eight SMP processors, and pile on the shared RAM. The ultimate efficiency comes when you have enough processors to chew up the RAM bandwidth, at which point the disk rate and the RAM rate are equalized.
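A rough back-of-envelope version of that balance, in Python. The units are relative to one disk, the 10,000 ratio is the figure from the paragraph above, and the per-processor rate is a made-up number chosen only so that eight processors come out. Reading "equalized" as "disk and RAM are equally busy" is my interpretation.

```python
import math

def processors_to_saturate(ram_bw, per_cpu_bw):
    """How many processors it takes to use up the RAM bandwidth."""
    return math.ceil(ram_bw / per_cpu_bw)

def ram_passes_per_load(ram_bw, disk_bw):
    """How many times each loaded byte must be touched in RAM
    for the disk and the RAM to stay equally busy."""
    return ram_bw / disk_bw

DISK_BW = 1.0          # one disk, taken as the unit of bandwidth
RAM_BW = 10_000.0      # the 10,000x ratio from the text
PER_CPU_BW = 1_250.0   # hypothetical rate at which one processor consumes RAM

print(processors_to_saturate(RAM_BW, PER_CPU_BW))
print(ram_passes_per_load(RAM_BW, DISK_BW))
```

Under these assumed numbers, a block loaded from disk has to be worked over in RAM thousands of times before the next load is worth waiting for, which is why piling on shared RAM per disk makes sense.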

SMP and simplicity engineering match here.
