David Patterson (who else) experimented with the idea almost 20 years ago. The idea was to put the CPU into the DRAM, remove the caches, and connect to the rest of the world with fast serial interfaces.
With wafer stacking, it's not unreasonable to have high interconnect density between two different highly optimized processes (e.g. DRAM and FinFET logic) with high yield. Latency to extremely large caches could be improved significantly. I expect it to happen sooner rather than later.
An in memory processor is not simply a processor in a memory array. It's more like a processor based on cellular automata where each memory cell also performs computation.
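A toy way to see the cellular-automaton framing: below is a purely illustrative Python sketch (not any real PIM design) where every "memory cell" stores one bit and its update rule *is* the computation, driven by its neighbors.

```python
# Hypothetical sketch: a memory array where each cell both stores a bit and
# computes from its neighbors, in the spirit of a cellular automaton.
def step(cells):
    """One compute step: each cell becomes the XOR of its two neighbors
    (elementary cellular automaton rule 90, with wrap-around)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

memory = [0, 0, 0, 1, 0, 0, 0, 0]   # the "stored" data
for _ in range(3):
    memory = step(memory)           # storage and computation are one operation
print(memory)
```

The point of the sketch is that there is no separate ALU reading from the array; the array's own update rule does the work.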
My thinking is that analog signals are a more natural and effective processing medium for hardware neural nets. The "memory" component of the chip might be constant signals representing the weight of each (initial) connection in the network, which amplify the downstream signal.
Contrast this with digital circuits, where the amplification of each transistor is driven by a high/low signal from an upstream transistor representing a 0/1, making it a switch.
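To make the contrast concrete, here is a minimal functional model (all names hypothetical, no real device physics) of an analog crossbar column: stored conductances act as the weights, input voltages are scaled by them, and the products sum on the shared output wire by Kirchhoff's current law.

```python
# Illustrative sketch: in an analog crossbar, the memory cells ARE the
# multipliers -- a stored conductance G scales an input voltage V, and
# the currents V*G sum on the column wire for free.
def crossbar_column(input_voltages, conductances):
    """Output current = sum of V_i * G_i for one column of the array."""
    return sum(v * g for v, g in zip(input_voltages, conductances))

weights = [0.5, -0.25, 1.0]    # constant signals stored "in memory"
inputs = [1.0, 2.0, 0.5]       # incoming activations as voltages
print(crossbar_column(inputs, weights))
```

A digital gate in the same position would only pass or block a 0/1; here the stored value continuously amplifies (or attenuates) the signal flowing through it.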
Currently, PIM (Processing In Memory) has no reasonable programming model. Hardware people don't seem to understand that without a programming model, hardware capability will remain unused. As long as this is not solved, PIM will continue to fail.
The most reasonable one I have seen is Ambit from Microsoft Research. Ambit looks nice for its proposed workloads, but it is still unclear how it can be extended to more general computation.
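For readers unfamiliar with Ambit: its core trick is that simultaneously activating three DRAM rows yields the bitwise majority of those rows, and a control row of all-0s or all-1s turns that majority into AND or OR. Below is a behavioral model only (function names are mine; real Ambit works via DRAM charge sharing, not Python).

```python
# Functional sketch of Ambit-style triple-row activation: the result is the
# bitwise majority of three DRAM rows, computed one full row per command.
def triple_row_activate(row_a, row_b, row_c):
    """Bitwise MAJ(a, b, c) for each cell position across three rows."""
    return [(a & b) | (b & c) | (a & c) for a, b, c in zip(row_a, row_b, row_c)]

A = [1, 0, 1, 1]
B = [1, 1, 0, 1]
print(triple_row_activate(A, B, [0, 0, 0, 0]))  # MAJ with all-0s row = A AND B
print(triple_row_activate(A, B, [1, 1, 1, 1]))  # MAJ with all-1s row = A OR B
```

The whole row is processed per command, which is where the bulk-bitwise throughput comes from, and also why it is unclear how to generalize beyond such operations.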
> How close would this be to a drop-in replacement for a GPU?
Very far away from it. A GPU is a clever bit of kit, but it still has a memory bus; what is described here does away with that bus and moves computational capability right next to the memory cells.
An FPGA used for this kind of application would incur significant overhead compared to an ASIC, because large chunks of the FPGA would sit idle.
If in-memory processors are going to be a thing, you could prototype them (very inefficiently) on an FPGA, which would be an interesting thing to do, but not interesting enough to give it a commercial edge in that particular form of packaging.
I meant more in spirit. What is an FPGA if not a very small computing device (a logical function block) next to a memory (the configuration of said function block)?
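That framing can be sketched in a few lines: a single FPGA logic cell is essentially a lookup table whose "memory" is its configuration bits and whose "compute" is just indexing into them. This is an illustrative model only (the `make_lut` helper is made up, not any vendor API).

```python
# Hypothetical model of one FPGA logic cell: the memory is the LUT's
# configuration bits; the computation is merely an index into that memory.
def make_lut(truth_table):
    """truth_table: the output bit for each input combination (config memory)."""
    def lut(*inputs):
        index = 0
        for bit in inputs:
            index = (index << 1) | bit   # pack input bits into an address
        return truth_table[index]
    return lut

xor2 = make_lut([0, 1, 1, 0])   # "configure" a 2-input cell as XOR
print(xor2(1, 0))
```

Reprogramming the chip is just rewriting `truth_table`, which is exactly the compute-next-to-memory spirit being discussed.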
The configuration part of an FPGA is not at all like normal memory; it is much more like a flash device, and one that can't be reconfigured as easily, due to the internal structure of the FPGA. It is more akin to a series of fuses that you can blow to create a circuit than to memory, and if it were to be seen as memory it would be (P)ROM rather than RAM.
In order to make part of the FPGA work as memory, you'd have to spend some of that capacity using the logical function blocks to emulate the memory.
Indeed. Now imagine what people already tried years ago on FPGAs: rewriting the memory configuration in a machine-learning or evolutionary fashion. With RAM instead of flash(-like) memory, the iterations could go much, much faster.
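A toy version of that evolutionary loop, to show why iteration speed on the configuration memory matters (everything here is illustrative: the "configuration" is just a bitstring and the fitness function is a stand-in):

```python
import random

# Toy sketch of "evolve the configuration": treat the fabric's config memory
# as a bitstring and hill-climb it, one bit-flip per iteration. Each iteration
# is one rewrite of the configuration -- fast if the config lives in RAM.
random.seed(0)

target = [1, 0, 1, 1, 0, 0, 1, 0]   # stand-in for the desired behavior
config = [0] * len(target)          # initial configuration memory

def fitness(c):
    return sum(a == b for a, b in zip(c, target))

for _ in range(1000):
    candidate = config[:]
    i = random.randrange(len(candidate))
    candidate[i] ^= 1               # mutate one configuration bit
    if fitness(candidate) >= fitness(config):
        config = candidate          # keep the mutation if it doesn't hurt
    if fitness(config) == len(target):
        break                       # converged

print(config)
```

Each loop iteration corresponds to one reconfiguration of the device, so config memory that rewrites in nanoseconds rather than milliseconds directly multiplies how many generations you can evaluate.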
http://iram.cs.berkeley.edu/
I'm guessing that mixing CMOS logic and DRAM in the same process is hard to do at large scale.