Interesting systems to check:
Tenstorrent (startup): https://www.tenstorrent.com/technology/
Their architecture lets cores run more independently of one another than current GPUs/TPUs do. In their words: "Fully programmable architecture that supports fine-grain conditional execution, dynamic sparsity handling, and an unprecedented ability to scale via tight integration of computation and networking."
Cerebras Systems: Building the World’s First Wafer-Scale Processor
https://youtu.be/LiJaHflemKU How The World's Largest AI/ML Training System Was Built (Cerebras)
https://cerebras.net/product/