WithinReasonHNs at duck dot com
Isn't this a software problem being solved in hardware? Ideally you would avoid going to memory in the first place by fusing the operations, which should be much faster than speeding up memory ops. E.g. you should never do an explicit im2col before a convolution; it should be fused. However, it's hard to argue with a 0.019 mm² area increase.
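To make the im2col point concrete, here is a rough pure-Python sketch (single channel, stride 1, no padding; the function names are made up for illustration and not taken from any library). The explicit version materializes every patch in an intermediate buffer before multiplying, while the fused version reads patches straight from the input and never builds that buffer.

```python
def conv2d_im2col(image, kernel):
    """Explicit im2col: copy every patch into a buffer, then multiply by the flat kernel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    # Intermediate buffer: one row of kh*kw values per output pixel.
    # This is the extra memory traffic that fusing avoids.
    cols = [
        [image[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(out_h)
        for j in range(out_w)
    ]
    flat_kernel = [kernel[di][dj] for di in range(kh) for dj in range(kw)]
    flat_out = [sum(a * b for a, b in zip(row, flat_kernel)) for row in cols]
    out = [flat_out[r * out_w:(r + 1) * out_w] for r in range(out_h)]
    buffer_elems = len(cols) * len(flat_kernel)
    return out, buffer_elems


def conv2d_fused(image, kernel):
    """Fused: gather each patch on the fly while accumulating, with no intermediate buffer."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = h - kh + 1, w - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            acc = 0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            out[i][j] = acc
    return out


image = [[4 * r + c for c in range(4)] for r in range(4)]  # 4x4 input, values 0..15
kernel = [[1] * 3 for _ in range(3)]                       # 3x3 box filter

explicit_out, buffer_elems = conv2d_im2col(image, kernel)
fused_out = conv2d_fused(image, kernel)
print(explicit_out == fused_out, buffer_elems)
```

For this toy case the im2col buffer holds 36 values for a 16-value input, so the explicit version writes and re-reads more data than the input itself. A real kernel would tile and vectorize, but the memory-traffic argument is the same.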