...

GregarianChild

Karma: 403

Created: 2020-02-23

Recent Activity

  • Chisel has a compiler to Verilog. That is not the problem. Many semiconductor companies use a tool-chain that generates large amounts of Verilog from higher-level sources.

    The rumour I heard was this: the problem with Chisel was that (at least in the past) the Chisel compiler did not preserve port structure well. So if you had Chisel source that translated to 80M LoC of Verilog, then verified those 80M lines (which is very expensive), then made a tiny change to the source Chisel, the resulting new Verilog would use different port names even for the parts that were not affected by the change. (To quip: the (old?) Chisel compiler was a bit of a hash function ...) So you have to re-verify the whole 80M lines of Verilog. That is prohibitively expensive compared to re-verifying only the parts that truly need to change. The verification costs forced by this problem were rumoured to have nearly sunk a company.

    This is a compiler problem, not a Chisel language problem. I was told that the compiler problem has been fixed since. But I did not check this.
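
    To make the incremental-verification idea concrete, here is a minimal sketch (a hypothetical script, not a real tool, with made-up regex-based parsing): hash each generated module's port list and re-verify only the modules whose interface hash changed. If the compiler renames ports in untouched modules, every hash changes and nothing can be reused.

      # Hypothetical sketch: decide which generated Verilog modules need
      # re-verification by hashing each module's (ANSI-style) port list.
      # If port names are unstable across compiler runs, every hash changes
      # and the whole design must be re-verified.
      import hashlib
      import re

      MODULE_RE = re.compile(r"module\s+(\w+)\s*\((.*?)\);", re.DOTALL)

      def interface_hashes(verilog_text):
          """Map module name -> hash of its normalised port list."""
          hashes = {}
          for name, ports in MODULE_RE.findall(verilog_text):
              normalised = ",".join(p.strip() for p in ports.split(","))
              hashes[name] = hashlib.sha256(normalised.encode()).hexdigest()
          return hashes

      def modules_to_reverify(old_text, new_text):
          old, new = interface_hashes(old_text), interface_hashes(new_text)
          return {name for name in new if old.get(name) != new[name]}

    With stable port names this set stays small after a tiny Chisel change; with a "hash function" compiler it is essentially every module.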

  • The "intersection of all sets such that" is not vague at all. It's perfectly formally defined in ZF* set theories. But it's impredicative. One of the guiding ideas behind type theories is to minimise impredicative constructions as much as possible. After all, impredicative definitions are circular ... Of course there is no free lunch and the power of impredicative constructions needs to be supplied in other ways in type theories ...

  • The reason that VLIW/EPIC architectures have not been successful for mainstream workloads is the combination of

    • the "memory wall",

    • the static unpredictability of memory access, and

    • the lack of sufficient parallelism for masking latency.

    Together, those make dynamic instruction scheduling just much more efficient (a toy comparison is sketched below).
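
    A toy latency model, with all numbers invented for illustration: a statically scheduled in-order machine stalls on each cache miss because the compiler cannot know at compile time which loads will miss, while an out-of-order core keeps several misses in flight and hides them behind independent work it discovers at runtime.

      # Toy model (all numbers invented). In-order / static: every miss stalls
      # the core, one after another. Out-of-order / dynamic: up to MLP misses
      # are overlapped with independent work found at runtime.
      import random

      random.seed(0)
      N_LOADS, HIT_CYCLES, MISS_CYCLES, MISS_RATE = 1000, 4, 300, 0.1
      MLP = 8   # assumed number of outstanding misses an OoO core can overlap

      latencies = [MISS_CYCLES if random.random() < MISS_RATE else HIT_CYCLES
                   for _ in range(N_LOADS)]
      misses = [l for l in latencies if l == MISS_CYCLES]
      hits   = [l for l in latencies if l == HIT_CYCLES]

      in_order_cycles = sum(latencies)                 # each miss serialised
      ooo_cycles      = sum(hits) + sum(misses) / MLP  # misses overlapped

      print(f"in-order ~{in_order_cycles}, out-of-order ~{ooo_cycles:.0f} cycles")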

    Dataflow has been tried many, many, many times for general-purpose workloads. And every time it failed for general-purpose workloads. In the early 2020s I was part of an expensive team doing a blank-slate dataflow architecture for a large semi company: the project got cancelled because the performance figures were weak relative to the complexity of the micro-architecture, which was high (hence expensive verification and large area). As one of my colleagues on that team said: "Everybody wants to work on dataflow until he works on dataflow." Regarding the history of dataflow architectures, [1] is from 1975, so half a century old this year.

    [1] J. Dennis, A Preliminary Architecture for a Basic Data-Flow Processor https://courses.cs.washington.edu/courses/cse548/11au/Dennis...

  • Modern GPU instructions are often VLIW and the compiler has to do a lot of work to schedule them. For example, Nvidia's Volta (from 2017) uses 128 bits to encode each instruction. According to [1], the 128 bits in a word are used as follows:

    • at least 91 bits are used to encode the instruction

    • at least 23 bits are used to encode control information associated with multiple instructions

    • the remaining 14 bits appear to be unused

    AMD GPUs are similar, I believe. VLIW is good for instruction density. VLIW was unsuccessful in CPUs like Itanium because the compiler was expected to handle (unpredictable) memory access latency. That is not possible, even today, for largely sequential workloads. But GPUs typically run highly parallel workloads (e.g. MatMul), and the dynamic scheduler can just 'swap out' threads that are waiting on memory loads (a back-of-envelope sketch of this is below the reference). Your GPU will also perform terribly on highly sequential workloads.

    [1] Z. Jia, M. Maggioni, B. Staiger, D. P. Scarpazza, Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking. https://arxiv.org/abs/1804.06826
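
    The back-of-envelope for latency hiding, with purely illustrative numbers (these are assumptions, not measured Volta figures): if a load takes roughly 400 cycles to return and a warp has only about 4 cycles of independent work before it needs the loaded value, the scheduler needs on the order of 100 warps' worth of runnable work to stay busy.

      # Back-of-envelope latency hiding (illustrative numbers, not Volta specs).
      MEM_LATENCY_CYCLES = 400      # assumed round-trip latency of a load
      WORK_PER_WARP_CYCLES = 4      # assumed independent work before a warp stalls

      # Resident warps needed so that, while one warp waits on memory,
      # others always have something to issue:
      warps_needed = MEM_LATENCY_CYCLES / WORK_PER_WARP_CYCLES
      print(warps_needed)           # 100.0 -> easy for MatMul-style parallelism,
                                    # hopeless for one long sequential dependency chain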

  • I'd be interested to learn who paid for this machine!

    Did Sandia pay list price? Or did SpiNNcloud Systems give it to Sandia for free (or at least at a heavily subsidised price)? I conjecture the latter. Maybe someone from Sandia is on the list here and can provide details?

    SpiNNcloud Systems is known for making misleading claims. E.g. their home page https://spinncloud.com/ lists DeepMind, DeepSeek, Meta and Microsoft as "Examples of algorithms already leveraging dynamic sparsity", giving the false impression that those companies use SpiNNcloud Systems machines, or the specific computer architecture SpiNNcloud Systems sells. Their claims about energy efficiency (like "78x more energy efficient than current GPUs") also seem sketchy. How do they measure energy consumption and trade it off against compute capacity? E.g. a Raspberry Pi draws less absolute power than an NVIDIA Blackwell, but is that a meaningful comparison?
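
    The Raspberry Pi vs Blackwell point, made concrete (the numbers below are rough, invented placeholders, not measurements): what would need to be compared is energy per unit of work on the workload of interest, not absolute power draw.

      # Rough illustration only; the figures are invented placeholders.
      # Absolute power says nothing by itself; normalise by delivered compute.
      def joules_per_gflop(power_watts, gflops):
          return power_watts / gflops      # J/GFLOP = W / (GFLOP/s)

      small_board = joules_per_gflop(power_watts=5.0,    gflops=30.0)       # ~0.17
      big_gpu     = joules_per_gflop(power_watts=1000.0, gflops=2_000_000)  # ~0.0005

      print(small_board, big_gpu)  # the low-power device is far less efficient here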

    I'd also like to know how to program this machine. Neuromorphic computers have so far been terribly difficult to program. E.g. have JAX, TensorFlow and PyTorch been ported to SpiNNaker 2? I doubt it.
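
    For reference, programming SpiNNaker-class machines today typically means describing a spiking network, e.g. via the PyNN API (sPyNNaker is the SpiNNaker 1 front end; whether and how this carries over to SpiNNaker 2 is exactly the open question), rather than writing tensor code. A minimal sketch of that style, with the backend import being an assumption on my part:

      # Minimal PyNN-style sketch. The spiNNaker backend import is an assumption
      # (sPyNNaker targets SpiNNaker 1; availability for SpiNNaker 2 is unclear).
      import pyNN.spiNNaker as sim

      sim.setup(timestep=1.0)                                  # ms
      stim = sim.Population(100, sim.SpikeSourcePoisson(rate=10.0))
      neurons = sim.Population(100, sim.IF_curr_exp())
      sim.Projection(stim, neurons, sim.OneToOneConnector(),
                     synapse_type=sim.StaticSynapse(weight=0.5, delay=1.0))
      neurons.record("spikes")
      sim.run(1000.0)                                          # ms
      spikes = neurons.get_data("spikes")
      sim.end()

    The contrast with a few lines of tensor code is the point: the programming model is population-and-spike based, not tensor based, which is why porting JAX/TensorFlow/PyTorch is non-trivial.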

HackerNews