On MIPS you can simulate atomics with a load-linked/store-conditional (LL/SC) loop. If another processor has changed the same address between the LL and SC instructions, the SC fails to store the result and you have to retry. The underlying idea is that the processors would have to communicate memory accesses to each other via the cache coherence protocol anyway, so they can easily detect conflicting writes between the LL and SC instructions. It gets more complicated with out-of-order execution...
loop: LL r2, (r1)
ADD r3, r2, 1
SC r3, (r1)
BEQ r3, 0, loop
NOPThere's also Dask, which can do distributed pandas and numpy operations etc. However it was originally developed for traditional HPC systems and has only limited support for GPU computing. https://www.dask.org/
Back in the 1990's. As an example, back then the Rational Rose design software had a feature to generate UML diagrams from existing source code, and it was called "reverse engineering".