
gpderetta

12877

Karma

2014-02-28

Created

Recent Activity

  • Even if the prefetcher were capable of traversing pointers, it wouldn't help. The hypothetical benchmark wouldn't do anything other than chase pointers, and the prefetcher can't really do that any quicker. A traversing prefetcher is useful if the code actually does work for each traversed node; in that case the prefetcher (or the OoO machinery) could realistically run ahead.

  • In the end it depends on exactly what you want to measure. Of course, a load-load dependency will make everything as slow as the latency of the cache level you are accessing, as that becomes the bottleneck.

    Traversing a contiguous list of pointers in L1 is also slower than accessing those pointers by generating their addresses sequentially, so adding a load-load dependency is not a good way to benchmark random access vs sequential access (it is a good way to benchmark vector traversal vs list traversal, of course).

    At the end of the day you have to accept that, just as caching and prefetching speed up sequential access, OoO execution[1] will speed up (to a lesser extent) random access. Instead of memory latency, in this case the bottleneck would be the OoO queue depth, or more likely the maximum number of outstanding L1/L2/L3 (and potentially TLB) misses. As long as the maximum number of outstanding misses is lower than the memory latency for that cache level, then, to a first approximation, the CPU can effectively hide the sequential vs random access cost for independent accesses.

    Benchmarking is hard. Making sure that a microbenchmark represents your load effectively, doubly so.

    [1] Even many in-order CPUs have some run-ahead capabilities for memory.
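
To make the distinction in the comment above concrete, here is a minimal sketch of the two access patterns: a dependent pointer chase and a gather over the same slots in the same random order. Every name here (`make_single_cycle`, `chase`, `gather`, `visit_order`, `seconds`) is hypothetical and not taken from any benchmark discussed in the thread, and the array would need to be sized well beyond the cache level of interest for the timings to mean anything.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Build a permutation consisting of one single cycle (Sattolo's algorithm),
// so following next[i] visits every slot once before returning to slot 0.
std::vector<std::uint32_t> make_single_cycle(std::size_t n, std::uint64_t seed) {
    assert(n > 0);
    std::vector<std::uint32_t> next(n);
    std::iota(next.begin(), next.end(), 0u);
    std::mt19937_64 rng(seed);
    for (std::size_t i = n - 1; i > 0; --i) {
        std::uniform_int_distribution<std::size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }
    return next;
}

// Dependent loads: each address comes out of the previous load, so the
// loads serialize (the load-load dependency described above).
std::uint32_t chase(const std::vector<std::uint32_t>& next, std::size_t steps) {
    std::uint32_t i = 0;
    for (std::size_t s = 0; s < steps; ++s) i = next[i];
    return i;
}

// Record the order in which chase() visits the slots.
std::vector<std::uint32_t> visit_order(const std::vector<std::uint32_t>& next) {
    std::vector<std::uint32_t> order(next.size());
    std::uint32_t i = 0;
    for (auto& slot : order) { slot = i; i = next[i]; }
    return order;
}

// Independent loads: same slots, same random order, but the addresses come
// from a separate array that is read sequentially, so no load depends on
// the result of another random load.
std::uint64_t gather(const std::vector<std::uint32_t>& data,
                     const std::vector<std::uint32_t>& order) {
    std::uint64_t sum = 0;
    for (std::uint32_t idx : order) sum += data[idx];
    return sum;
}

// Wall-clock timing helper; the result is added to `sink` so the compiler
// cannot discard the measured work as unused.
template <class F>
double seconds(F&& f, std::uint64_t& sink) {
    auto t0 = std::chrono::steady_clock::now();
    sink += f();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Timing `chase(next, next.size())` against `gather(next, visit_order(next))` for the same `next` should, on an out-of-order core, show the gather finishing sooner even though both touch the same addresses in the same order. Note that the gather also reads the `order` array, so it is not a like-for-like measurement of memory traffic.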

  • Actually you are right, what I said about reordering is nonsense. The compiler will definitely reorder non-aliasing accesses. There are much weaker properties that are preserved.

  • Yes, even a 20% success rate seems quite high.

  • > Considering that it's Undefined Behavior, quite possibly.

    Is it though? Certainly it is according to the C and C++ standards, but POSIX adds:

    > References to unmapped addresses shall result in a SIGSEGV signal

    While time-traveling UB is a theoretical possibility, in practice POSIX-compliant compilers won't reorder around potentially trapping operations (they will do the reverse: they might remove a null check if it is made redundant by a prior potentially trapping dereference).

    A real concern is a null pointer being dereferenced with a large attacker-controlled offset that can avoid the trap, but that's more an issue of failing to bounds-check.
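
The two cases in the comment above can be sketched as follows. Both functions are hypothetical illustrations, not code from the thread: `load_then_check` shows the shape in which an optimizer may treat a null check as redundant, and `checked_read` shows the bounds check that addresses the large-offset case.

```cpp
#include <cstddef>

// The dereference comes first. After it, an optimizer is allowed to assume
// p is non-null and may delete the check below as dead code. With a null p,
// the load itself is what traps on a system that leaves page zero unmapped.
int load_then_check(const int* p) {
    int v = *p;          // potentially trapping access
    if (p == nullptr)    // redundant after the dereference; may be removed
        return -1;
    return v;
}

// A large attacker-controlled index can move the access far away from
// address zero into mapped memory, so no trap occurs. Validating the index
// against the known length (and the base against null) closes that hole.
bool checked_read(const int* base, std::size_t len, std::size_t index, int& out) {
    if (base == nullptr || index >= len) return false;
    out = base[index];
    return true;
}
```

Whether a given compiler actually removes the check in `load_then_check` depends on the optimization level and target, so this is best read as an illustration of the pattern rather than a guaranteed outcome.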
