mystran wrote: ↑Sat Jul 10, 2021 5:20 pm
> The problem with anything having to do with cache is that there really is no nutshells and there is no simple answers. Sequential access as such usually mostly doesn't even matter (well, again "it depends"), but using complete cache lines is much faster than fetching a few bytes here and another bytes for somewhere else.

I probably won't use a LUT in this particular EG case. But the simple guideline still holds most of the time: I've never seen a randomly accessed LUT outperform serial/ordered/localized access. At worst they perform the same, and only when many other things are going on. So as a matter of good coding design, one should try to localize memory accesses within a given time window, and the longer that window is, the better. In other words, traversal order matters. Put differently, data access patterns matter. All of these are nutshells.
As for the order of code execution and its cache interference with data access patterns: modern CPUs have separate L1 data and instruction caches, which should help to a certain extent.
If you don't like nutshells, here is a good one-hour talk about it: https://www.youtube.com/watch?v=WDIkqP4JbkE