Fire Sledge - Ohm Force wrote: Sun Jun 07, 2020 8:20 pm
That’s why the DK method is nice: it formalizes what you described. The outer loop is just made of a few matrix-vector products. Non-linearities are isolated so the inner loop can focus on solving the voltages or currents at the junctions

I think you're missing the point I'm trying to make. You don't want any matrix-vector products. You want to expand (and then optimize) the whole thing as scalar code, and you want to do whatever high-level optimisations make said scalar code as short as possible (i.e. the fewest FLOPs). Working at this level, the best strategy I know of involves using partial LU factorisations (and tracking fill-in to take advantage of whatever sparsity is left), but feel free to compare other options.
Basically, if you don't have a high-level optimizing code generator that spits out something that looks like glorified assembly, then you're doing it wrong. You probably don't want to produce actual assembly though, because you want to take advantage of your compiler's register allocator (which is basically the single compiler optimization that gives you the most speedup here).
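To make the idea concrete, here is a minimal sketch of such a generator, written in Python purely for illustration. It is not anyone's production tool, and the names `generate_lu_solve` and `compile_solver` are made up for this example. It takes only the sparsity pattern of a matrix, runs a symbolic LU elimination while tracking fill-in, and emits straight-line scalar statements that skip every structurally zero entry. To keep it short it assumes a full LU with no pivoting and a structurally nonzero diagonal; a partial factorisation would stop the elimination early and leave the remaining block for the nonlinear solve, which is not shown here.

```python
def generate_lu_solve(pattern):
    """Emit straight-line scalar code that solves A x = b.

    pattern[i][j] is True where A[i][j] is structurally nonzero.
    Assumes no pivoting is needed and the diagonal is nonzero.
    Returns (statements, fill_in), fill_in being the entries created
    by the elimination.
    """
    n = len(pattern)
    nz = [list(row) for row in pattern]  # working copy, grows with fill-in
    lines, fill_in = [], []

    # Load only the structurally nonzero entries into scalar temporaries.
    for i in range(n):
        for j in range(n):
            if nz[i][j]:
                lines.append(f"a_{i}_{j} = A[{i}][{j}]")
        lines.append(f"b_{i} = b[{i}]")

    # Forward elimination, fully unrolled.
    for k in range(n):
        lines.append(f"inv_{k} = 1.0 / a_{k}_{k}")
        for i in range(k + 1, n):
            if not nz[i][k]:
                continue  # nothing to eliminate in this row
            lines.append(f"l_{i}_{k} = a_{i}_{k} * inv_{k}")
            for j in range(k + 1, n):
                if not nz[k][j]:
                    continue  # pivot row is zero here, no update needed
                if nz[i][j]:
                    lines.append(f"a_{i}_{j} = a_{i}_{j} - l_{i}_{k} * a_{k}_{j}")
                else:
                    # Fill-in: a structural zero becomes nonzero.
                    lines.append(f"a_{i}_{j} = -l_{i}_{k} * a_{k}_{j}")
                    nz[i][j] = True
                    fill_in.append((i, j))
            lines.append(f"b_{i} = b_{i} - l_{i}_{k} * b_{k}")

    # Back substitution, again touching only nonzero entries.
    for i in reversed(range(n)):
        terms = "".join(
            f" - a_{i}_{j} * x_{j}" for j in range(i + 1, n) if nz[i][j]
        )
        lines.append(f"x_{i} = (b_{i}{terms}) * inv_{i}")
    return lines, fill_in


def compile_solver(pattern):
    """Wrap the generated statements in a Python function so they can be run."""
    lines, fill_in = generate_lu_solve(pattern)
    body = "\n".join("    " + ln for ln in lines)
    ret = ", ".join(f"x_{i}" for i in range(len(pattern)))
    src = f"def solve(A, b):\n{body}\n    return [{ret}]\n"
    namespace = {}
    exec(src, namespace)
    return namespace["solve"], src, fill_in


if __name__ == "__main__":
    T, F = True, False
    solve, src, fill_in = compile_solver([[T, T, F], [T, T, T], [F, T, T]])
    print(src)
    print(solve([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]],
                [6.0, 12.0, 14.0]))
```

The emitted statements are plain scalar assignments, so printing them as C with `double` temporaries instead would be a small change, and that is where the compiler's register allocator gets to do its job. Note that the elimination order matters a lot: with an "arrow" pattern, eliminating the dense row and column first creates fill-in that a different ordering would avoid, so a real generator would also want to search over orderings.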