Yeah, you said it more concisely: A/B while matching levels. And that's the kind of insight I have more confidence in. Playing devil's advocate, I could see how even correcting levels could be a real rabbit hole unless you know how to actively listen to a compressor, a notch, flavors of noise, etc. That is, you could spend quite a bit of time tweaking and be really happy with the experiments, then turn it back down to comparable levels and conclude that the results are not subjectively stronger than the default from 5 minutes ago.
Probably good to level match first, maybe even with some smart mixer or adaptive EQ. But again, this is the kind of feedback I trust, because you're putting in safeguards so you don't fool yourself.
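For what it's worth, here's a rough sketch of what "level match first" could look like in code. It assumes both versions are already loaded as plain lists of float samples, and it uses RMS as a crude stand-in for loudness (a proper loudness meter, e.g. LUFS, would track perception better). The function names `rms`, `match_level`, and `gain_db` are made up for illustration, not from any particular tool.

```python
import math

def rms(samples):
    """Root-mean-square level of a sequence of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def match_level(processed, reference, eps=1e-12):
    """Scale `processed` so its RMS equals the RMS of `reference`.

    Returns the scaled samples and the linear gain that was applied.
    `eps` only guards against dividing by zero on a silent input.
    """
    gain = rms(reference) / max(rms(processed), eps)
    return [s * gain for s in processed], gain

def gain_db(gain):
    """Express a linear gain in decibels."""
    if gain <= 0:
        return float("-inf")
    return 20.0 * math.log10(gain)

# Toy example: the "processed" take is the reference made twice as loud.
reference = [0.5 * math.sin(2 * math.pi * i / 100) for i in range(1000)]
processed = [2.0 * s for s in reference]

matched, gain = match_level(processed, reference)
print(f"applied gain: {gain_db(gain):.2f} dB")
```

The point is just to take the loudness difference out of the comparison before you flip between A and B, so "louder" doesn't get mistaken for "better".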
I like your approach for preserving transients. That could get really sophisticated with the syncopation of the tune.