It's not so much the 'C paradigm' that doesn't fit these machines, but really the ABI features C relies on to support separate compilation units. You could have a C compiler that's restricted to handling complete programs with no "extern" references of any kind (other than those that might be specified by some custom, asm-like mechanism) and it could absolutely match hand-written asm, given a good optimizer.
Are there any Z80 compilers with link-time optimization? This is exactly the problem that LTO is designed to help with, i.e. rather than emitting only machine code in the object files, the compiler emits its IR (instead of, or in addition to, the machine code) so that additional optimization passes can run with the whole binary present at link time.
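To make that concrete, here is a rough sketch of the kind of cross-unit call LTO can clean up. The function names and the util.c/main.c split are made up for illustration, and both "units" are shown in one file only so the snippet compiles as-is.

```c
#include <stdint.h>

/* --- util.c (hypothetical file) --- */
/* 8-bit fixed-point scale: (value * factor) / 256. */
uint8_t scale8(uint8_t value, uint8_t factor)
{
    return (uint8_t)(((uint16_t)value * factor) >> 8);
}

/* --- main.c (hypothetical file) --- */
/* In a separate unit, only this prototype would be visible. */
uint8_t scale8(uint8_t value, uint8_t factor);

uint8_t half_brightness(uint8_t level)
{
    /* Without LTO the compiler must emit a real call here, following the
     * ABI's calling convention. With the whole program visible at link
     * time, the optimizer is free to inline scale8() and may reduce this
     * to a single shift, since the factor is the constant 128. */
    return scale8(level, 128);
}
```

With GCC on a hosted target this would be built with something like `gcc -O2 -flto -c util.c main.c` followed by `gcc -O2 -flto util.o main.o`. Whether any given Z80 toolchain offers an equivalent is exactly the open question.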
One C compiler I used 30 years ago sacrificed reentrancy for variable folding. It did call-chain analysis so that it could alias local variables and function parameters onto shared global scratch-pad variables.
That allowed you to write nontrivial programs in C for the 8031/32 without external RAM.
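Written out by hand, the effect is roughly the following. This is only a sketch of the idea, not what that compiler actually generated; the function names and the union layout are invented for illustration.

```c
#include <stdint.h>

/* Two functions that never appear in the same call chain can keep their
 * parameters and locals in the same bytes of scarce internal RAM.
 * The union is a hand-written stand-in for what the compiler's call-chain
 * analysis would work out automatically. */
static union {
    struct { uint8_t acc, byte; } checksum;   /* frame of checksum_add() */
    struct { uint8_t value, limit; } clamp;   /* frame of clamp_to()     */
} scratch;

uint8_t checksum_add(uint8_t acc, uint8_t byte)
{
    scratch.checksum.acc = acc;
    scratch.checksum.byte = byte;
    /* Sum wraps modulo 256. */
    return (uint8_t)(scratch.checksum.acc + scratch.checksum.byte);
}

uint8_t clamp_to(uint8_t value, uint8_t limit)
{
    /* Reuses the same two bytes; safe only because checksum_add() is
     * never active at the same time. */
    scratch.clamp.value = value;
    scratch.clamp.limit = limit;
    return scratch.clamp.value > scratch.clamp.limit
               ? scratch.clamp.limit
               : scratch.clamp.value;
}
```

The cost is the one mentioned above: with fixed storage instead of a stack frame, recursion or a call from an interrupt handler while the function is already running would overwrite live values.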
Edit: Wow, as a result of your other comment about GCC I was surprised to learn that none of the Z80 compilers I could find appear to support this sort of thing. It looks like there was (previously?) an effort to put together a Z80 backend for LLVM, but my search results didn't look promising.
Yeah, it's a newer concept for the compiler space (it relies on having a lot of compiler host memory, more than you might think), and I'm not sure that the more niche compilers have had time to get this feature yet.
I didn't mention GCC in that comment. The concept of default visibility is hardly GCC-specific.
Also, it is not clear whether the thread is still on the topic of z80 specifically or whole-project LTO generally. If you find yourself doing the latter outside of z80, you may find GCC useful. And hidden visibility is often a good default.
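For anyone who hasn't used it, a minimal sketch of symbol visibility with the GCC/Clang attribute syntax on an ELF target. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical convenience macros around the GCC/Clang attribute. */
#define API_EXPORT __attribute__((visibility("default")))
#define API_LOCAL  __attribute__((visibility("hidden")))

/* Hidden: callable from other files of the same library, but not exported
 * from the final shared object, so the optimizer has more freedom. */
API_LOCAL uint16_t mul8(uint8_t a, uint8_t b)
{
    return (uint16_t)(a * b);
}

/* Default visibility: part of the library's public interface. */
API_EXPORT uint16_t area(uint8_t width, uint8_t height)
{
    return mul8(width, height);
}
```

Compiling with `-fvisibility=hidden` flips the default, so only the symbols explicitly marked as exported need an annotation.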
If you want a z80-specific LTO option, sorry, I'm unfamiliar with any. That's a pretty niche ask.