Why do C to Z80 compilers produce poor code? (2018) (retrocomputing.stackexchange.com)
115 points by eklitzke on May 30, 2020 | 51 comments


As noted in the first post, the Z80 is still much easier to generate code for than the 6502.

I see everyone there seems to be referring to free/open-source compilers, but I know of one commercial compiler for the Z80 (IAR) whose output is quite good, which I linked to in a previous comment on a different article about Z80 C compilation:

https://news.ycombinator.com/item?id=18902501

If one didn't know better, it would look almost like handwritten Asm.

On the topic of architectures that are difficult to compile C to, I can think of several more: the 8051, the low-end PIC series, and these: https://cpldcpu.wordpress.com/2019/08/12/the-terrible-3-cent...

...and most of those have C (or at least C-subset) compilers too.


One thing that I think won’t work well is adding a backend for these much less powerful CPUs to a compiler-linker system designed for more powerful ones.

Because saving and restoring register values is fairly expensive (both in time and in memory) you shouldn’t do register allocation per function, or commit to a fixed ABI. The best assembly to generate for a function often will depend on what it calls or where it is called from. Let’s say you need a register or flag to be clear at some time. A function you just called might accidentally already do that, or it might be cheaper to change that function’s code or register allocation to give it the desired side-effect than to have the function that needs the register or flag cleared waste an entire byte to do that.

You basically need a whole-program (compiler-linker) to generate good code for these CPUs.

That also helps in moving local variables to global addresses (you likely will have to, because of limited stack space) and in having such variables share memory locations as much as possible. If foo doesn’t (indirectly) call bar and vice versa, and they aren’t called recursively, their local variables can share the same memory addresses. Again, you’ll likely have to, because memory is limited.
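A minimal sketch of that overlaying idea in C. The call graph, the function names and the frame sizes are all hypothetical, and it assumes no recursion and that functions are numbered callers-first (a topological order), as the comment requires.

```c
#include <stdint.h>

#define NFUNCS 4 /* hypothetical program: main, foo, bar, leaf */

/* calls[a][b] != 0 means function a calls function b directly.
   main calls foo and bar; foo calls leaf; foo and bar never call each other. */
static const uint8_t calls[NFUNCS][NFUNCS] = {
    /* main */ {0, 1, 1, 0},
    /* foo  */ {0, 0, 0, 1},
    /* bar  */ {0, 0, 0, 0},
    /* leaf */ {0, 0, 0, 0},
};

/* Bytes of locals (and parameters) each function needs; made-up numbers. */
static const uint8_t frame_size[NFUNCS] = {2, 6, 10, 4};

/* Give every function a fixed offset into one shared scratch area.
   A function's frame starts after the frame of every caller ends, so
   functions on the same call chain never overlap, while functions on
   different call chains (foo and bar here) may share the same bytes.
   Assumes no recursion and callers-first numbering. */
void assign_frames(uint8_t base[NFUNCS]) {
    for (int f = 0; f < NFUNCS; f++) {
        uint8_t start = 0;
        for (int caller = 0; caller < f; caller++) {
            if (calls[caller][f]) {
                uint8_t end = (uint8_t)(base[caller] + frame_size[caller]);
                if (end > start) {
                    start = end;
                }
            }
        }
        base[f] = start;
    }
}
```

With these numbers main gets offset 0, foo and bar both start at 2, and leaf starts at 8, so the whole scratch area is 12 bytes instead of the 22 bytes that separate areas would need.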


> You basically need a whole-program (compiler-linker) to generate good code for these CPUs.

Modern compilers can do this: 'link-time optimisation'.


That’s true, but I don’t think they do to the extent that is needed to generate good Z80 or 6502 code. They won’t move local variables to global locations, for example. That often is essential on 6502 because of the limited stack space.

I also doubt they’ll use status flags to return data from functions. Such micro-optimizations just aren’t that useful on heavily pipelined CPUs with zillions of registers.

So yes, a link-time optimizer would work, but it would have to be built for these architectures, not, as in GCC’s or LLVM’s case, for the intermediate language.

Edit: for examples of things I don’t see a traditional link-time optimizer doing, look at CHRGET and CHRGOT in http://unusedino.de/ec64/technical/project64/mapping_c64.htm....

Semi-equivalent C code (leaving out returning info in the status flags) is:

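   const unsigned char *addr;  /* the BASIC text pointer (void * cannot be incremented in standard C) */
   int shouldSkip(const unsigned char *p) { return *p == ' '; }  /* hypothetical helper: skips spaces only */
   const unsigned char *CHRGOT(void) { return addr; }
   const unsigned char *CHRGET(void) { while (shouldSkip(++addr)); return addr; }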
The assembly code uses self-modifying code for speed, and makes CHRGET fall through into CHRGOT, adding an unnecessary while-loop to CHRGOT.

Also, there’s the choice as to what flags to use for returning info.
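To make the "returning info in the status flags" point concrete, here is a rough C model that returns the flags in a small struct instead. To my understanding, the C64 routine leaves the carry clear when the character is a digit and sets the zero flag on ':' or the end-of-line 0 byte; treat that mapping, and all the names below, as assumptions rather than a transcription of the ROM. Unlike the ROM routine, this chrgot does not skip spaces.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the processor flags a 6502 routine returns "for free". */
typedef struct {
    uint8_t ch;            /* the character itself (the accumulator) */
    bool is_digit;         /* models carry clear: ch is '0'..'9' */
    bool end_of_statement; /* models zero set: ch is ':' or the 0 terminator */
} chr_result;

const uint8_t *txtptr; /* hypothetical name for the BASIC text pointer */

/* CHRGOT: classify the current character without advancing. */
chr_result chrgot(void) {
    uint8_t c = *txtptr;
    chr_result r = { c, c >= '0' && c <= '9', c == ':' || c == 0 };
    return r;
}

/* CHRGET: advance past any spaces, then classify by "falling into" chrgot. */
chr_result chrget(void) {
    do {
        ++txtptr;
    } while (*txtptr == ' ');
    return chrgot();
}
```

In C each extra result costs a struct field (in memory or in registers); on the 6502 the flags are already set as a side effect of the comparisons, which is exactly what a generic compiler backend won't exploit.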


See also the GCC flag, "-fwhole-program."


To my knowledge, none of the compilers that support these architectures support LTO. It's a bit of a chicken-and-egg problem: since C historically wasn't a great fit, support never grew to the point where it would fit better.


It appears there's actually a project benchmarking a bunch of Z80 C compilers:

https://github.com/z88dk/z88dk/wiki/Benchmarks

Besides IAR, there's also the HI-TECH C compiler, which, when it is not disqualified for bad results, seems to have some impressive numbers at times. (So does the project's own C compiler, when it does not hit a pathological case.)


The accepted answer is somewhat true but also has some glaring inaccuracies.

The fact is that there are (or were) some very effective C compilers for the Z80, none of them free, far from it. By the time compiler technology had advanced, through the 80s and 90s, the market for a non-commercial optimizing compiler for the Z80 and 6502 had rapidly diminished. GCC has never in its history been a fantastic 8- or 16-bit compiler. By the time EGCS was merged, even the 68k was rapidly losing relevance.

It is totally possible with modern techniques to make a very effective optimizing C compiler for Z80, but who’s going to do it? There are very few hobbyists with both the chops and the time for something with little commercial value.

The answer is correct in implying that the PDP was better suited as a CISC target for C when optimization is not done.


I see a lot of Z80 enthusiasm on the web; is there a good forum or website where I can talk to the hard core? I just finished porting the classic Sargon Z80 chess program to run on x86: https://github.com/billforsternz/retro-sargon . One side effect of the project is Sargon in real Z80 assembly language, rather than the weird 8080 hybrid it was originally coded and published in, and I am sure there will be a hard-core Z80 fan or two who would appreciate that.


This is the best one I found. Quite active too.

https://groups.google.com/forum/#!forum/retro-comp


So if the C paradigm doesn't fit the Z80 architecture, does anyone know of a language (other than Z80 assembler) that does? Or do Z80 enthusiasts generally just write everything in asm?


It is possible to write a Forth implementation[0] for the Z80 that uses just enough registers to get the execution model on top.

[0] https://github.com/siraben/ti84-forth/blob/bc0140d641a121a4f...


It's not only possible: there was even a commercial computer that used Forth instead of BASIC, and it was pretty similar to the ZX Spectrum. I'm talking about the Jupiter Ace:

https://en.wikipedia.org/wiki/Jupiter_Ace

Also, there is an obscure French Z80 computer that uses Forth: the Hector HRX.


It's not so much the 'C paradigm' that doesn't fit these machines, but really the ABI features C relies on to support separate compilation units. You could have a C compiler that's restricted to handling complete programs with no "extern" references of any kind (other than those that might be specified by some custom, asm-like mechanism) and it could absolutely match hand-written asm, given a good optimizer.


Are there any Z80 compilers with link-time optimization? This is exactly the problem LTO is designed to help with, i.e. rather than (or in addition to) emitting machine code in the object files, emit compiler IR so that additional optimization passes can run with the whole program present at link time.


One C compiler I used 30 years ago sacrificed reentrancy for variable folding: it did call-chain analysis so that it could alias local variables and function parameters onto shared global scratch-pad variables.

That allowed you to write non trivial programs in C for the 8031/32 without external RAM.


I remember using that one as well, Keil right?


It is probably pretty realistic at this point to compile any program for a Z80 into a single compilation unit.


Sure, but codebases aren't structured that way.


Concatenate it into an amalgamated unit at build time, then. This only adds a style requirement in that you can't use linking to enforce namespacing.


To expect the compiler to take advantage of that, you'd need to mark pretty much every function with static.


I believe the LTO implementations of modern compilers handle this for you? (https://clang.llvm.org/docs/LTOVisibility.html)

Edit: Wow, as a result of your other comment about GCC I was surprised to learn that none of the Z80 compilers I could find appear to support this sort of thing. It looks like there was (previously?) an effort to put together a Z80 backend for LLVM, but my search results didn't look promising.


Yeah, it's a newer concept for the compiler space (it relies on having a lot of compiler host memory, more than you might think), and I'm not sure that the more niche compilers have had time to get this feature yet.


-fwhole-program? Or -fvisibility=hidden.


GCC doesn't have a mature Z80 backend.


I didn't mention GCC in that comment. The concept of default visibility is hardly GCC-specific.

Also, it is not clear whether the thread is still on the topic of the Z80 specifically or of whole-project LTO generally. If you find yourself doing the latter outside the Z80, you may find GCC useful. And hidden visibility is often a useful default.

If you want a z80-specific LTO option, sorry, I'm unfamiliar with any. That's a pretty niche ask.


> I didn't mention GCC in that comment. The concept of default visibility is hardly GCC-specific.

Those options you gave are GCC-specific.

I don't know of a mature Z80 compiler that supports LTO.

The thread at this point is on the topic of "so your compiler doesn't support LTO, how can we 80/20 our way to the same semantics"


What about PL/M, the language invented by Gary Kildall in 1973 and used to implement CP/M?

(PL/M targets, like CP/M, the 8080 instruction set, of which the Z80 provides a superset.)

https://en.wikipedia.org/wiki/PL/M


Late 80s and just out of uni I worked on a mature PL/M codebase for flight simulator visual systems. Happy days!


There are a number of Forth implementations for 8-bit computers.

Chuck Moore has claimed "A Forth application can take 1% the code employed by the same application written in C" [1], which is certainly a big benefit on a small computer.

[1]: https://colorforth.github.io/1percent.html


There was even one 8-bit machine which shipped with Forth as standard, rather than BASIC: https://en.wikipedia.org/wiki/Jupiter_Ace


I wrote Z80 asm from the early 80s to the early 90s and never felt the need to reach for a higher-level language. Friends told me to try C or Pascal, but I found them clunky, slow and inefficient. The thing is, you could not write very large programs anyway: I could get up to 128 KB, but it was very rare to have code that large; most of it was game data, for instance. If I had a Z80 now with megabytes of memory (which I do, by the way), I would still write asm for it if the project were for fun; if it were serious, I would (probably) try a commercial compiler. Maybe Forth, as was mentioned here; I really like Forth and use it on new embedded devices to go spelunking.


Z80 is just not a great architecture these days so most people would pick a different platform. We have a wealth of other low-powered, cheap 16- or 32-bit microcontroller architectures without the same limitations.

Most of the TI-83/84 scene was hand-written z80 assembler.


The Z80 enthusiasts in the TI-83/84 arena used to just write everything by hand in asm. Although since TI just killed that with their latest OS update, that's going to be a decent chunk of the Z80 enthusiast market gone overnight.


There's still a bazillion 83/84 devices out there which have absolutely no way of updating the OS.


Eh? The OS can be updated just fine. Or are you saying that all those devices won't be updated, rather than can't be? That's definitely going to be true for the enthusiasts and for people no longer going through standardized testing. For those doing standardized testing, though, it wouldn't be all that much of a shock if tests mandated or provided an update to use the calculator on the test.


Most of the earlier 83’s had ROMs.


You can try Copilot https://copilot-language.github.io/ (which does use C as a backend, but this C is restricted to be microcontroller-friendly).


FORTH ran pretty decently on any 8 bit architecture. While probably not great in speed, it generated very compact code, which mattered a lot with memory sizes in those days.


Turbo Pascal was born on Z80s. The Turbo Pascal wikipedia page says Anders Hejlsberg released his first Z80 Pascal in 1981. And BASIC, of course, runs on everything.


Goddammerung, I cannot find my Lisp interpreter/compiler source code for the Z80, especially from the time before I built the external memory bank. There was nothing comparable anywhere until Turbo Pascal came along. It was originally a Forth-like stack machine, but then I implemented another stack, the "execution stack". The "(" and ")" symbols were pushes/pops on that stack, and thus "(+ 1 2)" was directly executable in my Forth. OTOH, on this machine "compiling" was just replacing symbolic linked lists with a sequence of direct subroutine calls. https://timonoko.github.io/Nokolisp.htm


Nice to see this here. Yes, as far as the 8-bit world and the 16-bit home micros (PC, Amiga, Atari) were concerned, C compilers just sucked.

Anyone that was writing code where performance mattered was using Assembly.

Even on 16-bit platforms, where high-level languages were already quite common, they were our "Unity" for games (or the demoscene).

Good enough for prototyping, but also full of inline assembly.

I once saw a game submission to a Portuguese newspaper (an MS-DOS game) where the only C in it was data structures and function declarations; the bodies were 100% inline Assembly.

And by the time the Pentium arrived, unless one was buying Watcom, there wasn't a good C/C++ compiler that could properly take advantage of the pipelines and instruction scheduling, as described in Abrash's books.


Yeah, I was so excited to get my hands on UCSD Pascal because there wasn't a C compiler for Atari 8-bit. It was multi-diskette and sooo slow I never really ended up using it.

The "Action!" programming language cartridge was the best thing ever. Somewhere between C and machine language. Being in ROM it booted instantly and contained a built-in editor, compiler, runtime, and monitor. It compiled about as fast as Go does now on modern hardware.

It wasn't until I got an Atari ST (Megamax was good) or PCs that I got to use C compilers at home.

[0] https://en.wikipedia.org/wiki/Action!_(programming_language)


Cool, I wasn't aware of Action!.


Well, while certainly imperfect, some like SDCC are good enough to write some games with nontrivial logic.

Here’s an open source game (disclosure: I wrote it) that compiles with SDCC and runs a game involving a flood fill algorithm, a tiny GUI with checkboxes and buttons and only one assembly routine, all running on the Amstrad CPC464 and above: https://github.com/cpcitor/color-flood-for-amstrad-cpc

It’s much easier to write and much faster (and probably smaller) to run than the same in BASIC.


From what I've heard, SDCC generates pretty bad z80 code. It might be good enough if your application isn't super demanding.


Write in C and implement portions requiring higher performance in assembler.

Nineties style.


The whole point of the project was to go from "from what I've heard" to "I observe (and it's remarkable)". Write non-trivial code, see it run, look at the generated ASM.

You can git clone the project above in a Debian/Ubuntu or similar, do "make" and have a look at the .lst listing of generated assembly with opcode bytes.

I consistently observed that while z88dk produced extremely verbose code, with many steps back and forth and many extra push and pop, SDCC produced sane code.

Of course it helps a lot when you `const` all relevant variables, declare all relevant pointers as `const type * const` when applicable, and use uint8_t instead of int. Otherwise the generated code is heavy because it pays the cost of a generality that is probably not necessary. Help the compiler and, if it's a good compiler, it will help you. SDCC is a good compiler in this regard. This is especially important in the Z80 context because registers are smaller than the default integer types.
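A small sketch of that advice. The function is hypothetical (not taken from the color-flood project). Note that C's integer promotions still apply to uint8_t operands, so narrow types give the compiler the opportunity to stay in 8 bits rather than forcing it to.

```c
#include <stdint.h>

/* Hypothetical helper for a grid-based game: count cells of a given colour.
   - uint8_t counters fit in a single 8-bit register; with SDCC's 16-bit int,
     an int counter would need a register pair and 16-bit arithmetic.
   - `const uint8_t *const cells` promises that the function changes neither
     the pointer nor, through it, the cells. */
uint8_t count_colour(const uint8_t *const cells, const uint8_t n, const uint8_t colour) {
    uint8_t count = 0;
    for (uint8_t i = 0; i < n; i++) {
        if (cells[i] == colour) {
            count++;
        }
    }
    return count;
}
```

The same function written with `int` everywhere is still correct C, but on the Z80 every comparison and increment then becomes a 16-bit operation.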

I observed good code produced by SDCC. It's too long to explain here, but for example it allocates local variables to registers when applicable, and the code is somewhat close to hand-written code (like in the answers to the SO question).

For example, it replaces a memset() C function call with the following code, which fits what one SO answer calls "the kind of code one writes where optimization does not matter". z88dk could not do this:

                            243 ;model.c:132: memset( &connections, 0, sizeof( connections ) );
   00A2 21r68r00      [ 3]  244  ld hl, #_connections
   00A5 06 19         [ 2]  245  ld b, #0x19
   00A7                     246 00244$:
   00A7 36 00         [ 3]  247  ld (hl), #0x00
   00A9 23            [ 2]  248  inc hl
   00AA 10 FB         [ 4]  249  djnz 00244$
(you might have zeroed a and done ld (hl),a, or even ldir, but this is still correct code)

The next one below looks easy. It is, provided the compiler can figure out what is constant and you write sane C code. Garbage C code would yield garbage ASM code.

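The line `uint8_t column = 20 - ( sizeof( message ) - 1 ) / 2;` below yields no code at all, zero bytes: the computed column goes straight into the constant loaded into DE. The `push de` is the C calling convention passing the arguments on the stack.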


                             63 ;uint8_t column = 20 - ( sizeof( message ) - 1 ) / 2;
                             64 ;testfixture.c:23: fw_txt_set_cursor( 2, column );
   4058 11 02 03      [ 3]   65  ld de, #0x0302
   405B D5            [ 4]   66  push de
   405C CD 43 41      [ 5]   67  call _fw_txt_set_cursor
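A worked version of that arithmetic as a C sketch. The listing doesn't show the message, so the 34-character string below is hypothetical, chosen because it makes the column come out as 3. The packing assumes, as the `ld de, #0x0302` suggests, that the first argument (2) lands in E and the second (the column) in D.

```c
#include <stdint.h>

/* Hypothetical message: 34 characters, so sizeof(message) == 35. */
static const char message[] = "0123456789012345678901234567890123";

/* An integer constant expression: 20 - (35 - 1) / 2 == 20 - 17 == 3.
   The compiler evaluates it at compile time, so it costs no code. */
enum { COLUMN = 20 - (sizeof(message) - 1) / 2 };
_Static_assert(COLUMN == 3, "column folds to 3");

/* Mirror of `ld de, #0x0302`: two 8-bit arguments in one 16-bit register
   pair, first argument in the low byte (E), second in the high byte (D). */
uint16_t pack_args(uint8_t first, uint8_t second) {
    return (uint16_t)((second << 8) | first);
}
```

Here `pack_args(2, COLUMN)` yields 0x0302, the same immediate the listing loads, which is why a single load and a single push suffice for both arguments.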



Another Z80-specific consideration: the Z80 is not designed to access data on a stack. Over the 8080, it added the IX and IY index registers, so you can do `ld someregister, (ix+d)`, with d a signed 8-bit displacement, and family. They are fairly handy (the code is reasonably short) but slow. SDCC uses them extensively.

Chart of Z80 instruction timings: https://wiki.octoate.de/lib/exe/fetch.php/amstradcpc:z80_cpc...

The Z80 is more comfortable with code where everything sits at a fixed address. In a function, making local variables static kills reentrancy, but then you can replace the slow `ld someregister, (ix+d)` with the faster and shorter `ld a,(someaddress)`. I observed a 20% reduction in SDCC-generated code in a big algorithmic function, which is attributable both to the Z80's constraints and to SDCC's ability to navigate them decently. In complex functions, the code generated by SDCC is not extremely nice, but it's decent. It's good enough that only the performance-critical parts have to be written in assembly.
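A sketch of the `static` trick on a hypothetical function. Both versions compute the same result; the second trades reentrancy (no recursion, no calling it from an interrupt handler while it is running) for fixed addresses. Whether SDCC actually emits shorter code depends on the function: something this small may be kept entirely in registers anyway, so the 20% figure above, measured on a big function, won't necessarily carry over.

```c
#include <stdint.h>

/* Automatic locals: when they don't fit in registers, SDCC may address
   them relative to IX, e.g. `ld a,(ix+d)`. */
uint16_t checksum_auto(const uint8_t *buf, uint8_t len) {
    uint16_t sum = 0;
    uint8_t i;
    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Static locals: fixed addresses allow `ld a,(addr)`-style access, but the
   function is no longer reentrant. */
uint16_t checksum_static(const uint8_t *buf, uint8_t len) {
    static uint16_t sum;
    static uint8_t i;
    sum = 0; /* must reset on every call: a static initializer runs only once */
    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

The explicit `sum = 0;` is the usual trap when converting locals to `static`: writing `static uint16_t sum = 0;` instead would carry the previous call's total into the next one.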

Also, SDCC supports most (all?) of C11. I've tested it: you can e.g. print(var) like you'd do std::cout << var in C++, and let the compiler do what you mean. That's unusual in a microcontroller context.
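The comment doesn't say which feature makes `print(var)` work; C11's `_Generic` is the obvious candidate, so here is a minimal sketch under that assumption. The helper names are hypothetical, it uses the hosted `printf` (a real CPC program would need its own output routine), and each helper returns the number of characters written so the dispatch can be checked.

```c
#include <stdio.h>

/* One hypothetical helper per type; each returns printf's character count. */
int print_int(int v)         { return printf("%d\n", v); }
int print_uint(unsigned v)   { return printf("%u\n", v); }
int print_str(const char *s) { return printf("%s\n", s); }

/* _Generic selects a helper from the static type of x at compile time, so
   there is no runtime dispatch cost. unsigned char (uint8_t) needs its own
   entry because _Generic does not apply the integer promotions. */
#define print(x) _Generic((x),   \
    int: print_int,              \
    unsigned: print_uint,        \
    unsigned char: print_uint,   \
    char *: print_str,           \
    const char *: print_str)(x)
```

For example, `print(42)`, `print((unsigned char)200)` and `print("hi")` each pick a different helper without any runtime type tag.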


I think one solution would be another language that's a bit more abstract than ASM, maybe with some variables, register spilling and macros, so you can write ASM more easily. That language could be transpiled into ASM, but it would need to be debugged as ASM as well, so you'd still need to know assembler very well.

I tried to do that a couple years ago with little success (readme and feature list missing, sorry but maybe you get the idea from the repo): https://github.com/NanoWar/FunkCompiler


They produce about as good code as can be produced for any 8-bit (accumulator size) processor (there are some 16-bit instructions for some registers). The PDP-11, for which C was designed, has a 16-bit register architecture for all registers.


avrgcc would disagree.



