FIB in PowerPC assembly and in JONESFORTH

Recently I’ve been teaching myself PowerPC assembly by porting JONESFORTH to PowerPC on Mac OS X. It occurred to me to run the same little Fibonacci-sequence microbenchmark that I ran lo these many years past. The results were interesting:

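| Language     | Implementation detail    | Time per (fib 29) call (ms) | Ops/s       | Ratio vs. optimised C | Ratio vs. unoptimised C |
|--------------|--------------------------|-----------------------------|-------------|-----------------------|-------------------------|
| PPC assembly | -                        | 24                          | 935,983,000 | 0.43                  | 0.205                   |
| FORTH        | JONESFORTH ported to PPC | 277                         | 81,096,000  | 4.95                  | 2.37                    |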

The hand-coded assembly beats every entrant from the earlier benchmark, perhaps unsurprisingly. The naive indirect-threaded FORTH is the fastest of the interpreted languages, only about five times slower than fully optimised C and a little over twice as slow as unoptimised C.

Here’s the JONESFORTH code:

: FIB DUP 2 >= IF 1- DUP RECURSE SWAP 1- RECURSE + ELSE DROP 1 THEN ;

and here’s the PPC assembly (arg and result in r3):

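_SFIB:  cmpwi   r3,2            ; n < 2?
        bge     1f              ; no: take the recursive path
        li      r3,1            ; yes: fib(n) = 1
        blr
1:      mflr    r0              ; save the return address...
        stw     r0,-4(r1)       ; ...in the word just below the stack pointer
        addi    r3,r3,-1        ; r3 = n-1
        stwu    r3,-8(r1)       ; open an 8-byte frame, 0(r1) = n-1
        bl      _SFIB           ; r3 = fib(n-1)
        lwz     r4,0(r1)        ; r4 = n-1
        stw     r3,0(r1)        ; reuse the slot: 0(r1) = fib(n-1)
        addi    r3,r4,-1        ; r3 = n-2
        bl      _SFIB           ; r3 = fib(n-2)
        lwz     r4,0(r1)        ; r4 = fib(n-1)
        add     r3,r3,r4        ; r3 = fib(n-1) + fib(n-2)
        lwz     r0,4(r1)        ; reload the return address
        addi    r1,r1,8         ; close the frame
        mtlr    r0
        blr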