Post by bugothecat on Jun 7, 2022 15:39:49 GMT
After a lot of effort and experiments I managed to port my 3D software rendering engine from the 3DO to the PC-FX. This is still ongoing and there are a lot of things to solve and improve. One day I could try a bigger world instead of a 3D object in the middle of the screen, or it might be good to use in a future demo. But mostly I'm curious how smooth it can be with simple objects and some Gouraud shading, texture mapping, or both (I use the texture mapping to do a pseudo-envmap here). For comparison I also ported the engine to the 386 and the Game Boy Advance.
I have sections marked in the video if you want to go directly to the PC-FX related clips. In the beginning I show some code and try it on an emulator, and later I also capture it on the real hardware. Only much later did I realize there is a huge bottleneck writing to KRAM, and I wasn't anticipating it to be so bad. I guess KRAM is mainly meant for uploading a background once and then doing hardware scrolling, not for continuously updating it, since the console was meant for 2D stuff. But at least the 8bpp mode is not that bad: a fullscreen copy maxes out at 30-32fps. I tried different methods: OUT, writing to the memory-mapped addresses that correspond to the hardware OUT ports, and then the famed bit string instructions. None of them got a fullscreen copy past 30-32fps. I get 32 if I do OUT.W instead of OUT.H; apparently OUTing 32 bits at once works, with a slight improvement, even though you are supposed to OUT 16 bits at a time. The bit string instructions didn't save the day either. Now, for some reason the 16bpp mode is four times slower, so about 8fps max with a fullscreen blit. In the captured video the 16bpp cube is faster, at around 15fps, because I changed the algorithm so that I don't blit the whole screen, only the smaller region covered by the rendered object, which is first rendered into a backbuffer. That's why the frame rate shoots up if you move the cube far away. But this is not ideal for a full-blown 3D world, only for showing small 3D objects.
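To make the partial blit idea concrete, here is a minimal C sketch. It is not the actual engine code: the 256x240 8bpp screen size, the `Rect` type and the `bounding_rect` / `blit_rect` names are all made up for illustration, and a plain `memcpy` into a memory buffer stands in for the real KRAM upload (which on the PC-FX goes through port writes, not shown here). It only shows how clipping the copy to the object's bounding box cuts the number of bytes moved.

```c
#include <stdint.h>
#include <string.h>

/* Assumed screen size, for illustration only. */
#define SCREEN_W 256
#define SCREEN_H 240

/* Half-open rectangle: covers x0 <= x < x1 and y0 <= y < y1. */
typedef struct { int x0, y0, x1, y1; } Rect;

/* Bounding box of the projected vertices, clamped to the screen.
   Returns an empty rect (all zeros) if nothing is visible. */
Rect bounding_rect(const int *xs, const int *ys, int n)
{
    Rect r = {0, 0, 0, 0};
    if (n <= 0) return r;

    int minx = xs[0], maxx = xs[0], miny = ys[0], maxy = ys[0];
    for (int i = 1; i < n; i++) {
        if (xs[i] < minx) minx = xs[i];
        if (xs[i] > maxx) maxx = xs[i];
        if (ys[i] < miny) miny = ys[i];
        if (ys[i] > maxy) maxy = ys[i];
    }

    r.x0 = minx < 0 ? 0 : minx;
    r.y0 = miny < 0 ? 0 : miny;
    r.x1 = maxx + 1 > SCREEN_W ? SCREEN_W : maxx + 1;
    r.y1 = maxy + 1 > SCREEN_H ? SCREEN_H : maxy + 1;

    if (r.x0 >= r.x1 || r.y0 >= r.y1)
        r.x0 = r.y0 = r.x1 = r.y1 = 0;   /* fully off-screen */
    return r;
}

/* Copy only the given rect from the backbuffer to the destination.
   Both buffers are SCREEN_W * SCREEN_H bytes (8bpp).
   Returns the number of bytes copied. */
size_t blit_rect(uint8_t *dst, const uint8_t *src, Rect r)
{
    size_t w = (size_t)(r.x1 - r.x0);
    for (int y = r.y0; y < r.y1; y++) {
        size_t off = (size_t)y * SCREEN_W + (size_t)r.x0;
        memcpy(dst + off, src + off, w);
    }
    return w * (size_t)(r.y1 - r.y0);
}
```

With these assumed numbers, an object whose vertices span x = 100..140 and y = 90..130 gives a 41x41 rect, so 1681 bytes are copied instead of 61440 for the full screen. The further away the object, the smaller the rect and the cheaper the blit, which matches the frame rate jump described above.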
Anyway, it was interesting to try all these and be disappointed on the real thing. The CPU is pretty good compared to the ARM on the 3DO. Purely on the CPU, though, the engine is sometimes quite a bit slower. The NEC performance is closer to the GBA and the 386DX in the video. I think in 8bpp it would be possible to port Doom one day; it could be slow, but possibly not slower than the 3DO version. I need to look more into the PC-FX hardware though, maybe I'll find something more. For example, I have never touched the hardware sprites. If they have scaling capabilities I could use them for a few things.