|
Post by redllama7 on Dec 5, 2021 17:41:50 GMT
Hi all,
I recently started looking into developing for the PC Engine. I have some previous experience developing for Nintendo systems, mostly the SNES. One big difference between the systems is that the PCE allows writing to VRAM at any time, unlike the SNES. I was wondering how that capability is used in practice in games.
Is it ok on the PCE to build the SATB directly in VRAM or would it be better to keep a shadow copy in RAM and upload it to VRAM during vblank? And for scrolling, is it possible to update non visible BAT columns and rows directly in VRAM or is it better to buffer the changes in RAM and upload them in vblank?
My last question is about VRAM bandwidth. How much data can be transferred per frame / in vblank, assuming a resolution of 256*239 is used?
Thanks!
Est.
|
|
|
Post by dshadoff on Dec 5, 2021 18:09:30 GMT
It's perfectly fine to build the SATB in VRAM directly; in fact, it's preferable if you can manage it... it's just a little awkward for people who are used to accessing values in RAM (such as X- or Y-positions and pattern pointers). Not every sprite needs to be updated every frame, so performing a copy of an entire SATB's worth of data to VRAM is a bit of a waste of cycles (however, this is how HuC originally did it, for simplicity for people starting out on the PCE).
The transfer amount is a more nuanced question, because (a) you are not blitting a frame buffer, and (b) there are time slices available to the CPU, and time slices not available, depending on various VDC register settings. Chances are, whatever amount you are planning to transfer will be OK. But that's a cop-out... so perhaps you can explain a bit more about why this question is important - your goals and your fears? Generally, tiles and sprites are preloaded and then the accesses are merely to the BAT and SATB... but of course, VRAM isn't infinite.
|
|
|
Post by redllama7 on Dec 5, 2021 18:30:11 GMT
Thanks!
On the SNES for example, I handle scrolling by copying two tilemap rows and / or columns to VRAM during vblank. In the game loop I unpack the data from my map into some RAM buffers and use DMA during vblank to update the tilemaps. I was wondering if it is ok to skip the RAM buffers and vblank copy and write the data to the BAT directly. For a BAT size of 64x32, in the worst case updating two rows and cols, it would be 384 bytes.
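As a quick check of the 384-byte figure above, here is a minimal Python sketch of the arithmetic. It assumes 2 bytes per BAT entry (BAT entries are 16-bit words) and, like the estimate in the post, does not subtract the corner entries shared by a row and a column. The function name is made up for illustration.

```python
def bat_edge_update_bytes(bat_width, bat_height, rows=2, cols=2, bytes_per_entry=2):
    """Worst-case bytes written when refreshing `rows` full rows and
    `cols` full columns of a BAT of bat_width x bat_height entries."""
    # A row spans the BAT width; a column spans the BAT height.
    entries = rows * bat_width + cols * bat_height
    return entries * bytes_per_entry


# 64x32 BAT, two rows and two columns, as in the post above.
print(bat_edge_update_bytes(64, 32))
```

For a 64x32 BAT this gives (2*64 + 2*32) entries of 2 bytes each.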
Another thing that I usually do on the SNES for some sprites is to stream the animation frames directly from ROM to VRAM during vblank, instead of having all the sprite tiles loaded in VRAM at once. This is especially useful for player sprites. But without an estimate of the VRAM bandwidth during vblank, it can be hard to know how many sprite tiles you can stream per frame, for example.
Est.
|
|
|
Post by elmer on Dec 5, 2021 18:30:12 GMT
redllama7 wrote: "Hi all, I recently started looking into developing for the PC Engine. I have some previous experience developing for Nintendo systems, mostly the SNES."
Welcome! Since you're coming from the SNES, can I assume that you're looking at developing in assembly language? If so ... I really need to get back to uploading more of my example code to GitHub!
redllama7 wrote: "One big difference between the systems is that the PCE allows writing to VRAM at any time, unlike the SNES. I was wondering how is that capability used in practice in games."
It gives us the freedom to avoid having to write queuing code for every transfer that we want to make to VRAM. The SNES has a far more sophisticated graphics hardware chip in it than the PC Engine, but it is a huge PITA to use (partly because of the inability to write to VRAM at any time), and the PCE can often produce similar results just with clever programming tricks.
redllama7 wrote: "Is it ok on the PCE to build the SATB directly in VRAM or would it be better to keep a shadow copy in RAM and upload it to VRAM during vblank? And for scrolling, is it possible to update non visible BAT columns and rows directly in VRAM or is it better to buffer the changes in RAM and upload them in vblank?"
Remember that the SATB in VRAM is just a shadow, and it doesn't affect the active display until it is transferred to the hardware SATB inside the VDC chip (either manually or automatically at the start of vblank). So yeah, you can just write directly to the shadow in VRAM if you wish. Similarly, it is easiest (and faster) to just update the edges of the BAT inside your scrolling code, and not worry about queuing up the writes until vblank.
redllama7 wrote: "My last question is about VRAM bandwidth. How much data can be transferred per frame / in vblank, assuming a resolution of 256*239 is used?"
Well, a TIA instruction to VRAM should run at about 7 cycles (6 cycles for the TIA, plus a 1-cycle penalty for the VDC write) per byte transferred, so that's about 1 MByte per second, or about 17 KBytes per frame.
But that is just a theoretical figure, since you're not likely to do all of your transfers in a single TIA instruction, and as Dave said, there are a few times (particularly at the end of a line and the start of vblank) when the VDC may be busy and will halt the CPU if you try to write during that time. Practically, IMHO you're more likely to get around 8KB or less if you're doing some processing to figure out what to upload.
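The theoretical figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the 7 cycles per byte comes from the post, while the CPU clock of about 7.16 MHz and the round 60 frames per second are assumptions added here.

```python
CPU_HZ = 7_159_090          # assumed HuC6280 high-speed clock, about 7.16 MHz
FRAMES_PER_SECOND = 60      # assumed, rounded NTSC frame rate
CYCLES_PER_BYTE_TIA = 7     # from the post: 6 for the TIA + 1 VDC write penalty


def bytes_per_second(cpu_hz=CPU_HZ, cycles_per_byte=CYCLES_PER_BYTE_TIA):
    """Upper bound on bytes moved per second by one endless block transfer."""
    return cpu_hz // cycles_per_byte


def bytes_per_frame(cpu_hz=CPU_HZ, cycles_per_byte=CYCLES_PER_BYTE_TIA,
                    fps=FRAMES_PER_SECOND):
    """Same upper bound, expressed per video frame."""
    return bytes_per_second(cpu_hz, cycles_per_byte) // fps


print(bytes_per_second())   # roughly 1 MByte per second
print(bytes_per_frame())    # roughly 17 KBytes per frame
```

These are ceilings only; as the post says, per-transfer overhead and game logic bring the practical figure down considerably.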
|
|
|
Post by dshadoff on Dec 5, 2021 18:51:09 GMT
Don't forget that a write to VRAM is word-sized, and needs two memory accesses. In practice, this can make it tedious, as you may only wish to update part of that word, but still need to commit at least the most-significant byte.
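To make the word-sized write concrete, here is a toy Python model of the data port behaviour described in this thread: the low byte is only latched, and writing the high byte commits the whole word and advances the write address. The class and method names are invented for illustration, and an auto-increment of 1 is assumed; this is a simplified sketch, not a description of the real chip's internals.

```python
class VdcDataPortModel:
    """Simplified model: low byte is latched, high byte write commits the word."""

    def __init__(self, vram_words=0x8000):
        self.vram = [0] * vram_words
        self.write_addr = 0      # current VRAM write address, in words
        self.low_latch = 0       # last value written to the low data byte
        self.increment = 1       # assumed auto-increment after each commit

    def write_low(self, value):
        # Only updates the latch; VRAM is not touched yet.
        self.low_latch = value & 0xFF

    def write_high(self, value):
        # Commits latch + high byte as one word, then advances the address.
        word = ((value & 0xFF) << 8) | self.low_latch
        self.vram[self.write_addr] = word
        self.write_addr = (self.write_addr + self.increment) % len(self.vram)


port = VdcDataPortModel()
port.write_addr = 0x0100
port.write_low(0x34)
port.write_high(0x12)    # commits 0x1234 at 0x0100
port.write_high(0x56)    # latch unchanged: commits 0x5634 at 0x0101
print(hex(port.vram[0x0100]), hex(port.vram[0x0101]))
```

The consequence for partial updates is that changing only the low byte of a word still requires a high-byte write to commit it.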
|
|
|
Post by redllama7 on Dec 5, 2021 18:53:42 GMT
Hi,
Yes, I am using assembly, kind of. I use a high-level assembler called wiz (github.com/wiz-lang/wiz). The PCE backend has a few small issues but it seems to work ok so far. I have been using wiz for my SNES programming with good results. I am quite happy with it.
Est.
|
|
|
Post by turboxray on Dec 5, 2021 19:48:53 GMT
redllama7 wrote: "Hi, Yes, I am using assembly, kind of. I use a high level assembler called wiz github.com/wiz-lang/wiz. The PCE backend has a few small issues but it seems to work ok so far. I have been using wiz for my SNES programming with good results. I am quite happy with it. Est."
Hey that's pretty great! I'm a fan of high level assembly approaches.
So yeah, SATB is in vram.. but it's a "buffer". The real SAT is a set of internal registers. The VDC sets a DMA period where it copies the contents from the vram SAT buffer to the internal registers. So if you have a local copy in cpu ram, then it's a SATBB hahah. If you can manage it in vram, then you'll save about 3-4% cpu resource (from not having to copy it to vram). SATB DMA happens on the VDC vblank period - that means when the active 'frame' defined by the VDC setup ends (which is not necessarily NTSC vblank).
During active display, when the VDC isn't fetching sprite pixels in hblank, you have active cpu slots. Basically it's described like this: if you break the active display into blocks of 8 pixels, and assign an access slot to each of those 8 pixels, the cpu is given 4 access slots (every other access slot is for the cpu). In low res mode, each dot/pixel of that block is about 186ns. There's no way the cpu can write a WORD's worth of data in 186ns, so you'll never saturate it. The VDC can assert /RDY to the cpu if it's busy, so if you access VRAM right before a slot or such, you'll get a fractional clock delay. So 7 cycles per byte via Txx will be like 7.2 cycles or whatever, depending on the alignment. So pretty full access from the programmer's perspective. Outside of the VDC's defined active frame, you have what's called 'BURST' mode, and every slot is available to the CPU - so no edge case fractional clock pauses/alignments.
Things to be aware of: SATB DMA (if you have the flag set to auto, or have an update request pending) happens as soon as the active frame ends, as defined by the VDC settings. The speed of the DMA depends on what speed the VDC is running at. I think in low res mode it takes 3 scanlines to transfer the entire SAT. If you touch VRAM during this time, the cpu will be completely paused until the DMA is finished. Not a big deal, but if you're cycle counting for optimization, probably don't touch vram for about 3 scanlines after the vsync interrupt fires.
Bandwidth for Txx block transfer to vram is slow. It's about 65 bytes a scanline. It also has the downside of stalling ALL interrupts. You can get a faster transfer rate if you embed your graphics as ST1/ST2 opcodes. This is 5 cycles a byte, and ~91 bytes a scanline. But you can get a higher transfer rate than that: the VDC 'word' port has an LSB and an MSB address for reading and writing to the VDC. The MSB is the latch. You can write once to the LSB, and keep writing to the MSB with different values. While this doesn't help you with Txx instructions, it does help you with the ST1/ST2 method. Chances are you have some blank lines, or areas with repeating LSB bytes, and this will actually bring your transfer rate up higher than 91 bytes a scanline (2x the rate when that happens). ST1/ST2 has the downside that it takes up more rom/ram space for the equivalent graphics, but the upside is that it's much faster (even on the low/floor end) and it doesn't stall interrupts (which is important if you're playing samples). The repeated LSB trick that gets higher transfer rates also means your overhead from the ST1 opcode is lower, so it's not an automatic 2x space overhead. I wrote a python script that takes a sprite sheet, and builds out my graphics in this format.
As far as writing to the BAT while it's on screen.. yeah, if it's not showing up then definitely do it. For sprites though, I tend to double buffer so the active vram writes don't show up as mid screen changes. PCE vram is more flexible in layout than the SNES (two banks for sprites), so they can exist anywhere. And not having 2 or 3 tilemaps means that much more vram.
SATB can be located anywhere in vram too. The BAT is of course fixed, but if you're not using the whole thing - you can use unused sections for tiles or sprites (which both can be in the entire address range).
You can also write to the VCE at any time, mid screen, mid scanline, etc. Writing to it (any reg) will cause it not to read a pixel from the digital pixel bus coming from the VDC, and to output the last color it had received. You can change res mid frame, or mid scanline, and do any CRAM color changes as well.
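The per-scanline figures and the repeated-LSB saving in this post can be sketched in Python. The 5 and 7 cycles per byte come from the thread; the 455 CPU cycles per scanline, the 2-byte size of an ST1/ST2 instruction, and the use of ST1 for the low byte and ST2 for the high byte are assumptions added here, and the function names are hypothetical. This is not the author's script, only an illustration of the idea.

```python
CYCLES_PER_SCANLINE = 455   # assumed CPU cycles in one scanline at about 7.16 MHz


def bytes_per_scanline(cycles_per_byte):
    """Bytes that fit in one scanline at a given per-byte cost."""
    return CYCLES_PER_SCANLINE // cycles_per_byte


def st_stream_cost(words, cycles_per_opcode=5, bytes_per_opcode=2):
    """Estimate (cycles, rom_bytes) for VRAM words encoded as ST1/ST2 opcodes,
    skipping the ST1 (low byte) whenever the low byte repeats."""
    opcodes = 0
    last_low = None
    for word in words:
        low = word & 0xFF
        if low != last_low:
            opcodes += 1        # ST1 #low: refresh the latched low byte
            last_low = low
        opcodes += 1            # ST2 #high: commit the word
    return opcodes * cycles_per_opcode, opcodes * bytes_per_opcode


print(bytes_per_scanline(7), bytes_per_scanline(5))   # Txx vs ST1/ST2
print(st_stream_cost(list(range(16))))                # every low byte differs
print(st_stream_cost([0x0000] * 16))                  # blank data, low byte repeats
```

With fully repeating low bytes the stream needs roughly half the opcodes, which is where the "2x the rate" and the reduced ROM overhead in the post come from.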
|
|
|
Post by elmer on Dec 5, 2021 20:45:09 GMT
That looks really cool!
turboxray wrote: "Bandwidth for Txx block transfer to vram is slow. It's about 65 bytes a scanline. It also has the downside of stalling ALL interrupts. You can get a faster transfer rate if you embed your graphics as ST1/ST2 opcodes. This is 5 cycles a byte, and ~91 bytes a scanline."
turboxray is the platform's performance champion! If you choose to take the tradeoff of the 7 cycles-per-byte TIA instead of the 5 cycles-per-byte ST1/ST2, and so save some ROM/RAM for more animations or other graphics, then it is recommended that you split your transfers into 32-byte chunks (approx 250 CPU cycles each) so that you can still respond to raster and timer interrupts in a timely fashion.
When it comes to raster interrupts, you have very few cycles after the interrupt to write any VDC register changes, so any delay (such as a TIA) can break things if you try to use the system in that way. The way that most PCE programs, including the System Card BIOS, handle things is to set off a raster interrupt on the line before the one that you want to change. You then have hundreds of cycles of wiggle-room in which to write the VDC register changes before they are needed, and so you can get away with any IRQ delay caused by using the Txx instructions (in 32-byte chunks).
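The "approx 250 CPU cycles" for a 32-byte chunk can be checked with a small Python sketch. It uses the 7 cycles per byte quoted in this thread and assumes a fixed setup cost of 17 cycles per Txx instruction; the helper names are made up for illustration.

```python
TXX_SETUP_CYCLES = 17      # assumed fixed overhead of one Txx instruction
TXX_CYCLES_PER_BYTE = 7    # from the thread: 6 cycles + 1 VDC write penalty


def txx_cycles(byte_count):
    """Cycles one block transfer of byte_count bytes keeps interrupts blocked."""
    return TXX_SETUP_CYCLES + TXX_CYCLES_PER_BYTE * byte_count


def split_into_chunks(total_bytes, chunk_bytes=32):
    """Sizes of the individual transfers when capping each at chunk_bytes."""
    full_chunks, remainder = divmod(total_bytes, chunk_bytes)
    return [chunk_bytes] * full_chunks + ([remainder] if remainder else [])


chunks = split_into_chunks(384)                       # e.g. the 384-byte BAT update
worst_irq_delay = max(txx_cycles(n) for n in chunks)  # longest uninterruptible stretch
total_cycles = sum(txx_cycles(n) for n in chunks)
print(len(chunks), worst_irq_delay, total_cycles, txx_cycles(384))
```

Chunking costs a little extra in total because the setup overhead is paid once per chunk, but it bounds the worst-case interrupt delay to a single chunk instead of the whole transfer.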
|
|
|
Post by pokun on Dec 6, 2021 23:45:49 GMT
turboxray wrote: "So yeah, SATB is in vram.. but it's a 'buffer'. The real SAT are a set of internal registers. The VDC sets a DMA period where it copies the contents from vram SAT buffer to the internal registers. So if you have a local copy in cpu ram, then it's a SATBB hahah."
To be picky with semantics, SATB is officially the internal VDC memory while SAT is the VRAM area that is copied to SATB using the VRAM-SATB DMA channel. Someone got them mixed up at some point I guess, and so Mednafen and English homebrew documentation got them wrong as well.
|
|
|
Post by turboxray on Dec 7, 2021 0:24:48 GMT
turboxray wrote: "So yeah, SATB is in vram.. but it's a 'buffer'. The real SAT are a set of internal registers. The VDC sets a DMA period where it copies the contents from vram SAT buffer to the internal registers. So if you have a local copy in cpu ram, then it's a SATBB hahah."
pokun wrote: "To be picky with semantics, SATB is officially the internal VDC memory while SAT is the VRAM area that is copied to SATB using the VRAM-SATB DMA channel. Someone got them mixed up at some point I guess, and so Mednafen and English homebrew documentation got them wrong as well."
SAT stands for Sprite Attribute Table. SATB is Sprite Attribute Table Buffer. Why would you call the set of internal "registers" a "buffer", when the vram location actually acts as the buffer? Just because it's named VRAM-SATB doesn't necessarily mean it's "source-destination DMA". If the official docs actually do call the real SAT set of registers the SATB, then I'd say the documentation got it wrong haha. Wouldn't be the first time.
|
|
|
Post by dshadoff on Dec 7, 2021 0:47:39 GMT
...So I went back to the Develo documents and the American developer docs, and pokun is correct. It doesn't actually make sense, but that's the way it is.
|
|
|
Post by touko on Dec 7, 2021 11:39:45 GMT
I think using Txx while the VDC is parsing the SAT can cause some glitches/missing data. I experienced this on real hardware, and it does not happen with the classic LDA/STA method. I guess this is because Txx cannot be stalled once it has started, so you end up with data being written nowhere.
|
|
|
Post by turboxray on Dec 7, 2021 17:37:14 GMT
touko wrote: "I think using Txx while the VDC is parsing the SAT can cause some glitches/missing data. I experienced this on real hardware, and it does not happen with the classic LDA/STA method. I guess this is because Txx cannot be stalled once it has started, so you end up with data being written nowhere."
/RDY will stall Txx. From the tests Ki did, the cpu can be stalled indefinitely without issue. Txx just can't be interrupted via an interrupt request. If you have a rom with that issue, I can take a look with my logic analyzer.
|
|
|
Post by pokun on Dec 7, 2021 18:00:14 GMT
It's not just the American dev docs though, the Develo manuals use the same terminology in Japanese, so it's most likely not simply a translation error. The terms are used very consistently in all these manuals.
Elmer once said you can think of either SAT or SATB as the "original" or the "buffer", so I don't think it's that illogically named (but I think this is why they were mixed up in the first place by English homebrew pioneers). The VRAM-VRAM DMA and VRAM-SATB DMA channels are definitely named as "source-destination", where VRAM means SAT in the latter one (so "SAT-to-SATB DMA"), and this is also used consistently, though the English translated manual replaces "DMA" with "block transfer" or something like that if I remember correctly.
SAT also goes very well with BAT (though there is no "BATB"). If it's really a mistake, I guess it might have been done by Hudson themselves early on when making the original documentation for their chips, then they made it official.
Back to the topic of the Famicom, SAT kinda corresponds to the Famicom's OAM as well as the OAM buffer in RAM while SATB is like the second internal OAM buffer that the Famicom PPU uses internally when drawing the sprites. Though this second OAM buffer is filled automatically and completely inaccessible by the user (unlike SATB on the PC-Engine where DMA is used) so it's not completely analogous of course.
|
|
|
Post by turboxray on Dec 7, 2021 18:22:42 GMT
From my experience in real-world embedded software design, and having to deal with requirements and technical documentation, errors like that propagate often haha. It becomes a copy-paste thing - no one bothers to change it (even when others know better). Sega's Genesis and SegaCD documentation is full of errors and had a lot of bulletin updates to correct it. Engineers aren't always cognizant of how things work (or are termed) in the rest of the world (standards). I have stories.
|
|