PCE is unique in that it has an internal buffer and 16 hardwired registers for each sprite on a scanline. The sprite table is transferred into a special area of the chip right before the start of the frame. ...
I was looking at the offline archive of the old PCEFX Development threads, and saw that Bonknuts had already talked about how the sprite system loads pixel-data during the hblank, and that the CPU will be delayed if it tries to read/write VRAM during this time.
I do miss his particular passion and technical curiosity in our little PCE community.
As we discussed in the PC Engine CDROM 'BIOS' Interrupts thread, when it comes to split-screen displays and parallax scrolling, there is a delay between the VDC signalling an RCR interrupt, and when the VDC locks the scroll registers for the next line.
All indications are that the VDC actually fires the RCR interrupt right at the end of the HDW display period.
That means that you only have a very short window in which to write new scroll values into the VDC in order to set them before the next line starts.
BUT, if you do write an IRQ1 interrupt-handler to do things this way, then you are going to have to make sure that nothing can possibly delay your response to the interrupt, which means that you have to be VERY careful about handling simultaneous timer interrupts, or ADPCM-streaming interrupts, and you can totally forget about using the TIA instruction to quickly upload new graphics to VRAM!
The way that the hardware is supposed to be used is that you generate an RCR interrupt two screen-lines before the line that you want to change, then wait until the next line's scroll registers are locked for that line, and then you set the new scroll values for your target line.
The advantage of doing things this way is that your safe-window to write the new scroll values goes from being a couple-of-dozen cycles, into being a couple-of-hundred cycles, and so you can afford some variation in your interrupt-response timing, and thus you can safely use timer interrupts, ADPCM-streaming interrupts, and the TIA instruction to upload new graphics to VRAM ... as long as you are careful with your code-design.
That is the way that the System Card's RCR interrupt handling works, and if you look at it, it even has a couple of "bsr-to-rts" 15-cycle delays in there to make sure that it gets the timing right.
Unfortunately, the System Card's code wasn't tested very well, and while it does work correctly for 256 & 336 pixel width displays, it doesn't work properly for 240 & 320 pixel width displays (the X-scroll value is set too early!).
The critical thing to know when designing your interrupt code, is exactly how many CPU cycles occur between when the RCR interrupt triggers, and when it is safe to write the scroll registers, knowing that they have already been locked for the next line's display.
I've done some tests, and here are the results...
5.36MHz (with MWR = $x0)
Safe to write BYR @ 100 cpu cycles if width=240 hdw=$1D
Safe to write BYR @  90 cpu cycles if width=248 hdw=$1E
Safe to write BYR @  79 cpu cycles if width=256 hdw=$1F
Safe to write BYR @  67 cpu cycles if width=264 hdw=$20
7.16MHz (with MWR = $x0)
Safe to write BYR @ 106 cpu cycles if width=320 hdw=$27
Safe to write BYR @  98 cpu cycles if width=328 hdw=$28
Safe to write BYR @  90 cpu cycles if width=336 hdw=$29
Safe to write BYR @  82 cpu cycles if width=344 hdw=$2A
Safe to write BYR @  74 cpu cycles if width=352 hdw=$2B
10.74MHz (with MWR = $xA)
Safe to write BYR @ 112 cpu cycles if width=480 hdw=$3B
Safe to write BYR @ 107 cpu cycles if width=488 hdw=$3C
Safe to write BYR @ 101 cpu cycles if width=496 hdw=$3D
Safe to write BYR @  96 cpu cycles if width=504 hdw=$3E
Safe to write BYR @  91 cpu cycles if width=512 hdw=$3F
Safe to write BYR @  85 cpu cycles if width=520 hdw=$40
Safe to write BYR @  79 cpu cycles if width=528 hdw=$41
Safe to write BYR @  75 cpu cycles if width=536 hdw=$42
Safe to write BYR @  69 cpu cycles if width=544 hdw=$43
Note: The VDC's hde, hsw, & hds settings have *NO* effect!!!
Note: These cycle timings are to the write-cycle within the instruction, and not to the start of the instruction.
Note: The VDC shadows/locks the BYR register a cycle-or-two before the BXR register, so write BYR first.
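For anyone who wants those numbers in a form that a build script or a quick calculator can use, here is a small Python sketch of the tables above as a lookup keyed by HDW. The names (`SAFE_BYR_CYCLES`, `display_width`, `safe_byr_cycles`, `one_size_fits_all_delay`) are made up for illustration, and the width formula is just the (hdw + 1) * 8 relationship that is visible in the tables.

```python
# Measured "safe to write BYR" delays, in CPU cycles after the RCR interrupt,
# copied from the tables above. Keys are VDC HDW register values.
SAFE_BYR_CYCLES = {
    # 5.36MHz (MWR = $x0)
    0x1D: 100, 0x1E: 90, 0x1F: 79, 0x20: 67,
    # 7.16MHz (MWR = $x0)
    0x27: 106, 0x28: 98, 0x29: 90, 0x2A: 82, 0x2B: 74,
    # 10.74MHz (MWR = $xA)
    0x3B: 112, 0x3C: 107, 0x3D: 101, 0x3E: 96, 0x3F: 91,
    0x40: 85, 0x41: 79, 0x42: 75, 0x43: 69,
}


def display_width(hdw):
    """Display width in pixels for an HDW value (8 pixels per character)."""
    return (hdw + 1) * 8


def safe_byr_cycles(hdw):
    """Measured delay after which BYR is already locked for the next line."""
    return SAFE_BYR_CYCLES[hdw]


def one_size_fits_all_delay():
    """Longest measured delay, i.e. a delay that is safe in every listed mode."""
    return max(SAFE_BYR_CYCLES.values())


if __name__ == "__main__":
    for hdw in sorted(SAFE_BYR_CYCLES):
        print(f"hdw=${hdw:02X} width={display_width(hdw)} "
              f"safe @ {safe_byr_cycles(hdw)} cycles")
```

Remember that these are timings to the write-cycle of the instruction, so a handler built around this table still has to count cycles up to the actual store.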
So, where does all this RCR timing information leave us?
Let's imagine trying to create a section of the screen with a parallax scroll, where you want to change the BGX scroll value on every raster line.
That means that you need to change the BGX value approximately every 455 CPU cycles.
Let's also imagine that you are uploading a bunch of new graphics to VRAM at the same time, using a TIA instruction with a PCEFX-recommended 32-byte transfer to VRAM.
And finally, let's also imagine that you have 16-sprites-on-a-line in that area of the screen as some enemies go past, and so the VDC needs to read 64 words of pixel data for the next line's display.
Having the CPU write to VRAM while the VDC is busy reading the next line's sprite data will cause the CPU to block until the VDC is finished reading the data that it needs.
32 byte TIA to VRAM        -> 241 CPU cycles
SPR DMA delays CPU         ->  86 CPU cycles (64 VDC cycles @ 5MHz)
                           == 327 CPU cycles
Safe delay to BYR in IRQ1  -> 100 cycles (with a 240-wide screen)
                           == 427 cycles
This isn't a (major) problem (yet), because we actually have 455 + 100 - 2 = 553 cycles in which to write the BYR before it is locked for the line that we want to change.
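Here is that worst-case budget as a rough Python sketch, using only the figures from this post. The constant names are made up for illustration, and the 4/3 conversion assumes the usual clock ratio between the 7.16MHz CPU and the 5.37MHz dot clock.

```python
import math

CYCLES_PER_LINE = 455      # CPU cycles per raster line
TIA_32_BYTES = 241         # 32-byte TIA to VRAM, in CPU cycles
SPRITE_WORDS = 64          # 16 sprites on the line -> 64 words of pixel data
SAFE_BYR_240 = 100         # measured safe delay for a 240-wide screen

# One VDC cycle at the 5MHz dot clock lasts 4/3 of a CPU cycle (assumed ratio),
# so 64 VDC cycles round up to 86 CPU cycles.
sprite_stall = math.ceil(SPRITE_WORDS * 4 / 3)

# Longest time the CPU can be held off before the IRQ1 handler even starts.
worst_latency = TIA_32_BYTES + sprite_stall

# Handler then waits out the safe delay before writing BYR.
byr_write_time = worst_latency + SAFE_BYR_240

# BYR must be written before it is locked for the line we want to change.
deadline = CYCLES_PER_LINE + SAFE_BYR_240 - 2

slack = deadline - byr_write_time

if __name__ == "__main__":
    print(f"sprite stall   : {sprite_stall} cycles")
    print(f"worst latency  : {worst_latency} cycles")
    print(f"BYR write time : {byr_write_time} cycles")
    print(f"deadline       : {deadline} cycles")
    print(f"slack          : {slack} cycles")
```

Whatever slack is left over is all the time there is for the remaining register writes in the handler.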
BUT ... we also have to possibly set a new CR value to enable/disable sprites, maybe change a color palette register, and finally, set up a new RCR line value for the next line's interrupt. And our IRQ1 handler should really handle all (reasonable) screen resolutions, so we might choose to make the "safe delay" be 112 cycles so that we can use the 480-wide screen resolution.
The timing is pretty critical in these circumstances, and the RCR code that we write is going to have to be carefully designed to do things in the right order, at the right time, to be predictable, and to not waste time ... or there will be unstable results and occasional screen glitches.
So what does the HuC RCR handler do?
32 byte TIA to VRAM         -> 241 CPU cycles
SPR DMA delays CPU          ->  86 CPU cycles (64 VDC cycles @ 5MHz)
                            == 327 CPU cycles
IRQ1 begins to RCR written  -> 136 cycles == 463 cycles
IRQ1 begins to BGX written  -> 208 cycles == 535 cycles (1 complete line + 80 cycles)
IRQ1 begins to BGY written  -> 243 cycles == 570 cycles (1 complete line + 115 cycles)
Well, in this case, the RCR value is written too late to catch the interrupt on the next line, and the BYR value (BGY above) is written too late to catch the next line in all of the resolutions. The BXR value (BGX above) should make it in time, except in the 256-wide mode, where it may or may not make it.
So there is the possibility of some nasty screen glitches and things just not working the way that the programmer wants, but this circumstance of having 16 sprites on a line and starting a TIA instruction at the same time is going to be rare, which will make it look like a random and unexplained problem.
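To make that comparison concrete, here is a rough Python sketch that takes the HuC timings above and checks them against the measured lock points for the four common widths. The names, the 2-cycle "marginal" band (a nod to BXR locking a cycle-or-two after BYR), and the choice of widths are all assumptions made for illustration.

```python
CYCLES_PER_LINE = 455
WORST_CASE_STALL = 241 + 86          # 32-byte TIA + sprite fetch stall = 327

# Cycles from the start of HuC's IRQ1 handling to each register write.
HUC_WRITE_OFFSETS = {"RCR": 136, "BGX": 208, "BGY": 243}

# Measured lock points (CPU cycles after the RCR interrupt) for common widths.
LOCK_POINTS = {240: 100, 256: 79, 320: 106, 336: 90}

MARGIN = 2  # assumed uncertainty band, in cycles


def cycles_into_next_line(register):
    """How far past the following line's RCR point the write lands."""
    return WORST_CASE_STALL + HUC_WRITE_OFFSETS[register] - CYCLES_PER_LINE


def classify(offset, lock_point):
    """Compare a write time with the point where the register gets locked."""
    diff = offset - lock_point
    if diff < -MARGIN:
        return "in time"
    if diff <= MARGIN:
        return "marginal"
    return "too late"


def report(register):
    offset = cycles_into_next_line(register)
    return {width: classify(offset, lock) for width, lock in LOCK_POINTS.items()}


if __name__ == "__main__":
    # A positive value means the RCR write happens after the next line's
    # interrupt point has already gone by.
    print("RCR written", cycles_into_next_line("RCR"), "cycles into next line")
    print("BGX:", report("BGX"))
    print("BGY:", report("BGY"))
```

With these inputs BGX lands 80 cycles into the next line and BGY lands 115 cycles in, which is where the "marginal at 256-wide" and "too late everywhere" conclusions come from.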
Looking at HuC's code, I suspect that the author just didn't realize that the VDC's sprite-pixel-data delay could potentially throw a nasty wrench into the works.
MagicKit also checks for IRQ1, then VSync, and finally HSync in its interrupt handling code, which I would consider ass-backwards for dealing with timing-critical code. I usually put HSync interrupt checking first in my own ASM code.
I know he was taking classes, and that might have just taken up all his time, but I would hate to lose his knowledge and participation in the scene. Anyone have contact information for him and can reach out to him to let him know we're here?
Touching on the missing members subject (thread derailing WOOOOOOO!), does anyone know if MooZ (the HuSDK guy) ever had an account @ PCEFX? I'm interacting with him via Github and I was wondering if he was aware/cared about the forums.
At first I thought "what kind of madman would do a TIA during interrupt triggering scanlines" but then again, I realized that any user that's trying to use the "unused cycles" of the scanline in the main thread to keep updating logic for the next frame, HuC or not, could run into that issue. In a non-HuC project you could simply implement a NES-style VRAM transfer buffer to be run during VBlank (or VBlank + safe scanline range) only, and that's also true for HuC, BUT, you could argue that HuC gives the false impression that writing to VRAM anywhere is OK (and it kinda is most of the time...). It would be nice to have a buffer implementation for the HuC VRAM write routines.
I could say the same thing about VCE writes which are kinda worse because any write outside v/hblank will show an ugly artifact on screen. HuC could also use a VBlank VCE writer. I'm not going to mention any game names here so people don't think I'm being a dick or trashing those games, but there are many homebrews where this glitch is very visible.
Anyway, just a wishlist that will probably never be implemented, haha.
MagicKit also checks for IRQ1, then VSync, and finally HSync in its interrupt handling code, which I would consider ass-backwards for dealing with timing-critical code.
Yep, putting your time-critical stuff first just makes sense.
It doesn't seem to be too horrible to write something that meets the timing requirements on a PCE, but I'm having some trouble coming up with a general-purpose (i.e. for HuC) routine that I feel comfortable with on the SGX when using a 256-wide screen. 240-wide is OK, and 320-wide and 336-wide modes are fine too (if running in single-cycle mode and slightly overclocking the VRAM).
All of the timing difficulties go away if you split TIA instructions into 16-byte chunks instead of 32-byte chunks ... perhaps that would be the sane thing to do?
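As a back-of-the-envelope check on the 16-byte idea, here is a small Python sketch. It assumes that the 241-cycle figure quoted earlier breaks down as 17 cycles of setup plus 7 cycles per byte written to the VDC; that breakdown is my assumption (it reproduces 241 for 32 bytes) and is not something measured in this thread. The function and constant names are illustrative.

```python
SPRITE_STALL = 86   # CPU cycles lost to the 16-sprite pixel fetch (from above)


def tia_to_vram_cycles(n_bytes):
    """Assumed TIA cost: 17 cycles of setup + 7 cycles per byte to the VDC."""
    return 17 + 7 * n_bytes


def worst_case_stall(n_bytes):
    """Longest delay before IRQ1 can be serviced: TIA + sprite fetch stall."""
    return tia_to_vram_cycles(n_bytes) + SPRITE_STALL


if __name__ == "__main__":
    for chunk in (32, 16):
        print(f"{chunk}-byte TIA: {tia_to_vram_cycles(chunk)} cycles, "
              f"worst-case stall {worst_case_stall(chunk)} cycles")
    print("cycles saved:", worst_case_stall(32) - worst_case_stall(16))
```

If the assumption holds, halving the chunk size buys back a bit over a hundred cycles of worst-case interrupt latency, at the cost of more TIA setup overhead for the same amount of data.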