touko
Punkic Cyborg
Posts: 106
|
Post by touko on Oct 25, 2019 10:05:24 GMT
Cool to see you're still alive, Tom.
|
|
|
Post by dshadoff on Feb 5, 2020 3:45:35 GMT
I was reading the top post of the thread for technical information, and I noticed that there was an update in the past few days regarding the timing of BYR/BXR updates near RCR interrupts.
My question is regarding points of reference. Specifically, I saw the following: "Safe to write BYR @ 100 cpu cycles if width=240 hdw=$1D"
1) In the above, what is the measurement of 100 cpu cycles taken with reference to? The beginning of the line, the end of the line, the RCR interrupt trigger (which is n cycles before HDW end), or something else?
2) Is that 100 cycles and up, before the event in question?
...Just trying to get my bearings on this.
Also, where you say "MWR=$x0 (1-clk-per-access)" and "MWR=$xA (2-clk-per-access)"... does MWR = memory write register (i.e. VRAM write address)? Develo writes that as MAWR...
And finally, do we know how many clocks are needed for the other possible values (besides $x0 and $xA)?
I know I wasn't so familiar with the details of the VDC, but I feel like I knew even less than I thought I did...
|
|
|
Post by elmer on Feb 5, 2020 5:32:07 GMT
> Also, where you say "MWR=$x0 (1-clk-per-access)" and "MWR=$xA (2-clk-per-access)"... does MWR = memory write register (i.e. VRAM write address)? Develo writes that as MAWR... And finally, do we know how many clocks are needed for the other possible values (besides $x0 and $xA)?

I'm referring to the MWR (memory access width) register, R09 in the English developer docs. The low nibble of that register determines whether the VDC accesses VRAM at 1-clk-per-access, 2-clks-per-access or 4-clks-per-access. The only low-nibble values relevant to games are $0, $A, and sometimes $9 (which at least one game uses instead of $A, although they are the same). If you remember, this came up as part of the "why aren't there more 320-wide games?" thread, together with the stuff about NEC forcing Hudson to use 2-clks-per-access mode for the US release of R-Type, even though it caused a lot of sprite dropout.

> I was reading the top post of the thread for technical information, and I noticed that there was an update in the past few days regarding the timing of BYR/BXR updates near RCR interrupts.

The update was just me checking when the VDC's CR register is shadowed/locked for the line, and adding that information to the table.

> My question is regarding points of reference. Specifically, I saw the following: "Safe to write BYR @ 100 cpu cycles if width=240 hdw=$1D" 1) In the above, what is the measurement of 100 cpu cycles taken with reference to? The beginning of the line, the end of the line, the RCR interrupt trigger (which is n cycles before HDW end), or something else? 2) Is that 100 cycles and up, before the event in question? ...Just trying to get my bearings on this.

It's all to do with RCR interrupts, and with when the VDC's BYR/BXR/CR/etc registers are shadowed/locked for the next display line. When you're trying to program raster splits, parallax, or other "wavy-screen" effects, this information is crucial (and the MiSTer core is probably getting it wrong).
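Going back to the MWR nibble for a moment, the known decoding could be sketched like this in Python. Only the values named in this thread are mapped; the other nibble values reportedly select slower modes whose exact clock counts aren't given here, and the function name is purely illustrative:

```python
# Sketch of the MWR (R09) low-nibble -> VRAM clocks-per-access mapping,
# based only on the values discussed in this thread. Other nibble values
# select slower (e.g. 4-clk) modes, but their exact decoding isn't mapped.
KNOWN_VRAM_ACCESS_CLOCKS = {
    0x0: 1,   # 1-clk-per-access (needed for 320-wide "medium-res")
    0x9: 2,   # used by at least one game; same effect as $A
    0xA: 2,   # 2-clk-per-access (what NEC required for US R-Type)
}

def vram_clocks_per_access(mwr):
    """Return clocks-per-access for an MWR value, or None if unknown."""
    return KNOWN_VRAM_ACCESS_CLOCKS.get(mwr & 0x0F)
```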
When you get an RCR, you have a limited number of CPU cycles in which to change the VDC's BYR/BXR/CR/HDS/HDW/etc registers in order for your changes to show up on the very next scanline. If you write your register changes too late, then they won't affect things until the next scanline *after* the one that is about to start. This is why Chris (in his HuZero game) and Tom (in his posts) are very careful to write as-efficient-as-possible interrupt handlers that change the VDC's registers before they are shadowed. But doing things this way means that you must make sure that nothing can possibly delay the CPU's response to the VDC's RCR interrupt, so you have to rule out using the CPU's TIA transfer instructions, and be very, very careful about using timer interrupts.

The other alternative is to set your RCR interrupt 2 lines before the line that you wish to change, and then change the register values *after* the VDC shadows the settings for the next line. Doing things that way means that instead of having 80-or-so cycles from the RCR to set all the registers, you have over 500 cycles to set them ... which means that you don't have to be so careful about delaying the CPU's response to the RCR interrupt, and you can still use TIA transfer instructions (in 32-byte chunks). But you have to be careful not to write the registers too soon, or things won't work the way that you expect. This is the method that the System Card uses, and lots of other CD games too, even when they provide their own custom interrupt handlers.

Whichever method a programmer uses, it is important to know when the VDC locks those registers, and potentially in which order. That's what the table I've provided shows. It tells you how many cycles after the RCR interrupt is triggered it becomes safe to write the BYR register so that it does *not* affect the very next scanline. So basically, the BYR register is shadowed by the VDC 1 cycle before the BXR register, which is in turn shadowed before the CR register (at 1-cycle increments).

If the MiSTer core doesn't get this timing right, then games are going to break. Even though Mednafen may be off by a few cycles on when the RCR interrupt occurs in relation to the end of the line, and on when the vblank interrupt occurs ... it's actually pretty accurate on the RCR-to-BYR delay, because it *has* to be. For instance, the System Card's timing is so close to the limit that it actually breaks when you change the screen to 320-wide, or even to 240-wide!

Anyway, does that make sense, or have I failed to describe it in an understandable way?

> I know I wasn't so familiar with the details of the VDC, but I feel like I knew even less than I thought I did...

This interrupt-handling stuff is both hard and undocumented. Manufacturers have *never* documented this kind of information; it's always been the kind of nitty-gritty detail that developers have had to work out for themselves.
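A tiny Python sketch of one reading of the shadowing order described in this post. The 1-cycle increments between BYR, BXR and CR come from the post itself; the function name and structure are just illustrative, and none of this is verified on hardware:

```python
# Sketch (one reading of the post, not verified on hardware): given the
# cycle count at which BYR becomes safe to write for a given width,
# BXR and CR are shadowed at 1-cycle increments after BYR, so they
# become safe to write 1 and 2 cycles later respectively.
SHADOW_ORDER = ("BYR", "BXR", "CR")

def safe_write_cycles(byr_safe_at):
    """Cycles after the RCR interrupt at which each register becomes
    safe to write without affecting the very next scanline."""
    return {reg: byr_safe_at + i for i, reg in enumerate(SHADOW_ORDER)}
```

For example, with the table's "safe @ 106 cycles" entry for a 320-wide screen, this reading would put BXR at 107 and CR at 108.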
|
|
|
Post by dshadoff on Feb 5, 2020 5:43:07 GMT
> Anyway, does that make sense, or have I failed to describe it in an understandable way?

That all makes sense, and I was aware of much (but not all) of it. I remembered the memory access width register discussion only vaguely, and only after you reminded me... thanks for the recap. Mostly it was the fact that the message at the top of the thread gave a measurement but not what it was measured from - which I now know.
|
|
|
Post by Mathius on Feb 5, 2020 22:57:16 GMT
Why do you think, Elmer, NEC forced Hudson to create a situation where they had to use 2-clks-per-access in the US R-Type?
|
|
|
Post by elmer on Feb 6, 2020 5:34:37 GMT
> Why do you think, Elmer, NEC forced Hudson to create a situation where they had to use 2-clks-per-access in the US R-Type?

It was discussed back in the other thread here ... "Why are PCE games that use >= 320 horizontal res so rare?" The TLDR is that running games in medium-res and 1-cycle-per-VDC-access is actually slightly overclocking the VRAM in the PCE, and so isn't guaranteed to work. NEC were being cautious with their new machine, and presumably didn't want it to get a bad reputation for unreliability with consumers.

In practice, I don't believe that anyone has ever reported problems with running in medium-res and 1-cycle-per-VDC-access. Heck, people even seem to be able to successfully run in high-resolution with 1-cycle-per-VDC-access, which is a *massive* overclock ... and totally unnecessary, since you can just use 2-cycles-per-VDC-access in high-resolution with absolutely no loss in VDC/CPU performance.

Does that answer your question?
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Feb 6, 2020 9:15:28 GMT
That's normal: in mid-res the VRAM is not overclocked at all. At the 7.16 MHz dot clock and with 1 cycle per access, it requires 140 ns RAM, and the PCE has a 120 ns part (HSRM20256LM12); though it would change nothing even at 140 ns, because it's SRAM. On the other hand, at 10.74 MHz I don't think the same holds.
|
|
|
Post by elmer on Feb 6, 2020 22:46:18 GMT
> That's normal: in mid-res the VRAM is not overclocked at all. At the 7.16 MHz dot clock and with 1 cycle per access, it requires 140 ns RAM, and the PCE has a 120 ns part (HSRM20256LM12); though it would change nothing even at 140 ns, because it's SRAM. On the other hand, at 10.74 MHz I don't think the same holds.

Errrmmm ... things are *far* more complicated than just comparing two numbers and saying that 140ns > 120ns. Yes, SRAM is rated in access times, but there are a bunch of electrical signals involved in the read and write cycle timings. Without the manufacturer's timing diagrams, we don't know how long the delay is between the VDC clock changing and the /CE, /RD, /WR, address bus and data bus signals becoming stable, or how long the data-hold times are after a read and a write. All of those delays (and others) eat into the 140ns cycle time ... and at some point, things will start to fail.

We do know that the HuC6280 CPU was specifically designed to run at 7.16MHz and 1-cycle-per-CPU-access ... and it uses an HSRM2264LM10 8KBx8 100ns SRAM, not the 120ns version of that SRAM chip. That would seem to be a good indication that there are enough delays in the CPU's memory interface signals that the 120ns chip wouldn't work reliably. We can speculate that Hudson designed the VDC's memory interface timings to be so much tighter than the CPU's that it could work at 7.16MHz and 1-cycle-per-VDC-access using 120ns SRAM, but without anything to back up the idea, it would just be an unproved theory. NEC's hardware engineers, who had all of the technical specs of the chips that we don't have, clearly believed that there could *potentially* be a problem with running the VDC at 7.16MHz and 1-cycle-per-VDC-access.
Personally, I suspect that NEC's hardware engineers weren't idiots, and that we can get away with it only because Epson's SRAM chips were well manufactured, so that even the chips they sold as 120ns parts could actually run fast enough to work. Heck ... we actually know that this is the case, because we can run the VDC at 10.74MHz and 1-cycle-per-access and everything appears to work (although there is little benefit for us in actually doing so).
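For reference, the bare cycle-time arithmetic behind this back-and-forth can be sketched in Python. As the post above stresses, this deliberately ignores setup/hold and bus-settling delays, so it is a naive comparison, not a real margin analysis:

```python
# Naive cycle-time arithmetic for the VRAM debate above. A real margin
# analysis would also need the setup/hold and bus-settling delays from
# the SRAM datasheet timing diagrams, which we don't have.
def cycle_time_ns(clock_mhz, clocks_per_access=1):
    """Nanoseconds available per VRAM access at a given dot clock."""
    return 1000.0 * clocks_per_access / clock_mhz

mid_res_1clk  = cycle_time_ns(7.16)       # ~139.7 ns: touko's "140 ns"
high_res_1clk = cycle_time_ns(10.74)      # ~93.1 ns: under the 120 ns rating
high_res_2clk = cycle_time_ns(10.74, 2)   # ~186.2 ns: comfortably within spec
```

So by the naive numbers alone, high-res at 1-clk-per-access is well under the HSRM20256LM12's 120 ns rating, while 2-clks-per-access is comfortably safe, which matches the conclusion above.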
|
|
|
Post by Mathius on Feb 7, 2020 1:58:50 GMT
> Does that answer your question?

'Aye. I'll go back and re-read that other thread again as well. Thanks!
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Feb 7, 2020 8:35:00 GMT
Not really; in fact it's a bit less, because the vanilla 6502 actually uses not 1/2 cycle per access but less, around 0.4; this is why the HuC6280 is under 1 cycle (maybe between 0.7 and 0.9) and requires RAM faster than 140ns. But the 65xxx parts are a particular case, and I don't think the VDC uses less than 1 cycle (all the other hardware is 1 cycle too, as were the classic DMA cycles), because 1 cycle is already harsh for memory. For example, the MD's Z80 RAM is 100ns (plus, it's SRAM) for a 3.5MHz CPU which has 4-cycle access and a built-in DRAM refresh controller; do you think SEGA's engineers were bad?? I agree, but we can also speculate about many things then, and like you said, "it would just be an unproved theory." For me, DMA under 1 cycle/access is just nonsense; not impossible, but not logical.
|
|
|
Post by dshadoff on Feb 8, 2020 6:22:01 GMT
Getting back to the RCR interrupt, I understand why you are counting CPU cycles - because of the interrupt service routine - but in an effort to reconcile it with the VDC, let me play out a scenario, and see if you agree.
Based on the 7.16MHz dot-clock, 1-cycle access for simplicity, we have the following table:
7.16MHz (with MWR = $x0)
; Safe to write BYR @ 106 cpu cycles if width=320 hdw=$27
; Safe to write BYR @ 98 cpu cycles if width=328 hdw=$28
; Safe to write BYR @ 90 cpu cycles if width=336 hdw=$29
; Safe to write BYR @ 82 cpu cycles if width=344 hdw=$2A
; Safe to write BYR @ 74 cpu cycles if width=352 hdw=$2B

Now, we know from earlier in the thread that:
a) The RCR interrupt fires 12 cycles prior to the end of HDW (+/- 1 cycle)
b) The delay between RCR interrupt and BYR latch is dependent only on HDW
c) There is a fixed number of cycles (455) per line in 7.16MHz dot-clock mode
This implies that the BYR latch happens a fixed number of cycles before HDW starts in the 7.16MHz dot-clock mode. This likely translates to the same number of VDC master cycles across different dot-clocks.
Based on the following assumptions:
1) Your measurements already take interrupt latency and service cycles into account (the time between the interrupt being raised and the interrupt service routine's CPU cycles being expended)
2) The figures listed above (i.e. 106 cycles) are counted to the first cycle *after* the latch took place
This leaves us with the BYR being latched 41 VDC cycles prior to the start of HDW.
That is to say, one scanline looks like this:
X (time from start of line to BYR latch)
+ Y (time between BYR latch and start of HDW)
+ HDW-12 (time to RCR interrupt)
+ 12 (time from RCR interrupt to end of HDW)
+ Z (time after HDW to end of line)
We have (for 320-wide), from the interrupt delay: 106 = 12 + Z + X, therefore (Z+X) = 94
Also, from the scanline: 455 = X + Y + (HDW-12) + 12 + Z, i.e. 455 = X + Y + HDW + Z
Since HDW = 320 and (Z+X) = 94: Y = 455 - 320 - 94, so Y = 41 cycles before HDW
Since an interrupt pushes the flag register onto the stack and basically does a 'JSR' to the interrupt service routine, I'm going to assume that at least the first 10 cycles or so after the interrupt signal is raised are spent on interrupt overhead. I'm not sure whether those cycles are counted in your 106, but my assumptions above expect that they are.
Let me know your thoughts.
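As a quick sanity check, the derivation above can be recomputed for every table entry in a few lines of Python (assuming the 455-cycle line and the 12-cycle RCR offset from points a and c; the names are just illustrative):

```python
# Checking the derivation above: for each table entry, the BYR latch
# should land a constant number of cycles (Y) before HDW starts.
# Assumes a 455 CPU-cycle line and RCR firing 12 cycles before HDW ends.
TABLE = [  # (safe-to-write-BYR cycles, display width in pixels)
    (106, 320), (98, 328), (90, 336), (82, 344), (74, 352),
]

LINE_CYCLES = 455
RCR_BEFORE_HDW_END = 12

def latch_to_hdw_start(safe_cycles, width):
    # safe = 12 + Z + X          =>  Z + X = safe - 12
    # 455  = X + Y + width + Z   =>  Y = 455 - width - (Z + X)
    return LINE_CYCLES - width - (safe_cycles - RCR_BEFORE_HDW_END)

ys = {latch_to_hdw_start(s, w) for s, w in TABLE}
print(ys)  # -> {41}: every table entry gives the same Y
```

All five entries collapse to Y = 41, which supports the claim that the BYR latch happens a fixed number of cycles before HDW starts.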
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Jun 3, 2020 15:16:34 GMT
And what happens if the CPU is making a block copy with Txx?? Do you think the process is halted, or might it cause some writes to be missed??
|
|
|
Post by dshadoff on Jun 3, 2020 20:50:27 GMT
The process is stalled for the WAIT state, but not otherwise interrupted. Effectively, a read (or a write) would take additional cycles... Actually, if you can read VHDL, you can see how the MiSTer core has been implemented - it passes all of our tests to date, so it is currently the best model for how the VDC works. github.com/MiSTer-devel/TurboGrafx16_MiSTer
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Jun 4, 2020 8:07:07 GMT
Thanks.
|
|
|
Post by turboxray on Jun 4, 2020 18:16:35 GMT
Yeah. It doesn't matter what the instruction is for the 6280 when accessing the VDC. If the VDC asserts /RDY, then the processor halts until it's released. dshadoff: where are you getting the 12 cycles before HDW ends from?
|
|