zanto
Deep Blooper
Posts: 12
|
Post by zanto on Oct 25, 2023 21:04:52 GMT
Hi! I've started learning to program for the PCE and I'm having a blast! There aren't as many videos and tutorials as there are for the NES, but the ones I ran into were very helpful! I hope I'll be able to make a... game-ish... program at some point.

I have a question about the block transfer instructions. I was wondering if there's a way to re-create something like a "TAA". Here's my problem: I have a tile index, which is 1 byte long, and a palette for that tile, which is also 1 byte long. I want to transfer the same tile with the same palette to the screen multiple times. So basically, I want to do

    lda #TileIndex
    sta $0002
    lda #PaletteIndex
    sta $0003

multiple times. That seems so close to what you can do with TIA or something, but I also need the source address to alternate, just like the destination address. Is it possible to do something like this efficiently? Or am I dumb for missing something? Sorry if this is a stupid question, there are many things I still don't understand.
|
|
|
Post by ccovell on Oct 28, 2023 0:29:22 GMT
I don't think there is a block transfer that does this, only one that increments the source and alternates the destination.
To do a memory fill inside VRAM, you could use the VRAM-to-VRAM DMA transfer registers of the VDC. Write your two bytes to VRAM address $0000 manually, for example, then set up the DMA registers to do a block increment with a source of $0000, destination $0001, and set the # of words to transfer. You'll need to look up the VDC reference to see how this is done.
For a beginner, I'd recommend just writing into your BAT manually, but VRAM DMA transfers can be done for greater speed when you have more experience.
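For anyone who wants to try the VRAM-to-VRAM fill later, here is a rough sketch of what the setup might look like. It is written from memory of the VDC register map ($0F = DCR, $10 = SOUR, $11 = DESR, $12 = LENR), so check every value against a VDC reference before relying on it. FILL_VALUE and FILL_WORDS are hypothetical names, and the code assumes no interrupt handler changes the selected VDC register while it runs.

```asm
; Sketch only: seed one word at VRAM $0000, then let the VDC copy it
; forward over the following words.
FILL_VALUE = $0140              ; hypothetical BAT entry (palette + tile)
FILL_WORDS = 32*32 - 1          ; hypothetical count of words after the seed

        st0   #$00              ; MAWR: VRAM write address = $0000
        st1   #$00
        st2   #$00
        st0   #$02              ; VWR: write the seed word
        st1   #low(FILL_VALUE)
        st2   #high(FILL_VALUE)

        st0   #$0F              ; DCR: increment source and destination.
        st1   #$00              ; Note: this also clears bit 4 (SATB
        st2   #$00              ; auto-transfer); keep it set if you need it.
        st0   #$10              ; SOUR: source = $0000
        st1   #$00
        st2   #$00
        st0   #$11              ; DESR: destination = $0001
        st1   #$01
        st2   #$00
        st0   #$12              ; LENR: length in words
        st1   #low(FILL_WORDS)
        st2   #high(FILL_WORDS) ; writing the high byte starts the transfer
```

Whether the length is counted as N or N+1 words, and exactly when the transfer runs relative to the display, are details I would confirm in the reference rather than take from this sketch.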
|
|
Post by zanto on Oct 28, 2023 1:01:08 GMT
Thanks! I think I understand the idea, but I have no idea how to implement that. As you said, I'll just keep using my simple loop to load my tiles, haha! Btw, your video tutorials have been extremely helpful! Thank you for taking your time to make them! ^^
|
|
|
Post by turboxray on Oct 29, 2023 18:10:42 GMT
No, but if you use the Arcade Card and its I/O banks with auto-increment, it's the equivalent of TAA. It's how they got around the issue with the Arcade Card.

Yeah, the idea Chris posted is just like a RAM clearing routine: stz $2000, followed by TII $2000,$2001,$1fff. It's also how LZ decompression schemes do RLE for free (without needing a special command code). Same idea.

You could have a section of RAM, say 10-20 bytes or whatever, and just write the "column" or "row" that you need (once), and then have your TIA read from that section with an outer loop each call.

I'm assuming you want this for speed reasons? Another solution is to have multiple blocks of code (as tables) in ROM (or RAM, as self-modifying code). These would be like ST2 #val (i.e. st2 #nn, st2 #nn, st2 #nn..... rts). Have like sets of 8. Before you enter your loop, write the low byte of the VDC value to $0002. Then inside your loop, JSR to the section of ST2 instructions. This is very fast. You only need to write 1 byte instead of two.

This works because the VDC buffers the lower byte, and it always keeps that value. So if you just write to the upper byte ($0003), then the lower byte gets copied for free (the last value there). This will be faster than if a TAA existed.
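A rough sketch of the ST2 idea, for anyone following along. The labels (fill8, fill_tiles) and the constants FILL_LO and FILL_HI are made up for illustration. It assumes the VRAM write address has already been set and that no interrupt handler changes the selected VDC register while the fill runs.

```asm
FILL_LO = $40                   ; hypothetical low byte of the BAT entry
FILL_HI = $01                   ; hypothetical high byte of the BAT entry

fill8:                          ; one "set of 8": writes 8 VRAM words
        st2   #FILL_HI          ; each high-byte write commits a word,
        st2   #FILL_HI          ; reusing the low byte latched earlier
        st2   #FILL_HI
        st2   #FILL_HI
        st2   #FILL_HI
        st2   #FILL_HI
        st2   #FILL_HI
        st2   #FILL_HI
        rts

fill_tiles:
        st0   #2                ; select the VDC data register
        st1   #FILL_LO          ; write the low byte once, outside the loop
        ldx   #4                ; 4 calls * 8 words = 32 words
.loop:
        jsr   fill8
        dex
        bne   .loop
        rts
```

Keeping the chunk in RAM would let you patch the immediate values at run time, which is the self-modifying variant mentioned above.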
|
|
Post by zanto on Oct 29, 2023 23:40:12 GMT
Thank you for your answer! I read a bit about self-modifying code and I couldn't really wrap my head around it. I imagine once I get more used to programming on the PCE, I could give that a try. Right now, I'm working on some tables and it's kinda overwhelming to debug them (but I feel I'm slooooowly getting the hang of it).
|
|
|
Post by turboxray on Oct 30, 2023 23:21:34 GMT
Ahh okay. I just wanted to show that if you're writing the same fixed value set to the VDC, you don't have to write to $0002 every time. Something like:

copyFixedValtoVRAM:
        ; Call arguments
        ;   X   -> $0002
        ;   Acc -> $0003
        ;   num of copy iterations -> count0 * count1
        stx   $0002
        ldy   count0
.loop_outer:
        ldx   count1
.loop_inner:
        sta   $0003
        dex
        bne   .loop_inner
        dey
        bne   .loop_outer
        rts
It's not even unrolled and it's still faster than a TAA would be. This is 12 cycles per WORD transfer, whereas a Txx to a VDC port is 14 cycles per WORD transfer. And the best thing is that it doesn't stall interrupts like the Txx instructions do (gotta be careful with them).
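For reference, here is one way the 12 and 14 figures can be accounted for. The per-instruction counts are from memory of the HuC6280 timing tables, so treat them as assumptions and check a CPU reference. The label timing_example and the loop count are hypothetical.

```asm
timing_example:
        ldx   #32               ; hypothetical word count
.loop_inner:
        sta   $0003             ; 5 cycles, +1 for accessing the VDC = 6
        dex                     ;                                    = 2
        bne   .loop_inner       ; taken branch                       = 4
                                ; total per word                     = 12
        rts

; Block transfers (TIA and friends), as I understand them:
;   about 6 cycles per byte, +1 per byte when the target is the VDC
;   (6 + 1) * 2 bytes = 14 cycles per word, plus a fixed setup cost
```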
|
|
Post by zanto on Oct 31, 2023 2:04:34 GMT
Interesting! So just by writing at $0003, the VRAM will use the same value in $0002? That's neat, I thought you had to write into both addresses every time! That's definitely more efficient than what I had coded ^^;
|
|
Post by zanto on Nov 4, 2023 7:54:18 GMT
Hi! I have another stupid question. I thought things like these don't deserve their own forum post and clutter things, so I decided to add it to this post. Let me know if I should change this. Anyway, my silly question. I'm trying to load a variable amount of tiles to the VRAM. So this is what I'm trying to do:
vload TILE_VRAM, Test_Tiles, <_ax
However, I get this error
#[1]   C:\MyFiles\Development\PC Engine\pceas2\PCE_RPG\main.asm
   61  00:E097      lda LOW_BYTE #<_ax
       Syntax error in expression!
   61  00:E099      lda HIGH_BYTE #<_ax
       Syntax error in expression!
# 2 error(s)
Is it not possible to do that? To give more context on what I'm trying to do: I want to have a bank with tilesets, where each tileset has a different amount of tiles in them. So, whenever I load a map that uses a specific tileset, I can store in VRAM only the tiles that will be used.
        map   Tilemap1
        jsr   Clear_BAT
        lda   Tilemap1Metadata+2
        jsr   set_bat_size
        stw   #$0000, <_di
        lda   #TILEMAP_BANK
        sta   <_bl
        stw   #Tilemap1, <_si
        lda   #16
        sta   <_cl
        sta   <_ch
        stw   #Tilemap1Metatiles, <_dx
        lda   Tilemap1Metatiles
        sta   <_al
        lda   Tilemap1Metatiles+1
        sta   <_ah
        map   Test_Tiles
        vload TILE_VRAM, Test_Tiles, <_ax   ; <---- this is where the error happens
        set_bgpal #1, Tile_Pal, #2
|
|
|
Post by turboxray on Nov 4, 2023 20:27:42 GMT
I don't know anyone that uses magickit includes, just PCEAS, so I'm not too familiar with its macros. But a quick look shows:
; vload([vram,] data, size)
; ----
; vram, VRAM base address
; data, video data memory address
; size, number of words to copy

        .macro vload
        .if (\# = 3)
        stw   #\1,<__di
        .if (\?2 = ARG_LABEL)
        stb   #BANK(\2),<__bl
        .else
        stb   #$FE,<__bl
        .endif
        stw   #\2,<__si
        stw   \3,<__cx
        .else
        stw   #VRAM(\1),<__di
        stb   #BANK(\1),<__bl
        stw   #\1,<__si
        stw   #\2,<__cx
        .endif
        jsr   load_vram
        .endm
In the macro code that I found, it only ever does stw \3 and not stw #\3. You might want to check what version of library.inc you're using and whether the macro is the same.
Or are you using someone else's macro with the same name?
|
|
|
Post by dshadoff on Nov 4, 2023 21:57:52 GMT
So, wouldn't the syntax have to include brackets and be more like this ? (I'm just going by memory, and I haven't done this in a long time...)
Instead of:
lda LOW_BYTE #<_ax
and
lda HIGH_BYTE #<_ax
...wouldn't it be:
lda #LOW_BYTE(_ax)
and
lda #HIGH_BYTE(_ax)
...But of course, this is assuming that you're trying to use the ADDRESS of _ax.
If you're trying to use the value STORED AT the address _AX, then you shouldn't be using LOW_BYTE/HIGH_BYTE:
lda <_al
and
lda <_ah
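To make the address-versus-contents distinction concrete, here is a small sketch. It uses a made-up zero-page variable, my_word, and the low()/high() functions that appear elsewhere in this thread, so adapt the names to whatever your library defines.

```asm
        .zp
my_word:        .ds 2           ; hypothetical 16-bit zero-page variable

        .code
show_difference:                ; hypothetical label
        ; 1) The ADDRESS of the variable, loaded as an immediate constant
        lda   #low(my_word)
        ldx   #high(my_word)

        ; 2) The VALUE stored in the variable, read from zero page
        lda   <my_word          ; low byte of the contents
        ldx   <my_word+1        ; high byte of the contents
        rts
```

In the first case the assembler bakes the number into the instruction; in the second the CPU reads memory at run time, which is what a variable tile count needs.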
|
|
Post by zanto on Nov 5, 2023 4:53:08 GMT
In the end, I decided to ignore the macro and adapt the code to what I want to do:

        map   Test_Tiles
        stw   #TILE_VRAM, <_di           ; tile location in VRAM
        stw   #Test_Tiles, <_si          ; tile location in ROM
        stb   #BANK(Test_Tiles), <_bl    ; ROM bank
        stw   Tile1_Metadata, <_cx       ; num of tiles to send to vram
        jsr   load_vram
I'm using PCEAS2 and a Magic Kit library I found online. Is there a more recent version of it or a better lib? I may have missed it somehow.
I think that was it... ^^; I understand that the # makes it so that you're referencing the literal address value and not its contents, but sometimes I get things mixed up.
|
|
|
Post by turboxray on Nov 5, 2023 16:40:49 GMT
Yeah, the issue is with the STW macro, which is called inside that vload macro.

;
; STW - store a word-sized value at stated memory location
;
stw     .macro
        lda   LOW_BYTE \1
        sta   LOW_BYTE \2
        lda   HIGH_BYTE \1
        sta   HIGH_BYTE \2
        .endm

There's no detection of the parameter type in the macro. As in, if you give the macro a "ZP address" with the "<" operator instead of a full address, the stw (store word) macro is too dumb to detect it and tries to stack the low/high byte operators on top of it.
|
|
|
Post by ccovell on Nov 5, 2023 18:35:35 GMT
I've been using the MagicKit libraries for a while, but I can't tell you which version. However, the macro I have in the libraries for the above function shows:

stw     .macro
        .if (\?1 = 2)           ; immediate mode
        lda   #low(\1)          ; same as old 'stwi'
        sta   \2
        lda   #high(\1)
        sta   \2+1
        .else                   ; other addressing modes
        lda   \1                ; same as old 'movw'
        sta   \2
        lda   \1+1
        sta   \2+1
        .endif
        .endm

But I have also put my own comments into my copy of the libraries over time apparently:
; ADDW - add word-sized value to value at stated memory location,
;        storing result back into stated memory location (or into
;        another destination memory location - third arg)
;
; FROM FU**ING HuC!!!!!!!!!!! IT DOESN'T FU**ING WORK WITH MKIT!
;addw   .macro
; .if (\# = 3)
;       ; 3-arg mode
.
.

And:
; LIBRARY.ASM - MagicKit Standard Library
;
;
; 1/24/2023: Clear VRAM in init_vdc never set register #2 for writing!
; has it been bugged all these years????
.
.
; clear the video RAM
        st0   #0
        st1   #0
        st2   #0
        st0   #2                ;Point to write register <--? Added by me
And:
; vload([vram,] data, #size)
; ----
; vram, VRAM base address
; data, video data memory address
; size, number of words to copy

        .macro vload
        .if (\# = 3)
        stw   #\1,<_di
        stw   #\2,<_si
        stw   #\3,<_cx          ;WTF is this??? ERRORS: "stw \3,<_cx"
        .else
        stw   #VRAM(\1),<_di
        stw   #\1,<_si
        stw   #\2,<_cx
        .endif
        jsr   load_vram
        .endm
So obviously the MagicKit libraries were not perfect, and I may have poached some code from the HuC libraries to get better, bugfixed versions of the same routines.
|
|
|
Post by elmer on Nov 5, 2023 19:44:30 GMT
That's a much better implementation of that STW macro! FWIW, you should also be able to get rid of the "+1" and use this syntax (which PCEAS has supported since way back in v3.21) to make it clearer that you're using 16-bit variables ...

stw     .macro
        .if (\?1 = 2)           ; immediate mode
        lda   #low(\1)          ; same as old 'stwi'
        sta.l \2
        lda   #high(\1)
        sta.h \2
        .else                   ; other addressing modes
        lda.l \1                ; same as old 'movw'
        sta.l \2
        lda.h \1
        sta.h \2
        .endif
        .endm
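As a usage sketch, assuming one of the type-checking stw macros above is the one in your include file and reusing names from earlier in this thread, the two branches would be picked like this:

```asm
        ; Immediate branch (\?1 = 2): stores the constant TILE_VRAM itself
        stw   #TILE_VRAM,<_di

        ; Memory branch: copies the 16-bit value stored at Tile1_Metadata
        stw   Tile1_Metadata,<_cx
```

The second form is the one that suits a tile count read from a table at run time, which is what the original vload call was trying to do.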
|
|