Getting data in assembly (HuC/ASM)

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 2, 2022 16:33:32 GMT

Jun 2, 2022 15:33:54 GMT elmer said:

Yes, I was going to point out the bug, good for you for finding it first!

You need to change

	lda <__ax+1
	adc <__si+1

to

	lda <__si+1
	and #$1F
	adc <__ax+1

And even then, you're left with a pointer which must still be remapped to the correct destination-page that you want to use, so change

	tya
	and #$1f
	sta <__si+1

to

	tya
	and #$1f
	ora #$60
	sta <__si+1
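
To make the effect of the two fixes concrete, here is a minimal C model of the same arithmetic. It is a sketch, not the 6280 routine from the thread: the names farptr_t and farptr_add_offset are made up for illustration, and it assumes the data is read through MPR3 ($6000-$7FFF).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical far-pointer type: 8KB bank number plus a 16-bit CPU address. */
typedef struct {
    uint8_t  bank;
    uint16_t addr;
} farptr_t;

/* Add a 16-bit offset to a far pointer and renormalize the result so that
 * it points into the MPR3 window ($6000-$7FFF). */
farptr_t farptr_add_offset(uint8_t bank, uint16_t addr, uint16_t offset)
{
    farptr_t result;
    /* Strip the top 3 bits (the page) first, like "and #$1F" on the high byte. */
    uint32_t in_bank = (uint32_t)(addr & 0x1FFFu) + offset;

    /* Every full 8KB of the sum moves the pointer on by one bank. */
    result.bank = (uint8_t)(bank + (in_bank >> 13));
    /* Put the page back for the destination window, like "ora #$60". */
    result.addr = (uint16_t)(0x6000u | (in_bank & 0x1FFFu));
    assert(result.addr >= 0x6000u && result.addr <= 0x7FFFu);
    return result;
}
```

With the mask in place, farptr_add_offset(3, 0x7F00, 0x0200) and farptr_add_offset(3, 0x9F00, 0x0200) both give bank 4, address $6100. Without it, the result depends on which page the pointer was assembled into.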

A little about this bug.

At first I wrote that I had found the bug, because today I tested this routine with a non-absolute offset read from an array of labels that crosses an 8K boundary, and I couldn't get the correct bank value (04). That is because the labels in bank 04 are assembled into the same address range as the labels in bank 03. I thought that label addresses in bank 04 must be in the range $8000-$9FFF... But the bank 04 labels point to $7xxx addresses.

And then I removed the message about the bug, because that routine works on data that fits in 3 banks without glitches...

So I'm curious why that routine works without fixes?

[upd]: Just forgot... Thanks for the fixes!

Jun 2, 2022 15:33:54 GMT elmer said:

Well, if you're writing in C (maybe with some inline-asm), then your code will be in a procedure located in MPR5 ($A000-$BFFF).

If you write tiles_data_processing in asm as a procedure, it would also run in MPR5.

If you write tiles_data_processing in asm as a bit of code to run in LIB2_BANK or LIB3_BANK ... it would also run in MPR5.

Typically, when multiple banks of data are needed, you use MPR3 and MPR4 ... BUT note that you absolutely cannot call any other procedures in HuC while you've got MPR4 mapped to a region of data, and that you MUST restore MPR4's original value before you return.

FWIW, that exact restriction/limitation does not apply in the new KickC environment, but that's not particularly helpful to you at this time!

Thus, I can do the same thing that map_data/unmap_data do.

Last Edit: Jun 2, 2022 16:51:13 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

elmer
ANIKIIII!

Posts: 1,040

Post by elmer on Jun 2, 2022 18:04:00 GMT

Jun 2, 2022 16:33:32 GMT 0x8bitdev said:

So I'm curious why that routine works without fixes?

Without the fixes, the code is going to be entirely dependent upon which page the farptr was assembled into ... or if you wipe the top 3-bits of the far-ptr yourself somewhere else.

I don't have your code, so I don't know how you got away with it.

The other thing to note is that I've treated the offset as an actual offset from the start of the data, which *seems* to be what you described that you wanted, but it's not what you've actually written in the snippet that you posted, where you are just taking the low 16-bits of the absolute address of the screen data ...

.word _Lev0Scr0

Jun 2, 2022 16:33:32 GMT 0x8bitdev said:

Thus, I can do the same thing that map_data/unmap_data do.

Precisely!

But only if you're writing in asm or using inline-asm in HuC. AFAIK you can't call those functions directly from C code in HuC.

I'd personally avoid saving the current contents of MPR3 and MPR4 in zero-page, and just put them on the stack instead. But there's nothing wrong with using zero-page locations if you know that they're not going to be overwritten.

The whole weirdness of the map_data function checking for bank $FE and putting something in __bp ... well I have no idea at all why that's there. I've never done anything like that in my code.

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 3, 2022 11:03:16 GMT

Jun 2, 2022 18:04:00 GMT elmer said:

The other thing to note is that I've treated the offset as an actual offset from the start of the data, which *seems* to be what you described that you wanted, but it's not what you've actually written in the snippet that you posted, where you are just taking the low 16-bits of the absolute address of the screen data ...

.word _Lev0Scr0

As a result, the idea of a 24-bit starting point for the Layout data plus a 16-bit offset for each screen's data will not work, because I can't determine whether the offset value points to the same bank or the next one... Either way, I need a bank value.

...
_Lev0_Layout:	
	.word _Lev0Scr0
	.word _Lev0Scr1
	.word _Lev0Scr2
	...
_Lev0Scr0:
	...
_Lev0Scr1:
	...
_Lev0Scr...

...
_mpd_MapsArr:
	.word _Lev0_Layout
	.byte bank(_Lev0_Layout)

Jun 2, 2022 15:33:54 GMT elmer said:

Well, if you're writing in C (maybe with some inline-asm), then your code will be in a procedure located in MPR5 ($A000-$BFFF).

If you write tiles_data_processing in asm as a procedure, it would also run in MPR5.

If you write tiles_data_processing in asm as a bit of code to run in LIB2_BANK or LIB3_BANK ... it would also run in MPR5.

Typically, when multiple banks of data are needed, you use MPR3 and MPR4 ... BUT note that you absolutely cannot call any other procedures in HuC while you've got MPR4 mapped to a region of data, and that you MUST restore MPR4's original value before you return.

I feel the magic...

Another blind spot in my understanding of what is going on here. How many banks can a HuC program take? In the segment usage I saw two banks with code in the range $A000-$BFFF.

What is the limitation?

As I guessed, turboxray's idea with 3-bank mapping isn't for HuC...

Last Edit: Jun 3, 2022 11:03:45 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

elmer
ANIKIIII!

Posts: 1,040

Post by elmer on Jun 3, 2022 13:54:33 GMT

Jun 3, 2022 11:03:16 GMT 0x8bitdev said:

As a result, the idea of a 24-bit starting point for the Layout data plus a 16-bit offset for each screen's data will not work, because I can't determine whether the offset value points to the same bank or the next one... Either way, I need a bank value.

Having a bank value will be faster to process than base+16-bit-offset, but there's absolutely nothing wrong with using base+16-bit-offset (if you are using < 64KB of data).

But you do need to actually make the value an offset instead of an absolute address ...

_Lev0_Layout:	
	.word ((bank(_Lev0Scr0) << 13) + (_Lev0Scr0 & $1FFF)) - ((bank(_Lev0_Layout) << 13) + (_Lev0_Layout & $1FFF))

Note 1: If you were doing this a lot, you'd just define a PCEAS "user-defined-function" (see huc/doc/pce/usage.txt).

Note 2: I'll probably add a built-in function to do this, because it's such a common need.

Note 3: Some of the ugliness is because PCEAS is inconsistent about when it does or does not increment the ".page" number when crossing a bank boundary. I am trying to change PCEAS to be consistent and predictable about this so that we can just write ...

_Lev0_Layout:	
	.word _Lev0Scr0 - _Lev0_Layout

... and know that it will always work, but it'll take a couple of days of testing to make sure that the change doesn't break anything.
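
As a rough illustration of what the long `.word` expression above computes, here is the same arithmetic as a small C sketch. The function names linear_addr and table_offset are hypothetical and exist only for this example; the bank size of 8KB (a shift of 13) comes from the expression itself.

```c
#include <assert.h>
#include <stdint.h>

/* Linear position of a label: the 8KB bank index in the high bits, plus the
 * 13-bit offset inside the bank. The page bits of the address are dropped. */
uint32_t linear_addr(uint8_t bank, uint16_t addr)
{
    return ((uint32_t)bank << 13) + (addr & 0x1FFFu);
}

/* 16-bit table entry: distance from the layout table to one screen's data. */
uint16_t table_offset(uint8_t scr_bank, uint16_t scr_addr,
                      uint8_t base_bank, uint16_t base_addr)
{
    uint32_t delta = linear_addr(scr_bank, scr_addr)
                   - linear_addr(base_bank, base_addr);

    /* base + 16-bit offset only works for less than 64KB of data, and the
     * screen must not sit before the table. */
    assert(delta < 0x10000u);
    return (uint16_t)delta;
}
```

Because the page bits are masked off, a screen that lands in the following bank gives the same offset whether the assembler reports its address as $6100 or $8100.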

Jun 3, 2022 11:03:16 GMT 0x8bitdev said:

Another blind spot in my understanding of what is going on here. How many banks can a HuC program take? In the segment usage I saw two banks with code in the range $A000-$BFFF.

What is the limitation?

The only limitation is the size of your target ROM (or target SuperCD RAM).

The current low-level assembly-language library code for HuC was a bit limited in size (until turboxray made some changes), and is still a PITA to add new capabilities to, while PCEAS's "procedures" OTOH, which are what HuC uses for C code, are only limited to a maximum of 455 procedures with a maximum size of 8KB each, and they can use up as many total banks as you need/have.

Jun 3, 2022 11:03:16 GMT 0x8bitdev said:

As I guessed, turboxray's idea with 3-bank mapping isn't for HuC...

Nope, his idea is perfectly applicable to HuC, you just need to hide it behind a library call written in assembly-language.

This is what you don't seem to understand yet ... HuC relies on assembly-language libraries to do all of the hard-work, and then the HuC user calls those assembly-language libraries from C code when they want to do something.

In modern terms, it is far better to think of HuC as an application scripting language like Lua, Python or UnrealScript, and not as a modern high-performance C compiler usable for system-level programming.

Last Edit: Jun 3, 2022 17:32:38 GMT by elmer

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 5, 2022 12:50:02 GMT

Jun 3, 2022 13:54:33 GMT elmer said:

Note 3: Some of the ugliness is because PCEAS is inconsistent about when it does or does not increment the ".page" number when crossing a bank boundary. I am trying to change PCEAS to be consistent and predictable about this so that we can just write ...

_Lev0_Layout:	
	.word _Lev0Scr0 - _Lev0_Layout

... and know that it will always work, but it'll take a couple of days of testing to make sure that the change doesn't break anything.

That would be great. For me, having a built-in way to get a correct offset by writing label1 - label2 would preserve the universality of the generated code. And it's just logical to get a correct offset in the data within the asm file using a simple operation.

Jun 3, 2022 13:54:33 GMT elmer said:

Jun 3, 2022 11:03:16 GMT 0x8bitdev said:

Another blind spot in my understanding of what is going on here. How many banks can a HuC program take? In the segment usage I saw two banks with code in the range $A000-$BFFF.

What is the limitation?

The only limitation is the size of your target ROM (or target SuperCD RAM).

The current low-level assembly-language library code for HuC was a bit limited in size (until turboxray made some changes), and is still a PITA to add new capabilities to, while PCEAS's "procedures" OTOH, which are what HuC uses for C code, are only limited to a maximum of 455 procedures with a maximum size of 8KB each, and they can use up as many total banks as you need/have.

Good news!

Jun 3, 2022 13:54:33 GMT elmer said:

Jun 3, 2022 11:03:16 GMT 0x8bitdev said:

As I guessed, turboxray's idea with 3-bank mapping isn't for HuC...

Nope, his idea is perfectly applicable to HuC, you just need to hide it behind a library call written in assembly-language.

This is what you don't seem to understand yet ... HuC relies on assembly-language libraries to do all of the hard-work, and then the HuC user calls those assembly-language libraries from C code when they want to do something.

It seems I've understood. I need to place my asm code in HuC system code pages (6 or 7) using #incasmlabel ([upd]: or #incasm) for that. Perhaps there is another way, which I do not know.

Last Edit: Jun 6, 2022 6:55:39 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

elmer
ANIKIIII!

Posts: 1,040

Post by elmer on Jun 7, 2022 19:54:39 GMT

Jun 5, 2022 12:50:02 GMT 0x8bitdev said:

Jun 3, 2022 13:54:33 GMT elmer said:

I am trying to change PCEAS to be consistent and predictable about this so that we can just write ...

_Lev0_Layout:	
	.word _Lev0Scr0 - _Lev0_Layout

... and know that it will always work, but it'll take a couple of days of testing to make sure that the change doesn't break anything.

That would be great. For me, having a built-in way to get a correct offset by writing label1 - label2 would preserve the universality of the generated code. And it's just logical to get a correct offset in the data within the asm file using a simple operation.

The changes are checked in, and github's automated build is available.

With the latest build label1 - label2 should give you an offset value (up to 64KB) ... but only in HuC and PCEAS when using the default settings.

There is now a new function, linear(), that returns a label's 32-bit address offset from the start of the ROM (or CD/SuperCD RAM).

If you need to calculate the ROM offset between two labels, then this is the best way to do it: linear(label1) - linear(label2).

That method is guaranteed to work in all circumstances, and with different PCEAS options.

The reason is that while PCEAS is now consistent about updating the PAGE number correctly when CODE or DATA overflows a bank, there is also now a new option ".opt d+" to disable incrementing the PAGE number when the contents of a DATA section overflow from one bank to another.

Using the new option allows developers to guarantee that all labels in the DATA section are mapped into MPR3, which makes the banking subroutines a bit shorter and faster.

Both KickC and my ASM examples now use this new flag.
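
A small C sketch of why the plain subtraction and the linear subtraction can disagree. The addresses are invented for illustration: label2 sits in bank 3 at $7F00, and label1 sits $200 bytes later, which puts it in bank 4. The functions naive_diff and linear_diff are hypothetical names, not PCEAS functions.

```c
#include <assert.h>
#include <stdint.h>

/* What "label1 - label2" sees: only the two 16-bit addresses. */
uint16_t naive_diff(uint16_t addr1, uint16_t addr2)
{
    return (uint16_t)(addr1 - addr2);
}

/* What "linear(label1) - linear(label2)" sees: bank and in-bank offset. */
uint16_t linear_diff(uint8_t bank1, uint16_t addr1,
                     uint8_t bank2, uint16_t addr2)
{
    uint32_t lin1 = ((uint32_t)bank1 << 13) + (addr1 & 0x1FFFu);
    uint32_t lin2 = ((uint32_t)bank2 << 13) + (addr2 & 0x1FFFu);

    assert(lin1 >= lin2 && lin1 - lin2 < 0x10000u);
    return (uint16_t)(lin1 - lin2);
}
```

Under the default settings the page number is incremented, so label1 is assumed to be at $8100 and both methods give $0200. Under ".opt d+" label1 is assumed to stay in the MPR3 page at $6100, so the naive subtraction wraps around to $E200 while the linear one still gives $0200.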

Jun 5, 2022 12:50:02 GMT 0x8bitdev said:

It seems I've understood. I need to place my asm code in HuC system code pages (6 or 7) using #incasmlabel ([upd]: or #incasm) for that. Perhaps there is another way, which I do not know.

Errrmmm ... "yes" and "no".

Are you talking about the asm data that you generate, or the asm library functions?

The data really has to go in the DATA section, which you can do simply with #incasm.

Your asm library code *could* be put in HuC's library banks, but you should only put the maplibfunc calls in LIB1_BANK, and then other code would probably go in LIB3_BANK.

If you're going to do that, you should probably talk to turboxray and see what he recommends.

Last Edit: Jun 7, 2022 21:49:55 GMT by elmer

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 8, 2022 16:31:21 GMT

Jun 7, 2022 19:54:39 GMT elmer said:

The changes are checked in, and github's automated build is available.

With the latest build label1 - label2 should give you an offset value (up to 64KB) ... but only in HuC and PCEAS when using the default settings.

Thanks! I will test it.

[upd]: Tested! It works!

Jun 7, 2022 19:54:39 GMT elmer said:

Your asm library code *could* be put in HuC's library banks, but you should only put the maplibfunc calls in LIB1_BANK, and then other code would probably go in LIB3_BANK.

If you're going to do that, you should probably talk to turboxray and see what he recommends.

I did not know such subtleties.

Ok, I'll keep that in mind.

Last Edit: Jun 8, 2022 16:49:22 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 8, 2022 17:52:25 GMT

elmer The new HuC broke one of the multi-dir scroll samples... And it depends on data size in asm file. With the '.opt d+' it works well.

Should that fixed routine - '_mpd_farptr_add_offset' work the same with '.opt d+' and without it?

[upd]: I've found the bug.

Such things work differently in the new HuC with '.opt d+' and without:

---data.asm---

_arr:
    .word 1
    .word 2
    .word 3

----data.h----

#incasm("data.asm")

extern unsigned short arr[];

...
val = arr[ N ];
...

I've replaced the 'arr[ N ]' with my farpeek routine and everything works well.

Last Edit: Jun 8, 2022 20:10:16 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

elmer
ANIKIIII!

Posts: 1,040

Post by elmer on Jun 9, 2022 17:32:40 GMT

Jun 7, 2022 19:54:39 GMT elmer said:

Your asm library code *could* be put in HuC's library banks, but you should only put the maplibfunc calls in LIB1_BANK, and then other code would probably go in LIB3_BANK.

Jun 8, 2022 16:31:21 GMT 0x8bitdev said:

I did not know such subtleties.

Ok, I'll keep that in mind.

Once again, this is another situation that comes from a mix of both the underlying HuC6280 CPU and the historical foundation of the MagicKit library that HuC is based upon.

In HuC, LIB1_BANK is the only permanently-mapped bank for library-code, and so every library function that resides in LIB2_BANK or LIB3_BANK needs to put a small (18-byte) piece of code in LIB1_BANK in order to page the correct library bank into MPR5 and then call the appropriate library function itself.

Together with all of the startup code and actual library functions in LIB1_BANK, this makes LIB1_BANK a very congested bank of memory that is often overflowing and causing problems.

When HuC was written, the designers created the "procedure" system to work around this annoying problem for C code ... but the main MagicKit assembly-language libraries were never re-written to take advantage of the new capability.

From the point-of-view of creating new libraries for HuC, you're welcome to use procedures if you wish, either in pure assembly-language, or as inline-assembly within a C function.

Jun 8, 2022 17:52:25 GMT 0x8bitdev said:

[upd]: I've found the bug.

Such things work differently in the new HuC with '.opt d+' and without:

val = arr[ N ];
I've replaced the 'arr[ N ]' with my farpeek routine and everything works well.

The bug is in your code ... you are assuming that HuC knows how to access banked-data, but it doesn't. The only reason that *ever* worked for you is that you got very, very lucky with the locations of various pieces of data in your example program.

While ".opt d+" recreated the lucky conditions in your example that made it work again, you cannot use that option and still write ".word label1 - label2", you have to write ".word linear(label1) - linear(label2)".

And even then ... your code would still be wrong because HuC does not know about the need to page data into MPR3, nor does it know how to deal with things if your data crosses over the end of a bank.

As you've found, the "farpeekw" is how you're supposed to read something from the DATA segment in HuC ... but remember, it still cannot currently handle data that crosses over the end of a bank.

You may well be the first programmer that has ever tried to get HuC to operate with far-data in this way, and you are getting to see some of the limits of the compiler.

This kind of thing is *easy* to do in assembly-language, but you keep on refusing to take the hint.

If you wish to keep on trying to do this stuff in HuC, may I suggest that you map your data into both MPR3 and MPR4 so that you can guarantee the availability of 8KB of data from any far-address that is banked into the physical address-space.

Doing that will mean having to restore MPR4 before calling another HuC function ... but that's no different from what you'd have to do in an assembly-language library function for HuC.
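
The idea of mapping two consecutive banks can be modelled in plain C. This is only a simulation under stated assumptions: the rom and mpr arrays, cpu_read, far_read_word and demo_cross_bank_read are all invented for this sketch, and on real hardware the mapping would be done in assembly-language with the MPR registers.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 0x2000u
#define NUM_BANKS 8u

uint8_t rom[NUM_BANKS * BANK_SIZE]; /* toy ROM image, 8 banks of 8KB */
uint8_t mpr[8];                     /* simulated MPR0..MPR7 */

/* Translate a 16-bit CPU address through the simulated MPRs. */
uint8_t cpu_read(uint16_t addr)
{
    uint8_t bank = mpr[addr >> 13];

    assert(bank < NUM_BANKS);
    return rom[(uint32_t)bank * BANK_SIZE + (addr & 0x1FFFu)];
}

/* Read a little-endian word from far data. The bank goes into MPR3 and the
 * following bank into MPR4, so the read may run past the end of the bank.
 * Both registers are restored before returning. */
uint16_t far_read_word(uint8_t bank, uint16_t addr)
{
    uint8_t  old3 = mpr[3];
    uint8_t  old4 = mpr[4];
    uint16_t a = (uint16_t)(0x6000u | (addr & 0x1FFFu));
    uint16_t value;

    mpr[3] = bank;
    mpr[4] = (uint8_t)(bank + 1);
    value = (uint16_t)(cpu_read(a) | (cpu_read((uint16_t)(a + 1)) << 8));
    mpr[3] = old3;
    mpr[4] = old4;
    return value;
}

/* Place a word across the bank 3 / bank 4 boundary and read it back. */
uint16_t demo_cross_bank_read(void)
{
    mpr[3] = 1; /* pretend something else was mapped here before */
    mpr[4] = 2;
    rom[3u * BANK_SIZE + 0x1FFFu] = 0x34; /* low byte: last byte of bank 3 */
    rom[4u * BANK_SIZE + 0x0000u] = 0x12; /* high byte: first byte of bank 4 */
    return far_read_word(3, 0x7FFF);
}
```

The second byte of the word is fetched from $8000, which resolves through MPR4 to the next bank. The restore at the end of far_read_word corresponds to the rule that MPR4 must hold its original value again before any other HuC function is called.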

Jun 8, 2022 17:52:25 GMT 0x8bitdev said:

Should that fixed routine - '_mpd_farptr_add_offset' work the same with '.opt d+' and without it?

The assembly-language code in '_mpd_farptr_add_offset' is still correct, but the offset value in your data will be wrong whenever a bank is crossed if you use ".opt d+" ... see above.

Last Edit: Jun 9, 2022 17:33:16 GMT by elmer

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 10, 2022 14:22:50 GMT

Jun 9, 2022 17:32:40 GMT elmer said:

Once again, this is another situation that comes from a mix of both the underlying HuC6280 CPU and the historical foundation of the MagicKit library that HuC is based upon.

In HuC, LIB1_BANK is the only permanently-mapped bank for library-code, and so every library function that resides in LIB2_BANK or LIB3_BANK needs to put a small (18-byte) piece of code in LIB1_BANK in order to page the correct library bank into MPR5 and then call the appropriate library function itself.

Together with all of the startup code and actual library functions in LIB1_BANK, this makes LIB1_BANK a very congested bank of memory that is often overflowing and causing problems.

When HuC was written, the designers created the "procedure" system to work around this annoying problem for C code ... but the main MagicKit assembly-language libraries were never re-written to take advantage of the new capability.

From the point-of-view of creating new libraries for HuC, you're welcome to use procedures if you wish, either in pure assembly-language, or as inline-assembly within a C function.

Thanks for the info!

Jun 9, 2022 17:32:40 GMT elmer said:

The bug is in your code ... you are assuming that HuC knows how to access banked-data, but it doesn't. The only reason that *ever* worked for you is that you got very, very lucky with the locations of various pieces of data in your example program.

~~HuC doesn't know how to access banked-data with the latest changes? Or it never worked correct with external data before?~~
[upd]: You are right! ...The more often you take a break from coding, the more discoveries you will find in your code!

BTW, if you meant that example with 'val = arr[ N ]', I used that only for ASM data in an H file. But some pieces of code have not yet been rewritten. It's a work in progress.

Jun 9, 2022 17:32:40 GMT elmer said:

While ".opt d+" recreated the lucky conditions in your example that made it work again, you cannot use that option and still write ".word label1 - label2", you have to write ".word linear(label1) - linear(label2)".

I know, it is obvious that using the '.opt d+' is incompatible with the correct calculation of label1 - label2. I don't use them together.

Jun 9, 2022 17:32:40 GMT elmer said:

And even then ... your code would still be wrong because HuC does not know about the need to page data into MPR3, nor does it know how to deal with things if your data crosses over the end of a bank.

As you've found, the "farpeekw" is how you're supposed to read something from the DATA segment in HuC ... but remember, it still cannot currently handle data that crosses over the end of a bank.

Yes, I remember about that sad issue.

Jun 9, 2022 17:32:40 GMT elmer said:

You may well be the first programmer that has ever tried to get HuC to operate with far-data in this way, and you are getting to see some of the limits of the compiler.

This kind of thing is *easy* to do in assembly-language, but you keep on refusing to take the hint.

If you wish to keep on trying to do this stuff in HuC, may I suggest that you map your data into both MPR3 and MPR4 so that you can guarantee the availability of 8KB of data from any far-address that is banked into the physical address-space.

Doing that will mean having to restore MPR4 before calling another HuC function ... but that's no different from what you'd have to do in an assembly-language library function for HuC.

If you meant the 'arr[ N ]' for far-data... As I mentioned. It's just a piece of code left over from when I did use ASM data blocks in an H file. I'm still rewriting data work.

If you meant the farpeekw ... The problem is that I need to peek data in one loop from different arrays (up to 3) that could potentially not be inside 8K... So it would require bank switching anyway. This is a point for optimization and may require data size limits for users.

[upd]: But on data init I will use farpeeks anyway without any optimizations. Because it doesn't affect run-time too much and simplifies code.

Of course it requires farpeekw fixing... There will be two farpeeks: built-in and fixed! The more farpeeks the better!

Last Edit: Jun 10, 2022 19:02:33 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on Jun 11, 2022 13:16:25 GMT

I was thinking about optimization ways, how to reduce the number of banks switching in the main tiles processing loops and respectively get rid of the farpeek calls there.

One of the ways is tiles caching per row/column of tiles with separate processing of map data and tiles data.
It requires additional '( Screen_Width_Pixels >> 3 ) + 1' bytes of RAM for cached data.

Pass #1: Map tiles indices caching
Pass #2: Tiles 4x4 caching
Pass #3: Blocks 2x2 caching

Each pass requires banks switching.
The #2,#3 passes can be combined into one banks switching.

As a result I'll have an array of screen tiles ready to load to BAT.

Thus, for maps with Tiles 4x4 it will take max 2-3 banks switching per row/column of tiles. For maps with Blocks 2x2 - 2. It's pretty good. In assembly it will be fast.

The advantage of separating the map data and tiles data processing is that there is no need to map 3 banks at once.
There is no need to map the whole map data into memory at once plus everything else. Two banks are enough. And the map data may exceed 3 banks!

I will map that part of a map that contains required map data using the same 24bit address + offset.

So there is almost no additional limitation on a map size in memory.

The limitation on the number of screens in height is:

For column ordered data:

Tiles 4x4: $2000 / ( Screen_Width_Pixels >> 5 ) / ( Screen_Height_Pixels >> 5 ) for 256x224: 146 screens
Blocks 2x2: $2000 / ( Screen_Width_Pixels >> 4 ) / ( Screen_Height_Pixels >> 4 ) for 256x224: 36 screens

For row ordered data: limited by MAPeD

The limitation on the number of screens in width is:

For row ordered data:

Tiles 4x4: the same 146 screens for 256x224
Blocks 2x2: the same 36 screens for 256x224

For column ordered data: limited by MAPeD

These limitations are for multi-directional maps only. The bi-directional ones will not have any HuC related limitations.
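
The screen limits and the cache size quoted above can be reproduced with a few lines of C. This is just the arithmetic from the post; the function names max_screens and row_cache_bytes are made up, and reading the result as "screens per 8KB bank" is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 0x2000u /* one 8KB bank */

/* shift is 5 for 32x32-pixel "Tiles 4x4" and 4 for 16x16-pixel "Blocks 2x2". */
unsigned max_screens(unsigned width_px, unsigned height_px, unsigned shift)
{
    unsigned tiles_per_screen = (width_px >> shift) * (height_px >> shift);

    assert(tiles_per_screen != 0);
    return BANK_SIZE / tiles_per_screen;
}

/* Extra RAM for one cached row of 8x8 tiles: ( Screen_Width_Pixels >> 3 ) + 1. */
unsigned row_cache_bytes(unsigned width_px)
{
    return (width_px >> 3) + 1;
}
```

For a 256x224 screen, max_screens(256, 224, 5) gives 146, max_screens(256, 224, 4) gives 36, and row_cache_bytes(256) gives 33.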

Last Edit: Jun 11, 2022 13:17:09 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

hyperfighting
Punkic Cyborg

Posts: 224

Post by hyperfighting on May 12, 2023 15:21:29 GMT

Happy Weekend! Hope everyone has been having a great week!

Something has been on my mind for a while regarding arrays of far pointers. Since 0x8bitdev referenced them in this thread, I figured I would post my question here as well.
(I have read the thread but I'm a bit lost when it comes to solving this problem)

My issue is I have a particular "switch based function" per sprite that I am hoping to eliminate.

Goal: Eliminate several functions to free up function name space (I think I said that correctly...I have learned we are limited to the number of functions we write...the trampolines lol...you get that error if too many functions are compiled)

Goal: Gain performance by eliminating my much loved switch statement....

This is an example of what I currently have going on:

NOTE: The far pointers are "SPReD_ZUK_chr(x)" sometimes a unique far pointer is not required EX: ENT_RUN and ENT_JUMP share the same far pointer

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, char far *sprite<__bl:__si>, short offset<__dx> );

void ZUK_SCALE_ATLAS ()
{
	switch (state[vramSlot])
	{
		case ENT_IDLE:	
			loadvram_scale_16to32(vramLoc[vramSlot]+0,   SPReD_ZUK_chr2, 0+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+256, SPReD_ZUK_chr2, 128+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+512, SPReD_ZUK_chr2, 256+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+768, SPReD_ZUK_chr2, 384+FRAME_LOC_32[frameIndex[vramSlot]]);
			break;		
			
		case ENT_RUN:
			loadvram_scale_16to32(vramLoc[vramSlot]+0,   SPReD_ZUK_chr3,   0+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+256, SPReD_ZUK_chr3, 128+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+512, SPReD_ZUK_chr3, 256+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+768, SPReD_ZUK_chr3, 384+FRAME_LOC_32[frameIndex[vramSlot]]);
			break;		

		case ENT_JUMP:
			loadvram_scale_16to32(vramLoc[vramSlot]+0,   SPReD_ZUK_chr3,   0+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+256, SPReD_ZUK_chr3, 128+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+512, SPReD_ZUK_chr3, 256+FRAME_LOC_32[frameIndex[vramSlot]]);
			loadvram_scale_16to32(vramLoc[vramSlot]+768, SPReD_ZUK_chr3, 384+FRAME_LOC_32[frameIndex[vramSlot]]);
			break;

	       ......
	}
}

what I would love to do is...something like this

void DRAW_SPRITE (FarPointerArray * FarPointerSpriteLabelArray)
{ 
   loadvram_scale_16to32(vramLoc[vramSlot]+0,   FarPointerSpriteLabelArray[Spriteindex],   0+FRAME_LOC_32[frameIndex[vramSlot]]);
   loadvram_scale_16to32(vramLoc[vramSlot]+256, FarPointerSpriteLabelArray[Spriteindex], 128+FRAME_LOC_32[frameIndex[vramSlot]]);
   loadvram_scale_16to32(vramLoc[vramSlot]+512, FarPointerSpriteLabelArray[Spriteindex], 256+FRAME_LOC_32[frameIndex[vramSlot]]);
   loadvram_scale_16to32(vramLoc[vramSlot]+768, FarPointerSpriteLabelArray[Spriteindex], 384+FRAME_LOC_32[frameIndex[vramSlot]]);
}

Is this possible to achieve by storing pointers to far pointers in an ASM table and referencing them somehow?

upd: It appears an array of these far pointers already exists (automatically generated from SPReD) but getting a pointer to a specific index is still a challenge

_SPReD_ZUK_SG_arr:	
	.word 512,  _SPReD_ZUK_chr0, bank(_SPReD_ZUK_chr0)
	.word 768,  _SPReD_ZUK_chr1, bank(_SPReD_ZUK_chr1)
	.word 512,  _SPReD_ZUK_chr2, bank(_SPReD_ZUK_chr2)
	.word 4096, _SPReD_ZUK_chr3, bank(_SPReD_ZUK_chr3)
	.word 3072, _SPReD_ZUK_chr4, bank(_SPReD_ZUK_chr4)
	.word 512,  _SPReD_ZUK_chr5, bank(_SPReD_ZUK_chr5)
	.word 1024, _SPReD_ZUK_chr6, bank(_SPReD_ZUK_chr6)
	.word 2048, _SPReD_ZUK_chr7, bank(_SPReD_ZUK_chr7)
	.word 2048, _SPReD_ZUK_chr8, bank(_SPReD_ZUK_chr8)
	.word 1024, _SPReD_ZUK_chr9, bank(_SPReD_ZUK_chr9)
	.word 1024, _SPReD_ZUK_chr10, bank(_SPReD_ZUK_chr10)
	.word 1536, _SPReD_ZUK_chr11, bank(_SPReD_ZUK_chr11)
	.word 1536, _SPReD_ZUK_chr12, bank(_SPReD_ZUK_chr12)
	.word 1536, _SPReD_ZUK_chr13, bank(_SPReD_ZUK_chr13)
	.word 1536, _SPReD_ZUK_chr14, bank(_SPReD_ZUK_chr14)
	.word 2048, _SPReD_ZUK_chr15, bank(_SPReD_ZUK_chr15)
	.word 2560, _SPReD_ZUK_chr16, bank(_SPReD_ZUK_chr16)
	.word 0, 0, 0	; skipped data

After spinning my wheels I'm starting to think my best approach is to modify the function to accept an array of sprite labels opposed to a single sprite label.

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, char far *sprite<__bl:__si>, unsigned char arrayIndex<__ax>, short offset<__dx> );

opposed to this:
loadvram_scale_16to32(vramLoc[vramSlot]+0,   SPReD_ZUK_chr2, 0+FRAME_LOC_32[frameIndex[vramSlot]]);

I would call this:
loadvram_scale_16to32(vramLoc[vramSlot]+0,   _SPReD_ZUK_SG_arr, 2, 0+FRAME_LOC_32[frameIndex[vramSlot]]);

This requires directly modifying the function which is kinda scary to me but I think it is my best bet...If I can get that to work then the plan would be to pass the array to the function via the __fastcall method of passing a sprite label.

Last Edit: May 16, 2023 12:36:10 GMT by hyperfighting

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on May 16, 2023 16:09:13 GMT

hyperfighting - as you may know, HuC doesn't support far pointer arrays.

I can suggest the following:

Replace this declaration:

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, char far *sprite<__bl:__si>, unsigned char arrayIndex<__ax>, short offset<__dx> );

by this one:

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, unsigned char data_bank<__bl>, unsigned short data_offset<__si>, short offset<__dx> );

Now you can pass a bank number and data offset as arguments to the function separately.
After that you need two parallel arrays of bank numbers and data offsets:

extern u8 data_banks[N];
#asm
_data_banks:
	.db bank(far_ptr1)
	.db bank(far_ptr2)
	.db bank(far_ptr3)
	...
#endasm

extern u16 data_offset[N];
#asm
_data_offset:
	.dw far_ptr1
	.dw far_ptr2
	.dw far_ptr3
	...
#endasm

Using these arrays you can arrange your data the way you want.

And then call the loadvram_scale_16to32:

loadvram_scale_16to32(vaddr, data_banks[farptr_index], data_offset[farptr_index], offset);
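
For the original goal of dropping the switch statement, the two parallel arrays can be combined with a third table that maps an entity state to a sprite index. The sketch below is plain C and has not been tried with HuC. Every name in it (sprite_banks, sprite_addrs, state_to_sprite, load_stub, draw_sprite) and every table value is hypothetical, and load_stub merely records its arguments in place of the real loadvram_scale_16to32.

```c
#include <assert.h>
#include <stdint.h>

enum { ENT_IDLE, ENT_RUN, ENT_JUMP, ENT_STATE_COUNT };

/* Stand-ins for the values the assembler would emit via bank(label) / label. */
static const uint8_t  sprite_banks[] = { 0x10, 0x10, 0x11, 0x12 };
static const uint16_t sprite_addrs[] = { 0x6000, 0x6200, 0x6500, 0x6700 };

/* ENT_RUN and ENT_JUMP share one sprite set, as in the switch version. */
static const uint8_t state_to_sprite[ENT_STATE_COUNT] = { 2, 3, 3 };

uint8_t  last_bank;
uint16_t last_addr;
unsigned call_count;

/* Placeholder for the real loader: remembers what it was asked to load. */
void load_stub(uint16_t vaddr, uint8_t bank, uint16_t addr, int16_t offset)
{
    (void)vaddr;
    (void)offset;
    last_bank = bank;
    last_addr = addr;
    call_count++;
}

/* Table lookup instead of one case per state. Returns the sprite index used. */
uint8_t draw_sprite(uint8_t state, uint16_t vram, int16_t frame_offset)
{
    uint8_t  index;
    unsigned part;

    assert(state < ENT_STATE_COUNT);
    index = state_to_sprite[state];

    /* Same four calls as before: VRAM steps by 256, source data by 128. */
    for (part = 0; part < 4; part++) {
        load_stub((uint16_t)(vram + part * 256u),
                  sprite_banks[index], sprite_addrs[index],
                  (int16_t)(frame_offset + (int16_t)(part * 128u)));
    }
    return index;
}
```

A call such as draw_sprite(ENT_RUN, vramLoc[vramSlot], FRAME_LOC_32[frameIndex[vramSlot]]) would then replace the whole case block for that state.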

Last Edit: May 16, 2023 16:09:48 GMT by 0x8bitdev

github.com/0x8BitDev/MAPeD-SPReD

hyperfighting
Punkic Cyborg

Posts: 224

Post by hyperfighting on May 16, 2023 18:42:55 GMT

0x8bitdev - This is a HUGE help! THANK YOU!

Based on your insight I have an early example running and all the arms and legs are in the right places!

It's still early but I'm hoping to get this integrated across the project ASAP.

Originally the prototypes were:

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, char far *sprite<__bl:__si>);

void __fastcall loadvram_scale_16to32(unsigned int vaddr<__di>, char far *sprite<__bl:__si>, short offset<__dx> );

They are now:

void __fastcall loadvram_scale_16to321(unsigned int vaddr<__di>, unsigned char data_bank<__bl>, unsigned short data_offset<__si>);

void __fastcall loadvram_scale_16to321(unsigned int vaddr<__di>, unsigned char data_bank<__bl>, unsigned short data_offset<__si>, short offset<__dx> );

with the additional procedure arguments adjusted within the body of the ASM function itself!

There is now a working function!

void SCALE_16_32( unsigned char data_bank, unsigned short data_offset )

{
	loadvram_scale_16to321(vramLoc[vramSlot]+0,   data_bank, data_offset,   0+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+256, data_bank, data_offset, 128+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+512, data_bank, data_offset, 256+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+768, data_bank, data_offset, 384+FRAME_LOC_32[frameIndex[vramSlot]]);

}	

called: SCALE_16_32(zuk_data_banks[2], zuk_data_offsets[2]);

I tried to fastcall it but no dice...my attempt was

extern unsigned char _bl;
extern short _si; 

void __fastcall __nop farPointer(unsigned char data_bank<__bl>, unsigned short data_offset<__si>);
void __fastcall __nop getBL (unsigned char data_bank<__bl>);
void __fastcall __nop getSI (unsigned short data_offset<__si>);

void SCALE_16_32( void )

{
	loadvram_scale_16to321(vramLoc[vramSlot]+0,   _bl, _si,   0+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+256, _bl, _si, 128+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+512, _bl, _si, 256+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+768, _bl, _si, 384+FRAME_LOC_32[frameIndex[vramSlot]]);

}

Also tried:

void SCALE_16_32( void )
{
        loadvram_scale_16to321(vramLoc[vramSlot]+0,   getBL(_bl), getBL(_si),   0+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+256, getBL(_bl), getBL(_si), 128+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+512, getBL(_bl), getBL(_si), 256+FRAME_LOC_32[frameIndex[vramSlot]]);
	loadvram_scale_16to321(vramLoc[vramSlot]+768, getBL(_bl), getBL(_si), 384+FRAME_LOC_32[frameIndex[vramSlot]]);
}

called: SCALE_16_32(farPointer(zuk_data_banks[2], zuk_data_offsets[2]));

compiles in both cases but junk is displayed on the screen....

I can definitely work with the first instance of the function and if there is a way to fastcall it I can adjust later on.

Thanks Again!

0x8bitdev
Punkic Cyborg

Posts: 233

Post by 0x8bitdev on May 19, 2023 7:21:56 GMT

hyperfighting, you are doing something strange in your 'fastcall' implementation... So you have the strange result.

github.com/0x8BitDev/MAPeD-SPReD