@everyone this is my personal synopsis based on the information shared in this thread to this point.
I'm hoping anyone jumping in here will get some clear insights.
That said, I am admittedly the weakest link among all the great programmers contributing to this thread, so I may have gotten some stuff wrong.
elmer - Huge thanks again for all of your insights! You have really clearly defined some invaluable information about the hard truths regarding HuC and ASM!
Also, that was pure gold regarding the placement of ".endp". I obviously would have never known that was a thing regarding overlapping the memory region etc...
1. Compiled C, as shown in the ".lst" file that HuC/pceas outputs, will almost always expand to more instructions than the same task written in assembly.
Example 1: Backing up a variable and changing a variable - in this case used for tracking the player's state and previous state.
2A. This simple C function breaks out to something like 64 instructions.
void changeSubState1 ( unsigned char SUB_STATE1)
{
prevSubState1[vramSlot]=subState1[vramSlot];
subState1[vramSlot]=SUB_STATE1;
}
2B. The function above translates to this function in Assembly. In this case you get 7 instructions to perform the same task.
//A test of documenting this C function in ASM
//Take these comments from the perspective of a beginner; these are by no means pro comments
//The function itself is sound as it was built by Elmer
//The topic of this thread is "Beginner Fastcall Advice"; this function demonstrates that it is not necessary to use __fastcall to call ASM from HuC.
void quick_changeSubState1 ( unsigned char _SUB_STATE1 );
#asm
;_quick_changeSubState1 .proc //This does not compile
.proc _quick_changeSubState1 ;The name of a "C function" is called a procedure in ASM
txa ;transfer X to accumulator - X holds _SUB_STATE1 on entry, so now it is in the accumulator
ldx _vramSlot ;load X - X is now "vramSlot", which is used as the index into our arrays
ldy _subState1, x ;load Y - Y is the value of "subState1[vramSlot]" - the old sub-state is parked here until it is stored to "prevSubState1[vramSlot]"
sta _subState1, x ;store accumulator - this is "subState1[vramSlot]=_SUB_STATE1"
;ldx and ldy do not touch the accumulator, so it still holds the _SUB_STATE1 value from "txa" above
tya ;transfer Y to accumulator - the accumulator now holds the previous value of subState1[vramSlot]
sta _prevSubState1, x ;store accumulator - this is "prevSubState1[vramSlot]=" the previous value
rts ;This is basically a "return;" in C
.endp
#endasm
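To convince myself the 7 instructions really do the same job as the C function, here is a rough model in plain C that walks the registers step by step. This is only a sketch for a desktop compiler, not HuC code: the local variables a, x and y stand in for the CPU registers, MAX_SLOTS is a made-up array size, and the function names ending in _c and _model are my own.

```c
#include <stdint.h>

#define MAX_SLOTS 4  /* hypothetical size, not from the thread */

uint8_t vramSlot;
uint8_t subState1[MAX_SLOTS];
uint8_t prevSubState1[MAX_SLOTS];

/* Plain C version, same logic as changeSubState1 above. */
void changeSubState1_c(uint8_t newState)
{
    prevSubState1[vramSlot] = subState1[vramSlot];
    subState1[vramSlot] = newState;
}

/* Register-by-register model of the ASM version.
   a, x, y are ordinary variables standing in for the CPU registers. */
void changeSubState1_model(uint8_t newState)
{
    uint8_t a, x, y;

    x = newState;          /* the argument arrives in X */
    a = x;                 /* txa */
    x = vramSlot;          /* ldx _vramSlot */
    y = subState1[x];      /* ldy _subState1, x */
    subState1[x] = a;      /* sta _subState1, x */
    a = y;                 /* tya */
    prevSubState1[x] = a;  /* sta _prevSubState1, x */
}                          /* rts */
```

Both functions leave the arrays in the same state. The ASM version just parks the old value in Y while the new value is written.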
turboxray - Many thanks for your solid explanations and use cases for __fastcall.
Note: As a rule I am leaning toward interfacing all ASM with __fastcall due to the points listed below.
3A. It sounds like there is a benefit to using "__fastcall" even on functions that accept one parameter. Without "__fastcall" the parameter is saved to the parameter stack. - "Saving to the parameter stack doesn't destroy A:X"
3B. From a style perspective, using "__fastcall" as the primary conduit to interface ASM with HuC isn't that bad of an idea. You can search "__fastcall" and all of your ASM based functions will be listed.
4. extern vs global variables.
-extern variables exist under the hood and are used by library functions. You can also use them. If you use them and weird stuff happens, they are getting "trampled" by library functions, and you can resort to global variables instead.
-The benefit of extern variables is that they "are treated the same as if they were static defines inside the function - as in, they aren't converted to "stack variables" which is slower"
-global variables lack the risk of being "trampled" by library functions, but they are stack variables and are slower.
-you might need to make additional global variables. EX: My globals are in an array format and I couldn't get those variables to plug into the example turboxray provided, so I made new standalone global variables. This cost me the space of two new variables I wouldn't have used otherwise.
Both methods were tested and are fully working examples. In the end, the extern method is what I am currently using. I believe this is the hybrid of C and __fastcall: you get a C function that uses __fastcall to pass parameters.
//Setup
extern unsigned int _ax, _bx, _cx;
void __fastcall __nop getStateVars (unsigned char tmp1<__ax>, unsigned char tmp2<__bx>);
void quick_changeSubState1 ( void )
{
prevSubState1[vramSlot] = _ax;
subState1[vramSlot] = _bx;
}
//Application
quick_changeSubState1(getStateVars(subState1[vramSlot],NOT_ASSIGNED));
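Here is how I picture the extern method, written as plain C for a desktop compiler rather than HuC. It is only a sketch under my own assumptions: reg_ax and reg_bx are ordinary globals standing in for _ax and _bx, the value of NOT_ASSIGNED and the array size are made up, and library_call_model is a pretend library routine that reuses reg_ax, which is how I understand the "trampling" risk.

```c
#include <stdint.h>

#define SLOT_COUNT   4     /* hypothetical size */
#define NOT_ASSIGNED 0xFF  /* hypothetical value; the thread does not give it */

uint16_t reg_ax, reg_bx;   /* stand-ins for the shared _ax / _bx variables */
uint8_t  slot;
uint8_t  state[SLOT_COUNT];
uint8_t  prevState[SLOT_COUNT];

/* Models getStateVars: the arguments land in the shared variables, nothing else happens. */
void load_state_vars(uint8_t tmp1, uint8_t tmp2)
{
    reg_ax = tmp1;
    reg_bx = tmp2;
}

/* Models quick_changeSubState1: reads the shared variables instead of taking parameters. */
void apply_state_vars(void)
{
    prevState[slot] = (uint8_t)reg_ax;
    state[slot]     = (uint8_t)reg_bx;
}

/* Pretend library routine that uses reg_ax as scratch space. */
void library_call_model(void)
{
    reg_ax = 0;
}
```

If load_state_vars is followed straight away by apply_state_vars, everything works. If something like library_call_model runs in between, prevState receives the scratch value instead of the old state, which is the kind of oddity to watch for.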
For the record, I adapted the above function for the following functions. I haven't seen any broken code or oddities to this point.
Note: "quick" is my labeling to separate C functions from "special C functions that take __fastcall as a parameter"
- void quick_changeState ( void )
- void quick_changeSubState1 ( void )
- void quick_changeSubState2 ( void )
- void quick_changeSubState3 ( void )
- void quick_changeAtkState ( void )
- void quick_changeAtkSubState1 ( void )
- void quick_changeGravityState ( void )
The question now is: is the __fastcall version above, using extern variables, more efficient than the Assembly example? This I do not know. Are the examples above still breaking out into something like 64 instructions, or is __fastcall somehow letting the compiler produce tighter code?
The advice "Only optimize where it counts" has not fallen on deaf ears. In my scenario I have few functions that accept parameters, and I am aware that __fastcall can make passing parameters more efficient, so I was curious about its application.
The final function to tackle parameter passing is this one:
void ANIM_INSTRUCTIONS (unsigned char ATLAS, unsigned char STATE, const unsigned char LOOKUP[])
turboxray - The first two variables are straightforward using the example above, but I'm clueless as to incorporating "const unsigned char LOOKUP[]" into the format above. Can it be done with extern variables?
5. __fastcall supports "spritelabels"! - uncharted territory for me
6. __fastcall __macro is a faster function call at the expense of ROM - uncharted territory for me
Example 2: A C switch statement converted to ASM
Thanks to 0x8bitdev and elmer we have an example of a C switch converted to ASM. (This is more up to date than the game prototype at present due to Elmer's observation of the placement of ".endp")
Here's what I know... I was observing 0x8bitdev's game prototype and noticed a pattern.
In my archaic mind I have embedded "ASM GOOD!" - "C BAD!" (I love C and clutch it for dear life)
Apply the logic in 1. above - the C switch must break out into several more instructions than necessary, so we get some savings with the ASM/C hybrid below.
So I attempted to adapt it to my project.
//Setup
void __fastcall quick_entity_process ( unsigned char _ind<acc> );
#asm
.proc _quick_entity_process.1
txa
asl a
tax
jmp [_process_entity_func_arr200, x]
_process_entity_func_arr200:
.dw _check_null ;switch case 1
.dw _proEnemy ;switch case 2
; ... lots of stuff removed for easier reading
_proEnemy:
call _ProcessEnemy ;C function call
call _DrawEntity ;C function call
rts
.endp
#endasm
//Application
quick_entity_process(entityType[vramSlot]);
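For anyone who wants to see the same jump-table idea without the ASM, here is a rough plain C sketch using an array of function pointers. It is written for a desktop compiler and I have not tried it in HuC. The handlers are made-up stand-ins that only record which one ran, I am assuming entity types start at 0, and the range check is my own addition (the ASM above has none). The "asl a" in the ASM doubles the index because each ".dw" entry is two bytes; in C the compiler does that scaling when you index the array.

```c
#include <stdint.h>

/* Records which handler ran last; only here so the sketch can be checked. */
int lastHandler;

/* Hypothetical stand-ins for the real handlers. */
static void check_null(void) { lastHandler = 0; }
static void pro_enemy(void)  { lastHandler = 1; }

typedef void (*entity_handler)(void);

/* Plays the role of _process_entity_func_arr200: one entry per entity type. */
static const entity_handler handlers[] = {
    check_null,  /* entity type 0 */
    pro_enemy    /* entity type 1 */
};

#define HANDLER_COUNT (sizeof handlers / sizeof handlers[0])

/* Returns 1 if a handler was called, 0 if the type is outside the table. */
int entity_process(uint8_t type)
{
    if (type >= HANDLER_COUNT) {
        return 0;
    }
    handlers[type]();  /* same idea as jmp [table, x] */
    return 1;
}
```

Calling entity_process(1) runs pro_enemy directly from the table, with no chain of comparisons like a switch might generate.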
If anything I wrote is of interest, it is best to re-read the thread and get the true insights from the original authors.