Post by hyperfighting on Feb 5, 2023 20:30:52 GMT
Hi Guys,
This will look and sound terrible but here goes....
I have noticed __fastcall is accessible in HuC. To pretend I understand how this works would be a lie.
Tom on PC Engine Fans wrote:
Based on the description above, if used under certain circumstances, I can improve my code's speed when executing certain functions.
I have a few fastcall functions compiling and appearing to function as expected. That said, I'm not sure if what I am doing is simply calling the functions in an alternative way to my typical C method without any real gain.
void __fastcall check_collision();
#asm
  .proc _check_collision
    call _playerMovement200
    call _collisionPlayer
    rts
  .endp
#endasm

void __fastcall quick_joystick();
#asm
  .proc _quick_joystick
    call _MapJoy
    call _JoyInput
    rts
  .endp
#endasm

void __fastcall quick_frameUpdate();
#asm
  .proc _quick_frameUpdate
    call _FrameUpdate
    rts
  .endp
#endasm
They are all basically the same design. My aim is to give you an idea of where I am using the calls. I've isolated potential bottlenecks and wrapped them in fastcall to hopefully gain performance.
Any insights would be greatly appreciated. Thanks for checking this post out. Sorry for how horrid this must look to some. I'm out of my depth here.
Post by turboxray on Feb 6, 2023 4:03:50 GMT
If you're using the newer HuC, which it looks like you are, then __fastcall also has some other options.
__fastcall __nop myFunc(unsigned char temp<__ax>); allows you to not generate a function call but it will update the assigned memory address __ax (which in this case is a variable, but you could use it for registers too). Likewise, there's also __fastcall __macro myFunc(unsigned char temp<__ax>);
It looks like this:

int __fastcall __macro test222(unsigned char channel<__al>);
#asm
myFunc.1 .macro
  ; code
  lda <__ax
  ldx <__ax+1
  .endm
#endasm

It basically amounts to an inline function rather than a call, which is faster but eats up some mem (rom). INT is the return type, so LSbyte goes in A and MSbyte goes in X. If it was a char, then just return something in A and clear X. It doesn't enforce your code to do this, so make sure you do.
And finally, you don't need to pair __fastcall with assembly. It can be used with normal C defined functions (as long as you don't use __macro). In that case, it will just use the faster parameter passing method.
Post by hyperfighting on Feb 6, 2023 16:26:27 GMT
turboxray many thanks for coming to the rescue on this one. I am using HuC (v4.00.4.gb95184f, 2022-11-11) currently. In my case I'm very, if not all the way, green on the __nop implementation, but happy to know it exists if I ever level up to use it. I think, based on what I have been attempting, the __macro method suits best for possibly gaining some speed on some function calls! I adapted this based on your example... It compiles, but I can't seem to figure out how to call it in my C code.

void __fastcall __macro quick_macro_joy();
#asm
myFunc.1 .macro
  call _MapJoy
  call _JoyInput
  ;lda <__ax
  ;ldx <__ax+1
  .endm
#endasm

I also have a standard C function that passes 3 variables, so I attempted to add __fastcall to it to see if it would compile and boost the parameter passing, but I get "missing fastcall register" as a compile error:

void __fastcall ANIM_INSTRUCTIONS (unsigned char ATLAS, unsigned char STATE, const unsigned char LOOKUP[])

**Edit: I've been mucking around... I got an error stating something like "__fastcall can only be used in prototypes", so I tried the prototype below. When it attempts to compile I get "****** implementation of fastcall functions in C not supported yet ******". So is this a case where my HuC version doesn't allow C and Macro support? If that's the case, any suggestion on a version to try out?

void __fastcall ANIM_INSTRUCTIONS (unsigned char _ATLAS<__di>, unsigned char _STATE<__dx>, const unsigned char _LOOKUP[]<acc>);
void ANIM_INSTRUCTIONS (unsigned char ATLAS, unsigned char STATE, const unsigned char LOOKUP[])

elmer does your build of HuC support the features turboxray mentions?
Post by elmer on Feb 11, 2023 16:57:03 GMT
hyperfighting wrote:
elmer does your build of HuC support the features turboxray mentions?

I believe so. I have all of turboxray's checked-in changes from October 2021, and he's not checked-in anything since then.

FWIW using __fastcall to pass function parameters in memory locations rather than on the C stack is definitely a time-saver in HuC ... but only if you actually pass any parameters! And in practical terms, you can mostly get the same speedup just by using global variables instead of local variables, unless you're writing assembly-language library functions where you're temporarily using the System Card variables __ax,__bx,__cx,__dx,__si,__di ... which is what turboxray is usually writing.

From what I see in this thread you're trying to use it for some function calls that have no parameters, which is absolutely pointless, and it's also kinda pointless if the function is only called once or twice in each game loop.

At the end-of-the-day, you can only judge the effectiveness of optimizations like these if you actually look at, and can understand, the assembly-language code that HuC produces when it translates your C code into asm.
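The "use global variables instead" suggestion can be sketched in plain C. This is only an illustration, not HuC-specific code: the array size, the g_newSubState staging variable, and both function names are made up for the example, and whether the global version is really faster in a given project still has to be checked against the generated asm.

```c
#include <assert.h>

#define MAX_SLOTS 8                      /* hypothetical array size */

unsigned char subState1[MAX_SLOTS];
unsigned char prevSubState1[MAX_SLOTS];
unsigned char vramSlot;

/* Parameter version: the argument is a local, which HuC keeps on the C stack. */
void changeSubState1_param(unsigned char SUB_STATE1)
{
    assert(vramSlot < MAX_SLOTS);
    prevSubState1[vramSlot] = subState1[vramSlot];
    subState1[vramSlot] = SUB_STATE1;
}

/* Global version: the caller stores the new state in a global first,
   so the function takes no parameters at all. */
unsigned char g_newSubState;             /* hypothetical staging variable */

void changeSubState1_global(void)
{
    assert(vramSlot < MAX_SLOTS);
    prevSubState1[vramSlot] = subState1[vramSlot];
    subState1[vramSlot] = g_newSubState;
}
```

A caller of the global version would write g_newSubState = 9; changeSubState1_global(); instead of changeSubState1_param(9);. The trade-off is readability: the "parameter" is no longer visible at the call site.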
Post by hyperfighting on Feb 14, 2023 23:20:28 GMT
elmer and turboxray thanks so much for elaborating on the __fastcall function type. I've been aware of it but failed to have a clear understanding of how it works. I think I'm getting a bit closer to getting it now. However, please prepare for some terrible insights, as I know I've likely got stuff wrong. My exploratory Coles Notes are:

1. System Card variables "__ax,__bx,__cx,__dx,__si,__di ..." are the variables passed to __fastcall functions...also registers...

2. If you can effectively analyze the ASM code HuC produces you can potentially generate more efficient code integrating C functions with ASM __fastcall.

3. If you have a function that takes parameters and this function is accessed several times via your main loop. Modifying the function so that its variables are accessed via __fastcall can give you faster access speeds.

I do have some function calls that pass parameters to them. I would love to integrate __fastcall with these functions but I haven't been able to get something to compile. EX: HuC (v4.00.4.gb95184f, 2022-11-11)

void changeSubState1 ( unsigned char SUB_STATE1) //Standard Function

//If I modify this function with the inclusion of __fastcall I get an error "__fastcall can only be used in prototypes"
//So I make a prototype before defining the function...
void __fastcall changeSubState1 ( unsigned char _SUB_STATE1<__bh>);

void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}
//In this case I get "implementation of fastcall functions in C not supported yet"

4. __fastcall macros as per turboxray's example: "faster but eats up some mem (rom)". elmer, I hear you loud and clear, thank you for saving me from wrapping functions with no parameters in __fastcall. However, I have to ask: would those functions benefit, if ever so slightly, by being wrapped in a macro? I can get the code below to compile but I can't figure out the syntax to call the macro. Everything I try seems to fail.

void __fastcall __macro quick_macro_joy();
#asm
myFunc.1 .macro
  call _MapJoy
  call _JoyInput
  .endm
#endasm

5. __fastcall __nop as per turboxray's example: __fastcall __nop myFunc(unsigned char temp<__ax>); "allows you to not generate a function call but it will update the assigned memory address __ax (which in this case is a variable, but you could use it for registers too)." Does this mean that wherever "myFunc(unsigned char temp);" is referenced, the temp variable will use __ax under the hood for faster access speeds?
Post by elmer on Feb 16, 2023 0:47:40 GMT
hyperfighting wrote:
1. System Card variables "__ax,__bx,__cx,__dx,__si,__di ..." are the variables passed to __fastcall functions...also registers...

Nope, AFAIK you can use any zero-page variable, it's just that those are the commonly-available zero-page variables for library functions.

hyperfighting wrote:
2. If you can effectively analyze the ASM code HuC produces you can potentially generate more efficient code integrating C functions with ASM __fastcall.

If you can understand assembly-language, then you can see how gawd-awful the code that HuC produces really is, and you'll see that using __fastcall is the least of your worries.

hyperfighting wrote:
3. If you have a function that takes parameters and this function is accessed several times via your main loop. Modifying the function so that its variables are accessed via __fastcall can give you faster access speeds.

Yes, but you can get 99% of the way there by using global variables instead of local variables. Function parameters are just another kind of local variables.

hyperfighting wrote:
void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}

No, no, no! Making this a __fastcall might save you 50 cycles per call. Being able to understand assembly language, and then looking at what the assembly language is that HuC generates for that particular 2 lines of code, and then rewriting it ... well that would save you hundreds and hundreds of cycles!

Please stop looking at __fastcall as some magical panacea to make your code significantly faster ... it isn't. It is a very useful tool for making code that is already fast, a tiny bit faster. You're not at the skill-level (yet) where this tool is important for you to learn/use ... you've still got a long way to go.
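To put "might save you 50 cycles per call" in perspective, here is a rough back-of-the-envelope calculation written as C. The numbers are assumptions for illustration: a CPU clock of roughly 7.16 MHz, 60 frames per second, and the 50-cycle figure, which is an estimate quoted in this thread and not a measurement.

```c
#include <assert.h>

/* Approximate figures, assumed for illustration only. */
#define CPU_HZ          7159090L   /* HuC6280 high-speed clock, roughly 7.16 MHz */
#define FRAMES_PER_SEC  60L        /* NTSC, rounded */
#define SAVED_PER_CALL  50L        /* rough per-call estimate from the thread */

/* CPU cycles available in one video frame. */
long cycles_per_frame(void)
{
    return CPU_HZ / FRAMES_PER_SEC;
}

/* Cycles saved per frame if the function is called calls_per_frame times. */
long cycles_saved(long calls_per_frame)
{
    assert(calls_per_frame >= 0);
    return calls_per_frame * SAVED_PER_CALL;
}

/* The saving in hundredths of a percent of the frame budget. */
long saved_basis_points(long calls_per_frame)
{
    return cycles_saved(calls_per_frame) * 10000L / cycles_per_frame();
}
```

With these assumptions a frame has about 119,000 cycles. Two calls per frame save 100 cycles, which is under 0.1% of the frame; a hundred calls per frame save 5,000 cycles, or a little over 4%. That is why the call count matters more than the calling convention.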
Post by hyperfighting on Feb 16, 2023 15:55:10 GMT
elmer Many thanks for more clarifications on this __fastcall business. I am in agreement, your logic is sound here! One important takeaway is "Don't think you can wrap your shitty code in __fastcall and think your problems will magically go away!" I also see that a practical use is to improve already fast code by augmenting it with __fastcall, and I see the minor versus major gains you demonstrate by having a proper handle on analyzing what's going on under the hood.

I have adopted global variables and am happy at this point with the structure of my code base. In my case I would love to document some working examples of integrating __fastcall with C. At this stage I have identified two scenarios I cannot seem to integrate into my existing HuC project. I totally understand that my reasons for wanting to use these "features" may be unacceptable to the more experienced coders, but from my standpoint I would love to document whether the "Macro functionality" and "augmentation of functions passing variables with __fastcall" are indeed possible within your current HuC build. If there is a way to build the examples I have listed above, I will at least have some documentation on how to successfully compile them. Any more experienced newcomer reading this may get some good insights on features that can help them improve their existing code.

Last thought: if my code was in a complete state and I was already happy with performance, saving 50 cycles would be an improvement, albeit a small one compared to hundreds of cycles. If at the time I didn't have the proper know-how to rebuild those functions in ASM, I could still get a marginal boost in performance to the best of the C implementations available and my level of skill. I suspect that the features I'm trying to access aren't yet available in HuC based on my compilation errors. Please let me know if I can compile the examples I listed, as I would love to have some examples documented.

**Edit: If we had documented cases where utilitarian functions were converted to ASM that could be very useful to the C guys. If in the ASM community the function below is known to be leaps and bounds better in ASM what would be the easiest implementation of this ASM in C. I understand this crosses a line of doing someone's work for them. All I can say is:
- I would never expect anyone to do anything they don't want to do.
- I don't know ASM.
- From a C standpoint I think this is a very basic operation that could help people get some clarity on the ASM structure and potentially save hundreds of cycles!?

EX: Functions like this are very helpful for me to change the "state" of the player and keep a record of the player's "previous state". This function can be spun with unique variables, EX: changeState, changeSubState1, changeSubState2, changeSubState3, changeAttackState, changeAttackSubState1 etc.

void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}

I'm just spit balling please don't hate me.
Post by elmer on Feb 16, 2023 19:55:14 GMT
hyperfighting wrote:
I'm just spit balling please don't hate me.

I have no hate at all for you, please don't think that I do! But neither do I have a desire to spend my time answering questions about how to use HuC ... that's something for turboxray and your HuC-using fellow developers. As I have mentioned many, many times before, I don't actually use HuC, I just help fix bugs in it when they're found.

hyperfighting wrote:
In my case I would love to document some working examples of integrating __fastcall with C.

Excellent, that's a laudable goal, and I'm sure that it will be useful knowledge to post when you've got it all working! Perhaps you should be talking to turboxray for help, since he's the person that seems to have put you on this particular path. It would have helped if he hadn't given you some inaccurate information right from the start, but I kinda doubt that's what is actually causing you problems. When he said ...

turboxray wrote:
INT is the return type, so LSbyte goes in A and MSbyte goes in X. If it was a char, then just return something in A and clear X.

He was thinking like an assembly-language programmer, and not a HuC programmer. In HuC it's the opposite ... the hi-byte of an int (the MSbyte) goes in A, and the lo-byte of an int (the LSbyte) goes in X.

hyperfighting wrote:
If we had documented cases where utilitarian functions were converted to ASM that could be very useful to the C guys. If in the ASM community the function below is known to be leaps and bounds better in ASM what would be the easiest implementation of this ASM in C.

Errrmmm ... if you don't know ASM, and don't want to learn ASM, then I do wonder what the value of posting ASM conversions is! In this case, I've already posted about exactly this kind of optimization before in response to DarkKobold, but once again, just in case someone is interested ...

void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}

OK, so if you look at the .lst file that HuC/pceas outputs for your project, then after removing the conditionals and macro names, you'd see something like the following asm code for _changeSubState1 ...

  .proc _changeSubState1
    dec <__stack
    sta [__stack]
    dec <__stack
    sax
    sta [__stack]
    sax
    ldx #low(_prevSubState1)
    lda #high(_prevSubState1)
    clc
    sax
    adc _vramSlot
    sax
    adc #0
    dec <__stack
    sta [__stack]
    dec <__stack
    sax
    sta [__stack]
    sax
    ldx #low(_subState1)
    lda #high(_subState1)
    clc
    sax
    adc _vramSlot
    sax
    adc #0
    stx <__ptr
    sta <__ptr+1
    lda [__ptr]
    tax
    cla
    pha
    phx
    lda [__stack]
    tax
    ldy #1
    lda [__stack],Y
    stx <__ptr
    sta <__ptr+1
    pla
    plx
    sta [__ptr]
    sax
    inc <__stack
    inc <__stack ; => 12
    ldx #low(_subState1)
    lda #high(_subState1)
    clc
    sax
    adc _vramSlot
    sax
    adc #0
    stx <__ptr
    sta <__ptr+1
    ldy #0
    lda [__sp],Y
    tax
    cla
    sax
    sta [__ptr]
    sax
    inc <__stack
    inc <__stack ; => 12
    rts
  .endp

That's 64 instructions. Let's replace that with something that an assembly-language programmer might write ...

  .proc _changeSubState1
    txa
    ldx _vramSlot
    ldy _subState1, x
    sta _subState1, x
    tya
    sta _prevSubState1, x
    rts
  .endp

That's 7 instructions. Can you guess which version is both shorter and faster?
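For readers who don't know 6502 assembly, the seven-instruction version can be followed step by step with a small C model. This is only a sketch: the a, x, and y variables stand in for the CPU registers, the array size is a made-up value, and it assumes that HuC hands a single char argument to the routine in the X register (the lo-byte of its accumulator), which is why the routine starts with txa.

```c
#include <assert.h>

#define MAX_SLOTS 8                 /* hypothetical array size */

unsigned char subState1[MAX_SLOTS];
unsigned char prevSubState1[MAX_SLOTS];
unsigned char vramSlot;

/* C model of the hand-written routine; a, x, y stand in for CPU registers. */
void changeSubState1_model(unsigned char arg)
{
    unsigned char a, x, y;

    assert(vramSlot < MAX_SLOTS);   /* the asm itself does no bounds check */

    x = arg;                 /* on entry: the char argument is in X       */
    a = x;                   /* txa                   keep the new state  */
    x = vramSlot;            /* ldx _vramSlot         X = array index     */
    y = subState1[x];        /* ldy _subState1, x     Y = old state       */
    subState1[x] = a;        /* sta _subState1, x     store the new state */
    a = y;                   /* tya                   A = old state       */
    prevSubState1[x] = a;    /* sta _prevSubState1, x store the old state */
}                            /* rts                   return to caller    */
```

Each C statement maps to exactly one instruction, which is the point of the comparison: the hand-written version never touches the C stack or a pointer in zero page, because an 8-bit index in X is enough to reach both arrays.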
Post by hyperfighting on Feb 17, 2023 15:09:36 GMT
elmer - Thank you so much for staying on top of HuC and making it available with all the fixes and improvements! Of course a huge shout out to all the contributors to HuC! I'm glad you don't hate me! I know beginner questions from novice programmers can be annoying for OGs deep in the game! Duly noted.. On the HuC front I won't bug you regarding its usage!

I guess regarding the "_changeSubState1" example you must mean that the 64 instruction version of the code is the superior version! Its longer listing must mean it is using more optimization tricks, making it faster!!! The ASM routine you show must bog the system down with its lack of complexity and barebones approach to what is obviously a very complex problem! Kidding! For the record it is a complex problem for me!

Thank you so much for the solid example and explanation. One day I will attempt to document your code to see if I can grasp the instructions etc. As of now I have replaced just about every instance of "_changeSubState1", and the code you have shared will help eliminate a large portion of the functions that pass parameters! This code seems to be working... I plan to modify it for the other functions that have different variable names etc.

//This simple C code breaks down to something like 64 instructions!
void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}

//This is the same code in ASM but it is 7 instructions.
void __fastcall quick_changeSubState1 ( unsigned char _SUB_STATE1<__dx> );
#asm
  .proc _quick_changeSubState1.1
    txa
    ldx _vramSlot
    ldy _subState1, x
    sta _subState1, x
    tya
    sta _prevSubState1, x
    rts
  .endp
#endasm

//The shorter C code expands to be much larger under the hood. The longer ASM code is what it is under the hood. The irony or lack thereof, I'm not sure.

Hopefully someone who comes across this thread can employ this approach for simple functions like the one above!
Post by elmer on Feb 17, 2023 19:33:46 GMT
hyperfighting wrote:
Hopefully someone who comes across this thread can employ this approach for simple functions like the one above!

Your use of __fastcall in that way for this function just makes the code slower, and it destroys an unrelated memory location! So my questions for you are ...

1) Why did you make this a __fastcall? If you can't come up with a better reason than "Well, turboxray said they're faster.", then you don't yet have the required understanding and knowledge to use them.

2) Why did you put unsigned char _SUB_STATE1<__dx>? What does the <__dx> mean? If you don't have a good answer, and trust me, you don't, then you don't yet have the required understanding and knowledge to use them.

<sigh> Just in case you or someone else can actually learn anything from this, here are 3 different ways to do the optimization correctly ...

a) This produces *EXACTLY* the same code in HuC as the __fastcall version when it's used ...

void quick_changeSubState1 ( unsigned char _SUB_STATE1 );
#asm
_quick_changeSubState1 .proc
  txa
  ldx _vramSlot
  ldy _subState1, x
  sta _subState1, x
  tya
  sta _prevSubState1, x
  rts
  .endp
#endasm

b) This is a totally pointless use of __fastcall, but at least it's a correct use ...

void __fastcall quick_changeSubState1 ( unsigned char _SUB_STATE1<acc> );
#asm
_quick_changeSubState1.1 .proc
  txa
  ldx _vramSlot
  ldy _subState1, x
  sta _subState1, x
  tya
  sta _prevSubState1, x
  rts
  .endp
#endasm

c) This produces the fastest possible code in HuC, but it takes considerably more memory in your program. It's probably not worth using, but the technique can make sense *IF* you're able to profile your code and really understand where the bottlenecks are.

void __fastcall __macro quick_changeSubState1 ( unsigned char _SUB_STATE1<acc> );
#asm
_quick_changeSubState1.1 .macro
  txa
  ldx _vramSlot
  ldy _subState1, x
  sta _subState1, x
  tya
  sta _prevSubState1, x
  .endm
#endasm
Post by dogen on Feb 17, 2023 19:37:45 GMT
you don't have to know how to write it to get an idea of what it's doing. just look up what those 6502 instructions do. (nothing exotic, trust me)
Post by elmer on Feb 17, 2023 19:51:17 GMT
dogen wrote:
just look up what those 6502 instructions do. (nothing exotic, trust me)

Exactly! In the 1980s 12-year-olds were learning 6502 and/or Z80 assembly language. It's simple. It has to be, or those old processors designed in the 1970s couldn't have handled it. Its biggest problem is that it's so darned simple that it takes lots of instructions to do things that you can do in a single line in a high-level language. It's slower to write/create code in assembly-language, but it's not particularly difficult. Modern high-level languages are (IMHO) far more complex and complicated to use.
Post by hyperfighting on Feb 18, 2023 14:50:56 GMT
"(PC) Engine, (PC) Engine Number 9 On The New York Transit Line. If My Train Falls Off The Track? Pick it Up, Pick it Up! Back On The Scene Crispy And Clean!" -Black Sheep

elmer thanks for putting this train back on the right track!

To answer question 1: I believed that to interface ASM with HuC you have to use __fastcall (I was obviously wrong).
To answer question 2: I noticed 'x' in your ASM source so I used __dx and it worked! (I was obviously wrong)

I know both of the answers equal 0% on the test but there they are. Epic fail! Luckily we have a great teacher (you) and if you learn from your mistakes you don't actually fail in the end!

I've added your source to my project and took dogen 's advice and attempted to make heads or tails of it. From my perspective this is just enough of an example to get the wheels turning on understanding the in's out's of what's happening. It's just enough for me to attempt to comment...

//The purpose of this function is to track the previous state of the player.
//We always know what the player is doing and what the player has done.
void changeSubState1 ( unsigned char SUB_STATE1)
{
  prevSubState1[vramSlot]=subState1[vramSlot];
  subState1[vramSlot]=SUB_STATE1;
}

//The Test of documenting this C function in ASM
void quick_changeSubState1 ( unsigned char _SUB_STATE1 );
#asm
  ;_quick_changeSubState1 .proc //This does not compile
  .proc _quick_changeSubState1 ;The name of a "C function" is called a procedure in ASM
    txa                   ;transfer X to accumulator - X is _SUB_STATE1, now it's in the accumulator
    ldx _vramSlot         ;load X - //X is "vramSlot" which is used as the index to our array
    ldy _subState1, x     ;load Y - //Y is the value of "subState1[vramSlot]" - This is where the old value of the SubState is stored until it eventually is pushed to "prevSubState1[vramSlot]"
    sta _subState1, x     ;store accumulator //This must be like "subState1[vramSlot]=accumulator" where _SUB_STATE1's value is in the accumulator.
                          ;I don't quite get this but I do get that _SUB_STATE1 is the first thing we pushed to the accumulator in "txa" above
    tya                   ;transfer Y to accumulator //Move value stored in Y to accumulator, this is the previous value of subState1[vramSlot]
    sta _prevSubState1, x ;store accumulator //This must be like "prevSubState1[vramSlot]=accumulator" which is y which is the previous value...
    rts                   ;This is basically a "return;" in C
  .endp
#endasm

About your example b), the pointless __fastcall: it was not pointless to me... As per 0x8bitdev 's game prototype I tried my hand at following one of his __fastcalls that in my interpretation was a faster way of calling a C switch() statement.

void __fastcall quick_entity200 ( u8 _ind<__dx> ); // This was my previous declaration with the __dx variable
void __fastcall quick_entity200 ( unsigned char _ind<acc> ); // I modified the function to use acc and it works great
//As per Elmer this might be a pointless use of __fastcall??
#asm
  .proc _quick_entity200.1
    txa
    asl a
    tax
    jmp [_process_entity_func_arr200, x]
  .endp

_process_entity_func_arr200:
  .dw _check_null
  .dw _proPlayer
  .dw _proPlayerCopy
  .dw _proStar
  .dw _proEnemy

_check_null:
  call _DrawEntity
  rts

_proPlayer:
  call _ProcessPlayer
  call _DrawEntity
  rts

_proPlayerCopy:
  call _ProcessPlayerCopy
  call _DrawEntity
  rts

_proStar:
  call _ProcessStar
  call _DrawEntity
  rts

_proEnemy:
  call _ProcessEnemy
  call _DrawEntity
  rts
#endasm

Finally, thanks for the macro example! It hasn't been applied yet, but if I ever level up to code profiling I will consider pulling this functionality out and using it!
Post by elmer on Feb 18, 2023 18:17:37 GMT
hyperfighting wrote:
I've added your source to my project and took dogen 's advice and attempted to make heads or tails of it. From my perspective this is just enough of an example to get the wheels turning on understanding the in's out's of what's happening. It's just enough for me to attempt to comment...

Excellent job! Now that you've identified what the purpose of each instruction is in that simple example, you're probably a little less scared of the assembly language boogeyman, and a little more comfortable with the idea of reading a 6502 book (or web page) to understand what those instructions actually mean ... you're already 80% of the way there! Congratulations, you're taking some significant steps in expanding your capability to get the most out of the PC Engine.

hyperfighting wrote:
As per 0x8bitdev 's game prototype I tried my hand at following one of his __fastcalls that in my interpretation was a faster way of calling a C switch() statement.

OK, now this is a much more interesting example for a grumpy-old-man to pontificate on!

First of all, as you expected, I'm going to say that this is a pointless use of a __fastcall. That's because in HuC, when you only pass a single parameter to a function, then HuC always puts it into the "acc" (i.e. the HuC accumulator, i.e. the A and X registers in assembly language). This is documented in huc/doc/huc/huc_doc.htm in the "C/asm interface" section.

The quick_entity200() code that you list is a very nice example of using a jump table to optimize the implementation of a state-machine! The only problem that I see in it is my concern about where the HuC compiler is actually going to put most of the code that is generated. For normal use, where the _process_entity_func_arr200 table, and the functions themselves, are only used by the quick_entity200() function, then you'd want to move the .endp down to the end of the block of code, because otherwise they're being assembled into the precious memory used for library code.

So, to put everything into a single block of memory ...

#asm
  .proc _quick_entity200.1
    txa
    asl a
    tax
    jmp [_process_entity_func_arr200, x]

_process_entity_func_arr200:
    .dw _check_null

    ; ... lots of stuff removed for easier reading

_proEnemy:
    call _ProcessEnemy
    call _DrawEntity
    rts
  .endp
#endasm

Now, if the _process_entity_func_arr200 table, and the functions themselves, are actually used by other state-machine functions apart from quick_entity200(), then you could either just leave the code as you had it written, and hope that you don't run out of memory in the library bank, or you can group all of the state-machines together into a single bank, but that's a more-advanced level of assembly-language coding that I'd prefer to avoid discussing right now, partly because it would need a much better example to show how and why to use it.
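The jump-table idea itself is not specific to assembly. As a rough sketch, here is the same dispatch in portable C using an array of function pointers. The handler bodies are stand-ins (the real ProcessPlayer and friends are not shown in this thread), and lastProcessed and drawCount are hypothetical variables that exist only so the example has something observable.

```c
#include <assert.h>

#define ENTITY_TYPES 5

int lastProcessed = -1;   /* hypothetical: index of the handler that ran last */
int drawCount = 0;        /* hypothetical: number of DrawEntity calls */

static void DrawEntity(void)    { drawCount++; }

/* Stand-in handlers: each one processes its entity type, then draws. */
static void check_null(void)    { lastProcessed = 0; DrawEntity(); }
static void proPlayer(void)     { lastProcessed = 1; DrawEntity(); }
static void proPlayerCopy(void) { lastProcessed = 2; DrawEntity(); }
static void proStar(void)       { lastProcessed = 3; DrawEntity(); }
static void proEnemy(void)      { lastProcessed = 4; DrawEntity(); }

typedef void (*entity_fn)(void);

/* The table of handler addresses, the C equivalent of the .dw list. */
static const entity_fn process_entity_table[ENTITY_TYPES] = {
    check_null, proPlayer, proPlayerCopy, proStar, proEnemy
};

/* Dispatch on the entity type, with no switch() and no compare chain. */
void quick_entity(unsigned char ind)
{
    assert(ind < ENTITY_TYPES);     /* the asm version does no bounds check */
    process_entity_table[ind]();
}
```

In the asm version the asl a doubles the index because every .dw entry is two bytes wide; C array indexing does that scaling automatically. Since the doubled index has to fit in the 8-bit X register, the asm table is limited to 128 entries.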
Post by turboxray on Feb 19, 2023 3:55:40 GMT
turboxray wrote:
INT is the return type, so LSbyte goes in A and MSbyte goes in X. If it was a char, then just return something in A and clear X.

Looks like I got that backwards. That was a copy/paste of a piece of test code. Sorry, was in a rush (some of us are still gainfully employed busy software devs and aren't retired.. so no free time, unfortunately).

It's a bit manual (and ugly, but I mean this is HuC.. it's not even full C, so..)

The setup:

extern unsigned int _ax, _bx, _cx;

void __fastcall __nop getAX (unsigned int tmp1<__ax> );
void __fastcall __nop getAX_BX (unsigned int tmp1<__ax>, unsigned int tmp2<__bx> );
void __fastcall __nop getAX_CX_BX (unsigned int tmp1<__ax>, unsigned int tmp2<__bx>, unsigned int tmp3<__cx>);

void changeSubState1 ( void )
{
  prevSubState1[vramSlot] = _ax;
  subState1[vramSlot] = _bx;
  subState2[vramSlot] = _cx;
}

Example of usage..

int main()
{
  changeSubState1(getAX_CX_BX(1,2,3));
}
Like has already been mentioned: optimize where it counts. If you're calling a function a lot in a single NTSC video frame, then the savings add up. If not, then it's potentially a waste for the amount of refactoring work.

Like any optimization and voodoo stuff, there's a risk here too:
One; this isn't recursive unless you manually save these vars to the stack (or a stack). I doubt people are writing recursive functions in HuC, but it should be said.
Two; some other library function call in HuC might (will) trample these (like the map function, as an example). So it's probably a good idea to use your own reserved defined labels/variables for this kind of thing.

What's the advantage of this, aside from the faster calling convention? Well, the _ax, _bx, and _cx are treated the same as if they were static defines inside the function - as in, they aren't converted to "stack variables", which is slower.. if you're writing C code in your function. Such as loops, etc. You can do the same thing with static vars declared inside your function; this just gives you a one-and-done way to get values into static variables for the functions to use.

Can use globals (gross..) - here you go:

The setup:

void __fastcall __nop getchangeSubState1Vars (unsigned int tmp1<_changeSubState1Var0>, unsigned int tmp2<_changeSubState1Var1>, unsigned int tmp3<_changeSubState1Var2>);

unsigned int changeSubState1Var0;
unsigned int changeSubState1Var1;
unsigned int changeSubState1Var2;

void changeSubState1 ( void )
{
  prevSubState1[vramSlot] = changeSubState1Var0;
  subState1[vramSlot] = changeSubState1Var1;
  subState2[vramSlot] = changeSubState1Var2;
}

Example usages:

changeSubState1(getchangeSubState1Vars(1,2,3));
Using your own variables like this solves the problem if you're using internal variables and they're getting trampled on by other HuC builtin library functions (or other functions you write that also take them as parameters).

Macros work the same way; you can substitute __nop for __macro and have code "modify" your input data. Like __nop, __macro does not call any functions. Ever wanted to pass a long pointer to a C function? As in, a label to a sprite? Either of these two __fastcall types will allow that. Could you do this without it? Definitely, because they're globals. But it's cleaner and easier to read, and has more advantages than the simple example I showed here.

But like what's already been said; you can't just use this stuff and expect performance gains (I wish it were that magical haha). You have to know what obstacles you're trying to overcome. Sometimes the performance approach is just straight-up simple assembly substitution, like the array example - but sometimes (working with HuC) the issue is getting the right data to the right place with the right access. Having a fast asm array routine, or anything like that, isn't going to help you if you also can't get access to the data you need. I know that's kind of vague without context, but DK ran into similar issues once he started writing small array routines in inline assembly - we made a far pointer fastcall macro to get access to data.

The recent changes that went into HuC builds newer than 3.98 or so were specifically to address this. There's still more work to be done, but that's priority one in my opinion. Because you can always learn a tiny bit of assembly that can go a long way to performance gains. DK has had direct experience. I know dogen has been diving into assembly too. The more tutorials and such that can be written, for just some basic optimizations, the better off the HuC user is.

elmer wrote:
That's because in HuC, when you only pass a single parameter to a function, then HuC always puts it into the "acc" (i.e. the HuC accumulator, i.e. the A and X registers in assembly language).

Sure, but even if you don't use it - HuC will immediately save it to the parameter stack as the first thing into the function. Saving to the parameter stack doesn't destroy A:X, but it isn't pointless if you don't want the wasted cycles and bytes on that.
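Since the A:X pair comes up repeatedly in this thread, here is a small C model of how a 16-bit value is split across the two registers, following the corrected order given earlier (hi-byte in A, lo-byte in X). The struct and function names are made up for illustration and are not part of HuC.

```c
#include <assert.h>

/* Model of HuC's 16-bit "acc" as two 8-bit CPU registers. */
struct acc {
    unsigned char a;   /* hi-byte (MSbyte) of the value */
    unsigned char x;   /* lo-byte (LSbyte) of the value */
};

/* Split a 16-bit value the way HuC returns an int. */
struct acc acc_from_int(unsigned int value)
{
    struct acc r;
    assert(value <= 0xFFFFu);            /* only 16 bits fit in A:X */
    r.a = (unsigned char)(value >> 8);
    r.x = (unsigned char)(value & 0xFFu);
    return r;
}

/* Rebuild the 16-bit value from the two registers. */
unsigned int acc_to_int(struct acc r)
{
    return ((unsigned int)r.a << 8) | r.x;
}

/* An unsigned char value: the byte goes in X and A is cleared,
   matching the "tax / cla" pair in the generated listing. */
struct acc acc_from_char(unsigned char value)
{
    struct acc r;
    r.a = 0;
    r.x = value;
    return r;
}
```

For example, 0x1234 ends up as A = 0x12 and X = 0x34. A hand-written asm routine that returns a value to C has to follow the same layout, because HuC does not check it.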