|
Post by dshadoff on Oct 9, 2021 18:44:07 GMT
Part of the process of "fixing" things like this includes identifying specific example cases where the output is excessively large/slow/broken (etc.).
In some cases, there are likely improvements which can be made. In some cases, it would take an immense amount of effort to improve, because of the original compiler (or other foundational concepts) which it is based upon. In other cases, it cannot be improved without removing its "general purpose" aspects. In yet other cases, the input code itself may be inefficient, or could be substantially better, and no improvement to the compiler/libraries is needed.
Each example needs to be taken on a case-by-case basis, and that's what people were considering when we got on the track of examining output code (with reference to input code).
If we don't discuss specific examples, the people who are capable of improvements won't have those on their lists.
I'm pretty sure that elmer and turboxray know what things they want to expend their efforts on, but these may or may not match what you need.
|
|
|
Post by elmer on Oct 9, 2021 21:06:48 GMT
Sorry to pop in late but I figured I'd add my own two cents to this whole soup.
It's good to see you back and posting again, even if you choose to make this a flying visit just because this particular subject interests you!
I've done a few rather huge games in >99% HuC, using very little inline assembly. ... HuC may have its faults but it's a beautiful system to the user.
Nobody is denying this, and if it works for you, then nobody is trying to stop you from using it. As I said when posting the example of the horrible assembly-language code that HuC produces for that line of C code ... the generated code was fast enough that it didn't slow down DarkKobold's game, and so, does it really need to be better?
|
|
|
Post by elmer on Oct 9, 2021 21:13:44 GMT
Part of the process of "fixing" things like this includes the identification of specific example cases where the output is excessively large/slow/broken (etc.). ... Each example needs to be taken on a case-by-case basis, and that's what people were considering when we got on the track of examining output code (with reference to input code). If we don't discuss specific examples, the people who are capable of improvements won't have those on their lists.
Absolutely this! Looking at the specific piece of code that I posted, it's a perfect candidate for HuC's peephole optimizer to make improvements to, but there are still some hard limits to the gains that you can make from that approach.
If you construct a game in HuC and find it is too slow or uses too much memory, is cleanup of bulky ASM something a more ASM-oriented developer could reasonably assist the original developer with? I have found that good designers aren't always good coders and vice versa. So it makes sense for a game designer to want to work with HuC.
I am firmly of the opinion that the original programmer is best off learning those critical skills for themselves - it's a great feeling of accomplishment when it happens, and the ordeal leaves them a better programmer, with a better overall toolset.
As an old-skool self-taught programmer, I absolutely agree with Dave's opinion here ... but I also acknowledge that there are lots of folks who don't share my personal passions, and that the world is a more interesting place because people have different interests and skills. Now, having said that ... I do not personally have the passion/patience to go around optimizing other people's HuC code, but there may well be folks that do.
|
|
|
Post by turboxray on Oct 9, 2021 21:32:04 GMT
Things like "good enough" might cut it.. but if you're right on the threshold.. then adding stuff like the HuTrack chiptune engine, sample playback, BG color #0 updates, actually doing more than 4 hsync scrolls, etc... that stuff isn't free. DK has already been doing some optimization prep work for HuTrack. We got a nice sprite color #0 macro updating for performance benchmarking in HuC, so that was nice to see the optimizations taking effect.
|
|
|
Post by elmer on Oct 10, 2021 0:43:30 GMT
I'm not sure what you wish to accomplish by putting an include inside a .proc/.endp, because from my POV, the power for creating optional libraries is going to come from putting procedures *within* include files, and then choosing which set of library includes to use in a particular project.
OK, after some extra testing, it *looks* like this should work out pretty well (IMHO), because whoever added the .proc/.endp functionality into PCEAS was thinking ahead, and they also added the ability to nest multiple procedures inside a .procgroup/.endprocgroup, and then have the whole group relocated into the same bank as a single contiguous chunk of code. I don't think that has ever been used by HuC itself, but it is a critical capability needed by assembly-language code (i.e. HuC's libraries) so that you can have a number of different entry points to a piece of shared code.
For instance, this is an example of how it might work for a small section of code in HuC/MagicKit's library.asm ...

                .procgroup

; ----
; int _memcmp(char *dest [__di], char *src [__si], int count [acc])
; ----
; Compare memory
; ----

                .proc   _memcmp.3
                eor     #$ff
                sta     <__temp
                txa
                eor     #$ff
                tax
                cly
.loop:          inx
                beq     .page
.test:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                iny
                bne     .loop
                inc     <__si+1
                inc     <__di+1
                bra     .loop
.page:          inc     <__temp
                bne     .test
;               bra     cmp_same
                .endp

cmp_same:       ldx     #$00
                cla
                rts

cmp_plus:       ldx     #$01
                cla
                rts

cmp_minus:      ldx     #$FF
                txa
                rts

; ----
; int _strcmp(char *dest [__di], char *src [__si])
; ----
; Compare strings
; ----

                .proc   _strcmp.2
                .endp

                cly
.loop:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                cmp     #0
                beq     cmp_same
                iny
                bne     .loop
                bra     cmp_same

; ----
; int _strncmp(char *dest [__di], char *src [__si], unsigned char count [acc])
; ----
; Compare strings
; ----

                .proc   _strncmp.3
                txa
                eor     #$ff
                tax
                cly
.loop:          inx
                beq     cmp_same
.test:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                cmp     #0
                beq     cmp_same
                iny
                bne     .loop
                bra     cmp_same
                .endp

                .endprocgroup

Note that if you're defining procedures within a group, then the definition of _strcmp.2 shows that you don't even need to include any code within a procedure in order to declare the label as relocatable (and thus correctly fixed-up when executed by a call pseudo-op).
|
|
|
Post by elmer on Oct 10, 2021 1:31:12 GMT
Oh, one more thing ... since procedures aren't actually allocated any memory in the ROM/CD-ROM until the end of 1st-pass, it would *theoretically* be possible to throw away all unreferenced/unused procedures at that time, and so reduce the size of the ROM/CD-ROM. This should, I think, also apply to unused functions in C code. Now PCEAS does not support this right now, and there may be an unsolvable-problem in implementing it ... but it is an intriguing idea!
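The idea of throwing away unreferenced procedures can be pictured as a reachability pass over "who references whom". The following is only a minimal C sketch of that general technique; the thread does not show how PCEAS does it, and the `struct proc` layout, `mark_used()` and `count_used()` are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROCS 16

/* Hypothetical record for one .proc; PCEAS's real structures may differ. */
struct proc {
    const char *name;
    int         calls[MAX_PROCS]; /* indices of procedures this one references */
    int         ncalls;
    bool        used;             /* set by mark_used() */
};

/* Mark every procedure reachable from `root` by following references. */
void mark_used(struct proc *procs, int root)
{
    if (procs[root].used)
        return;                   /* already visited; this also stops cycles */
    procs[root].used = true;
    for (int i = 0; i < procs[root].ncalls; i++)
        mark_used(procs, procs[root].calls[i]);
}

/* Count the procedures that would be kept in the ROM image. */
int count_used(const struct proc *procs, int nprocs)
{
    int kept = 0;
    for (int i = 0; i < nprocs; i++)
        if (procs[i].used)
            kept++;
    return kept;
}
```

Anything still unmarked after marking from the program's entry points is a candidate for stripping. Note that two unused procedures that only reference each other stay unmarked, which a simple "has any reference" count would miss.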
|
|
|
Post by elmer on Oct 11, 2021 0:39:30 GMT
I reverted back the getopt inclusion from Mooz's fork. Though it was more because it broke something in the command line args (that he wasn't using, so he didn't encounter it). I think he might have fixed it by now. I should have tested that further.. scratch that - there needs to be some test scripts so I wouldn't have to do manual regression testing hahah. Attach it to a pipeline/policy.
I saw that you reverted the addition of getopt() back on May 22nd 2020 ... but then it all came back into the code again with your "Checking updates" check-in on Aug 9th 2020, along with a duplicate set of the /doc, /examples & /include directories in a new tree: /huc/doc, /huc/examples & /huc/include. I'm getting rather confused! What is this new duplicated set of files in the "huc/" tree for?
Anyway, I found and fixed what was broken in Mooz's use of getopt_long_only() for parameter-passing. Should I check in that fix, or are you going to revert that stuff again?
If you consider your HuC github repository to be "unstable" at the moment, then I can try to revert all of your changes from my repository ... but I was hoping to keep up-to-date with your changes so that we don't run the danger of drifting apart and fragmenting what little developer-base there is!
|
|
|
Post by turboxray on Oct 11, 2021 2:49:00 GMT
If you fixed the getopt then check in that fix - I won't revert it. I can't remember now, but I think like 'raw' and some other options weren't working. Yeah, I can't remember now. The gitignore has the wrong ignore directory, so that's why the huc temp folder was showing up. I haven't really been worrying about any extra stuff because no one was even working on source except mooz. I've been building huc with current changes for DK and it's working fine.
|
|
|
Post by elmer on Oct 11, 2021 3:14:00 GMT
If you fixed the getopt then check in that fix - I won't revert it. I can't remember now, but I think like 'raw' and some other options weren't working. Yeah, I can't remember now. The gitignore has the wrong ignore directory, so that's why the huc temp folder was showing up. I haven't really been worrying about any extra stuff because no one was even working on source except mooz. I've been building huc with current changes for DK and it's working fine.
OK, thanks! The problem with getopt_long_only() was just an easy-to-make misunderstanding of the return code ... Mooz's code would only return the first long-option on the command line, because it was checking for a 0 as the end-of-options marker instead of -1.
As for the extra huc directory, I'll ignore it for now and assume that you'll nuke it at some point soon.
I still have some more testing to do before I sync any of my changes with github, but I *think* that I have just succeeded in persuading PCEAS to strip out unreferenced/unused procedures and groups ... although they do still currently appear in the listing.
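For readers who have not hit this pitfall: getopt_long_only() returns -1 when the options are exhausted, and it returns 0 for a long option whose `flag` pointer is non-NULL, so a loop that stops on 0 quits after the first such option. The sketch below is not the actual HuC/PCEAS code; `parse_options()`, `opt_raw`, `opt_strip` and the option table are hypothetical, chosen only to show the loop condition.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical switches, not the real HuC/PCEAS option set. */
int opt_raw   = 0;
int opt_strip = 0;

/*
 * Parse single-dash long options such as "-raw".
 * Returns the number of options seen, or -1 on an unknown option.
 */
int parse_options(int argc, char **argv)
{
    static const struct option longopts[] = {
        { "raw",   no_argument, &opt_raw,   1 },
        { "strip", no_argument, &opt_strip, 1 },
        { NULL, 0, NULL, 0 }
    };
    int seen = 0;
    int c;

    optind = 1;  /* start scanning at the first argument */
    opterr = 0;  /* keep getopt quiet; errors are reported by return value */

    /* The end-of-options marker is -1. A return of 0 only means that a
     * long option stored its value through its `flag` pointer. */
    while ((c = getopt_long_only(argc, argv, "", longopts, NULL)) != -1) {
        if (c == '?')
            return -1;
        seen++;
    }
    return seen;
}
```

Writing the loop condition as `!= 0` here would end the loop on `-raw` and never look at `-strip`, which matches the symptom described above.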
|
|
|
Post by elmer on Oct 11, 2021 18:49:31 GMT
I still have some more testing to do before I sync any of my changes with github, but I *think* that I have just succeeded in persuading PCEAS to strip out unreferenced/unused procedures and groups ... although they do still currently appear in the listing.
Yep, that seems to have worked, and I have checked everything into my github. PCEAS now has new -strip and -newproc options.
Warning: I shortened the code trampolines for the -newproc option, and the new trampolines are not compatible with the current versions of HuC or MagicKit.
For the technical audience, the old generated code looks like ...

                tay
                tma5
                pha
                lda     #bank
                tam5
                tya
                jsr     banked-procedure-in-mpr5
                tay
                pla
                tam5
                tya
                rts

... and the new code is just ...

                tma6
                pha
                lda     #bank
                tam6
                jmp     banked-procedure-in-mpr6

The idea is that this saves both CPU cycles, and memory in MPR7, by making it the calling party's responsibility to save the A register if it wants to, and by making it the procedure's responsibility to end with a "jmp exitproc" instead of an "rts". IMHO this makes more sense for assembly-language programmers moving forward, and it wouldn't be hard to change HuC and MagicKit to use this method if anyone wanted to while they were working on making the HuC libraries more modular.
For HuC, my original thoughts had been towards a "linker" of sorts, which would exclude the unused library functions, and magically "compress"... but when you see how the paging is in place, that becomes a lot more challenging than you might think. Since 8KB can fill up quickly, one would also need to know in advance how large each of the functions is, in order to "fit" it into a bank... and all this stuff is possible by hand, but so much more difficult programmatically.
From my perspective, I *think* that you've probably already got the capability to achieve 95% of what you want to do with the current state of the toolchain. Between Uli's addition of -l<libraryname> to the HuC command line, the ability to define .proc functions that get relocated by PCEAS at build time to maximize bank usage in ROM space, and now the ability to have PCEAS strip out unused procedures as well ... it looks like it should be possible to make some pretty modular libraries, and just keep maybe 2 banks or less of permanent fixed low-level system-support code. What is your opinion? Is there something else needed that's missing from the tools?
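The "fit each function into an 8KB bank" problem mentioned above is a bin-packing problem. The thread does not say which strategy PCEAS uses when it relocates .proc blocks, so the following C sketch swaps in a standard heuristic, first-fit decreasing, purely as an illustration; `pack_banks()`, `MAX_BANKS` and the sizes in the usage note are hypothetical.

```c
#include <stdlib.h>

#define BANK_SIZE 8192  /* one bank is 8KB */
#define MAX_BANKS 32    /* arbitrary limit for this sketch */

/* qsort comparator: largest size first. */
static int by_size_desc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x < y) - (x > y);
}

/*
 * Place procedures (given by size in bytes) into banks using first-fit
 * decreasing. Returns the number of banks used, or -1 if a procedure is
 * larger than a bank or MAX_BANKS is exceeded. `sizes` is sorted in place.
 */
int pack_banks(int *sizes, int count)
{
    int free_bytes[MAX_BANKS];
    int banks = 0;

    qsort(sizes, (size_t)count, sizeof sizes[0], by_size_desc);

    for (int i = 0; i < count; i++) {
        int b;

        if (sizes[i] > BANK_SIZE)
            return -1;

        /* Find the first already-open bank with enough room. */
        for (b = 0; b < banks; b++)
            if (free_bytes[b] >= sizes[i])
                break;

        /* None fits: open a new bank. */
        if (b == banks) {
            if (banks == MAX_BANKS)
                return -1;
            free_bytes[banks++] = BANK_SIZE;
        }
        free_bytes[b] -= sizes[i];
    }
    return banks;
}
```

For example, procedures of 5000, 3000, 4000, 4000 and 192 bytes pack into two banks this way. The point about needing every size in advance still holds: the sizes are only known after assembly, which fits the earlier remark that procedures are not allocated ROM space until the end of the first pass.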
|
|
|
Post by dshadoff on Oct 11, 2021 20:55:12 GMT
The reason for preserving A during the call was that HuC passed the first variable in A and X, in order to avoid the pseudo-stack for the common 1-parameter case.
But if you’re not concerned about HuC, this will certainly help other use cases.
|
|
|
Post by elmer on Oct 11, 2021 21:40:36 GMT
The reason for preserving A during the call was that HuC passed the first variable in A and X, in order to avoid the pseudo-stack for the common 1-parameter case.
Sure, I know ... but in those cases, there is absolutely no reason why the HuC compiler can't be modified to put the tay in the calling code, and the tya in the destination procedure, and so move the space cost from out of the one bank with the trampolines and into the many banks with the HuC code. Then there are also those functions that don't take any parameters at all, or those fastcall functions that pass their parameters in the CD variables (__ax,__bx,__cx,__dx,__si,__di,__bp).
Overall, I see changing the trampoline as a win, especially when moving the trampolines into the limited-sized first bank with the system-support code.
Honestly, I thought that you'd be more concerned with changing from jsr/rts to jmp/jmp, which is something that is less of an obvious benefit ... 2 bytes and 6 cycles saved per procedure, but it does mean remembering to make that change to the rts in every assembly-language procedure.
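To put rough numbers on the space cost in the trampoline bank, the two listings posted earlier can be tallied byte by byte. This C sketch assumes the standard HuC6280 instruction encodings (1 byte for implied instructions, 2 for tma/tam and immediate loads, 3 for jsr/jmp) and assumes a single shared exit routine of pla, tam6, rts; the function names are hypothetical and none of these totals are figures quoted in the thread.

```c
/* Instruction sizes in bytes, assuming standard HuC6280 encodings. */
enum {
    SZ_TAY = 1, SZ_TYA = 1, SZ_PHA = 1, SZ_PLA = 1, SZ_RTS = 1,
    SZ_TMA = 2, SZ_TAM = 2, SZ_LDA_IMM = 2,
    SZ_JSR = 3, SZ_JMP = 3
};

/* Old: tay tma5 pha lda #bank tam5 tya jsr proc tay pla tam5 tya rts */
enum {
    OLD_TRAMPOLINE = SZ_TAY + SZ_TMA + SZ_PHA + SZ_LDA_IMM + SZ_TAM + SZ_TYA
                   + SZ_JSR + SZ_TAY + SZ_PLA + SZ_TAM + SZ_TYA + SZ_RTS
};

/* New: tma6 pha lda #bank tam6 jmp proc */
enum { NEW_TRAMPOLINE = SZ_TMA + SZ_PHA + SZ_LDA_IMM + SZ_TAM + SZ_JMP };

/* Assumed shared exit, present once: pla tam6 rts */
enum { SHARED_EXIT = SZ_PLA + SZ_TAM + SZ_RTS };

/* Bytes used in the trampoline bank for nprocs banked procedures. */
int old_bank_bytes(int nprocs) { return nprocs * OLD_TRAMPOLINE; }
int new_bank_bytes(int nprocs) { return nprocs * NEW_TRAMPOLINE + SHARED_EXIT; }
```

Under these assumptions each procedure's trampoline shrinks from 18 to 10 bytes in the shared bank. Each procedure also grows by 2 bytes in its own bank when its final rts becomes a jmp, and any tay/tya that HuC still needs moves into the calling code and the procedure.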
|
|
|
Post by dshadoff on Oct 12, 2021 0:06:24 GMT
Oh, I hadn't noticed that about the JMP... I didn't look at the code closely. I assume that you have a common exit in the trampoline bank which goes something like this:

                pla
                tam6
                rts

...for each bank capable of paging like this.
I don't mind this... but... again this makes an assumption. The assumption is that each of the externalized functions in a given bank will not benefit from a direct-call (i.e. without paging that bank again), since the exit of the function would be a jump back to the trampoline bank to page it back out.
I haven't looked in a long time, but I'm reasonably confident that there are library functions where A calls B, and you would want to externalize both without taking a second page-out penalty when A calls B. Something along the lines of "do something to sprite <n>", or "do something to all sprites", or something like that.
I don't have specifics at the moment though.
|
|
|
Post by elmer on Oct 12, 2021 16:31:12 GMT
I assume that you have a common exit in the trampoline bank which goes something like this:
Yes, that's right.
I don't mind this... but... again this makes an assumption. The assumption is that each of the externalized functions in a given bank will not benefit from a direct-call (i.e. without paging that bank again), since the exit of the function would be a jump back to the trampoline bank to page it back out.
Absolutely ... and that is why I'm less convinced of the wisdom of that particular change, even though the memory-saving is nice. From the engine and library programmer's POV, it's something that is easy to hide behind a macro, say "leave", and then easily change later on. Since using the "-newproc" option would require so much work to be done on HuC/MagicKit anyway, I think that there's plenty of time to come to some consensus as to the wisdom/need of the exact implementation.
|
|
|
Post by Galahad on Oct 31, 2021 2:41:06 GMT
Change is a good thing. I like what I read in this thread; the future looks bright for pce dev.
|
|