On HuC and PCE development futures

dshadoff
ANIKIIII!

Implementing the systems of the '90s... since 1978

Posts: 1,241

On HuC and PCE development futures Oct 9, 2021 18:44:07 GMT

Post by dshadoff on Oct 9, 2021 18:44:07 GMT

Part of the process of "fixing" things like this includes the identification of specific example cases where the output is excessively large/slow/broken (etc.).

In some cases, there are likely improvements which can be made.
In some cases, it would take an immense amount of effort to improve, because of the original compiler (or other foundational concepts) which it is based upon.
In other cases, it will not be able to be improved without removing its "general purpose" aspects.
In yet other cases, the input code may simply be inefficient, or could be substantially better, and no improvement to the compiler/libraries is needed.

Each example needs to be taken on a case-by-case basis, and that's what people were considering when we got on the track of examining output code (with reference to input code).

If we don't discuss specific examples, the people who are capable of improvements won't have those on their lists.

I'm pretty sure that elmer and turboxray know what things they want to expend their efforts on, but these may or may not match what you need.

Last Edit: Oct 9, 2021 18:45:19 GMT by dshadoff

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 9, 2021 21:06:48 GMT

Post by elmer on Oct 9, 2021 21:06:48 GMT

Oct 8, 2021 22:49:19 GMT TheOldRover said:

Sorry to pop in late but I figured I'd add my own two cents to this whole soup.

It's good to see you back and posting again, even if you choose to make this a flying-visit just because this particular subject interests you!

Oct 8, 2021 22:49:19 GMT TheOldRover said:

I've done a few rather huge games in >99% HuC, using very little inline assembly.
...
HuC may have its faults but it's a beautiful system to the user.

Nobody is denying this, and if it works for you, then nobody is trying to stop you from using it.

As I said when posting the example of the horrible assembly-language code that HuC produces for that line of C code ... the generated-code was fast enough that it didn't slow down DarkKobold's game, and so, does it really need to be better?

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 9, 2021 21:13:44 GMT

Post by elmer on Oct 9, 2021 21:13:44 GMT

Oct 9, 2021 18:44:07 GMT dshadoff said:

Part of the process of "fixing" things like this includes the identification of specific example cases where the output is excessively large/slow/broken (etc.).
...
Each example needs to be taken on a case-by-case basis, and that's what people were considering when we got on the track of examining output code (with reference to input code).

If we don't discuss specific examples, the people who are capable of improvements won't have those on their lists.

Absolutely this!

Looking at the specific piece of code that I posted, it's a perfect candidate for HuC's peephole-optimizer to make improvements to, but there are still some hard limits to the gains that you can make from that approach.
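
To illustrate what a peephole pass does and where its limits come from, here is a minimal generic sketch in C. It is not HuC's actual optimizer or its rule set; the `struct insn` layout and the function name are hypothetical. It applies a single classic rule to straight-line code (no labels between instructions, ordinary RAM operands): a reload of a value that was just stored is dropped.

```c
#include <string.h>

/* Hypothetical instruction record, for illustration only. */
struct insn {
    const char *op;       /* mnemonic, e.g. "sta" */
    const char *operand;  /* operand text, e.g. "<__temp" */
};

/* Drop "lda X" when the previous kept instruction is "sta X", because
   A already holds the value. Compacts the array in place and returns
   the new instruction count.
   Caveat: lda updates the N/Z flags and sta does not, so the rule is
   only safe when nothing afterwards depends on those flags. A window
   of two instructions cannot see that, which is one kind of limit on
   this approach. */
int peephole_store_load(struct insn *code, int n)
{
    int out = 0;

    for (int i = 0; i < n; i++) {
        if (out > 0 &&
            strcmp(code[i].op, "lda") == 0 &&
            strcmp(code[out - 1].op, "sta") == 0 &&
            strcmp(code[i].operand, code[out - 1].operand) == 0) {
            continue;               /* redundant reload, skip it */
        }
        code[out++] = code[i];      /* keep everything else */
    }
    return out;
}
```

Anything that needs knowledge beyond the small window (register lifetimes, flag usage, which values are still needed) is out of reach for a rule like this.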

Oct 9, 2021 0:32:40 GMT dshadoff said:

Oct 9, 2021 0:17:41 GMT spenoza said:

If you construct a game in HuC and find it is too slow or uses too much memory, is cleanup of bulky ASM something a more ASM-oriented developer could reasonably assist the original developer with? I have found that good designers aren’t always good coders and vice versa. So it makes sense for a game designer to want to work with HuC.

I am firmly of the opinion that the original programmer is best off learning those critical skills for themselves - it's a great feeling of accomplishment when it happens, and the ordeal leaves them a better programmer, with a better overall toolset.

As an old-skool self-taught programmer, I absolutely agree with Dave's opinion here ... but I also acknowledge that there are lots of folks who don't share my personal-passions, and that the world is a more-interesting place because people have different interests and skills.

Now, having said that ... I do not personally have the passion/patience to go around optimizing other people's HuC code, but there may well be folks that do.

Last Edit: Oct 9, 2021 21:31:16 GMT by elmer

turboxray
Lord of Thunder

Posts: 487

On HuC and PCE development futures Oct 9, 2021 21:32:04 GMT

Post by turboxray on Oct 9, 2021 21:32:04 GMT

Things like "good enough" might cut it.. but if you're right on the threshold.. then adding stuff like the HuTrack chiptune engine, sample playback, BG color #0 updates, actually doing more than 4 hsync scrolls, etc... that stuff isn't free. DK has already been doing some optimization prep work for HuTrack. We got a nice sprite color #0 macro updating for performance benchmarking in HuC, so that was nice to see the optimizations taking effect.

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 10, 2021 0:43:30 GMT

Post by elmer on Oct 10, 2021 0:43:30 GMT

Oct 8, 2021 19:54:48 GMT elmer said:

I'm not sure what you wish to accomplish by putting an include inside a .proc/.endp, because from my POV, the power for creating optional libraries is going to come from putting procedures *within* include files, and then choosing which set of library includes to use in a particular project.

OK, after some extra testing, it *looks* like this should work out pretty well (IMHO), because whoever added the .proc/.endp functionality into PCEAS was thinking ahead, and they also added the ability to nest multiple procedures inside a .procgroup/.endprocgroup, and then have the whole group relocated into the same bank as a single contiguous chunk of code.

I don't think that has ever been used by HuC itself, but it is a critical capability needed by assembly-language code (i.e. HuC's libraries) so that you can have a number of different entry points to a piece of shared code.

For instance, this is an example of how it might work for a small section of code in HuC/MagicKit's library.asm ...

                .procgroup

; ----
; int _memcmp(char *dest [__di], char *src [__si], int count [acc])
; ----
; Compare memory
; ----

                .proc   _memcmp.3

                eor     #$ff
                sta     <__temp
                txa
                eor     #$ff
                tax
                cly
.loop:          inx
                beq     .page
.test:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                iny
                bne     .loop
                inc     <__si+1
                inc     <__di+1
                bra     .loop
.page:          inc     <__temp
                bne     .test
;               bra     cmp_same

                .endp

cmp_same:       ldx     #$00
                cla
                rts

cmp_plus:       ldx     #$01
                cla
                rts

cmp_minus:      ldx     #$FF
                txa
                rts


; ----
; int _strcmp(char *dest [__di], char *src [__si])
; ----
; Compare strings
; ----

                .proc   _strcmp.2
                .endp

                cly
.loop:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                cmp     #0
                beq     cmp_same
                iny
                bne     .loop
                bra     cmp_same


; ----
; int _strncmp(char *dest [__di], char *src [__si], unsigned char count [acc])
; ----
; Compare strings
; ----

                .proc   _strncmp.3

                txa
                eor     #$ff
                tax
                cly
.loop:          inx
                beq     cmp_same
.test:          lda     [__di],y
                cmp     [__si],y
                bmi     cmp_minus
                bne     cmp_plus
                cmp     #0
                beq     cmp_same
                iny
                bne     .loop
                bra     cmp_same

                .endp

                .endprocgroup

Note that if you're defining procedures within a group, then the definition of _strcmp.2 shows that you don't even need to include any code within a procedure in order to declare the label as relocatable (and thus correctly fixed-up when executed by a call pseudo-op).

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 10, 2021 1:31:12 GMT

Post by elmer on Oct 10, 2021 1:31:12 GMT

Oh, one more thing ... since procedures aren't actually allocated any memory in the ROM/CD-ROM until the end of 1st-pass, it would *theoretically* be possible to throw away all unreferenced/unused procedures at that time, and so reduce the size of the ROM/CD-ROM. This should, I think, also apply to unused functions in C code.

Now PCEAS does not support this right now, and there may be an unsolvable-problem in implementing it ... but it is an intriguing idea!
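
The idea can be sketched as a reachability pass over the procedure references. This is a minimal illustration in C, not PCEAS code: the `struct proc` layout, the `MAX_PROCS` limit and the function names are all hypothetical, and it assumes the assembler already knows which procedures each procedure references and which ones are referenced from non-procedure code (the "roots").

```c
#include <stddef.h>

#define MAX_PROCS 16   /* arbitrary limit for this sketch */

/* Hypothetical model of a procedure and the procedures it references. */
struct proc {
    const char *name;
    int calls[MAX_PROCS];   /* indices of referenced procedures */
    int ncalls;
    int used;               /* 1 once the procedure is known to be needed */
};

/* Mark procedure i and, transitively, everything it references. */
static void mark_used(struct proc *procs, int i)
{
    if (procs[i].used)
        return;                         /* already visited */
    procs[i].used = 1;
    for (int k = 0; k < procs[i].ncalls; k++)
        mark_used(procs, procs[i].calls[k]);
}

/* Mark everything reachable from the roots and return the number of
   procedures that were never reached and could be left out. */
int strip_unused(struct proc *procs, int nprocs, const int *roots, int nroots)
{
    int dropped = 0;

    for (int i = 0; i < nprocs; i++)
        procs[i].used = 0;
    for (int r = 0; r < nroots; r++)
        mark_used(procs, roots[r]);
    for (int i = 0; i < nprocs; i++)
        if (!procs[i].used)
            dropped++;
    return dropped;
}
```

Note that a procedure referenced only by another unused procedure is also dropped, which is why a simple "was this label ever referenced" flag is not quite enough.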

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 11, 2021 0:39:30 GMT

Post by elmer on Oct 11, 2021 0:39:30 GMT

Sept 29, 2021 18:41:31 GMT turboxray said:

I reverted back the getopt inclusion from Mooz's fork. Though it was more because it broke something in the command line args (that he wasn't using, so he didn't encounter it). I think he might have fixed it by now. I should have tested that further.. scratch that - there needs to be some test scripts so I wouldn't have to do manual regression testing hahah. Attach it to a pipeline/policy.

I saw that you reverted the addition of getopt() back on May 22nd 2020 ... but then it all came back into the code again with your "Checking updates" check-in on Aug 9th 2020, along with a duplicate set of the /doc, /examples & /include directories in a new tree /huc/doc, /huc/examples & /huc/include.

I'm getting rather confused! What is this new duplicated set of files in the "huc/" tree for?

Anyway, I found and fixed what was broken in Mooz's use of getopt_long_only() for parameter-passing, should I check-in that fix, or are you going to revert that stuff again?

If you consider your HuC github repository to be "unstable" at the moment, then I can try to revert all of your changes from my repository ... but I was hoping to keep up-to-date with your changes so that we don't run the danger of drifting apart and fragmenting what little developer-base there is!

Last Edit: Oct 11, 2021 1:02:22 GMT by elmer

turboxray
Lord of Thunder

Posts: 487

On HuC and PCE development futures Oct 11, 2021 2:49:00 GMT

Post by turboxray on Oct 11, 2021 2:49:00 GMT

If you fixed the getopt then check in that fix - I won't revert it. I can't remember now, but I think like 'raw' and some other options weren't working. Yeah, I can't remember now. The gitignore has the wrong ignore directory, so that's why huc temp folder was showing up. I haven't really been worrying about any extra stuff because no one was even working on source except mooz. I've been building huc with current changes for DK and it's working fine.

Last Edit: Oct 11, 2021 2:49:21 GMT by turboxray

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 11, 2021 3:14:00 GMT

Post by elmer on Oct 11, 2021 3:14:00 GMT

Oct 11, 2021 2:49:00 GMT turboxray said:

If you fixed the getopt then check in that fix - I won't revert it. I can't remember now, but I think like 'raw' and some other options weren't working. Yeah, I can't remember now. The gitignore has the wrong ignore directory, so that's why huc temp folder was showing up. I haven't really been worrying about any extra stuff because no one was even working on source except mooz. I've been building huc with current changes for DK and it's working fine.

OK, thanks!

The problem with getopt_long_only() was just an easy-to-make misunderstanding of the return code ... Mooz's code would only return the first long-option on the command line, because it was checking for a 0 as the end-of-options marker instead of -1.
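
For readers who have not hit this before, here is a minimal sketch in C of the return-code convention. It is not the actual HuC/PCEAS source: the option table and the function name are made up for illustration. With a long option whose `flag` is NULL and whose `val` is 0, getopt_long_only() returns 0 for every match, and only returns -1 once the options run out.

```c
#include <getopt.h>
#include <stddef.h>

/* Hypothetical option table for illustration; not the real option list. */
static const struct option demo_options[] = {
    { "raw",   no_argument, NULL, 0 },
    { "strip", no_argument, NULL, 0 },
    { "pad",   no_argument, NULL, 0 },
    { NULL, 0, NULL, 0 }
};

/* Count the recognised long options on a command line. */
int count_long_options(int argc, char **argv)
{
    int seen = 0;
    int index = 0;
    int c;

    optind = 1;   /* restart scanning so the function can be called again */
    opterr = 0;   /* keep getopt quiet about unknown options */

    /* -1 is the end-of-options marker. A return of 0 only means that a
       long option with flag == NULL and val == 0 was matched, so a loop
       that stops on 0 gives up after the first long option. */
    while ((c = getopt_long_only(argc, argv, "", demo_options, &index)) != -1) {
        if (c == 0)
            seen++;
        /* '?' (unknown option) is simply skipped in this sketch */
    }
    return seen;
}
```

Called with the arguments `-raw -strip -pad`, the loop above visits all three options, whereas a loop that treats 0 as the terminator would stop after `-raw`.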

As for the extra huc directory, I'll ignore it for now and assume that you'll nuke it at some point soon.

I still have some more testing to do before I sync any of my changes with github, but I *think* that I have just succeeded in persuading PCEAS to strip out unreferenced/unused procedures and groups ... although they do still currently appear in the listing.

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 11, 2021 18:49:31 GMT

Post by elmer on Oct 11, 2021 18:49:31 GMT

Oct 11, 2021 3:14:00 GMT elmer said:

I still have some more testing to do before I sync any of my changes with github, but I *think* that I have just succeeded in persuading PCEAS to strip out unreferenced/unused procedures and groups ... although they do still currently appear in the listing.

Yep, that seems to have worked, and I have checked-in everything into my github.

PCEAS now has new -strip and -newproc options.

Warning: I shortened the code trampolines for the -newproc option, and the new trampolines are not compatible with the current versions of HuC or MagicKit.

For the technical audience, the old generated-code looks like ...

tay
tma5
pha
lda #bank
tam5
tya
jsr banked-procedure-in-mpr5
tay
pla
tam5
tya
rts

... and the new code is just ...

tma6
pha
lda #bank
tam6
jmp banked-procedure-in-mpr6

The idea is that this saves both CPU cycles, and memory in MPR7, by making it the calling-party's responsibility to save the A register if it wants to, and by making it the procedure's responsibility to end with a "jmp exitproc" instead of an "rts".
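
The memory side of that claim can be checked by adding up the encoded instruction sizes from the two listings above. The sketch below is only an accounting aid in C, with hypothetical function names. It assumes a single shared exit stub of pla / tam6 / rts behind the "exitproc" label (the form discussed later in this thread), and it counts only the bytes that live in the trampoline bank.

```c
/* Encoded sizes, in bytes, of the instructions used by the trampolines. */
enum {
    SZ_TAY = 1, SZ_TYA = 1, SZ_PHA = 1, SZ_PLA = 1, SZ_RTS = 1,
    SZ_TMA = 2, SZ_TAM = 2, SZ_LDA_IMM = 2,
    SZ_JSR = 3, SZ_JMP = 3
};

/* Old trampoline: tay, tma5, pha, lda #bank, tam5, tya, jsr,
   then tay, pla, tam5, tya, rts. */
int old_trampoline_bytes(void)
{
    return SZ_TAY + SZ_TMA + SZ_PHA + SZ_LDA_IMM + SZ_TAM + SZ_TYA + SZ_JSR
         + SZ_TAY + SZ_PLA + SZ_TAM + SZ_TYA + SZ_RTS;
}

/* New trampoline: tma6, pha, lda #bank, tam6, jmp. */
int new_trampoline_bytes(void)
{
    return SZ_TMA + SZ_PHA + SZ_LDA_IMM + SZ_TAM + SZ_JMP;
}

/* Bytes saved in the trampoline bank for nprocs procedures, after paying
   once for the assumed shared exit stub (pla, tam6, rts). */
int trampoline_bank_savings(int nprocs)
{
    int exit_stub = SZ_PLA + SZ_TAM + SZ_RTS;

    return nprocs * (old_trampoline_bytes() - new_trampoline_bytes())
         - exit_stub;
}
```

That works out to 18 bytes per procedure for the old form against 10 for the new one. The other side of the trade is that each procedure body grows by 2 bytes, because its final 1-byte rts becomes a 3-byte jmp.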

IMHO this makes more sense for assembly-language programmers moving forward, and it wouldn't be hard to change HuC and MagicKit to use this method if anyone wanted to while they were working on making the HuC libraries more modular.

Sept 29, 2021 2:47:30 GMT dshadoff said:

For HuC, my original thoughts had been towards a "linker" of sorts, which would exclude the unused library functions, and magically "compress"... but when you see how the paging is in place, that becomes a lot more challenging than you might think. Since 8KB can fill up quickly, one would also need to know in advance how large each of the functions is, in order to "fit" it into a bank... and all this stuff is possible by hand, but so much more difficult programmatically.

From my perspective, I *think* that you've probably already got the capability to achieve 95% of what you want to do with the current state of the toolchain.

Between Uli's addition of -l<libraryname> to the HuC command line, the ability to define .proc functions that get relocated by PCEAS at build time to maximize bank usage in ROM space, and now the ability to have PCEAS strip out unused procedures as well ... it looks like it should be possible to make some pretty modular libraries, and just keep maybe 2 banks or less of permanent fixed low level system-support code.
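
As a rough picture of what "relocating procedures to maximize bank usage" involves once every procedure's size is known, here is a minimal first-fit-decreasing sketch in C. This is a generic bin-packing heuristic chosen for illustration; it is an assumption, not a description of how PCEAS actually places procedures, and the function names are hypothetical.

```c
#include <stdlib.h>

#define BANK_SIZE 8192   /* one 8KB bank */

/* qsort comparator: largest size first. */
static int by_size_desc(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;

    return (x < y) - (x > y);
}

/* Place procedures of the given sizes into 8KB banks, largest first,
   using the first bank that still has room. sizes[] is sorted in place
   and bank_free[] receives the bytes left over in each bank.
   Returns the number of banks used, or -1 if a procedure is larger
   than a bank or more than max_banks would be needed. */
int pack_banks(int *sizes, int nprocs, int *bank_free, int max_banks)
{
    int used = 0;

    qsort(sizes, (size_t)nprocs, sizeof sizes[0], by_size_desc);

    for (int i = 0; i < nprocs; i++) {
        int b;

        if (sizes[i] > BANK_SIZE)
            return -1;
        for (b = 0; b < used; b++)
            if (bank_free[b] >= sizes[i])
                break;                      /* first bank with room */
        if (b == used) {
            if (used == max_banks)
                return -1;
            bank_free[used++] = BANK_SIZE;  /* open a new bank */
        }
        bank_free[b] -= sizes[i];
    }
    return used;
}
```

The sketch ignores procedure groups, which would have to be treated as one contiguous block whose size is the sum of its members.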

What is your opinion? Is there something else needed that's missing from the tools?

Last Edit: Oct 11, 2021 18:50:37 GMT by elmer

dshadoff
ANIKIIII!

Implementing the systems of the '90s... since 1978

Posts: 1,241

On HuC and PCE development futures Oct 11, 2021 20:55:12 GMT

Post by dshadoff on Oct 11, 2021 20:55:12 GMT

The reason for preserving A during the call was that HuC passed the first variable in A and X, in order to avoid the pseudo-stack for the common 1-parameter case.

But if you’re not concerned about HuC, this will certainly help other use cases.

Last Edit: Oct 11, 2021 20:55:23 GMT by dshadoff

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 11, 2021 21:40:36 GMT

Post by elmer on Oct 11, 2021 21:40:36 GMT

Oct 11, 2021 20:55:12 GMT dshadoff said:

The reason for preserving A during the call was that HuC passed the first variable in A and X, in order to avoid the pseudo-stack for the common 1-parameter case.

Sure, I know ... but in those cases, there is absolutely no reason why the HuC compiler can't be modified to put the tay in the calling code, and the tya in the destination procedure, and so move the space cost out of the one bank with the trampolines and into the many banks with the HuC code.

Then there are also those functions that don't take any parameters at all, or those fastcall functions that pass their parameters in the CD variables (__ax,__bx,__cx,__dx,__si,__di,__bp).

Overall, I see changing the trampoline as a win, especially when moving the trampolines into the limited-sized first bank with the system-support code.

Honestly, I thought that you'd be more concerned with changing from jsr/rts to jmp/jmp, which is something that is less of an obvious benefit ... 2 bytes and 6 cycles saved per procedure, but it does mean remembering to make that change to the rts in every assembly-language procedure.

Last Edit: Oct 11, 2021 21:45:33 GMT by elmer

dshadoff
ANIKIIII!

Implementing the systems of the '90s... since 1978

Posts: 1,241

On HuC and PCE development futures Oct 12, 2021 0:06:24 GMT

Post by dshadoff on Oct 12, 2021 0:06:24 GMT

Oh, I hadn't noticed that about the JMP... I didn't look at the code closely.
I assume that you have a common exit in the trampoline bank which goes something like this:

pla
tam6
rts

...for each bank capable of paging like this.

I don't mind this... but... again this makes an assumption. The assumption is that each of the externalized functions in a given bank will not benefit from a direct-call (i.e. without paging that bank again), since the exit of the function would be a jump back to the trampoline bank to page it back out.

I haven't looked in a long time, but I'm reasonably confident that there are library functions where A calls B, and you would want to externalize both without taking a second page-out penalty when A calls B. Something along the lines of "do something to sprite <n>", or "do something to all sprites", or something like that.

I don't have specifics at the moment though.

elmer
ANIKIIII!

Posts: 1,040

On HuC and PCE development futures Oct 12, 2021 16:31:12 GMT

Post by elmer on Oct 12, 2021 16:31:12 GMT

Oct 12, 2021 0:06:24 GMT dshadoff said:

I assume that you have a common exit in the trampoline bank which goes something like this:

Yes, that's right.

Oct 12, 2021 0:06:24 GMT dshadoff said:

I don't mind this... but... again this makes an assumption. The assumption is that each of the externalized functions in a given bank will not benefit from a direct-call (i.e. without paging that bank again), since the exit of the function would be a jump back to the trampoline bank to page it back out.

Absolutely ... and that is why I'm less-convinced of the wisdom of that particular change, even though the memory-saving is nice.

From the engine and library programmer's POV, it's something that is easy to hide behind a macro, say "leave", and then be able to easily change it later on.

Since using the "-newproc" option would require so much work to be done on HuC/MagicKit anyway, I think that there's plenty of time to come to some consensus as to the wisdom/need of the exact implementation.

Galahad
Sapphirical

Posts: 753
Homebrew skills: 6502,6809,68K,Z80,C,Lua
Currently Playing: Turtle World Of Warcraft

On HuC and PCE development futures Oct 31, 2021 2:41:06 GMT

Post by Galahad on Oct 31, 2021 2:41:06 GMT

Change is a good thing, I like what I read in this thread, the future looks bright for pce dev.

Last Edit: Oct 31, 2021 3:04:37 GMT by Galahad