|
Post by DarkKobold on Sept 25, 2019 5:19:13 GMT
while (!pass) { pass=1; c1++; for (zic=1; zic<ztemp3; zic++) { zcc=zic-1; if (zy[zcc]<zy[zic]) { ztemp1=zy[zcc]; ztemp2=zl[zcc]; zy[zcc]=zy[zic]; zl[zcc]=zl[zic]; zy[zic]=ztemp1; zl[zic]=ztemp2; pass=0; } } }/**/ So, not surprisingly, a Z-sorting function (bubble sort, specifically) grows at O(n^2), and slows down my code when it gets too many objects on screen.
My options are - try and make it so the code is fast enough, such that 14^2 is never an issue, or figure out a better way to sort. The code isn't slow at 5-7 sorts (based on c1). once it hits 8-10, I find that it causes a ton of lag. I've tried a ton of things - pre-sorting into 4 bins, grabbing the minimum value one at a time and putting it into a new array... all of which are just too slow.
This is really stopping some cool things, so suggestions would be great!
|
|
|
Post by dshadoff on Sept 25, 2019 11:34:59 GMT
I am assuming that: - You are using HuC - ztemp3 is the size of your array - this looks like a bubble sort with early exit
...What are the datatypes of *every* variable ? -> int are about 3 times slower for any operation than char. -> if your arrays are holding 'int' variables but don't need to, please change.
I'm not sure what the purpose of c1 is...
Some hints: - each array reference uses calculations to arrive at the storage location. so for zy[zic], the address is calculated 3 times. I can't remember how it's implemented on HuC. but you might want to try assigning the value to a pointer so it doesn't need to be recalculated so much. - there are more efficient sorts than bubble sort - you could also move the function to assembler for a huge speedup. HuC was built to make loops over array *possible*, but it's incredibly hard to make them *efficient* without considering the limitations of the Hu6280/6502.
Back in the '70's, there were a lot of really good articles about algorithms written for efficiency. At that time, I recall reading a book which had a Shell-Metzner sort, which was significantly faster than bubble sort for larger sets, but larger code and not as easy to visualize what it was doing. There's lots of good stuff from those days which could help give the mindset of this architecture. Starting in the 90's, the push was for higher complexity rather than efficiency, so you might not find so much discussion of algorithms after that.
Also, just how large an array was your target ?
Given that bubble sort does ((array size) * (array size -1)) compare/swap operations: - 5 elements would be 20 operations - 7 elements would be 42 operations - 14 elements would be 182 operations ...which is nowhere near linear; it's closer to the square of the number of elements.
Dave
|
|
|
Post by DarkKobold on Sept 25, 2019 14:59:27 GMT
dshadoff - ASM is always the solution,unfortunately one I can't take. C1 was just my counter to tell me how many times the outer sort was required.
Interestingly, I sorta solved it after this topic - First, I switched to insertion sort. That actually did a really good job of reducing operations. Second, I realized that the z-sorting did not change that much frame to frame. So, I record which list-position each enemy/character/projectile was in, and then updated its Y value in the list. Sort the list, rerecord the new position, and repeat.
|
|
|
Post by turboxray on Oct 20, 2019 22:34:35 GMT
Insertion sort might be O(n^2) like bubble sort, but its actual run time is (n^2+n)/2 worst case compared to n^2 worse case of bubble sort. Definitely faster. dshadoff is right though. The some of those operations are expensive in huc. Setting zy[zcc] to a temp variable outside the for/loop would save some considerable amount of time (i.e. you're occurring that performance hit even on non-swaps). I.e. move ztemp2=zl[zcc]; and ztemp1=zl[zcc]; outside the for/loop.
|
|
|
Post by elmer on Oct 21, 2019 20:24:29 GMT
Back in the '70's, there were a lot of really good articles about algorithms written for efficiency. At that time, I recall reading a book which had a Shell-Metzner sort, which was significantly faster than bubble sort for larger sets, but larger code and not as easy to visualize what it was doing. There's lots of good stuff from those days which could help give the mindset of this architecture. Starting in the 90's, the push was for higher complexity rather than efficiency, so you might not find so much discussion of algorithms after that. Yep, starting in the 1990s the designers of new sorting algorithms were beginning to consider larger and larger datasets, multiple processors, and cache-coherency issues, which all lead to more and more complexity in the code itself, but faster end-results. For those of us still dealing with old single-processor architectures, the old algorithms are perfectly fine. Personally, I usually found that when a simple insertion-sort wasn't appropriate for the data, then a comb-sort was the next-best alternative. I've never actually needed to use a classic shell-sort in practice, or the complexity of a quick-sort. Then again, the games that I've worked on don't do a lot of sorting in realtime. Unfortunately Jim Veale's "An Improved Comb Sort with Pre-Defined Gap Table" page has disappeared from the interwebs (and it's not on the Wayback Machine), but you can still find his pre-defined gap table in a bunch of Open Source projects. For those that haven't seen it, here is the table, and the pseudo-code for the sort ... // ************************************************************************** // * N.B. Jim Veale's "An Improved Comb Sort with Pre-Defined Gap Table" * // * http://world.std.com/~jdveale/combsort.htm * // **************************************************************************
// // Jim Veale's pre-defined gap table for sorts of up to 100 million items. //
const int g_aCombSortGapTable [] = { 1, // 1.247330950103979 1, // 1.247330950103979 2, // 1.247330950103979 3, // 1.247330950103979 4, // 1.247330950103979 5, // 1.247330950103979 6, // 1.247330950103979 7, // 1.247330950103979 8, // 1.247330950103979 9, // * 9.40 1.5423 11, // 11.72 1.6818 13, // 14.62 1.7877 17, // 18.23 1.9284 22, // * 22.74 2.0284 26, // * 28.37 2.0865 34, // * 35.38 2.1680 43, // 44.13 2.2243 53, // 55.05 2.2696 67, // 68.67 2.3153 83, // 85.65 2.3529 103, // 106.83 2.3885 131, // 133.25 2.4235 163, // 166.21 2.4519 206, // * 207.32 2.4794 257, // 258.60 2.5031 317, // 322.56 2.5249 401, // 402.34 2.5476 499, // 501.85 2.5668 622, // * 625.97 2.5853 778, // * 780.79 2.6029 971, // 973.91 2.6193 1213, // 1214.78 2.6347 1511, // 1515.24 2.6492 1889, // 1890.00 2.6631 2357, // 2357.46 2.6761 2939, // 2940.53 2.6884 3662, // * 3667.82 2.7002 4567, // 4574.98 2.7115 5701, // 5706.52 2.7222 7114, // * 7117.92 2.7324 8871, // * 8878.40 2.7421 11071, // 11074.30 2.7514 13807, // 13813.32 2.7602 17229, // * 17229.78 2.7687 21491, // 21491.23 2.7767 26801, // 26806.68 2.7845 33427, // 33436.80 2.7919 41698, // * 41706.76 2.7990 52021, // 52022.13 2.8059 64879, // 64888.81 2.8124 80933, // 80937.83 2.8188 100948, // * 100956.25 2.8248 125921, // 125925.86 2.8307 157061, // 157071.22 2.8363 195919, // 195919.80 2.8417 244367, // 244376.83 2.8470 304813, // 304818.78 2.8520 380207, // 380209.90 2.8569 474241, // 474247.58 2.8617 591538, // * 591543.68 2.8662 737843, // 737850.74 2.8706 920333, // 920344.07 2.8749 1147969, // 1147973.64 2.8790 1431894, // * 1431903.05 2.8831 1786046, // * 1786056.99 2.8869 2227801, // 2227804.16 2.8907 2778799, // 2778809.08 2.8944 3466082, // * 3466094.57 2.8979 4323359, // 4323367.04 2.9014 5392669, // 5392669.51 2.9047 6726431, // 6726443.59 2.9080 8390093, // 8390101.27 2.9111 10465223, // 10465232.99 2.9142 13053598, // * 13053609.01 2.9172 16282159, // 16282170.52 2.9201 20309243, // 20309255.23 2.9230 25332355, // * 25332362.62 2.9257 31597823, // 31597839.94 2.9284 39412949, // 39412963.71 2.9311 49161004, // * 49161009.47 2.9336 61320041, // 61320048.65 2.9361 76486387, // 76486394.54 2.9386 95403829, // 95403847.17 2.9409
2147483647 };
// // We are sorting 'count' items in the 'data' array. // Note: all arrays have zero-based indexing. // // failSafe:; // gapidx = 0; // switchCnt = 0; // while (gapTab[gapidx] lt count) gapidx++; // do loop // gapidx--; // gapidx = max(gapidx, 0); // gap = gapTab[gapidx]; // switchCnt = 0; // // for (l=0; l lt count-gap; l++) // loop // if (data[l] gt data[l+gap]) // thendo // switchCnt++; // datT = data[l ]; // data[l ] = data[l+gap]; // data[l+gap] = datT; // end // end // end while (gapidx ne 0); // if (switchCnt) // then goto failSafe; //
|
|
|
Post by dshadoff on Oct 22, 2019 0:46:45 GMT
Along the same lines (although not necessarily related to sorting), 1970s-era books contained a lot of good information about algorithms and assembly-language programming... people might want to look at: - random issues of Byte magazine or Dr. Dobb's Journal, or anything PET or Apple II -related. - Donald Knuth's "The Art of Computer Programming" series
|
|
|
Post by Arkhan on Oct 22, 2019 5:07:09 GMT
This might be a good exercise in learning some assembly, as it's a brief loop and some array access. It's not very complicated. I'm half alert at the moment, but I think you can use an XOR swap to not need to mess with temps. It might turn out to be faster. en.wikipedia.org/wiki/XOR_swap_algorithm
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Oct 22, 2019 11:55:44 GMT
My bubble sort routine(if it can help):
#asm
dec <nb_entrees
.bubble_sort_dec:
clv ; // TURN EXCHANGE FLAG OFF (V = 0)
ldy <nb_entrees ; // FETCH ELEMENT COUNT
.nxtel:
lda _tab_entrees , Y ; // FETCH ELEMENT
cmp _tab_entrees - 1 , Y ; // IS IT LARGER THAN THE NEXT ELEMENT?
ble .chkend ; // YES. EXCHANGE ELEMENTS IN MEMORY
tax ; // BY SAVING LOW BYTE ON X REGISTER.
lda _tab_entrees - 1 , Y ; // THEN GET HIGH BYTE AND
sta _tab_entrees , Y
txa ; // PULL LOW BYTE FROM X
sta _tab_entrees - 1 , Y
; // TURN EXCHANGE FLAG ON (V = 1)
lda #%01000000
bit #%01000000
.chkend:
dey ; // END OF LIST?
bne .nxtel ; // NO. FETCH NEXT ELEMENT
; // YES. EXCHANGE FLAG STILL OFF?
; // NO. GO THROUGH LIST AGAIN
bvs .bubble_sort_dec
; // YES. LIST IS NOW ORDERED
rts
#endasm
|
|
|
Post by DarkKobold on Oct 22, 2019 21:29:11 GMT
What's interesting is, this is only a 14 element maximum list. So, 196 maximum, at the absolute worst. It was lagging around the 10 iteration (14*10 I guess?) bubble sort loop. Here's my insertion sort code:
for (zic = 1; zic < 14; zic++) { ztemp2 = zy[zic]; ztemp1 = zl[zic]; ztemp3 = zskip[zic]; zcc = zic; while ((zcc > 0) && (zy[zcc-1] < ztemp2)) { zcc--; zy[zcc + 1] = zy[zcc]; zl[zcc + 1] = zl[zcc]; zskip[zcc + 1] = zskip[zcc]; } zy[zcc] = ztemp2; zl[zcc] = ztemp1; zskip[zcc]=ztemp3; } Note the zskip variable I added is mostly for flicker effects - rather than resorting every time a sprite goes off screen, it just holds that sprite in it's previous list position, and skips drawing it. I find this has really good performance, now that I'm not resorting every frame.
That said, converting this to ASM would still help performance. I'd need help though. Teach a man to fish sorta thing.
|
|
touko
Punkic Cyborg
Posts: 106
|
Post by touko on Oct 23, 2019 8:51:06 GMT
I translated your code directly in asm(i assume that all variables are global .), so no optimisations .
#asm
ldx #1
.for_loop:
lda _zy , X sta _ztemp2
lda _zl , X sta _ztemp1
lda _zskip , X sta _ztemp3 txa tay
.while_loop:
lda _zy , Y sta _zy + 1 , Y
lda _zl , Y sta _zl + 1 , Y
lda _zskip , Y sta _zskip + 1 , Y
dey beq .end_while_loop
lda _zy - 1 , Y cmp _ztemp2 bcs .while_loop
.end_while_loop:
lda _ztemp2 sta _zy , Y
lda _ztemp1 sta _zl , Y
lda _ztemp3 sta _zskip , Y
inx cpx #14 bcc .for_loop
#endasm
|
|
|
Post by DarkKobold on Oct 23, 2019 17:35:01 GMT
I translated your code directly in asm(i assume that all variables are global .), so no optimisations . #asm
ldx #1
.for_loop:
lda _zy , X sta _ztemp2
lda _zl , X sta _ztemp1
lda _zskip , X sta _ztemp3 txa tay
.while_loop:
lda _zy , Y sta _zy + 1 , Y
lda _zl , Y sta _zl + 1 , Y
lda _zskip , Y sta _zskip + 1 , Y
dey beq .end_while_loop
lda _zy - 1 , Y cmp _ztemp2 bcs .while_loop
.end_while_loop:
lda _ztemp2 sta _zy , Y
lda _ztemp1 sta _zl , Y
lda _ztemp3 sta _zskip , Y
inx cpx #14 bcc .for_loop
#endasm
This is awesome - correct me if I'm wrong though, doesn't it need a branch condition before the while loop starts? Because that won't always be true? Or am I thinking about it wrong?
|
|
|
Post by Arkhan on Oct 23, 2019 18:01:28 GMT
I think you *might* be right, and also technically the for loop should check beforehand to behave like a for loop, but 6502 assembly doesn't really have the sophisticated loop/branching of C, and doesn't have the stuff Z80 has, so you have to do it all yourself like this.
If you know certain things, like the forloop always executing at least once because of the constraints, you can get away with not checking at the start, and other stuff like that.
|
|
|
Post by DarkKobold on Oct 24, 2019 0:55:43 GMT
#asm
ldx #1
.for_loop:
lda _zy , X sta _ztemp2
lda _zl , X sta _ztemp1
lda _zskip , X sta _ztemp3 txa tay
.while_loop:
lda _zy - 1 , Y cmp _ztemp2 bcs .end_while_loop lda _zy -1 , Y sta _zy , Y
lda _zl -1 , Y sta _zl , Y
lda _zskip -1 , Y sta _zskip , Y
dey bne .while_loop
.end_while_loop:
lda _ztemp2 sta _zy , Y
lda _ztemp1 sta _zl , Y
lda _ztemp3 sta _zskip , Y
inx cpx #14 bcc .for_loop
#endasm
I actually managed to fix the asm! Here's the updated Zsort function. Note that the decy is in the wrong place, so I had to also change the referencing in the code.
|
|
|
Post by Arkhan on Oct 24, 2019 2:29:14 GMT
Motherfucker doin him an asm.
|
|
|
Post by elmer on Oct 24, 2019 2:33:39 GMT
I actually managed to fix the asm! Here's the updated Zsort function. Note that the decy is in the wrong place, so I had to also change the referencing in the code. Congratulations! We'll bring you over to the Dark Side of assembly-language someday, young padwan!
|
|