Sorting Failures

DarkKobold
Lord of Thunder

Posts: 297
Homebrew skills: Coding

Sorting Failures Sept 25, 2019 5:19:13 GMT

Quote

Post by DarkKobold on Sept 25, 2019 5:19:13 GMT

while (!pass)
	{
		pass=1; 
		c1++; 
		for (zic=1; zic<ztemp3; zic++)
		{
			zcc=zic-1; 
			if (zy[zcc]<zy[zic])
			{
				ztemp1=zy[zcc];
				ztemp2=zl[zcc];
				zy[zcc]=zy[zic];
				zl[zcc]=zl[zic];
				zy[zic]=ztemp1;
				zl[zic]=ztemp2;
				pass=0; 
			}
		}		
	}/**/

So, not surprisingly, a Z-sorting function (bubble sort, specifically) grows at O(n^2), and slows down my code when it gets too many objects on screen.

My options are - try and make it so the code is fast enough, such that 14^2 is never an issue, or figure out a better way to sort. The code isn't slow at 5-7 sorts (based on c1). once it hits 8-10, I find that it causes a ton of lag. I've tried a ton of things - pre-sorting into 4 bins, grabbing the minimum value one at a time and putting it into a new array... all of which are just too slow.

This is really stopping some cool things, so suggestions would be great!

dshadoff
ANIKIIII!

Implementing the systems of the '90s... since 1978

Posts: 1,238

Sorting Failures Sept 25, 2019 11:34:59 GMT

Quote

Post by dshadoff on Sept 25, 2019 11:34:59 GMT

I am assuming that:
- You are using HuC
- ztemp3 is the size of your array
- this looks like a bubble sort with early exit

...What are the datatypes of *every* variable ?
-> int are about 3 times slower for any operation than char.
-> if your arrays are holding 'int' variables but don't need to, please change.

I'm not sure what the purpose of c1 is...

Some hints:
- each array reference uses calculations to arrive at the storage location. so for zy[zic], the address is calculated 3 times. I can't remember how it's implemented on HuC. but you might want to try assigning the value to a pointer so it doesn't need to be recalculated so much.
- there are more efficient sorts than bubble sort
- you could also move the function to assembler for a huge speedup. HuC was built to make loops over array *possible*, but it's incredibly hard to make them *efficient* without considering the limitations of the Hu6280/6502.

Back in the '70's, there were a lot of really good articles about algorithms written for efficiency. At that time, I recall reading a book which had a Shell-Metzner sort, which was significantly faster than bubble sort for larger sets, but larger code and not as easy to visualize what it was doing. There's lots of good stuff from those days which could help give the mindset of this architecture. Starting in the 90's, the push was for higher complexity rather than efficiency, so you might not find so much discussion of algorithms after that.

Also, just how large an array was your target ?

Given that bubble sort does ((array size) * (array size -1)) compare/swap operations:
- 5 elements would be 20 operations
- 7 elements would be 42 operations
- 14 elements would be 182 operations
...which is nowhere near linear; it's closer to the square of the number of elements.

Dave

Last Edit: Sept 25, 2019 12:27:44 GMT by dshadoff

DarkKobold
Lord of Thunder

Posts: 297
Homebrew skills: Coding

Sorting Failures Sept 25, 2019 14:59:27 GMT

Quote

Post by DarkKobold on Sept 25, 2019 14:59:27 GMT

dshadoff - ASM is always the solution,unfortunately one I can't take. C1 was just my counter to tell me how many times the outer sort was required.

Interestingly, I sorta solved it after this topic - First, I switched to insertion sort. That actually did a really good job of reducing operations. Second, I realized that the z-sorting did not change that much frame to frame. So, I record which list-position each enemy/character/projectile was in, and then updated its Y value in the list. Sort the list, rerecord the new position, and repeat.

turboxray
Lord of Thunder

Posts: 487

Sorting Failures Oct 20, 2019 22:34:35 GMT gredler and elmer like this

Quote

Post by turboxray on Oct 20, 2019 22:34:35 GMT

Insertion sort might be O(n^2) like bubble sort, but its actual run time is (n^2+n)/2 worst case compared to n^2 worse case of bubble sort. Definitely faster. dshadoff is right though. The some of those operations are expensive in huc. Setting zy[zcc] to a temp variable outside the for/loop would save some considerable amount of time (i.e. you're occurring that performance hit even on non-swaps). I.e. move ztemp2=zl[zcc]; and ztemp1=zl[zcc]; outside the for/loop.

elmer
ANIKIIII!

Posts: 1,040

Sorting Failures Oct 21, 2019 20:24:29 GMT

Quote

Post by elmer on Oct 21, 2019 20:24:29 GMT

Sept 25, 2019 11:34:59 GMT dshadoff said:

Back in the '70's, there were a lot of really good articles about algorithms written for efficiency. At that time, I recall reading a book which had a Shell-Metzner sort, which was significantly faster than bubble sort for larger sets, but larger code and not as easy to visualize what it was doing. There's lots of good stuff from those days which could help give the mindset of this architecture. Starting in the 90's, the push was for higher complexity rather than efficiency, so you might not find so much discussion of algorithms after that.

Yep, starting in the 1990s the designers of new sorting algorithms were beginning to consider larger and larger datasets, multiple processors, and cache-coherency issues, which all lead to more and more complexity in the code itself, but faster end-results.

For those of us still dealing with old single-processor architectures, the old algorithms are perfectly fine.

Personally, I usually found that when a simple insertion-sort wasn't appropriate for the data, then a comb-sort was the next-best alternative. I've never actually needed to use a classic shell-sort in practice, or the complexity of a quick-sort.

Then again, the games that I've worked on don't do a lot of sorting in realtime.

Unfortunately Jim Veale's "An Improved Comb Sort with Pre-Defined Gap Table" page has disappeared from the interwebs (and it's not on the Wayback Machine), but you can still find his pre-defined gap table in a bunch of Open Source projects.

For those that haven't seen it, here is the table, and the pseudo-code for the sort ...

// **************************************************************************
// * N.B.    Jim Veale's "An Improved Comb Sort with Pre-Defined Gap Table" *
// *         http://world.std.com/~jdveale/combsort.htm                     *
// **************************************************************************

//
// Jim Veale's pre-defined gap table for sorts of up to 100 million items.
//

const int g_aCombSortGapTable [] =
    {
            1,  // 1.247330950103979
            1,  // 1.247330950103979
            2,  // 1.247330950103979
            3,  // 1.247330950103979
            4,  // 1.247330950103979
            5,  // 1.247330950103979
            6,  // 1.247330950103979
            7,  // 1.247330950103979
            8,  // 1.247330950103979
            9,  // *        9.40  1.5423
           11,  //         11.72  1.6818
           13,  //         14.62  1.7877
           17,  //         18.23  1.9284
           22,  // *       22.74  2.0284
           26,  // *       28.37  2.0865
           34,  // *       35.38  2.1680
           43,  //         44.13  2.2243
           53,  //         55.05  2.2696
           67,  //         68.67  2.3153
           83,  //         85.65  2.3529
          103,  //        106.83  2.3885
          131,  //        133.25  2.4235
          163,  //        166.21  2.4519
          206,  // *      207.32  2.4794
          257,  //        258.60  2.5031
          317,  //        322.56  2.5249
          401,  //        402.34  2.5476
          499,  //        501.85  2.5668
          622,  // *      625.97  2.5853
          778,  // *      780.79  2.6029
          971,  //        973.91  2.6193
         1213,  //       1214.78  2.6347
         1511,  //       1515.24  2.6492
         1889,  //       1890.00  2.6631
         2357,  //       2357.46  2.6761
         2939,  //       2940.53  2.6884
         3662,  // *     3667.82  2.7002
         4567,  //       4574.98  2.7115
         5701,  //       5706.52  2.7222
         7114,  // *     7117.92  2.7324
         8871,  // *     8878.40  2.7421
        11071,  //      11074.30  2.7514
        13807,  //      13813.32  2.7602
        17229,  // *    17229.78  2.7687
        21491,  //      21491.23  2.7767
        26801,  //      26806.68  2.7845
        33427,  //      33436.80  2.7919
        41698,  // *    41706.76  2.7990
        52021,  //      52022.13  2.8059
        64879,  //      64888.81  2.8124
        80933,  //      80937.83  2.8188
       100948,  // *   100956.25  2.8248
       125921,  //     125925.86  2.8307
       157061,  //     157071.22  2.8363
       195919,  //     195919.80  2.8417
       244367,  //     244376.83  2.8470
       304813,  //     304818.78  2.8520
       380207,  //     380209.90  2.8569
       474241,  //     474247.58  2.8617
       591538,  // *   591543.68  2.8662
       737843,  //     737850.74  2.8706
       920333,  //     920344.07  2.8749
      1147969,  //    1147973.64  2.8790
      1431894,  // *  1431903.05  2.8831
      1786046,  // *  1786056.99  2.8869
      2227801,  //    2227804.16  2.8907
      2778799,  //    2778809.08  2.8944
      3466082,  // *  3466094.57  2.8979
      4323359,  //    4323367.04  2.9014
      5392669,  //    5392669.51  2.9047
      6726431,  //    6726443.59  2.9080
      8390093,  //    8390101.27  2.9111
     10465223,  //   10465232.99  2.9142
     13053598,  // * 13053609.01  2.9172
     16282159,  //   16282170.52  2.9201
     20309243,  //   20309255.23  2.9230
     25332355,  // * 25332362.62  2.9257
     31597823,  //   31597839.94  2.9284
     39412949,  //   39412963.71  2.9311
     49161004,  // * 49161009.47  2.9336
     61320041,  //   61320048.65  2.9361
     76486387,  //   76486394.54  2.9386
     95403829,  //   95403847.17  2.9409

    2147483647
    };

//
// We are sorting 'count' items in the 'data' array.
// Note: all arrays have zero-based indexing.
//
// failSafe:;
//     gapidx    = 0;
//     switchCnt = 0;
//     while (gapTab[gapidx] lt count) gapidx++;
//     do loop
//         gapidx--;
//         gapidx    = max(gapidx, 0);
//         gap       = gapTab[gapidx];
//         switchCnt = 0;
//
//         for (l=0; l lt count-gap; l++)
//         loop
//             if  (data[l]  gt  data[l+gap])
//             thendo
//                 switchCnt++;
//                 datT        = data[l    ];
//                 data[l    ] = data[l+gap];
//                 data[l+gap] = datT;
//                 end
//             end
//         end while (gapidx ne 0);
//     if (switchCnt)
//     then goto failSafe;
//

dshadoff
ANIKIIII!

Implementing the systems of the '90s... since 1978

Posts: 1,238

Sorting Failures Oct 22, 2019 0:46:45 GMT

Quote

Post by dshadoff on Oct 22, 2019 0:46:45 GMT

Along the same lines (although not necessarily related to sorting), 1970s-era books contained a lot of good information about algorithms and assembly-language programming... people might want to look at:
- random issues of Byte magazine or Dr. Dobb's Journal, or anything PET or Apple II -related.
- Donald Knuth's "The Art of Computer Programming" series

Arkhan
Lord of Thunder

Shut your binch ass up

Posts: 293
Homebrew skills: MML, Shitty Atari Ports, Incessant Negativity
Fave PCE Shooter: Go Go Birdie Chance
Fave PCE Platformer: Necromancer
Fave PCE Game Overall: Super Metroid
Fave PCE RPG: Steam Hearts
Currently Playing: with myself

Sorting Failures Oct 22, 2019 5:07:09 GMT

Quote

Post by Arkhan on Oct 22, 2019 5:07:09 GMT

This might be a good exercise in learning some assembly, as it's a brief loop and some array access. It's not very complicated.

I'm half alert at the moment, but I think you can use an XOR swap to not need to mess with temps. It might turn out to be faster.

en.wikipedia.org/wiki/XOR_swap_algorithm

[Fri 19:34]<nectarsis> been wanting to try that one for awhile now Ope
[Fri 19:33]<Opethian> l;ol huge dong

Support Homebrew (or dont):

I'm a max level Forum Warrior. I'm immortal.
If you're not ready to defend your claims, don't post em

touko
Punkic Cyborg

Posts: 106

Sorting Failures Oct 22, 2019 11:55:44 GMT

Quote

Post by touko on Oct 22, 2019 11:55:44 GMT

My bubble sort routine(if it can help):

#asm

	dec		<nb_entrees		

  .bubble_sort_dec:	

	clv					; // TURN EXCHANGE FLAG OFF (V = 0)							

	ldy 	<nb_entrees			; // FETCH ELEMENT COUNT	

		
  .nxtel:    

	lda 	_tab_entrees , Y   		; // FETCH ELEMENT		

	cmp 	_tab_entrees - 1 , Y  		; // IS IT LARGER THAN THE NEXT ELEMENT?

 	ble	.chkend				; // YES. EXCHANGE ELEMENTS IN MEMORY

	tax     				; // BY SAVING LOW BYTE ON X REGISTER.

	lda 	_tab_entrees - 1 , Y  		; // THEN GET HIGH BYTE AND		

	sta 	_tab_entrees , Y   

	txa 					; // PULL LOW BYTE FROM X	

	sta 	_tab_entrees - 1 , Y

	; // TURN EXCHANGE FLAG ON (V = 1)	

	lda		#%01000000

	bit		#%01000000				

  .chkend:		

	dey 					; // END OF LIST?

	bne 	.nxtel 				; // NO. FETCH NEXT ELEMENT
		

	; // YES. EXCHANGE FLAG STILL OFF?

	; // NO. GO THROUGH LIST AGAIN		

	bvs		.bubble_sort_dec
		

	; // YES. LIST IS NOW ORDERED			 

	rts           									

#endasm

Last Edit: Oct 23, 2019 16:27:57 GMT by touko

DarkKobold
Lord of Thunder

Posts: 297
Homebrew skills: Coding

Sorting Failures Oct 22, 2019 21:29:11 GMT

Quote

Post by DarkKobold on Oct 22, 2019 21:29:11 GMT

What's interesting is, this is only a 14 element maximum list. So, 196 maximum, at the absolute worst. It was lagging around the 10 iteration (14*10 I guess?) bubble sort loop.

Here's my insertion sort code:

 for (zic = 1; zic < 14; zic++) 
    {  
        ztemp2 = zy[zic];  
        ztemp1 = zl[zic]; 
        ztemp3 = zskip[zic]; 
        zcc = zic;            
        while ((zcc > 0) && (zy[zcc-1] < ztemp2))
        {  
	        zcc--;  
            zy[zcc + 1] = zy[zcc];  
            zl[zcc + 1] = zl[zcc];  
            zskip[zcc + 1] = zskip[zcc];            
        }  
        zy[zcc] = ztemp2;  
        zl[zcc] = ztemp1;  
        zskip[zcc]=ztemp3; 
    }

Note the zskip variable I added is mostly for flicker effects - rather than resorting every time a sprite goes off screen, it just holds that sprite in it's previous list position, and skips drawing it. I find this has really good performance, now that I'm not resorting every frame.

That said, converting this to ASM would still help performance. I'd need help though. Teach a man to fish sorta thing.

touko
Punkic Cyborg

Posts: 106

Sorting Failures Oct 23, 2019 8:51:06 GMT gredler and elmer like this

Quote

Post by touko on Oct 23, 2019 8:51:06 GMT

I translated your code directly in asm(i assume that all variables are global .), so no optimisations .

#asm

		ldx		#1	

	.for_loop:

		lda		_zy , X
		sta		_ztemp2


		lda		_zl , X
		sta		_ztemp1


		lda		_zskip , X 
		sta		_ztemp3
		
		txa
		tay

	.while_loop:

		lda		_zy , Y
		sta		_zy + 1 , Y

		lda		_zl , Y
		sta		_zl + 1 , Y

		lda		_zskip , Y
		sta		_zskip + 1 , Y

		dey
		beq		.end_while_loop

		lda		_zy - 1 , Y
		cmp		_ztemp2
		bcs		.while_loop

	.end_while_loop:

		lda		_ztemp2
		sta		_zy , Y

		lda		_ztemp1
        	sta		_zl , Y

		lda		_ztemp3
		sta		_zskip , Y

		inx
		cpx		#14
		bcc		.for_loop		

#endasm

Last Edit: Oct 23, 2019 9:02:18 GMT by touko

DarkKobold
Lord of Thunder

Posts: 297
Homebrew skills: Coding

Sorting Failures Oct 23, 2019 17:35:01 GMT

Quote

Post by DarkKobold on Oct 23, 2019 17:35:01 GMT

Oct 23, 2019 8:51:06 GMT touko said:

I translated your code directly in asm(i assume that all variables are global .), so no optimisations .

#asm

		ldx		#1	

	.for_loop:

		lda		_zy , X
		sta		_ztemp2


		lda		_zl , X
		sta		_ztemp1


		lda		_zskip , X 
		sta		_ztemp3
		
		txa
		tay

	.while_loop:

		lda		_zy , Y
		sta		_zy + 1 , Y

		lda		_zl , Y
		sta		_zl + 1 , Y

		lda		_zskip , Y
		sta		_zskip + 1 , Y

		dey
		beq		.end_while_loop

		lda		_zy - 1 , Y
		cmp		_ztemp2
		bcs		.while_loop

	.end_while_loop:

		lda		_ztemp2
		sta		_zy , Y

		lda		_ztemp1
        	sta		_zl , Y

		lda		_ztemp3
		sta		_zskip , Y

		inx
		cpx		#14
		bcc		.for_loop		

#endasm

This is awesome - correct me if I'm wrong though, doesn't it need a branch condition before the while loop starts? Because that won't always be true? Or am I thinking about it wrong?

Arkhan
Lord of Thunder

Shut your binch ass up

Posts: 293
Homebrew skills: MML, Shitty Atari Ports, Incessant Negativity
Fave PCE Shooter: Go Go Birdie Chance
Fave PCE Platformer: Necromancer
Fave PCE Game Overall: Super Metroid
Fave PCE RPG: Steam Hearts
Currently Playing: with myself

Sorting Failures Oct 23, 2019 18:01:28 GMT

Quote

Post by Arkhan on Oct 23, 2019 18:01:28 GMT

I think you *might* be right, and also technically the for loop should check beforehand to behave like a for loop, but 6502 assembly doesn't really have the sophisticated loop/branching of C, and doesn't have the stuff Z80 has, so you have to do it all yourself like this.

If you know certain things, like the forloop always executing at least once because of the constraints, you can get away with not checking at the start, and other stuff like that.

[Fri 19:34]<nectarsis> been wanting to try that one for awhile now Ope
[Fri 19:33]<Opethian> l;ol huge dong

Support Homebrew (or dont):

I'm a max level Forum Warrior. I'm immortal.
If you're not ready to defend your claims, don't post em

DarkKobold
Lord of Thunder

Posts: 297
Homebrew skills: Coding

Sorting Failures Oct 24, 2019 0:55:43 GMT gredler likes this

Quote

Post by DarkKobold on Oct 24, 2019 0:55:43 GMT

	#asm

		ldx		#1	

	.for_loop:

		lda		_zy , X
		sta		_ztemp2


		lda		_zl , X
		sta		_ztemp1


		lda		_zskip , X 
		sta		_ztemp3
		
		txa
		tay

	.while_loop:

		lda		_zy - 1 , Y
		cmp		_ztemp2
		bcs		.end_while_loop
		
		
		
		lda		_zy -1 , Y
		sta		_zy , Y

		lda		_zl -1 , Y
		sta		_zl , Y

		lda		_zskip -1 , Y
		sta		_zskip , Y

		dey
		bne		.while_loop

		

	.end_while_loop:

		lda		_ztemp2
		sta		_zy , Y

		lda		_ztemp1
        sta		_zl , Y

		lda		_ztemp3
		sta		_zskip , Y

		inx
		cpx		#14
		bcc		.for_loop

#endasm

I actually managed to fix the asm! Here's the updated Zsort function. Note that the decy is in the wrong place, so I had to also change the referencing in the code.

Arkhan Lord of Thunder Shut your binch ass up Posts: 293 Homebrew skills: MML, Shitty Atari Ports, Incessant Negativity Fave PCE Shooter: Go Go Birdie Chance Fave PCE Platformer: Necromancer Fave PCE Game Overall: Super Metroid Fave PCE RPG: Steam Hearts Currently Playing: with myself	Sorting Failures Oct 24, 2019 2:29:14 GMT gredler likes this Quote Select Post Deselect Post Link to Post Member Give Gift Back to Top Post by Arkhan on Oct 24, 2019 2:29:14 GMT Motherfucker doin him an asm.
	[Fri 19:34]<nectarsis> been wanting to try that one for awhile now Ope [Fri 19:33]<Opethian> l;ol huge dong Support Homebrew (or dont): I'm a max level Forum Warrior. I'm immortal. If you're not ready to defend your claims, don't post em

elmer
ANIKIIII!

Posts: 1,040

Sorting Failures Oct 24, 2019 2:33:39 GMT gredler likes this

Quote

Post by elmer on Oct 24, 2019 2:33:39 GMT

Oct 24, 2019 0:55:43 GMT DarkKobold said:

I actually managed to fix the asm! Here's the updated Zsort function. Note that the decy is in the wrong place, so I had to also change the referencing in the code.

Congratulations!