Post by DarkKobold on Jan 5, 2021 2:44:50 GMT
I saw a cool video, and thought this would be easily possible in HuC:
The thing is, I suck at asm. I have no idea how to get variables where I want them. Here was my first horrible approximation.
scaletable:
    .db 0
    .db 0b00000011
    .db 0b00001100
    .db 0b00001111
    .db 0b00110000
    .db 0b00110011
    .db 0b00111100
    .db 0b00111111
    .db 0b11000000
    .db 0b11000011
    .db 0b11001100
    .db 0b11001111
    .db 0b11110000
    .db 0b11110011
    .db 0b11111100
    .db 0b11111111
load_vram16t32:
    ; ----
    ; map data
    ;
    jsr map_data

    ; ----
    ; set vram address
    ;
    jsr set_write

    ; ----
    ; copy data
    ;
    cly
    ldx <_cl
    beq .l3
    ; --
.lvv1: lda [_si],Y
    sty
    ror 4
    tay
    lda scaletable,Y
    sta video_data_l
    iny
    iny
    sta video_data_l
    decy
    decy
    lda [_si],Y
    and 0x0F
    tay
    lda scaletable,Y
    sta video_data_h
    addy 2
    bne .lvv2
    inc <_si+1
    ; --
.lvv2: dex
    bne .lvv1
    ; --
    jsr remap_data
    ; --
.lvv3: dec <_ch
    bpl .lvv1

    ; ----
    ; unmap data
    ;
    jmp unmap_data
Post by DarkKobold on Jan 5, 2021 3:50:15 GMT
Here's how I imagine the pseudocode looks:
load_vram16t32(vram_address, sprite);
shift=0;
for j=0 to 3 //each bitplane
    for i=0 to 7 //for the top 8 pixels
        temp = sprite[i*2+shift] //get the sprite low byte
        temp = temp>>4 //shift it right 4 for lookup table (only 16 entry lookup)
        temp = LUT[temp] //get the lookup value
        vram_address[i*2]=temp //put the new byte in vram
        vram_address[i*2+2]=temp //put it in the next stripe as well
        temp = sprite[i*2+shift] //get the sprite low byte
        temp = temp&0x0F //take only the low 4 bits
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+1]=temp //put the new byte in vram in the right side
        vram_address[i*2+3]=temp //put it in the next stripe as well
        //Do the right side of the new 32x32 by using the high byte
        temp = sprite[i*2+shift+1] //get the sprite high byte
        temp = temp>>4 //shift it right 4 for lookup table (only 16 entry lookup)
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+0x40]=temp //put the new byte in vram
        vram_address[i*2+2+0x40]=temp //put it in the next stripe as well
        temp = sprite[i*2+shift+1] //get the sprite high byte
        temp = temp&0x0F //take only the low 4 bits
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+1+0x40]=temp //put the new byte in vram in the right side
        vram_address[i*2+3+0x40]=temp //put it in the next stripe as well
    end
    shift+=16 //increment by 16 to get to the next bitplane
end

//repeat for the bottom 8 pixels
for j=0 to 3 //each bitplane
    for i=0 to 7 //for the bottom 8 pixels
        temp = sprite[i*2+shift] //get the sprite low byte
        temp = temp>>4 //shift it right 4 for lookup table (only 16 entry lookup)
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+0x80]=temp //put the new byte in vram
        vram_address[i*2+2+0x80]=temp //put it in the next stripe as well
        temp = sprite[i*2+shift] //get the sprite low byte
        temp = temp&0x0F //take only the low 4 bits
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+1+0x80]=temp //put the new byte in vram in the right side
        vram_address[i*2+3+0x80]=temp //put it in the next stripe as well
        //Do the right side of the new 32x32 by using the high byte
        temp = sprite[i*2+shift+1] //get the sprite high byte
        temp = temp>>4 //shift it right 4 for lookup table (only 16 entry lookup)
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+0xC0]=temp //put the new byte in vram
        vram_address[i*2+2+0xC0]=temp //put it in the next stripe as well
        temp = sprite[i*2+shift+1] //get the sprite high byte
        temp = temp&0x0F //take only the low 4 bits
        temp = LUT[temp] //get the lookup value
        vram_address[i*2+1+0xC0]=temp //put the new byte in vram in the right side
        vram_address[i*2+3+0xC0]=temp //put it in the next stripe as well
    end
    shift+=16 //increment by 16 to get to the next bitplane
end
The idea is to break the 16x16 sprite into 8x8 sections, and scale each bitplane of the 8x8 into a corresponding new 16x16 section of VRAM, creating the 32x32.
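For anyone following along, here is a minimal sketch in plain C (not HuC-specific) of that idea for a single bitplane of one 8x8 section. Each 8-pixel row is split into two nibbles, each nibble is widened to 8 bits through the 16-entry table, and the resulting 16-bit row is stored twice. The helper name `scale_plane_8_to_16` and the flat one-byte-per-row input are assumptions for illustration only; the real VRAM layout of PC Engine sprites is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same 16 values as the scale table in this thread: every bit of the
   4-bit index is doubled, so 0101 becomes 00110011. */
static const uint8_t scaletable[16] = {
    0x00, 0x03, 0x0C, 0x0F, 0x30, 0x33, 0x3C, 0x3F,
    0xC0, 0xC3, 0xCC, 0xCF, 0xF0, 0xF3, 0xFC, 0xFF
};

/* Hypothetical helper: scale one 8x8 1-bit plane (8 bytes, one per row)
   to 16x16 (16 words, one per row). */
static void scale_plane_8_to_16(const uint8_t src[8], uint16_t dst[16])
{
    assert(src != NULL && dst != NULL);
    for (int row = 0; row < 8; row++) {
        uint8_t hi = scaletable[src[row] >> 4];   /* left 4 pixels become 8 */
        uint8_t lo = scaletable[src[row] & 0x0F]; /* right 4 pixels become 8 */
        uint16_t wide = (uint16_t)((hi << 8) | lo);
        dst[row * 2]     = wide; /* horizontally doubled row */
        dst[row * 2 + 1] = wide; /* same row again for vertical doubling */
    }
}
```

Running this once per bitplane for each of the four 8x8 sections would give the four 16x16 cells of the 32x32 result.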
Post by turboxray on Jan 7, 2021 19:13:50 GMT
EDIT: Added slight optimization
loadvram_scale_16to32:
; NOTE: needs code here for HuC to map in data, set local address to _si, and set vram address.
.plane.offset = $10
plane.count = _dl

    lda #$04
    sta <plane.count

    lda #$ff
    pha
    lda #$10 ; lower right
    pha
    lda #$11 ; lower left
    pha
    lda #$00 ; upper right
    pha
    lda #$01 ; upper left
.corner.loop
    tay

.plane.loop

    ldx #8
.column.corner ; write 8x8 pixel block as 16x16 block.. single plane
    phy
    lda [_si],Y
    pha
    and #$0f
    tay
    lda .scaletable,Y
    sta video_data_l

    pla
    lsr a
    lsr a
    lsr a
    lsr a
    tay
    lda .scaletable,Y
    sta video_data_h

    sta video_data_h

    ply
    iny
    iny
    dex
    bne .column.corner

    dec <plane.count
    beq .set.next.corner
    tya
    clc
    adc #.plane.offset ; next set of 16 1bit pixels
    tay
    bra .plane.loop
.set.next.corner
    lda #$04
    sta <plane.count

    pla ; get next corner offset
    bmi .out
    bra .corner.loop

.out
    rts
.scaletable
    .db %00000000
    .db %00000011
    .db %00001100
    .db %00001111

    .db %00110000
    .db %00110011
    .db %00111100
    .db %00111111

    .db %11000000
    .db %11000011
    .db %11001100
    .db %11001111

    .db %11110000
    .db %11110011
    .db %11111100
    .db %11111111
I clocked the whole set of iterations at 9722 cycles, or ~8.2% of the CPU time in a frame.
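As a rough sanity check on that percentage, here is a small C sketch of the arithmetic. The clock rate (about 7.16 MHz) and the 60 frames per second are assumptions that are not stated in the post, and `frame_share_percent` is a hypothetical helper name. With those figures the result lands a little above 8%, in the same range as the number quoted above; the exact value depends on the frame timing assumed.

```c
#include <assert.h>

/* Assumed figures, not taken from the post: CPU clock of about
   7.16 MHz and roughly 60 frames per second. */
#define CPU_HZ        7159090.0
#define FRAMES_PER_S  60.0

/* Hypothetical helper: percentage of one frame's CPU cycles
   consumed by a routine of the given length. */
static double frame_share_percent(double routine_cycles)
{
    assert(routine_cycles >= 0.0);
    double cycles_per_frame = CPU_HZ / FRAMES_PER_S; /* about 119,318 */
    return 100.0 * routine_cycles / cycles_per_frame;
}
```

Calling `frame_share_percent(9722.0)` gives the share for the routine as clocked.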
Post by DarkKobold on Jan 8, 2021 1:27:31 GMT