|
Post by elmer on Apr 11, 2019 3:27:09 GMT
DK & Gredler's talk of potentially needing to compress some of their graphics, so that Catastrophy can fit within a 1MB ROM, has reminded me of my previous discussion of different compression methods in my old Xanadu Blog on PCEFX. For those folks that don't remember (or have chosen to forget) the level of terminally boring detail and tests that I went into in that thread, the conclusion that I came to back then was that Jorgen Ibsen's aPLib compressor was probably one of the best options for PCE developers, because it offered a slightly (but noticeably) better compression ratio than the SWD5 code that I am using in the LoX translations, and it did so with a similar cost in decompression time. Since part of the point of writing the replacement OS for the Turbo Everdrive is to get a bunch of practical PCE assembly-language code out into the Open Source world, so that new PCE assembly-language programmers can tear it apart (and perhaps, even improve it), I've decided that I'm going compress the font data in the replacement OS. I've written a simple command-line compression utility for Windows/Linux that uses aPLib to compress a bunch of data files into a single output file that can be included in a HuCard or CD-ROM project. The data format is similar to the one that both of the Legend of Xanadu games use to store their graphics/code assets for each level. You can find the compressor source, and Windows executables, on github here ... APLPAK. The next stage is to write an assembly-language decompressor for the data. Jorgen Ibsen's aPLib distribution comes with a 65C02 assembly-language decompressor for the Apple II that was contributed by Peter Ferrie (with a C version, too). So ... the question arises ... can we come up with anything better than that for our PC Engine? I'm going to be writing my own version, and am putting the challenge out there to any assembly-language programmers. Let's see what we can do, and talk about the choices that we make while coding the decompressor. Peter Ferrie's code is only about 250 bytes long, less than 300 lines of code with the comments. That's a pretty small routine, and not a major time-commitment to re-write ... more like a crossword puzzle. Anyone up for the challenge?
|
|
|
Post by DarkKobold on Apr 11, 2019 3:51:56 GMT
DK & Gredler's talk of potentially needing to compress some of their graphics, so that Catastrophy can fit within a 1MB ROM, has reminded me of my previous discussion of different compression methods in my old Xanadu Blog on PCEFX. Anyone up for the challenge?
I have never felt so out of my depth, looking at his source code. I appreciate you trying this, and hope it comes up with something great. However, I am 100% not up for the challenge.
|
|
|
Post by elmer on Apr 14, 2019 23:48:42 GMT
I have never felt so out of my depth, looking at his source code. You're still a fairly-new programmer, so don't worry about it in the slightest. It usually takes a few years before people become confident-enough with the basics that they feel ready to explore some of the "uuuh ... how does this actually work?" areas of programming. And even then, data-compression is sometimes seen as complex or difficult, even though the basic ideas behind most of the techniques can usually be explained in a few sentences. It's the details of exactly how the basic ideas work in practice that can get complicated ... or, sometimes, not. I remember trying to explain the the simple compression in Emerald Dragon to NightWolve, and he didn't seem interested in trying to understand it. He had made up his mind that it was too complicated, and that he wasn't interested, and that was all. There's nothing wrong with that; we all have the right to choose what interests us, and the things that we're going to spend our time on. Anyway ... I guess that nobody else here is interested in this particular challenge/discussion, so, just in case anyone cares to see an example of converting C code into assembly-language, I'll bore everyone else and post the results anyway (which I've uploaded to the github project listed earlier). ...
|
|
|
Post by elmer on Apr 14, 2019 23:51:22 GMT
Jorgen Ibsen's decompressor written in C ...
/* * aPLib compression library - the smaller the better :) * * C depacker * * Copyright (c) 1998-2014 Joergen Ibsen * All Rights Reserved * * http://www.ibsensoftware.com/ */
#include "depack.h"
/* internal data structure */ struct APDSTATE { const unsigned char *source; unsigned char *destination; unsigned int tag; unsigned int bitcount; };
static unsigned int aP_getbit(struct APDSTATE *ud) { unsigned int bit;
/* check if tag is empty */ if (!ud->bitcount--) { /* load next tag */ ud->tag = *ud->source++; ud->bitcount = 7; }
/* shift bit out of tag */ bit = (ud->tag >> 7) & 0x01; ud->tag <<= 1;
return bit; }
static unsigned int aP_getgamma(struct APDSTATE *ud) { unsigned int result = 1;
/* input gamma2-encoded bits */ do { result = (result << 1) + aP_getbit(ud); } while (aP_getbit(ud));
return result; }
unsigned int aP_depack(const void *source, void *destination) { struct APDSTATE ud; unsigned int offs, len, R0, LWM; int done; int i;
ud.source = (const unsigned char *) source; ud.destination = (unsigned char *) destination; ud.bitcount = 0;
R0 = (unsigned int) -1; LWM = 0; done = 0;
/* first byte verbatim */ *ud.destination++ = *ud.source++;
/* main decompression loop */ while (!done) { if (aP_getbit(&ud)) { if (aP_getbit(&ud)) { if (aP_getbit(&ud)) { offs = 0;
for (i = 4; i; i--) { offs = (offs << 1) + aP_getbit(&ud); }
if (offs) { *ud.destination = *(ud.destination - offs); ud.destination++; } else { *ud.destination++ = 0x00; }
LWM = 0; } else { offs = *ud.source++;
len = 2 + (offs & 0x0001);
offs >>= 1;
if (offs) { for (; len; len--) { *ud.destination = *(ud.destination - offs); ud.destination++; } } else { done = 1; }
R0 = offs; LWM = 1; } } else { offs = aP_getgamma(&ud);
if ((LWM == 0) && (offs == 2)) { offs = R0;
len = aP_getgamma(&ud);
for (; len; len--) { *ud.destination = *(ud.destination - offs); ud.destination++; } } else { if (LWM == 0) { offs -= 3; } else { offs -= 2; }
offs <<= 8; offs += *ud.source++;
len = aP_getgamma(&ud);
if (offs >= 32000) { len++; } if (offs >= 1280) { len++; } if (offs < 128) { len += 2; }
for (; len; len--) { *ud.destination = *(ud.destination - offs); ud.destination++; }
R0 = offs; }
LWM = 1; } } else { *ud.destination++ = *ud.source++; LWM = 0; } }
return (unsigned int) (ud.destination - (unsigned char *) destination); }
|
|
|
Post by elmer on Apr 14, 2019 23:55:05 GMT
And my assembly-language version for the PC Engine ...
; *************************************************************************** ; *************************************************************************** ; ; Decompression Options & Macros ;
; ; Take 25% loss in speed to make the code smaller? ;
APL_BEST_SIZE = 0
; ; Assume that we're decompessing from a large multi-bank ; compressed data file and that MPR3 (and MPR4) are used ; as a window into that file? ;
APL_FROM_MPR3 = 0
; ; Macro to increment the source pointer to the next page. ;
if APL_FROM_MPR3 APL_INC_PAGE macro bsr .next_page endm else APL_INC_PAGE macro inc <apl_srcptr + 1 endm endif
; ; Macros to read a byte/bit from the compressed source data. ;
if APL_BEST_SIZE
APL_JMP_TII = 0 APL_GET_BIT macro bsr .get_bit endm APL_GET_SRC macro bsr .get_src endm
else
APL_JMP_TII = 1 APL_GET_BIT macro asl <apl_bitbuf bne .skip\@ bsr .load_bit .skip\@: endm APL_GET_SRC macro lda [apl_srcptr],y iny bne .skip\@ APL_INC_PAGE .skip\@: endm
APL_GET_SRCX macro lda [apl_srcptr] ; Can be used if you *really* inc <apl_srcptr + 0 ; don't want to map MPR4. bne .skip\@ APL_INC_PAGE .skip\@: endm
endif
; *************************************************************************** ; *************************************************************************** ; ; Data usage is pretty-much all of the temporary System Card locations. ; ; Note that the TII instruction runs into the bottom 2 bytes of the stack. ;
apl_bitbuf = __bp ; 1 byte.
apl_srcptr = __si ; 1 word. apl_offset = __di ; 1 word.
apl_winptr = __ah ; Part of TII instruction. apl_dstptr = __bh ; Part of TII instruction. apl_length = __ch ; Part of TII instruction.
apl_lwmbit = apl_length ; 1 byte.
; *************************************************************************** ; *************************************************************************** ; ; apl_decompress - Decompress data stored in Jorgen Ibsen's aPLib format. ; ; Args: apl_srcptr = ptr to compessed data ; Args: apl_dstptr = ptr to output buffer ; Uses: __bp, __si, __di, __ax, __bx, __cx, __dx ; ; If compiled with APL_FROM_MPR3, then apl_srcptr should be within MPR3, i.e. ; in the range $6000..$7FFF, and both MPR3 & MPR4 should be set. ; ; As an optimization, the code to handle window offsets > 32000 bytes has ; been commented-out, since these don't occur in typical PC Engine usage. ;
apl_decompress: lda #$73 ; TII instruction. sta <__al
if APL_JMP_TII tii .tii_end, __dh, 3 ; TII ends with JMP. else lda #$60 ; TII ends with RTS. sta <__dh ; Saves 6 bytes of code. endif
lda #$80 ; Initialize an empty sta <apl_bitbuf ; bit-buffer.
cly ; Initialize source index.
; ; 0 bbbbbbbb - One byte from compressed data, i.e. a "literal". ;
.literal: APL_GET_SRC
.write_byte: ldx #-3 ; LWM=-3 (C code uses 0).
sta [apl_dstptr] ; Write the byte directly to inc <apl_dstptr + 0 ; the output. bne .next_tag inc <apl_dstptr + 1
.next_tag: APL_GET_BIT ; 0 bbbbbbbb bcc .literal
APL_GET_BIT ; 1 0 <offset> <length> bcc .code_pair
APL_GET_BIT ; 1 1 0 dddddddn bcc .copy_two_three
; 1 1 1 dddd - Copy 1 byte within 15 bytes (or zero).
.copy_one: lda #$10 .nibble_loop: APL_GET_BIT rol a bcc .nibble_loop beq .write_byte ; Offset=0 means write zero.
eor #$FF adc <apl_dstptr + 0 sta <apl_winptr + 0 lda #$FF adc <apl_dstptr + 1 sta <apl_winptr + 1
lda [apl_winptr] bra .write_byte
; ; Subroutines for byte & bit handling. ;
if APL_BEST_SIZE .get_bit: asl <apl_bitbuf ; Get the next bit value from beq .load_bit ; the bit-buffer. rts
.get_src: lda [apl_srcptr],y ; Get the next byte value from iny ; from the compressed source. beq .next_page rts if APL_FROM_MPR3 else .next_page: inc <apl_srcptr + 1 rts endif endif
if APL_FROM_MPR3 .next_page: inc <apl_srcptr + 1 ; Increment source page, and bmi .next_bank ; check if changed bank. rts
.next_bank: pha ; Increment source bank view tma3 ; within a large data file. inc a tam3 inc a tam4 lda #$60 sta <apl_srcptr + 1 pla rts endif
.load_bit: pha ; Reload an empty bit-buffer APL_GET_SRC ; from the compressed source. rol a sta <apl_bitbuf pla rts
.get_gamma: lda #1 ; Get a gamma-coded value. stz <apl_length + 1 .gamma_loop: APL_GET_BIT rol a rol <apl_length + 1 APL_GET_BIT bcs .gamma_loop rts
; ; 1 1 0 dddddddn - Copy 2 or 3 within 128 bytes. ;
.copy_two_three:APL_GET_SRC ; 1 1 0 dddddddn lsr a beq .finished ; offset 0 == EOF.
sta <apl_offset + 0 ; Preserve offset. stz <apl_offset + 1 cla adc #2 sta <apl_length + 0 stz <apl_length + 1 bra .do_match
.finished: rts ; All decompressed!
; ; 1 0 <offset> <length> - gamma-coded LZSS pair. ;
.code_pair: bsr .get_gamma ; Bits 8..15 of Offset (min 2).
stx <apl_lwmbit ; Add LWM (-3 or -2) to Offset. adc <apl_lwmbit ; tax ; Removing these lines limits ; lda <apl_length + 1 ; the Offset to < 64768. ; adc #$FF bcs .normal_pair ; CC if LWM==-3 && Offset==2.
bsr .get_gamma ; Get Length (lo-byte in A). bra .do_match ; Use previous Offset.
.normal_pair: sta <apl_offset + 1 ; Bits 8..15 of Offset.
APL_GET_SRC sta <apl_offset + 0 ; Bits 0...7 of Offset.
bsr .get_gamma ; Get length (lo-byte in A).
ldx <apl_offset + 1 ; If offset < 256. beq .lt256 ; cpx #$7D ; If offset >= 32000, length += 2. ; bcs .match_plus2 cpx #$05 ; If offset >= 1280, length += 1. bcs .match_plus1 bra .do_match .lt256: ldx <apl_offset + 0 ; If offset < 128, length += 2. bmi .do_match
.match_plus2: inc a ; Increment length. bne .match_plus1 inc <apl_length + 1
.match_plus1: inc a ; Increment length. bne .do_match inc <apl_length + 1
.do_match: sta <apl_length + 0
sec ; Calc address of match. lda <apl_dstptr + 0 sbc <apl_offset + 0 sta <apl_winptr + 0 lda <apl_dstptr + 1 sbc <apl_offset + 1 sta <apl_winptr + 1
if APL_JMP_TII jmp __ax else jsr __ax endif
.match_done: clc ; Update destination address. lda <apl_length + 0 adc <apl_dstptr + 0 sta <apl_dstptr + 0 lda <apl_length + 1 adc <apl_dstptr + 1 sta <apl_dstptr + 1
ldx #-2 ; LWM=-2 (C code uses 1).
jmp .next_tag
if APL_JMP_TII .tii_end: jmp .match_done endif
|
|
|
Post by DarkKobold on Apr 15, 2019 6:24:52 GMT
I have never felt so out of my depth, looking at his source code. You're still a fairly-new programmer, so don't worry about it in the slightest.
LOL, fyi, I've been coding since I was 8, about 31 years ago now. Granted, programming has always been a side interest, and while I do some programming in my job, it's few and far between.
It's also been almost 20 years since I took my one class on ASM, and it was in 68HC11, which with it's multiple registers is just fairly easier to read.
|
|
|
Post by elmer on Apr 20, 2019 19:13:37 GMT
Whoops, I was trying to be encouraging, and instead probably just came across as condescending ... sorry!
|
|