|
Post by elmer on Dec 22, 2019 6:31:19 GMT
Using the easily findable gpl-3.0.txt file (35,147 bytes long) as an example, here are the cycle counts to decompress it:

apdstr (Peter Ferrie's 65C02 code from aPLib): 6,551,627 cycles
My aPLib 6502 code (running on a PCE): 2,668,144 cycles
My aPLib 6280 code (using TII, and locking out interrupts): 2,202,548 cycles

I've been playing with LZSA2, so ...

Peter Ferrie's 6502 LZSA2 decompressor: 2,723,148 cycles
My new 6502 LZSA2 decompressor: 1,836,189 cycles

That's approximately 52 cycles per decompressed byte on average, which is really close to the performance that I have seen quoted for 6502 LZ4 decompressors. That is in comparison to the 75 cycles per decompressed byte that my aPLib decompressor averages.

The problem with all of these high-performance compressors is that once you can efficiently encode 2-byte and 3-byte matches (which you need for the best compression), your decompression code starts to get overwhelmed by the transitions between very short runs of literal-copies and match-copies, and the speed of your inner loop gets drowned out by all of the setup for the copies. Ironically, that is *exactly* where the old LZSS algorithm does really well.
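The per-byte averages quoted above can be reproduced from the raw cycle counts. This is only a small arithmetic sketch: the dictionary labels and the function name are made up for illustration, and the numbers are copied from the post. It gives about 52.2 and 75.9 cycles per byte, so the quoted 52 and 75 look truncated rather than rounded.

```python
# Cycle counts quoted in the post above for decompressing gpl-3.0.txt.
FILE_SIZE = 35147  # bytes of decompressed output

CYCLES = {
    "apdstr (65C02, from aPLib)": 6_551_627,
    "aPLib 6502": 2_668_144,
    "aPLib HuC6280 (TII)": 2_202_548,
    "LZSA2 6502 (Peter Ferrie)": 2_723_148,
    "LZSA2 6502 (new)": 1_836_189,
}


def cycles_per_byte(cycles, size=FILE_SIZE):
    """Average CPU cycles spent per byte of decompressed output."""
    return cycles / size


for name, cycles in CYCLES.items():
    print(f"{name:28s} {cycles_per_byte(cycles):6.1f} cycles/byte")
```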
|
|
|
Post by elmer on Dec 30, 2019 1:07:37 GMT
Quote: elmer do you have numbers for decompression on the target side (how fast)?

Back at you Tom, how are things going with helping out DK & Catastrophy, and do you have any numbers for pucrunch performance?

Emmanuel Marty has been kind enough to implement an "enhanced" mode in APULTRA that changes the aPLib format a tiny bit to make it decompress 11% faster on the 6502/HuC6280. He has also implemented a "window size" option, which means that the aPLib format can now be used for decompressing to the VDC/VCE on a PC Engine.

I've written an HuC6280 decompressor for both the "standard" and "enhanced" modes that can decompress to either RAM (no window) or VDC/VCE (using a window). The decompressor uses no self-modifying code, so it can run directly from a HuCard.

I'm not exactly happy with the result, and I'm curious to hear what kind of performance you're getting with your pucrunch code. Even when decompressing to RAM, my HuC6280 code runs 20% slower than my 6502 code, and that drops to 50% slower when using the window and decompressing to the VDC.

To be honest, that's kinda what I suspected would happen, and it just seems to reinforce my belief that Falcom had the right idea when they just decompressed to RAM and then did a TIA afterwards to copy the data to the VDC, which is only about a 10% overhead on decompressing to RAM.
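For anyone wondering why a "window" is needed at all: LZ-style match copies read back bytes that were already output, which is awkward when the output goes to a video port instead of RAM. The sketch below shows the general idea in Python, not the actual HuC6280 code. The token format, the function names and the 2048-byte default are all assumptions made for illustration, and real aPLib data is a bitstream that would have to be parsed first.

```python
def decompress_via_window(tokens, write_byte, window_size=2048):
    """Decode LZ-style tokens, reading matches from a RAM ring buffer.

    tokens:     iterable of ("lit", bytes) or ("match", offset, length);
                a hypothetical pre-parsed format, not real aPLib data.
    write_byte: callback standing in for a write-only output port.
    Returns the number of bytes emitted.
    """
    window = bytearray(window_size)  # the last window_size bytes of output
    pos = 0                          # total bytes emitted so far

    def emit(value):
        nonlocal pos
        window[pos % window_size] = value  # remember it for later matches
        write_byte(value)                  # and send it to the output
        pos += 1

    for token in tokens:
        if token[0] == "lit":
            for value in token[1]:
                emit(value)
        else:
            _, offset, length = token
            if not 0 < offset <= min(pos, window_size):
                raise ValueError("match offset outside the window")
            for _ in range(length):
                # Byte-at-a-time so overlapping matches (offset < length) work.
                emit(window[(pos - offset) % window_size])
    return pos
```

Every output byte gets written twice (once to the window, once to the port) and every window access needs wrap-around handling, which is one plausible reason the windowed path costs so much more than plain RAM decompression followed by a block copy.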
|
|
|
Post by DarkKobold on Dec 30, 2019 21:19:28 GMT
Quote: elmer do you have numbers for decompression on the target side (how fast)?
Quote: Back at you Tom, how are things going with helping out DK & Catastrophy, and do you have any numbers for pucrunch performance?

This is delayed, and has been 100% my fault. I've needed to provide him with some sample art, and I just haven't. I don't have a good excuse.
|
|
|
Post by elmer on Jan 2, 2020 2:45:04 GMT
Quote: This is delayed, and has been 100% my fault. I've needed to provide him with some sample art, and I just haven't. I don't have a good excuse.

No problem, we all get delayed at this time of year. I'm still curious about how you're planning to do this.

DK, unless I'm mistaken, you're still doing all of your art conversion within HuC/PCEAS itself, and you're not using a separate toolchain to convert your art ... which suggests that having something built into HuC/PCEAS is going to be by far the easiest option for you.

From my side of things, I've now got an LZSA1 decompressor to add to my LZSA2 decompressor ... but I really don't think that something as complex as either LZSA or aPLib belongs inside PCEAS itself. Normally, by the time that a development team really needs to deal with compression, they're already using their own customized toolchain that is targeted to their own specialized development requirements.
|
|
|
Post by DarkKobold on Jan 2, 2020 6:18:12 GMT
Quote: Normally, by the time that a development team really needs to deal with compression, they're already using their own customized toolchain that is targeted to their own specialized development requirements.

You could argue that we do have a customized toolchain: HuC has built-in PNG conversion to both tile format and sprite format. You customized a Promotion stm importer which is fucking spectacular, like beyond awesome. We can quickly build maps and shtuff in Promotion and turn it into real-life game art.
Further, my build batch file always rebuilds Squirrel's MML and then compiles the ROM. So we have a toolchain. It's short, but I think it qualifies.
|
|
|
Post by elmer on Jan 7, 2020 3:00:17 GMT
Quote: You could argue that we do have a customized toolchain

You could certainly argue that, and I'll be polite enough to avoid challenging your argument!

Quote: From my side of things, I've now got an LZSA1 decompressor to add to my LZSA2 decompressor ... but I really don't think that something as complex as either LZSA or aPLib belongs inside PCEAS itself.

Emmanuel Marty has added my 6502 decompressors to his LZSA and APULTRA projects on GitHub, and here are some HuC6280 decompressors for the PC Engine.

The aPLib decompressor is the ugly one that I mentioned earlier, which is for HuCard projects that need to decompress both to RAM and directly to VRAM.

All of the decompressors are written to provide multiple size-vs-speed tradeoffs, and a few bytes could be saved by picking just one set of options and then rearranging the code so that the few "jsr" calls could be replaced with "bsr".

www.dropbox.com/s/ymeym8g8ed74g1k/pce_aplib_lzsa_decompressors.zip?dl=1
|
|
|
Post by Arkhan on Jan 12, 2020 16:32:56 GMT
Yeah..... sorry DK that's not a toolchain lol.
It's just you recompiling MML every time you build your game, when it doesn't need to be, lol.
You need all them fabled utilities we talked about on Discord
|
|
|
Post by DarkKobold on Jan 13, 2020 17:32:29 GMT
Quote: Yeah..... sorry DK that's not a toolchain lol. It's just you recompiling MML when it doesn't need to be everytime you build your game lol. You need all them fabled utilities we talked about on Discord

I must be misunderstanding toolchain. MML2PCE is a tool, HuC is a tool, they execute in a chain?

That said, I don't care if I have a toolchain, or customized utilities. I'm able to make games, and that's all I care about. HuC is fantastic. Squirrel is fantastic. Promotion is fantastic. It's all fucking great, because it's enabled me to make games. So thank you elmer, thank you Arkhan, and thank you dshadoff for your contributions to everything above.

And yeah, 99% of the time it's totally useless, but it takes microseconds whenever I compile. I guarantee that I waste more time typing "CD squirrel" "mml2pce cat.mml" "cd.." once than all the extra microseconds per build.
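If the always-rebuild step ever does become annoying, the usual fix is a timestamp check so the converter only runs when the song file changed. A minimal sketch in Python follows. The "squirrel" directory and the "mml2pce cat.mml" command come from the post above, but the output file name "cat.asm" is purely a guess, and the function names are made up for illustration.

```python
import os
import subprocess


def needs_rebuild(source, output):
    """True if output is missing or older than source."""
    if not os.path.exists(output):
        return True
    return os.path.getmtime(source) > os.path.getmtime(output)


def build_music(source="cat.mml", output="cat.asm", workdir="squirrel"):
    """Run mml2pce only when the song changed; returns True if it ran.

    The output name is an assumption; use whatever file mml2pce
    actually writes for your project.
    """
    src = os.path.join(workdir, source)
    out = os.path.join(workdir, output)
    if not needs_rebuild(src, out):
        return False
    subprocess.run(["mml2pce", source], cwd=workdir, check=True)
    return True
```

This is the same rule that make applies to every target, so a makefile would do the job just as well as a script.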
|
|
|
Post by Arkhan on Jan 14, 2020 7:03:08 GMT
semantically, you are absolutely right, it is a toolchain.
but it's a chain with two links, and one of them (Squirrel) barely needs to be there unless songs change.
toolchain generally implies something more convoluted.
For example, imagine if the incs that elmer provided for stuff were provided as external utilities.
You'd have graphics converting in a utility, probably being compressed in another, maps being eaten and output into a binary as well, and then squirrel, and all of this would be output and used by the code.
You have more like a tool shoestring at the moment. It is absolutely functional and fine. It's just ... barely a toolchain in the way that most people use the term.
toolchipclip?
You aren't wrong that it's a toolchain, it's just so basic it barely counts as one lol.
and not using a convoluted toolchain is fine. You just have lost some flexibility and as of now, using the CD stuff is going to be goofy? I think at least.
The data overlay part is still weird.
|
|
|
Post by elmer on Jan 21, 2020 21:47:46 GMT
DarkKobold gredler : Are you sure that you guys really want to go through the hassle of compressing tilesets?

I have written a simple and fast compressor that would be suitable to include in HuC/PCEAS without slowing everything down too much, or bloating up PCEAS with code that few people are going to use.

It is designed specifically for HuCard games, and it uses a small 2KByte chunk of memory in RAM for a decompression window, so that it can decompress stuff directly to the VDC.

IMHO it is NOT a particularly good choice for CD games, where you are better off using a more sophisticated compressor, and just decompressing to RAM and then copying from RAM to the VDC.

Here are compression results when tested on one of the files that I tried earlier in the thread. Note that I've added the results for a traditional LZSS16 compressor (as used by Hudson in Gate of Thunder, Seiya Monogatari and others) which uses a 4KByte window, and so isn't a great choice for HuCards.

**********************************************************
    40,164 popcore.bin (uncompressed)
    28,612 popcore.bin.lzss16 (4KByte window buffer)
    28,227 popcore.bin.lz4
--> 27,235 popcore.bin.lz2kw (2KByte window buffer)
    26,345 popcore.bin.lzsa1
    25,029 popcore.bin.pucrunch (2KByte window buffer)
    24,366 popcore.bin.pucrunch
    23,958 popcore.bin.aplib (2KByte window buffer)
    23,929 popcore.bin.lzsa2
    23,012 popcore.bin.deflate
    22,926 popcore.bin.aplib
**********************************************************

While those results might look OK, here are the results when testing with some actual PC Engine graphics from LoX2 ...

**********************************************************
    15,232 lox2_lvl1_chr.bin (uncompressed)
    12,875 lox2_lvl1_chr.lzss16 (4KByte window buffer)
--> 12,601 lox2_lvl1_chr.lz2kw (2KByte window buffer)
    12,411 lox2_lvl1_chr.lz4
    12,101 lox2_lvl1_chr.lzsa1
    11,913 lox2_lvl1_chr.pucrunch (2KByte window buffer)
    11,282 lox2_lvl1_chr.lzsa2
**********************************************************
    13,088 lox2_lvl2_chr.bin (uncompressed)
    10,788 lox2_lvl2_chr.lzss16 (4KByte window buffer)
    10,571 lox2_lvl2_chr.lz4
--> 10,403 lox2_lvl2_map.lz2kw (2KByte window buffer)
    10,146 lox2_lvl2_chr.lzsa1
     9,737 lox2_lvl2_map.pucrunch (2KByte window buffer)
     9,407 lox2_lvl2_chr.lzsa2
**********************************************************

Apart from the result that the new compressor beats LZSS16, and is comparable to LZ4 ... the other thing to look at is how little compression any of these are getting with typical PC Engine tile data.

Does Catastrophy really need to save 20% of space that badly?

Understand that this compression is only going to help on tilesets and sprites that are loaded at the beginning of a level, and that it is too slow to use on animations that you are uploading to VRAM while the game is playing.
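To put the "20% of space" question in numbers, here is a quick sketch that turns the LoX2 byte counts above into percent-saved figures. The sizes are copied from the results, while the dictionary layout and function name are just for illustration. On these two files lz2kw saves roughly 17% and 21%, and even LZSA2 only reaches roughly 26% and 28%.

```python
# Byte counts copied from the LoX2 results above.
RESULTS = {
    "lox2_lvl1_chr": {
        "uncompressed": 15232,
        "lzss16": 12875, "lz2kw": 12601, "lz4": 12411,
        "lzsa1": 12101, "pucrunch (2KB window)": 11913, "lzsa2": 11282,
    },
    "lox2_lvl2_chr": {
        "uncompressed": 13088,
        "lzss16": 10788, "lz4": 10571, "lz2kw": 10403,
        "lzsa1": 10146, "pucrunch (2KB window)": 9737, "lzsa2": 9407,
    },
}


def percent_saved(original, packed):
    """Space saved by compression, as a percentage of the original size."""
    return 100.0 * (original - packed) / original


for name, sizes in RESULTS.items():
    original = sizes["uncompressed"]
    for codec, packed in sizes.items():
        if codec != "uncompressed":
            saved = percent_saved(original, packed)
            print(f"{name} {codec:22s} {saved:5.1f}% saved")
```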
|
|
|
Post by gredler on Jan 22, 2020 0:32:00 GMT
Quote: DarkKobold gredler : Does Catastrophy really need to save 20% of space that badly? Understand that this compression is only going to help on tilesets and sprites that are loaded at the beginning of a level, and that it is too slow to use it on animations that you are uploading to VRAM while the game is playing.

Thanks for more help! Compression would probably be better than no compression - the more space to add stuff the better!

Most of our game, as you've seen, is fairly divided between tilemaps, and there are a lot of large sprites that are only visible between screen fades (the large Catastrophy and cat head, logo sprites, level select screen). So I imagine there are a lot of safe places we can hide the compression processes. Even each level is currently only using one tilemap, maxed out to ~255 tiles and 15 palettes, so same thing there: hypothetically they could be decompressed between the level "title card" and the actual gameplay.

We don't "need" any more space, really, but my initial curiosity was "how much can we fit", considering that DK is now letting me provide sprites as background to add details, beyond using sprites only for gameplay objects and characters. I would also love to have a nice full-screen image at the ending of the game, etc.
|
|
|
Post by dshadoff on Jan 22, 2020 1:26:05 GMT
I wonder if you might want to make this into an includable chunk of code rather than a fixed part of standard HuC...
Toward the end of my involvement in HuC, the library had become pretty big, and the pinned-page jump table was getting unwieldy, so I was struggling with how to include only the library functions which are actually used/desired, yet balancing that against ease-of-mapping and keeping dependent code in the same bank wherever possible. I didn't get too far on those thoughts, but this seems like a pretty good candidate for keeping away from the core - hopefully, it won't be called 100 times per second, for example.
Just a thought...
|
|
|
Post by elmer on Jan 22, 2020 2:23:38 GMT
Quote: I wonder if you might want to make this into an includable chunk of code rather than a fixed part of standard HuC...

That's definitely a good thought, especially since I don't want to dig around in those fixed banks and try to figure out what space is available.

Luckily Uli already added a system to handle things like this. There is now a "library" system in HuC so that you can just add functional subsystems, like "malloc", into your project, and the .c files (with embedded assembly-language code) will be pulled in from the standard HuC include path.

That means that a decompressor can just live in the same space as normal C code, and not be permanently mapped in.
|
|
|
Post by dshadoff on Jan 22, 2020 4:36:26 GMT
You could also make it modular... slightly different include name (but same function calls) for different compressors...
|
|
|
Post by DarkKobold on Jan 22, 2020 17:41:22 GMT
Quote: DarkKobold gredler : Are you sure that you guys really want to go through the hassle of compressing tilesets?

Thanks a ton for doing this. I'll somewhat echo what Gredler said. We're at 840kB, still missing lots of music and art. So, I won't know if we need compression until we hit 1000kB, and are crying at the 3 banks left before we are over the limit. I'll post a quick update in the Catastrophy thread.
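The "3 banks left" figure checks out if you assume 8 KByte banks and a 1 MByte (1024 KByte) ROM limit. The limit is not stated in the post, so treat it as an assumption inferred from the numbers, and the function name below is made up for illustration. With those values, 840 KBytes used leaves 23 banks free and 1000 KBytes used leaves 3.

```python
BANK_KB = 8          # HuC6280 banks are 8 KBytes each
ROM_LIMIT_KB = 1024  # assumption: a 1 MByte ROM limit, inferred from the post


def banks_left(used_kb, limit_kb=ROM_LIMIT_KB, bank_kb=BANK_KB):
    """Whole free banks remaining after used_kb KBytes of ROM are used."""
    return (limit_kb - used_kb) // bank_kb


print(banks_left(840))   # free banks at the current ROM size
print(banks_left(1000))  # free banks at the "start crying" point
```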
|
|