|
Post by Black_Tiger on Mar 10, 2021 17:29:00 GMT
Don't forget people: the Neo Geo has only 1 background layer. When it comes to scrolling, whether there is parallax or not, the Neo Geo has zero background layers.
|
|
|
Post by gredler on Mar 10, 2021 18:08:27 GMT
Don't forget people: the Neo Geo has only 1 background layer. When it comes to scrolling, whether there is parallax or not, the Neo Geo has zero background layers. Haha, yeah, the Neo Geo only has one layer for everything and can't do tilemaps. Trash system.
|
|
|
Post by sunteam_paul on Mar 10, 2021 19:47:40 GMT
It's a prime example that cherry picking system specifications for arguments is fruitless. You have to look at the whole package.
|
|
|
Post by SignOfZeta on Mar 10, 2021 20:32:36 GMT
No kidding. I mean, look at all the stuff out there with a 68k as the most powerful chip.
Are you seriously telling me that a Neo is on par with an Amiga or Mega Drive? They have the same 16-bit CPU but there is NO CONTEST. The same goes for pretty much any arcade PCB based on the 68k, of which there are many. The Mac SE was also 68k-based, and it doesn't even do color! Same for the SE/30, with a 68030 in it.
If a PCE has a friggin' CD-ROM and 2MB of RAM it's clearly not a NES. That is pretty stupid stuff to say. Furthermore, if one is speaking of ancient video game crap like this at all, you'd think some perspective would be in order. Whatever this guy prefers is just as much stratified dinosaur shit as the PCE is.
|
|
|
Post by turboxray on Mar 12, 2021 20:00:45 GMT
Clock speeds topping out wasn't the only thing. The 16-bit shift was a big deal on microcomputers for real-world work as well. Being able to work on larger values and address more RAM was important. Obviously for gaming there was a disconnect between audio and graphics and the ability of the CPU to work with larger word sizes. But there's also the fact that 16-bit CPUs, by virtue of being (usually) more modern designs, often had more modern instruction sets and accommodations.
Larger RAM/memory space was something 16bit processors provided, but it's in no way exclusive to them. I keep using the 6809 as a real-world example for a reason. The CoCo 3 had 512k RAM capability from the start - it wasn't a hack. It actually has a real multitasking OS developed for it. It has position-independent code. It can pair registers for some 16bit operations. It has hardware multiply. It wasn't limited to the 64k or 128k setups of the Commodore, Atari, and other 8bit computer lines. My CoCo 3 has 2 megabytes of RAM installed. The MMU doesn't have to be built into the processor like the HuC6280; the 6809 is paired with an external MMU but has the exact same banking mechanism as the 6280 (and even the same external 21bit address range).
An 8bit processor doesn't prevent you from working with larger values: 16bit, 24bit, 32bit, and larger for floating point precision. I mean, does it take more instructions to work with them? Sure. But those instruction times are faster than on the 8086 and 68k, so it's not like it takes exactly 2x or 4x or whatever amount of time for the equivalent operation. The 68k and 8086 instruction cycle times aren't fast. Their single-instruction cycle times are actually slow compared to the equivalent on the 65x/6809. The speed comes from being able to work on more bits in a single instruction, but that doesn't translate into an automatic 2x the performance. And what about all the instructions that don't require 16bit or larger values?
Those instructions are at minimum 2x faster on the 65x/6809 than on the 68k. And they happen a lot more often than 16bit operations. JSR on the 68k takes 18 cycles (optimized; more than that in general), and RTS is 16 cycles. That's 7 cycles for each on the 65x/6809. Conditional branches are everywhere; they're 8-10 cycles on the 68k and 2-4 cycles on the 65x/6809. What about fast context switching via interrupts? The 68k has something like a 50 cycle overhead before it even enters the ISR. And that doesn't preserve any registers (on the 68k). And RTE is something like 20 cycles. That kind of stuff requires additional hardware to handle (like the "copper" on the Amiga). An 8bit processor is like 7 cycles in, 7 cycles return (which is why they remained popular as embedded controllers/devices year after year). So what about X specific instruction or Y specific instruction, etc.? These 8bit processors were on the scene in 1979. There was nothing stopping the 8bit design from receiving new instructions. In fact, that did happen - just not in the mainstream. The Hitachi 6309 (which is 6809 compatible) in native mode can do a 16bit x 16bit -> 32bit multiply. It can pair registers for 16bit or 32bit operations, even though it's an 8bit processor. A computer is doing more than crunching 16bit/32bit/etc. operations. Application logic needs to be responsive. String manipulation is a thing. Etc. Like I said, 2x, 3x, 4x the clock speed of the original 8bit processor models of 1980 scales that performance nicely. Does it match exactly? Of course not. But it gets close enough. An 8MHz 8bit CPU is going to get much closer to a 16bit 8MHz processor than a 1.89MHz or slower 8bit processor will. And that's pretty much my whole point. At some point there are diminishing returns on the 8bit design. 16bit chips become cheaper to make, they lend themselves to better bus designs (8bit processors, outside the Z80, are bus hogs), etc. I don't discount that at all! That's pretty obvious.
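To make those relative costs concrete, here's a quick back-of-the-envelope model using the cycle figures quoted above, applied to a made-up call/branch/interrupt-heavy frame at equal clock speeds (a rough sketch only; real cycle counts vary by addressing mode, and the workload mix is invented for illustration):

```python
# Per-instruction overhead costs, using the approximate cycle counts
# quoted in the post (not a cycle-exact model of either CPU).
COSTS = {
    "68k":  {"jsr": 18, "rts": 16, "branch": 9, "irq_enter": 50, "irq_exit": 20},
    "6809": {"jsr": 7,  "rts": 7,  "branch": 3, "irq_enter": 7,  "irq_exit": 7},
}

def overhead(cpu, calls, branches, interrupts):
    """Total cycles spent on call/branch/interrupt plumbing, not real work."""
    c = COSTS[cpu]
    return (calls * (c["jsr"] + c["rts"])
            + branches * c["branch"]
            + interrupts * (c["irq_enter"] + c["irq_exit"]))

# A hypothetical frame: 200 subroutine calls, 2000 conditional branches,
# 3 interrupts. These counts are invented, just to show the shape of it.
m68k  = overhead("68k",  200, 2000, 3)   # 25010 cycles
m6809 = overhead("6809", 200, 2000, 3)   # 8842 cycles
print(m68k, m6809, round(m68k / m6809, 1))
```

At equal clocks the plumbing alone comes out roughly 2.8x cheaper on the 6809 in this toy mix, which is the point being made: the 68k's wins are on wide data, not on the glue code that dominates game logic.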
I'm just saying, from 1979 to 1988 it was all stagnation for 8bit processor design and clock speeds, when it didn't need to be. There was definitely space for that growth, especially given the 8088/8086 and even later the 286 dominating the computer market. The 68k is a modern design/ISA; the 8086/286 is not. Given that x86 dominated, I'd say while a modern ISA/design is nice, apparently it wasn't a requirement. I think RISC came along and proved that point haha. You don't need a fancy/complex/everything-needs-to-be-a-single-instruction CISC ISA. You just need performance. The heart of RISC gets back to the roots of the 65x/6809 design philosophy: a small/simple ISA but fast instruction times.
Let's be honest here. While Treasure deserves the pedestal they're often put on for their game designs (IMO), they are hardly a valid/competent example of performance. Every single one of their games on the Genesis, and Saturn, has quite a bit of slowdown. They're supposedly ex-Konami employees. Maybe coincidental, but Konami has the exact same issue: great game design, poorly written code. Gradius III on the SNES actually slows down to 3 frames! 3! Even at the slowrom speed of the game (2.68MHz), that's absurd. But as to Treasure's comment: the 68k only has mul/div/add/shift/etc. What real-time calculations are they talking about? Are they literally computing trig/sqrt functions through iteration??!! God, I hope not haha. If you can use LUTs and get the same precision at faster speed, why wouldn't you use LUTs? For the sake of not wanting to use LUTs? There's no context in any of these developer interviews, and half the time they're so generalized in their responses that it doesn't mean anything (something that I've noted in a lot of Japanese interviews) - or just blatantly incorrect. I've seen 68k developers make claims about the convenience/ease of the ISA - translating to speed.
And while that is indeed factual, it does NOT in any way negate the capable performance of other platforms/processors. If you're reliant on the 68k to make up for lack of skill as a professional programmer, that speaks more to your level of competence and skill than to the actually obtainable performance of another processor/platform. And this is exactly how I see that comment from Treasure. I remember reading the comments on the SuperGrafx and was like wow - some of them are just really incorrect/baseless/ignorant. And the amount of people that parrot those quoted comments too: "the cpu just isn't enough to handle the extra hardware".. you know, because it's 8bit haha. Developers are quoted saying it - it must be true. Never mind the fact that the PCE actually spends more CPU resource faking overlapping BG layers (thousands of cycles) than you ever would by simply setting a scroll register (30 cycles) on the SGX's 2nd BG layer. It's a little ridiculous haha.
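For anyone wondering what "use LUTs instead of computing trig" actually looks like, here's a minimal sketch of the classic approach: a 256-step angle and a precomputed fixed-point sine table, so a heading update costs two table reads and two multiplies instead of any runtime trig (the names, table size, and 8.8 fixed-point format are illustrative choices, not from any actual game):

```python
import math

# 256-entry sine table in 8.8 fixed point: angle 0..255 maps to 0..2*pi,
# values scaled so that sin = 1.0 is stored as 256.
SINE_TABLE = [round(math.sin(2 * math.pi * i / 256) * 256) for i in range(256)]

def fx_sin(angle):
    return SINE_TABLE[angle & 0xFF]

def fx_cos(angle):
    # cosine is just sine shifted a quarter turn (64 steps of 256)
    return SINE_TABLE[(angle + 64) & 0xFF]

def step(x, y, angle, speed):
    """Advance a projectile along a heading: two lookups, two multiplies."""
    return (x + speed * fx_cos(angle) // 256,
            y + speed * fx_sin(angle) // 256)

print(step(100, 100, 0, 4))   # heading 0 = straight along +x -> (104, 100)
```

On a 68k-era console the table would live in ROM; the only cost over "real" math is a few hundred bytes of storage, which is the trade-off being argued above.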
|
|
|
Post by spenoza on Mar 12, 2021 20:37:38 GMT
I largely agree with you. I feel like you're arguing less with the points I made and more with the arguments other folks make, but with my quotes as convenient foils, and that's OK. That said, these points have been made repeatedly around here and I think (I hope) we all understand the basics of this pretty well by now. I would hope none of us would be so foolish as to say, "It's better because it's 16-bit!" Is the PC Engine an older design than the SNES and Genesis? Yes, it is. Does that mean it's not at all competitive? No! Sure, the newer consoles have a few advantages, but who cares? The PC Engine has a few tricks of its own, and all those systems make great games when in the hands of good craftspeople.
|
|
|
Post by Black_Tiger on Mar 18, 2021 19:05:11 GMT
What's most silly about the PCE myths is that most people talk theoretically about what they insist the PCE can't do, and often back it up with their own misinterpretation of specs taken out of context...
But we actually have real games already. Even if say, the PCE hardware wasn't pushed as much as it could be, we already have lots of software that contradicts all of these myths and what stands out as much as anything among the PCE's strengths is how much it can handle in 2D games.
So many PCE games outperform the games that MD and SNES fans champion as the most impressive in those libraries, in exactly the areas that armchair console war spec sheet warriors insist come down to CPU power.
Thunder Force IV doesn't toss around nearly as much on screen when it slows down as many PCE shooters do without slowdown.
But it takes a 13.1MHz overclock to eliminate that slowdown - almost doubling the CPU speed. And all other sound still mutes whenever a sampled sound is played. Mega-CD games reported to use the faster CPU also don't outperform PCE games.
Similarly, SA-1 hacks for SNES games triple the cpu power just to get closer to the performance of many PCE games.
Yet it's the "bit" level of the PCE cpu that so many people use to "prove" how "weak" the console is.
|
|
|
Post by spenoza on Mar 18, 2021 19:20:45 GMT
Most comparisons of CPU horsepower are actually comparisons of programmer/developer ability. Compare Gradius III to Space Megaforce on SNES. One slows way down and the other breezes along. What those two games are doing is not fundamentally different on a game design level, but they are clearly quite different in terms of execution. Even if Treasure's Hideyuki Suganami was using the Genesis CPU for real-time calculations in Alien Soldier and Gunstar Heroes, if you could do the same thing with LUTs then that extra capability didn't actually do a lot for you except save a tiny bit of RAM. CPU speed and architecture and feature differences are important, but only so important.
|
|
|
Post by elmer on Mar 18, 2021 20:07:42 GMT
I'm just saying, from 1979 to 1988 it was all stagnation for 8bit processor design and clock speeds, when it didn't need to be. While I don't like WDC's poor design choices in the 65C816, I recently found out that Zilog released a really interesting, and very modern (for the times), software-compatible update to the Z80 in 1987 ... the Z280. The heart of RISC gets back to the roots of the 65x/6809 design philosophy; small/simple ISA but fast instruction times. The NEC V810 in the PC-FX is a beautiful example of that, and is IMHO an absolute joy to program in assembly language, nicer than the MIPS in the PlayStation, and infinitely better than the hideous SH-2 in the Saturn. Since the announcement of the end of further work on the MIPS architecture this month, that leaves the V850, a descendant of the V810, as the last survivor of those old RISC chips.
|
|
|
Post by turboxray on Mar 18, 2021 20:24:51 GMT
Not trying to complicate things, but NES games often employed tricks like 30Hz split collision detection, etc. In other words, not all collision detection for all object-to-object pairs is performed in a single frame. I've heard some SNES games employed this tactic to help out. It's not really a hindrance to the gameplay, since if the objects tend to move around at speeds of 1px per frame or less it won't result in a missed collision (a la Megaman and other NES platformers that get away with this). Though if you do have max rapid fire in Megaman with the pea shooter, not all hits will register/do damage. I do remember being impressed by Space Megaforce and often joked that the equivalent "special effect" on the SNES.. was not slowing down (since it has all that graphical "power" haha).
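A sketch of how that 30Hz split can be structured (hypothetical data layout, not code from any actual game): in a 60fps game, even frames test even-indexed enemies and odd frames test the rest, so every enemy still gets checked every other frame:

```python
def overlaps(a, b):
    """Axis-aligned bounding-box test on center/size dicts."""
    return (abs(a["x"] - b["x"]) * 2 < a["w"] + b["w"]
            and abs(a["y"] - b["y"]) * 2 < a["h"] + b["h"])

def check_collisions(player_shots, enemies, frame):
    # Split the work: each enemy is only tested on alternating frames,
    # halving the per-frame cost at the price of up-to-one-frame latency.
    hits = []
    for i, e in enumerate(enemies):
        if i % 2 != frame % 2:
            continue            # this enemy's turn comes next frame
        for s in player_shots:
            if overlaps(s, e):
                hits.append((s, e))
    return hits
```

This is why fast-moving objects (like max rapid fire) can slip a hit: if the shot passes through the enemy entirely between that enemy's test frames, the overlap is never seen.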
And I do think there are two ways to categorize a processor's power: out of the box (without much effort), and optimized (requires specific effort). It's fair to say the 68k has a higher level of "out of the box" performance. The SNES definitely sits at the far end of the optimized spectrum. I think the PCE sits quite a bit closer to the 68k in this regard, but still requires more effort/skill for optimized code than the same on the 68k. Keep in mind, if the SNES CPU ran unhindered (no wait states on RAM) at even 5.37MHz, it would easily rival both the 68k and the 6280. And don't forget that the MD runs a full 8500 cycles more per frame than the PCE. That's 7% faster. That's a decent amount; that's the cost of playing an extra streaming sample, or another full sprite frame update in something like SF2 or a beat'em up. In other words, it's impressive on the PCE's side, since it has more overhead than the MD (like audio, samples, slower vram updating), yet still keeps up.
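Those per-frame numbers fall out of the stock clocks directly (a sketch using rounded NTSC clock values of ~7.67MHz for the Genesis 68k and ~7.16MHz for the HuC6280, at 60 frames per second):

```python
# Cycles available per 60Hz frame at stock clock speeds (rounded NTSC clocks).
MD_CLOCK  = 7_670_000   # Genesis 68k, ~7.67MHz
PCE_CLOCK = 7_160_000   # PCE HuC6280, ~7.16MHz

md_frame  = MD_CLOCK // 60    # cycles per frame on the MD
pce_frame = PCE_CLOCK // 60   # cycles per frame on the PCE

print(md_frame - pce_frame)                             # extra MD cycles/frame
print(round(100 * (md_frame - pce_frame) / pce_frame))  # as a percentage
```

With these rounded clocks the gap works out to 8500 cycles per frame, about 7%, matching the figures above.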
And in all of this, in relation to slowdown: what does it mean to have slowdown? Whether you miss a frame by 1% of CPU resource or by 95%, it results in the same effect; typically 50% slowdown, or 30Hz. It would be interesting to analyze games that slow down and see how far they are missing that mark. Since the SNES is the crowned king of slowdown, it would be interesting to see how much more clock speed would have helped (with the code as is, no optimizations).
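The "miss by 1% or by 95%, same result" point comes from vblank syncing: a frame that overruns its budget at all has to wait for the next vblank, so the effective rate snaps to 30Hz, 20Hz, 15Hz and so on. A simplified model (assuming the game waits for vblank rather than tearing):

```python
import math

def effective_fps(work_time_ms, budget_ms=1000 / 60):
    """Effective frame rate when a frame's work takes work_time_ms.

    Any overrun of the ~16.67ms budget costs a whole extra vblank,
    so the result is quantized: 60, 30, 20, 15... fps.
    """
    frames_needed = math.ceil(work_time_ms / budget_ms)
    return 60 / frames_needed

print(effective_fps(16.0))   # under budget        -> 60.0
print(effective_fps(16.8))   # missed by ~1%       -> 30.0
print(effective_fps(33.0))   # missed by ~98%      -> 30.0
print(effective_fps(33.5))   # just over 2 frames  -> 20.0
```

So measuring *how far* a game overshoots the budget would tell you whether a modest clock bump could have rescued it, which is exactly the analysis being wished for above.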
As to developers' ability, and bad examples, I think the PCE has a couple of clunkers. While I love Bloody Wolf as a game, the game code is absolutely horrible haha. I swear there's slowdown to 20Hz (maybe even 15Hz.. I need to check)! That makes no sense.
Just something else to point out for context: large enemies don't take more processor power for collision and AI. The bounding box for collision, and the CPU resource to check it, is the same whether it's 4x4 or 200x200. Meta-sprites, on the other hand, can technically cause some overhead. But Bloody Wolf doesn't use meta-sprites like that. And the PCE doesn't have smaller 8x8 sprite cells, so you rarely see a meta-sprite with more than two cells.. maybe three on the PCE. Which is why the "only has 64" sprite table is not an issue on the PCE. It is on the Genesis, because if you start using meta-sprites with 8x8 cells to optimize for vram usage (and sprite scanline usage), you NEED those 80 sprite entries. If anything, 80 isn't enough if you really want to optimize. This is exactly why the SNES has 128 entries: if you want to optimize for vram usage and the sprite scanline limit, you need to pick the paired 8x8/16x16 sprite mode (it's fixed for the screen). But that means really eating through that 128-entry sprite table, and that means a LOT more meta-sprites to decode into real hardware cells. For instance, in Turtles in Time for the SNES, Leonardo's meta-sprite is something like 18 real sprite cells. I converted that on the PCE, and it's 6 real sprite cells (because I was optimizing for the scanline limit). So the PCE version has less of an impact on its 64-entry sprite table than the SNES version has on its 128-entry table. And because of the lower clock speed, the SNES takes a bigger hit for meta-sprite decoding than the MD and PCE. And it has a convoluted X/Y MSB offset that slows things down. So in some ways, what was said about the SGX CPU not being enough for the capable hardware is actually much more true in reality on the SNES. SNES games that tend to have less slowdown are the games that use the paired 16x16/32x32 sprite mode. Final Fight 2 and 3 use this mode, as do most beat'em ups.
If you're ever curious about this stuff on the SNES, the bsnes-plus emulator and debugger makes it really nice to look at (and has a visual cue for this as well). I looked at like 20 SNES beat'em ups a couple of months ago. I was curious to see if they were optimized for vram and sprite scanline drop-out like the Streets of Rage series (2 and 3). I would say a good 90% of them were not (and used paired 16x16/32x32). And the ones that were optimized with 8x8 cells were optimized for vram usage, not sprite drop-out (they actually caused more potential sprite drop-out).
|
|
|
Post by turboxray on Mar 18, 2021 20:34:34 GMT
I'm just saying, from 1979 to 1988 it was all stagnation for 8bit processor design and clock speeds, when it didn't need to be. While I don't like WDC's poor design choices in the 65C816, I recently found out that Zilog released a really interesting, and very modern (for the times), software-compatible update to the Z80 in 1987 ... the Z280. I actually have an MCU variant of that! I met this retired engineer from Microchip a few years ago, and he was making SBCs for CE/EE students. He had it booting to LISP instead of BASIC, haha, and he had written some macros for I/O loading and stuff. He had CP/M running on it too (you loaded it through the LISP REPL interface). He wanted me to do some software things for it, but I kept telling him that I was swamped with school at the time. I did want to do a multitasking conversion of CP/M on it (using his provided source), but I never got to finish that. But yeah, it was cool to see the upgraded Z80. I went to the Maker Faire with him in San Diego to help drum up interest in it. Sadly, I fell out of touch with him. I need to buy a PC-FX again. Did you ever clock the transfer rate to vram? I mean, that V810 should be able to saturate it, unlike the 6280.
|
|
|
Post by spenoza on Mar 18, 2021 21:05:14 GMT
So in some ways, what was said about the SGX cpu not being enough for the capable hardware, is actually much more true in reality on the SNES.
I think this is also something folks overlook. The interplay between the CPU and the audio and graphics subsystems is a big deal. How your graphics chips handle sprite and tile data has a huge impact on what you're doing with your CPU, as you demonstrated pretty well. Furthermore, even when you have a co-processor to pick up some of the slack, like the SNES and Genesis have for audio, you still have to contend with memory access. This stuff is never simple, and the only way to truly evaluate a system is to look at its output.
|
|
|
Post by elmer on Mar 18, 2021 21:51:11 GMT
I actually have an MCU variant of that! Cool, I'm jealous! Plasmo has built a few Z280 boards now, but if I'm reading things right, I think he found that it wasn't much faster than a similarly-clocked Z80 unless you run it with a 16-bit bus. Even at 12MHz, you're only getting 4 memory accesses per microsecond, so you really need the 16-bit bus (with its burst mode) in order to keep the pipeline flowing at its maximum speed. I need to buy a PCFX again. Unless dshadoff comes up with a way to load test programs from a PC, I'd recommend that you wait until you can find a PC-FXGA DOS/V card instead of a standard PC-FX. Did you ever clock the transfer rate to vram? I mean that v810 should be able to saturate it unlike the 6280. Nope, sorry! The last thing that I did on the PC-FX was to help out with the Team Innocent translation. Somehow I never got back to continuing my work on liberis once I had the V810 GCC compiler finished (which the VirtualBoy community has been using). It is something that I need to go back to, especially since I got the V810 GCC compiler to build simple C++ code.
|
|
|
Post by turboxray on Mar 18, 2021 22:25:55 GMT
So in some ways, what was said about the SGX cpu not being enough for the capable hardware, is actually much more true in reality on the SNES.
I think this is also something folks overlook. The interplay between the CPU and the audio and graphics subsystems is a big deal. How your graphics chips handle sprite and tile data has a huge impact on what you're doing with your CPU, as you demonstrated pretty well. Furthermore, even when you have a co-processor to pick up some of the slack, like the SNES and Genesis have for audio, you still have to contend with memory access. This stuff is never simple, and the only way to truly evaluate a system is to look at its output.
It is interesting, right? How hardware both provides advantages and disadvantages directly related to CPU resource or power. To the point where, without that 512k or 1 megabyte of RAM, the Atari ST in many ways is less capable than the SMS for games (at least at 60Hz.. but even some games at 30Hz too). When I was doing the Megaman hacked rom to run on the PCE, and I was replacing the metasprite routine in the game to use 16x16 and 32x32 native PCE sprites, it actually sped up the original NES code quite a bit. If I remember correctly, the Megaman character is made up of 9 8x8 sprites. So that means it's looping through and doing 9 transformations on real hardware 8x8 sprites: X offset, Y offset, palette, tile number. And then it's checking each one of those for clipping against the side of the screen. That's just for Megaman.. it does that for enemies too. The NES would definitely be faster if it had larger sprites and didn't have to deal with all of that. Enough to eliminate some slowdown? Potentially. It got me thinking: what kind of game could you make on the PCE if you kept the CPU in slow mode (1.79MHz)? That would be an interesting challenge haha.
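As a sketch of the kind of work that 9-cell decode loop does every frame (a hypothetical data layout modeled in Python rather than 6502 code; the cell table is invented, not ripped from the actual game):

```python
# A 3x3 grid of 8x8 cells, roughly how a 24x24 character is built on the NES.
# Each entry is (dx, dy, tile, palette).
MEGAMAN_CELLS = [(x * 8, y * 8, y * 3 + x, 0)
                 for y in range(3) for x in range(3)]

def decode_metasprite(x, y, cells, screen_w=256):
    """Turn one meta-sprite into hardware sprite entries.

    Every cell gets its own X/Y offset, tile, and palette applied, plus
    a clip test against the screen edge - the per-frame cost being
    described above, paid once per cell, per object.
    """
    hw_sprites = []
    for dx, dy, tile, pal in cells:
        sx = x + dx
        if -8 < sx < screen_w:      # skip cells fully off-screen
            hw_sprites.append((sx, y + dy, tile, pal))
    return hw_sprites

# Nine hardware entries decoded for the player alone, every frame;
# with 16x16/32x32 cells the same character needs far fewer entries.
print(len(decode_metasprite(100, 100, MEGAMAN_CELLS)))  # -> 9
```

Swap the 8x8 cell table for one 16x16/32x32 entry and the loop body runs once or twice instead of nine times, which is why replacing the routine with native PCE sprites sped the game up.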
|
|
|
Post by turboxray on Mar 18, 2021 22:31:27 GMT
I actually have an MCU variant of that! Cool, I'm jealous! Plasmo has built a few Z280 boards now, but if I'm reading things right, I think that he found that it wasn't much faster than a similarly-clocked Z80 unless you run it with a 16-bit bus. Even at 12MHz, you're only getting 4 memory accesses per microsecond, so you really need the 16-bit bus (with its burst-mode) in order to keep the pipeline flowing at its maximum speed. From what was explained to me, there's a compatibility mode and a native mode. In native mode, with the pipeline, it would execute instructions much faster. I don't remember all the details, but I think this was clocked at 50MHz max, and with the pipeline it was equivalent to 100MHz (or something like that). Native mode also had extended registers for 24bit operations, but the instructions weren't extended to match. So you had to swap/load to get a byte into the highest part of the register, or something to that effect. I.e. you get some instruction bloat to use the 24bit operations because of how you load the registers. Edit: Okay, I was wrong. This wasn't the Z280 - it was the eZ80. So not quite a Z280. It has Z80 and Z180 modes.
|
|