MAP instruction

  • Nice job, Gurce!!!

    Thanks for the work, this is very useful.

  • Didn't get notifications on those last two posts.


    Finally got around to finding my Nexys board and plugging it into my laptop just now. I don't have a keyboard or VGA monitor attached, so I just have PuTTY serial terminal output to look at.


    So one thing I thought I'd check was whether there is a difference between Xemu's and the hardware's display of the MAPL and MAPH registers when viewed through the serial debug port.


    Yes, it looks like it:


    For XEMU, when in C65-mode:


    PC   A  X  Y  Z  B  SP   MAPL MAPH LAST-OP P  P-FLAGS  RGP uS IO
    E1A7 00 01 18 08 00 01F2 0300 0300 F022    00 --E---Z-



    For Nexys board, when in C65-mode:


    PC   A  X  Y  Z  B  SP   MAPL MAPH LAST-OP P  P-FLAGS  RGP uS IO
    E1A8 00 00 0F 08 00 01F2 B300 E300 F0FA    22 ..E...Z.



    So I can confirm that Xemu is missing the upper 4 bits of MAPL and MAPH.
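
    To make the missing nibble concrete, here is a small Python sketch that decodes the MAPL/MAPH values from the two dumps. It assumes the standard C65 MAP layout (bits 12-15: enable flags for the four 8KB blocks of that 32KB half, lowest block in bit 12; bits 0-11: offset in 256-byte units) and ignores the MEGA65 megabyte-select extension. `decode_map_reg` and `translate` are illustrative names, not part of Xemu or the monitor.

```python
# Sketch: decode MAPL/MAPH as shown by the serial monitor, assuming the
# standard C65 MAP layout:
#   bits 12-15: enable flags for the four 8KB blocks of that 32KB half
#   bits 0-11 : offset in 256-byte units (20-bit offset = value << 8)

def decode_map_reg(value):
    """Return (enabled block indices within the half, 20-bit offset)."""
    enable_bits = (value >> 12) & 0xF
    offset = (value & 0x0FFF) << 8
    blocks = [i for i in range(4) if enable_bits & (1 << i)]
    return blocks, offset

def translate(cpu_addr, mapl, maph):
    """Translate a 16-bit CPU address, or return None if it is not mapped.

    Blocks 0-3 ($0000-$7FFF) use MAPL, blocks 4-7 ($8000-$FFFF) use MAPH.
    The MEGA65 megabyte-select extension is ignored (20-bit result).
    """
    block = cpu_addr >> 13              # which 8KB block (0-7)
    reg = mapl if block < 4 else maph
    blocks, offset = decode_map_reg(reg)
    if block % 4 not in blocks:
        return None                     # unmapped: the normal memory map applies
    return (cpu_addr + offset) & 0xFFFFF

# Values taken from the two dumps above
print(decode_map_reg(0xB300))           # hardware MAPL
print(decode_map_reg(0xE300))           # hardware MAPH
print(translate(0xE1A8, 0xB300, 0xE300))  # hardware: PC lies in a mapped block
print(translate(0xE1A7, 0x0300, 0x0300))  # Xemu: no enable bits at all
```

    Under these assumptions the hardware PC at $E1A8 resolves to $3E1A8, whereas Xemu's $0300/$0300 leaves every block unmapped, which is exactly the difference the dumps show.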


    OK, I'll try to refresh my memory on this thread once more (my brain has gone cold on it again), and try to get re-acquainted with the Nexys board, too.

  • To be honest, this MAP concept seems really bizarre and inflexible to me:

    - we have just two possible offsets, one for the lower 32KB and one for the upper 32KB, so we have to forget about creating complicated memory maps; they are simply not possible
    - to access memory above 1MB you have to call MAP twice
    - each time you use MAP, you lose the contents of all the CPU registers, so they have to be saved/restored
    - each time you map one memory area somewhere, you lose the previous mapping (this has already led to a problem, see here: https://github.com/MEGA65/open-roms/issues/43)

    Why can't we have a more flexible, easier-to-use solution? My proposal: create a next-generation MAP mnemonic; it could look something like this:
    - $5C $5C: opcode (a double MAP, which probably makes no sense on the original C65)
    - a byte whose 4 upper bits are all 1 (to enable mapping) or all 0 (to disable mapping); the remaining values are reserved for future extensions
    - ... and whose 4 lower bits select an 8KB block of the 64KB address space
    - if enabling the mapping: 16 bits selecting the offset from $0000000 (in steps of $1000 bytes)
    Calling the classic MAP clears the new MAP settings, and calling the new MAP clears the classic MAP settings. Registers, flags, etc. are left untouched.

    Something like this would preserve the spirit of the original mapper, while being more flexible (for example you could independently change mapping of distinct 8KB blocks in the interrupt routine and the main code) and less annoying (only 5 bytes to map a single region, only 3 to unmap it, no need to preserve any registers).
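
    A rough Python sketch of how the proposed encoding could be assembled into bytes. Everything beyond the bullet points is an assumption: the 16-bit value is treated as an offset added to the CPU address (as with the classic MAP), it is stored little-endian after the control byte, and only block numbers 0-7 are accepted. The function names are hypothetical.

```python
# Hypothetical encoder for the proposed "next-generation MAP".
# Assumptions: the 16-bit offset (in $1000-byte units) follows the control
# byte little-endian, as 6502-family operands usually are; blocks are 0-7.

PREFIX = bytes([0x5C, 0x5C])    # MAP MAP, as proposed

def encode_map_block(block, offset):
    """Map 8KB CPU block 0-7 with the given offset (a multiple of $1000)."""
    if not 0 <= block <= 7:
        raise ValueError("block must be 0-7")
    if offset % 0x1000 or not 0 <= offset <= 0xFFFF000:
        raise ValueError("offset must be a multiple of $1000 up to $FFFF000")
    units = offset // 0x1000            # 16-bit value, one unit = $1000 bytes
    # Upper nibble 1111 = enable mapping, lower nibble = block number
    return PREFIX + bytes([0xF0 | block, units & 0xFF, units >> 8])

def encode_unmap_block(block):
    """Unmap 8KB CPU block 0-7 (upper nibble 0000 = disable mapping)."""
    if not 0 <= block <= 7:
        raise ValueError("block must be 0-7")
    return PREFIX + bytes([block])

print(encode_map_block(2, 0x40000).hex())   # map $4000-$5FFF with offset $40000
print(encode_unmap_block(2).hex())          # unmap it again
```

    The byte counts match the proposal: 5 bytes to map one block, 3 to unmap it.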

  • Yeah, it's kinda strange, but this is how the C65 works, so there isn't much choice here. But indeed, the MEGA65 could have its own "extra" modes, like the one you invented :) Personally I prefer 4*16K: map any 16K slot of the CPU-visible address space to any 16K page of memory. Especially since this is so common on other computers, even the C64DTV, and on many Z80-based machines as well. But maybe some would prefer a finer resolution than 16K.
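
    A minimal Python model of such a 4*16K mapper, assuming a 28-bit physical address space (so a page number needs 14 bits) and an identity mapping at reset. `SlotMapper` is a hypothetical name, not an existing MEGA65 feature.

```python
# Sketch of a 4 x 16K slot mapper: each 16K slot of the CPU-visible 64K can
# point at any 16K page of physical memory.
# Assumption: 28-bit physical address space -> 14-bit page numbers.

SLOT_SIZE = 0x4000
PAGE_BITS = 14

class SlotMapper:
    def __init__(self):
        self.pages = [0, 1, 2, 3]       # identity mapping: slot n -> page n

    def set_slot(self, slot, page):
        if not 0 <= slot <= 3 or not 0 <= page < (1 << PAGE_BITS):
            raise ValueError("slot must be 0-3, page 0-16383")
        self.pages[slot] = page

    def translate(self, cpu_addr):
        slot = cpu_addr >> 14           # top two address bits pick the slot
        return self.pages[slot] * SLOT_SIZE + (cpu_addr & (SLOT_SIZE - 1))

m = SlotMapper()
m.set_slot(2, 0x40)                     # $8000-$BFFF -> physical $100000
print(hex(m.translate(0x8123)))
```

    Translation is a single table lookup on the top two address bits, which keeps this style of banking simple to describe.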


    Btw, one problem with a double MAP: the CPU cannot know in advance that a second MAP follows. So it will do the C65 stuff on the first MAP, probably paging out its own code before it reaches the second. Compare the "NEG NEG" prefix of the MEGA65: the first NEG negates the accumulator, and so does the second, restoring the original accumulator content, so there's no problem; the pair then acts only as a prefix for the next instruction. MAP is dangerous to use as a prefix (for the second MAP ...) because it has side effects that cannot be undone in some cases (depending on the A/X/Y/Z contents), and you get a crash.
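
    The NEG NEG argument can be checked exhaustively. In this Python sketch `neg8` models NEG on the 8-bit accumulator (two's-complement negation modulo 256); applying it twice returns every possible value unchanged, which is why the first NEG is harmless.

```python
# Two's-complement negation on 8 bits is its own inverse, so NEG NEG leaves
# the accumulator unchanged whatever it held. MAP has no such property: the
# first MAP already changes the memory map before the second is fetched.

def neg8(a):
    """8-bit two's-complement negation, as NEG does on the accumulator."""
    return (-a) & 0xFF

assert all(neg8(neg8(a)) == a for a in range(256))
print("NEG NEG is the identity for all 256 accumulator values")
```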

  • Yes, the MAP instruction is both a key element of the C65 architecture, and a general purpose abomination in many ways ;) To be fair to the creators, it was a very simple way to add the ability to access up to 1MB of memory. For running code in other areas of memory, it's actually not that bad.


    The limitation of the two 32KB halves helps to make it possible to do the address translation in real time on every CPU cycle. Making it much more flexible could cause timing problems, especially for the MEGA65 at 40MHz.


    I did start implementing a "far jump" mechanism to avoid the need to use MAP for most purposes, and when I get the chance will likely finish it off. I have already extended MAP once, by adding the megabyte-select mode to it. LGB is right that having MAP MAP would cause problems. But we could have NEG NEG MAP or something like that, for some sort of enhanced MAP. It would also be possible to make MAP load the registers with whatever was in the mapping before the MAP opcode was called, so that it would be easier to clean up after MAPing to do something. We can think about this once the machine is available, as it won't hurt backwards compatibility.
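
    A Python model of the "MAP returns the previous mapping" idea, purely to illustrate the semantics; no such opcode exists, and `map_swap` and the register tuple are hypothetical. The example values follow the C65 layout, where Z=$F3 enables blocks 4-7 with offset bits 16-19 set to 3 (an offset of $30000 when Y=0).

```python
# Model of a MAP variant that hands back the previous mapping in the
# registers, so calling it again with the returned values undoes it.
# Hypothetical: this only illustrates the proposed semantics.

class Mapper:
    def __init__(self):
        self.regs = (0x00, 0x00, 0x00, 0x00)   # A, X, Y, Z as last given to MAP

    def map_swap(self, a, x, y, z):
        """Install a new mapping and return the old one as (A, X, Y, Z)."""
        old = self.regs
        self.regs = (a, x, y, z)
        return old

cpu = Mapper()
saved = cpu.map_swap(0x00, 0x00, 0x00, 0xF3)   # map upper 32KB, offset $30000
# ... use the mapped memory ...
cpu.map_swap(*saved)                           # restore the previous mapping
print(cpu.regs)
```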


    LG

    Paul.

  • Yeah, maybe we "abuse" poor MAP :) I mean, for normal usage it's more the VIC-III ROM mapping stuff that gets used; IMHO MAP was more of an extra, at least not meant as a very generic way to do fancy things. As far as I can see ... But exactly this is the problem with Commodore machines in general: all the memory paging/banking schemes are kinda odd and special, and there is no clean, generic, standardized way to easily use even megabytes of memory, as other 8-bit micros were capable of.

  • The granularity of the C65 mapping is 8KB, and for my purposes it's quite convenient.


    16 KB granularity would be problematic for OS-friendly software on the C64. Mapping the lowest 16KB ($0000-$3FFF) would hide all the Kernal variables, including the ones the keyboard scanner uses 50 times per second; not fun. Mapping the highest 16 KB ($C000-$FFFF) would cover the interrupt vectors/code and the I/O area; even less fun. This would leave two 16KB areas for use ($4000-$7FFF and $8000-$BFFF), but even these are not very safe, as (depending on whether a cartridge is present) either $7E00-$7FFF or $9E00-$9FFF is the standard location for the RS-232 buffers.

    That's why for now I use the $4000-$5FFF range in the OpenROMs ($2000-$3FFF is probably equally safe; I simply don't know which is better), and I only move routines to the alternative ROM area if they are unlikely to be called while interrupt-driven sfx/gfx effects are active (like VIC-II setup :) ).
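
    A worked check of this reasoning in Python: list the reserved ranges named in the posts and see which aligned 8KB and 16KB windows avoid all of them. The exact boundaries are assumptions, for instance treating $0000-$07FF as needed for zero page, stack, system variables and the default screen, and treating the BASIC ROM as busy because the mapping code runs there.

```python
# Which aligned 8KB / 16KB windows avoid everything a running Kernal/BASIC
# still needs? Ranges follow the posts; exact boundaries are assumptions.
RESERVED = [
    (0x0000, 0x07FF, "zero page, stack, Kernal/BASIC variables, default screen"),
    (0x7E00, 0x7FFF, "RS-232 buffers (with cartridge)"),
    (0x8000, 0x8008, "cartridge vectors + CBM80 signature (NMI takeover)"),
    (0x9E00, 0x9FFF, "RS-232 buffers (without cartridge)"),
    (0xA000, 0xBFFF, "BASIC ROM (the code doing the mapping runs here)"),
    (0xD000, 0xDFFF, "I/O area"),
    (0xE000, 0xFFFF, "Kernal ROM, interrupt code and vectors"),
]

def free_windows(size):
    """Start addresses of size-aligned windows that hit no reserved range."""
    free = []
    for start in range(0, 0x10000, size):
        end = start + size - 1
        if all(end < lo or start > hi for lo, hi, _ in RESERVED):
            free.append(start)
    return free

print([hex(a) for a in free_windows(0x2000)])   # 8KB windows
print([hex(a) for a in free_windows(0x4000)])   # 16KB windows
```

    Under these assumptions only the 8KB windows at $2000 and $4000 survive, and no 16KB window does, which matches the choice of $4000-$5FFF (or $2000-$3FFF).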


    'JMP far' has (besides potentially covering the RS-232 buffers) another disadvantage: imagine you want to print something on the screen from the 'far' code. You can't just call CHROUT, as (depending on what the application did) the screen memory can be hidden by the mapping. With the MAP/EOM approach, I have created a set of proxy functions in the standard ROM area which disable the mapping, do what has to be done (for example, call CHROUT), enable the mapping again, and RTS. Kinda slow, but for listing BASIC program code, printing SYNTAX ERROR, or displaying the startup banner, it should be fast enough, and this frees over 1.5 KB in the standard BASIC ROM area (and at some point we'll want to add support for BASIC 10 tokens).


    The ability to 'clean up' and restore the previous mapping would indeed be very helpful: we could then use mapping even from interrupt handlers. I think the sanest approach would be to push/pull the configuration to/from the stack.
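
    A Python model of the push/pull idea, assuming each configuration is just the A/X/Y/Z tuple given to MAP; `StackedMapper`, `push_map` and `pull_map` are hypothetical names for illustration. Because each user restores exactly what it found, an interrupt handler can map memory without disturbing the mapping of the code it interrupted.

```python
# Keep the mapping configuration on a stack so nested users (main code,
# BASIC, Kernal, interrupt handlers) can each map what they need and then
# restore what was there before. Names and tuple format are hypothetical.

class StackedMapper:
    def __init__(self, initial=(0, 0, 0, 0)):
        self.current = initial          # (A, X, Y, Z) as given to MAP
        self.stack = []

    def push_map(self, new_config):
        """Save the current mapping, then switch to new_config."""
        self.stack.append(self.current)
        self.current = new_config

    def pull_map(self):
        """Restore the mapping active before the matching push_map."""
        self.current = self.stack.pop()

MAIN = (0x00, 0x44, 0x00, 0x00)   # C65 layout: $4000-$5FFF, offset $40000
IRQ = (0x00, 0x00, 0x00, 0x83)    # C65 layout: $E000-$FFFF, offset $30000

m = StackedMapper()
m.push_map(MAIN)                  # main code maps its block in
m.push_map(IRQ)                   # an interrupt arrives and maps something else
m.pull_map()                      # interrupt returns: main mapping is back
print(m.current)
```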


    ----


    Of course, the above is just my opinion; I'm not familiar with FPGA technology and internal CPU design at all (so I don't know how problematic it would be), and I don't know what game developers would think about it.

  • Yeah, maybe 8K is better. I just said 16K because the ideal would be total flexibility: allowing ANY part of the MEGA65 address space to be mapped into ANY of the "slots", be they 8K or 16K in size. With 8K slots, that's already quite a significant amount of information to pass to the mapper if several memory ranges are to be remapped. Well, maybe it's about balance: the C65 way is not very flexible but doesn't need much to be passed, and a fully flexible mapper is the opposite, hmm. However, I should admit that when I talk about my "dream mapper" I usually don't think about using it for "Commodore-like" stuff, but for my own programs that don't use the KERNAL or anything else ;) E.g. my Z80 emulation plans and other very crazy ideas, which are probably minority needs ;)
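
    Some rough arithmetic for that trade-off in Python, assuming a 28-bit (256MB) physical address space and counting only the page-number bits per slot (no enable flags): a fully flexible 8K mapper needs 8 x 15 = 120 bits, a 16K one 4 x 14 = 56 bits, while a single C65 MAP takes 32 bits in A/X/Y/Z.

```python
# How many bits describe a complete "any slot -> any page" mapping?
# Assumptions: 28-bit physical address space, 64KB CPU-visible space,
# page-number bits only (no enable flags).

ADDR_BITS = 28
CPU_SPACE = 0x10000

def bits_for_full_mapping(slot_size):
    """slot_size must be a power of two."""
    slots = CPU_SPACE // slot_size
    page_bits = ADDR_BITS - (slot_size.bit_length() - 1)
    return slots * page_bits

C65_MAP_BITS = 4 * 8              # A, X, Y and Z passed to one MAP

print(bits_for_full_mapping(0x2000))   # 8K slots
print(bits_for_full_mapping(0x4000))   # 16K slots
print(C65_MAP_BITS)
```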

  • Yes. Don't get me wrong, 16K granularity would be enough for me if the C64 had been designed for such banking from the start, but unfortunately it wasn't (I forgot to mention the trick of taking over the NMI via the CBM80 magic pattern a few bytes after $8000... yet another nasty risk).

    To summarize:

    - if I were able to alter the mapping without trashing 5 registers (.A, .X, .Y, .Z and status), it would have saved me some code and probably given a nice performance boost (on the 6502, push/pull instructions are expensive; I'm not sure how the MEGA65 handles them)
    - the ability to easily store/restore the current mapping would make it much easier to coordinate the Kernal wanting to do some mapping, BASIC wanting to do some mapping, and applications wanting to do even more mapping
    - if it were additionally possible to increase the number of available offsets (even to just 4, one per 16 KB of address space, while keeping the 8 KB mapped/unmapped granularity), this would be pure luxury for me :)
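
    A Python sketch of that "luxury" variant, to show how the two granularities could combine: the 8KB block decides whether an address is mapped, and the 16KB quarter selects which of four offsets applies. Parameter names and the plain-integer offsets are assumptions for illustration.

```python
# Four offsets (one per 16KB quarter) with an 8-bit enable mask (one bit per
# 8KB block). Hypothetical layout, for illustration only.

def translate4(cpu_addr, offsets, enable_mask):
    """Return the physical address, or None if the 8KB block is unmapped."""
    block = cpu_addr >> 13            # 8KB block 0-7 decides mapped/unmapped
    if not enable_mask & (1 << block):
        return None
    quarter = cpu_addr >> 14          # 16KB quarter 0-3 selects the offset
    return cpu_addr + offsets[quarter]

offs = [0x0, 0x40000, 0x80000, 0x0]
print(hex(translate4(0x5000, offs, 0b00000100)))   # $4000-$5FFF mapped
print(translate4(0x7000, offs, 0b00000100))        # $6000-$7FFF left alone
```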

    Of course I agree that at the moment there are much more important things to do, and I'm definitely against introducing any official extensions without extensive analysis :)

  • I can't remember how I structured the far call stuff, but I think it also used a low-memory 8KB or 16KB bank, leaving the KERNAL / BASIC maps untouched, so it might be okay.


    The other thing that strikes me, is that maybe we should have an option to choose where in memory the KERNAL, BASIC and C65 ROM blocks come from, when banked in.


    As for the cost of saving/restoring registers, I think it is 5 cycles to push everything and maybe 10 to pop everything, or it might also be 5 for the pops; I can't remember precisely. Either way, it is much better than on the 6502. The 4502 lets you:


    PHA
    PHX
    PHY
    PHZ
    PHP

    ...

    PLP
    PLZ
    PLY
    PLX
    PLA


    instead of the 6502's:


    PHP
    PHA
    TXA
    PHA
    TYA
    PHA

    ...

    PLA
    TAY
    PLA
    TAX
    PLA
    PLP


    So the 4502 needs fewer instructions, even though it saves one more register.
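
    Tallying both sequences in Python: every instruction here is a one-byte implied-mode opcode, so instruction count equals byte count. The cycle total uses the standard NMOS 6502 timings; no 4502 cycle figures are included, since the exact values weren't remembered above.

```python
# Compare the two register save/restore sequences. All opcodes are one byte,
# so instruction count == byte count. Cycles are standard NMOS 6502 timings.

CYCLES_6502 = {"PHA": 3, "PHP": 3, "PLA": 4, "PLP": 4,
               "TXA": 2, "TYA": 2, "TAX": 2, "TAY": 2}

SEQ_4502 = ["PHA", "PHX", "PHY", "PHZ", "PHP",
            "PLP", "PLZ", "PLY", "PLX", "PLA"]          # saves A X Y Z P
SEQ_6502 = ["PHP", "PHA", "TXA", "PHA", "TYA", "PHA",
            "PLA", "TAY", "PLA", "TAX", "PLA", "PLP"]   # saves A X Y P

cycles_6502 = sum(CYCLES_6502[op] for op in SEQ_6502)
print("4502:", len(SEQ_4502), "instructions/bytes")
print("6502:", len(SEQ_6502), "instructions/bytes,", cycles_6502, "cycles")
```

    The 6502 version needs 12 instructions (36 cycles) to save and restore four registers, versus 10 instructions on the 4502 for five.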


    LG

    Paul

  • It's already using PHX, PHY, and family. Well, hopefully we'll have the dev kits this year to test.


    To be honest, I'm a bit worried about turbo tape loading, as it temporarily changes the mapping to store each byte in RAM, and we need to finish everything before the next short pulse arrives; we may have to force a higher CPU clock frequency during tape loading, or use some other technique to store the decoded bytes. Also, if the instruction timing is much different, the IEC code will need adapting, as it currently uses delay loops, not a CIA timer. JiffyDOS will probably be the hardest to fix, as its protocol tolerances are really narrow.
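
    To illustrate the delay-loop concern, a Python sketch of how a busy-wait's duration scales with the clock. It assumes a 6502-style DEX/BNE loop at roughly 5 cycles per iteration (2 + 3 for a taken branch with no page crossing) and, purely for illustration, the same cycle count at 40 MHz; real MEGA65 instruction timings may differ.

```python
# A busy-wait's duration is iterations * cycles_per_iteration / clock.
# Assumption: DEX (2) + taken BNE (3) = 5 cycles per iteration, and the same
# cycle count at 40 MHz (illustrative only).

PAL_C64_HZ = 985_248

def loop_time_us(iterations, cycles_per_iter, clock_hz):
    """Approximate duration of a delay loop in microseconds."""
    return iterations * cycles_per_iter / clock_hz * 1e6

for clock in (PAL_C64_HZ, 40_000_000):
    print(f"{clock} Hz: {loop_time_us(20, 5, clock):.2f} us")
```

    A loop tuned for about 100 us at C64 speed would finish in a few microseconds at 40 MHz, far outside a narrow protocol window, which is why timer-based waits are more robust.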