CyRUS64- I was using the wrong term when I said "Dynamic" Recompilation, which, as you so nicely pointed out, means recompiling on the fly (hence "dynamic" rather than "static", I suppose), but that doesn't make the rest of what I said invalid. I haven't messed with any 64 emulators since the first ones came out, but I don't remember hearing about any that recompile the entire ROM before executing, so that no cycles are wasted on translation while the game is running. There are speed increases to be had through this method because of the differences in hardware. For instance, in your 64 emulators you're running code that was optimized for 64 hardware, so it was not written to take advantage of a processor that can do out-of-order execution, or it may waste cycles on bit manipulation that isn't necessary on the new CPU, etc.
And to AlphaWolf- As I mentioned before, barring things like differences in register sizes or entirely missing functions (like the inability to do integer addition or something), most processors have the same basic operations. And even if it does take 2-8 instructions to perform the equivalent of some of them, that doesn't by itself make the result slower or faster. That's like saying RISC CPUs are always slower than CISC because they require more instructions to get something done (BTW, most will say RISC is faster...). You're also forgetting that it can go the other way: something that takes 2-8 instructions on the emulated CPU may be doable in 1 on the Intel CPU. Besides which, any wildly different low-level instructions are probably only going to deal with the memory subsystem and calls to the graphics hardware, all of which are going to be replaced by different calls that do not perform the exact same function anyway, since the hardware is different.
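To make the "2-8 instructions down to 1" case concrete, here is a rough sketch in Python. It assumes a MIPS-style guest, where loading a 32-bit constant commonly takes a `lui` followed by an `ori`, and a host that can move a 32-bit immediate in one instruction. The tuple format, the `mov` host op, and the function name `fuse_constant_loads` are all made up for illustration; this is not taken from any real emulator.

```python
# Toy guest program as (opcode, register, immediate) tuples.
# Opcode names mimic MIPS, but this is an illustrative model only.

def fuse_constant_loads(guest):
    """Translate guest ops to hypothetical host ops, fusing lui+ori pairs."""
    host = []
    i = 0
    while i < len(guest):
        op, reg, imm = guest[i]
        nxt = guest[i + 1] if i + 1 < len(guest) else None
        if op == "lui" and nxt is not None and nxt[0] == "ori" and nxt[1] == reg:
            # Two guest instructions become one host move of a 32-bit constant.
            host.append(("mov", reg, ((imm << 16) | nxt[2]) & 0xFFFFFFFF))
            i += 2
        elif op == "lui":
            # A lone lui still maps to a single host move.
            host.append(("mov", reg, (imm << 16) & 0xFFFFFFFF))
            i += 1
        else:
            # Everything else is passed through unchanged in this sketch.
            host.append((op, reg, imm))
            i += 1
    return host


guest = [("lui", "r1", 0x1234), ("ori", "r1", 0x5678), ("addi", "r2", 1)]
host = fuse_constant_loads(guest)
```

Three guest instructions go in and two host operations come out. A translator that sees the whole program up front has the chance to spot patterns like this; a real one would of course have to handle far more cases than this toy does.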
I didn't say it was easy, or that it would be pretty; all I said was that it is possible. I had to do something similar years ago in school. I was tasked with writing a virtual machine that would execute SPARC assembly on an x86 system. I did that, but I also wrote a line-by-line translator that converted the assembly so that it could be compiled and run natively (it reused about 90% of the code from the VM). Can you guess which one worked better? Even ignoring the time the VM spent interpreting the code, the compiled version was still faster, because the native x86 compiler was able to optimize the code for that platform, which had different strengths and weaknesses, instead of just running it as it came like the VM did.
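For anyone who wants to see the shape of the two approaches side by side, here is a minimal sketch in Python. It is not my school project: the three-operand toy instruction set, the `OPS` table, and the names `interpret` and `translate` are all assumptions made up for this example. The point is only that both paths share the same decode table, and that the translator pays the decode cost once, up front, and hands the result to the host's own compiler.

```python
import operator

# Hypothetical toy ISA: (opcode, dest, src_a, src_b).
# Each opcode maps to a callable (used by the VM) and a source-level
# operator symbol (used by the translator), so both paths share one table.
OPS = {
    "add": (operator.add, "+"),
    "sub": (operator.sub, "-"),
    "mul": (operator.mul, "*"),
}

def interpret(program, regs):
    """VM style: decode and dispatch every instruction each time it runs."""
    r = dict(regs)
    for op, dst, a, b in program:
        r[dst] = OPS[op][0](r[a], r[b])
    return r

def translate(program):
    """Translator style: emit host source once, then let the host compile it."""
    lines = ["def run(regs):", "    r = dict(regs)"]
    for op, dst, a, b in program:
        lines.append(f"    r[{dst!r}] = r[{a!r}] {OPS[op][1]} r[{b!r}]")
    lines.append("    return r")
    namespace = {}
    exec(compile("\n".join(lines), "<translated>", "exec"), namespace)
    return namespace["run"]


program = [
    ("add", "r2", "r0", "r1"),
    ("mul", "r3", "r2", "r2"),
    ("sub", "r0", "r3", "r1"),
]
regs = {"r0": 2, "r1": 3, "r2": 0, "r3": 0}

run = translate(program)   # decode cost paid once, before execution
vm_result = interpret(program, regs)
native_result = run(regs)  # no per-instruction decoding at run time
```

Both paths produce the same register state. How much faster the translated version is in practice depends on the host compiler and the workload, so treat this as an illustration of the structure, not a benchmark.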