As a very rough overview, let's look at General MIDI.
General MIDI defines 128 program numbers, each one corresponding to a specific instrument. When you read a MIDI file, it's really nothing more than a stream of numerical values: which instrument to select, which note to play, how hard to play it (velocity), when to start and stop it, and whatever effects were thrown in (chorus, reverb, etc.). Thus a MIDI file is very small, since it relies on the playback device's sound bank to actually produce the sounds.
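To make the "stream of numerical values" idea concrete, here's a rough sketch in Python that builds the raw bytes for three standard MIDI channel messages (program change, note on, note off). The helper names are made up for illustration, and a real MIDI file would also carry a header and timing information between events, which this leaves out.

```python
def program_change(channel, program):
    """Select an instrument: status byte 0xC0 | channel, then program (0-127)."""
    return bytes([0xC0 | channel, program])

def note_on(channel, note, velocity):
    """Start a note: status byte 0x90 | channel, then note number and velocity."""
    return bytes([0x90 | channel, note, velocity])

def note_off(channel, note):
    """Stop a note: status byte 0x80 | channel, then note number and release velocity 0."""
    return bytes([0x80 | channel, note, 0])

# Select the first General MIDI program (Acoustic Grand Piano, sent as 0),
# then play and release middle C (note 60) at velocity 100 on channel 0.
stream = program_change(0, 0) + note_on(0, 60, 100) + note_off(0, 60)

print(stream.hex(" "))  # c0 00 90 3c 64 80 3c 00
print(len(stream))      # 8
```

Eight bytes to pick an instrument and play a note. None of those bytes is sound; they're just instructions for whatever synth ends up playing them.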
An MP3 file contains the actual recorded audio (in compressed form), and doesn't use sound banks. This is why it's going to be significantly larger.
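For a feel of the size difference, here's some back-of-the-envelope arithmetic in Python. The numbers are assumptions picked for illustration: a 3-minute song, a constant 128 kbps MP3, and roughly 8 bytes per MIDI note (a note on, a note off, and a timing byte for each). Real files will vary.

```python
def mp3_bytes(seconds, kbps=128):
    """Constant-bitrate MP3 size: bits per second times duration, divided by 8."""
    return seconds * kbps * 1000 // 8

def midi_bytes(note_count, bytes_per_note=8):
    """Very rough MIDI size, assuming about 8 bytes per note (an illustrative guess)."""
    return note_count * bytes_per_note

song_seconds = 180   # assumed 3-minute song
song_notes = 2000    # assumed note count for the same song

print(mp3_bytes(song_seconds))  # 2880000
print(midi_bytes(song_notes))   # 16000
```

So under these assumptions that's about 2.9 MB against about 16 KB.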
Taking a MIDI file, playing it back through a synthesizer as "real" audio, and then recording that audio into an MP3 can be done with any number of programs.
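The MIDI-to-audio direction is easy because the synth just has to follow instructions. As a toy sketch (not how any real program does it), the Python below turns a list of note numbers and durations into a WAV file using plain sine waves in place of a proper sound bank. The `render` function, the note list, and the output filename are all made up for the example; the WAV could then be handed to whatever MP3 encoder you like.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second

def note_to_hz(note):
    """Convert a MIDI note number to frequency (note 69 is A at 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)

def render(notes, path):
    """Write (midi_note, seconds) pairs to a 16-bit mono WAV as sine tones."""
    frames = bytearray()
    for note, seconds in notes:
        hz = note_to_hz(note)
        for i in range(int(SAMPLE_RATE * seconds)):
            # Half of full scale to leave some headroom.
            sample = int(0.5 * 32767 * math.sin(2 * math.pi * hz * i / SAMPLE_RATE))
            frames += struct.pack("<h", sample)
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(bytes(frames))

# Three quarter-second notes: C, E, G.
render([(60, 0.25), (64, 0.25), (67, 0.25)], "midi_sketch.wav")
```

Notice the output is now tens of kilobytes for three notes, because it holds every sample of the waveform instead of the instructions that produced it.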
Taking an MP3 file and trying to convert it into a MIDI file won't work, unless you actually take the time to sequence it yourself. Even then, you're only going to get the instrumental parts, and certainly not any vocals (unless you happen to be using the "Voice Oohs" and "Choir Aahs" instrument settings). Then there's the matter of how different the same MIDI file sounds from one synthesizer chip's sound bank to another. For example, try comparing a Korg AI2 synth chip's excellent sounds against some of the el-cheapo ones... I swear, some of the el-cheapo ones sound just as bad as standard frequency modulation (FM) synthesis attempts.