I'm guessing you've watched a lot of Star Trek? I figure this is just the human mind's wonderful proclivity towards learning patterns.
I was actually thinking about a similar but less subtle example: I use a transit system that involves tagging on and off with an RFID card (Clipper on Caltrain). When you tag your card, it makes one of three different noises depending on what happened, and I couldn't possibly tell you which one corresponds to what. But if I'm actually using it and I make a mistake, I notice immediately because I'm so acclimated to the pattern it normally makes!
If I hadn't realized this, the UI for the system would have looked absolutely terrible. The different beeps are the auditory equivalent of "mystery meat navigation" but even worse because they don't carry any semantic meaning at all. But because I always use the system in the same way and the noises are consistent, it actually works really well even though I never consciously learned what noise corresponds to what.
(The fact that there are wrong ways to use the system is bad design, but it's a function of how the whole train system is set up, not the fault of Clipper's designers.)
I wouldn't be surprised if there was some psychology behind the choice of sounds when you tag the transit card. Just like visual "affordances" in UI design, there may be certain audio characteristics that people will associate with "good" vs. "error," for example. I recall one study showing that people almost universally feel certain shapes and word sounds are "friendlier" than others (albeit by very subtle margins). And I believe in many languages people interpret pitch inflections to tell when a sentence is continuing vs. finished - think about reading someone a serial number out loud, and how if you were to pronounce the last number the same as all the others it will sound like you left the sentence dangling, with more digits still to come.
I'd be really interested to read anything more concrete about audio affordances though, if anyone knows of links to further research, etc!
I can list at least three linked ones: with two beeps, if the second's pitch is higher, it means "on" or "ok" or "up", whereas if the second's pitch is lower, it means the opposite.
A fantastic book on how these are all related is "Metaphors We Live By". I'd expand on this but I'm typing on mobile and in a hurry. But seriously: read the book. It takes a long afternoon. It's great, short, illuminating, and not excessively dense or padded.
Yeah, there's quite a lot of this but I can't think of any good references offhand. When doing audio logos I've always looked to catchphrases and famous movie quotes and tried to emulate the underlying intonation/cadences. I'm not aware of this knowledge being systematized anywhere though.
* Two beeps (different) - successfully paid fare, and you're about to run out of money on your card
* Two beeps (equal) - tagged off (e.g. on Caltrain or San Francisco Bay Ferry)
* Three beeps - read error or insufficient value
There might be others. Indeed, they've introduced a quieter "here is a Clipper reader" beep to help visually impaired people locate them: https://vimeo.com/183916243
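For what it's worth, the three patterns listed above fit in a tiny lookup. This is only a sketch of that list, not anything from Clipper itself: the names `TagResult` and `interpret_beeps` are made up for illustration, and any pattern not in the list (there may be others) comes back as `None`.

```python
from enum import Enum
from typing import Optional


class TagResult(Enum):
    """Outcomes from the list above (names are hypothetical)."""
    PAID_LOW_BALANCE = "fare paid, card balance running low"
    TAGGED_OFF = "tagged off"
    ERROR = "read error or insufficient value"


def interpret_beeps(count: int, pitches_differ: bool = False) -> Optional[TagResult]:
    """Map a heard beep pattern to its meaning, or None if it is not in the list."""
    if count == 2:
        # Two beeps: the meaning depends on whether the two pitches differ.
        return TagResult.PAID_LOW_BALANCE if pitches_differ else TagResult.TAGGED_OFF
    if count == 3:
        return TagResult.ERROR
    # Any other pattern is not covered by the list above.
    return None


if __name__ == "__main__":
    for pattern in [(2, True), (2, False), (3, False), (1, False)]:
        result = interpret_beeps(*pattern)
        print(pattern, "->", result.value if result else "unknown pattern")
```

Which is sort of the point of the thread: the mapping is trivial to write down, yet regular riders seem to learn it without ever seeing it spelled out.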
You can train your brain to remember pitches. Try humming your favorite song, then play it and see if you picked the right key. I did this as a game with a friend one afternoon with maybe an 80% success rate across 50 songs.
As I understand it, no. There's some suggestion that small children, usually under the age of 10, can acquire perfect pitch, but no adult has ever been documented acquiring it.
Though, I can now instantly tell if I'm watching a PAL or NTSC version of Star Trek TNG based on the first few bars of the opening song.
Which is weird, because I don't even have perfect pitch and I'm not even that musically inclined.
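A likely reason the difference is audible at all, assuming the PAL version was made the usual way by playing 24 fps material at 25 fps without pitch correction (the comment above doesn't say how that particular release was mastered, so treat this as an assumption): everything runs about 4% fast, and the music rises with it. A quick sketch of the arithmetic, with made-up function names:

```python
import math

# Assumed frame rates for a conventional "PAL speedup" transfer.
FILM_FPS = 24.0
PAL_FPS = 25.0


def speedup_ratio(source_fps: float, playback_fps: float) -> float:
    """How much faster the material plays back than it was shot."""
    return playback_fps / source_fps


def pitch_shift_semitones(ratio: float) -> float:
    """Pitch change in equal-tempered semitones for a given speed ratio."""
    return 12.0 * math.log2(ratio)


if __name__ == "__main__":
    ratio = speedup_ratio(FILM_FPS, PAL_FPS)
    print(f"speed ratio: {ratio:.4f}")
    print(f"pitch shift: {pitch_shift_semitones(ratio):.2f} semitones")
```

Under those assumptions the theme comes out roughly 0.71 semitones sharp, a bit under three quarters of a half step, which is apparently enough to notice on a tune you've heard hundreds of times.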