Few listeners can distinguish between "average" and "best" MP3 samples

Two weeks ago, we challenged our readers to see if they could discern the difference between MP3 recordings at different data rates. Nearly 700 completed our study. So does a very high data rate result in a noticeable difference? Here are our basic results:

[Figure: mp3-1.gif]

Respondents rated two recordings, one by rock guitarist Carlos Santana, and another by orchestral composer Aaron Copland. Each recording was encoded into an MP3 file at three different data rates: 64, 128, and 256 kbps. For both recordings, there was a significant difference between ratings of the 64 kbps data rate and the 128 kbps data rate, but no difference between ratings of the 128 and 256 kbps data rate. It's looking like the 256 kbps MP3s offer no advantage over the much smaller 128 kbps MP3s.

If you're not familiar with data rates, here's a brief primer. MP3 is a "lossy" method of digitally encoding audio files. This means that some of the information in the original digital recording is removed in order to make the file smaller. A 64 kbps MP3 contains half as much information (and consumes half the disk space) as a 128 kbps MP3. A 256 kbps MP3 is twice as big as a 128 kbps one. The MP3 algorithm works by attempting to remove parts of the sound that you're less likely to detect. But obviously this process can't work forever. A balance must be struck between a compact file size and audio that sounds good. The real question we're trying to answer is "how big is big enough?"
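To make the size arithmetic concrete, here is a minimal sketch in Python. It assumes constant-bit-rate files and ignores tags and frame headers; the four-minute track length and the function name `mp3_size_bytes` are hypothetical, chosen only for illustration.

```python
def mp3_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Approximate size of a constant-bit-rate MP3, ignoring tags and headers."""
    bits = bitrate_kbps * 1000 * duration_s  # kbps = thousands of bits per second
    return bits / 8                          # 8 bits per byte


TRACK_SECONDS = 4 * 60  # hypothetical four-minute track

for rate in (64, 128, 256):
    size_mb = mp3_size_bytes(rate, TRACK_SECONDS) / 1_000_000
    print(f"{rate:>3} kbps -> {size_mb:.2f} MB")
```

Each doubling of the data rate doubles the file size, which is why the choice between 128 and 256 kbps matters for disk space.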

Recently Amazon.com launched a digital music service which boasted 256 kbps MP3s, instead of the 128 kbps files more common at other sites. Are they simply wasting our disk space? So far these results suggest they are. But there are a few other possibilities.

First of all, it might not be true that no one can tell the difference between 128 kbps and 256 kbps data rates. In fact, 33 of our respondents were able to successfully rank both recordings, rating the higher data rates as better quality. Still, that's less than five percent of our listeners. Do they really hear something the rest of us can't?
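As a rough check on that figure, the sketch below compares the 33 successful listeners with what pure guessing would produce. It rests on two assumptions that are not in the survey data: that there were exactly 700 respondents (the count above is only "nearly 700"), and that a guessing listener puts the three clips of each recording in a random strict order, with no tied ratings.

```python
from math import factorial

N_RESPONDENTS = 700      # assumption: the post says only "nearly 700"
N_CORRECT = 33           # respondents who ordered both recordings correctly
CLIPS_PER_RECORDING = 3  # 64, 128, and 256 kbps

# Chance of ordering three clips correctly by pure guessing (assumes no ties).
p_one_recording = 1 / factorial(CLIPS_PER_RECORDING)
# Chance of getting both recordings right, treating them as independent guesses.
p_both = p_one_recording ** 2

observed_share = N_CORRECT / N_RESPONDENTS
expected_by_chance = N_RESPONDENTS * p_both

print(f"observed share correct: {observed_share:.1%}")
print(f"expected by guessing:   {expected_by_chance:.1f} of {N_RESPONDENTS}")
```

Under these assumptions, guessing alone would account for a good share of the 33, though not all of them.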

One possibility is that the people who can't hear the difference don't have sensitive enough speakers and headphones. We asked respondents to listen to the Copland recordings using their computer speakers, and to the Santana using headphones. Some listeners didn't have headphones, and just listened to the Santana with their speakers. So if the people who used headphones were better able to discern the differences in the recordings, then we might be able to attribute the difference to equipment. However, we found no such difference. The correlation between ability to discern the difference between the Santana recordings and headphone use was -.01, not significantly different from zero.

There was, however, a small, significant correlation (.09) between listeners who had purchased their own (presumably better) external speakers and ability to discern the difference between the Copland recordings.

We also asked listeners how much musical training they had. Though many respondents reported over 20 years of experience, we found no significant correlation between music training and ability to discern the higher data rates.

We asked listeners if background noise or concern about making too much noise had affected their ratings. Again, there was no correlation between background noise and the ability to discern higher data rates. Self-reported hearing problems had no relationship to the results either.

There was, however, one factor which did explain some of the individual differences in the results. We asked listeners the following question: "Are you an audiophile? Please rate your level of interest in high-quality audio." They gave their responses on a scale from 1 (don't care about audio quality at all) to 9 (extreme audiophile). Here are the results:

[Figure: mp3-2.gif]

Those rating themselves as more extreme audiophiles were more likely to be able to detect the difference between the different data rates of the Santana MP3s. This was not attributable to their having better headphones; they simply appear to have better knowledge or hearing ability than those who aren't audiophiles. Even so, the correlation between audiophilia and ability to detect the better Santana recordings, though significant, is not very strong: just 0.17.
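For readers wondering how correlations as small as .09 and 0.17 can be significant while -.01 is not, here is a hedged sketch of the usual test. It assumes roughly 700 listeners in every comparison (the real counts per analysis are not reported and were likely smaller for some) and uses a normal approximation to the t statistic, which is reasonable only for large samples. The function name is hypothetical.

```python
from math import erfc, sqrt


def correlation_p_value(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation r from n paired observations,
    using a normal approximation to the t statistic (large n only)."""
    t = r * sqrt((n - 2) / (1 - r * r))
    return erfc(abs(t) / sqrt(2))


N = 700  # assumption: the post says only "nearly 700"

for label, r in (
    ("headphone use", -0.01),
    ("external speakers", 0.09),
    ("audiophile rating", 0.17),
):
    p = correlation_p_value(r, N)
    print(f"{label:>18}: r = {r:+.2f}, r^2 = {r * r:.4f}, p = {p:.3g}")
```

With a sample this large even a weak relationship clears the conventional 0.05 threshold. The r^2 column is the share of variance explained, which is about 3 percent for the audiophile rating, so "significant" here does not mean "strong."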

Part of this may be due to the particular excerpts I chose for the study. As many commenters pointed out two weeks ago in the survey thread, the easiest way to discern artifacts due to MP3 encoding is in the cymbals, and neither the Copland nor the Santana had cymbals. I suspect the acoustic guitar in the Santana has some of the same properties as cymbals, which is what made encoding differences easier to detect there than in the Copland.

In case you didn't get a chance to participate in the survey, here are all the samples once again. You can guess which is which in the comments.

Copland 1:

Copland 2:

Copland 3:

Santana 1:

Santana 2:

Santana 3:

In the mid-90s, an engineer with Swedish Radio (Chester Grewin, IIRC -- I can't put my fingers on the preprint of the paper right now) presented a paper on this topic at an AES convention that consisted of an exhaustive listening test of all the low-bitrate codecs in use at the time, focusing mostly on MPEG audio layer 2 and 3. The takeaway was that above 256 kb/s, layer 3 doesn't gain anything by throwing additional bandwidth at the problem (it was optimized for lower bitrates and just doesn't scale out well at the other end). OTOH, Layer-2 (known in Europe as "MUSICAM") does benefit from additional bandwidth. They claim that at 384 kb/s, layer 2 is perceptually identical to the original linear digital data (but I've never seen a codec that didn't introduce artifacts on certain program types).

Another factor: compression and distortion of the original source of the mp3. This can make a HUGE difference in what remnants the mp3 encoding reveals.

If the original source is already very compressed, leading to a very dense signal, then it can overload the mp3 encoding process and distort the result. If the wav original is already overloaded and slightly distorted, then that distortion gets magnified by lower encoding rates.

One example is the soundtrack to the first Pirates of the Caribbean movie. The signal overloads the speakers very quickly at high volume, and my mp3 encodings (Apple's iTunes built-in encoder) at 128 and even 192 still distorted the result enough to make the music unbearable. Only at the full 320 was the distortion effect reduced enough to be tolerable. (The sequel soundtracks don't suffer this problem.) Another highly compressed, extremely dense CD is Rush's Vapor Trails (which has been criticized heavily for it).

There may also be cases where high dynamic contrast (which compression eliminates, often at the expense of the artist's original intent) can leave recognizable artifacts at lower mp3 encoding rates. I did a 128 of a digital recording (MTT, San Francisco Symphony) of Stravinsky's Rite of Spring that had moments that were horrid until I redid it at 256.

The Rite (pretty much any recording since Bernstein's 1958) is probably a really good case where the higher encoding rates can make a noticeable difference. The density and dynamic contrasts are more extreme and can reveal more flaws in the encoding process.

By Joe Shelby (not verified) on 30 Nov 2007 #permalink

Great work, Dave. Very interesting - and I'm not saying that because it confirms all of my prejudices on this topic.

Or maybe I am. :)

My guess is (best --> average):
Copland 3-1-2
Santana 2-3-1
Inner earphones were used. Music training is less than 5 years. Audiophile rating: 5.

Remember too... that the type of speakers and monitoring environment one is using to listen to these files will greatly change what people are able to hear.

I DJ, and mp3 files encoded at 128kbps sound great on my speakers at home... but sound terrible on a loud PA. You can really hear the aliasing around the high end of the frequency spectrum.

Beatport, one of the largest DJ digital music providers, usually encodes their files at 320kbps.

So, determining what application you are using your mp3's for seems important. 128kbps might be fine for home stereo applications, but for someone who enjoys audiophile quality playback... it may fall short.

I tried the three Santanas and saw a problem. You are asking me to listen to, and remember, and hold that sample in memory while I listen to another for comparison.

I don't have a good memory for the fine nuances of a sample of music. But if I had two speakers I could switch between, each with a different source, I could tell you which sounds better.

It seems like whether or not bitrate makes a difference is going to vary depending on what is being compressed-- meaning not just that bitrate will have different impact between different genres, but that there will be variance between any two songs, and even variance between different parts of a song! This is, after all, the entire reason why VBR works.

I'm thinking Copland: best 1 2 3 worst and Santana best 3 2 1 worst. Have decent headphones (ATH-A900) but nothing good to power them. At what point of the amp/speaker range are the differences audible? How many readers have the equipment to actually hear the difference? I've brutalized my ears through years of drumming/loud music but still love my V0 VBR. I don't care what you say.

Using CBR is just a stupid waste of bandwidth and space anyway. It would be a lot more interesting to see the results for V0 versus V2 versus V4 versus V6 or so.

"It's looking like the 256 kbps MP3s offer no advantage over the much smaller 128 kbps MP3s."

You speak too hastily. Perhaps you mean no immediate advantage. If you ever need to transcode your music to a different format the higher bit rate may hold up better in translation. If you've ever used the eq on an iPod with high frequency pure tones you'll know, for instance, how digital processing can result in unexpected artifacts that don't show up immediately. Now the problem with iPod eq isn't necessarily a sample rate issue but merely an example of how things can have consequences.

BTW, testing two short tracks of music isn't sufficient. Certain types of sounds are much harder for certain algorithms to compress and the difference in bit rates becomes more apparent. Your test reminds me of Mythbusters. You have far too small a data set (types of music / sounds, not subjects) to draw a firm conclusion. It is like asking, can two guitars sound different? We got two guitars and they sounded the same to everyone... That doesn't mean guitars can't sound different, it just means the two you picked don't.

How about I supply uncompressed test recordings and you try again? Or download standardized test samples used to check audio compression. I think the results would be a bit different.

Just to be totally pedantic about things... The difference in the MP3s is their "data rates" not their "sampling rates". All of these clips have a sampling rate of 44.1kHz, just like the CD that I assume they were taken from.
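The distinction can be made concrete with a little arithmetic. The sketch below assumes standard CD audio (44.1 kHz, 16 bits per sample, two channels) as the source; the function name `pcm_bitrate_kbps` is hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Data rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Standard CD audio: 44.1 kHz sampling rate, 16-bit samples, stereo.
cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)
print(f"CD audio data rate: {cd_kbps:.1f} kbps")

# The sampling rate stays at 44.1 kHz for every clip; only the data rate changes.
for mp3_kbps in (64, 128, 256):
    print(f"{mp3_kbps:>3} kbps MP3 is about {cd_kbps / mp3_kbps:.1f}x smaller than the CD")
```

So the sampling rate describes how often the waveform is measured, while the data rate describes how many bits per second the encoded file is allowed to spend.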

Also, looking at the info in Quicktime player for each of the 6 MP3 files hidden behind the embedded flash player, they all show up as being 256kbps. What gives?

To TB, the MP3 files were first compressed to their respective rates, then Dave changed them back to 256kbps so that you can't tell by their file sizes.

TB: Thanks for setting me straight on the data rate/sampling rate distinction. I have corrected the post.

As Freiddie indicates, I took the original CD, compressed it to the data rates indicated, then re-encoded them all at 256 kbps so that listeners wouldn't be able to distinguish between them by their load times.

"TB: Thanks for setting me straight on the data rate/sampling rate distinction. I have corrected the post.

As Freiddie indicates, I took the original CD, compressed it to the data rates indicated, then re-encoded them all at 256 kbps so that listeners wouldn't be able to distinguish between them by their load times."

While it is interesting to see that the 128k and 256k both seem to have held up to the same degree to your re-encoding, you have also somewhat invalidated your test by doing so. You are no longer testing virgin encodes from uncompressed. Your test was actually if people could hear the difference between 128k and 256k rips re-encoded to 256k--which is related to your claimed test but not, in fact, the same thing. Based on this your conclusions are invalid. You need to restate your conclusions to match the actual test you performed.

Copland: 3 best, 1 worst. I guessed by the richness of the piano part. 3 just sounded a lot more like what a grand piano would sound like in concert (fuller representation of harmonics, perhaps?). The pianos in 1 and 2 sounded more bland and wooden.

Santana: 3 worst, 1 best. In general 1 just sounded the clearest. But the most telling features were the breathing sounds of the singer. In 3 you can barely hear them. In 1 they are clear and add considerably to the musical experience --- the singer seems most 'human' then. In 2 they are audible are somewhat blurred; an inattentive listener could easily not be aware of them.

I used fairly expensive Etymotic in-ear earphones to listen to the samples. I hope those do make a difference...

Oh, and I do have 15 years of musical training, although I think it wasn't musical knowledge/ability so much as knowing the sound of a piano well that guided me.

And my second last sentence above should start 'In 2 they are audible BUT somewhat blurred...'

Scote:

Actually the 256 kbps sample was never re-encoded. The argument was made two weeks ago that this was a problem because it received a different treatment from the other two files.

In fact, if the 64 and 128 kbps samples were further degraded by being re-encoded, the difference between the 128 kbps and 256 kbps data rates should be larger, not smaller than what it was. Any error we've introduced in re-encoding should actually make it easier to tell the 256 kbps samples from the 128 kbps samples.

Since most listeners still did not detect a difference, then I'd say my conclusions stand quite well indeed.

You should put white text next to each recording stating the rank. Wouldn't be visible against the white background, but anyone can select the text and it becomes visible.

Santana (Could easily tell the difference)
1-2-3
Classical (Not at all sure I can tell the difference)
3-1-2 Almost guessing here.

Another factor to consider in this experiment is the loudness war that has been going on in the mastering industry since the 1990s (video demonstrating the practice).

At a glance it would add weight to your conclusion (and in general it does), but it actually invalidates this particular experiment. The Santana song you chose comes from one of the more obvious examples of bad mastering in the industry - those clips are going to be easier to order because the source material itself (your CD) was compromised to begin with.

It would be interesting to do the same study on people under the influence of Cannabis or LSD.

"Scote:
Actually the 256 kbps sample was never re-encoded. The argument was made two weeks ago that this was a problem because it received a different treatment from the other two files."

You'll forgive me for not knowing what you wrote 2 weeks ago, especially since what you just wrote in the comments was:

"As Freiddie indicates, I took the original CD, compressed it to the data rates indicated, then re-encoded them all at 256 kbps so that listeners wouldn't be able to distinguish between them by their load times"

What you said earlier in the posts doesn't match what you are telling me now. This indicates you are being sloppy with the facts.

I'm not claiming your study has no value, but I do think you are playing fast and loose with your conclusions after compressing only two short pieces of music. I'm not claiming that 128k files are always distinguishable from 256k. Most of the time I think they aren't. But you cannot reasonably say that 128k and 256k files are indistinguishable by testing two, short, non-representative music files. I'm rather surprised at how defensive you seem about this rather obvious fact.

Try compressing something known to be difficult for mp3s to compress (a worst case scenario) and try your study again--if that is indistinguishable then you might be able to legitimately draw the conclusion you are drawing now.

Santana & Copland too 2-3-1 (256-128-64)

I have a large CD collection (c. 450) and encoded them a couple of years ago. I found AAC better than MP3 for the same bit rate. I started at 96 kbps AAC, and was worried that I was running out of space after 20 GB. I switched to 64 kbps AAC for the rest. I found any lower was awful, but 64 kbps acceptable and necessary to fit on my drive. I'd probably switch to 128 kbps now that I have more space, although I don't buy many CDs these days as my library is >50 days music. I always listen on computer speakers and usually as background.

By Edwin Arneson (not verified) on 01 Dec 2007 #permalink

Scote:

I wasn't attempting to deceive you with my comments, and I apologize if you felt that way. I should have been clearer in comment #15. I can see you might have encountered the facts in an order that made you feel manipulated.

However, I think you'll excuse me if you read a "defensive" tone in my comments after you've called me "sloppy" and "fast and loose."

You do make some valid points in comment #24 -- that perhaps if I had used recordings that highlight some of the flaws in the MP3 algorithm, the results would have been different. That said, since self-proclaimed "audiophiles" are able to discern the difference between the Santana clips, and still our non-audiophile listeners cannot, we can make a reasonable case that most listeners can't tell the difference between 128 kbps and 256 kbps recordings.

What I think would be an interesting follow-up would be to see if we can train non-audiophiles to notice the difference between different data rates, by explaining where to listen for problems. I'm not sure it's interesting enough to devote an entire Casual Friday to the problem, though.

"Scote:

I wasn't attempting to deceive you with my comments, and I apologize if you felt that way. I should have been clearer in comment #15. I can see you might have encountered the facts in an order that made you feel manipulated.

However, I think you'll excuse me if you read a "defensive" tone in my comments after you've called me "sloppy" and "fast and loose."

Granted, "fast and loose" is, perhaps, inflammatory and I could have chosen a phrase that is less so and more inclined to engender rational discourse. I think I was surprised at your eagerness to come to a universal conclusion from a mere two short clips of audio. It didn't seem to comport with the more rigorous standards I've come to expect from you.

As to "sloppy," I'm sorry to say that is simply true. At one point you said you "re-encoded them all at 256 kbps" and at the next you are scolding me for not knowing that you only re-encoded the 64k and 128k files. Those two contentions are contradictory and "sloppy" is a factual and otherwise non-judgmental conclusion borne out by the inconsistency. I did not accuse you of being deliberately deceptive, nor did I think you meant to be.

Scote,

I don't think I've made a "universal conclusion" here. I simply tested two short clips and reported the results. Let's go over my claims:

1. "Few listeners can distinguish between "average" and "best" MP3 samples"

This has been shown to be true. While we can question whether these were the best samples, for these samples, fewer than 5 percent of listeners could tell the difference.

2. "It's looking like the 256 kbps MP3s offer no advantage over the much smaller 128 kbps MP3s."

This was only a provisional conclusion, and I believe it was warranted based on the dramatic results you see in the graph. Few people noticed any difference at all, and many actually rated the 128 kbps samples higher than the 256 kbps samples.

3. Self-rated audiophiles are able to discern the difference.

I think this is one conclusion you and I can both agree on.

What didn't I say? I didn't say a higher data rate wasn't worth it, or that these two samples are representative of all music, or any broader conclusion than the three points I make above.

The one mistake I made was to be unclear about the encoding procedure in comment 15. I'm sorry about that.

Saying "It's looking like the 256 kbps MP3s offer no advantage over the much smaller 128 kbps MP3s" can make audiophiles, myself included, a little irritated. Then again, that quote was taken out of context; after I read the remainder of your entry, that statement gained a little more credibility. Perhaps if it was preceded by "For the average listener..." I wouldn't have considered it to be as disconcerting.

It makes sense as to why the majority of people cannot tell the difference between 128kbps and 256kbps:

* Most computer audio systems consist of basic left and right speakers, and maybe a subwoofer. However, these two speakers rarely include tweeters, the tiny sound-producing units that are responsible for the reproduction of frequencies above (typically) 12KHz. Tweeters are not included because they increase the cost of the speakers substantially (via box construction, added components, electronic frequency crossovers, etc) and the smaller woofers can reproduce some of the upper ranges. However...

* The cutoff point for standard small woofers can be anywhere from 14KHz upwards, but is typically at the low end.

* The standard low-pass cutoff for 128kbps MP3 files is 16KHz, meaning the encoder discards all data that is above that frequency. 256kbps encodings typically do not have a lowpass cutoff.

* Over the years, MP3 encoders have advanced in such ways that typical artifacts (warbling, pre-echo, metallic high end, etc) are lessening greatly, especially at lower bitrates like 128kbps, but the lowpass filter is still in effect. Therefore...

* The benefits of a full-frequency 256kbps file will only be heard by those whose audio equipment is capable of reproducing frequencies above 16KHz. (note: As per the Nyquist theorem, the maximum frequency reproduced by standard, uncompressed audio CDs is 22KHz)

* Thus, I would expect a strong correlation between those with relatively high-end audio equipment and those who proclaim themselves "audiophiles" - discerning higher bitrate MP3s is becoming more an issue of faithful frequency reproduction, in my opinion.
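The argument in the list above can be sketched in a few lines of Python. The 16 kHz low-pass figure is the one quoted in this comment, not a measured property of any particular encoder, and the two playback limits are hypothetical examples. The function names are also hypothetical.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency that can be represented at a given sampling rate."""
    return sample_rate_hz / 2


def audible_extra_band_hz(lowpass_hz: float, playback_limit_hz: float,
                          sample_rate_hz: float = 44_100) -> float:
    """Width of the band above the encoder's low-pass cutoff that the
    playback equipment can still reproduce (0 if it cannot reach the cutoff)."""
    top = min(playback_limit_hz, nyquist_hz(sample_rate_hz))
    return max(0.0, top - lowpass_hz)


LOWPASS_128_HZ = 16_000  # cutoff quoted above for 128 kbps files (an assumption)

# Hypothetical playback limits: small computer speakers vs. full-range headphones.
for label, limit_hz in (("small woofer", 14_000), ("full-range headphones", 20_000)):
    extra = audible_extra_band_hz(LOWPASS_128_HZ, limit_hz)
    print(f"{label}: {extra:.0f} Hz of extra bandwidth reproduced")
```

If the equipment tops out below the encoder's cutoff, the extra bandwidth of a 256 kbps file never reaches the listener, whatever else the higher data rate may improve.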

"* The benefits of a full-frequency 256kbps file will only be heard by those whose audio equipment is capable of reproducing frequencies above 16KHz. (note: As per the Nyquist theorem, the maximum frequency reproduced by standard, uncompressed audio CDs is 22KHz)"

I wouldn't want people to think I'm just picking on Dave so I should point out what you may mean is "* The [full] benefits of a full-frequency 256kbps file will only be heard by those whose audio equipment is capable of reproducing frequencies above 16KHz." This is because high frequencies are not the only potential benefit. This is not a rip on your statements just a note about the precision of language.

Personally, I usually find it easiest to hear the difference in bit rates in the cymbals. In 128kbps and lower the details in cymbals tend to turn to mush. There were no clear and prominent cymbals in the two samples--and I can't hear above 16k.

Note, also, that many, many people have full range headphones like the venerable and reasonably priced Sony V6 phones which are fully capable of reproducing the entire range of 44.1k frequency reproduction provided the amp can, and many amps and devices are capable of driving 16k highs.

Hi Dave,

For the next test, it would be interesting to test Enya music files. I figured out that her style is poison for mp3. It would be interesting if you could confirm that.

What matters to me is that the 37th copy, on the grain-of-sand-archive-of-everything that I will to my then great-to-the-exponent-whatever grandchildren, is still fully accurate, and hasn't lost a bit here and there each time someone in the ancestry string changed hardware or software.

Or, for that matter, when someone took their grain-of-sand-archive-of-everything to the beach, and dropped it*, and needed to make a fresh copy from the family backups, which have to be non-lossy.

____
*If anyone recalls who wrote that science fiction story, please credit the author for me.

By Hank Roberts (not verified) on 03 Dec 2007 #permalink

Hi Dave,

Here are my results :
Copland best to worst : 2-3-1
Santana best to worst : 2-1-3

Listened to the samples off an onboard sound card through a pair of closed headphones (BeyerDynamic DT770 Pros). The difference is easily recognizable with Santana but less so with Copland. The worst samples seemed to have more dead frequencies. Consequently, the best ones seemed the fullest.

My audiophile rating would be 4-5 : I really care about my sound and will spend to a certain extent on equipment, but I will not go overboard.

I'd also be really interested if you have any information on whether a dedicated sound card makes a difference.

After looking at the answers, it would seem I have a preference for music encoded at 128kbps. Seeing as how this would be the most commonly available bitrate, I was wondering whether your survey showed a preference for this bitrate (more people rating 128kbps better than the others)?

>> I'd also be really interested if you have any
>> information on whether a dedicated sound card makes a
>> difference.

I have heard that, due to the high prevalence of electromagnetic fields on and near a computer's motherboard, the quality of audio processors from high to low would be (A) External A/D Processor (via USB or FireWire), (B) Dedicated sound card located farthest away from the motherboard, and finally (C) integrated motherboard sound.

Of course, some integrated sound is better shielded than others and EMF may not even be a problem. Heck, I use integrated sound on my desktop and have had no problems with it (my laptop, however, has very noticeable EMF interference, as it is an ultraportable. All the parts inside are really, really close together)!

Otherwise, I'd say it's merely a matter of preference. If what's working for you now isn't noticeably problematic, hang on to it. If your speakers buzz when your hard drive spins up, then it may be time to go with an external solution...

Great work!
I took such a test as an audio engineering student in Amsterdam some time ago, but in those days we were all trained to have "golden ears". I must say that a big factor which can influence the results greatly is the equipment the tests are being listened through. It is very important to have high quality DACs and monitors, otherwise all samples will sound only as good as the weakest link in the audio chain.
In my class, the results looked much better with much higher results in distinguishing between the two top rates (128 and 256).

By Udi Shvekey (not verified) on 04 Dec 2007 #permalink

I'd like to see these results tested on two groups of people including a representative sample of the "average listening population" as compared with a "highly trained musician population." Musicians I know often discuss the discrepancy between recording levels and how easily they discern the difference. It would be neat to level the playing field and conduct this experiment on the same listening equipment at a university comparing music majors to non music majors and see what those results deliver. That would eliminate the question of the quality of the listening device being a variable.

Copland 1 - 64
Copland 2 - 256
Copland 3 - 128
Santana 1 - 256
Santana 2 - 128
Santana 3 - 64

Audiophile Rating - 7.5

Missed the survey boat (boolean) - True

Using Sony headphones that are old enough that the foam cushions are long gone, Santana #2 sounds the worst to me, but #1 and #3 sound almost the same, so Santana appears to be 1:256k, 2:64k, 3:128k, or best to worst, 132, which leaves, among your readers, votes for Santana #1 as best and worst, but not the middle, and #3 almost evenly divided between best, worst and middle. #2 is considered best, worst, but mostly middle. It appears that over half of us are wrong about which Santana clip is the 64k.

A few years ago, I ran across a similar test, pitting bit rates of 64k, 128k, 256k and 320k against self-selected audiophiles who thought that they had "Golden Ears" using double blind testing and AB switching, in which the listener could switch between two songs playing in synchrony, but at different bit rates as often as he liked. A few rare individuals in this test could pick 256kb over 128k with better accuracy than a coin flip, and many couldn't reliably tell 64kb from 320kb. None were significantly better than chance at determining 320k from 256k.
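"Better than a coin flip" has a standard meaning in this kind of AB test: the chance of getting at least that many trials right by guessing. Here is a minimal sketch using an exact binomial tail. The trial counts are hypothetical and are not taken from the test described above.

```python
from math import comb


def p_at_least(k: int, n: int, p: float = 0.5) -> float:
    """Probability of k or more correct answers in n trials when each
    trial is an independent guess with success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical listener scores out of 16 AB trials.
for correct in (9, 12, 14):
    print(f"{correct}/16 correct: p = {p_at_least(correct, 16):.4f}")
```

A listener who gets 9 of 16 right is indistinguishable from a guesser, while 12 of 16 falls below the conventional 0.05 threshold. That is why a handful of trials cannot show that someone hears a difference.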

A related study seemed to show that audiophiles who thought that they had "Golden Ears" were more certain of their decisions than those who did not consider themselves "Golden", but were no more accurate.

By Dangerous Dan (not verified) on 07 Dec 2007 #permalink

Looks like I'm late to the party, but thought I'd drop in anyhow.

Listening to these samples in a room that has distracting ambient noise, with ears that are a bit clogged from a cold, and speakers that are on the blink, I could still hear a difference between the 128 and 256 samples. But I *preferred* the 128 because I intensely disliked the music selected for the samples.

Hearing a "better" version was, to my ears, a worse experience. (Nothing against Copland or Santana, both of whom I usually enjoy, it's just these snippets I can't stand.)

I wonder whether others might make a choice based on personal preferences rather than on the quality of the play back.

CMR-114's comment, I think, is pointing to a practical problem that we "regular" listeners might have with assessing sound samples. (Professional musicians or sound technicians might not be bothered by these difficulties.)

Unless one is already familiar with what a piece of music is "supposed" to sound like, comparing brief samples of sounds may pose a greater challenge to one's short term musical memory than to one's hearing acuity.

Well, the only way this test can be done legitimately is with a reference grade system placed in a reference quality room with people that have trained ears.

Any other methodology is just trying to figure out what the average person hears using whatever average system they have.

These portable formats were originally supposed to be used on portable systems like iPods and computers using cheap computer speakers. But to replace the CD using a nice high quality stereo? NOPE. One would be stupid to do that. These formats have too much compression. But since the average person doesn't have a high quality stereo, trained ears, or a high quality room, the masses are used as a test for quality level....

If you have to save space for your music on a computer hard drive or iPod, then by all means use either 128 or 256k formats, but don't try to sell these formats as "just as good as the original", because any experienced audiophile using high end playback systems will hear the difference, especially compared to an original format such as CD, SACD, DVD-A, or the new TrueHD or DTS HD Master formats.

When CDs were fairly new, a great many "Experienced Audiophiles" claimed that they could hear an improvement in the quality of CD-produced sound if the edge of the CD was colored green with a felt-tip marker. In actual testing, the bitstream recovered from the CDs with and without peripheral green ink was identical, and therefore the sounds produced on equal equipment would be identical. In double-blind testing, no individual was found to have even the slightest statistically meaningful ability to tell the two apart, nor could the most sensitive electronics. My point here is that the differences an audiophile claims to be able to hear usually exceed those he is able to prove or demonstrate that he can hear.

The individual who can reliably tell the difference between studio quality analog or digital recording and 256k mp3 is demonstrably rare. There simply aren't enough of them around for all audiophiles to be one, and not all of them even claim to be audiophiles. The individual who can reliably tell between a studio recording and 64k mp3 is common, but hardly universal, even among the audiophile set.

That being said, I still believe that for the purpose of archiving music, using a higher bitrate than you can distinguish is a good idea, because as scote wrote in #10, "If you ever need to transcode your music to a different format the higher bit rate may hold up better in translation." If you want some compression, but no loss of signal accuracy, use something like the FLAC codec.

For listening purposes, use whatever bitrate you like, but remember, the difference that you think you hear between the bitrates probably affects your listening pleasure more than the ones you actually do hear.

By Dangerous Dan (not verified) on 16 Dec 2007 #permalink

[Well, the only way this test can be done legitimately is with a reference grade system placed in a reference quality room with people that have trained ears.]

How would only testing people with "trained ears" enable improvements to psychoacoustic audio compression? Wouldn't that only make such systems suitable for people with "trained ears", instead of finding out the general principles that govern the audio perception capabilities of all humans?

copland 1: 64
missing high frequencies

copland 2: 128
still congested and distorted (at the start easily heard, like it crackles)

copland 3: 256

santana 1: 256

santana 2: 128
too smooth, bassy and missing guitar string pluck

santana 3: 64
uh terrible

Where is the FLAC example? :) It's not fair when we have nothing for sure to compare to.

I wanted to post to those people that had trouble (as I did) telling the cuts apart. Something is wrong with the way this test is set up.

I did something a little different. I compared your three cuts to the original cd.

I took the test and connected my computer through my processor and tried to get it calibrated to the output of my cd player with a sound meter.

I have to say that the samples were hard to tell apart, even once you posted the answer.

****************************************************

Then I compared your three cuts to my cd player playing the Santana cut from my own cd. The cd version was startlingly better than any of the three.

Something just doesn't seem right about those 3 cuts when I play them compared to the original cd. There should not be that much difference between your cuts and my cd.

I've done this before and have recorded entire albums and been able to A/B between the different bit rates and the original cd and I have had a lot of trouble telling the difference between bit rates that are close to each other, (so I was expecting to have trouble with your test).

But, your cuts are at a whole different level of indistinguishability. Compared to the original cd they just don't fit, they sound horrible. Your cuts are not representative of the bit rates. They all sound bad.

My conclusion is that this was not a good test of the differences (that I agree can be hard to distinguish). You can make the argument that it is all relative and it proves that the compression rates are statistically similar, but the test is in a lot of ways flawed from beginning to end. No control of any variables and no accounting for everyone's systems, and I could go on and on. This is pseudo-science at its snake oil best!

I would implore anyone who wants to hear the differences to bring in your iPod to a high end stereo store along with the original cds and see if they will let you compare your compressed ripped version to the original. You will see much more difference between the low bit rate and the high bit rate of your own music.

My problem with this poorly conducted "test" is that I'm afraid that it will leave a lot of people believing that what you are providing as the three different bit rates is representative of how much difference there is between the three bit rates. It only really proves that these three cuts, played from computer sound cards, through poor speakers... sound pretty close.

I think most listeners know that a well done cd through a good sound system will sound hugely better and these three cuts will sound incrementally no different between each other compared to the cd through a good system.

Please don't view this test as evidence that bit rates don't matter that much. They only don't matter on this poor test.

This is not science no matter how many "P values" are calculated to prove significance. Garbage in garbage out.

Pseudo-science... very misleading!

I would be interested to know, along these same lines, if similar tests have been done on the ability of the general public to distinguish between CD quality digital recordings and original analogue recordings on studio tapes.

[I admit in advance my inability to read all 46 of the above comments, so I may have missed something.]

BTW, I had no trouble correctly identifying the Santana cuts by bitrate - but plead guilty to having been a professional soundman in my youth.

If it is allowed, and Dave Munger reads all these comments, I would like to ask him if there has been research on the psychological effects of what I call the "soundtrack life" - this new habit of some people spending a great percentage of their day with an iPod playing music into their heads.

Thank you,

Kip

By Kip Hansen (not verified) on 03 Feb 2008 #permalink

Santana:
2=256
1=128
3=64

Did use Beyerdynamic DT 770 Pro 80ohm, powered by my laptop's built in soundcard.

Comparing an mp3 to the CD original correctly is not hard. Just rip the CD to .wav (e.g. with Exact Audio Copy), make an mp3 of it (e.g. with LAME), and compare the mp3 to wav using one of several software ABX comparators available (e.g. WinABX). Do 16-20 trials. WinABX will compute the likelihood that you heard a real difference, versus chance.
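The "likelihood" part of an ABX run can be sketched in a few lines. This is a minimal sketch, assuming a plain one-sided binomial test against a 50% guessing rate; the function name `abx_p_value` is made up for illustration, and it is not claimed to be the exact calculation any particular ABX tool performs.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` ABX trials by pure guessing (50% chance per trial)."""
    if trials <= 0 or not 0 <= correct <= trials:
        raise ValueError("need trials > 0 and 0 <= correct <= trials")
    # Sum the upper tail of a Binomial(trials, 0.5) distribution.
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


if __name__ == "__main__":
    # Show how the guessing probability shrinks as the score improves.
    for score in (8, 10, 12, 14, 16):
        print(f"{score}/16 correct: p = {abx_p_value(score, 16):.4f}")
```

Under these assumptions, 12 correct out of 16 gives a guessing probability of about 0.038, which is why 16-20 trials is enough to separate a real audible difference from luck.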

If for some reason level adjustment or EQ was applied during either the mp3 or the CD rip, then of course there will be an audible difference between them, but that's not inherent to the formats. Using a good modern mp3 encoder (like the LAME 3.97 series) and a non-'killer' track as test source (e.g, tracks used to optimize LAME), variable bitrate encodes at 190 kbps and greater should be indistinguishable to all but an extremely few listeners, even on 'reference quality' equipment. MP3 has gotten THAT good.

By krabapple (not verified) on 13 Mar 2008 #permalink

I agree completely with #46, this test is very much flawed. The reason why all samples sound bad is because all samples were encoded twice, second time to 256kbps. The author of the article should look up what the implications are of tandem coding using a lossy codec.
Also regarding your audiophile index, doesn't it mean that someone who thinks (s)he is an "extreme" audiophile also tries harder to find the difference? I have years of experience in ABX testing of audio codecs and the longer you listen to and the more you concentrate on the song the easier it becomes to hear differences.

Hi.

I just want to join the crowd pointing out that this test is useless for drawing conclusions about the ability to distinguish and rate mp3s at different bitrates against the original. Using re-encoded/transcoded samples will only allow conclusions about the ability to distinguish and rate re-encoded/transcoded samples, no matter the methodology used.

Thus, the conclusion mentioned in the headline is not at all scientific unless this fundamental flaw is corrected.

By Graciano Fialho (not verified) on 10 Sep 2008 #permalink

I strongly agree with the aforementioned arguments that this test is fundamentally flawed and thus most of the conclusions drawn from it are invalid. It would have to be redone using 1) mp3s directly encoded from an uncompressed/lossless source, and 2) decoded again to uncompressed/lossless for ABXing. However, even then two short pieces of music are not a representative basis for drawing any generalized conclusions.
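The two steps above can be sketched as a pair of LAME command lines per bitrate. This is only a sketch under stated assumptions: the helper `single_pass_commands` and the `sample_*` file names are hypothetical, the `lame` command-line encoder is assumed to be installed, and constant-bitrate encoding with `-b` is an illustrative choice rather than what the original test used.

```python
def single_pass_commands(source_wav: str, bitrates=(64, 128, 256)):
    """Build, for each bitrate, one encode and one decode command so that
    every sample goes through the lossy codec exactly once."""
    commands = {}
    for kbps in bitrates:
        mp3_name = f"sample_{kbps}.mp3"              # hypothetical file name
        decoded_name = f"sample_{kbps}_decoded.wav"  # hypothetical file name
        commands[kbps] = (
            # Step 1: encode straight from the uncompressed source.
            ["lame", "-b", str(kbps), source_wav, mp3_name],
            # Step 2: decode back to WAV for ABX comparison.
            ["lame", "--decode", mp3_name, decoded_name],
        )
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; each list could be
    # handed to subprocess.run(cmd, check=True) once lame is available.
    for kbps, (encode, decode) in single_pass_commands("original.wav").items():
        print(" ".join(encode))
        print(" ".join(decode))
```

Encoding each sample directly from the same lossless source keeps the bitrate as the only variable, which is the point of the objection to tandem coding.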

Hi. For Santana I think:
1. 256
2. 128
(very difficult to tell between these two)
3. 64 (quite clearly worse)

Can't tell with the classical music track (I usually find it difficult with this genre anyway). Maybe you guys could have chosen a more standard track, like Chopin, Rachmaninoff, Beethoven, etc? It's very difficult with contemporary music. But great work guys, could you publish an updated graph with the replies you get from here?

classical:
1-96
2-256
3-128

rock:
1-256
2-128
3-96

Even though Dave posted the answers by the time I had found this site, I was able to tell the difference using my Klipsch X5 in-ear monitors. The difference b/w 64 and 128 should be obvious, the former sounds like it's under water.

Between 128 and 256 the differences are a bit more nuanced. On Santana you can tell there is far more compression on the second track (128) when the acoustic guitar plays, especially at the trailing edge of the notes.

Trust me, when you have a good pair of headphones, there's even a difference between 256 and 320 kbps. The sound from 256 sounds more closed off and narrow compared to the wider soundstage with 320 kbps.