What if it is a Pareto improvement: better improvement for some dialects but no ...

viraptor · on March 4, 2024

Here's a question that should have the same or a similar answer: increasingly, some part of the job interview process is handled over the internet. All other things being equal, people are likely to respond more positively to candidates with a more pleasant voice. So if new ML-enhanced codecs become more common, we may find that some group X gets a slightly worse quality score than others. Over enough samples, that would translate to a lower interview success rate for them.

Do you think we should keep using that codec, because overall we get a better sound quality across all groups? Do you feel the same as a member of group X?

shwaj · on March 5, 2024

I don't think it's a given that we shouldn't keep using that codec. For example, maybe the improvement is due to an open source hacker working in their spare time to make the world a better place. Do we tell them their contribution isn't welcome until it meets the community's benchmark for equity?

Your same argument can also be used to degrade the performance for all other groups, so that group X isn't unfairly disadvantaged. Or, it can even be used to argue that the performance for other groups should be degraded to be even worse than group X, to compensate for other factors that disadvantage group X.

This is argumentum ad absurdum, but it goes to show that the issue isn't as black and white as you seem to think it is.

viraptor · on March 5, 2024

A person creating a codec doesn't choose whether it's globally adopted. System implementors (Slack, for example) do. You don't have to tell the open source dev anything. You don't owe it to them to include their implementation.

And if their contribution was to the final system, sure, it's the owner's choice what the threshold for acceptable contribution is. In the same way they can set any other benchmark.

> Your same argument can also be used to degrade the performance for all other groups,

The context here was Pareto improvement. You're bringing up a different situation.

shwaj · on March 5, 2024

The grandparent provided an argument for why we might not want to use an algorithm, even if it provided a Pareto improvement.

I suggested that the same argument could be used to say that we should actively degrade performance of the algorithm, in the name of equity. This is absurd, and illustrates that the GP argument is maybe not as strong as it appears.

viraptor · on March 5, 2024

The argument doesn't make sense in practice. We could discuss it as a philosophy exercise, but realistically, if the current result is better overall and biased against some group, you can just rebalance it and still get a better overall result than the status quo.

Changing codecs in practice takes years/decades, so you always have time to stop, think and tweak things.

gcr · on March 5, 2024

One thing the small mom-and-pop hacker types can do is disclose where bias can enter the system or evaluate it on standard benchmarks so folks can get an idea where it works and where it fails. That was the intent behind the top-level comment asking about bias, I think.

If improving the codec is a matter of training on dataset A vs dataset B, that’s an easier change.
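
As a rough illustration of what such a per-group evaluation could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `compare_codecs`, the group labels, and the scores are made up, and the quality metric is just "some number where higher is better". It reports the per-group change against a baseline, whether the change counts as a Pareto improvement (no group worse, at least one better), and how large the gap between the best and worst served groups is under the new codec.

```python
from statistics import mean


def compare_codecs(baseline, candidate):
    """Compare per-group quality scores of two codecs.

    Both arguments map a group label to a list of quality scores
    (higher is better). Returns (deltas, is_pareto, gap).
    """
    base_means = {g: mean(scores) for g, scores in baseline.items()}
    cand_means = {g: mean(candidate[g]) for g in baseline}

    # Change in mean quality for every group.
    deltas = {g: cand_means[g] - base_means[g] for g in baseline}

    # Pareto improvement: nobody is worse off, somebody is better off.
    is_pareto = (all(d >= 0 for d in deltas.values())
                 and any(d > 0 for d in deltas.values()))

    # Spread between best and worst served group under the candidate.
    gap = max(cand_means.values()) - min(cand_means.values())
    return deltas, is_pareto, gap


# Hypothetical scores, invented purely for this example.
baseline = {"A": [3.0, 3.0], "X": [3.0, 3.0]}
candidate = {"A": [4.0, 4.0], "X": [3.0, 3.5]}

deltas, is_pareto, gap = compare_codecs(baseline, candidate)
print(deltas, is_pareto, gap)
```

In this made-up case every group improves, so it is a Pareto improvement, yet the gap between groups grows from zero to something nonzero. Publishing both numbers is the kind of disclosure that lets implementors decide for themselves.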

samus · on March 5, 2024

I would be very surprised if biasing the codec towards particular dialects or other distinctive subsets of the data yielded no improvement. And we could certainly be fine with some kinds of bias. Speech codecs are intended to transmit human speech, after all, not that of dogs, bats, or hypothetical extraterrestrials. On the other hand, a wider dataset might reduce overfitting and force the model to learn better.

If the codec is intended to work best for the human voice in general, then it is simply not possible to define sensible subsets of the user base to optimize for. Curating an appropriate training set therefore has a technical impact on the performance of the codec. Realistically, I admit that the share of speech samples per language in such a dataset would be proportional to the relative number of speakers. This is of course a very fuzzy number with many sources of systematic error (what counts as one language, whether non-native speakers count, which level of proficiency is considered relevant, etc.), and ultimately English is a bit more important since it is the de facto international lingua franca of this era.

In short, a good training set is important unless one opines that certain subsets of humanity will never ever use the codec, which is equivalent to being blind to the reality that more and more parts of the world are getting access to the internet.
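
To make the proportional weighting concrete, here is a small Python sketch of how per-language sampling weights for a training set could be derived from speaker counts. The function name `sampling_weights`, the `floor` parameter, and the language labels and counts are all hypothetical. The floor in particular is an added assumption rather than something proposed above: it is one simple way to keep small languages from being sampled almost never.

```python
def sampling_weights(speaker_counts, floor=0.0):
    """Turn speaker counts into sampling probabilities.

    speaker_counts maps a language label to a (rough) speaker count.
    floor is an optional minimum share applied before renormalizing,
    so that small languages are not drowned out entirely.
    """
    total = sum(speaker_counts.values())
    # Proportional share, raised to the floor where it falls below it.
    raw = {lang: max(count / total, floor)
           for lang, count in speaker_counts.items()}
    # Renormalize so the weights sum to 1 again.
    norm = sum(raw.values())
    return {lang: share / norm for lang, share in raw.items()}


# Made-up counts, for illustration only.
counts = {"lang_a": 80, "lang_b": 15, "lang_c": 5}

proportional = sampling_weights(counts)
floored = sampling_weights(counts, floor=0.1)
print(proportional)
print(floored)
```

With `floor=0.0` the weights are purely proportional. Raising the floor shifts some sampling mass towards the smallest languages at the expense of the largest, which is the kind of trade-off a dataset curator would have to choose deliberately.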