Translating "sentences of news" is very different from translating an entire article, which is obviously what's interesting.
Is anybody in MT or text comprehension/generation really working on systems that construct a model/"understanding" of the bigger narrative in a longer-running text? Even just to be able to do correct anaphora resolution across sentence and paragraph boundaries; intuitively, word-sense disambiguation (WSD) also seems easier if you've got some sort of abstract context spanning more than a single sentence.
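To make the WSD intuition concrete, here is a minimal sketch of simplified Lesk (pick the sense whose gloss shares the most words with the context), run once with only the current sentence as context and once with the whole document. The sense inventory, gloss words and example sentences are hypothetical toy data, not WordNet or any real MT system.

```python
import re

# Hypothetical toy sense inventory: sense label -> gloss words (not WordNet).
SENSES = {
    "bank": {
        "financial_institution": {"money", "deposit", "loan", "account"},
        "river_bank": {"river", "water", "shore", "slope"},
    }
}


def tokens(text):
    """Lowercased word set of a piece of text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def simplified_lesk(word, context):
    """Return the sense whose gloss overlaps most with the context words.

    Ties keep the first sense listed, standing in for a
    'most frequent sense' default.
    """
    context_words = tokens(context)
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(gloss & context_words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


document = ["The river was high after the storm.", "He sat down by the bank."]

# Sentence-level context: no gloss words present, so the default sense wins.
print(simplified_lesk("bank", document[-1]))        # financial_institution
# Document-level context: "river" from the previous sentence tips the choice.
print(simplified_lesk("bank", " ".join(document)))  # river_bank
```

Nothing about the method changes between the two calls; only the size of the context window does, which is the point about looking beyond the sentence.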
I think Google Translate already has this. I was translating some text into German a few days ago, and after a few sentences I used a word that made it clear that I was talking about a specific type of contract appointment, and it went back and adjusted earlier sentences to use more precise terminology. You only notice this when you a) have some command of the language you're translating into; b) actually type/compose the message in the Google Translate text box; and c) are typing something idiomatic enough that such specific phrases can be inferred. So I guess it's just something you wouldn't normally notice.
Either way, I was mightily impressed, to the point where my wife had to roll her eyes and say 'yeah yeah I understand it now' to get me to drop it. (I'm just easily excited I guess.)
I've found that Google Translate works decently (as far as these automated translations can be expected to work) to and from English; however, when translating between a different pair of languages, the results are often way off in my experience. In particular, it seems to have some sort of internal bias, as if it always uses an English-like intermediate representation.
For instance, if you ask Google to translate the Portuguese "báculo" into French, it gives you "personnel". That's nonsense as far as I can tell: a báculo is a "crosier of a bishop"[1]. So what's going on here? Well, if you translate it from PT to EN, it gives you "staff", and suddenly it starts making sense. While "staff" means "A long, straight, thick wooden rod or stick, especially one used to assist in walking" (which fits báculo), it can also mean "The employees of a business", which is an accurate definition of the French "personnel". I believe that's how you end up with the nonsensical PT -> FR translation.
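A toy illustration of the suspected failure mode: translating through an English pivot word without carrying the sense along. This only models the commenter's hypothesis; it is not how Google Translate is documented to work. The lexicons and sense labels are hypothetical, and "bâton" is just a plausible French word for the rod/stick sense.

```python
# Hypothetical toy lexicons: word -> list of (translation, sense) pairs.
PT_EN = {"báculo": [("staff", "rod")]}
EN_FR = {"staff": [("personnel", "employees"), ("bâton", "rod")]}


def pivot_translate(word, src_to_pivot, pivot_to_tgt):
    """Naive pivot translation: the sense is dropped at the English step."""
    pivot_word, _sense = src_to_pivot[word][0]
    # Take the first (e.g. most frequent) target for the pivot word.
    return pivot_to_tgt[pivot_word][0][0]


def sense_preserving_translate(word, src_to_pivot, pivot_to_tgt):
    """Pivot translation that keeps only targets with the source sense."""
    pivot_word, sense = src_to_pivot[word][0]
    for target, target_sense in pivot_to_tgt[pivot_word]:
        if target_sense == sense:
            return target
    return None  # no target shares the source sense


print(pivot_translate("báculo", PT_EN, EN_FR))             # personnel
print(sense_preserving_translate("báculo", PT_EN, EN_FR))  # bâton
```

The same mechanism would also explain the tu/vous confusion: once "tu és" has become "you are", the informal/formal distinction is gone and the French side has to guess.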
Similarly, Google used to be confused by the tu/vous (informal/formal) distinction that exists in many languages but not in English. At some point, the Portuguese "tu és" would be translated into French as the formal "vous êtes" instead of the informal "tu es". This appears to have been fixed, however; I can't reproduce it at the moment.
Conjugations don't fare so well, however. For instance, the French imperfect "je chantais" is translated into the Portuguese preterite "eu cantei", even though "eu cantava" would make more sense, I think. Obviously, with such short phrases I can't really be too harsh on Google's grammar; they're probably not optimizing for that case.
That sounds extremely interesting. I had not noticed that feature before. Do you happen to have some example input at hand that triggers such an adjustment?
People have certainly worked on moving beyond sentence boundaries, although what is meant by "understanding" is always a bit nebulous. Certainly we need to make sure whatever process we are using has a sufficiently rich internal knowledge representation. One piece of work is this: https://github.com/chardmeier/docent/wiki, a document-level phrase-based statistical machine translation decoder. There have also been special-purpose evaluation tasks that include correct pronoun resolution, e.g. https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT... .
If you're interested, probably the DiscoMT workshops are a good starting point for some things people have tried.