I've seen neither spam nor abuse in captions, but I have seen plenty of lousy ma...

judge2020 · on Sept 29, 2020

There's also the situation where regular people submit their "community-contributed" captions where it's obvious that they used Google Translate themselves.

https://youtu.be/8PGghyfRVtE

gwern · on Sept 29, 2020

On the other hand, I've seen some shockingly good machine captions. I'm fairly sure they were machine-generated, because the video had just been uploaded or because technical terms were transcribed phonetically, yet they still transcribed the speech better than I could understand it. My theory is that they reserve the full-power RNN transcriptions for only some new videos and haven't gone back over the full historical YT corpus.

palijer · on Sept 29, 2020

The captions for Google meetings are frighteningly good and accurate. We use a lot of acronyms, plus made-up words that it turns into acronyms. I sometimes get spooked because even with captions turned off, I know all those meetings and social calls are sitting in some Google DB to be turned into something someday.

ShockedUnicorn · on Sept 29, 2020

I wish they'd care more about people with foreign accents. I still find every kind of Google voice recognition unusable unless I put on a really bad fake Texan accent.

jk700 · on Sept 29, 2020

Some people and orgs upload videos with subtitles, especially those targeting international viewers.

gwern · on Sept 29, 2020

I am well aware of that. But when the video comes with no slides and not even a summary, you can guess that they were not paying hundreds of dollars to have it professionally transcribed. You can also guess that no human was involved anywhere in the process (either to generate or to review it) when the captions, say, phonetically transcribe fairly common technical jargon that is written prominently on the slide that has been fullscreened for the past minute of the video.

(Is it really so hard to believe that the usual neural network progress curve has happened for speech transcription and that the future is already here, just unevenly distributed across videos?)

missedthecue · on Sept 29, 2020

I have seen plenty of profane machine captions, but never any profane abuse in crowdsourced ones.

jxramos · on Sept 29, 2020

Yep, I've seen some machine-generated profanity before; it's always humorous. That bot has a filthy mouth!

greggman3 · on Sept 29, 2020

According to the article, captions have to be enabled and then individually approved by the channel owner. So of course you'd be unlikely to see spam or abuse in captions since the channel owner is filtering that spam and abuse out.

Not saying I agree with Google's decision. Just saying it makes sense that you wouldn't see spam and abuse.