Commons:Requests for comment/Technical needs survey/TimedText

From Wikimedia Commons, the free media repository

TimedText

Description of the problems

  • Problem description:
  1. Need an easy, user-friendly way to categorise TimedText. This would help with categorising by language, transcript quality, etc.
  2. Need an easy way to check all TimedText pages associated with a file, similar to https://commons.wikimedia.org/w/index.php?oldid=828200732#L-166 .
  3. Need a more intuitive way to go to the associated file from a TimedText page. Currently this requires Ctrl+clicking the file (the media player box), or opening the popup and clicking the circled "i". I needed this so much that I wrote a script before I learnt the Ctrl+click trick: https://commons.wikimedia.org/w/index.php?oldid=828200732#L-159.
  4. A way to assess the quality of TimedText (similar to Wikisource?): incomplete, transcribed, non-synchronised, proofread, verified...? --RZuo (talk) 23:59, 31 December 2023 (UTC)
  5. A tool/interface that helps with transcription, something like https://www.nikse.dk/subtitleedit/online . --RZuo (talk) 07:07, 2 January 2024 (UTC)
  • Proposal type: feature request
  • Proposed solution:
  • Phabricator ticket:
  • Further remarks:
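Items 2 and 3 both depend on the link between a TimedText page and its file. Judging from the page TimedText:Sandbox.webm.en.srt mentioned in the discussion, a TimedText title appears to be the file name followed by a language code and an .srt suffix. The minimal Python sketch below assumes exactly that naming pattern. The helper names are hypothetical, and fetching the list of titles from the wiki is left out. It derives the associated file and language from a title and groups a list of titles by file.

```python
import re
from collections import defaultdict

# Assumed title pattern: "TimedText:<file name>.<language code>.srt",
# e.g. "TimedText:Sandbox.webm.en.srt". Language codes may carry
# subtags such as "zh-hans" or "pt-br".
TITLE_RE = re.compile(
    r"^TimedText:(?P<file>.+)\.(?P<lang>[a-z]{2,3}(?:-[a-z0-9]+)*)\.srt$"
)


def parse_timedtext_title(title):
    """Return (file page title, language code), or None if the title does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return "File:" + match.group("file"), match.group("lang")


def group_by_file(titles):
    """Map each file page title to the sorted language codes of its TimedText pages."""
    groups = defaultdict(list)
    for title in titles:
        parsed = parse_timedtext_title(title)
        if parsed is not None:
            file_title, lang = parsed
            groups[file_title].append(lang)
    return {file_title: sorted(langs) for file_title, langs in groups.items()}


if __name__ == "__main__":
    titles = [
        "TimedText:Sandbox.webm.en.srt",
        "TimedText:Sandbox.webm.de.srt",
        "TimedText:Example clip.ogv.zh-hans.srt",
        "Talk:Not a timed text page",
    ]
    print(parse_timedtext_title("TimedText:Sandbox.webm.en.srt"))
    print(group_by_file(titles))
```

The greedy file-name group makes the last ".<code>.srt" in the title count as the language suffix, so file names that themselves contain dots still resolve correctly under this assumption.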

Discussion

  •  Oppose You did not explain why this would be useful or why these needs exist. Also, 4 can already be done via file categories. Opposing for now, since so far this doesn't seem to be anywhere near the most important issues and can largely be done already; many other issues would be more important and haven't been listed here. --Prototyperspective (talk) 11:17, 1 January 2024 (UTC)
    Can you point me to an English TimedText that's incomplete, and an English TimedText that's been proofread, given your claim that "4 can already be done via file categories"? RZuo (talk) 11:46, 1 January 2024 (UTC)
    I said it can already be done, not that it is already being done, and I would encourage doing so, especially if machine translation / auto-caption tools are leveraged for WMC multilingualism (which could be very impactful). However, I can also point you to an example: Category:Videos by Terra X with English subtitle file unchecked – these need proofreading (see the categories above it for more). I think people usually just upload TimedTexts that are already complete, but a new category for incomplete ones would be useful.
    Item 1 is also already being done with categories like "…with subtitles in English". Prototyperspective (talk) 16:13, 1 January 2024 (UTC)
As I've tested at TimedText:Sandbox.webm.en.srt, TimedText pages can be categorised in the same way as other pages, but HotCat doesn't work on TimedText pages, so it's cumbersome. That is why I said we "need an easy/user-friendly way to categorise timedtext". The most basic solution is to make HotCat work on TimedText pages.
But the traditional categorisation method is inferior to the assessment structure in Wikisource, which I think is a lot easier to use (just clicking the coloured dots) and provides a standard classification.
This then reminded me of the need for a transcription tool, because transcribing audio/video is different from transcribing text: it requires pausing playback and setting timestamps. --RZuo (talk) 07:07, 2 January 2024 (UTC)
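Setting timestamps while pausing playback ultimately means turning player positions into SubRip cues. The minimal Python sketch below assumes a hypothetical transcription tool that records each cue's start and end position in seconds; the function names are illustrative only. It formats those positions as SubRip `HH:MM:SS,mmm` timestamps and assembles numbered cue blocks separated by blank lines.

```python
def srt_timestamp(seconds):
    """Format a non-negative playback position in seconds as HH:MM:SS,mmm."""
    if seconds < 0:
        raise ValueError("playback position must be non-negative")
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def build_srt(cues):
    """Build SubRip text from (start_seconds, end_seconds, text) tuples.

    Cues are numbered from 1 in the order given; each cue block ends with a
    newline and blocks are separated by one blank line.
    """
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    return "\n".join(blocks)


if __name__ == "__main__":
    # Positions as a player might report them when the transcriber pauses.
    print(build_srt([(0.0, 2.5, "Hello"), (2.5, 5.0, "World")]))
```

Working in whole milliseconds after a single rounding step avoids the drift that repeated floating-point arithmetic on seconds could otherwise introduce.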
  • Regarding 3: a patch for this is already coming. —TheDJ (talk • contribs) 10:19, 2 January 2024 (UTC)
  • Regarding 4: you can always just use talk pages, just as Wikipedia uses talk pages for WikiProject assessments. —TheDJ (talk • contribs) 10:17, 2 January 2024 (UTC)
    • TheDJ, consider ASR-generated captions. As long as they aren't fully proofread, how do we indicate to viewers in the player that these are automated captions (and may therefore be wrong in places)? How do we efficiently proofread them using the crowd? I love Bawolff's idea below. -- Rillke(q?) 19:03, 4 February 2024 (UTC)
      I think we'd better focus on ASR subtitles before this. My point was more that there are some workarounds right now, yet no one is proofreading to begin with. It's generally not a good idea to add complexity to the software before there is a well-identified use and need. And I don't think we have seen that yet, or there would be templates on talk pages indicating the proofreading state. So in my opinion it would be better to work on more fundamental problems before we tackle proofreading with additional software complexity. —TheDJ (talk • contribs) 21:26, 4 February 2024 (UTC)
  • Regarding 5: I'd heavily suggest that this is a case of "external specialised services are better than building and maintaining our own service". We used to have Amara integration, and for the few years that it worked, it was pretty OK. Finding a good online editor, hosting it on Toolforge and adding a few integrations is going to be far more maintainable than trying to ram yet another component into MediaWiki. —TheDJ (talk • contribs) 10:16, 2 January 2024 (UTC)
    A tool being needed doesn't mean it has to be embedded in MediaWiki. An external tool hosted on Toolforge, like CropTool, is good enough, but we don't have such a tool now.
    Making it easy to transcribe audio/video is good for linguistic diversity. RZuo (talk) 08:47, 13 January 2024 (UTC)
  •  Comment To avoid tripping Special:AbuseFilter/103, which is designed to "Enforce syntactical valid Timed Text" for new and anonymous users, make sure your Timed Text is syntactically valid by reading MediaWiki:Abusefilter-warning-invalid-timed-text and en:SubRip.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 11:37, 28 January 2024 (UTC)
  • I feel like TimedText should integrate with the Translate extension. Bawolff (talk) 18:41, 3 February 2024 (UTC)
  • TimedText is currently not in a state that encourages contributions as easily as improving Wikipedia articles does. I'd love to have a way to edit the caption line shown in the player while watching the video. Additionally, for speech, an ASR system (like Whisper) would help me, since the timing and most of the content would already be done correctly. None of this has to be provided by MediaWiki, but seamless integration into the Wikimedia Commons user interface would be great. -- Rillke(q?) 19:03, 4 February 2024 (UTC)
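The actual rules of Special:AbuseFilter/103 are not reproduced here. As a rough self-check against the general SubRip layout described at en:SubRip (a numeric counter, a `start --> end` timing line, at least one text line, then a blank line), a contributor might run something like the minimal Python sketch below before saving. The function name `validate_srt` is hypothetical, the strictly sequential counter check is an assumption, and the sketch may accept or reject files differently from the real filter.

```python
import re

# Timing line in the assumed form "HH:MM:SS,mmm --> HH:MM:SS,mmm".
TIMING_RE = re.compile(
    r"^(\d{2}):([0-5]\d):([0-5]\d),(\d{3}) --> (\d{2}):([0-5]\d):([0-5]\d),(\d{3})$"
)


def to_ms(hours, minutes, seconds, millis):
    """Convert timestamp fields (as strings) to total milliseconds."""
    return ((int(hours) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def validate_srt(text):
    """Return a list of problems; an empty list means the basic layout looks valid."""
    problems = []
    # Normalise Windows line endings, then split into cue blocks on blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    if blocks == [""]:
        return ["file is empty"]
    for position, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"cue {position}: expected counter, timing and text lines")
            continue
        # Assumption: counters must run 1, 2, 3, ... without gaps.
        if lines[0].strip() != str(position):
            problems.append(f"cue {position}: counter is {lines[0].strip()!r}")
        match = TIMING_RE.match(lines[1].strip())
        if match is None:
            problems.append(f"cue {position}: malformed timing line {lines[1]!r}")
            continue
        start = to_ms(*match.groups()[:4])
        end = to_ms(*match.groups()[4:])
        if end <= start:
            problems.append(f"cue {position}: end time is not after start time")
    return problems


if __name__ == "__main__":
    sample = "1\n00:00:00,000 --> 00:00:02,500\nHello\n\n2\n00:00:02,500 --> 00:00:05,000\nWorld\n"
    print(validate_srt(sample))
```

Returning a list of problems rather than a single boolean lets an editing tool show every issue at once instead of stopping at the first one.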

Votes

  1. Yes. --RZuo (talk) 12:01, 23 January 2024 (UTC)