In April, EEG Video hosted the first in our series of Zoom webinars to educate media professionals about our closed captioning innovations.
Best Practices for Closed Captioning and Broadcasting • April 21, 2020
During this well-attended live online event, Bill McLaughlin, VP of Product Development for EEG, walked attendees through our solutions built for broadcasting, entertainment media, and more. This essential webinar covered:
- All about EEG’s closed captioning solutions for broadcasters
- How EEG’s products can be used for different captioning needs and workflows
- The latest closed captioning advancements at EEG
- A live Q&A session
Featured EEG solutions included the HD492 iCap Encoder, the Alta software caption encoder, and Lexi automatic captioning.
Find out about upcoming EEG webinars here!
Transcript
Regina: Hi everyone and thanks so much for joining us today for this webinar about closed captioning and broadcasting. I’m so happy you all could join and I hope you’re all staying safe and healthy out there.
My name is Regina Vilenskaya and I am the marketing content specialist here at EEG. I’ll be your moderator today. With me on this webinar is Bill McLaughlin, the VP of Product Development. Bill has led a lot of EEG webinars in the past, as well as delivered talks about the advancements in closed captioning and the broadcasting industry in general.
For today’s webinar, Bill will be walking us through who we are and what EEG does, the EEG solutions built for broadcasting and the latest products and features. If you have any questions today about any topics Bill mentions, feel free to pop those questions into the Q&A tool at the bottom of the Zoom window. We’ll try to answer as many questions as possible at the end of the webinar. And I think that about covers it, so I’m now going to welcome Bill to kick off the webinar Best Practices for Closed Captioning and Broadcasting. Bill, over to you!
Bill: Thanks a lot. OK. Can you see me great?
Regina: Yes, great.
Bill: Awesome. Thanks everyone for coming. It looks like we’ve had great attendance and it’s really good to be able to talk to everybody. Usually we - this time of year, you know, we’re out in Vegas at the NAB Show and it’s a really great time to talk with all our customers and partners. It’s a shame to be missing that but we’ll do what we can to, you know, try to share a little bit about what’s been new with what we’re doing, some general information about the products for anybody who’s learning about them for the first time and happy to take any questions both this week and, you know, related to anything that’s gonna be in next week’s webinar.
So I lead the product roadmap team at EEG generally, so a lot of what we talk about today, it’s gonna be about the things we’ve developed over the last couple years, but also especially the things that are brand new this year. We have development essentially in our SDI product lines that have been established for a long time, and also related to IP trends in AI and speech-to-text captioning and helping our customers move to software-based installations and specifically to cloud deployment and, you know, new video applications beyond entertainment and news media, like video learning, classrooms, meetings, video conferences, obviously–right in the present time that's a very active subject as well.
Our business interests are kind of at the intersection of video, AI and IP production and how that works with accessibility. The company has been providing accessibility products since the early 1980s. The encoders were, you know, traditionally reachable with dial-up modems and serial ports, and we’ve come a long way from that towards being a really IP-focused and cloud-focused solution, really providing a communication solution that brings real-time caption services into the video production plant.
Today we’ll be talking about how that fits specifically for broadcasting and, you know, live news and entertainment media, whereas next week’s webinar is going to be more about private communications in terms of educational communication, corporate communication, municipalities and government. So today we’re really talking mostly about the broadcasting field.
So we will talk about the HD492, which is our flagship unit right now for SDI closed caption encoding, and we’ll also talk about the Alta software, which is a software closed caption encoder that has a lot of the same features that, you know, you’ve come to expect if you’ve worked with SDI closed captioning gear, especially EEG gear in the past, but bringing that into IP and cloud video production. And finally, we’ll talk about Lexi, which is the product family that kind of brings AI advances into the mix and does speech-to-text AI and text-to-text translation AI and tries to make sure that the creation of accessible media can, you know, be done in the broadest and really most business-effective and sustainable way possible.
HD492 iCap Encoder
So, beginning with our 492, this is an SDI closed caption encoder. It’s SDI in and SDI out. This series of units began about 12 years ago with the HD480, which was the first closed caption encoder to support bidirectional IP communications through iCap, basically exchanging the audio and the caption data in a structured IP connection over the cloud. And this series has continued to be updated, you know, for security, for new communications methods, for going from SD to 1.5G HD-SDI, to now 3G HD-SDI and, as we’ll talk about in a moment, an upcoming 12G version.
And what’s really best about this unit is the super simple integrated connection to captioners through iCap. So iCap is a bidirectional encrypted IP link between the SDI equipment that’s in the broadcaster plant and a remote captioner. It’s more reliable than using something like a dial-up modem, it’s a lot more secure, it provides better-quality audio to the captioner than phones and other out-of-band systems, and it’s very low-latency. It’s really a dedicated tunnel for that caption data, giving the captioners exactly what they need to receive from the encoder and putting it all onto a packaged connection that is very easy to set up from an IT standpoint because it relies only on outbound connections. You don’t need to put the encoder outside a firewall or give it a publicly hosted IP address or anything like that. It pretty much works from within your standard broadcast installation.
This also has all of the traditional methods for caption communication if that’s something you still have needs for. The unit comes with a telephone modem, and it even has an option for a second modem if you’re doing two languages or two different devices of traditional modem captioning. It supports serial input from teleprompters or other kinds of local on-prem captioning devices, and you can take input on an unencrypted local plaintext connection that’s disabled by default for security reasons, but in a local connection mode that’s sometimes a simple thing to do. If the captioner has a way to hear the audio by being onsite at an event or in the studio or anything like that, then that can be a reasonable option, too.
The 492 is going to take audio off of the SDI signal: it finds the audio in the embedded audio data, is able to select the channels that have your dialogue most strongly and downmix that, and it creates a low-latency encrypted feed to the captioners. So you basically have an ability with the 492 to determine which captioners are going to be able to access your encoder. You know, thousands of captioners have the software downloaded and really just need an Access Code to get started with a new customer on that encoder.
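To make that downmix step concrete, here is a minimal sketch of the idea only, not EEG’s actual firmware; the channel indices and sample data are invented for illustration. It selects the embedded channels that carry dialogue and folds them into a single mono feed of the kind a captioner would receive:

```python
import numpy as np

def downmix_for_captioner(frames: np.ndarray, dialogue_channels: list[int]) -> np.ndarray:
    """Average the selected dialogue channels into a single mono feed.

    frames: float32 PCM samples shaped (num_samples, num_channels),
            e.g. the 16 embedded audio channels demuxed from SDI.
    dialogue_channels: indices of the channels that carry dialogue most strongly.
    """
    mix = frames[:, dialogue_channels].mean(axis=1)
    # Keep the mono feed within [-1.0, 1.0] in case the averaged mix clips.
    peak = np.max(np.abs(mix))
    if peak > 1.0:
        mix = mix / peak
    return mix.astype(np.float32)

# Example: a stereo dialogue pair on embedded channels 1 and 2 (0-indexed 0 and 1).
pcm = np.random.uniform(-0.5, 0.5, size=(48000, 16)).astype(np.float32)  # 1 s at 48 kHz
mono_feed = downmix_for_captioner(pcm, dialogue_channels=[0, 1])
print(mono_feed.shape)  # (48000,)
```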
And when they get the audio, they’ll return a transcription and the HD492 will put that transcription data into SDI VANC and at that point it's in the SDI signal and any kind of downstream receivers are going to be able to pick up that caption data for recording, for compression, for over-the-air transmission–really, any type of use case. Typically, those SDI VANC captions will be the format that's useful for communicating caption data.
iCap Workflow and Integration Advantages
When you’re using iCap, you have a lot of workflow advantages, because you’re using a software-driven system that isn’t just a point-to-point connection like a modem or a telnet connection. So, in addition to the security and IT advantages, you have logging of everything that happens. You know when any captioners come in and out of the system, so you can track access, and if there’s any type of discrepancy report or a viewer report about a caption absence, you can check, “Were the captioners in when we expected them to be in?”
You can even download as-run caption files that you can ingest into a system to do caption editing, or just text transcripts showing what was sent, with or without timings attached. So you have a lot of ability to record and repurpose everything that comes in over iCap, in addition to access to the logging and access to the AI services like Lexi when the encoder owner is ready to look at any of those.
HD492 iCap Encoder Features and Advancements
The 492 hardware itself is the one-rack-unit version, and that includes two main video outputs and an additional decoder output. It also includes options to purchase some extra software modules. So we have a new software module, for example, that can ingest or output 2110 ancillary data, so effectively you can move ancillary data between the SDI and the IP domains in a 2110 system. There are also modules that can do things like play back a caption file in real time into the video signal based on embedded time codes, and there’s the CCMatch module, for example, which can change the unit from its default video delay, which is very small (less than a line of video), to a larger buffered delay of up to about eight seconds. You can use that, especially when the video isn’t airing completely live, to create real-time captions that are matched completely with the video and the audio track of the media.
Some of those features are also available on the 1492, which is the openGear card version of this product. It’s a frame card, so you can fit 10 of them in an openGear frame. It has most of the same features; the decoder output is not built into that one, and I believe the CCMatch feature’s not built into that one either, but it has a lot of the same features and is another deployment option.
There will be another unit also coming into this product family by the end of 2020 that’s going to be able to process 12 Gbps SDI. What that allows you to do is have a native 4K signal and actually put captions directly into that or to be able to decode captions in a 4K signal and have a native 4K decoder output, which is, you know, probably more an A/V application right now than a reality for most broadcasters. But, you know, it's good to have that option for sports events that we might be capturing in 4K now to put captions in, in the native format.
Alta Software Caption Encoder
So, one of the reasons that a 12 Gbps SDI unit maybe isn’t what’s happening in a lot of broadcasting is, of course, the IP transition that’s happening in broadcasting. So, you know, we’ve been working for a number of years in the Alta product family on technologies that bring the closed captioning features that are familiar from the SDI world into IP video production, whether that’s working on-prem, in the studio facility or in the cloud.
And our Alta system works either in MPEG transport streams or in 2110, so we're going to cover both of those and kind of talk about how these products can help you kind of take captioning workflow that is familiar from past practice over all the years that live captioning’s been pretty much completely required in North America on TV, and bring that into the IP production space.
So, the Alta encoder is software you can deploy either on a physical rackmount server, as a virtual machine, or as a cloud virtual machine (something like an Amazon EC2 instance) and, regardless of which of these deployment options you’re in, the way to think of it is just as a software version of a unit like the 492. So it’s going to take a video and audio signal in most cases, it’s going to put a caption signal out, and we have a lot of different formats for that, and the communication with caption services over iCap is going to be basically the same regardless of which format you’re using.
And that’s a big advantage because it means that your caption service provider, who’s maybe working remotely and probably is not really clued into the video production realities in the studio facility, is really able to do exactly the same workflow with exactly the same software, even when, effectively, everything at the studio facility is being overhauled.
So, we think that that’s a really big advantage that we can offer to our customers with this transition. And, basically, transitioning from an SDI workflow to an IP workflow, or having side-by-side deployments of SDI and IP workflows for, you know, maybe different channels that are part of the same family, that kind of thing is all supported very, very cleanly with iCap, because the encoder is really a virtualized module anyway from the point of view of the captioner. You know, they get the audio and they return the captions through a fixed interface.
Alta MPEG-TS Workflow
So to look at how that works with the MPEG transport stream, you'll see that kind of picture in the center with Alta, that shows the basic channel controller. And you can control a number of channels through one Alta server. For each one of those channels, you're going to have an input stream, which in this case is going to be an MPEG transport stream, and the video and the audio can be in quite a few formats. But it's gonna be a transport stream so it has, you know, an embedded PCR clock - the timing is all in one stream. The video, the audio, everything else is in one IP stream.
And we’re able to take the audio out of that stream, apply a set of mix parameters that you set up for each channel to say where the dialogue is in the audio mix, mix that down for the captioner, encrypt it and put it in a low-latency feed back to the captioners over iCap, and the captioners use the same software that they would use for any other iCap equipment. They return transcriptions, and the transcriptions that are returned are going to go out into an IP output stream.
So, in most cases when you’re working with a transport stream, the IP output stream is going to be the same video and audio signals as on the input, and we’re going to add the captions into the user data embedded in the video. So, it’s very analogous to the SDI workflow where the whole package flows through the caption encoder with only captions modified or added.
And in the transport stream domain this works the same way: the whole package passes through, the audio coding and the other parameters are not changed, and the captioning is just added, which adds a really pretty trivial amount of extra bandwidth. In most cases, it’s going to be the same bandwidth coming out as going in, and just a small amount of null packet data that was in the original stream will be replaced with captions on the output.
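As a concrete illustration of that last point, here is a small generic Python sketch based on the standard MPEG-TS packet layout (this is not Alta’s code, and the file name is just an example). It counts packets per PID and shows how much null stuffing is available to be replaced with caption user data:

```python
TS_PACKET_SIZE = 188
NULL_PID = 0x1FFF  # null (stuffing) packets carry this PID

def count_pids(ts_bytes: bytes) -> dict[int, int]:
    """Count transport stream packets per 13-bit PID."""
    counts: dict[int, int] = {}
    for offset in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        packet = ts_bytes[offset:offset + TS_PACKET_SIZE]
        if packet[0] != 0x47:          # every TS packet starts with sync byte 0x47
            continue
        pid = ((packet[1] & 0x1F) << 8) | packet[2]
        counts[pid] = counts.get(pid, 0) + 1
    return counts

with open("program.ts", "rb") as f:   # example file name
    counts = count_pids(f.read())

total = sum(counts.values())
nulls = counts.get(NULL_PID, 0)
pct = 100.0 * nulls / total if total else 0.0
print(f"{nulls}/{total} packets are null stuffing "
      f"({pct:.2f}% headroom for caption user data)")
```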
Alta MPEG-TS Features and Advancements
We’ve had a lot of success with this in sports and OTT types of environments, working with cloud production and having an MPEG transport stream either as unicast in the cloud or, usually, as multicast for on-prem installations. And we’ve been finding that some new formats need to be supported, so compared to what you’ve maybe seen on this product in previous years, there are a couple of new caption output formats that are supported. The product now supports HEVC input video, and it’s been qualified up to 200 Mbps of HEVC stream, so that’s really pretty much a perceptually lossless stream.
You also can have DVB encoding for UK or European broadcasting and that can be the DVB teletext format or the DVB subtitles format that has the rendered bitmaps of the captions. We can also output the DVB streams or SMPTE 2038, which is a transport stream wrapper around VANC data–we can output those as a separate stream from the Alta system, independent of the audio and the video. So in those cases, you could be using an external multiplexer, save some bandwidth going in and out of Alta, for example, by sending only the audio and sending out only the caption stream. And we'll see how that's analogous to what would be the general workflow in 2110 in a second.
One other thing to mention is that, a lot of times, the Alta product is deployed with other playout and channel-in-a-box products; the Evertz OvertureRT, for example, is very popular. And another reduced-bandwidth, very simple option with that is you can use Alta for the caption communications, the audio disembedding and the communication with the live captioners, and the aggregation of multiple channels maybe to a single captioner, and you can take the output and actually shuttle that over through a unicast telnet connection to the external product if that product can take CTRL+A or 608 or 708 data. So, that’s a pretty common installation, too, that helps save some of the bandwidth and complexity.
Finally, we’ve done a lot of development in Alta in the SCTE-35 domain, which is, essentially, triggers for ad availability and program segmentation. We can trigger these SCTE messages based on HTTP API input from various types of scheduling systems, and Alta can inject those triggers into the transport stream in live production. Alta is also able to create a log of all of the inbound triggers: if you set up a channel that has SCTE-35 data in the transport stream going in, you can actually log and capture all those messages, and that can be very good for any type of discrepancy logging involving the advertising or segmentation.
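For readers who haven’t worked with SCTE-35, the sketch below shows the general shape of that inbound trigger logging. It follows the standard splice_info_section layout (table_id 0xFC, splice_command_type at byte 13), but the PID value and file name are made-up examples, the real PID would come from the PMT, and this is not Alta’s implementation:

```python
TS_PACKET_SIZE = 188

def log_scte35(ts_bytes: bytes, scte35_pid: int) -> None:
    """Log SCTE-35 splice commands found on a known PID (PID assumed already known from the PMT)."""
    command_names = {0x00: "splice_null", 0x05: "splice_insert", 0x06: "time_signal"}
    for offset in range(0, len(ts_bytes) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts_bytes[offset:offset + TS_PACKET_SIZE]
        if pkt[0] != 0x47:                 # TS sync byte
            continue
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        pusi = bool(pkt[1] & 0x40)         # payload_unit_start_indicator
        if pid != scte35_pid or not pusi:
            continue
        if pkt[3] & 0x20:                  # skip packets with adaptation fields, to keep the sketch simple
            continue
        payload = pkt[4:]
        section = payload[1 + payload[0]:]  # skip pointer_field
        if len(section) < 14 or section[0] != 0xFC:   # SCTE-35 table_id
            continue
        command_type = section[13]          # splice_command_type
        print(f"SCTE-35 at TS offset {offset}: "
              f"{command_names.get(command_type, hex(command_type))}")

with open("program_with_scte35.ts", "rb") as f:       # example file name
    log_scte35(f.read(), scte35_pid=0x1F0)             # example PID only
```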
Alta 2110 Workflow
This shows the Alta 2110 system. The 2110 system is very, very similar to the transport system. We provide that as a separate VM or server installation, but a lot of the software runs really the same way and would be very familiar. In 2110, sort of the major innovation is that the video, the audio and the ancillary data all travel in separate RTP streams rather than in the same stream, and the synchronization is handled by an external PTP clock, and that's kind of a standard technology that is supported by, you know, most higher-end switches and things like that, and you can get very accurate time synchronization to, you know, within, say, 10 nanoseconds or something like that, so, plenty enough for your audio and captioning for sure.
Basically, with the Alta system, typically you’re going to feed in only the 2110-30 audio stream and save the bandwidth of feeding the full, uncompressed video in the 2110-20 standard into the system, and your iCap output captions are going to come out in the 2110-40 standard, which is an ancillary data standard. At that point you can marry those later on as needed, based on the PTP time stamps, back to the audio and video signals that are being processed separately in the facility.
Alta 2110 Features and Advancements
Because you have a lot of bandwidth savings from routing only the audio and the ancillary data in 2110, it’s interesting to think about the density that’s possible with that. We can provide, on a one-rack-unit server, 10-20 channels of the Alta 2110 solution, basically using the same power and the same rack space that it would take you to do a single channel of SDI closed caption encoding.
So, I think that generally this helps captioning scale down in terms of deployment cost and complexity, in a lot of the same ways that, in a successful 2110 system, many of the other functions can greatly reduce how many resources you need to actually accomplish them.
Each channel in Alta, which is especially important when you have 10 or 20 channels actually running at once, is separately configurable, and that happens both through our local API and our local web interface, which is what you’re seeing on the screen now. Alta is also configurable in the input and the output destinations through the NMOS APIs. NMOS IS-04 and IS-05 provide a system where 2110 devices from all different vendors are able to register their identities to a centralized registration server, and you can create routing, again for video, audio and ancillary, either as a group or separately, between the different devices that are in the 2110 chain, and they’ll all register, they’ll all report and they’ll exchange SDP files, which have the characteristics of the media, so that they can talk to each other smoothly.
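To show roughly what that registration-and-routing flow looks like, here is a hedged sketch against the public AMWA NMOS IS-04 Query API and IS-05 Connection API. The registry and node addresses, the receiver ID, and the choice of sender are all hypothetical placeholders, and this is not Alta’s own control code:

```python
import requests

# Hypothetical addresses; in practice these come from DNS-SD discovery and the IS-04 records.
REGISTRY = "http://nmos-registry.example.local"
RECEIVER_NODE = "http://alta-node.example.local"
RECEIVER_ID = "0d3f1a5e-0000-0000-0000-000000000000"

# IS-04 Query API: find a 2110-30 audio sender to route into the caption encoder.
flows = requests.get(f"{REGISTRY}/x-nmos/query/v1.3/flows", timeout=5).json()
audio_flow_ids = {f["id"] for f in flows if f["format"] == "urn:x-nmos:format:audio"}

senders = requests.get(f"{REGISTRY}/x-nmos/query/v1.3/senders", timeout=5).json()
audio_sender = next(s for s in senders if s["flow_id"] in audio_flow_ids)

# Fetch the sender's SDP transport file (multicast address, PTP details, etc.).
sdp = requests.get(audio_sender["manifest_href"], timeout=5).text

# IS-05 Connection API: stage the connection on the receiver and activate it immediately.
patch = {
    "sender_id": audio_sender["id"],
    "master_enable": True,
    "activation": {"mode": "activate_immediate"},
    "transport_file": {"data": sdp, "type": "application/sdp"},
}
resp = requests.patch(
    f"{RECEIVER_NODE}/x-nmos/connection/v1.1/single/receivers/{RECEIVER_ID}/staged",
    json=patch,
    timeout=5,
)
resp.raise_for_status()
print("Audio routed to the caption encoder via IS-05")
```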
And we’ve participated in all three of the recent JT-NM Tested events, which were held prior to the NAB Shows and prior to the IBC Show last year, to qualify the Alta solution for 2110 and PTP compatibility, NMOS compatibility and, now, 2022-7 redundancy, meaning the use of two separate networks to send and receive streams in a lossless way.
We’ve qualified that with pretty much all the other major vendors in this area. I think there are about 100 different vendors participating in the testing program, including a round that we just did virtually in March, last month. So there’s been a lot of progress in that, and even though industry-wide the 2110 product cycle is pretty early, hopefully the work we’ve done there is going to pay off in making this a smoother deployment than some technologies where a lot of different vendors need to work together, on the same page, to build a full broadcast plant.
Lexi Automatic Captioning
So we’ll pivot to our last section and we’re going to talk a little bit more about the AI services that we deliver to these caption encoders. Traditionally, live captioning has been powered by human stenographers or, sometimes, by what are called voice writers, people who are using a human-assisted speech-to-text engine, and these people do an incredible job. It’s a hard job, and there are a lot of complaints even when it’s done well, but there is a lot of good work that’s done.
The human model has some issues with cost and scalability, so, of course, there’s a tremendous amount of interest in applying AI to this problem, and different presentations and big-picture thought pieces on broadcasting and AI have often been very focused on closed captioning as one of the most obvious, easiest innovations. We’ve found that it’s actually rather difficult to deliver a really high-quality AI captioning service with low latency and high reliability that really fits into broadcasters’ way of working. But it is something that we’ve been doing with Lexi for about three years now, and we have hundreds of customers relying on us to keep that going, mostly very satisfied. So we’ll take you through what we’ve been working on and look at how that’s shaping up, both now and looking into the future.
So with Lexi, we currently focus on the three languages that are mandated for broadcasting at least in some areas in North America, which are English, Spanish and French. And the Lexi service can deliver word accuracy of 90% or more at a delay of five seconds or less. If you look at what, for example, the US FCC asks you to do for captioning, they look at word accuracy, they look at delay, and they look at an issue called completeness, which basically entails whether the program is covered from beginning to end, whether all the segments are covered. Also, whether the service is consistent, whether it comes on on time and doesn’t leave until the end, which is something that scheduled computer technologies obviously do pretty well. There is also a positioning aspect, and we’ll talk in a couple of slides about how we’ve been looking at AI for that aspect as well.
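For readers who want to see exactly what “word accuracy” means here, it is conventionally computed as one minus the word error rate, which counts substitutions, insertions and deletions against a reference transcript. A small, self-contained sketch (the sample sentences are invented):

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Word accuracy = 1 - word error rate, via Levenshtein distance over words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    # dp[i][j] = edits to turn the first i reference words into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return 1.0 - dp[len(ref)][len(hyp)] / len(ref)

ref = "the city council will vote on the new budget tonight"
hyp = "the city counsel will vote on a new budget tonight"
print(f"word accuracy: {word_accuracy(ref, hyp):.0%}")  # 2 errors in 10 words -> 80%
```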
Lexi Workflow
The Lexi service is delivered as software as a service, so it is paid hourly. As a broad estimate, because people pay a lot of different amounts, the hourly cost is probably about 10-20% of what you would pay for a traditional human-powered service.
Lexi is a hosted service and the captions travel through iCap, so if you have captioning already happening with any equipment that uses iCap, whether that’s an SDI encoder, Alta, or any of the third-party integrations we’ve worked on, for example with the Imagine Versio or with iStreamPlanet, you can deliver the Lexi captions to it without any new hardware and without any real new IT work or different configuration.
You really just have to authorize Lexi as one of your providers for captioning, and since it’s coming through the same tunnel on the same systems, it’s very easy to use Lexi interchangeably with existing methods, like having Lexi caption off-hours in times when previously only prompter captioning was available, or having Lexi available as a backup, or in cases where a human caption service isn’t able to service your request in time, anything like that.
So it’s definitely part of what's in mind with the Lexi service to make this something that, you know, could be used as a total captioning solution but also is very well-suited to being used as a partial captioning solution along with existing methodologies.
Lexi Local
I’ll briefly mention a brand-new product this year that we’re going to go into a little bit more in next week’s webinar, but it might also be of interest in a broadcasting context. We have recently released a Lexi Local product, and Lexi Local is a basic speech-to-text captioning product that operates on-prem, so, essentially, we ship you the server as shown in the picture.
It’s a one-rack-unit item that has the same types of basic input features as a regular caption encoder, but you can connect other caption encoders to this product and it acts as a server for speech-to-text services that’s completely local. So while this is a little bit less business as usual in terms of connecting out through iCap and things like that, a really big advantage of this for some organizations is that you have complete internal control over your data flow and you’re not uplinking it to cloud services, which is not always appropriate for situations where the data may be corporate proprietary, classified, medically sensitive, or any other category of essentially protected data that might have special restrictions or a complete disallowance on sharing it out to a vendor through the cloud.
So with the Lexi Local system, neither EEG nor any other partner vendor actually has access to the data, so it provides you with a high-security option, at the cost of needing to host it in your own rack as opposed to having it be a software-as-a-service type package. We’ll talk about that a little more next week if you’re interested, or we can talk some more in the questions.
Lexi Automatic Captioning Features
The rest of what we’ll talk about with Lexi are features that are mostly being evolved in our main cloud product. We’ll talk about three main AI features that move beyond the core of speech-to-text and look at other ways to enhance the service, to provide better accessibility and go outside the box of just transcription to provide something more with AI in broadcast.
So our Topic Model feature is the first one of these, and in real basic English, what the Topic Model does is provide a way for you to upload custom vocabulary into your speech-to-text model. That means you can teach it new words that it might not know, and you can teach it how to say names, using phonetic pronunciations, and jargon and things like that, that are not going to be commonly found in a dictionary.
And you can increase the emphasis of things on a certain topic, like the hockey example shown in the picture: there may be a lot of words that are found in the dictionary but are going to be much more common in this context, so you can emphasize that by providing articles. And we read from text files, we read from Word document files, and we really can crawl and read from a website and pick up news articles, trying to pick up as much information as we can through text analysis of what’s going to be useful to add into your speech-to-text model. You can also feed it from a teleprompting system using the MOS protocol, and that integrates with iNews, with ENPS and with a couple of other products in that area.
Finally, we have a feature in Topic Models called EEG base models, and what that provides is a system where your learning structure in Lexi is two-tiered: it begins with a model that we update at EEG to have the most up-to-date news, events and things like that, and then you add your own custom vocabulary on top of that. As we produce updates, they get added alongside your updates, and the two are hopefully synergized together.
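As a rough illustration of the kind of text analysis described above (a generic sketch only, not the Topic Model’s actual logic; the input file name and stopword list are made up for the example), you could pull candidate vocabulary out of a pile of articles like this:

```python
import re
from collections import Counter

# Common words to ignore when pulling candidate vocabulary out of articles.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with",
             "is", "are", "was", "were", "that", "this", "it", "as", "at", "by"}

def candidate_vocabulary(text: str, top_n: int = 50) -> list[str]:
    """Pick frequent, non-trivial terms (favoring capitalized names) from source text."""
    words = re.findall(r"[A-Za-z][A-Za-z'-]+", text)
    counts = Counter(w for w in words if w.lower() not in STOPWORDS and len(w) > 2)
    # Favor capitalized tokens, which are likely player names, teams, places, etc.
    ranked = sorted(counts.items(), key=lambda kv: (kv[0][0].isupper(), kv[1]), reverse=True)
    return [word for word, _ in ranked[:top_n]]

with open("hockey_articles.txt", encoding="utf-8") as f:   # example source file
    terms = candidate_vocabulary(f.read())
print(terms)  # candidate terms to upload into the speech-to-text topic model
```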
The Lexi Vision system is an option that can work with any iCap product that has the ability to post video to iCap, so most of the later SDI encoders, most codecs that you would use through the Alta transport stream product, and the RTMP Falcon product if you’ve ever used that. These all have an ability to upload little low-res videos for use in caption positioning.
And if you enable that on the encoding product and you ask for the Lexi Vision feature, then instead of having to choose a completely static caption position, Lexi Vision will analyze the graphics and try to place the captions on the screen so that they don’t go over any of the text crawls, they don’t go over any of the logos, and they avoid the faces of the speakers on the screen, which can actually be a very important accessibility issue for people who understand partially through lip-reading. These are basically all things the FCC requires captioning to do something about, and in a lot of status quo implementations this is handled just by trying to have a fixed rule: we always put the captioning on the bottom, a little bit above the crawl. With Lexi Vision, you’re actually able to be a bit more dynamic than that. The captions aren’t going to bounce around all the time, but they actually move in response to what’s happening on the screen, so they don’t require you to have the same inflexible rules for every program all the time.
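To give a feel for the placement decision, here is a deliberately simplified sketch, not Lexi Vision’s model; the detection boxes and frame dimensions are invented. The core idea is choosing a caption band that overlaps none of the regions flagged as crawls, logos or faces:

```python
from dataclasses import dataclass

@dataclass
class Box:
    """A screen region to keep captions away from, in pixels (x, y, width, height)."""
    x: int
    y: int
    w: int
    h: int

def place_captions(frame_h: int, caption_h: int, avoid: list[Box]) -> int:
    """Return a y position for a caption band that avoids every box in 'avoid'.

    Candidates are tried from the bottom of the frame upward, then the top,
    so captions stay near their usual position unless graphics or faces are in the way.
    """
    candidates = [frame_h - caption_h - 20, frame_h - 2 * caption_h - 40, 20]
    for y in candidates:
        band_top, band_bottom = y, y + caption_h
        if all(band_bottom <= b.y or band_top >= b.y + b.h for b in avoid):
            return y
    return candidates[0]  # fall back to the default bottom position

# Example: a lower-third crawl and a speaker's face detected in a 1080-line frame.
avoid_boxes = [Box(0, 980, 1920, 100), Box(700, 200, 300, 350)]
print(place_captions(frame_h=1080, caption_h=90, avoid=avoid_boxes))  # 860: moved above the crawl
```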
Finally, iCap Translate is a kind of supplementary service to Lexi that provides secondary language captioning, and so iCap Translate works in the text-to-text domain. It is in the cloud and delivered over iCap and controlled the same way as Lexi is, but what it will do is, it’ll take your primary language captions and those can be from any source. They can be from a live source or they can be something that's pre-recorded.
And on a phrase-by-phrase basis, machine translation is used to provide a second or, in some cases, even a third language of captions that’s actually based on a translation from the first language. If the first language of captions has very good accuracy, the translations will generally have very good accuracy, too. The text-to-text process is actually generally more accurate than an initial speech-to-text process, so it’s good for secondary language captioning.
Typically in a broadcasting application, you would only use two languages because it's difficult to get more languages than that through a conventional workflow and to the consumer decoder, but then in some types of OTT workflows you may have a player that actually can support additional languages, and so we can help with that as well in Translate.
Lexi Automatic Captioning Advancements
We’ve been doing Lexi now for three years, as I said, and have definitely begun to see some accuracy improvements, even in the time that we’ve been doing this, so that’s encouraging. A lot of the promise of ASR technology, of course, is that there is so much development in it for basic consumer applications, in addition to more strenuous professional applications like broadcast captioning, and the models do keep getting better.
We’ve added a couple of new models that are now available on eegcloud.tv, our public site, and if it's something where you've maybe tried this before and, you know, felt that the quality was sort of only so-so, you know, I would definitely encourage that you try again with some of the newer models.
I think that as a basic point of reference for kind of typical, anchored news commentary, we're seeing an increase probably from something that was more likely to be in the range of about 90% accuracy on words, to, you know, maybe more like 95% with some of the new systems, so that's a pretty exciting difference. I mean, that's a big reduction in the number of errors and, you know, I do expect that we're going to keep improving that. We're working with a number of different partners to try and put together the best system possible.
Recap
So that's the basic stuff we have to share for today. I think that, you know, there's so much going on in the IP transition and in the AI space right now that that's really where our development focus has been, probably since about the last NAB Show at least and, you know, moving forward, hopefully we'll all be able to get together for IBC and show some new things but, of course, everyone is just seeing how it goes.
But the Alta and Lexi solutions have been evolving very quickly, and we would definitely like to talk more about that with anybody who has some interesting applications. The SDI encoders still matter too: despite that being a technology that’s stabilized to some extent, we still deal on a daily basis with people who are having problems captioning with dial-up modems. The telcos have all gone digital, to VoIP, with their data; they’re not trying to support 56K modems anymore, and honestly that’s become a little bit of a disaster in the live captioning field. So the HD492 and your basic iCap encoders, I think, still have an important role there.
And once you’ve got the iCap connection in there, you’re sort of ready to start working with some of these other services and start making a path into IP, because, clearly, as plants move into the public cloud and things like that, I am not really aware of any availability of putting a modem up there. So that’s been on our thought map for many years and I think it is really becoming a reality, so it’s really an exciting time.
Q&A
Thank you all for coming. I think Regina is going to help take some questions. I hope we’ve got some good ones. I think we have as much time as anybody needs, so let's go to that.
Regina: Sure, yeah, so we have reached the question-and-answer portion of this webinar. If you have any questions and haven't already done so, please feel free to ask your questions in the Q&A tool at the bottom of your Zoom window.
So, we were asked if there will be a recording of this webinar. I did want to let everybody know that yes, this webinar is being recorded and it will be available both on YouTube and on our website. I will also notify all attendees of today’s webinar once the recording is published.
We also got a few questions about iCap Translate. First of all, can iCap Translate be activated on the fly, like for a guest presenter at an event?
Bill: Well, I mean, you can activate iCap Translate on the fly. What I would be concerned about from hearing that question is that iCap Translate is going to take the captions that exist in the first-language caption track and create a translation in a separate second-language text track, so in some applications that’s not exactly the tool for the job or what’s required. For example, if you had speakers who were speaking in English and in Spanish, one speaker in English and one speaker in Spanish, and what you wanted to get out of the process was a single caption track that was all in English, unfortunately that’s not really the application that Translate is addressing. What Translate would do for you is this: if you had, say, an event where all the speaking was in English but you wanted to have a caption track that had the whole event in Spanish, then you could have an English track and a Spanish track on the captions.
But Translate relies on having a first caption track that is in a certain language, so I’m not sure that for that case of a mixed event it would definitely work. It seems like what you would need is, let’s say you were using Lexi for captioning, to switch between a Lexi job with an English setting and a Lexi job with a Spanish setting. But then that would be a single caption track with two different languages, just like your audio track was a single track with two different languages. So it could get complicated.
It really depends on what the desired end product is, but I know that's an interesting case and some of that language detection is actually a technology that we've been looking into a bit and I think it would help a lot of people in the live event space.
Regina: Alright. And then, with iCap Translate, are there any additional - are there any plans for additional language support?
Bill: Yeah, so currently we support, I think, about eight languages, and we support the ones that basically the 608/708 standard supports. A very similar set of languages, not completely overlapping but very similar, is supported in teletext, which is kind of the other most common standard for different downstream products to support.
We get a lot of questions about wanting Translate to support other languages, especially Asian languages that don’t use the same character sets at all. And one of the major problems with that is that downstream products that play the captions back often don’t have a clear way for you to actually send captions in those languages. So we’ve done some work in real-time captioning in Chinese, in Korean, in Japanese, but typically those systems really only work when you’re using an overall workflow that supports those characters.
Unfortunately when you use a lot of sort of off-the-shelf software that’s common, you’ll actually find that your streaming players don't really have support for live captions in those other languages. So it's kind of an industry-wide issue and we do what we can but I would say we could support those through Translate, you know, when we have customers that actually have the ability to ingest the data and show it to people.
Regina: We had a couple questions asking about the Falcon product. So next Tuesday we have another webinar at the same time. It's going to be at 2 PM Eastern, 11 AM Pacific. We'll be talking about A/V, live events and online communications. There will be a link to sign up for this webinar going out to all attendees for this one, but we will be talking about Falcon in much more detail next Tuesday.
Bill: OK I don't mind answering a question about Falcon if there’s a question.
Regina: So it was more about if we were going to discuss the Falcon product and, if not, when it will be discussed.
Bill: Oh. Yeah yeah, okay, so the Falcon product is a cloud-hosted caption encoder that is RTMP in and RTMP out. Yeah, the reason that’s kind of being put off to the next webinar is that most typically we find that that product is being used in educational contexts, in municipal communications contexts, in corporate communications contexts, so really our next webinar next week is going to be more A/V-focused, you know, because typically then you're talking about an RTMP stream that’s maybe in the range of 5-10 Mbps, so that kinda differentiates this from the Alta product, which is typically either on-prem or in kind of a cloud VPC, something like that and might be delivering, you know, more like mezzanine-compressed video that could be something like 10 times that rate. So that’s why those are sort of two different product families and we’re addressing them as the two webinars. But there’s some crossover definitely.
Regina: Someone is also asking about EEG’s roadmap for 12G SDI.
Bill: Yes, so as we covered earlier, in the fall, by the end of 2020, probably at the IBC Show if things go according to plan, we are going to be officially releasing and shipping encoding and decoding in 12G SDI. And that’s basically going to be a unit that’s very similar to the 492. It’s not going to obsolete the current 492 anytime in the current roadmap, but there will be a unit that basically has similar features and has a new 12 Gbps input board, so that is something we’re actively working on.
Regina: Alright, and someone is asking for the best suggestion for keeping iCap Translate at 90% accuracy or better, whether by using a professional captioner or Lexi. So, maintaining accuracy when using iCap Translate.
Bill: Yeah, so translation accuracy is something we’ve done work on, and we’ve tried to come up with the best ways to rate it. Translation accuracy in general is a little bit harder to attach a number to than word accuracy in speech-to-text, and of course that has its own challenges, too, which I won’t get into right at this moment.
But, you know, especially in Translate, you know, of course there are multiple translations for any phrase that could basically preserve almost all of the meaning and be mostly correct. There are also a lot of, you know, ways to misinterpret things. And whether that's on, you know - obviously whether that’s on a word level or phrase level can be a complicated topic. You know, that being said from a basic subjective viewpoint, I think the, you know - the translation accuracy of well-formatted text using Translate is probably over 95% if you kind of want a single analogous number to the kinds of numbers you would use in word error rate for Lexi.
Now, if you’re starting with a transcript that is imperfect in the first language, which any live captioning performance generally is going to be, since it’s not 100%, that’s really something to look out for, because the inaccuracies in the performance on track one are probably going to be multiplied by going through the Translate process. For example, a word that sounds similar might be transcribed imperfectly in the first language but still make it easy to understand where the error lies; once you put that through translation, it can become harder to see how it ever seemed right to anyone.
So it is important to start with the best-quality captions possible. If you have a Lexi transcription that’s in the 90s and you put that through Translate, I think your results are still going to be pretty good. If you have a human captioner performance that’s 95% plus and put that through Translate, then I think that’s going to be very good also. Obviously the best possible case would be something like a pre-recorded script that was 100% accurate being put through Translate, but clearly for a live event that’s usually not going to be the case. Any good initial caption performance should still be pretty good when you put it through Translate.
Regina: And it looks like we have time for just one more question, which is asking if EEG supports NDI.
Bill: EEG does not currently support NDI. That is something we’re sort of interested in, especially for the A/V space, but NDI doesn’t support closed captions. For anybody that’s not familiar, NDI is an IP video standard that basically shares compressed video and is designed to be used over a more typical office network than the kind of managed media network that the 2110 standard envisions.
So NDI doesn’t technically support closed captions, but what you can do with NDI is send graphics overlays that can be composited with other graphics overlays sent over NDI, and that’s something we’ve been looking into doing, again mostly for having an open display of captions at live A/V events. That’s similar to our AV610 product, which we’re going to talk about in a lot more detail next week. The AV610 is for in-person caption viewing and gives you a couple of options: to squeeze back and scale the video to preserve a space to display captions on-screen, or to just have a kind of jumbo screen that’s just text, or text and a logo, so you can display captions to people who are actually attending an event in person.
Regina: Alright, well, thank you everybody for all of your questions and thank you for joining us today for Best Practices for Broadcasting and Closed Captioning. Take care and I will reach out to everybody with the recorded webinar and with updates about next week's webinar and how to sign up for a one-on-one meeting. Thank you.
Bill: Thank you.