Accessible Multimedia: Captions & Transcripts

In order to meet minimum WCAG guidelines, all video must have some kind of captions, and all audio must have transcripts. But what kind? And how do you create them?

We examine the WCAG requirements for accessible video and audio, and explain the different types of captions (open, closed, and subtitles) and the different types of transcripts (automated, edited, and descriptive). Finally, we look at different methods for creating captions and transcripts, with a focus on finding methods that are both low-cost and easy.

An automated transcript is available below.

Okay. Today, we’re talking about accessible multimedia, and captions and transcripts in particular. A copy of these slides can be found at go.uvm.edu/accessibility, along with this video and a full edited transcript of this tutorial. And we’re going to get into what specifically those mean.

When we talk about multimedia this morning, we are talking about video and audio in particular. We’re going to go through accessibility in terms of basics and legal bits, and also what the University of Vermont Center on Disability and Community Inclusion (CDCI) requires of our own multimedia. Then we’re going to go through captions: the types of captions available and how to create them. And then finally we’re going to go through transcripts and, again, the types of transcripts available and how to create them.

We are going to start with the very basics. This figure might seem familiar to you, but 67% of accessibility failures come down to design choices instead of limitations of a platform or limitations of technology. 67% of all of this is down to choices made by the multimedia producer. I frequently get asked if certain pieces of multimedia need to be accessible.

They’re really short videos. Really short little snippets of audio. Yes. Yes. Your multimedia needs to be accessible. You want to create an inclusive and welcoming, accessible space for people to engage with what you are producing. In terms of multimedia, you’re putting a ton of time and effort into creating these beautiful pieces and you want everyone to be able to enjoy them.

We are creating multimedia for people with permanent disabilities, people with temporary disabilities and people with situational disabilities, as well as people who don’t identify as falling into any of those three categories. Now, what does that actually mean? Well, people with permanent disabilities that might affect what they need in terms of accessibility for multimedia, that could be people who are deaf or hard of hearing, that could be people who are blind or low vision using a screen reader.

There are also auditory processing difficulties that benefit from being allayed by captions and things of that nature. A temporary disability is, you know, when you go to the optometrist and they dilate your eyes and everything is very washed out, and you have a hard time seeing until the medication wears off. That’s a temporary type of disability. Maybe you are out in the cold of a Vermont winter and you forgot your gloves, and your hands are really stiff and you’re having trouble operating the controls on a media player. That’s a temporary disability as well.

A situational disability is maybe you’re riding the bus and you want to watch a video, but you forgot your headphones. You can’t play the video out loud with the audio up because that would be so disruptive and rude to other people. Or what if you’re sitting in direct sunlight and you’re trying to watch something on a screen, and the sun glare on the glass is so strong that you’re just depending on the audio for that video? Those are examples of situational disabilities. We are designing for everyone.

So multimedia accessibility has two parts: the automated and the manual accessibility tasks. All accessibility work you’re going to do has automated and manual tasks. When it comes to accessibility for multimedia, the automated task is simply generating captions or generating a transcript. The manual task is ensuring that those captions and transcripts are usable.

We’ve talked a little bit about accessibility versus usability before. Accessibility is simply making sure that people can access whatever you’re creating using the platform of their choice. They can actually get hold of the material you’re presenting. Usability is making sure that material is easy and efficient to consume.

Now, who sets the standards for accessible multimedia? They’re back. It’s WCAG. WCAG is the Web Content Accessibility Guidelines, a set of international standards for making content more accessible to people with disabilities. While it does say Web, they do also cover PDFs and multimedia. The group behind it is the Web Accessibility Initiative. It’s an international group. They are currently working somewhere between the WCAG 2.2 guidelines and the WCAG 3 guidelines.

We are going to talk about what meets accessibility for the WCAG 2.1 guidelines in this presentation. The WCAG 2.1 guidelines have three levels: A level, double-A level, and triple-A level. A level is basic accessibility. If you are not meeting A-level requirements, your work is inaccessible. It’s completely inaccessible. Double-A level is reasonable usability. If you are meeting the requirements of double-A level accessibility standards, then your work can be considered reasonably usable.

This is the requirement that, currently in the United States, public organizations such as the University of Vermont and the Center on Disability and Community Inclusion must meet. Triple-A level goes beyond that. It’s really talking about what is the gold standard for being accessible and usable and inclusive.

So let’s talk about videos first. For videos with any kind of audio (this can be spoken audio, nonspeech utterances, music, sound effects), WCAG’s A-level requirement is that there are automated captions available. WCAG’s double-A level is that those captions are edited in some way. In order to be usable, your captions really need to be edited so you can ensure that they make sense. And triple-A level is those edited captions along with audio description or ASL interpretation.

CDCI commits to double-A level accessibility at this point, in spring of 2023, but we also try to provide ASL interpretation wherever possible. And in fiscal year 2024, we’re going to be talking about how we can expand that service and also provide audio description for a lot of our multimedia. So stay tuned. If your video does not have audio of any kind, WCAG says that the A and double-A level is just simply an audio description.

Many places fail to meet this standard, but you have to at least try. Triple-A level is audio description plus a descriptive transcript. We’re going to talk about what both of those things are.

For audio pieces, WCAG’s A-level requirement is an automated transcript of some kind. That can be taking your captions file, along with the timestamps, and just throwing it up. That can be all of the content of the audio in one big paragraph. I’ve seen that before. But double-A level, meaning that it’s actually usable, is to have an edited transcript. And that’s the same as triple-A level: the edited transcript. That’s the WCAG 2.1 guidelines at this point.

But wait, there is more. WCAG’s 1.1.1 requirement is an A-level requirement. So, you know, you have to meet it in all cases. It covers non-text content: you must, quote unquote, “provide text alternatives for any non-text content so that it can be changed into other forms people need, such as large print, braille, speech, symbols or simpler language.” What this means in practical terms is that if you create a webinar or you record a presentation as a video and you put it up to share with people, you also need some type of text alternative that people can access and manipulate.

So is that captions alone? Captions are screen reader accessible, but I would argue, and CDCI would argue, that for your video content you need an actual transcript if people are going to be able to access your content effectively. A lot of people don’t watch videos. It’s just not something they’re into. People also can’t skim through captions if they’re looking for one particular nugget of gold in amongst all the riches you’ve provided. They’re just trying to find the one nugget they remember from the presentation, and you can’t search on captions. So we recommend a transcript for your video content.

That’s just universal design, right? We’re talking about universal design and making things accessible for people in the different places and methods they use to access information.

All videos must have an audio track of some kind, because some people simply prefer to listen rather than watch. All audio must have a transcript. Not everyone is going to be able to access audio. A lot of folks just need a transcript, so all videos must have a transcript. That’s how we’re interpreting WCAG 1.1.1. So let’s talk about captions.
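As a recap, the requirements described above can be collected into a small lookup table. This is only a sketch of the levels as this talk summarizes them, not the normative WCAG wording; the names REQUIREMENTS and requirement, and the media-type keys, are made up for illustration.

```python
# Requirements per media type and level, as summarized in this talk.
# This is the presenter's summary, not the official WCAG text.
REQUIREMENTS = {
    "video_with_audio": {
        "A": "automated captions",
        "AA": "edited captions",
        "AAA": "edited captions plus audio description or ASL interpretation",
    },
    "video_without_audio": {
        "A": "audio description",
        "AA": "audio description",
        "AAA": "audio description plus a descriptive transcript",
    },
    "audio_only": {
        "A": "automated transcript",
        "AA": "edited transcript",
        "AAA": "edited transcript",
    },
}


def requirement(media_type: str, level: str) -> str:
    """Return what the talk says is needed for a media type at a given level."""
    return REQUIREMENTS[media_type][level.upper()]


print(requirement("video_with_audio", "aa"))
```

Keeping the table in one place makes it easy to check a piece of media against the level your organization has committed to.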

There are three types of captions: open captions, closed captions and subtitles. Open captions are captions that are burned into the video. Let’s take a look at this video. This is a video that was created by ITV, a British television channel, showcasing Drag Race star Tia Kofi, and this is for Pride in 2022. If we turn off the audio, you can see that there’s text burned into the video.

I would argue that text is not accessible, but when you look at the bottom of the screen, the audio dialog is burned into the bottom of the video. Let’s listen. “I wish I could have instilled myself with more confidence in who I was right from the get-go, because it is a process that it’s taken me so long to sort of be...”

If you turn off the audio, the captions are still burned into the video. But you notice down here (this is on Twitter) there are no controls for turning captions off. And we’re going to talk about why that’s an issue.

So, pros and cons of open captions. Open captions are amazing for events, such as when you’re going to show a video on a large screen at the front of a big room. You want to put on open captions, I think, so that people sitting at the back of the room can access the audio. Maybe the audio system is a little muzzy, people in the audience are talking, or there’s some sort of raucous workshop going on in the room next door. Open captions are great for that. But open captions might be distracting for some users who are working with cognitive or attention-related disabilities.

They are hard to create and hard to edit. Once you’ve burned the captions into your video, it’s hard to go back and edit them out unless you want to actually edit pieces of the video. You can’t separate the audio from the video necessarily. Or at least I should say they’re harder to create and harder to edit.

They’re not compatible with screen readers. This is a big one. They’re literally pixels that you’ve embedded into the video, so screen readers can’t pick them up and users can’t customize them. You’ve got folks who are working with low vision conditions who need to increase the size of the captions, or need to turn them a different color because they have a certain type of color blindness, or do any of the other things that there are amazing extensions for, to help people follow along with text on screen. Your open captions will work with none of them.

Closed captions. These are captions you can turn on and off. So this is a video CDCI created. It is part of our podcast line. And this is just the YouTube video player.

You can see there’s a little button down here, CC. You can turn closed captions on and off. So if we start playing this: “So, I would like to start with an introduction, and it’s just, and I don’t know you at all. We’re meeting for the first time, but I’m introducing you based on having read your book, followed you online, and also read a bit about you.” So you can turn the captions on and off, on, off, depending on who needs what.

Right now I have the audio on this video turned off, so if I was working with an attention difficulty, I might be just wanting those captions off but the audio on. “...all Rebel, some of the many hats she wears our homes...” But maybe I’m on that bus. I forgot my headphones. I can’t play it out loud.

So I’m going to have to just watch the captions in order to follow along. Those are closed captions.

So, pros and cons of closed captions. Closed captions are compatible with screen readers. Users can customize them in most cases. They are searchable to a large extent by search engines such as Google or Bing, which is going to boost your SEO, your search engine optimization, and can make your content easier to find. They can be turned off by users with different access needs. A lot of times we are working with competing access needs. That’s when one person needs captions on and another person needs captions turned off, and so people can customize for their own situation. The con of closed captions is simply that they require editing time to make them usable.
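Closed captions can be turned off, restyled and searched because they live in a separate text file rather than in the picture. One common format is WebVTT (.vtt files). Below is a minimal sketch of writing one; the function names and the two sample cues are hypothetical, and real caption files usually come from an editor rather than from hand-written code.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues) -> str:
    """Build WebVTT text from (start_seconds, end_seconds, text) tuples."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Hypothetical cues, including a nonspeech sound in brackets.
cues = [
    (0.0, 2.5, "I would like to start with an introduction."),
    (2.5, 5.0, "[audience laughs]"),
]
print(build_webvtt(cues))
```

Because the result is plain text, a media player can show or hide it, a user can restyle it, and a search engine can index it, none of which is possible with captions burned into the video.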

Subtitles. I see this all the time, and YouTube is to blame for this, but subtitles are a type of caption. They are captions that translate one spoken language into another. So I have here a video clip from the 1994 martial arts film Wing Chun. It was originally in Cantonese, so the language of the film is Cantonese and the actors are speaking Cantonese. That’s the audio track that you hear, but the subtitles on screen are actually in Spanish. Someone’s taken the time to translate the film into Spanish, and those words appear on the screen.

Now, these are an example of open captions. They’re burned in. I can’t turn them on or off. See, that closed caption button is grayed out. But for folks who would like to watch the movie and understand it, but don’t speak Cantonese and do speak Spanish, they’re all set. Those are subtitles.

Now we can do a little pop quiz here. Let’s go look at this particular video. It is called When Animals Don’t Attack.

And I want you to watch a little bit of it. We’re not going to watch the whole thing. Ask yourself: what kind of captions is this video using? And is this a video with audio content? Okay. That’s enough of that, especially if you’re just listening to the audio track of this tutorial.

So this is a social video called When Animals Don’t Attack, talking about people’s fear of sharks. We’re using the YouTube player here. The captions that the video is using are open captions. They’re burned in. You see that closed caption button is grayed out. Is this a video with audio content? I would argue that it is not a video with audio content, but those open captions that are burned in need to be voiced over in order for it to be accessible, or there needs to be a transcript of the text.

Otherwise the video is completely inaccessible. I would argue this video does not meet single-A level accessibility criteria.

This is a different video. This is a video that was produced by a group here at UVM. And we’re going to ask again: what kind of captions is this video using? Is this a video with audio content? “I built the trebuchet catapult and all the materials were recycled. I studied why writings exploit pushback. I just quoted a few things like an animation, like some games on a program.”

So I would argue that this is a video with open captions, but it also has closed captions available, even though they’re simply the auto-generated version of captions. It has open captioning for all of the dialog that is in the video. Is this a video with audio content? Absolutely, because it has dialog that’s important to the content of the video, and the dialog is available through both closed captioning, even though it’s just automated, and open captioning. So swings and roundabouts, right? Pluses and minuses, pros and cons to that. But it’s another example of the way people use and distribute captions.

So let’s talk about how to get automated captions. If you upload a video to YouTube, YouTube provides automated captions automatically. Another way to get automated captions is to import your video into Adobe Premiere or a software called CADET, both of which will provide you with an automated captions transcript that you can then edit. Your third option is to send out for commercial captioning: you can send it to someone like Rev.com, who charges you a fee.

Now, how to get edited captions. The UVM Access Center does a great job with edited captions. They are available to different folks within the UVM population. Check with your department. You can also edit your automated captions on YouTube directly in what they call the subtitle editing panel, which is not a great term for it, because we know subtitles are more of a specific type of caption. But there you go.

You can edit your automated captions in Adobe Premiere, Camtasia or CADET. My favorite is Camtasia, which is a UVM-endorsed program. And again, you can send it out and pay for someone to edit your captions for you. So let’s look at some of the caption editing software that’s available. With YouTube, the cost is free and it’s somewhat easy to use.

It’s gotten a lot better in the past few years than it was. Camtasia, I believe, is somewhere along the lines of $100 per seat per version. It is very easy to use Camtasia to create and edit captions. Adobe Premiere is incredibly expensive. It is also provided by UVM on machines in the basement of the Howe Library. Anyone with a UVM NetID can access those. It is not easy to use. I would argue that unless you are using Adobe Premiere to edit your video, there are easier ways to deal with your captions.

CADET is a free software package that is not easy to use but is technically available. I’ve used it. It’s sort of a bear to wrangle, but it is available.

Subtitle Horse is another set of captioning software that has a free tier for you to try, and then you have to sign up for a subscription to access it. It is really not easy.

But you have all these options. What I find easy to use might not be what you find easy to use. So you can try these for yourself and decide how you want to move forward.

So why do we go with edited captions rather than the automated captions? We go with the edited captions because “the fresh automated baptism plate now are Sony 64% accurate.” Right? Everyone get that? We’re all on the same page there? That’s right: the best automated captions right now are only 60% accurate. That first sentence was an example of a sentence with ten words in it where six of the words were accurate.
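One rough way to put a number on caption accuracy is to count how many words of the correct sentence survive, in order, in the automated version. The sketch below uses Python’s difflib for that. It is a simple matched-word ratio, not the formal word error rate used in speech recognition work, and the garbled sentence in it is a made-up variant chosen so that six of the ten words match.

```python
from difflib import SequenceMatcher


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference words that also appear, in order, in the hypothesis."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    matcher = SequenceMatcher(a=ref_words, b=hyp_words, autojunk=False)
    # Each matching block is a run of identical words found in both lists.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref_words)


# What was actually said, and a hypothetical automated caption for it.
reference = "the best automated captions right now are only 60% accurate"
hypothesis = "the best automated baptism plate now are Sony 64% accurate"

print(f"{word_accuracy(reference, hypothesis):.0%}")
```

Running a check like this on a short sample of your own automated captions, against a version you have corrected by hand, gives a quick sense of how much editing the full file will need.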

That’s an example of 60% accuracy, and that’s really the average of automated captions right now. It’s not good.

So let’s pause here and talk a little bit about audio description, which we touched on earlier. This is a video created by the filmmaker Cheryl Green, and it is an audio-described piece about cooking with brain injury. Cheryl Green has a brain injury and is a filmmaker who produces a lot of videos. So if you are not accessing the visuals because you’re blind, you’re low vision, or you’re just not watching it, you have a full picture of what is occurring.

We also talked a little bit about ASL interpretation. This is an example of ASL interpretation along with the text and regular content of a video. You can see the speaker is up very large, a nice, large face for people who need facial nonverbal cues or are lip readers, to be able to access the speech of the speaker. Directly underneath, in the lower right-hand corner, is our ASL interpreter, and then large and to the left are the contents of the slides. Let’s take a quick watch. “Hello, my name is Chelsea dear. What? I use the pronouns she/her and I’m a speech language pathologist. The area of focus...” Okay, so you can see that as the person is speaking, you have an interpreter who is signing using American Sign Language.

That’s triple-A level accessibility. But CDCI would like to do a little bit more ASL on all of our videos. There is definitely an argument that ASL is some people’s native language. For some deaf people, that is their native language. So when we talk about language access, we want to talk about whether ASL should be included as a default.

Language access is when you think about which materials should be offered in which different languages. In FY24, CDCI will create our own language access plan, and we hope that UVM has a language access plan soon as well.

So let’s pause here for a second and talk about online presentations, workshops and events. If the event you are giving is in Zoom, Teams or Google Meet, all of those provide automated live captioning. If you are at UVM, the UVM Access Center provides ASL interpreters who can be reserved based on the status of your department. At CDCI, we are able to book them for our events fairly readily. And then after online presentations, workshops and events, you really want to think about how you distribute the finished video. We did a whole entire set of instructions on how to create a maximally accessible online presentation recording.

Again, you have that speaker’s face nice and large so people can look at nonverbal facial cues and/or lip read. You have your ASL interpreter underneath the large speaker. We have the edited captions available and also the slides off to the left-hand side. You can find that set of instructions over at go.uvm.edu/accessibility.

So moving to audio, let’s talk about types of transcripts. There are three: automated, edited and descriptive transcripts. Automated transcripts can be as simple as a copy of your captions file with timestamps that you pull from YouTube or Zoom. YouTube provides you with a choice of, I think, four different types of file extensions to create your transcript from. Zoom will just give you a VTT file, which you can turn into a text transcript.
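Turning a captions file into a rough text transcript mostly means dropping the header, the cue numbers and the timing lines. Here is a minimal sketch, assuming a simple WebVTT file without NOTE or STYLE blocks; the function name and the sample captions are hypothetical.

```python
import re

# Matches timing lines such as "00:00:02.500 --> 00:00:05.000" (hours optional).
TIMING_LINE = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+")


def vtt_to_text(vtt: str) -> str:
    """Drop the header, cue numbers and timing lines; join the caption text."""
    kept = []
    for raw in vtt.splitlines():
        line = raw.strip()
        if not line or line.startswith("WEBVTT"):
            continue
        # Caveat: a caption line made only of digits is dropped as a cue number.
        if line.isdigit() or TIMING_LINE.match(line):
            continue
        kept.append(line)
    return " ".join(kept)


sample = """WEBVTT

1
00:00:00.000 --> 00:00:02.500
Okay. Today, we're talking about

2
00:00:02.500 --> 00:00:05.000
accessible multimedia.
"""
print(vtt_to_text(sample))
```

The result is one unbroken paragraph, which is the automated transcript in its rawest form. Making it readable is still the manual editing work.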

You can also hire out for a machine-automated transcript, such as from Rev.com, which costs some money but is really cheap for what you get when you think about how a lot of these presentations or videos are like an hour in length. That’s a lot of work to get through.

So, pros and cons of automated transcripts. Technically they’re better than nothing, and they meet the WCAG 1.1.1 A-level guideline, but in practice they’re pretty difficult to read through, pretty difficult to scan and use. So what about edited transcripts? Edited transcripts are readable and useful. They help boost your SEO. Not everyone watches videos, so they just might go straight to the transcript and skip the video. And that’s okay.

That’s just their personal preference. You’re okay with that. Edited transcripts also help people skim or search for relevant information. You can include extra resources such as links or images, and you can lay out the transcript with headings. And it’s accessible to screen readers as well. The con of an edited transcript is simply that it takes time to edit.

Now, how do I get edited transcripts? You can import your automated transcript, you know, your captions file with the timestamps. Just plop that into Word or WordPress and then start removing your timestamps bit by bit. Yeah, that’s what it looks like. That’s how it goes. Or you can send out for commercial transcription and have someone do it for you. But the key commonality there is someone is going to have to put in the work.

Now, for editing a transcript, you’re going to want to delete all the timestamps, if they are included. You’re going to want to indicate different speakers by paragraph, and you’re going to want to bold each speaker’s name.
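The layout steps above can be sketched in code. This assumes the transcript has already been split by hand into (speaker, text) turns; the function name and the sample speakers are hypothetical, and the output is plain HTML paragraphs with each speaker’s name in bold.

```python
from html import escape


def format_turns(turns) -> str:
    """Turn (speaker, text) pairs into HTML paragraphs with bold speaker names.

    Consecutive turns by the same speaker are merged into one paragraph.
    """
    merged = []
    for speaker, text in turns:
        if merged and merged[-1][0] == speaker:
            merged[-1][1] += " " + text
        else:
            merged.append([speaker, text])
    return "\n".join(
        f"<p><strong>{escape(name, quote=False)}:</strong> "
        f"{escape(text, quote=False)}</p>"
        for name, text in merged
    )


# Hypothetical turns; real ones come from reading through the captions.
turns = [
    ("Host", "Welcome."),
    ("Host", "Let me introduce you."),
    ("Guest", "Sounds good."),
]
print(format_turns(turns))
```

One paragraph per speaker turn is what makes the result read like a print interview, and it keeps each change of speaker easy to find for someone skimming or using a screen reader.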

You really want to come out with something that looks like an interview you might read in The New Yorker or Harper’s Magazine, just indicating each speaker taking different turns. You can add in links, images or other videos to provide context. You’re also going to want to transcribe nonspeech sounds when they are important for context. I was watching a baking show this weekend with the captions on, and the chef plunged his knife into a pastry and the caption read, “pastry crunches.” Not only did I appreciate that, because I didn’t hear the pastry crunch whatsoever, but immediately following that shot, they went to one of the judges, who said, “Oh, that’s the sound of a good pastry right there.” So that pastry sound was important to know: it had made enough of a crunch that it was picked up by one of the judges. And then it turned out the dude with the French pastry won the whole thing. So there you go.

Now, best practices in graphic design apply to transcripts. We’ve gone over accessible graphic design in terms of fonts and readability: font size, color choice, color contrast, white space. You can get that whole tutorial over at go.uvm.edu/accessibility. There are slides, there’s video, and there’s an edited transcript.

So again, edited transcripts should read like print interviews. I took this snippet of text from the video we watched earlier about closed captions, from the podcast episode we produced, and you can see that we have two speakers here, the host and Hannah. The host starts off the dialog and talks about wanting to introduce Hannah. The host is a little nervous and says, “I’m going to give it a shot.” And Hannah says, “Sounds good.” That’s an important switch in turn-taking, right? You need to know that Hannah agreed to the host’s request for reassurance that it was going to be okay to introduce her that way. So, a word about descriptive transcripts.

Descriptive transcripts are usually for videos. They include written text where the audio content is paired with written visual descriptions. So you might have the audio description right next to the dialog in a table, and that way people can navigate through them using keyboard tabbing or with a screen reader. I’ve included an example of a descriptive transcript here in my slides. You can see that it’s fully laid out as a table.

So, four for the road. Resist the freak-out. Accessibility is hard and it’s a process. Just take one thing to work on at a time from this video. Just take one thing and focus on it for like a month. Get really good at it. Put a Post-it on your laptop screen and be like, “Oh, this time I’m going to make sure that all my videos have at least automated captions,” or, “Oh, I’m going to spend 30 days just scrolling through my automated captions to see how close they are to accurate.”

Number two: you are always encouraged to ask for help. All questions rock, especially about accessibility, because if you are asking the question, somebody else is too shy to ask the question. Or maybe they feel like their question isn’t good enough. No, all questions rock. So you should ask questions. Number three: get feedback from people with disabilities. If you think your automated captions are good enough, make sure you run those automated captions past people with disabilities to get their input, to see how well you’re actually doing.

And number four is: don’t be afraid to advocate for what you need. If you are given a resource to peruse for a class or professional development, or in order to meet with other collaborators on a project, and you need captions on it, or the automated captions aren’t good enough, or it’s a podcast episode and you really need a transcript, don’t be afraid to advocate for what you need, because other people will need it as well.

Case in point: I joined a new community advocacy group up in Burlington, and they’re just getting off the ground. They’re just starting out, and immediately they started a podcast. I was excited about that podcast because I love podcasts. I’m also hard of hearing, so I need transcripts to be able to access any podcast episode. They started putting out podcast episodes, boom, boom, boom, and there were no transcripts, so I had to be that person that was like, “Hi, I just wanted to ask, can we get transcripts for this?”

And it took them a while to figure out how they were going to do that and make it cost effective. But they did it. And then other people joined in the comments, and they were like, “Oh my gosh, this is so great that we have transcripts. I wished there were transcripts as well. Now there are. Great job on the transcripts.” People were super excited. And so it just takes one voice advocating for what they need in order to help everyone get more accessibility.

I’ve included here some resources around captioning. There’s Rev.com, which does automated captions at $0.25 a minute. DCMP is the Described and Captioned Media Program. They have a lot of great information about different types of captions and different methods. And then CADET, that software I mentioned that’s free, is the Caption and Description Editing Tool. Technically it’s free. I found it really difficult to use. I would love to hear from other people who found it different.

Here are some resources on transcripts. Again, Rev.com does transcripts. WCAG has a whole page of best practices for transcriptions. And then there’s a website called Descript, that’s descript.com, which does unlimited transcription starting at $12 a month. A bunch of people I know who do media production for a living use this service because it is cost effective and very high quality.

And here are some resources around language access, because it’s a huge topic and it’s going to get more and more important. AUCD, the Association of University Centers on Disabilities, is working on a language access plan, and there’s a PDF that takes you through what they’re doing.

This is a great article on why we need language access plans. It is called “Language and Coloniality: Non-Dominant Languages in the Digital Landscape,” and it talks about the need for more readily available translations of materials into different spoken languages and why that is important from an equity perspective. And then there’s a link to requesting an ASL interpreter from the UVM Access Center.

I’m Audrey Homan. Thank you for your time. This presentation is licensed under Creative Commons Attribution-NonCommercial-ShareAlike. As long as you give us credit and you’re not making money off of it and you’re not trying to copyright it, please share this presentation and its materials. You are free to share it and remix it. If you have any other questions, please go to go.uvm.edu/accessibility.


