The potential use cases for Call Transcription within evaluagent are huge. By transcribing calls to text, we can mine these conversations for insight (e.g. topics, audio metrics and sentiment) and facilitate AutoQA through our new SmartScore feature. However, generating an accurate Transcription can be complex due to the number of variables involved. This user guide explains how Transcription works inside evaluagent and helps you get set up for success.
When working with call recordings, assessing the accuracy of the transcriptions is one of the first activities you will want to do. The transcription should be good enough to read and understand without listening to the audio, but it won’t be perfect.
- There will be words and phrases specific to your organisation (product names or brands) that the system cannot recognise ‘out of the box’.
- There will also be certain speakers that are picked up more effectively than others.
- Background noise will also mean that everyday words are occasionally missed.
All of these affect accuracy, but many can be addressed over time as you tune the transcription process and write effective prompts and queries.
How does Transcription work inside evaluagent?
Step 1: Check you are providing us with the best audio possible. Click here to see what we recommend.
Step 2: During setup, you provide our Transcription Engine with a list of words and phrases that the agent is more likely to say within a conversation (e.g. “take your account number”). These words and phrases support more accurate speaker diarization (labelling who said what, and when) than can be achieved “out of the box”.
Step 3: We push the audio file, along with the words and phrases, through our transcription engine to turn the audio into text. The text is then split into utterances (segments of the conversation) which are “stamped” with who said what, and when.
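To make the idea of “stamped” utterances concrete, here is a minimal sketch in Python. It is not evaluagent's internal format: the `Word` and `Utterance` classes, their field names and the `group_into_utterances` function are all hypothetical, and the grouping rule (merge consecutive words from the same speaker) is an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A single transcribed word (hypothetical structure)."""
    speaker: str   # e.g. "agent" or "customer"
    start: float   # seconds from the start of the recording
    end: float
    text: str


@dataclass
class Utterance:
    """A segment of the conversation stamped with who spoke, and when."""
    speaker: str
    start: float
    end: float
    text: str


def group_into_utterances(words):
    """Merge consecutive words from the same speaker into one utterance."""
    utterances = []
    for word in sorted(words, key=lambda w: w.start):
        last = utterances[-1] if utterances else None
        if last is not None and last.speaker == word.speaker:
            # Same speaker is still talking: extend the current utterance.
            last.end = max(last.end, word.end)
            last.text += " " + word.text
        else:
            # Speaker changed (or first word): start a new utterance.
            utterances.append(
                Utterance(word.speaker, word.start, word.end, word.text)
            )
    return utterances
```

With stereo recordings the speaker label can come straight from the channel, which is one reason split-channel audio tends to give cleaner speaker separation than mono.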
Handling Infrequent Scenarios:
- It is important to note that if your recording contains hold music or IVR audio, this will be transcribed. As the Transcription is used to power Insight Topics, Sentiment and our Automated Line Items, this will likely impact the accuracy and effectiveness of these features.
- The above, and behaviours such as starting and stopping the recording system as part of your compliance workflows, may impact the performance of Audio Metrics, depending on how your Call Recording platform handles this and what is included in the Audio File that is shared with us.
Step 4: The transcription is mined for conversational insights and used by the large language model through the evaluation process.
Recommended Approach
Here at evaluagent we have invested in a “market-leading” transcription engine to provide you with the best experience possible across the largest number of scenarios. This includes investing in a transcription engine that can handle both mono and stereo call recordings and is capable of being “Fine-Tuned” over time on your industry, products, or services.
It is important to recognise, however, that the experience you have with our products and features will depend on the quality of the audio file your call recording partner can provide to us. Calls that are recorded in mono (1 channel), compressed to a low bitrate, or affected by excessive background noise face the biggest issues with speaker separation, word accuracy and insights.
For the best experience, we recommend the following:
- Split Channel (Stereo) with one speaker on each channel
- Uncompressed PCM WAV Files
- A sample rate of 22.1 kHz or better
- A bitrate of 64 kb/s or higher when files are compressed
- An audio file that only contains the parts of the conversation you wish to be transcribed and analysed.
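If you want to sanity-check a sample recording against these recommendations before sharing it, a short script along the following lines may help. This is a sketch only, using Python's standard `wave` module. The function name `check_wav` and the warning wording are hypothetical, and the 22.1 kHz recommendation is treated as the common 22,050 Hz sample rate, which is an assumption.

```python
import wave

# Thresholds based on the recommendations above.
# Assumption: "22.1 kHz" is interpreted as the common 22,050 Hz rate.
MIN_SAMPLE_RATE_HZ = 22_050
RECOMMENDED_CHANNELS = 2


def check_wav(path):
    """Return a list of warnings for a PCM WAV file (empty if none).

    Note: wave.open raises wave.Error for files that are not PCM WAV.
    """
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()

    warnings = []
    if channels < RECOMMENDED_CHANNELS:
        warnings.append(
            f"{channels} channel(s) found: stereo with one speaker "
            "per channel is recommended"
        )
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(
            f"sample rate is {sample_rate} Hz: "
            f"{MIN_SAMPLE_RATE_HZ} Hz or better is recommended"
        )
    return warnings
```

The script only confirms the channel count and sample rate. It cannot tell whether each speaker is actually on their own channel, or whether the file contains hold music or IVR audio, so listening to a few samples is still worthwhile.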
If you are unsure whether your existing call recording system can achieve this, please speak to your provider directly. 5-10 years ago this may have been a challenging conversation. However, the majority of call recording vendors now recognise the importance of high-quality call recordings, and storage costs have fallen significantly, so many providers now offer stereo by default or are able to provide it.
Need more info on Bitrates and Handling Compressed files? Click here.
Struggling to see success? What can be done?
At the time of writing, there is no Transcription Provider who claims to offer 100% accuracy in their speech-to-text model or speaker separation. In addition, it is important to reiterate that enabling Transcription is not a one-off event and will require time and effort from your account admins to ensure it is accurate and keeps up to date with changes in your organisation. This can be done by regularly reviewing your conversations and ensuring the words & phrases remain relevant. If they do not, please contact us to update them.
In addition to regular maintenance and recording in high-quality audio the following situations and solutions could be considered:
In summary
We hope this user guide has provided you with everything you need to know to get the most from our Transcription feature. If you have any questions or need further support after following the steps listed above then please get in touch and the evaluagent support team will be happy to help guide you.
Additional Technical Information
Additional Information: Audio Files and Compression
Most telephony platforms will apply some compression. If compressed files are provided using codecs such as MP3 or WMA, then the ideal set-up would be:
- Sample rate 22.1 kHz or more
- Bit rate 64 kb/s or more
- Stereo
Typical formats we see that are NOT good enough for speech analytics are:
- MP3 with a bit rate of 16 kb/s or less
- WMA with a bit rate of 24 kb/s or less
- A sample rate of 8 kHz or less
Additional Information: Building a list of agent words and phrases
- Start with a booster list that contains all the obvious words you would use within your business. These will include brand names, products, and services.
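If your call recording platform does not report the bit rate of a compressed file, you can estimate the average from the file size and call duration: bit rate = file size in bits / duration in seconds. The sketch below shows the arithmetic. The function names are hypothetical, and the result is approximate because the file size includes headers and metadata as well as audio.

```python
def average_bitrate_kbps(file_size_bytes, duration_seconds):
    """Estimate the average bit rate in kb/s (1 kb = 1,000 bits)."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be greater than zero")
    return file_size_bytes * 8 / duration_seconds / 1000


def meets_recommended_bitrate(file_size_bytes, duration_seconds,
                              minimum_kbps=64):
    """True if the estimate reaches the 64 kb/s recommended above."""
    return average_bitrate_kbps(file_size_bytes, duration_seconds) >= minimum_kbps
```

For example, a 5-minute (300 second) call stored in a 2,400,000 byte file works out at 64 kb/s, which meets the recommendation, whereas the same call in a 600,000 byte file works out at 16 kb/s, which falls into the "not good enough" range for MP3 listed above.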
- Include any industry-specific terms which might be relatively common but do not appear in every conversation, e.g. ‘insurance’ or ‘claim’ are words that the transcription engine wouldn’t expect to see all the time and may therefore require boosting.
- Review the words and phrases that are frequently used by both agents and customers. It will be easier to boost words used by agents as they are likely to be most frequently occurring. In the future, we may be able to include words and phrases that are more likely to be said by the customer.
- Review the audio and transcriptions to see what other words and phrases are frequently misheard. Keep a record of the transcripts so our support team can assist you with more bespoke advice. Don’t just look at misheard or ‘substituted’ words but consider ‘insertions’ - words that shouldn’t be there - and ‘deletions’ - words that were removed.
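Substitutions, insertions and deletions are the three error types behind the commonly used word error rate (WER) measure: WER = (substitutions + insertions + deletions) / number of words actually spoken. If you have a handful of manually corrected transcripts, a sketch like the one below can count each error type for you. It is an illustration rather than a feature of evaluagent. The function names are hypothetical, and it assumes a simple comparison of lower-cased, whitespace-separated words with no punctuation handling.

```python
def count_word_errors(reference, hypothesis):
    """Count substitutions, insertions and deletions between a corrected
    transcript (reference) and the machine transcript (hypothesis)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # dist[i][j] = minimum edits needed to turn ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i
    for j in range(len(hyp) + 1):
        dist[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back through the table to label each edit.
    counts = {"substitutions": 0, "insertions": 0, "deletions": 0}
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]:
            i, j = i - 1, j - 1             # words match, no error
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            counts["substitutions"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["deletions"] += 1        # spoken word missing from transcript
            i -= 1
        else:
            counts["insertions"] += 1       # transcript word that was never said
            j -= 1
    return counts


def word_error_rate(reference, hypothesis):
    """Total errors divided by the number of words in the reference."""
    ref_length = len(reference.split())
    if ref_length == 0:
        raise ValueError("reference transcript is empty")
    return sum(count_word_errors(reference, hypothesis).values()) / ref_length
```

For instance, comparing the corrected text "please take your account number" with the transcript "please take your count number" gives one substitution. Words that are substituted repeatedly across calls are good candidates for your list.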