arXiv:2401.03816v1 [eess.




Copyright 2024 IEEE. Accepted to ICASSP 2024 - 2024 IEEE International Conference on
Acoustics, Speech and Signal Processing (ICASSP), scheduled for 14-19 April 2024 in Seoul, Korea.
Personal use of this material is permitted. However, permission to reprint/republish this
material for advertising or promotional purposes or for creating new collective works for resale
or redistribution to servers or lists, or to reuse any copyrighted component of this work in other
works, must be obtained from the IEEE. Contact: Manager, Copyrights and Permissions / IEEE
Service Center / 445 Hoes Lane / P.O. Box 1331 / Piscataway, NJ 08855-1331, USA. Telephone:
+ Intl. 908-562-3966.

CREATING PERSONALIZED SYNTHETIC VOICES FROM ARTICULATION IMPAIRED
SPEECH USING AUGMENTED RECONSTRUCTION LOSS

Yusheng Tian, Jingyu Li, Tan Lee

Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong SAR


ABSTRACT

This research is about the creation of personalized synthetic voices for head and neck cancer survivors. It is focused particularly on tongue cancer patients whose speech might exhibit severe articulation impairment. Our goal is to restore normal articulation in the synthesized speech, while maximally preserving the target speaker’s individuality in terms of both the voice timbre and speaking style. This is formulated as a task of learning from noisy labels. We propose to augment the commonly used speech reconstruction loss with two additional terms. The first term constitutes a regularization loss that mitigates the impact of distorted articulation in the training speech. The second term is a consistency loss that encourages correct articulation in the generated speech. These additional loss terms are obtained from frame-level articulation scores of original and generated speech, which are derived using a separately trained phone classifier. Experimental results on a real case of a tongue cancer patient confirm that the synthetic voice achieves articulation quality comparable to unimpaired natural speech, while effectively maintaining the target speaker’s individuality. Audio samples are available at https://myspeechproject.github.io/ArticulationRepair/.

Index Terms— Personalized speech synthesis, articulation disorder, learning from noisy labels

1. INTRODUCTION

Tongue cancer is a prevalent form of head and neck cancer. Its incidence rate has been rising in recent years [1]. For advanced or recurrent tongue cancer, surgical intervention may involve the removal of both the tongue and larynx [2, 3]. As a consequence, the patient would lose voice and speaking ability permanently. Voice is not only an important means of communication, but also an integral part of a person’s identity. Personalized text-to-speech (TTS) was proposed to enable people with vocal disabilities to communicate using their own voices [4]. It was shown that using personalized TTS as an alternative communication method can significantly improve the quality of life of laryngectomees [5].

Personalized TTS models are trained with natural speech from the target speaker. A major challenge in creating personalized synthetic voices for tongue cancer survivors is that the training speech is often impaired. Both the tumor and the treatment process could cause damage to the tongue, resulting in impaired speech production. Such impairments are most commonly at the articulation level [6, 7]. TTS models trained with articulation-impaired speech would be expected to generate synthetic speech that contains similar types of impairment and may not meet the intelligibility requirement for communication. This motivates the present study on creating personalized synthetic speech from articulation-impaired training data. Our goal is to restore normal articulation in the synthesized speech while maximally maintaining the target speaker’s individuality. We consider the target speaker’s individuality to be well maintained if both the voice timbre and good aspects of the original speaking style are kept.

From the machine learning perspective, the above goal can be formulated as a task of learning from noisy labels [8–11]. This is justified by the fact that the degree of articulation impairment in the speech of a tongue cancer patient varies across different types of speech sounds [12]: some sounds remain largely unaffected and can be viewed as having clean labels, while those with distorted articulation have noisy labels. Inspired by the re-weighting approach [13, 14] and the consistency constraint approach [15] developed in studies of learning with noisy labels, we propose to augment the conventional speech reconstruction loss in TTS model training with two additional terms. The first term is a regularization loss that mitigates the negative impact of distorted articulation in training speech. The second term is a consistency loss that promotes accurate articulation in the output speech. Specifically, a separately trained phone classifier is incorporated during training to provide frame-level articulation scores for both original and generated speech. The articulation score of original speech is used as the re-weighting criterion to derive the regularization loss. The articulation score of generated speech quantifies the inconsistency between the phone classifier and the TTS model, representing the consistency loss.

The proposed approach is validated on a real patient case, the same as the one reported in [16]. A personalized synthetic voice is built for a female Cantonese speaker, who was advised to undertake laryngectomy for recurrent tongue cancer. The patient had already undergone a partial glossectomy six years earlier, in which about 3/4 of her tongue was surgically removed. As a result, she had difficulties in producing certain speech sounds. The synthetic voice is developed from the articulation-impaired speech of this patient using the augmented reconstruction loss. Objective and subjective evaluations are carried out to demonstrate the effectiveness of the
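
The idea of combining re-weighting with a consistency term, as described in the introduction, can be illustrated with a small sketch. This excerpt does not give the actual loss formulas, so everything below is an assumption made for illustration only: the articulation score is taken to be the phone classifier's posterior probability for the intended phone in each frame, the regularization is folded into a per-frame weight on an L1 reconstruction error, the consistency term is a negative log-score on the generated frames, and the names `frame_l1`, `augmented_loss` and `consistency_weight` are hypothetical.

```python
import math


def frame_l1(target_frame, predicted_frame):
    """Mean absolute error between two feature frames of equal length."""
    diffs = [abs(t - p) for t, p in zip(target_frame, predicted_frame)]
    return sum(diffs) / len(diffs)


def augmented_loss(target, predicted, score_original, score_generated,
                   consistency_weight=1.0, eps=1e-8):
    """Hypothetical augmented reconstruction loss for one utterance.

    target, predicted: lists of feature frames (each a list of floats).
    score_original, score_generated: per-frame articulation scores in [0, 1],
        assumed here to be a phone classifier's posterior for the intended phone.
    Returns (total, weighted_reconstruction, consistency).
    """
    n = len(target)
    # Re-weighting (assumed form): frames whose original articulation scores
    # are low contribute less to the reconstruction error.
    weighted_recon = sum(
        s * frame_l1(t, p)
        for s, t, p in zip(score_original, target, predicted)
    ) / n
    # Consistency (assumed form): penalize generated frames to which the
    # classifier assigns a low score for the intended phone.
    consistency = sum(-math.log(max(s, eps)) for s in score_generated) / n
    total = weighted_recon + consistency_weight * consistency
    return total, weighted_recon, consistency


# Toy example: two frames with two feature dimensions each.
target = [[1.0, 2.0], [0.0, 0.0]]
predicted = [[1.5, 2.5], [1.0, 1.0]]
total, recon, cons = augmented_loss(
    target, predicted,
    score_original=[1.0, 0.1],   # second frame assumed badly articulated
    score_generated=[1.0, 0.5],
)
print(total, recon, cons)
```

In this toy example the second frame has the larger raw reconstruction error, but its low original articulation score shrinks its contribution, while the consistency term still pushes the generated frame toward a phone the classifier can recognize.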
