Diffusion Speech is a diffusion-based text-to-speech model. Our speech synthesis pipeline is quite simple. We use a diffusion transformer model (DiT) to predict the duration of each phoneme. Then we ...
This repo is the implementation of a research project aimed at enhancing Acoustic Side-Channel Attacks (ASCAs) using a novel combination of Vision Transformers (VTs) and Large Language Models (LLMs).