Score-based Source Separation with Applications to Digital Communication Signals

¹Massachusetts Institute of Technology, ²Universidad Carlos III de Madrid


We propose a new method for separating superimposed sources using diffusion-based generative models. Our method relies only on separately trained statistical priors of independent sources to establish a new objective function guided by maximum a posteriori estimation with an α-posterior, across multiple levels of Gaussian smoothing. Motivated by applications in radio-frequency (RF) systems, we are interested in sources with underlying discrete nature and the recovery of encoded bits from a signal of interest, as measured by the bit error rate (BER). Experimental results with RF mixtures demonstrate that our method results in a BER reduction of 95% over classical and existing learning-based methods. Our analysis demonstrates that our proposed method yields solutions that asymptotically approach the modes of an underlying discrete distribution. Furthermore, our method can be viewed as a multi-source extension to the recently proposed score distillation sampling scheme, shedding additional light on its use beyond conditional sampling.

Single-channel Source Separation


The goal of single-channel source separation is to separate a source mixture into its two constituent components: a signal of interest (SOI) and an interference signal. In the example above, a drone is attempting to communicate with a remote receiver at home. In the vicinity of the receiver there could be various wireless devices communicating information, such as Wi-Fi routers and gaming or VR headsets, as well as long-range wireless signals emanating from 5G towers. With the proliferation of wireless devices in a heterogeneous ecosystem, co-channel interference from other devices is one of the primary deterrents to reliable communications and effective use of the radio-frequency (RF) spectrum.

Digital Communication Signals


At a high level, digital communications deals with the transmission of bits by modulating a so-called "carrier signal". Groups of bits are first mapped to complex-valued symbols via a digital constellation, i.e., a mapping between groups of bits and a finite set of complex-valued symbols. This mapping is where the underlying discreteness of these sources originates. The symbols are then filtered by a pulse-shaping function to form a continuous, complex-valued waveform. The constellation is chosen based on, among other considerations, the number of bits modulated per symbol. Common schemes include modulating two bits at a time (Quadrature Phase Shift Keying, or QPSK) or one bit at a time (Binary Phase Shift Keying, or BPSK). To recover the bits at the receiver, one may apply matched filtering (MF) before estimating the underlying symbols, and then map the estimated symbols back to bits.
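To make the transmit and receive chain concrete, here is a minimal NumPy sketch of Gray-coded QPSK with root-raised-cosine (RRC) pulse shaping, matched filtering, and symbol-by-symbol demapping. The roll-off β = 0.35, 8 samples per symbol, the 10-symbol filter span, and all function names are illustrative assumptions. Carrier modulation, noise, and synchronization are omitted.

```python
import numpy as np

def rrc_taps(beta, sps, span):
    """Unit-energy root-raised-cosine taps covering `span` symbols at `sps` samples/symbol."""
    t = np.arange(-(span * sps) // 2, (span * sps) // 2 + 1) / sps  # time in symbol periods
    h = np.empty_like(t)
    for i, ti in enumerate(t):
        if np.isclose(ti, 0.0):
            h[i] = 1.0 - beta + 4.0 * beta / np.pi
        elif beta > 0 and np.isclose(abs(ti), 1.0 / (4.0 * beta)):
            # Limit of the RRC formula at the points where its denominator vanishes.
            h[i] = (beta / np.sqrt(2.0)) * ((1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
                                            + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
        else:
            num = np.sin(np.pi * ti * (1 - beta)) + 4 * beta * ti * np.cos(np.pi * ti * (1 + beta))
            h[i] = num / (np.pi * ti * (1 - (4 * beta * ti) ** 2))
    return h / np.sqrt(np.sum(h ** 2))

def qpsk_modulate(bits):
    """Gray-coded QPSK: bit pair (b0, b1) -> ((1 - 2*b0) + 1j*(1 - 2*b1)) / sqrt(2)."""
    pairs = np.asarray(bits).reshape(-1, 2)
    return ((1 - 2 * pairs[:, 0]) + 1j * (1 - 2 * pairs[:, 1])) / np.sqrt(2)

def transmit(bits, h, sps):
    """Map bits to symbols, upsample, and apply the pulse-shaping filter."""
    symbols = qpsk_modulate(bits)
    upsampled = np.zeros(len(symbols) * sps, dtype=complex)
    upsampled[::sps] = symbols
    return np.convolve(upsampled, h)

def receive(waveform, h, sps, num_symbols):
    """Matched filter (h is real and symmetric), sample once per symbol, demap to bits."""
    matched = np.convolve(waveform, h)
    delay = len(h) - 1  # combined delay of the transmit and receive filters
    est = matched[delay : delay + num_symbols * sps : sps]
    bits = np.stack([est.real < 0, est.imag < 0], axis=1).astype(int)
    return bits.reshape(-1)

rng = np.random.default_rng(0)
num_symbols, sps = 64, 8
bits = rng.integers(0, 2, size=2 * num_symbols)
h = rrc_taps(beta=0.35, sps=sps, span=10)
recovered = receive(transmit(bits, h, sps), h, sps, num_symbols)
```

Because the RRC filter is applied at both ends, the cascade is approximately a raised-cosine (Nyquist) pulse. Sampling at the combined filter delay therefore returns each symbol with negligible inter-symbol interference, and in this noiseless setting the bits are recovered exactly.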

Mitigating co-channel interference is a challenging problem, especially in heterogeneous wireless systems, i.e., wireless systems with many different communication devices. In the figure, the interference is represented as an additive signal b(t), scaled by a factor κ. It is superimposed on the signal of interest (SOI), which is the carrier wave after the bits have been modulated onto it. We envision an intelligent decoder that uses our proposed solution to reverse the effects of the interference by leveraging independently trained statistical priors over the SOI and interference signals.
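The additive interference model can be written in sampled form as y = s + κb. The sketch below picks κ to hit a target signal-to-interference ratio (SIR). The function name and the SIR definition (average SOI power over average scaled-interference power) are assumptions made for this example. For unit-power sources this definition reduces to SIR = −20 log10 κ dB, so a larger κ means a lower SIR. The complex Gaussian interference is only a stand-in, since real RF interference need not be Gaussian.

```python
import numpy as np

def mix_at_sir(s, b, sir_db):
    """Form y = s + kappa * b, choosing kappa so the empirical SIR equals `sir_db` dB."""
    p_s = np.mean(np.abs(s) ** 2)  # average SOI power
    p_b = np.mean(np.abs(b) ** 2)  # average (unscaled) interference power
    kappa = np.sqrt(p_s / (p_b * 10 ** (sir_db / 10)))
    return s + kappa * b, kappa

rng = np.random.default_rng(0)
n = 4096
s = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)  # QPSK-like SOI symbols
b = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)  # stand-in interference
y, kappa = mix_at_sir(s, b, sir_db=-10.0)
sir_check = 10 * np.log10(np.mean(np.abs(s) ** 2) / np.mean(np.abs(kappa * b) ** 2))
```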

Motivation and Challenges

We are motivated to develop new methods for single-channel source separation for two main reasons:

  1. Scalability: We seek to develop a library of pre-trained models that can be used as plug-and-play priors for source separation. To overcome the training challenges of end-to-end architectures, we also seek solutions that require only a single model update when a source is added or updated.
  2. Automation: As new signals and protocols are discovered, a method that can automatically discover structures in these sources from samples is highly desired. We seek methods that can automate the statistical modeling of such sources without hand-tailored source-specific algorithms.

We are also motivated by the unique challenges inherent to separation of digital communication signals.

  1. Digital communication signals possess an underlying discreteness, since they are obtained by modulating symbols drawn from a discrete set.
  2. The sources can have overlapping structures in both time and frequency.
  3. The signal generation model may be unknown (e.g., for sources recorded over the air), and the interference can be non-Gaussian.

α-Posterior with Randomized Gaussian Smoothing (α-RGS)

The backbone of our method is built on maximum a posteriori (MAP) estimation.
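As a worked example, consider the additive mixture y = s + κb, with s and b independent. The notation is assumed here to match the interference model described earlier. Under this model, Bayes' rule expresses the MAP objective through the two separately trained priors:

```latex
\hat{\mathbf{s}}
  = \arg\max_{\mathbf{s}} \; \log p_{\mathbf{s}\mid\mathbf{y}}(\mathbf{s}\mid\mathbf{y})
  = \arg\max_{\mathbf{s}} \; \Big[ \log p_{\mathbf{s}}(\mathbf{s})
      + \log p_{\mathbf{b}}\Big(\tfrac{\mathbf{y}-\mathbf{s}}{\kappa}\Big) \Big]
```

The second equality uses the change of variables b = (y − s)/κ, whose Jacobian factor does not depend on s and can be dropped. When s takes values in a finite set, the first term is a probability mass function, so the maximization becomes a search over that set rather than a smooth optimization.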


However, solving this MAP problem for digital communication signals is difficult because of their underlying discreteness. The objective function is non-differentiable, so gradient-based methods cannot be applied in its current form. One must instead resort to combinatorial methods, which can be expensive.

Our solution: develop a gradient-based algorithm that asymptotically approaches the modes of the underlying distribution by estimating a continuous proxy of the desired SOI.


To this end, we seek to smooth the optimization landscape defined by the MAP objective. We achieve this by employing Gaussian smoothing, in which the argument is perturbed by a standard Gaussian random variable scaled by a noise standard deviation. The noise standard deviations are defined by an increasing sequence of scalars in (0, 1). We also leverage the generalized Bayes' theorem with an α-posterior. The idea is to reweight the contribution of the more complicated distribution, which in our setting is the likelihood term.
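For reference, the α-posterior of generalized Bayesian inference raises the likelihood to a power α > 0. Written for the mixture y = s + κb (notation assumed here, not taken from the original equations), its log-density and its gradient with respect to s are:

```latex
\log p_{\alpha}(\mathbf{s}\mid\mathbf{y})
  = \log p_{\mathbf{s}}(\mathbf{s})
  + \alpha \log p_{\mathbf{b}}\Big(\tfrac{\mathbf{y}-\mathbf{s}}{\kappa}\Big) + \text{const},
\qquad
\nabla_{\mathbf{s}} \log p_{\alpha}(\mathbf{s}\mid\mathbf{y})
  = \nabla \log p_{\mathbf{s}}(\mathbf{s})
  - \frac{\alpha}{\kappa}\, \nabla \log p_{\mathbf{b}}\Big(\tfrac{\mathbf{y}-\mathbf{s}}{\kappa}\Big)
```

The factor −1/κ comes from the chain rule through (y − s)/κ, and α = 1 recovers the ordinary posterior. Both terms of the gradient are score functions, which is why a learned score model can stand in for each prior.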
Our proposed method admits a gradient update whose asymptotic solution approaches the modes of the resulting loss function. The update uses the scores (the gradients of the log-densities) of the source distributions, which in practice are approximated using diffusion models.
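As a toy illustration of these ingredients, the sketch below separates a BPSK SOI (symbols ±1) from Gaussian interference. It is not the authors' implementation. In this setting the scores of the Gaussian-smoothed distributions have closed forms, and they stand in for the diffusion models. The sketch follows the ascent direction ∇log p_s(s) − (α/κ)∇log p_b((y − s)/κ) and evaluates each score at a Gaussian-perturbed input. The noise level is redrawn at random every iteration, and the step is scaled by σ². The noise-level grid, the σ² step scaling, the initialization at y, and all function names are assumptions made for this example.

```python
import numpy as np

def bpsk_smoothed_score(x, sigma):
    # Score of the BPSK prior (+1/-1 equally likely) smoothed by N(0, sigma^2):
    # d/dx log[N(x; 1, sigma^2) + N(x; -1, sigma^2)] = (tanh(x / sigma^2) - x) / sigma^2
    return (np.tanh(x / sigma**2) - x) / sigma**2

def gaussian_smoothed_score(x, sigma):
    # Score of N(0, 1) interference smoothed by N(0, sigma^2), i.e. of N(0, 1 + sigma^2).
    return -x / (1.0 + sigma**2)

def toy_alpha_rgs(y, kappa, alpha=1.0, sigmas=None, lr=0.5, num_iters=500, seed=0):
    """Toy alpha-RGS-style ascent for y = s + kappa * b (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    if sigmas is None:
        sigmas = np.geomspace(0.01, 0.5, 20)  # assumed noise levels in (0, 1)
    s_hat = np.array(y, dtype=float)          # assumed initialization at the mixture
    for _ in range(num_iters):
        sigma = rng.choice(sigmas)            # randomized noise level for this step
        z_s = rng.standard_normal(s_hat.shape)
        z_b = rng.standard_normal(s_hat.shape)
        prior = bpsk_smoothed_score(s_hat + sigma * z_s, sigma)
        lik = gaussian_smoothed_score((y - s_hat) / kappa + sigma * z_b, sigma)
        grad = prior - (alpha / kappa) * lik  # chain rule through (y - s) / kappa
        s_hat = s_hat + lr * sigma**2 * grad  # step size scaled by sigma^2 (assumed)
    return s_hat

rng = np.random.default_rng(1)
n, kappa = 4000, 0.5
s = rng.choice([-1.0, 1.0], size=n)     # BPSK SOI symbols
y = s + kappa * rng.standard_normal(n)  # Gaussian interference
s_hat = toy_alpha_rgs(y, kappa)
ber = np.mean(np.sign(s_hat) != s)
```

With Gaussian interference and equiprobable symbols, the exact MAP detector is simply sign(y). This toy problem therefore only checks that the iterates settle near the discrete modes ±1 with a low BER. The cases of interest involve non-Gaussian interference with learned priors.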
The intuition is that larger noise levels help explore between modes and smaller noise levels help resolve the solution. The full algorithm is shown below.



Each row in the figure above shows source separation results for a different mixture. Across the board, our proposed method, with either an analytical or a learned score function for the SOI, outperforms all baselines in terms of both bit error rate (BER) and mean squared error (MSE). The baselines are matched filtering (MF), linear minimum mean squared error (LMMSE) estimation, reverse diffusion, and BASIS separation.
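The two metrics can be computed as follows. This is a minimal NumPy sketch. The function names are ours, and because the normalization and units used for the reported MSE are not stated here, the sketch uses the plain per-sample average of squared error magnitudes for complex-valued waveforms.

```python
import numpy as np

def bit_error_rate(bits_true, bits_est):
    """Fraction of bit positions where the decoded bits differ from the transmitted bits."""
    return float(np.mean(np.asarray(bits_true) != np.asarray(bits_est)))

def mse(s_true, s_est):
    """Mean squared error between (possibly complex) waveforms, averaged over samples."""
    return float(np.mean(np.abs(np.asarray(s_true) - np.asarray(s_est)) ** 2))

print(bit_error_rate([0, 1, 1, 0], [0, 1, 0, 0]))  # one of four bits wrong
print(mse([1 + 1j, 0], [1, 0]))                    # squared error 1 on one of two samples
```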

We train diffusion models on three RF datasets: (i) synthetic QPSK signals with root-raised-cosine (RRC) pulse shaping; (ii) synthetic OFDM signals (BPSK and QPSK) with a structure similar to IEEE 802.11 Wi-Fi signals; and (iii) the "CommSignal2" signals from the RF Challenge, which contains datasets of over-the-air recorded signals. All synthetic datasets were created using the NVIDIA Sionna toolkit.

As expected, our best results are obtained by leveraging prior knowledge in the form of the analytical score for the SOI. Nevertheless, our learned SOI score nearly matches this performance and, despite the slight degradation, still outperforms all baselines in terms of BER. Note that CommSignal2 contains a small amount of background noise, which is amplified at low SIR (high κ). This noise limits the minimum achievable BER. The BER attainable when only this residual AWGN is present is illustrated by the black dotted line in the bottom left plot. Nevertheless, our proposed method outperforms all baselines.


Research was sponsored by the United States Air Force Research Laboratory and the United States Air Force Artificial Intelligence Accelerator and was accomplished under Cooperative Agreement Number FA8750-19-2-1000. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the United States Air Force or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein.

The authors acknowledge the MIT SuperCloud and Lincoln Laboratory Supercomputing Center for providing HPC resources that have contributed to the research results reported within this paper.


Jayashankar, T., Lee, G. C. F., Lancho, A., Weiss, A., Polyanskiy, Y., and Wornell, G. W. Score-based Source Separation with Applications to Digital Communication Signals. arXiv preprint arXiv:2306.14411.

Jayashankar, T., Lee, G. C. F., Lancho, A., Weiss, A., Polyanskiy, Y., and Wornell, G. W. Score-based Source Separation with Applications to Digital Communication Signals. Thirty-seventh Conference on Neural Information Processing Systems.