Welcome to PULPS

In eukaryotes, cellular compartmentalization is crucial for living cells to spatiotemporally organize complex biological reactions (Banani et al., 2017). Mounting evidence shows that liquid-liquid phase separation (LLPS) underlies the formation of a wide range of membraneless compartments, also termed biomolecular condensates, such as the nucleolus, P granules, and PML bodies, which are organized by multivalent interactions among proteins and RNA molecules (Alberti et al., 2019). Among these components, scaffold proteins drive phase separation, which concentrates low-valency client proteins or other molecules (Banani et al., 2016). Dysregulation of scaffolds can lead to aberrant condensate assembly and to various complex diseases such as neurodegeneration and cancer (Zhang et al., 2020). Until now, only a few proteins have been experimentally identified as scaffolds, whereas most of the rest remain unlabeled. In contrast to labor-intensive experimental approaches, in silico prediction of scaffolds could speed up and further boost research on LLPS. However, existing computational tools struggle to uncover potential candidates or to cope with extreme class imbalance. Previously, we manually collected 150 scaffold proteins that are drivers of LLPS, 987 regulators that contribute to modulating LLPS, and 8148 potential client proteins that might be dispensable for the formation of membraneless organelles (MLOs), and constructed an integrated database named DrLLPS (Ning et al., 2020). However, a bioinformatics predictor dedicated to scaffolds of LLPS is still urgently needed.

Elucidating the principles underlying the formation of phase-separated condensates is vital to understanding the physiology and pathophysiology of a wide range of biological processes. Therefore, there is an urgent need to identify the proteins linked to LLPS to further characterize these condensates. In this work, we carefully reviewed and introduced multimodal features based on multivalent interactions to characterize both scaffolds and other proteins in DrLLPS. Six types of protein features were adopted: sequence-derived features along with amino acid composition, biophysical principles of the propensity towards droplet formation, intrinsically disordered regions (IDRs), hydrophobic regions, low-complexity regions (LCRs), and secondary structure. We further implemented a positive-unlabeled (PU) learning-based framework that combines ProbTagging and penalty logistic regression (PLR) to profile the propensity to drive LLPS across the human proteome, and developed PULPS. Besides the area under the receiver operating characteristic curve (AUC), the area under the lift curve (AUL) was adopted to evaluate performance. PULPS achieved the best AUC of 0.8353 and AUL of 0.8339. In comparison, PULPS performed 57.37% better than the same pipeline without the PU framework and 8.21% better than the second-best predictor in terms of AUL. We also reviewed the literature on recently reported LLPS-driving proteins; a partial recovery implementation using these proteins was successful, with a 2.91% increase in AUC from 0.8353 to 0.8596 and a 2.85% increase in AUL from 0.8339 to 0.8577. Finally, we present our newly designed predictor, PULPS, whose web server is freely accessible for academic research at http://pulps.zbiolab.cn/.
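To make the two evaluation metrics concrete, the following is a minimal Python sketch, not the PULPS implementation. It assumes the rank-based definitions commonly used in PU learning: AUC as the probability that a labeled positive outscores an unlabeled protein, and AUL as the probability that a labeled positive outscores a protein drawn from the whole set, with ties counted as one half. The function names and the toy scores are hypothetical; only the AUC and AUL values passed to `relative_increase` come from the text above.

```python
def _win_rate(pos_scores, other_scores):
    """Fraction of (positive, other) pairs the positive wins; ties count 0.5."""
    total = 0.0
    for p in pos_scores:
        for o in other_scores:
            if p > o:
                total += 1.0
            elif p == o:
                total += 0.5
    return total / (len(pos_scores) * len(other_scores))


def auc(pos_scores, unlabeled_scores):
    """AUC with unlabeled proteins treated as negatives (an assumption of this sketch)."""
    return _win_rate(pos_scores, unlabeled_scores)


def aul(pos_scores, unlabeled_scores):
    """AUL: labeled positives ranked against the whole set (positives + unlabeled)."""
    return _win_rate(pos_scores, list(pos_scores) + list(unlabeled_scores))


def relative_increase(old, new):
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Hypothetical toy scores, for illustration only.
labeled_scaffolds = [0.9, 0.8, 0.4]
unlabeled_proteins = [0.7, 0.3, 0.2, 0.1]

print(auc(labeled_scaffolds, unlabeled_proteins))
print(aul(labeled_scaffolds, unlabeled_proteins))

# Relative gains computed from the AUC and AUL values reported in the text.
print(round(relative_increase(0.8353, 0.8596), 2))
print(round(relative_increase(0.8339, 0.8577), 2))
```

Under these definitions, AUL needs only the labeled positives and the full pool of scored proteins, which is why it suits a setting where true negatives are unknown.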