Motivation
Protein solubility plays a vital role in pharmaceutical research and production yield. For a given protein, the extent of its solubility can reflect how well it functions, and solubility is ultimately determined by the protein's sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein solubility predictors.

Model
In this work we propose DeepSol, a novel deep learning-based protein solubility predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and additional sequence and structural features extracted from the protein sequence.
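To illustrate how a convolutional filter can exploit k-mer structure, the sketch below one-hot encodes a protein sequence over the 20 standard amino acids, slides a single width-k filter along it, and max-pools the responses. This is a hypothetical, stdlib-only illustration, not DeepSol's implementation: the one-hot encoding, the hand-set filter weights, and the helper names `one_hot`, `kmer_filter` and `conv1d_max` are assumptions. A trained network would instead learn many filters of several widths from data.

```python
# Hypothetical illustration: a width-k convolution filter acting as a k-mer detector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as a list of 20-dim one-hot vectors."""
    rows = []
    for aa in seq.upper():
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0  # raises KeyError for non-standard residues
        rows.append(row)
    return rows


def kmer_filter(kmer, weight=1.0):
    """Hand-set (hypothetical) k x 20 filter that responds most strongly to one k-mer."""
    kernel = []
    for aa in kmer:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = weight
        kernel.append(row)
    return kernel


def conv1d_max(encoded, kernel):
    """Slide the filter along the sequence (stride 1, no padding) and return
    the maximum response (global max pooling) and the window's start position."""
    k = len(kernel)
    best, best_pos = float("-inf"), -1
    for start in range(len(encoded) - k + 1):
        score = sum(
            kernel[j][c] * encoded[start + j][c]
            for j in range(k)
            for c in range(len(AMINO_ACIDS))
        )
        if score > best:  # keep the first window with the highest score
            best, best_pos = score, start
    return best, best_pos


score, pos = conv1d_max(one_hot("MKVLAAGIKKLLE"), kmer_filter("KKL"))
print(score, pos)  # 3.0 8
```

Max pooling makes the filter's output depend on whether the k-mer occurs, not on where it occurs, which is one common reason convolutions are used for variable-length sequences.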

Results
DeepSol outperformed all known sequence-based state-of-the-art solubility prediction methods, attaining an accuracy of 0.77 and a Matthews correlation coefficient (MCC) of 0.55. Its superior prediction accuracy makes it possible to screen for sequences with enhanced production capacity and to predict the solubility of novel proteins more reliably.
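Accuracy and MCC are both computed from the binary confusion matrix of soluble versus insoluble predictions. A minimal stdlib sketch of the two standard formulas follows; the counts are made up for illustration and are not DeepSol's test-set results. MCC is often reported alongside accuracy because it remains informative when the two classes are imbalanced.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.
    Ranges from -1 (total disagreement) through 0 (chance level) to +1 (perfect)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical counts (soluble = positive class).
tp, tn, fp, fn = 40, 40, 10, 10
print(accuracy(tp, tn, fp, fn), mcc(tp, tn, fp, fn))  # 0.8 0.6
```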

Citation
This work is published in Bioinformatics (Oxford University Press): https://doi.org/10.1093/bioinformatics/bty166. It can be cited as:
Khurana, S., Rawi, R., Kunji, K., Chuang, G.Y., Bensmail, H. and Mall, R., 2018. DeepSol: a deep learning framework for sequence-based protein solubility prediction. Bioinformatics, 34(15), pp.2605-2613.

Webserver
A webserver is available for protein solubility prediction. The webserver accepts proteins in FASTA format. Submitting a prediction job requires a free account, which can be created using this link. The account is used to keep track of the jobs a user has submitted and their status.
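Before submitting, it can help to check locally that an input file is well-formed FASTA. The sketch below is a hypothetical client-side check and is not part of the webserver: it parses records (a `>` header line followed by one or more sequence lines) and flags letters outside the 20 standard amino acids. Whether the webserver accepts non-standard letters such as X or U is not stated here, so the allowed alphabet in `STANDARD_AA` is an assumption.

```python
# Hypothetical pre-submission check; not part of the DeepSol webserver.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")  # assumed alphabet


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def nonstandard_residues(seq):
    """Return the set of letters in seq outside the assumed standard alphabet."""
    return set(seq) - STANDARD_AA


example = """>prot1 hypothetical example
MKVLAAGIKK
LLE
>prot2
MSTXQ
"""
for name, seq in parse_fasta(example):
    print(name, len(seq), sorted(nonstandard_residues(seq)))
# prot1 hypothetical example 13 []
# prot2 5 ['X']
```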

Source Code
The standalone source code and models are available here.