Cancer of Unknown Primary (CUP) is the fourth leading cause of cancer-related death worldwide. Treatment remains problematic because it depends on the disease's tissue of origin, which is often misdiagnosed or impossible to determine. I designed and trained a robust, cost-effective machine learning model that performs non-invasive CUP classification from cell-free DNA (cfDNA). Because cfDNA in the bloodstream is a mixture of DNA shed by many tissues, the resulting data are heavily convoluted, which violates the intraclass distribution assumptions made by standard methods. This pipeline addresses these challenges with a Support Vector Regression predictor, a diverse library of 18 base models, and a meta-learner that together restore the violated assumptions and produce a stable tissue-of-origin prediction. Data from 1340 solid tumor profiles and 225 whole blood profiles were used to fit the learners. This multi-level model determined tissue of origin with 93.5% accuracy on a withheld test set of real samples (n=522), an 18% improvement over industry-standard tests. The techniques presented have important implications for treatment decision-making in CUP and for the development of early detection assays. Moreover, since circulating tumor DNA (ctDNA) can serve as a surrogate for studying a patient’s tumors without a biopsy, my novel design could make cancer progression monitoring less risky, capture tumor heterogeneity, and lower the overall cost of existing methods. These gains in sensitivity and specificity could make such applications more practical and broaden the non-invasive examination of tumors.
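The abstract does not say which base models or which meta-learner HICCUP uses, so the following is only a minimal sketch of the general stacked-ensemble idea under that caveat: base learners are trained on folds of the data, their out-of-fold predictions become features for a meta-learner, and the meta-learner makes the final tissue-of-origin call. Everything here is hypothetical: the two base learners (nearest centroid and 1-nearest-neighbour), the lookup-table meta-learner, the function names `fit_centroid`, `fit_1nn`, `out_of_fold`, and `fit_stack`, and the toy two-tissue data. The Support Vector Regression step is not modeled.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-ins for the 18-model base library (NOT the HICCUP models).


def fit_centroid(X, y):
    """Base learner 1: assign each sample to the nearest class centroid."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    return lambda x: min(centroids, key=lambda c: math.dist(x, centroids[c]))


def fit_1nn(X, y):
    """Base learner 2: copy the label of the single closest training sample."""
    data = list(zip(X, y))
    return lambda x: min(data, key=lambda p: math.dist(x, p[0]))[1]


BASE_LEARNERS = [fit_centroid, fit_1nn]


def out_of_fold(X, y, k):
    """For every sample, collect base-model predictions from models that never saw it."""
    if k > len(X):
        raise ValueError("k must not exceed the number of samples")
    meta_X = [None] * len(X)
    for fold in range(k):
        train = [i for i in range(len(X)) if i % k != fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in BASE_LEARNERS]
        for i in range(fold, len(X), k):  # indices held out in this fold
            meta_X[i] = tuple(m(X[i]) for m in models)
    return meta_X


def fit_stack(X, y, k=3):
    """Hypothetical meta-learner: map each pattern of base votes to its most common true label."""
    table = defaultdict(Counter)
    for votes, label in zip(out_of_fold(X, y, k), y):
        table[votes][label] += 1
    base = [fit(X, y) for fit in BASE_LEARNERS]  # refit base models on all data

    def predict(x):
        votes = tuple(m(x) for m in base)
        if votes in table:
            return table[votes].most_common(1)[0][0]
        return Counter(votes).most_common(1)[0][0]  # unseen pattern: majority vote

    return predict


if __name__ == "__main__":
    # Toy 2-feature "profiles" for two tissues (fabricated for illustration only).
    X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
    y = ["lung"] * 3 + ["colon"] * 3
    predict = fit_stack(X, y)
    print(predict((0.3, 0.1)))  # lung
    print(predict((4.8, 5.0)))  # colon
```

Training the meta-learner on out-of-fold predictions, rather than on predictions from base models that already saw each sample's label, keeps it from simply learning to trust overfit base models; in practice a learned classifier such as logistic regression would usually replace the lookup table.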

What inspired you (or your team)?

Constructing data structures and crafting algorithms in the fast-paced setting of competitive programming captivated me for years. I competed in countless local and international contests, but part of me was always dissatisfied that algorithm optimization contests resulted merely in rankings and prizes, with no discernible impact on the broader population. I felt my training could be used for something with more substance and humanity.

To this end, I started a business, CV Enterprises, that trains students to become full-stack developers and gives them a steady stream of real-world projects. Growing the organization to twenty-one employees across multiple high schools provided the tangible impact I was looking for. We completed fifteen web and app development jobs, including injury-tracking software for sports therapists and interactive apps for the town council to promote its reserves. This gave me a taste of what it was like to work on projects with far-reaching implications.

I then happened to attend a talk on software: how its ubiquity lets programmers reach a worldwide audience as easily as a local one. This inspired me to set a goal that extended beyond the local businesses we had served. Bioinformatics seemed like the perfect fit, with the influx of “big data” demanding algorithmic optimization (an old friend). I sprang at the opportunity and applied to the Alizadeh Laboratory at Stanford, intending to build predictive models from terabytes of raw genomic data.

With this newfound purpose, I proposed a project to enhance the non-invasive diagnosis of malignancies. Two years later, meticulous data collection, featurization, and experimentation finally bore fruit: a novel ensemble-based machine learning model that has the potential to outperform the industry standard for non-invasive cancer diagnosis.

Securing a high ranking in a competition or developing the next hit app paled in comparison to the possibility of actually helping patients down the line. The pursuit of individual recognition through competitive programming, the development of a meaningful company within my community, and finally, a contribution to the fields of bioinformatics and precision medicine gave me a new perspective on what it means to make an impact. That perspective instilled in me the relentlessness that drove HICCUP to completion.