Nearly 25 years ago, researchers demonstrated that proteome-wide protein microarrays were achievable, opening the door to proteome-scale biology.
As part of our ongoing series marking that milestone, Seth Blackshaw reflects on how the original human proteome microarray took shape at Johns Hopkins. He describes how a collaboration between neighbouring laboratories evolved into HuProt™, the human proteome microarray, and helped lay the foundations for CDI Labs.
Scientific tools rarely begin as products. They usually start with a problem in a laboratory.
For Seth Blackshaw, that problem emerged while studying how neurons develop in the brain and retina. His research focused on transcription factors, the regulatory proteins that guide cells as they take on specialised roles during development.
Understanding those proteins meant understanding what they interacted with and how they functioned. The available tools made that difficult.
During his postdoctoral research, Seth had already begun exploring high-throughput approaches to gene expression. Using one of the earliest sequencing-based methods, SAGE, he generated digital expression profiles of retinal development. This was one of the first attempts to capture how gene expression changes over time in neural tissue. Looking back, the scale now seems modest.
“We were generating about 50,000 to 100,000 sequence tags per whole retina,” Seth recalls. “Now you can get that level of information from a single cell.”
Even so, the experiments produced a long list of genes that were strong candidates for regulating retinal development. The real challenge was figuring out what those genes actually did.
Building the first mammalian protein microarray
When Seth joined Johns Hopkins, he arrived at the same time as Heng Zhu. Both were recruited into the Center for High Throughput Biology, a programme designed to bring together researchers interested in developing new technologies.
Their interests were very different. Seth studied neuronal development while Heng was working on microbial protein arrays, yet their laboratories were placed next to each other.
“We were put together in the same room,” Seth says. “We could either ignore one another or collaborate.”
At the time, Seth’s lab was working to understand transcription factors and RNA-binding proteins involved in neural development. Protein arrays offered a promising way to explore those interactions at scale. The difficulty was that no one had built a mammalian protein array of that size.
At that point, the goal was not to represent the entire proteome. Seth was interested in transcription factors and RNA-binding proteins that regulate gene expression during neural development, and a focused collection of regulatory proteins would already provide a powerful experimental tool.
Seth initially proposed creating a focused array containing a few thousand human proteins involved in gene regulation. The internal grant application was rejected. Ironically, that rejection helped push the idea further.
Expanding from protein arrays to the full human proteome
The team set out to subclone a large collection of genes and express them as full-length, properly folded proteins. Once they had this collection, it became clear that the approach could scale much further than first imagined.
Working together, Seth and Heng began assembling prototype arrays containing transcription factors, RNA-binding proteins, kinases and mitochondrial proteins. Early versions included roughly 4,000 proteins. The first focused arrays were later described in a Cell paper in 2009, demonstrating that large-scale mammalian protein microarrays could be used to study regulatory proteins systematically.
At that stage, the array already covered several key regulatory protein classes. It was enough to test whether large-scale mammalian protein arrays could work in practice.
Then Heng posed a simple question.
“If we’ve already come this far,” Seth remembers him saying, “why don’t we just do the whole thing?”
That decision led to the development of the first full human proteome microarray. A key technical choice helped make the platform practical: instead of expressing proteins in mammalian cells, the team used yeast. Yeast provided a eukaryotic environment in which many proteins could fold correctly, while keeping protein production reproducible at scale.
That balance between biological fidelity and scalability proved critical to building a proteome-scale resource.
Improving antibody specificity with proteome-wide screening
From the beginning, Seth recognised that proteome-scale arrays could solve a persistent problem in biomedical research: antibody specificity.
Like many scientists, he relied heavily on antibodies in his work. The quality of those reagents varied widely.
“Most antibodies are actually pretty bad,” Seth says bluntly. “Cross-reactivity is always the problem, and there wasn’t really a systematic way to test it.”
Proteome arrays offered a new approach. Instead of testing antibodies against one protein at a time, researchers could screen them simultaneously against thousands of potential targets.
The method was not perfect. No single experiment can definitively prove antibody specificity. However, proteome-wide screening provided a powerful way to identify problematic reagents early and enrich for the best candidates.
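As a toy illustration of that idea (not CDI's actual analysis pipeline), ranking an antibody's signals across all spotted proteins and checking whether the intended target clearly dominates is one simple way to flag cross-reactive reagents. The function, protein names and fold-change threshold below are illustrative assumptions, not values from the original work:

```python
def specificity_report(intensities, intended_target, fold_threshold=5.0):
    """Rank array signals for one antibody and flag cross-reactive hits.

    intensities: dict mapping protein name -> background-corrected signal.
    The antibody is called 'specific' here only if its intended target is
    the strongest signal AND exceeds the runner-up by fold_threshold.
    """
    ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
    top_name, top_signal = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    is_specific = (top_name == intended_target
                   and runner_up > 0
                   and top_signal / runner_up >= fold_threshold)
    # Any unintended protein within fold_threshold of the top signal
    # is reported as a potential off-target.
    off_targets = [name for name, sig in ranked
                   if name != intended_target
                   and sig >= top_signal / fold_threshold]
    return {"specific": is_specific, "top_hit": top_name,
            "off_targets": off_targets}

# Hypothetical antibody raised against "TF_A", screened across four proteins
signals = {"TF_A": 5200.0, "TF_B": 310.0, "KIN_1": 120.0, "MITO_3": 95.0}
print(specificity_report(signals, "TF_A"))
```

In a real proteome-wide screen the dictionary would hold thousands of entries rather than four, which is exactly what makes this kind of ranking informative: a reagent that looks clean against one purified antigen can still show strong off-target binding somewhere else in the proteome.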
From academic protein array to CDI Labs platform
As the arrays matured, the technology began to attract broader interest.
In 2008, Seth and Heng co-founded CDI Labs to develop the platform further. Protein production and immunisation work was carried out in Puerto Rico, while array fabrication and other specialised steps remained connected to the Johns Hopkins research environment in Baltimore. The original concept focused on producing highly specific monoclonal antibodies.
Early progress depended heavily on government grants. One major turning point came when the NIH Common Fund launched a programme to develop immunoprecipitation-grade antibodies against transcription factors.
The CDI approach proved effective, and the project provided crucial funding during the company’s early years.
As the platform began to be used more widely, another application started to emerge.
Seromics and autoantibody discovery at proteome scale
Researchers realised that proteome arrays could be used to analyse antibodies present in human serum.
Instead of validating laboratory reagents, scientists could examine immune responses directly in patient samples across thousands of proteins simultaneously.
That shift opened new possibilities for studying autoimmune disease, cancer and infectious disease.
Over time, the demand for these large-scale seromics projects grew.
“The ability to screen across the whole human proteome opened up new understandings of disease,” Seth explains. “Cancer, autoimmune disease, and more recently post-viral conditions like long COVID have all benefited from being able to explore the impact of autoantibodies in serum.”
For Seth, one of the most satisfying examples came from a study that used HuProt™ arrays to identify the cellular targets of pathogenic autoantibodies in multiple sclerosis.
The work demonstrated how proteome-wide screening can reveal mechanisms of disease that might otherwise remain hidden.
Why proteome-scale biology matters
Looking back, Seth is both proud of the platform and realistic about its role.
Proteome arrays are not designed to deliver final clinical answers. They are discovery tools that allow researchers to explore biological systems at scale and identify signals worth investigating further.
“The array is a discovery engine,” Seth says. “You start wide, identify signals, and then follow up with more focused experiments.”
They also encourage a different way of thinking.
Most scientists still focus on a small number of proteins. Proteome-scale platforms begin from the opposite direction. They start with everything.
The future of proteome-scale screening
The core resource will remain the same: a comprehensive collection of full-length human proteins.
“The secret sauce is being able to generate high-quality, appropriately folded full-length proteome preps, which we now manage to do routinely at CDI Labs,” Seth says. “That’s the part that really matters.”
At the heart of our future developments is our validated protein collection and our understanding of how it can be used, in whatever format, to address new scientific questions at scale. This allows research groups across the globe to access our expertise rapidly, without being delayed by the search for the right resource.
As computational tools improve and researchers become more comfortable working with large biological datasets, the ability to interrogate the entire proteome becomes even more valuable.
Sometimes the only way to understand biology is to start wide.
Continue reading the 25 Years of Proteome-Scale Biology series: