Featured Presentations
NCBI SRA integration presented to the ODSS Technical Implementation Working Group
Velsera was invited to present our production SRA integration to the Technical Implementation Working Group of the Office of Data Science Strategy (ODSS). ODSS leads implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that comprise NIH. The presentation described the Undiagnosed Diseases Network (UDN) use case and demonstrated how to use the SRA.
Variant Workbench presented at the Kids First Webinar
The Variant Workbench was presented at the Kids First Webinar.
The Variant Workbench enables users to analyze tens of thousands of variants with ease on the Seven Bridges Platform. It is a dedicated Data Studio analysis that uses a Spark cluster on the backend to provide the computational power needed to read all the variants.
The variants are stored in Parquet files or gVCFs and can then be queried with a SQL-like approach, much like working with a pandas or R data frame.
The major capabilities that the Variant Workbench unlocks are:
Analysis and exploration: Analyze tens of thousands of variants in seconds, with the ability to access data directly via the notebook interface.
User data readily available: Native gVCF support lets users add private data to the project and access those variants directly.
Annotation on the fly: Several databases of known variants (such as dbSNP and ClinVar) can be combined in real time with a single quick command that returns results in seconds.
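To give a feel for the SQL-like access and on-the-fly annotation described above, here is a minimal sketch. It is not the Variant Workbench API: it swaps the Spark cluster and Parquet tables for Python's built-in sqlite3 module with an in-memory database so that it runs anywhere. The table names, column names, the annotate function, and all variant and annotation values are made-up placeholders for illustration.

```python
import sqlite3

# Placeholder data. In the Variant Workbench these would be Parquet- or
# gVCF-backed tables read through Spark; here they are small in-memory lists.
variants = [
    ("chr1", 12345, "A", "G", "sample_1"),
    ("chr1", 67890, "C", "T", "sample_1"),
    ("chr2", 11111, "G", "A", "sample_2"),
]
# Stand-ins for annotation databases such as dbSNP and ClinVar (fake IDs).
dbsnp = [
    ("chr1", 12345, "A", "G", "rs_example_1"),
    ("chr2", 11111, "G", "A", "rs_example_2"),
]
clinvar = [("chr1", 12345, "A", "G", "Pathogenic")]


def annotate(variants, dbsnp, clinvar):
    """Join variants against two annotation tables with one SQL query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, sample TEXT)"
    )
    conn.execute(
        "CREATE TABLE dbsnp (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, rsid TEXT)"
    )
    conn.execute(
        "CREATE TABLE clinvar (chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, significance TEXT)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", variants)
    conn.executemany("INSERT INTO dbsnp VALUES (?, ?, ?, ?, ?)", dbsnp)
    conn.executemany("INSERT INTO clinvar VALUES (?, ?, ?, ?, ?)", clinvar)

    # LEFT JOINs keep every variant, even those with no known annotation.
    rows = conn.execute(
        """
        SELECT v.chrom, v.pos, v.ref, v.alt, v.sample, d.rsid, c.significance
        FROM variants AS v
        LEFT JOIN dbsnp AS d
          ON v.chrom = d.chrom AND v.pos = d.pos AND v.ref = d.ref AND v.alt = d.alt
        LEFT JOIN clinvar AS c
          ON v.chrom = c.chrom AND v.pos = c.pos AND v.ref = c.ref AND v.alt = c.alt
        ORDER BY v.chrom, v.pos
        """
    ).fetchall()
    conn.close()
    return rows


annotated = annotate(variants, dbsnp, clinvar)
for row in annotated:
    print(row)
```

Variants without a match in an annotation table come back with an empty (None) value in that column, so unannotated variants remain visible rather than being dropped.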
DS3 Summer Workshop A Great Success
Two members of the CAVATICA team, Cera and Ryan, traveled to Boulder, CO, where they presented a lesson on accessing INCLUDE data to a team of DS3 researchers. This is the third year in a row that we’ve collaborated with this wonderful group of researchers. Read more about them under our Teaching page.
In total, 30 participants attended the workshop.
Publications
Immuno-oncologic profiling of pediatric brain tumors reveals major clinical significance of the tumor immune microenvironment
Click to read: https://www.nature.com/articles/s41467-024-49595-1
New Features
Unlocking genomic discoveries with seamless access to the NCBI Sequence Read Archive
Click to read: https://velsera.com/unlocking-genomic-discoveries-seamless-access-to-the-sequence-read-archive
Unravelling Cellular Complexity: Exploring 3D Genome Structure and DNA Methylation with the snM3C Pipeline
Click to read: https://velsera.com/blog/unravelling-cellular-complexity-exploring-3d-genome-structure-and-dna-methylation-with-the-snm3c-pipeline
Meeting the NIH Data Management and Sharing (DMS) policy requirements
Click to read: https://velsera.com/blog/meeting-the-nih-data-management-and-sharing-dms-policy-requirements-with-community-focused-cloud-platforms
Manifest-based DRS import: A solution for cross-dataset analysis
Click to read: https://velsera.com/blog/manifest-based-drs-import-a-practical-solution-for-cross-dataset-analysis-to-empower-translational-research
Collaborative Projects
New Collaborative Project Awardees
Three new collaborative project applications have been approved for the CFDE Workspace Pilot Award.
Mingzhu Fu (NYU Langone) is a PhD candidate at NYU School of Medicine. She’s interested in working on rare and complex genetic traits such as neurodevelopmental diseases.
Sarah W. Curtis, Ph.D. (Emory) is a recent graduate and is now a postdoctoral researcher at Emory. She aims to uncover epigenetic differences in people who were highly exposed to endocrine-disrupting compounds.
Jason Flannick (Broad) is an assistant professor of pediatrics at Harvard Medical School and the Division of Genetics and Genomics at Boston Children’s Hospital, and an associate member of the Broad Institute of MIT and Harvard. His lab develops computational approaches to use human genetic and broader genomic data to understand or better treat human diseases, with a current focus on diabetes.