1: What is Data Science?

Data science is an interdisciplinary field that utilizes informatics, statistics, machine learning and high-performance computing to discover new knowledge and insight from data. As per Wikipedia, data science can be described as a science, a research paradigm, a research method, a discipline, a workflow, and a profession. It utilizes methods from multiple disciplines including mathematics and computer science. This highly data-driven profession has been referred to as the “fourth paradigm” of research following the more established activities of empirical, theoretical, and computational sciences.

Click here to see a comprehensive review.

 

2: What is Artificial Intelligence (AI)?

Artificial intelligence is an interdisciplinary field that utilizes techniques from computer science, data science, mathematics, informatics, statistics and probability, logic, optimization, machine learning, and high-performance computing to simulate/emulate learning, reasoning and planning.

One of the field’s primary goals is automated knowledge discovery.

 

3: What is Machine Learning (ML)?

Machine learning is a broad class of mathematical and statistical techniques that enable computers to learn patterns and signals within datasets. Machine learning algorithms are used today in a wide variety of research applications, from intelligent information retrieval to advanced predictive analytics. TAP Discovery includes a broad range of statistical and ML methods, including linear and logistic regression, naïve Bayes, decision trees and random forests, support vector machines, reinforcement learning, multiple unsupervised clustering techniques, and multiple state-of-the-art artificial neural network and deep learning algorithms. All of these methods can be used individually or as part of a predefined discovery workflow. TAP Discovery is a no-code ML platform, so no prior experience with AI is required.

 

4: Why was TAP Discovery™ developed?

AI-based tools that have been designed for computer and data scientists can be difficult to use and require significant understanding of algorithm parameters and appropriate usage. Furthermore, the core concepts underlying AI and machine learning continue to grow in scope and complexity, making these tools hard to deploy in a sound and principled manner without advanced training as a computational scientist.

TAP Discovery is an alternative cloud-based platform designed specifically for subject-matter experts in healthcare and the life sciences who desire to build advanced data-driven models. TAP Discovery enables individual PIs and smaller resource-limited scientific teams to engage in world-class AI-based exploratory data analysis and hypothesis generation for a broad range of clinical and biochemical problems of interest. TAP Discovery can build predictive and prescriptive models and unearth new questions and innovative solutions that may otherwise be overlooked or considered intractable.

TAP Discovery supports a variety of machine learning algorithms, a multiscale interactive visual interface, explainable AI (xAI) and interpretable ML (iML) methods, integrated large language model (LLM)-based literature summarization, and seamless access to high-performance computing (for large and/or exceptionally complex models). In addition, new multi-informatics features that combine predictive analytics with clinical phenotyping for population health initiatives, biomarker profiling, and chemical discovery and optimization are being developed for those interested in early-stage drug discovery.

TAP Discovery will enable clinicians and scientists to ask and answer complex and challenging questions in a timely and innovative way. We believe TAP Discovery will provide a foundation for more informed decision-making thus leading to accelerated discoveries.

 

5: How is TAP Discovery different from existing analytics platforms?

Most AI/ML tools today are designed for computer and data scientists. The interface is often Jupyter or a similar interactive workflow tool. These are appropriate platforms for senior AI/ML practitioners who want and need low-level control over data and model training. However, the vast majority of AI-driven life science discovery in the future should be driven by the quality of ideas from clinicians, biologists, and chemists, unhampered by the difficulty of using open-source tools, the growing shortage of AI/ML expertise, and the lack of funding to consistently engage such talent even when it is available. TAP Discovery is designed to provide a friendly AI/ML experience via a) a visual user interface that is designed to feel familiar and relevant to life science and healthcare professionals, b) no-code ML operations that remove the need for a programmer, and c) an intelligent Help and Guidance System (HGS) that promotes best practices at each step of the discovery process. In combination, these features guide a clinician or scientist through the various analytics steps to an effective and actionable predictive model with little or no data science team involvement or cost.

 

6: Why should I consider TAP Discovery when there are numerous free tools, code libraries, and open access data sources to draw upon?

The short answer is that they are not actually “free.” First, most open-source AI tools, although very usable in the hands of a computer or data scientist, were never designed for our intended audience, which includes clinicians, molecular biologists, and organic chemists. To use open-source software tools effectively you often need a team (professional or academic), and that requires funding that your research grants may or may not accommodate, further delaying critical and time-sensitive discoveries. Second, even if you have a well-funded and skilled team, it is unlikely they can cover the broad swath of AI topics required to be fully effective in today’s rapidly accelerating and increasingly complex scientific environment.

There are simply too many skills to possess and best practices to master. To address these concerns, TAP Discovery bridges and backfills knowledge gaps in building predictive models, promotes live collaboration, and further amplifies productivity via guided instruction and automated workflow management. TAP Discovery automates many activities, eliminating the often substantial manual effort required when using free tools. With TAP Discovery, broader investigations and analyses can be performed in a fraction of the time, results can be automatically cross-compared using multiple ML methods, and diverse knowledge resources can be seamlessly applied to your problem of interest. The goal at every step is accelerated ideation and experimentation.

 

7: What are TAP Discovery’s main features and strong points?

Key aspects of the platform that uniquely promote better, faster and more actionable AI use:

  • Data Wrangling and Feature Engineering options to simplify dataset management and uploading;
  • Integrated PubMed literature search and document retrieval (based on keywords of interest) as well as an integrated LLM for advanced literature summarization and exploration;
  • Best-in-class supervised, semi-supervised and unsupervised Machine Learning (ML) algorithms and strategies for rapid predictive model development—with no coding required;
  • What-If visuals to interrogate counterfactuals and dependencies within a trained ML model;
  • Explainable AI (xAI) and Interpretable ML (iML) features to further understand a model’s behavior and performance;
  • Integrated High-Performance Computing (HPC) for rapid model development—eliminating the need for an external Azure, AWS or Google Cloud account;
  • Secure real-time storage, retrieval and viewing of your research datasets and model results;
  • One-button access to a community repository to share and explore predictive models and begin virtual collaborations;
  • Publication-grade customizable Knowledge Graphs to support life science context building and evidence generation;
  • Interactive Help and Guidance System (HGS) to assist all user levels during ML model development;
  • COMING SOON: Integrated bioinformatics and cheminformatics functionality to support a diverse range of drug discovery and drug repurposing activities (in BETA through 2025)

 

8: What if I am unfamiliar with AI and Data Science—will there be a protracted learning curve for me?

TAP Discovery is a full-featured analytics software platform. Prior knowledge of AI or Data Science is always helpful but not required. We designed the platform to balance power with ease of use. Our mission is to enable engaging, accessible and efficient discovery for ALL life science and healthcare researchers. We continue to explore new ways to ease the AI learning curve, including an extensive Help and Guidance System (HGS) to navigate each workflow step, from data preparation to model training to using your model to generate actionable information. In the future we plan to add more content around common discovery scenarios and workflows, covering clinical care through drug development. Last, we are committed to continuously improving the usability of the system through user feedback and the monitoring of industry trends. In summary, we will continue to work toward removing the diverse barriers to progress experienced by many clinicians and life science researchers seeking to utilize AI and informatics tools.

 

9: Describe a typical CLINICAL USE CASE that TAP Discovery can assist in solving?

TAP Discovery has the potential to assist healthcare workers and healthcare researchers in answering complex clinical care questions, such as which patients may return for additional care within thirty days of discharge based on the clinical, behavioral, and outcome information within their electronic health record (EHR). This is known as the 30-Day Readmission Problem. A properly validated model with this capability would enable a healthcare system to predict patient trajectories and to recommend early intervention strategies, thus increasing the quality of care before and after discharge. TAP Discovery enables models of this nature to be developed in an efficient, rapid and principled manner.
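For readers curious about what a readmission model involves under the hood, the following is a minimal sketch in plain Python. It is not TAP Discovery's implementation, and no code is needed to use the platform. The two features (`prior_admissions`, `length_of_stay`), the risk formula that generates the labels, and all function names are hypothetical, and the patients are synthetic. The sketch fits a small logistic regression by gradient descent and checks it on held-out rows, which is the simplest form of the validation step mentioned above.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_patients(n, seed=0):
    """Generate made-up (features, label) pairs. This is NOT real patient data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        prior_admissions = rng.randint(0, 5)
        length_of_stay = rng.randint(1, 14)
        # Hypothetical ground truth: more prior admissions and longer stays
        # raise the chance of readmission within 30 days.
        risk = sigmoid(0.8 * prior_admissions + 0.15 * length_of_stay - 3.0)
        label = 1 if rng.random() < risk else 0
        # Scale both features to the range [0, 1].
        rows.append(([prior_admissions / 5.0, length_of_stay / 14.0], label))
    return rows


def train_logistic_regression(rows, epochs=300, lr=0.5):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_risk(weights, bias, x):
    """Predicted probability of readmission for one scaled feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


patients = make_synthetic_patients(400)
train, test = patients[:300], patients[300:]  # hold out 100 rows for validation
weights, bias = train_logistic_regression(train)
accuracy = sum(
    (predict_risk(weights, bias, x) >= 0.5) == bool(y) for x, y in test
) / len(test)
print(f"held-out accuracy: {accuracy:.2f}")
```

A real readmission model would draw on many more EHR-derived features and would need calibration and clinical validation well beyond a single held-out split.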

For more information on clinical use cases, please contact us at TAPDiscovery@TuftsMedicine.org.

 

10: Describe a typical BIOLOGICAL USE CASE that TAP Discovery can assist in solving?

TAP Discovery has the potential to assist biologists in answering complex molecular and cellular questions such as which genes/proteins are involved in a disease process. Models can be developed utilizing multiscale omic data including, but not limited to, mutation, expression, epigenetic changes, and protein trafficking. This is known as Biomarker Discovery and Target Profiling. A properly validated model with this capability would enable a biologist to predict pathway and cellular trajectories and to recommend further laboratory experiments in pursuit of new therapies. TAP Discovery enables models of this nature to be developed in an efficient, rapid, and principled manner.

For more information on biological use cases, please contact us at TAPDiscovery@TuftsMedicine.org.

 

11: Describe a typical CHEMICAL USE CASE that TAP Discovery can assist in solving?

TAP Discovery has the potential to assist chemists seeking to improve the properties of a compound under study by using multiobjective optimization and ADMET property prediction. Chemical structures can be simulated to explore desired behavior without having to physically synthesize each chemical subtype in the laboratory. This is known as Quantitative Structure-Activity Relationships (QSAR) and Quantitative Structure-Property Relationships (QSPR). A properly validated model with these capabilities would enable a chemist to explore chemical space far faster and cheaper than through wet lab operations alone. TAP Discovery enables models of this nature to be developed in an efficient, rapid and principled manner.
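To illustrate the idea behind multiobjective optimization, here is a minimal sketch of Pareto-front selection in plain Python. It is an illustration only, not TAP Discovery's method. The compound names and the `potency` and `toxicity` scores are hypothetical placeholders for values a QSAR/QSPR or ADMET model might predict.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    potency: float   # higher is better (hypothetical predicted score)
    toxicity: float  # lower is better (hypothetical predicted score)


def dominates(a, b):
    """True if a is at least as good as b on both objectives and strictly better on one."""
    no_worse = a.potency >= b.potency and a.toxicity <= b.toxicity
    strictly_better = a.potency > b.potency or a.toxicity < b.toxicity
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Made-up compounds and scores, for illustration only.
candidates = [
    Candidate("cmpd-A", potency=0.90, toxicity=0.70),
    Candidate("cmpd-B", potency=0.75, toxicity=0.30),
    Candidate("cmpd-C", potency=0.60, toxicity=0.40),  # dominated by cmpd-B
    Candidate("cmpd-D", potency=0.50, toxicity=0.10),
]
front = pareto_front(candidates)
print([c.name for c in front])
```

The compounds left on the front represent different trade-offs between the two objectives. Choosing among them remains a scientific judgment for the chemist.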

For more information on chemical use cases, please contact us at TAPDiscovery@TuftsMedicine.org.

 

12: Describe a PRECISION MEDICINE USE CASE that TAP Discovery can assist in solving?

TAP Discovery has the potential to assist multi-disciplinary teams in formulating complex therapeutic decisions in support of advanced cancer care. TAP Discovery models and built-in LLM capability can identify multiscale patterns and outcomes between existing drugs, a patient’s genomic profile, and what is known about biological mode and mechanism of action. Properly validated models with this capability will allow a clinical team to predict patient trajectories under different drug combinations. TAP Discovery enables data science and machine learning to be utilized with multiscale omic data and diverse disease models in an efficient and principled manner.

For more information on precision medicine workflows, please contact us at TAPDiscovery@TuftsMedicine.org.

 

13: I have a dataset that includes a combination of clinical EHR, clinical imaging, and multiomics data. Can I use TAP Discovery?

Yes. The platform supports all these datatypes and permits simultaneous utilization within the same model or workflow.

For multiscale data and model development support, please contact us at TAPDiscovery@TuftsMedicine.org.

 

14: Can I use datasets processed with Databricks, Snowflake, Arvados, or other data preparation and management tools?

Yes. We have begun adding compatibility with these tools, and this capability will only increase over time. It is noteworthy that TAP Discovery, although not a data wrangling tool, has numerous methods to prepare and ingest data directly, so the use and cost of commercial tools can be discontinued in many cases.

For data wrangling support please contact us at TAPDiscovery@TuftsMedicine.org.

 

15: Does TAP Discovery help with the “black-box” problem?

Yes. Explainable AI (xAI), and by extension, interpretable machine learning (iML), are critical supporting technologies in predictive model building, and they are included in TAP Discovery. Machine learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. Unfortunately, with some models it can be difficult to discern exactly why they learned specific relationships or how they arrive at a given prediction. The aim of xAI and iML methods is to develop models that are both understandable and effective. More specifically, xAI helps us understand how a model works (to the degree the methods permit today) and iML helps us understand what information the model is capable of conveying with respect to a prediction. These methods get us closer to a true trust relationship with our models.
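One widely used model-agnostic explanation technique is permutation importance: shuffle one input feature across the rows and measure how much the model's error grows. The sketch below is a plain-Python illustration of that general idea. It does not describe which xAI/iML methods TAP Discovery implements. The `model` function is a hypothetical stand-in for a trained model, and the data is synthetic.

```python
import random


def model(row):
    """Stand-in for a trained model: relies strongly on feature 0,
    weakly on feature 1, and not at all on feature 2."""
    return 3.0 * row[0] + 0.5 * row[1]


def mean_squared_error(rows, targets):
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, feature, seed=0):
    """Increase in error after shuffling one feature column across rows."""
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets)
    column = [r[feature] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:feature] + [v] + r[feature + 1:] for r, v in zip(rows, column)]
    return mean_squared_error(shuffled, targets) - baseline


# Synthetic inputs; the targets are generated by the stand-in model itself,
# so the baseline error is zero and any increase is caused by the shuffle.
rng = random.Random(42)
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(200)]
targets = [model(r) for r in rows]

importances = [permutation_importance(rows, targets, f) for f in range(3)]
print(importances)
```

A feature the model ignores scores zero, while a feature it leans on heavily scores high. That ranking is the kind of evidence used to judge whether a model's behavior matches domain knowledge.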

 

16: Are there example models for me to learn from or utilize as a starting point?

Yes. TAP Discovery has a model store. This is a community repository for our users to freely share their work and possibly open new lines of collaboration. These models can be searched by model type, clinical/biological/chemical data set utilized, and author. The model store is both a learning resource and a place to share your accomplishments with the world.

 

17: What if I do not have sufficient quality/quantity of data for a data-driven approach to modeling and discovery?

This is an all-too-common problem. In the event your clinical or laboratory data is limited, open access databases may be a viable alternative. TAP Discovery includes a multiscale data and information repository with extensive life science content and over three billion (and growing) data points from open access sources including, but not limited to, PubChem, NCBI Gene, UniProt, TCRD, Reactome, Pathway Commons, and Monarch. Additional open access data and knowledge sources will be continuously integrated over time. This knowledge and data repository is utilized in multiple ways within TAP Discovery: to provide open access datasets for model building, to provide supporting evidence for trained models and their predictions, and to supply scientific content for user-customizable knowledge graphs for publication. If you do not see your data of interest listed above, contact the team to discuss adding it to a future platform release.

For new data source requests please contact us at TAPDiscovery@TuftsMedicine.org.

 

18: Who sponsored the development of TAP Discovery?

TAP Discovery is supported by the Tufts Clinical and Translational Science Institute (CTSI) through National Institutes of Health (NIH) CTSA award UM1TR004398 and the generous contribution of ideas and software engineering talent from IOMICS Corporation (www.iomics.ai).

 

19: Is there a cost to use TAP Discovery?

Yes. A monthly subscription fee provides access to all platform features and capabilities, from data staging to model development, and to a growing complement of seamlessly integrated clinical informatics, bioinformatics and chemical informatics functionality. In addition, this fee includes 10 GB of data storage and unlimited data transfers each month. For predictive model development, 10 hours of High-Performance Computing (HPC) are included, with an option to purchase additional computing time if needed. Because HPC is seamlessly integrated into TAP Discovery, there is never a need for external services from Azure, AWS or Google Cloud.

TAP Discovery requires no contract or minimum commitment and takes only a few minutes to sign up by visiting the Tufts CTSI Research Services Portal.

For academic research TAP Discovery pricing, please visit our Pricing page.

For commercial TAP Discovery pricing, please contact us at TAPDiscovery@TuftsMedicine.org.

 

20: What are the minimum computer hardware/software requirements to use TAP Discovery?

TAP Discovery is cloud-based. Most of its compute-intensive features are executed on a High-Performance H100 AI/ML System hosted at MGHPCC. For the best user experience, your desktop computer or laptop should have at least 32 GB of RAM and a high-resolution (1920×1200 WUXGA) display monitor. The Chrome browser is recommended (with support for Microsoft Edge and Apple Safari expected in Q2 2025).

 

21: Is there Technical Support available?

Yes. The TAP Discovery Team includes the Software Engineers and AI Technologists who developed the platform; they are Power Users and are available to assist. The team can be reached by submitting a service ticket through the Tufts CTSI Research Services Portal, or by telephone during normal business hours.

 

22: Is there Data Science Support available?

Yes. The TAP Discovery Team includes the Data Scientists and Machine Learning Experts who developed the platform; they are Power Users and are available to assist. The team can be reached by submitting a service ticket through the Tufts CTSI Research Services Portal, or by telephone during normal business hours.

 

23: Is there Informatics Support available?

Yes. The TAP Discovery Team includes the Clinical Informaticists, Bioinformaticists, and Chemical Informaticists who developed the platform; they are Power Users and are available to assist. The team can be reached by submitting a service ticket through the Tufts CTSI Research Services Portal, or by telephone during normal business hours.

 

24: I really like the concept of Open Science and Democratizing AI use within the Life Sciences. Can I contribute to future versions of TAP Discovery?

Yes. There are multiple ways to apply your talent and experience to further accelerate TAP Discovery’s impact on healthcare and the life sciences. If you are a scientific programmer, there will be code challenge events in 2025. For those interested in new machine learning algorithms or xAI/iML, a special interest group (SIG) is also forming in 2025. For those interested in expanding the informatics capability of TAP Discovery, we encourage you to join the numerous pilot projects underway in patient and population health and early-stage in silico drug development.

To learn more about any of these planned events and initiatives, please contact us at TAPDiscovery@TuftsMedicine.org.