Data Science versus Biostatistics


To get from point A to point B, one needs information about distance, time, and tools. Data is a term for discrete pieces of information, often collected through observation, that are prepared and stored in a way that suits their intended use. Data has been part of human life since early times, and civilizations at every level have used it in some way: to cure disease, to conquer rivals, and to gain property, money, wealth, food, and more. As mathematics and science advanced, data came to be used in healthcare, business, finance, education, travel, and other fields. To use data properly and effectively in those fields, experts need to focus on a specific domain such as health or business. In the last century, names such as biostatistics and data science emerged, and these are the subject of this article. The article addresses the questions below.

What are the fields of data science and biostatistics, and are they related to each other?

If they are related, what are the common areas, similarities, and differences between them? It is hard to draw a clear-cut line between these two fields, for reasons discussed later. A short history of data science and biostatistics is also provided, and the common areas between them are highlighted. To compare the two fields, we first need to know the origins of the terms and their history.


Biostatistics:

According to the National Cancer Institute, biostatistics is the science of collecting and analyzing biological or health data using statistical methods. Biostatistics may be used to help learn the possible causes of cancer or how often cancer occurs in a certain group of people. It is also called biometrics or biometry.

One of the earliest examples of using statistical methods to address health issues is the medical debate over the practice of smallpox inoculation in the 18th century. In the 1830s, one of the most prominent advocates for applying the “numerical method” to medicine was the French clinician Pierre-Charles Alexandre Louis. He collected information on patients admitted to the hospital, and in 1835 he published a paper arguing that the practice of bloodletting was doing more harm than good. Since then, many health organizations and hospitals have adopted statistical methods to treat disease and solve public health problems.

Biostatistics and public health have benefited from improvements in technology and digital capabilities, which make it possible to store, transfer, clean, simulate, and analyze large amounts of data and to use it for medical prediction and public health interventions.

The languages most commonly used for these operations are SAS, R, SQL, and Python.

There are numerous medical areas where biostatisticians can help research advance. Clinical trials, systematic reviews, meta-analysis, observational and complex interventional studies, and statistical genetics all illustrate the tasks and responsibilities of biostatisticians working in these areas. The responsibilities of biostatisticians in clinical trials are not limited to analyzing the data; their many other duties can be discussed in a separate article. Other biostatistical use cases include evaluating the effectiveness of different vaccines and comparing the effectiveness of new treatments, tools, and surgical methods against traditional ones.
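As a rough illustration of the vaccine use case, the sketch below estimates vaccine efficacy as one minus the risk ratio between a vaccinated group and a control group. The function name `vaccine_efficacy`, the counts, and the choice of a log-scale (Katz) 95% confidence interval are all assumptions made for this example; they do not come from a real study, and a real trial analysis would involve much more than this.

```python
import math


def vaccine_efficacy(cases_vaccinated, n_vaccinated, cases_control, n_control):
    """Return (efficacy, lower, upper) using efficacy = 1 - risk ratio.

    Hypothetical helper for illustration only. The interval uses the
    log-scale (Katz) approximation for the risk ratio, which assumes
    reasonably large groups and at least one case in each group.
    """
    risk_vaccinated = cases_vaccinated / n_vaccinated
    risk_control = cases_control / n_control
    risk_ratio = risk_vaccinated / risk_control

    # Standard error of log(risk ratio)
    se_log_rr = math.sqrt(
        1 / cases_vaccinated - 1 / n_vaccinated
        + 1 / cases_control - 1 / n_control
    )
    rr_low = math.exp(math.log(risk_ratio) - 1.96 * se_log_rr)
    rr_high = math.exp(math.log(risk_ratio) + 1.96 * se_log_rr)

    # A higher risk ratio means lower efficacy, so the bounds swap.
    return 1 - risk_ratio, 1 - rr_high, 1 - rr_low


# Made-up counts: 10 cases among 1000 vaccinated, 50 among 1000 controls.
efficacy, lower, upper = vaccine_efficacy(10, 1000, 50, 1000)
print(f"Estimated efficacy: {efficacy:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

With these made-up counts the risk in the vaccinated group is one fifth of the risk in the control group, so the point estimate of efficacy is 80%. The same risk-ratio idea applies when comparing a new treatment against a traditional one.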

Data Science:

According to IBM, data science combines scientific methods, math and statistics, specialized programming, advanced analytics, AI, and storytelling to uncover and explain the business insights buried in data. Data science is a multidisciplinary approach to extracting meaningful insights from the large and ever-increasing volumes of data collected and created by today’s organizations.

Data science is not a new field; for centuries, people have collected data in order to model the world. As a term, data science may mean different things to people with different backgrounds. Data helps to reduce uncertainty in almost every area of science. There is a difference between data-intensive science, data-intensive engineering (engineering based on data), and data science as a phrase. Data science is emerging as a scientific discipline motivated by data-intensive science: it is the science of collecting, storing, handling, cleaning, visualizing, analyzing, and modeling data. Machine learning models, algorithms, programs, tools, and artificial intelligence (AI) are evolving fast, and data science is evolving with them at the same pace.

Figure: A graphic definition of data science.

Examples of data science applications include estimating future sales, text sentiment analysis, image classification, voice recognition, and movie or music recommendations. The most common programming languages used in data science are Python, Java, JavaScript, C/C++, MATLAB, and R.


Advances in technology, in data collection, storage, handling, and processing techniques, and in programming languages, machine learning, and AI are bringing biostatistics and data science closer together, to the point that distinguishing them as separate fields will become harder and harder.

About Author:

I, Mohammad Nosrati, am a Data Scientist/Biostatistician with undergraduate and graduate training in Statistics and Biostatistics and a Master’s degree in Public Health with a concentration in Epidemiology and Biostatistics from Texas A&M. I have been teaching and mentoring for more than 8 years in a variety of statistics and math subjects at public schools and community colleges. I also completed various projects in Thinkful’s Data Science program. I have experience working with different types of datasets and machine learning models in international trade, public health problems, and text sentiment analysis. For those projects, I used the SAS and Python programming languages.


