If you are planning to do a data science internship in Kolkata, it is important to understand what the term ‘data science’ means. In this article, we cover the basics of data science that you need to know before starting a data science course.

What is data science?

As taught in the major data analytics training programs in Kolkata, data science is the use of various techniques to understand data and build predictive models that support business decisions. The sources of data, and the ability to collect and store it, have come a long way in the last decade. Companies now use a variety of tools and techniques to mine patterns in the data and gather useful insights from it.

What is the difference between a data scientist and a statistician?

Both the data scientist and the statistician work with data to derive useful insights from it. The difference is that a statistician focuses on identifying the relationships in the data, while a data scientist uses those relationships to build models that predict future outcomes. Normally, a data scientist aims to build a generalized model, one that stays accurate on data it has not seen before.

Statisticians most often use tools like R, Excel, or MATLAB because they offer a number of libraries for analyzing data. Data scientists, on the other hand, use tools like Python and Apache Spark to explore the data and build models.

Statistics and probability for data science

Statistics and probability are the core skills required for learning data science. There are a number of statistical techniques and probability distributions that we can use to understand the structure of a given dataset. Some of the important topics that most data scientists work with are as follows:

1. Descriptive statistics

  • Mean, median and mode
  • Variance and standard deviation
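
To make these measures concrete, here is a minimal sketch using Python's built-in statistics module. The list of order counts is made up purely for illustration, and the variance shown is the sample variance (dividing by n - 1), which is one common convention.

```python
import statistics

# Hypothetical sample (an assumption for illustration): daily order counts.
orders = [12, 15, 15, 18, 20, 22, 15, 30]

summary = {
    "mean": statistics.mean(orders),          # arithmetic average
    "median": statistics.median(orders),      # middle value of the sorted data
    "mode": statistics.mode(orders),          # most frequent value
    "variance": statistics.variance(orders),  # sample variance (divides by n - 1)
    "std_dev": statistics.stdev(orders),      # square root of the sample variance
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The mean and median differ here because the single large value (30) pulls the mean upward, which is the kind of structure these summaries help reveal.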


2. Probability

  • Bernoulli trials and probability mass function
  • Central limit theorem
  • Normal distribution
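
The three probability topics above can be tied together in a short simulation. The sketch below is only an illustration under assumed values (success probability 0.3, 50 trials per sample, 5000 samples): it computes the binomial probability mass function for repeated Bernoulli trials, then checks the central limit theorem's prediction that the sample means cluster around p with a spread close to sqrt(p(1 - p) / n), roughly following a normal distribution.

```python
import math
import random
import statistics

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

random.seed(0)   # fixed seed so the simulation is repeatable
p, n = 0.3, 50   # assumed values, chosen only for illustration

# A probability mass function must sum to 1 over all possible outcomes.
pmf_total = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# Each sample mean is the fraction of successes in n Bernoulli trials.
sample_means = [
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(5000)
]

expected_spread = math.sqrt(p * (1 - p) / n)
print("PMF total:", round(pmf_total, 6))
print("mean of sample means:", round(statistics.mean(sample_means), 4))
print("observed spread:", round(statistics.stdev(sample_means), 4))
print("spread predicted by the CLT:", round(expected_spread, 4))
```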


3. Inferential statistics

  • Confidence interval
  • Hypothesis testing
  • Correlation
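
As a rough sketch of the inferential topics above, the code below computes a confidence interval for a mean, a two-sided test of a hypothesized mean, and the Pearson correlation between two variables. The data and the helper names (mean_confidence_interval, z_test_p_value, pearson_r) are hypothetical, and both the interval and the test use a normal approximation for simplicity; with a sample this small, a t-based method would usually be preferred.

```python
import math
import statistics

# Hypothetical paired observations (assumed for illustration):
# advertising spend (x) and resulting sales (y).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    centre = statistics.mean(sample)
    half_width = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre - half_width, centre + half_width

def z_test_p_value(sample, mu0):
    """Two-sided p-value for the null hypothesis 'population mean equals mu0'."""
    std_error = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.mean(sample) - mu0) / std_error
    return 2 * (1 - statistics.NormalDist().cdf(abs(z)))

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

low, high = mean_confidence_interval(y)
print(f"95% interval for mean sales: ({low:.2f}, {high:.2f})")
print(f"p-value for 'mean sales is 0': {z_test_p_value(y, 0.0):.6f}")
print(f"correlation between spend and sales: {pearson_r(x, y):.4f}")
```

A small p-value suggests the data are unlikely under the hypothesized mean, and a correlation close to 1 indicates a strong positive linear relationship. Neither result establishes causation on its own.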

Besides, you have to work on the basic machine learning algorithms. Having a theoretical knowledge of the algorithms and how they work is as important as being able to implement them. Once you know how an algorithm works, it will be easier for you to understand its various parameters as well as the type of data it should be used with.
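
To illustrate how knowing the mechanics helps with parameters, here is a minimal sketch of one basic algorithm, linear regression fitted by gradient descent, written from scratch. The function name fit_line, the toy data, and the default learning_rate and epochs values are all assumptions chosen for illustration and are not taken from any particular library.

```python
def fit_line(xs, ys, learning_rate=0.01, epochs=5000):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction error for every point under the current parameters.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        # Step against the gradient; learning_rate controls the step size.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Toy data generated from y = 2x + 1 (assumed for illustration).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Seeing the update rule makes the parameters easier to reason about: a learning rate that is too large can make the updates overshoot and diverge, while too few epochs can stop the fit before it has converged.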