What is science and what is scientific research

The English dictionary definition of science: the systematic study of the structure and behaviour of the physical and natural world through observation, experimentation, and the testing of theories against the evidence obtained.

So my understanding is that science is about generalising: anything that generalises knowledge is science. There is no consensus about what this generalisation should look like, but the least disputed definition needs to satisfy two principles:

  1. Minimal (or simple). For instance, if two models represent the same meaning, m1 = x + y and m2 = x + y + z where z = 0, then m1 is preferable to m2.
  2. Stable. An example from physics makes this clearest: whenever we make a discovery in physics, it needs to hold regardless of time and space.

This is not a provably true assumption but it is the least disputed norm of scientific investigation.
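As a small illustration of the minimality principle, here is a Python sketch. The helper names `num_params` and `pick_minimal` are hypothetical, and counting parameters is only one crude stand-in for "simplicity", assumed here for the sake of the example.

```python
import inspect


def m1(x, y):
    """The smaller model from the example: m1 = x + y."""
    return x + y


def m2(x, y, z=0):
    """The larger model: m2 = x + y + z, where z = 0 adds nothing."""
    return x + y + z


def num_params(model):
    """Crude simplicity measure (an assumption): the number of parameters."""
    return len(inspect.signature(model).parameters)


def pick_minimal(models, samples):
    """Return the model with the fewest parameters, provided that all
    models give the same outputs on the sample inputs."""
    reference = [models[0](*s) for s in samples]
    for model in models[1:]:
        if [model(*s) for s in samples] != reference:
            raise ValueError("models disagree, so they do not mean the same thing")
    return min(models, key=num_params)


samples = [(1, 2), (0, 0), (-3, 5)]
preferred = pick_minimal([m2, m1], samples)
print(preferred.__name__)
```

Both models agree on every sample, so the choice falls to the one with fewer parameters. Agreement on a handful of samples is of course much weaker than a proof that two models are equivalent.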

The way you decide to do research decides how you approach science. People often hear that one person is an applied scientist and another is a theoretical scientist, but what is the difference? What does science actually mean? Opinions about what counts as theoretical or applied research are subjective.

Math and scientific research

How science has been divided is somewhat related to how math is divided. Informally, math falls into two types: more applied math (like statistics, probability theory, and geometry) and abstract math (like set theory, type theory, category theory, or lambda calculus).

Math can also be grouped by when each discipline emerged: traditional math (universal algebra, also called abstract algebra, topology, and geometry, all invented before the 20th century) and newer fields (like type theory, lambda calculus, and category theory, which emerged in the first half of the 20th century).

These newer fields (type theory, lambda calculus, and category theory) are becoming more and more popular with the growth of computer science, because they are closely related to computer programming, and set theory is no longer appropriate for that purpose.

It is very common for one person to judge whether another is doing theoretical research based on his or her field of study. Some mathematicians don’t even consider probability or statistics rigorous math, because the area can be rather sloppy: everything is called P, types are almost never used, and basic notions (like conjugate priors) are introduced only via examples.

But it is wrong to judge people by the field they study; we can only judge them by the way they approach science. It is common for people to study abstract math but do applied research, or to study applied math but do very serious pure science research. It is also common for the same researcher to do different kinds of research at the same time. So here I divide scientific research into four types (applied research, methodology research, foundational research, and pure science research), listed in order of increasing abstraction.

Applied research

This kind of research solves problems that directly matter to the real world. It usually applies existing methods or theories to practical real-world problems. Sometimes the method used is not suitable for the problem, or is not scientific enough; such work can be considered engineering research rather than scientific research. Novelty and performance are the main concerns for this kind of research.

Purely applied researchers often treat all of the later types as theoretical research.

Methodology research

This is slightly more abstract than applied research. It usually does not focus directly on a single real-world problem but on a class of similar real-world problems, and tries to invent new or simpler methods for solving them, although the way these problems are solved may not be entirely scientific (i.e. no rigorous mathematical proof, usually just intuitively true). Some researchers in this category claim that they are doing theoretical research even when the work is sometimes not very scientific.

Similar to applied research, methodology research also focuses on novelty and performance. In addition, it demands more of the theory behind the problem, which means it often involves more math.

But better performance often comes at a price. That’s why, when you talk with methodology and applied researchers, they often ask: what are the advantages and limitations of your research?

Because methodology research usually relies on intuition to construct algorithms, models, and so on, its approaches are sometimes not entirely scientific. In other words, they may lack a rigorous theoretical guarantee. This is very common in research that uses deep neural networks.

Foundational research

This lies between pure science research and methodology research. Its methods are often much more abstract and require much more rigorous proof, and many ideas are borrowed directly from pure science research.

Pure science research

Pure science researchers often treat all of the former types as applied research.

Pure science research has a different focus from the others. Its researchers are curious about knowledge itself, so some of the problems they study are logic problems that may have no practical use. For instance, many people are interested in solving the well-known Riemann hypothesis, but a solution may not have much direct impact on the real world.

They often don’t care about applications (or don’t even know that implementations exist). For this reason, it is hard to ask them what the advantages and limitations of their research are.

Methodology or applied researchers may object: I don’t agree, every piece of research has its advantages and limitations. But it is quite hard to name the limitations if someone proves the Riemann hypothesis or P = NP. I guess the only disadvantage might be that the proof is hard to understand. But is this really a disadvantage? And how do you define a disadvantage? That’s why there is hardly any discussion of advantages and limitations in pure mathematics papers.

The importance of rigorous proof for scientific research

A student asked him: “We cannot make predictions about the AI model, so how can it be reliable?”

His answer is probably the nicest I’ve heard: “Do you trust a human driver because his brain is explainable, or because a well-trained or experienced driver will, empirically, do a good job with very high probability?” He also gave an airplane example, saying that we trust airplanes because they have been tested thousands of times, not because we have sufficient physical laws.

I like his answer, but I will also raise an objection, which comes directly from the definition of science above. If you have no physical law, what you are doing is just engineering. Without physical laws, we cannot calculate how much weight an airplane can carry or in what situations it will crash. Building a bridge is a similar example. Do you really trust a cross-sea bridge without a rigorous calculation of how much weight it can bear? Don’t you want to know when it will break in an extreme situation? Do you trust the cross-sea bridge just because it has been “empirically tested” by humans?

Final remarks: we try to give theoretical guarantees not just because we want to show how smart we are or how clever the approach is. We give a theoretical guarantee because we don’t want to make stupid mistakes; humans make stupid mistakes so easily, as history has shown us. In high-stakes applications especially, the consequences of a stupid mistake are serious. We trust a human driver because we trust that an experienced driver will not make stupid mistakes, and even if they do, we are able to tell them. But a black-box machine is hardly ever trustworthy.

One more remark: with the trend of overusing neural networks in recent years, people have concluded empirically that interpretability is not important. This seems to rebut the importance of math in scientific research: there is no proof that a mathematically elegant algorithm guarantees better results. That is true, but only for low-stakes applications like telling dogs from cats. Interpretability does matter in high-stakes problems. People chase a tiny increase in accuracy by inventing models that will never be understood, which can be useful for telling dogs from cats. But in high-stakes problems, such as medicine or criminal justice, where data are insufficient and messy, interpretability really matters! In these applications it is more important to understand your data; chasing higher accuracy will just overfit it. For instance, if you say you improved the accuracy of something, that is not interesting for knowledge itself, but if you say you are using an interpretable model to explain a chemical reaction or to construct materials systematically, that is science, not just counting numbers and data.
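To make the overfitting point concrete, here is a toy Python sketch. Everything in it is hypothetical: the data are made up for illustration, the "opaque" model is a simple nearest-neighbour memoriser standing in for a model nobody can read, and the "interpretable" model is a single threshold rule.

```python
# Hypothetical toy data: (feature, label). The intended pattern is
# "label is 1 when the feature is above 4.5"; the label at 3.0 is
# deliberate noise, as in small and messy real-world data.
TRAIN = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 0), (5.0, 1), (6.0, 1), (7.0, 1)]
TEST = [(1.4, 0), (2.4, 0), (3.2, 0), (3.8, 0), (4.6, 1), (5.4, 1), (6.6, 1)]


def memorizer(x):
    """Opaque model: copy the label of the nearest training point."""
    nearest = min(TRAIN, key=lambda point: abs(point[0] - x))
    return nearest[1]


def threshold_rule(x):
    """Interpretable model: one rule that a person can read and check."""
    return 1 if x > 4.5 else 0


def accuracy(model, data):
    """Fraction of points in `data` that the model labels correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)


for name, model in [("memorizer", memorizer), ("threshold rule", threshold_rule)]:
    print(name, "train:", accuracy(model, TRAIN), "test:", accuracy(model, TEST))
```

On this toy data the memoriser scores perfectly on the training points because it has also memorised the noisy label, and it then does worse than the simple rule on the held-out points. The rule has lower training accuracy, but it says something about the data that a person can inspect.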