Introduction to Statistics for Engineers with Python

With the increasing digitalization of the process industry, engineers must be equipped not only with core engineering principles but also with strong digital and computational skills (Proctor & Chiang, 2023). The growing popularity of machine learning (ML) and its broader field, artificial intelligence (AI), has further highlighted the need for engineers to develop a solid foundation in descriptive and inferential statistics, as well as in supervised and unsupervised modeling techniques (Pinheiro & Patetta, 2021). Statistical tests, long-standing tools in data analysis, offer interpretable results and well-defined hypothesis testing (Montgomery, 2012). They are particularly valuable for analyzing small datasets and determining the significance of relationships (Box et al., 2005).

On the other hand, ML techniques excel at uncovering patterns in complex datasets with intricate relationships that traditional statistics may overlook (Hastie et al., 2009). However, these methods often require larger datasets and computational resources. Moreover, the "black-box" nature of certain ML models, such as deep learning, can limit their interpretability, which is crucial for making informed decisions in process engineering (Rudin, 2019).

Random variables—such as the lifespan of a pump, the time required to complete a task, or the occurrence of natural phenomena like earthquakes—play a pivotal role in both everyday life and engineering applications (Forbes et al., 2011). The probability distribution of a random variable provides a mathematical description of how probabilities are assigned across its possible values. While statistical literature describes a vast array of distributions (Wolfram MathWorld), only a limited subset is commonly used in engineering, as highlighted by Forbes et al. (2011) and Bury (1999).
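As a minimal illustration of how a distribution assigns probabilities, the sketch below models the pump lifespan mentioned above as an exponentially distributed random variable. The exponential model and the 5-year mean life are assumptions chosen purely for illustration; neither comes from the cited references.

```python
import math

MEAN_LIFE_YEARS = 5.0  # assumed mean pump life (illustrative value only)


def exponential_cdf(t: float, mean: float) -> float:
    """Return P(T <= t) for an exponentially distributed lifetime T."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-t / mean)


# Probability that the pump fails within the first 2 years
p_fail_2y = exponential_cdf(2.0, MEAN_LIFE_YEARS)

# Probability that the pump survives beyond 5 years
p_survive_5y = 1.0 - exponential_cdf(5.0, MEAN_LIFE_YEARS)

print(f"P(T <= 2) = {p_fail_2y:.3f}")
print(f"P(T > 5)  = {p_survive_5y:.3f}")
```

Only the standard library is used here, so the example does not depend on any particular statistics package.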

Statistical tools and tests are indispensable in engineering analysis. Common parametric tests like t-tests and ANOVA are widely used for comparing means and analyzing variance (Montgomery, 2012). Non-parametric tests, such as the Kruskal-Wallis test or the sign test, are particularly useful when data fail to meet the assumptions of normality (Gopal, 2006; Kreyszig et al., 2011). Regression analysis, another critical tool, enables the investigation of relationships between variables (Montgomery et al., 2021).
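To make the idea of comparing means concrete, here is a rough sketch of the test statistic behind a two-sample t-test (Welch's unequal-variance form), written with the standard library only. The two data sets are hypothetical values invented for illustration, and the helper name `welch_t` is not taken from any package.

```python
import math
import statistics

# Hypothetical measurements from two process lines (invented for illustration)
line_a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
line_b = [12.6, 12.9, 12.5, 13.0, 12.7, 12.8]


def welch_t(x, y):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    vx = statistics.variance(x) / len(x)  # sample variance divided by n
    vy = statistics.variance(y) / len(y)
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


t_stat, dof = welch_t(line_a, line_b)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

Turning the statistic into a p-value requires the CDF of the t-distribution, which the standard library does not provide; a statistics package would normally handle that step.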

The current work emphasizes the application of statistics in engineering, using Python as the computational tool of choice. It relies extensively on Python packages such as numpy and scisuit. The statistical library of scisuit draws inspiration from R, so readers can transfer the knowledge gained here to R, which is widely used in data science.

Download: Introduction to Statistics for Engineers with Python

Digitalization of Food Properties using Python with Applications

In broad terms, digitalization refers to the use of digital technologies to change the core of how business is conducted. If implemented correctly, digitalization can lead to digital transformation and thereby improve productivity, reduce costs and pave the way for the future of manufacturing. A literature survey by Demartini et al. (2018) found that digitalization in the food industry has been studied since 2016, with "Factory of the future" and "Food" as the commonly associated keywords. The authors stated that food companies are slower to adopt digital technologies. Nonetheless, like all processing industries, the food industry is seeking ways to enhance efficiency, reduce costs and become more environmentally friendly. Adopting digitalization can play a significant role in achieving these goals.

Digitalization of food is challenging. Britannica defines food as “substance consisting essentially of protein, carbohydrate, fat, and other nutrients …”. Not only does food have a complex composition, it also comes in various shapes, colors, odors, etc. To tackle this complexity, some level of abstraction was required: in this work, food was considered to consist of macronutrients (carbohydrate, lipid and protein) together with water, ash and salt. This is consistent with the USDA NAL Database, which has compositional data for approximately 9000 food items. This level of abstraction facilitates various tasks. However, it does not exempt us from the complexity of carbohydrates, lipids and proteins, which are divided into sub-groups with different physical and thermal properties.

In programming languages, two major trends can be seen: i) procedural programming, ii) object-oriented programming. Several object-oriented programming languages (e.g. C++ and Python) support operator overloading. These languages allow arithmetic operators to be defined for user-defined types, therefore enabling the construction of new food items from existing ones, such as after a mixing operation.
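The operator-overloading idea can be sketched as follows. The `Food` class below is a hypothetical, simplified stand-in written for this illustration; it is not scisuit's food class, and its field names and mixing rule (mass-weighted averaging of fractions, with ash and salt omitted) are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Food:
    """Hypothetical food item: total mass (kg) and component mass fractions."""
    mass: float
    water: float
    carbohydrate: float
    protein: float
    lipid: float

    def __add__(self, other: "Food") -> "Food":
        # Mixing: masses add, fractions become mass-weighted averages
        total = self.mass + other.mass

        def mix(a: float, b: float) -> float:
            return (a * self.mass + b * other.mass) / total

        return Food(
            mass=total,
            water=mix(self.water, other.water),
            carbohydrate=mix(self.carbohydrate, other.carbohydrate),
            protein=mix(self.protein, other.protein),
            lipid=mix(self.lipid, other.lipid),
        )


# Illustrative compositions (invented values)
milk = Food(mass=2.0, water=0.88, carbohydrate=0.05, protein=0.03, lipid=0.04)
sugar = Food(mass=0.5, water=0.0, carbohydrate=1.0, protein=0.0, lipid=0.0)

mixture = milk + sugar  # the overloaded + represents a mixing operation
print(mixture)
```

Defining `__add__` keeps the mass balance inside the class, so process code can describe mixing with a single expression.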

Foods are also subject to various operations that may require knowledge of different physical properties. For example, calculating the heat required to raise a food's temperature requires the specific heat capacity (Cp), whereas heat transfer modeling requires both thermal conductivity and Cp. For microwave processing, dielectric properties should also be known.
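The heating example can be written as Q = m · Cp · ΔT. The short sketch below evaluates it, assuming a constant Cp over the temperature range and no phase change; the Cp value and temperatures are illustrative numbers, not data from this document, and the function name is hypothetical.

```python
def sensible_heat_kj(mass_kg: float, cp_kj_per_kg_k: float,
                     t_start_c: float, t_end_c: float) -> float:
    """Heat (kJ) to change temperature, assuming constant Cp and no phase change."""
    return mass_kg * cp_kj_per_kg_k * (t_end_c - t_start_c)


# Illustrative case: 10 kg of a liquid food heated from 4 °C to 72 °C,
# with an assumed Cp of 3.9 kJ/(kg·K)
q_kj = sensible_heat_kj(mass_kg=10.0, cp_kj_per_kg_k=3.9,
                        t_start_c=4.0, t_end_c=72.0)
print(f"Q = {q_kj:.0f} kJ")
```

In practice Cp depends on composition and temperature, which is why a property model is needed instead of a single fixed number.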

In this document, the complexity of calculating and automating various food properties, and of using them in food process calculations, is reduced by using scisuit's open-source food class, which can be found on GitHub. Reducing the computational complexity not only shortens the work but also makes it possible to model a wider range of processes.

The target audience of this work is food process engineers. This document assumes that the reader already has some knowledge of food/chemical engineering concepts and a basic to intermediate understanding of Python. The code used in this document was developed on Windows 11 using Visual Studio Code (1.83.1) and Python 3.10.6. Detailed examples of applications in food process engineering will be presented.

Download: Digitalization of Food Properties

Solving Linear and Nonlinear Equations: An Insight for Engineers with Applications

Before the use of computers, there were several ways to solve algebraic and transcendental equations. In some cases the roots could be obtained by direct methods; however, many other equations could not be solved directly (Chapra & Canale, 2013). Linear and nonlinear equations arise not only in many aspects of process engineering analysis but also in many machine learning (ML) and artificial intelligence (AI) methods. Therefore, mastering the methods for solving these equations is essential for understanding, analyzing and designing engineering systems, and it also enables engineers to tackle a range of ML/AI challenges, from predictive modeling to feature extraction.

Given the abundance of tools at our disposal, students often question whether it is necessary to learn the methods presented in this work. The answer is: it depends. If one only needs to solve a single equation that can be conveniently plotted, then a graphical approach would probably be best. On the other hand, if the equation is generated by, say, Process A, and Process B needs the solution to continue its computation, then the equation has to be solved numerically. Even without knowledge of any of the methods in the following sections, it is still possible to "numerically" find the root of a given function by writing some simple code. Let's work on finding the root of f(x) = x^2 - 5 = 0 in the interval [0, 4].


import numpy as np

n=1
while True:
    x = np.linspace(start=0, stop=4, num=10**n)
    y = np.fabs(x**2 - 5)

    index = np.argwhere(y < 1E-5)
    if len(index) == 0:
        n += 1
        continue
    print(f"Generated {10**n} numbers and root={x[index]}")
    break

Generated 1000000 numbers and root=[[2.23607]]

Note that somewhere between 100,000 and one million linearly spaced numbers have to be generated to find the root with a tolerance of 10^-5. Needless to say, this also means that the function has to be evaluated over 100,000 times.

If all we were interested in was finding the solution of the equation, the above method would work fairly well; however, it performs very poorly. If the solution were needed many times, such an approach would clearly be the bottleneck. Of course, this was a rather naive way of solving the equation, but it clearly demonstrates the need for better approaches. Let's try another way to solve the same problem:


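f = lambda x: x**2 - 5

x0 = 5  # initial guess
length, TOL = 1, 1E-5  # step length and tolerance on |f(x)|
n_iter = 0  # number of iterations

while True:
    n_iter += 1

    fx = f(x0)
    if abs(fx) < TOL:
        break

    # step left if f(x0) > 0, otherwise step right
    x0 = x0 - length if fx > 0 else x0 + length
    # halve the step length whenever |f(x0)| is not greater than 1
    length /= 1 if abs(fx) > 1 else 2

print(x0, n_iter)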

This approach is much better than our first attempt since the number of iterations to find the same root is reduced from over 100,000 to 24.
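For comparison, the sketch below applies the Newton-Raphson iteration, a standard root-finding technique, to the same function with the same initial guess and tolerance. It is offered as an illustration only; the function name `newton`, the `max_iter` safeguard and the choice of this particular method are assumptions made here, not a preview of how the following sections are organized.

```python
def newton(f, dfdx, x0, tol=1e-5, max_iter=50):
    """Newton-Raphson iteration; returns the root and the number of f evaluations."""
    x = x0
    for n_eval in range(1, max_iter + 1):
        fx = f(x)
        if abs(fx) < tol:  # same stopping criterion as the code above
            return x, n_eval
        x -= fx / dfdx(x)  # follow the tangent line to its zero
    raise RuntimeError("Newton-Raphson did not converge")


root, n_eval = newton(f=lambda x: x**2 - 5,
                      dfdx=lambda x: 2 * x,
                      x0=5.0)
print(root, n_eval)
```

Because the method uses the derivative, it needs far fewer function evaluations on this problem, at the cost of requiring `dfdx` and a starting point where the derivative is not zero.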

In the following sections, you'll find methods that, with the help of modern computers, make solving complex equations efficient. The target audience of the current work is engineers, and therefore this document assumes the reader already has some background in calculus and numerical analysis and a basic to intermediate knowledge of the Python programming language. Throughout, you'll encounter detailed, real-world engineering examples designed to bring these methods to life and deepen your understanding.

Solving Linear Nonlinear Equations
Test funcs (py)