I will give a very short intro, in case you are not familiar with Pydantic. If we derive our Python classes from Pydantic’s BaseModel class, we get very helpful things like type validation, type hinting, JSON data serialization and so on for our classes.
And this plays nicely with ORM as well.
If you are dealing with scientific or numerical data in Python, you will naturally use Numpy arrays. But how do we handle Numpy arrays within a Pydantic BaseModel?
We can define a custom type for our Numpy arrays using the Annotated type. This wraps around Numpy’s original ndarray class, but we need to provide two additional things: a validator (to build and check the array from the raw input) and a serializer (to convert the array into JSON-compatible data). Sample code for creating this custom datatype MyNumPyArray is given below:
Now, you can include numpy arrays in Pydantic classes as given below:
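As a sketch, here is one way to put both pieces together, assuming Pydantic v2 (PlainValidator and PlainSerializer supply the validation and JSON serialization; the Measurement model and its field names are illustrative, not from the original post):

```python
from typing import Annotated

import numpy as np
from pydantic import BaseModel, PlainSerializer, PlainValidator

# Custom type: an ndarray that Pydantic knows how to validate and serialize.
MyNumPyArray = Annotated[
    np.ndarray,
    PlainValidator(lambda v: np.asarray(v, dtype=float)),   # build/check the array
    PlainSerializer(lambda a: a.tolist(), return_type=list),  # JSON-friendly output
]

# Using the custom type inside a Pydantic model (illustrative model):
class Measurement(BaseModel):
    name: str
    values: MyNumPyArray

m = Measurement(name="probe-1", values=[1.0, 2.0, 3.0])
# m.values is now a real np.ndarray; model_dump_json() emits it as a JSON list
```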
Alternatively, you can use pydantic.dataclasses if you only require data validation from Pydantic for Numpy arrays. See, for example, https://stackoverflow.com/questions/70306311/pydantic-initialize-numpy-ndarray

We will try to read this data using 6 MPI processes, and we want each process to read a part of it as shown below (this is the same partitioning used when we wrote the data):
This partitioning is arbitrary and we could have chosen different ways to divide up the data among processes to read. Similar to the writing process, there are three approaches in MPI I/O which can be used to read this data:
As the name suggests, we need to calculate where each process should start reading its data (the ‘offset’) and the length of the data to be read by that process (the ‘count’). So, proc 0 should read from the beginning of the file (3 characters to be read), proc 1 should start reading after 3 characters (5 characters to be read), proc 2 should start reading after 8 characters (3 characters to be read), and so on. This can be done by the following piece of code:
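The arithmetic itself is just an exclusive prefix sum over the per-rank counts. As a sketch in Python (only proc 0, 1 and 2 are specified in the text; the counts for ranks 3–5 are assumed here for illustration):

```python
# Assumed per-rank character counts ('arr_len_local' for each of the 6 ranks).
# Proc 0 (3 chars) and proc 1 (5 chars) match the text; the rest are made up.
arr_len_local = [3, 5, 3, 4, 6, 5]

# A rank's offset ('disp') is the total count of all lower ranks, i.e. an
# exclusive prefix sum. In a real MPI code each rank can get this via MPI_Exscan.
disp, running = [], 0
for count in arr_len_local:
    disp.append(running)
    running += count
```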
‘disp’ denotes the offset calculated and ‘arr_len_local’ is the count (length) of the data to read in. Once the data is read in, the MPI processes will have the following data in their ‘test_txt’ array:
While the explicit offset approach seems quite straightforward, it is preferable to use the individual file pointer method for more complex datasets and partitioning. This is discussed next.
Since we have explained the individual file pointer method in the previous post, we shall only outline the approach here. Instead of calculating individual file pointer locations manually, we define a new global datatype to represent the data partitioning. This is done by:
Once the global datatype is created like this, reading the data in parallel is very simple as shown below:
In the shared file pointer approach, we simply specify the ‘count’ of the data to be read by each process and let MPI calculate the offsets for us. This can be achieved by the following code snippet:
Of course, this comes with a performance penalty, as the shared file pointer is synchronized internally by MPI. Whichever approach you use for reading/writing, its performance should be tested and evaluated before deploying it in production software.
For demonstration purposes, we have chosen to read/write a character array in the examples here. We shall look into reading/writing general numerical data with more complex data partitioning in the next articles in this series on MPI I/O.
MPI I/O fixes these problems and provides an elegant solution for MPI applications. It is very easy to read/write data in parallel using MPI I/O if you are familiar with the basics of MPI communication (viz. point-to-point, collective). In this article, I will show you how to do this with a simple example. The entire code (written in C) can be found in my Github repo MPI_Notes/MPI_IO.
Parallel file writing can be thought of in the following way. Say we have 10 people trying to write something to the same notebook. Instead of passing the notebook to them one by one, we want to tell everyone where (e.g. at which page number) they should start writing their part, so they can all write at the same time. Obviously, we don’t want any of them to overwrite what others have written. So the page number for each person must be calculated correctly based on how much everyone before them is going to write.
In the case of POSIX parallel I/O, we would simply be giving a separate notebook to each person and letting them write whatever they want. So we don’t have to calculate the above things. But then, it is a lot of notebooks to carry around! It may not even be possible to distribute and collect back this many notebooks in a short time.
Let us assume that we have 6 MPI processes in our code, and each of them has some data in a character array as shown below:
We want this data to be written to a single file, and we expect the final data to be in order of the ranks, thus listing all the letters in sequence. There are three ways to do this in MPI I/O:
As the name implies, we simply calculate the location where each process needs to write. This is usually specified as an ‘offset’/’displacement’ from the beginning of the file. In our example, proc 0 should write at the beginning of the file (after 0 characters), proc 1 should start writing after 3 characters (that would be written by proc 0), proc 2 should start writing after 8 characters (to take into account data from proc 0 and proc 1) and so on. Clearly, every process needs to know only where it should start writing. Additionally, we also need to inform MPI about how much data would be written.
Both pieces of information should be supplied in units of the datatype we are writing. In this example, the data is made of characters (these are of MPI_CHAR type). The following code snippet opens/creates a file for parallel access by MPI I/O and then writes the data using the explicit offset approach collectively: (Full code here)
Once we run the program with 6 MPI processes, we will have the file “file_exp_offset.dat” written to disk. As we have written character data, we can view the contents of this file in any text editor. The file contains the following:
When a file is opened using the MPI_File_open() command, MPI creates an individual file pointer for each MPI process to track that process’s position in the file. It is very similar to the file pointer in C, for example, but maintained for each MPI process separately. Note that this pointer could be at a different position in the file for different processes. Using this individual file pointer, it is possible to read/write data conveniently to the file in parallel. While it is possible to set the individual pointers manually per process, the most common way is to inform MPI about the global view of the data we are planning to write. Let me explain.
In our example, this can be achieved by:
The above snippet creates an MPI datatype called ‘char_array_mpi’ which describes the global view of the data. ‘total_len’ is the total length of the global data, ‘arr_len_local’ is the length of the data in the current process, and ‘disp’ gives the displacement of the local data in the global array that we described earlier.
Once the global datatype is created, we inform MPI I/O by setting a ‘File View’ as follows:
‘File View’ essentially changes the way the file is accessed. MPI now knows that our read/write operations are towards reading/writing data of type ‘char_array_mpi’, and that each individual process will contain only the local data for this datatype. It also knows that every element we shall write is of type MPI_CHAR. This command also moves the individual (and shared) file pointer positions to the correct locations as per the global datatype (and redefines them as the zero position of those pointers).
After setting the ‘File View’, we can write our character arrays to file using:
The complete code snippet is:
While this may seem a bit tedious, this approach can be used to write more complex datatypes and partitioning commonly used in MPI applications. Also, note that we can repeatedly write the same datatype again if needed without much additional effort. If we want to read/write a different type of data, we need to change the ‘File View’.
One important point to note is that individual file pointers are what they claim to be, only ‘individual’. The processes do not know about how individual pointers of other processes move about. There are some situations where every MPI process needs to have a synchronized file pointer. For this purpose, MPI maintains a single shared file pointer for every MPI I/O file opened. When a read/write is done using this shared file pointer, every MPI process knows about a change in the shared file pointer position. For the sake of completeness, we shall demonstrate the collective writing of data using this approach in our example. The following code snippet writes using the shared file pointer:
Note that the displacements are calculated on the fly by MPI in this case as we are using the shared file pointer (because of the call to ‘MPI_File_write_ordered’). But this will come with a performance penalty as well.
Please go through the entire code here as it provides all the details. I have simplified a lot of things in this article but this is enough to get you started with MPI I/O. In the next article, we shall see how to read back the data in parallel.
We shall look at the following approaches:
The first case I will discuss is the ‘Slab decomposition’ strategy.
I used this approach in my Direct Numerical Simulation (DNS) code to simulate turbulent boundary layers. Essentially, we need to simulate the flow in a box as shown below.
And the aim is to simulate this flow using as many processors as possible. I chose MPI (the standard for distributed parallel computing). In this model, the processors work independently on their own data and communicate with each other when they need to exchange data/information.
In this problem, the computation needs to be performed for every (discretized) point in the rectangular box. Fortunately, all parts of the box have almost the same amount of computation/workload. There are special treatments required at the bounding surfaces for boundary conditions, but this is very minor compared to the computational workload at the individual points. So, it makes sense to divide the flow domain into equal parts and distribute the parts among processors. This is critical, since if one of the processors is overworked, it would create a bottleneck for the other processors to continue with their calculations. Note that the computation is inter-dependent and exchange of data is necessary (to be discussed later). The act of making sure each process handles an equal share of the computation, so that no bottlenecks arise from this division of workload, is called “load balancing”. In this particular case of flow in a box, it so happens that dividing the flow domain equally leads to equal workload for each process.
So, how does one go about dividing the box domain so that the processors work efficiently together? A well known approach for such simulations is the ‘slab decomposition’ shown below.
We split the entire domain into ‘n’ slices in the x-direction, just like cutting a loaf of bread. Each slice will be given to a processor/core. The grid and variable values associated with a given part are available only in the processor owning that part. Throughout the simulation, we expect to maintain this association.
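As a sketch of the bookkeeping involved (the function name and grid size here are hypothetical, not from my DNS code), splitting nx grid planes into nearly equal x-wise slabs looks like this:

```python
def slab_bounds(nx, nprocs):
    """Split nx grid planes into nprocs x-wise slabs as evenly as possible.
    Returns (start, end) index pairs (end exclusive), one per rank."""
    base, rem = divmod(nx, nprocs)
    bounds, start = [], 0
    for rank in range(nprocs):
        size = base + (1 if rank < rem else 0)  # first 'rem' ranks take one extra plane
        bounds.append((start, start + size))
        start += size
    return bounds

# e.g. 10 planes over 4 processors: sizes 3, 3, 2, 2 (balanced to within one plane)
bounds = slab_bounds(10, 4)
```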
If the slices were totally independent of each other and could proceed with their part of the calculation without depending on data from neighbouring slices, then the job would literally be done. But this is not usually the case. The calculations in a slice will depend on neighbouring slices which reside under another processor’s control. In particular, it is necessary to access at least the data from points on the neighbouring slice’s surface. The idea of ‘halo cells’ is introduced for this purpose: they store the (required part of the) neighbour’s data. How this data communication is handled is very important, as it plays a crucial role in the overall speed of the simulation. In MPI it is possible to introduce what is known as a “Cartesian topology”, which helps considerably in the ‘halo cell exchange’ of data between neighbouring processors.
Additionally, there may be parts of the simulation where it is necessary to have the slab decomposition done in another direction instead of the original x-direction. For example, slabs created in the z-direction are shown below.
To achieve this configuration, we have to ‘transpose’ the necessary data from the x-wise slabs to z-wise slabs. Lots of data exchange/communication is involved in this process. So, it would be wise to do this only when absolutely necessary. ‘Transposition’ should be done efficiently as well.
(to be continued…)
Another movie I watched was “Master and Commander: The Far Side of the World” by Peter Weir. It is about a British captain (played by Russell Crowe) of a small warship maniacally battling a French warship all around South America, in the Atlantic and Pacific oceans, at the turn of the 19th century (a fictional story). The thing that most impressed me was that these were both sailing ships. Once again, I started wandering the internet to learn how sailors relied on trade winds to navigate the oceans.
Being a student of fluid mechanics, I started digging deeper and found a treasure trove of books on physical oceanography. The first thing a person must appreciate when trying to study the oceans is the Earth’s rotation. This leads to the Coriolis force, as we are observing Earth while moving along with its rotation. The Coriolis force plays a major role in creating the persistent ocean currents and the trade winds. Though I have had an acquaintance with geophysical fluid dynamics, it always takes a spark to make one go mad about a subject. This spark came from Henry Stommel’s great little book “Science of the Seven Seas” (freely available here). The way Stommel introduces the oceans and their mystery to the reader is wonderful. Any high school student or undergrad picking up this elementary book would really feel what Stommel describes as “the call of the sea”. The fluid mechanics of the ocean and the atmosphere is fascinating. While fluid mechanics as a subject is quite mature, with a long, illustrious history, the scale of the oceans and the atmosphere leaves our computations and understanding falling short of satisfaction. Of course, this only gives impetus to the thousands of scientists working in this area to search deeper into nature. Henry Stommel was one of the great scientists in this field, and his popular introduction leaves a lasting impression about the subject and will inspire many to take up the oceans for their study and lifelong pursuit. I also highly recommend his more technical works (see here).
What is the Fourier transform, really?
For some, it is the magic of seeing everything as waves. For others, it is like holding a prism in a beam of sunlight and seeing what it contains: a rainbow of colors. For still others, it is a tool to visualize what a sound recording contains and to adjust it to make it sound better.
One can even take the romanticism out of the concept and say it converts one bunch of given numbers into another bunch. It is by doing this that many important uses of the Fourier transform are realized. And this is precisely where many miss the forest for the trees. I hope to bridge both sides of this amazing idea. Let us begin!
When we have a clearly defined function, as shown in the plot below, everything is straightforward.
This means that we have a known expression for how the function depends on the variable x:
All the nice definitions of the Fourier transform become useful, and if the integral is not monstrous, we get a good-looking expression for the Fourier transform of the function. But life is tricky, and usually we only have some values of the function at hand. Like:
Most of the time, we just have the data points as shown by the red dots. You can imagine a function passing through the data like the blue line. But it does not matter. Now, it is just us and the red dots. The entire business of discrete Fourier transform (DFT) is to take the Fourier transform of such a bunch of data points. Before I throw some formulas at you, there are some ground rules to cover.
The universe plays the game strict and fair. If you have $N$ values in the data set, you will only get back $N$ values from the DFT. If you are getting anything more than $N$ values, some of them are definitely redundant.
The most common data points are spaced (sampled) uniformly as equi-distant points in $x$. This is usually the case since most of the digital systems sample signals at a given rate. We shall always assume this to be true in our discussions. If your samples are non-uniformly spaced, you might want to look elsewhere.
The DFT models the data (and the underlying function - the blue line) using sines and cosines. Not just any sines and cosines, but ones chosen specifically for the current dataset.
Euler’s identity is the key to everything; never lose sight of it:

\[e^{i\theta} = \cos\theta + i\sin\theta\]
With all this preamble aside, let us look at how DFT is defined for a given set of $N$ data points (like the red dots in the plot above). We will list them as:
\[f_0, f_1, f_2, \ldots, f_{N-1}\]And it is also given that these points are spaced at intervals $\Delta x$.
Now, we shall use a trick to tell the DFT that our set of points is periodic, even though they are actually not. Imagine that the given set of points are repeated infinitely on both sides, like:
$\ldots f_0, f_1, f_2, \ldots, f_{N-1},\ f_0, f_1, f_2, \ldots, f_{N-1},\ f_0, f_1, f_2, \ldots, f_{N-1} \ldots$
This would make the graph look like:
We have got the function looking like a periodic function. But what is the period of our function (the repeated pattern in the plot above)? How long does a single period span in $x$? This is easy. Take a look at the blue part of the curve above, which corresponds to one period. We see that this part contains all the points $f_0, f_1, f_2, \ldots, f_{N-1}$, but we also need to connect to the adjacent point (from the start of the green curve). So, we actually have to include another point in our dataset (to make it periodic), which is given by $f_N = f_0$. This is important for calculating the period. Now, there are in total $N+1$ points, with $N$ intervals among them. The interval spacing is $\Delta x$. So, the period is,
\[L = N \Delta x\]But we usually do not include the last point $f_N$ in the dataset to avoid redundancy (since it simply repeats the $f_0$ value). We just call our function periodic (in the sense given above) and this is enough.
Now that we got that clarified, the DFT for our data points is given by the strange looking formula:
\[F_k= \frac{1}{N}\sum\limits_{n = 0}^{N-1} f_n e^{-i 2\pi n k/N}\qquad k = -(N/2)+1,..,-2,-1,0,1,2,...,N/2\]We need to understand what this formula says in all its glory. $F_k$’s are the Fourier coefficients and there are in total $N$ of them corresponding to as many Fourier modes. The above formula is actually $N$ similar looking formulas abbreviated into one. The symbol $k$ stands for the wavenumber/frequency associated with the Fourier mode. We will see what this all means in the next post.
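As a concrete check of the formula, here is a direct (slow, $O(N^2)$) implementation in Python, useful for comparing one’s understanding against an FFT library:

```python
import cmath

def dft(f):
    # F_k = (1/N) * sum_n f_n * exp(-i 2 pi n k / N),  k = -(N/2)+1, ..., N/2
    N = len(f)
    ks = range(-(N // 2) + 1, N // 2 + 1)
    return {k: sum(f[n] * cmath.exp(-2j * cmath.pi * n * k / N)
                   for n in range(N)) / N
            for k in ks}

# A constant signal: all of its content lives in the k = 0 (mean) coefficient.
F = dft([2.0] * 8)
```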
P.S.: Have you noticed that the period $L$ does not even appear in this equation?! Have we wasted time in understanding the period of our function (in a strange made-up sense explained above)? NO!! You will later see that it plays a crucial role in understanding the DFT and appears explicitly in the derivative of our function if we are interested in that.
Next article: (In preparation)
Previous article: 1. Fourier transform for confused engineers
Plot the power spectral density of a signal you have
Find the dominant frequency/mode in a signal
A journal paper stating, “It is obvious(?!) that this operation is straightforward to carry out in wave space”.
I have been in many such situations myself and have referred to several texts on Fourier analysis. There are some wonderful ones, don’t mistake me, but I have always felt that it is not straightforward to see the implications of the theory in the output spit out by a Fourier transform function in a Python/Matlab program or library. There are many equivalent ways in which the Fourier transform can be formulated, computed and interpreted. Throw the dreaded complex numbers into the mix, and it is quite normal to feel lost. The feeling of not fully comprehending the idea, and yet repeatedly using it like a black box, is frustrating. Having once been lost myself and now having a good grip on the means and ends of this wonderful mathematical tool, I attempt to write about this often-explained but rarely-understood method. This is mainly for my own satisfaction, and if it helps someone understand this tool with a bit more clarity, I will take that too.
This is the start of a series of posts on how to numerically compute and interpret the Fourier transform of signals/images/fields. Let’s dive right in!
Next article: 2. What is the Fourier transform?
The installation procedure follows what is given on the CGNS website, but with some important tweaks to get it to install successfully on Ubuntu 18.04.
Download the CGNS source code into some directory.
Currently, this downloads the CGNS 3.4.0 source code into a directory named CGNS in the present location. Although you can install HDF5 separately, possibly a newer version from the HDF5 website, I strongly advise against it. We want the CGNS library to work with our HDF5 installation, so it is best to install the HDF5 version suggested by CGNS.
Change into the CGNS directory.
There are install scripts provided in the bin folder. Three of them, in fact: one to install HDF5 (./bin/install-hdf.sh), another to configure CGNS for our system (./bin/config-cgns.sh) and the last one to build the configured CGNS (./bin/build-cgns.sh). Of course, we need to do the installation in the order I have listed.
You can choose to use the install-hdf.sh script. Some options inside that script failed for me, so instead, use the following commands:
This will download HDF5 v1.8 into the newly created directory hdf5_1_8. Change into this directory and use the following command:
Note that I am choosing to install HDF5 in the location $HOME/hdf5; you can choose any other convenient location as well. But remember, this is the location you will need when linking any program compiled against the HDF5 libraries.
The installation of HDF5 will take some time, and finally it will output the HDF5 configuration installed on the system. I have configured HDF5 to include Fortran bindings as well, via the flag --enable-fortran. If you only want C bindings, you can replace that with --disable-fortran.
After this, we need to configure CGNS using the ./bin/config-cgns.sh script. The default script looks like this:
Edit it to look like this:
Apart from some specific modifications to make it work for Ubuntu, I have also specified the CGNS installation directory as $HOME/cgns. You can change this to any other convenient location. But again, this is where the built CGNS libraries and headers will be kept; you will be required to link to this location when compiling and linking programs with CGNS.
(If you have installed HDF5 to some location other than $HOME/hdf5, you should specify that here in the flag --with-hdf5=)
Run this configure script using:
This will configure the CGNS for your system.
Finally, install CGNS using:
That’s it. You have installed HDF5 & CGNS in your Ubuntu system at $HOME/hdf5 and $HOME/cgns respectively.
There are test codes available inside the downloaded CGNS repo at ./CGNS/src/Test_UserGuideCode. They would have been tested during the build process as well.
If you have a program test.c which uses CGNS functions, then you would want to compile and link the program as follows:
test.out is the final executable created for the program.
Notice that we have chosen not to install the utility tools for CGNS, via the flag --disable-cgnstools in our configure script. Enabling it failed for me. I tried to install cgnstools using various other methods as well, but all of them failed on Ubuntu 18.04.
HDFVIEW
If you want a GUI for viewing your CGNS/HDF5 files, you may want to use the HDFVIEW software. Note that the hdfview package available in the Ubuntu repositories does not work, and even the source code compilation always fails on Ubuntu. Just download the prebuilt binary for CentOS 7 from the HDF website and run the HDFView-3.1.0-Linux.sh script inside the downloaded archive to create the binary. It works properly on Ubuntu 18.04. I found this useful tip on Michael Hirsch’s blog.
$ f''' + \frac{m+1}{2} f f'' + m\left[1-(f')^2\right] = 0 $
where $m$ is a constant representing the pressure gradient parameter. Our objective is to solve this differential equation for $f(\eta)$, for a given value of $m$, using the boundary conditions,
$ f(0)=0 \quad \rightarrow \text{no wall transpiration}$
$ f'(0)=0 \quad \rightarrow \text{no-slip condition at the wall}$
$ f'(1)=1 \quad \rightarrow \text{free-stream velocity is reached at the edge of the boundary layer}$
Generally, initial value problems (IVPs) are preferred for ODEs. In an IVP, we are given where to start and which direction to proceed; we then use the differential equation to progress in that direction step by step. But here we have a boundary value problem. The shooting method is used in situations where a boundary value problem has to be solved using initial value methods. The method is described as follows.
Guess two values for $f''(0)$.
Solve the FS equation using the RK4 method with initial conditions $f(0)=0, f'(0)=0, f''(0) = Guess1$.
Solve the FS equation using the RK4 method with initial conditions $f(0)=0, f'(0)=0, f''(0) = Guess2$.
Find the resulting boundary value $f'(1)$ from both these solutions.
If the boundary value $f'(1)$ is different from the required value $f'(1)=1$, find a better initial guess using the secant method.
Solve the FS equation by the RK4 method using the new initial guess (obtained from the secant method).
Repeat the process until the required boundary value $f'(1)$ is obtained.
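The steps above can be sketched in Python (the post’s full program is in Fortran; the step count, tolerance and initial guesses below are arbitrary choices):

```python
def fs_rhs(y, m):
    # y = (f, f', f''); the FS equation gives f''' = -(m+1)/2 * f f'' - m (1 - f'^2)
    f, fp, fpp = y
    return (fp, fpp, -(m + 1) / 2.0 * f * fpp - m * (1.0 - fp * fp))

def integrate_rk4(fpp0, m, steps=400):
    # march from eta = 0 to eta = 1 with f(0) = 0, f'(0) = 0, f''(0) = fpp0
    h = 1.0 / steps
    y = (0.0, 0.0, fpp0)
    for _ in range(steps):
        k1 = fs_rhs(y, m)
        k2 = fs_rhs(tuple(y[i] + 0.5 * h * k1[i] for i in range(3)), m)
        k3 = fs_rhs(tuple(y[i] + 0.5 * h * k2[i] for i in range(3)), m)
        k4 = fs_rhs(tuple(y[i] + h * k3[i] for i in range(3)), m)
        y = tuple(y[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
                  for i in range(3))
    return y  # (f, f', f'') at eta = 1

def shoot(m=0.0, s0=1.0, s1=2.0, tol=1e-10):
    # secant iteration on g(s) = f'(1; f''(0) = s) - 1
    g0 = integrate_rk4(s0, m)[1] - 1.0
    g1 = integrate_rk4(s1, m)[1] - 1.0
    while abs(g1) > tol:
        s0, s1, g0 = s1, s1 - g1 * (s1 - s0) / (g1 - g0), g1
        g1 = integrate_rk4(s1, m)[1] - 1.0
    return s1  # the converged f''(0)
```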
A Fortran program for solving the Falkner-Skan equation implementing the above algorithm is provided below.
Line 114 in the above code calls the gnuplot program FS.plt to plot the solution. The most relevant plots for fluid dynamics in this problem are the streamwise velocity profile $f'(\eta)$ and the wall-normal velocity profile given by,
$ \frac{v}{U}=-\frac{1}{2\sqrt{Re_x}}\left[ (m+1)f+(m-1)\eta f' \right] $
Sample results for the case m=0 (Blasius boundary layer) are shown below.
“Sound is a wave... Light is a wave... Of course, we see water waves...”
That’s the usual answer we think of when asked that question. Moreover, we are told that sound waves are longitudinal waves (compression waves) and light waves are transverse waves. But we do not often ask the question ‘Why?’. Why are sound waves longitudinal? I give a short answer and a lengthy one.
Both light propagation and sound propagation (in air or water) are governed by the same wave equation. But in the case of a light wave, or travelling waves on a string, the variable governed by the wave equation is the disturbance itself, leading to a transverse wave. In the case of a sound wave, the variable governed by the wave equation is the velocity potential $\phi$, which is related to the disturbance velocity in the following way,
\[\vec{V}=\nabla \phi\]And from vector analysis, it is easy to show that the gradient vector $\nabla \phi$ is perpendicular to the lines of constant $\phi$. Hence, even though $\phi$ propagates like a transverse wave, the disturbance velocity $\vec{V}$ propagates like a longitudinal wave.
Waves are everywhere around us. In simple terms, a wave is a disturbance propagating through a medium (say, air or water). Most wave phenomena we see in nature are governed by the so-called wave equation,
\[\frac{\partial^2 \psi}{\partial t^2}=c^2\frac{\partial ^2 \psi}{\partial x^2}\]The solution to the above equation is any (good!) function traveling in the x-direction with speed c. Formally written as $ \psi=f(x\pm ct)$. For example, in the case of a ripple in a pond, a change in the height of the water plays the role of the disturbance. This change propagates through the pond from the source that created the disturbance. In this case, $ \psi $ is the displacement of water from the undisturbed level.
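The claim that any travelling profile $\psi = f(x \pm ct)$ solves this equation can be checked numerically with central second differences (the profile $\sin$, the speed and the sample point below are arbitrary illustrations):

```python
import math

c = 2.0  # wave speed (arbitrary choice)

def psi(x, t):
    # a smooth right-travelling profile f(x - c t); here f = sin
    return math.sin(x - c * t)

def second_diff(g, s, h=1e-3):
    # central second difference: g''(s) ~ (g(s+h) - 2 g(s) + g(s-h)) / h^2
    return (g(s + h) - 2.0 * g(s) + g(s - h)) / (h * h)

x0, t0 = 0.7, 0.3
psi_tt = second_diff(lambda t: psi(x0, t), t0)
psi_xx = second_diff(lambda x: psi(x, t0), x0)
# the wave equation demands psi_tt == c^2 * psi_xx (up to finite-difference error)
```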
Similarly, for a light wave, the thing that changes is the electromagnetic field, and this change is propagated through space. There are two types of waves: longitudinal waves and transverse waves. In the example given above, the change in the water level propagates outwards throughout the pond, but the change itself makes the water go either up or down at a given location. Thus we say the wave propagates in one direction and the displacement of the medium is at right angles to that direction. This kind of wave is called a transverse wave. An illustration of a transverse wave is given below: (taken from Wikipedia)
On the other hand, we have longitudinal waves where the displacement of the medium occurs in the same direction as that of wave propagation. This is illustrated in the following image: (taken from Wikipedia)
When sound propagates from left to right, the air molecules are compressed and rarefied just like the vertical grid lines in the above figure. During sound propagation in the x-direction, the velocity potential obeys the wave equation.
\[\frac{\partial^2 \phi}{\partial t^2}=c^2\frac{\partial ^2 \phi}{\partial x^2}\](This equation is derived from the Navier-Stokes equation after linearization and some other assumptions).
The solution to the above equation is given as,
\[\phi=f(x\pm ct)\]From this solution and the definition of the velocity potential ($ \vec{V}=\nabla\phi$), we can find the disturbance velocity field as,
\[u=f'(x \pm ct)\] \[v=0\] \[w=0\]Thus we see that the transverse wave of $ \phi$ in the x-direction leads to a disturbance velocity involving only the x-component velocity u. So the air/water molecules are displaced in the same direction as the direction of propagation.
Sound waves propagate as longitudinal waves in fluid media. In solids, however, they travel both as longitudinal and transverse waves.