<p>Shaowu Pan (shawnpan@umich.edu), Ph.D. student in Aerospace Engineering. Site feed generated by Jekyll, 2020-08-02.</p>
<h1>Observation on some similarity between the Riesz-Markov theorem and Birkhoff's theorem</h1>
<p>2019-12-20, https://pswpswpsw.github.io/posts/2019/12/riesz-markov-and-brikhoff-erogodic-theorem</p>
<h1 id="two-theorems-at-a-glance">Two theorems at a glance</h1>
<h2 id="riesz---markov-theorem">Riesz - Markov Theorem</h2>
<p><strong>Theorem</strong>: Let $X$ be a locally compact Hausdorff space. For any positive linear functional $\psi$ on $C_c(X)$, the space of continuous functions on $X$ with compact support, there is a unique regular Borel measure $\mu$ on $X$ such that</p>
<script type="math/tex; mode=display">\begin{equation}
\forall f \in C_c(X), \quad \psi(f) = \int_X f(x) \, d\mu(x).
\end{equation}</script>
<h2 id="birkhoffs-ergodic-theorem">Birkhoff’s ergodic theorem</h2>
<p><strong>Theorem</strong>: Let $(X, \mathcal{B}, \mu, T)$ be an ergodic measure-preserving system with $\mu$ a probability measure ($T: X \to X$ is a measure-preserving transformation). For any $f \in \mathcal{L}_{\mu}^1$,</p>
<script type="math/tex; mode=display">\begin{equation}
\lim_{n\rightarrow \infty} \frac{1}{n} \sum_{i=0}^{n-1} f\circ T^i(x) = \int_X f d\mu,
\end{equation}</script>
<p>is true almost everywhere in $X$.</p>
<h1 id="discussions">Discussions</h1>
<p>First, the right-hand sides of the two equations are identical.</p>
<p>Now take the finite-$n$ approximation of the left-hand side of the second equation. Denoting by $\mathcal{K}$ the Koopman operator associated with $T$ on the measure-preserving system, we have</p>
<script type="math/tex; mode=display">\begin{equation}
\frac{1}{n} \sum_{i=0}^{n-1} f \circ T^i(x) = \left(\frac{1}{n} \sum_{i=0}^{n-1} \mathcal{K}^i\right) f \triangleq \bar{\mathcal{K}}_n f.
\end{equation}</script>
<p>Since $\bar{\mathcal{K}}_n$ (and hence its limit) is a linear operator rather than a positive linear functional, one cannot directly apply the Riesz-Markov theorem to obtain Birkhoff's theorem. However, I suspect that ergodicity makes the pointwise evaluation of this linear operator resemble a linear functional. Still, there remain differences that keep the two theorems quite distinct.</p>
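<p>As a quick numerical sanity check of Birkhoff's theorem (my own illustration, not part of the theorems above), consider the ergodic irrational rotation $T(x) = x + a \bmod 1$ on $[0,1)$ with Lebesgue measure; the time average of $f$ along the orbit should converge to the space average $\int_0^1 f \, dx$:</p>

```python
# Sanity check (my own illustration): Birkhoff average for the ergodic
# irrational rotation T(x) = x + a mod 1 on [0,1) with Lebesgue measure.
import math

a = math.sqrt(2) - 1                    # irrational rotation angle

def f(x):
    # f(x) = cos^2(2*pi*x); its space average over [0,1) is 1/2
    return math.cos(2.0 * math.pi * x) ** 2

n, x, s = 100_000, 0.1, 0.0
for _ in range(n):
    s += f(x)                           # accumulate f o T^i(x)
    x = (x + a) % 1.0
time_average = s / n                    # (1/n) sum_{i=0}^{n-1} f(T^i x)
```
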
<p>I am just taking this note here so as not to confuse one theorem with the other.</p>
<h1>Sensitivity of warm-start in computing MultiTaskElasticNet path by coordinate descent in Sklearn</h1>
<p>2019-01-26, https://pswpswpsw.github.io/posts/2018/09/multi-task-elastic-net-path-sklearn-issue</p>
<h2 id="abstract">Abstract</h2>
<p>This post describes a phenomenon we encountered when computing the MultiTaskElasticNet path, i.e., computing the coefficients of a MultiTaskElasticNet model while sweeping the sparsity regularization parameter $\alpha$. We solved the problem by manually setting up the warm starts, after which everything worked as expected.</p>
<h2 id="what-kind-of-problem-does-multitaskelasticnet-solve">What kind of problem does MultiTaskElasticNet solve?</h2>
<h3 id="linear-regression">Linear regression</h3>
<p>Many practical scientific problems can be cast as linear regression problems if the features are well designed.
So let’s begin with a standard linear regression problem, with $N$ the number of data points, $M$ the dimension of the targets, and $P$ the number of features. To find the $W$ that minimizes the squared residuals, we simply solve the following problem,</p>
<script type="math/tex; mode=display">\begin{equation}
\min \lVert Y - XW \rVert^2_{F},
\end{equation}</script>
<p>where $X \in \mathbb{R}^{N \times P}$ are features, $Y \in \mathbb{R}^{N \times M}$ are targets, $W \in \mathbb{R}^{P \times M}$ are model coefficients.</p>
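<p>A minimal numerical sketch of this least-squares problem (synthetic data; the sizes and the name <code>W_true</code> are my own choices for illustration):</p>

```python
# Minimal least-squares sketch with synthetic data (sizes are made up):
# N data points, P features, M targets, as in the problem statement.
import numpy as np

rng = np.random.default_rng(0)
N, P, M = 100, 3, 2
X = rng.normal(size=(N, P))                    # features
W_true = rng.normal(size=(P, M))               # hypothetical ground truth
Y = X @ W_true + 0.01 * rng.normal(size=(N, M))

# np.linalg.lstsq minimizes ||Y - X W||_F^2 over W
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
```
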
<h3 id="unique-and-sparse-solution-is-preferred-in-modeling-scientifc-problem">Unique and sparse solutions are preferred in modeling scientific problems</h3>
<p>In general, the above problem can be viewed as a linear inversion problem, especially when it is ill-conditioned, either due to a lack of observations or to heavily correlated features. Simply solving the above least-squares problem then won’t give us a unique solution. However, in most scientific inversion problems the ground truth is unique. So what can we do? The standard remedy is regularization: make the model prefer a certain type of solution, for example a sparse one, which is ideal in most of our cases. This naturally leads to <strong><a href="https://en.wikipedia.org/wiki/Lasso_(statistics)">LASSO</a></strong>. Regularization also helps with noisy data, since the model then considers not only the MSE but also the sparsity of the solution $W$.</p>
<p>Further, when features are correlated, the optimization problem has no unique solution even with the typical $L_1$ sparsity penalty of LASSO, although the underlying truth is unique. To at least determine the solution uniquely, <strong><a href="https://en.wikipedia.org/wiki/Elastic_net_regularization">ElasticNet</a></strong> was proposed on top of <strong>LASSO</strong> by simply adding an $L_2$ regularization. One does need to tune the <strong>L2</strong> regularization carefully, though.</p>
<h3 id="multitask-learning-dominant-features-are-shared-across-different-tasks">MultiTask learning: dominant features are shared across different tasks</h3>
<p>Most of the time, in a multi-output (multi-task) linear regression problem, besides sparsity and uniqueness of the solution, another desired property that is often overlooked is that the dominant features be <strong>shared</strong> across different tasks. As before, one can design a loss function that encodes a preference for this property. For example, consider the following penalty on $W$:
<script type="math/tex">\begin{equation}
\lVert W \rVert_{21} = \sum_{i} \sqrt{\sum_{j} W_{ij}^2 }.
\end{equation}</script></p>
<p>This <script type="math/tex">L_{2,1}</script> norm was proposed by <a href="https://ttic.uchicago.edu/~argyriou/papers/mtl_feat.pdf">Argyriou et al.</a> in 2008. It can be thought of as first computing the 2-norm of each row and then the 1-norm of the resulting vector of row norms. Following the same spirit as in LASSO, the second step encourages <strong>zero rows</strong> in the solution $W$, i.e., it encourages <strong>a small subset of features</strong> common to all tasks. Now let’s upgrade the previous <strong>ElasticNet</strong> problem into the following <strong>MultiTaskElasticNet</strong>; the new objective function is</p>
<script type="math/tex; mode=display">\begin{equation}
\frac{1}{2N} \lVert Y - XW \rVert_{F}^2 + \alpha c \lVert W \rVert_{21} + 0.5 \alpha (1 - c) \lVert W \rVert_{F}^2,
\end{equation}</script>
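<p>A small sketch (my own, with made-up values) of computing the $L_{2,1}$ norm and this objective; note the $\frac{1}{2N}$ data-fit scaling in the code follows scikit-learn’s convention:</p>

```python
# Sketch (made-up values): the L_{2,1} norm and the MultiTaskElasticNet-style
# objective; the 1/(2N) data-fit scaling follows scikit-learn's convention.
import numpy as np

def l21_norm(W):
    # 2-norm of each row, then 1-norm of the resulting vector:
    # encourages entire rows of W to be zero (shared feature selection)
    return np.sqrt((W ** 2).sum(axis=1)).sum()

rng = np.random.default_rng(0)
N, P, M = 50, 4, 2
X = rng.normal(size=(N, P))
Y = rng.normal(size=(N, M))
W = rng.normal(size=(P, M))

alpha, c = 0.1, 0.5
objective = (np.linalg.norm(Y - X @ W, "fro") ** 2 / (2 * N)
             + alpha * c * l21_norm(W)
             + 0.5 * alpha * (1 - c) * np.linalg.norm(W, "fro") ** 2)
```
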
<h2 id="does-it-work-a-toy-example-show-the-sensetivity-of-warm-start">Does it work? A toy example shows the sensitivity of warm start.</h2>
<p>Download the case:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/pswpswpsw/example_sensetivity_initial_guess_MultiTaskElasticNet.git
</code></pre></div></div>
<p>where I have prepared a case with 1600 data points, for a two-task regression with 14 features.</p>
<p>Our goal is to draw the path of the coefficients, i.e., find the optimal $W$ for each value of the regularization coefficient $\alpha$ while keeping $c$ fixed at 0.5. To optimize the aforementioned MultiTaskElasticNet loss function, we simply take the Sklearn implementation of <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.MultiTaskElasticNet.html">MultiTaskElasticNet</a>. Alternatively, one can call <a href="https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.enet_path.html">enet_path</a> directly.</p>
<p>Then</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>python test.py
</code></pre></div></div>
<p>It is surprising to see that the results differ between <strong>MultiTaskElasticNet</strong> and <strong>enet_path</strong> even with tuned optimization hyperparameters, e.g., an increased number of iterations, with the latter reaching a better local optimum than the former. Note that <strong>MultiTaskElasticNet</strong> calls <strong>enet_path</strong> internally. So something weird must be going on!</p>
<p>Recall that the solver in Sklearn is a simple coordinate descent algorithm, which is usually sufficient and fast for linear models. The issue turns out to be whether the previous solution is reused as the initial guess. By default, <strong>MultiTaskElasticNet</strong> calls <strong>enet_path</strong> for every single $\alpha$ but explicitly disables reusing the coefficients, whereas calling <strong>enet_path</strong> directly does reuse them along the path.</p>
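<p>A sketch of how one can enforce the warm start manually (synthetic data; the $\alpha$ grid and the sizes here are my own choices, not the ones shipped in the repository above):</p>

```python
# Sketch: manually warm-starting MultiTaskElasticNet along an alpha path,
# which mimics what enet_path does internally (synthetic data; the alpha
# grid and problem sizes are my own choices, not the linked repository's).
import numpy as np
from sklearn.linear_model import MultiTaskElasticNet

rng = np.random.default_rng(0)
X = rng.normal(size=(1600, 14))
W_true = np.zeros((14, 2))
W_true[:3] = rng.normal(size=(3, 2))          # a few features shared by both tasks
Y = X @ W_true + 0.01 * rng.normal(size=(1600, 2))

alphas = np.logspace(0, -4, 20)               # sweep from strong to weak penalty
model = MultiTaskElasticNet(l1_ratio=0.5, warm_start=True, max_iter=10000)
coef_path = []
for alpha in alphas:
    model.set_params(alpha=alpha)
    model.fit(X, Y)                           # warm_start=True reuses model.coef_
    coef_path.append(model.coef_.copy())      # shape (n_tasks, n_features)
```

<p>With <code>warm_start=True</code>, each fit starts from the previous solution, matching the behavior of calling <code>enet_path</code> directly.</p>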
<h1 id="acknowledgement">Acknowledgement</h1>
<p>I thank <a href="https://www.linkedin.com/in/weitao-sun/">Alex Sun</a> for debugging to figure out the issue of the initial condition and <a href="http://alexandre.gramfort.net/">Alexandre Gramfort</a> for noticing that warm_start could be an issue.</p>
<h1>Can we use coefficient of determination for nonlinear regression?</h1>
<p>2018-09-27, https://pswpswpsw.github.io/posts/2018/09/note-coefficient</p>
<h2 id="abstract">Abstract</h2>
<p>This is a note on my thoughts about $R^2$ after taking the Time Series Analysis class by Prof. Byon. I will make the following assumptions.</p>
<ol>
<li>scalar target: $y \in \mathbb{R}^1$</li>
<li>data is sampled i.i.d.</li>
</ol>
<h2 id="introduction">Introduction</h2>
<p>The coefficient of determination arose from the observation in <strong>linear regression</strong> that</p>
<script type="math/tex; mode=display">\begin{equation}
SST = SSR + SSE,
\end{equation}</script>
<p>where</p>
<script type="math/tex; mode=display">\begin{equation}
\textrm{total sum of squares: } SST = \sum_{i} (y_i - \overline{y})^2, \\
\textrm{sum of squared errors: } SSE = \sum_{i} (y_i - \hat{y}_i)^2, \\
\textrm{sum of squares due to regression: } SSR = \sum_{i} (\hat{y}_i - \overline{y})^2.
\end{equation}</script>
<p>The proof is ubiquitous in textbooks and <a href="https://stats.stackexchange.com/questions/207841/why-is-sst-sse-ssr-one-variable-linear-regression">online materials</a>. With this equality, one can define a nondimensional version of it, the <strong>coefficient of determination, $R^2$</strong>, as</p>
<script type="math/tex; mode=display">\begin{equation}
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.
\end{equation}</script>
<p>Note that the above equation contains two equalities, and which of them is the definition of $R^2$ is not really certain. <a href="https://en.wikipedia.org/wiki/Coefficient_of_determination">Wiki</a> says it is the second one, while <a href="http://statisticsbyjim.com/regression/r-squared-invalid-nonlinear-regression/">Jim</a> claims the first one is more natural. Both are well defined and equal in the context of <em>linear regression</em>. In general, we view $R^2$ as <em>a statistic that measures what proportion of the variance of the target is explained by the predictor variables, excluding the constant</em>. In that sense, the first one is more natural.</p>
<h2 id="appearance-in-nonlinear-regression">Appearance in nonlinear regression</h2>
<p>From my viewpoint, there are mainly two aspects of $R^2$ in the context of linear regression, that makes it popular.</p>
<ol>
<li>the <strong>nondimensional property</strong>, i.e., we don’t have to worry about getting different performance measures over different datasets. In general, an $R^2$ over 0.9 is a good indicator of a well-performing model. Note that if the nondimensional property is not needed, for example when we are interested in a single dataset and have no need to compare model performance across datasets with different scales, one can simply use $RMSE$, as is common in the machine learning and fluid dynamics communities, or the so-called <a href="http://statisticsbyjim.com/regression/standard-error-regression-vs-r-squared/">standard error of regression</a>.</li>
<li>the <strong>variance explanation</strong> property. Note that in the context of linear regression, an equivalent phrase for <em>explaining variance</em> is the correlation coefficient: it can be shown that the square of the Pearson correlation coefficient between $y$ and $\hat{y}$ is exactly $R^2$. There are other alternative correlation coefficients out there, but none really satisfies me. I will write another post about a <em>potentially dumb/workable</em> nonlinear coefficient.</li>
</ol>
<p>However, as has been mentioned many times, in the context of <em>nonlinear modeling</em>,</p>
<script type="math/tex; mode=display">\begin{equation}
SST \neq SSR + SSE.
\end{equation}</script>
<p>Therefore, one needs to <em>choose</em> a definition for $R^2$. Most of the time, people use the one with SSE, since minimizing SSE is what we do and a smaller SSE means a higher $R^2$. Note that this is the definition <em>implemented in</em> <a href="https://github.com/scikit-learn/scikit-learn/blob/bac89c2/sklearn/metrics/regression.py#L448">Scikit-learn</a>, as the SSE is well defined for both <em>linear</em> and <em>nonlinear regression</em>. However, the variance explanation property may no longer hold. Because of this issue, there are some negative viewpoints on the usage of $R^2$.</p>
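<p>A quick check (with made-up numbers) that <code>r2_score</code> uses the SSE-based definition $1 - SSE/SST$, which works for the predictions of any model, linear or not:</p>

```python
# Quick check (made-up numbers): sklearn's r2_score implements
# R^2 = 1 - SSE/SST, which is well defined for any model's predictions.
import numpy as np
from sklearn.metrics import r2_score

y = np.array([1.0, 2.0, 3.0, 4.0])
y_hat = np.array([1.1, 1.9, 3.2, 3.9])   # predictions from any (nonlinear) model

sse = ((y - y_hat) ** 2).sum()
sst = ((y - y.mean()) ** 2).sum()
r2 = 1.0 - sse / sst
```
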
<h2 id="the-difference-might-be-small-for-well-trained-nonlinear-models">The difference might be small for well-trained nonlinear models</h2>
<p>The key to the equality lies in the following condition,</p>
<script type="math/tex; mode=display">\begin{equation}
\sum_{i}(y_i - \hat{y}_i) (\hat{y}_i - \overline{y}) = 0, \\
\rightarrow \sum_{i}\epsilon_i (\hat{y}_i - \overline{y}) = 0.
\end{equation}</script>
<p>Note that the following two are sufficient conditions for the above,</p>
<script type="math/tex; mode=display">\begin{equation}
\sum_{i}\epsilon_i \hat{y}_i = 0, \\
\sum_{i}\epsilon_i \overline{y} = 0.
\end{equation}</script>
<p>The second one is easy to satisfy, as long as the model <em>includes the constant as a linear feature</em>,</p>
<script type="math/tex; mode=display">\begin{equation}
\hat{y} = \alpha + g_{\beta}(x).
\end{equation}</script>
<p>One can show that stationarity of the least-squares objective with respect to $\alpha$ leads to</p>
<script type="math/tex; mode=display">\begin{equation}
\sum_{i} \epsilon_i = 0.
\end{equation}</script>
<p>For the first one, we notice the following</p>
<script type="math/tex; mode=display">\begin{equation}
0 = \sum_i \epsilon_i \hat{y}_i = \sum_i \epsilon_i(\hat{y}_i - \frac{1}{N}\sum_j \hat{y}_j) \sim \mathbb{Cor}(\epsilon, \hat{y}).
\end{equation}</script>
<p>Note that $\epsilon$ is the residual, and by the condition above it has zero sample mean, so the <em>explicitly unweighted</em> sum above is proportional to the <em>sample correlation between residual and prediction</em>.</p>
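<p>These conditions are easy to verify numerically for a linear model with an intercept (a sketch of my own with synthetic data):</p>

```python
# Sketch (synthetic data): for least squares with an intercept, the residuals
# sum to zero and are orthogonal to the predictions, so SST = SSR + SSE.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = x @ np.array([1.0, -2.0, 0.5]) + 0.3 * rng.normal(size=200)

X = np.column_stack([np.ones(200), x])        # constant included as a feature
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w
eps = y - y_hat                               # residuals

sst = ((y - y.mean()) ** 2).sum()
sse = (eps ** 2).sum()
ssr = ((y_hat - y.mean()) ** 2).sum()
```
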
<p>For well-trained models, <a href="https://www.sheffield.ac.uk/acse/staff/sab">Billings et al.</a> derived several criteria for determining whether a neural network model is well trained. <em>Equation 21</em> in <a href="https://www.sheffield.ac.uk/acse/staff/sab">their paper</a> shows that the above <em>uncorrelatedness in the linear sense</em> is a required condition, which would certainly lead to
<script type="math/tex">\begin{equation}
SST \approx SSR + SSE.
\end{equation}</script>
In this sense, we should expect that one can use $R^2$ for nonlinear regression with both the variance explanation property and the nondimensional property.</p>
<h1>Adjusted $R^2$ is a weaker penalty than AIC</h1>
<p>2018-09-27, https://pswpswpsw.github.io/posts/2018/09/adjusted_r2_vs_aic</p>
<h2 id="abstract">Abstract</h2>
<p>This is a note on a thought that I bumped into randomly.</p>
<p>Statistics has a long history of model-selection criteria that penalize an exploding number of parameters. <strong>AIC, the Akaike information criterion</strong>, is perhaps the most famous one due to its simplicity and generality. Meanwhile, when $R^2$ is introduced in class, the adjusted $R^2$ is introduced immediately after. The latter does not follow the “variance explanation” interpretation per se, since there is no guarantee that the ratio stays in $[0,1]$, but it is supposed to penalize a large number of parameters by reporting a lower $R^2$. In this post, I will show that adjusted $R^2$ imposes a weaker penalty than the AIC/BIC criteria.</p>
<h2 id="introduction">Introduction</h2>
<p>Before the discussion, let’s make the definitions clear: $n \in \mathbb{N}$ is the total number of samples, and $p$ is the number of predictors (excluding the constant).</p>
<h3 id="adjusted-r2">Adjusted $R^2$</h3>
<p>The difference between the common $R^2$ and the adjusted $R^2$ is that <strong>the adjusted $R^2$ accounts for the loss of independence among the residuals.</strong> The more parameters you have, the <strong>less</strong> independent the residuals become, simply because more and more constraints are imposed by the OLS formulation.</p>
<p>The expression for adjusted $R^2$ is</p>
<script type="math/tex; mode=display">\begin{equation}
\overline{R}^2 \triangleq 1 - \dfrac{SSE}{SST} \dfrac{n-1}{n-p-1}.
\end{equation}</script>
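<p>A small sketch of this formula (the numbers below are made up): adding many predictors for a tiny gain in SSE can lower the adjusted $R^2$.</p>

```python
# Sketch (made-up numbers) of adjusted R^2 = 1 - (SSE/SST)*(n-1)/(n-p-1):
# adding predictors inflates the correction even when the fit barely improves.
def adjusted_r2(sse, sst, n, p):
    """Adjusted R^2 for n samples and p predictors (excluding the constant)."""
    return 1.0 - (sse / sst) * (n - 1) / (n - p - 1)

n, sst = 100, 50.0
r2_few = adjusted_r2(sse=10.0, sst=sst, n=n, p=2)    # few predictors
r2_many = adjusted_r2(sse=9.9, sst=sst, n=n, p=20)   # many predictors, tiny SSE gain
```
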
<h3 id="minimal-description-length-mdl">Minimal Description Length (MDL)</h3>
<p>AIC originates in information theory. For Bayes belief nets, there is a criterion called the <strong>Minimal Description Length (MDL)</strong>; one would like to choose the Bayes belief net model with the <strong>shortest</strong> description length.</p>
<p>In general, for a given training set $D = \{ {x}_1,\ldots,{x}_m \}$, the scoring function of a Bayes net $B = \langle G, \Theta \rangle$ on the training set $D$ is</p>
<script type="math/tex; mode=display">\begin{equation}
s(B|D) = f(\theta)|B| - LL(B|D),
\end{equation}</script>
<p>where <script type="math/tex">f(\theta)</script> is the number of bits required to describe each parameter, while <script type="math/tex">\vert B \vert</script> is the number of parameters in the Bayes net.</p>
<script type="math/tex; mode=display">\begin{equation}
LL(B|D) = \sum_{i=1}^m \log P_B({x}_i) \sim -\frac{m}{2} \log(\sigma^2) -\frac{1}{2\sigma^2} SSE \sim -\frac{m}{2}\log(SSE/m),
\end{equation}</script>
<p>which is the log-likelihood of all the data, with $\sigma^2$ the variance in the likelihood model. Note that $\sigma^2 = SSE/m$ is the MLE estimate of the residual variance. Remember that this assumes the data is i.i.d., i.e., independent and identically distributed.</p>
<p>MDL induces several concepts which are shown below without proof. Note that the sources are from <a href="https://www.goodreads.com/book/show/31193897-machine-learning">here</a>.</p>
<ul>
<li>AIC</li>
</ul>
<script type="math/tex; mode=display">\begin{equation}
AIC(B|D) = |B| - LL(B|D) \sim p + \frac{m}{2} \log(SSE/m),
\end{equation}</script>
<p>which assumes each parameter costs $1$ bit for description.</p>
<ul>
<li>BIC</li>
</ul>
<script type="math/tex; mode=display">\begin{equation}
BIC(B|D) = \frac{\log m}{2}|B| - LL(B|D) \sim \frac{\log m}{2}p + \frac{m}{2} \log(SSE/m),
\end{equation}</script>
<p>which assumes each parameter costs $\log m /2$ bits for descriptions.</p>
<h2 id="adjusted-r2-penalize-weaker-than-aicbic">Adjusted $R^2$ penalize weaker than AIC/BIC</h2>
<p>Note that for model selection, we hope to select the model that optimizes the criterion.</p>
<p>Start with $\overline{R}^2$: maximizing it is equivalent to minimizing $\dfrac{SSE}{SST} \dfrac{n-1}{n-p-1}$. Since $SST$ is fixed by the given data and $n$ is also fixed, this is equivalent to minimizing</p>
<script type="math/tex; mode=display">\begin{equation}
SSE/(n-p-1)
\end{equation}</script>
<p>while it does not hurt to take the $\log$ and add/subtract constants, so this is in turn equivalent to minimizing
<script type="math/tex">\begin{equation}
\log \frac{SSE}{n-p-1} = \log SSE + \log \frac{1}{n(1-(p+1)/n)} = \log (SSE/n) + \log \frac{1}{1-(p+1)/n}.
\end{equation}</script></p>
<p>Second, for AIC/BIC, the equivalent quantity to minimize is</p>
<script type="math/tex; mode=display">\begin{equation}
2 C p + n \log (SSE/n),
\end{equation}</script>
<p>where $C = 1, \frac{\log n}{2}$ for AIC and BIC respectively.</p>
<p>Note that $n$ is a constant, therefore it is equivalent to minimize</p>
<script type="math/tex; mode=display">\begin{equation}
2 C p/n + \log (SSE/n) \sim 2 C (p+1)/n + \log (SSE/n).
\end{equation}</script>
<p>Clearly, the ratio between $\log \frac{1}{1-(p+1)/n}$ and $2 C (p+1)/n$ determines the relative strength of the penalization in adjusted $R^2$ versus AIC/BIC.</p>
<p>First, let’s investigate this ratio as follows</p>
<script type="math/tex; mode=display">\begin{equation}
f(x) = \frac{\log(\frac{1}{1-x})}{x} \ge 1, \forall x \in (0,1).
\end{equation}</script>
<p>To see this, note $f(0^+) = 1$ and simply compute $f’(x)$ as</p>
<script type="math/tex; mode=display">\begin{equation}
f'(x) = \frac{\frac{x}{1-x} - \log \frac{1}{1-x}}{x^2} \\
= \frac{ \frac{1}{1-x} - \log \frac{1}{1-x} - 1}{x^2} \ge 0, \forall x \in (0,1).
\end{equation}</script>
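<p>A quick numerical check of this inequality and of the resulting penalty comparison (the values of $n$ and $p$ below are hypothetical):</p>

```python
# Numerical check (hypothetical n and p): f(x) = log(1/(1-x))/x stays >= 1
# on (0,1), and the adjusted-R^2 penalty log(1/(1-(p+1)/n)) is smaller than
# the AIC/BIC penalty 2C(p+1)/n when p << n.
import numpy as np

xs = np.linspace(1e-6, 0.99, 1000)
f = np.log(1.0 / (1.0 - xs)) / xs             # the ratio function f(x)

n, p = 1000, 10                               # hypothetical sample/feature counts
x0 = (p + 1) / n
adj_penalty = np.log(1.0 / (1.0 - x0))        # adjusted R^2 penalty term
aic_penalty = 2 * 1.0 * x0                    # C = 1
bic_penalty = 2 * (np.log(n) / 2) * x0        # C = log(n)/2
```
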
<p>Therefore, taking $x = \frac{p+1}{n}$, the ratio is monotonically increasing in $(p+1)/n$, and when $p \ll n$ the ratio between the adjusted $R^2$ penalty and the AIC/BIC penalty satisfies</p>
<script type="math/tex; mode=display">\begin{equation}
\frac{\log(1/(1-(p+1)/n))}{2C(p+1)/n} \approx \frac{1}{2C} \le \frac{1}{2} < 1,
\end{equation}</script>
<p>where $C = 1$ for AIC and $C = \frac{\log n}{2}$ for BIC; hence the adjusted $R^2$ penalty is the weaker one.</p>
<h1>The ultimate way to postprocess OpenFoam data in Python (updated to Pyvista)</h1>
<p>2018-09-27, https://pswpswpsw.github.io/posts/2018/09/modify-vtk-openfoam</p>
<h2 id="abstract">Abstract</h2>
<p>In this post, I use <strong>foamToVTK</strong> in <a href="https://www.openfoam.com/">OpenFoam</a> to convert OpenFoam data into the <strong>legacy <a href="https://www.vtk.org/Wiki/VTK">VTK (The Visualization ToolKit)</a></strong> format, then use <strong><a href="https://github.com/akaszynski/vtkInterface">vtkInterface</a></strong> for data manipulation in <strong>Python</strong> under <strong>Ubuntu</strong>.</p>
<h2 id="introduction">Introduction</h2>
<p><a href="https://www.openfoam.com/"><strong>OpenFoam</strong></a> is a popular open-source code for <strong>computational fluid dynamics</strong> (CFD). Although it contains various helpful command-line postprocessing modules such as <strong>postProcess</strong>, these are designed for convenience rather than flexibility. For example, they only provide operations that are very common in fluid mechanics or vector mathematics, and they hide details such as the numerical scheme used to approximate derivatives. Most of the time, <strong>OpenFoam</strong> saves the data in folders named by the current time, each containing a <strong>special OpenFoam text-like format</strong>, which is also designed for convenience so that one can directly read the field data. However, if one wants to manipulate the data more flexibly in the modern data-driven era, a Python-script-driven workflow is extremely desirable. Also, to avoid dealing with the mesh in the script, it would be great if one could simply add the modified field onto the original mesh.</p>
<p>Fortunately, with the help of an awesome Python package on <strong>Github</strong>, currently <strong><a href="https://github.com/pyvista/pyvista">Pyvista</a></strong> and previously <strong><a href="https://github.com/akaszynski/vtkInterface">vtkInterface</a></strong>, originally by <strong><a href="https://github.com/akaszynski">Alex Kaszynski</a></strong>, one can easily leverage the powerful libraries in the Python environment to postprocess traditional, mature, standard scientific-computing data, and then immediately put the results back to leverage, again, the existing powerful visualization software in the scientific computing community.</p>
<h2 id="using-pip-to-install-vtkinterface">Using pip to install Pyvista</h2>
<p>Note that it supports Python 3.5+ well.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>pip <span class="nb">install </span>pyvista
</code></pre></div></div>
<h2 id="tutorial-2d-flow-past-cylinder">Tutorial: 2D Flow past cylinder</h2>
<p>The material can be obtained from <a href="http://www.wolfdynamics.com">Wolf Dynamics</a> at this <a href="http://www.wolfdynamics.com/images/begtuts/vortex_shedding.tar.gz">link</a>.</p>
<h3 id="prepare-data">Prepare data</h3>
<ol>
<li>
<p><code class="language-plaintext highlighter-rouge">untar</code> the <strong>.tar</strong> file</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">tar</span> <span class="nt">-zxvf</span> vortex_shedding.tar.gz ./
</code></pre></div> </div>
</li>
<li>
<p>go to <code class="language-plaintext highlighter-rouge">c1</code> directory for running a standard case</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">cd </span>c1
blockMesh
checkMesh
icoFoam <span class="o">></span> log &
</code></pre></div> </div>
</li>
</ol>
<h3 id="convert-openfoam-default-format-to-vtk">Convert OpenFoam default format to VTK</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>foamToVTK
</code></pre></div></div>
<h3 id="using-python-to-manipulate-vtk-data">Using Python to manipulate VTK data</h3>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pyvista</span> <span class="k">as</span> <span class="n">vtki</span>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="c1">## grid is the central object in VTK where every field is added on to grid
</span><span class="n">grid</span> <span class="o">=</span> <span class="n">vtki</span><span class="p">.</span><span class="n">UnstructuredGrid</span><span class="p">(</span><span class="s">'./VTK/c1_1000.vtk'</span><span class="p">)</span>
<span class="c1">## point-wise information of geometry is contained
</span><span class="k">print</span><span class="p">(</span><span class="n">grid</span><span class="p">.</span><span class="n">points</span><span class="p">)</span>
<span class="c1">## get dictionaries containing all cell/point information
</span><span class="k">print</span><span class="p">(</span><span class="n">grid</span><span class="p">.</span><span class="n">cell_arrays</span><span class="p">)</span> <span class="c1"># note that cell-based and point-based arrays have different sizes
</span><span class="k">print</span><span class="p">(</span><span class="n">grid</span><span class="p">.</span><span class="n">point_arrays</span><span class="p">)</span> <span class="c1">#
</span>
<span class="c1">## get a field in numpy array
</span><span class="n">p_cell</span> <span class="o">=</span> <span class="n">grid</span><span class="p">.</span><span class="n">cell_arrays</span><span class="p">[</span><span class="s">'p'</span><span class="p">]</span>
<span class="c1">## create a new cell field of pressure^2
</span><span class="n">p2_cell</span> <span class="o">=</span> <span class="n">p_cell</span><span class="o">**</span><span class="mi">2</span>
<span class="n">grid</span><span class="p">.</span><span class="n">_add_cell_scalar</span><span class="p">(</span><span class="n">p2_cell</span><span class="p">,</span> <span class="s">'p2'</span><span class="p">)</span>
<span class="c1">## remember to save the modified vtk
</span><span class="n">grid</span><span class="p">.</span><span class="n">save</span><span class="p">(</span><span class="s">'./VTK/c1_1000_shaowu.vtk'</span><span class="p">)</span>
</code></pre></div></div>
<h3 id="visualize-the-new-field-in-paraview">Visualize the new field in ParaView</h3>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>paraview
</code></pre></div></div>
<p><img src="/images/blog-10-20-save.png" alt="Screen shot" /></p>
<h1>Reading List From M. Jordan</h1>
<p>2018-09-26, https://pswpswpsw.github.io/posts/2018/09/reading-list</p>
<h2 id="note-on-mj-ml-reading-list">Note on M. Jordan’s ML reading list</h2>
<h3 id="elementary">Elementary</h3>
<ul>
<li>Casella, G. and Berger, R.L. (2001). <strong>“Statistical Inference”</strong> Duxbury Press.</li>
</ul>
<p>For a slightly more advanced book that’s quite clear on mathematical techniques, the following book is quite good:</p>
<ul>
<li>Ferguson, T. (1996). <strong>“A Course in Large Sample Theory”</strong> Chapman & Hall/CRC.</li>
</ul>
<p>You’ll need to learn something about asymptotics at some point, and a good starting place is:</p>
<ul>
<li>Lehmann, E. (2004). <strong>“Elements of Large-Sample Theory”</strong> Springer.</li>
</ul>
<p>Those are all frequentist books. You should also read something Bayesian:</p>
<ul>
<li>Gelman, A. et al. (2003). <strong>“Bayesian Data Analysis”</strong> Chapman & Hall/CRC.</li>
</ul>
<p>and you should start to read about Bayesian computation:</p>
<ul>
<li>Robert, C. and Casella, G. (2005). <strong>“Monte Carlo Statistical Methods”</strong> Springer.</li>
</ul>
<p>On the probability front, a good intermediate text is:</p>
<ul>
<li>Grimmett, G. and Stirzaker, D. (2001). <strong>“Probability and Random Processes”</strong> Oxford.</li>
</ul>
<p>At a more advanced level, a very good text is the following:</p>
<ul>
<li>Pollard, D. (2001). <strong>“A User’s Guide to Measure Theoretic Probability”</strong> Cambridge.</li>
</ul>
<p>The standard advanced textbook is Durrett, R. (2005). <strong>“Probability: Theory and Examples”</strong> Duxbury.</p>
<p>Machine learning research also reposes on optimization theory. A good starting book on linear optimization that will prepare you for convex optimization:</p>
<ul>
<li>Bertsimas, D. and Tsitsiklis, J. (1997). <strong>“Introduction to Linear Optimization”</strong> Athena.</li>
</ul>
<h3 id="advanced">Advanced</h3>
<p>And then you can graduate to:</p>
<ul>
<li>Boyd, S. and Vandenberghe, L. (2004). <strong>“Convex Optimization”</strong> Cambridge.</li>
</ul>
<h3 id="linear-algebra">Linear Algebra</h3>
<p>Getting a full understanding of algorithmic linear algebra is also important. At some point you should feel familiar with most of the material in</p>
<ul>
<li>Golub, G., and Van Loan, C. (1996). <strong>“Matrix Computations”</strong> Johns Hopkins.</li>
</ul>
<p>It’s good to know some information theory. The classic is:</p>
<ul>
<li>Cover, T. and Thomas, J. <strong>“Elements of Information Theory”</strong> Wiley.</li>
</ul>
<h3 id="functional-analysis">Functional Analysis</h3>
<p>Finally, if you want to start to learn some more abstract math, you might want to start to learn some functional analysis (if you haven’t already). Functional analysis is essentially linear algebra in infinite dimensions, and it’s necessary for kernel methods, for nonparametric Bayesian methods, and for various other topics. Here’s a book that I find very readable:</p>
<ul>
<li>Kreyszig, E. (1989). <strong>“Introductory Functional Analysis with Applications”</strong> Wiley.</li>
</ul>