Yi's Knowledge Base
http://www.y1zhou.com/
Recent content on Yi's Knowledge Base (last updated Mon, 26 Apr 2021 10:20:00 -0500)
Naive Bayes
http://www.y1zhou.com/series/data-mining/data-mining-naive-bayes/
Wed, 03 Feb 2021 15:55:00 -0400http://www.y1zhou.com/series/data-mining/data-mining-naive-bayes/We talk about one of the simplest classification methods, naive Bayes classifiers, and their applications in text classification. It’s not really machine learning as we only need a single pass through the data to compute necessary values.Introduction to Bayesian Statistics
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-introduction/
Wed, 13 Jan 2021 10:20:00 -0400http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-introduction/First lecture of the course, and a brief history of Bayesian statistics.Introduction
http://www.y1zhou.com/series/time-series/time-series-introduction/
Fri, 28 Aug 2020 19:05:11 -0400http://www.y1zhou.com/series/time-series/time-series-introduction/We introduce some basic ideas of time series analysis and stochastic processes. Of particular importance are the concepts of stationarity and the autocovariance and sample autocovariance functions.Matrices
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrices/
Wed, 26 Aug 2020 15:14:34 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrices/Matrix algebra plays an important role in many areas of statistics, such as linear statistical models and multivariate analysis. In this chapter we introduce basic terminology and some basic matrix operations. We also introduce some basic types of matrices.Estimation
http://www.y1zhou.com/series/linear-model/linear-models-estimation/
Mon, 30 Sep 2019 13:46:57 -0400http://www.y1zhou.com/series/linear-model/linear-models-estimation/In this chapter we introduce the concept of linear models. We use the ordinary least squares estimator to get unbiased estimates of the unknown parameters. $R^2$ is introduced as a measure of the goodness of fit, and the different types of sum of squares in a linear model are briefly discussed.Basic Concepts
http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-basic-concepts/
Wed, 25 Sep 2019 11:05:06 -0500http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-basic-concepts/Introducing the concept of the probability of an event. Also covers set operations and the sample-point method.Basic Concepts
http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-basic-concepts/
Fri, 25 Jan 2019 22:50:34 -0500http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-basic-concepts/A brief introduction to what we’re going to discuss in later chapters.Conditional Probability
http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-conditional-probability/
Thu, 26 Sep 2019 11:51:56 -0500http://www.y1zhou.com/series/maths-stat/1-probability/mathematical-statistics-conditional-probability/Introducing conditional probability and independence of events. Bayes' rule comes in as well.Fundamentals of Nonparametric Methods
http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-fundamentals/
Fri, 25 Jan 2019 22:50:34 -0500http://www.y1zhou.com/series/nonparam-stat/1-introduction/nonparametric-methods-fundamentals/Some basic tools such as the permutation test and the binomial test. We also introduce order statistics and ranks, which will come in handy in later chapters.Text Classification with Naive Bayes and NLTK
http://www.y1zhou.com/series/data-mining/data-mining-text-classification-naive-bayes-nltk/
Tue, 09 Feb 2021 15:55:00 -0400http://www.y1zhou.com/series/data-mining/data-mining-text-classification-naive-bayes-nltk/In the last post we talked about the theoretical side of naive Bayes in text classification. Here we will implement the model in Python, both from scratch and utilizing existing packages.
The corpus we use is a 26-line poem by T.S. Eliot. In each line a dummy string “ZZZ” or “XXX” has been inserted, representing the class of the line (“ZZZ” for class 0 and “XXX” for class 1).
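As a rough illustration of that labeling scheme, here is a minimal sketch of how the marker could be separated from the words of a line. The helper name `split_label`, the lowercasing, and the punctuation stripping are assumptions made for this example, not the post's own preprocessing; only the marker-to-class mapping comes from the text.

```python
# Marker-to-class mapping as described in the text; the rest is illustrative.
LABELS = {"ZZZ": 0, "XXX": 1}


def split_label(line):
    """Return (tokens, label) for one corpus line, dropping the marker token."""
    tokens, label = [], None
    for raw in line.split():
        word = raw.strip(".,")
        if word in LABELS:
            # The dummy string carries the class; it is not a feature.
            label = LABELS[word]
        elif word:
            tokens.append(word.lower())
    if label is None:
        raise ValueError(f"no class marker found in: {line!r}")
    return tokens, label


# Two lines taken from the corpus, used here only as sample input.
sample = [
    "And indeed there will be time ZZZ",
    "For the yellow smoke that slides along the street XXX",
]
parsed = [split_label(line) for line in sample]
```

Removing the marker before counting words matters: if it stayed in the token list, the classifier could read the class directly off the input.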
corpus = [ "And indeed there will be time ZZZ", "For the yellow smoke that slides along the street XXX", "Rubbing its back upon the window-panes ZZZ", "There will be time, there will be time ZZZ", "To prepare a face to meet the faces that you meet XXX", "There will be time to murder and create ZZZ", "And time for all the works and days of hands ZZZ", "That lift and drop a question on your plate ZZZ", "Time for you and time for me ZZZ", "And time yet for a hundred indecisions XXX", "And for a hundred visions and revisions XXX", "Before the taking of a toast and tea ZZZ.Frequentist Inference
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-frequentist-inference/
Mon, 18 Jan 2021 10:20:00 -0400http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-frequentist-inference/A simple problem in the binomial setting solved under the frequentist view of statistics.Linear Dependence and Independence
http://www.y1zhou.com/series/linear-algebra/2-linear-dep-and-indep/
Mon, 31 Aug 2020 12:33:24 -0400http://www.y1zhou.com/series/linear-algebra/2-linear-dep-and-indep/A short piece on linearly dependent and independent sets of vectors.Autoregressive Series
http://www.y1zhou.com/series/time-series/2-arma/time-series-autoregressive-model/
Fri, 28 Aug 2020 20:31:05 -0400http://www.y1zhou.com/series/time-series/2-arma/time-series-autoregressive-model/We talk about autoregressive models of different orders, and introduce their mean, variance, ACF and PACF values. Their stationarity is also briefly discussed.Definitions for Discrete Random Variables
http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-discrete-rv-definition/
Sun, 06 Oct 2019 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-discrete-rv-definition/The probability mass function, cumulative distribution function, expectation and variance for random variables.Location Inference for Single Samples
http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-single-sample-location-inference/
Tue, 26 Mar 2019 21:12:45 -0500http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-single-sample-location-inference/The Wilcoxon signed rank test explained.Moving Average Model
http://www.y1zhou.com/series/time-series/2-arma/time-series-moving-average-model/
Fri, 04 Sep 2020 20:31:13 -0400http://www.y1zhou.com/series/time-series/2-arma/time-series-moving-average-model/The mean, variance, ACF and PACF of moving average models. Instead of stationarity, a new property called invertibility is introduced.Common Discrete Random Variables
http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-common-discrete-random-variables/
Sun, 06 Oct 2019 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/2-discrete-random-variables/mathematical-statistics-common-discrete-random-variables/We introduce the binomial (Bernoulli), geometric and Poisson probability distributions and their properties. The properties include their expectations, variances and moment generating functions.Other Single Sample Inferences
http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-other-single-sample-inferences/
Fri, 26 Apr 2019 23:45:36 -0500http://www.y1zhou.com/series/nonparam-stat/2-single-samples/nonparametric-methods-other-single-sample-inferences/Explore whether the sample is consistent with a specified distribution at the population level. Kolmogorov’s test, Lilliefors test and Shapiro-Wilk test are introduced, as well as tests for runs or trends.ARMA Model
http://www.y1zhou.com/series/time-series/2-arma/time-series-arma-model/
Sat, 12 Sep 2020 20:31:18 -0400http://www.y1zhou.com/series/time-series/2-arma/time-series-arma-model/The mean, variance, ACF and PACF of ARMA models. The backshift operator is introduced, and the stationarity and invertibility of the general ARMA(p, q) model are discussed.Gradient Descent and Linear Regression
http://www.y1zhou.com/series/data-mining/data-mining-gradient-descent-linear-regression/
Thu, 11 Feb 2021 15:55:00 -0400http://www.y1zhou.com/series/data-mining/data-mining-gradient-descent-linear-regression/We implement linear regression using gradient descent, a general optimization technique which in this case can find the global minimum.Bayesian Inference for the Binomial Model
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-binomial/
Mon, 25 Jan 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-binomial/The general procedure for Bayesian analysis. We use two different prior models and compare the resulting posteriors (visually and mathematically).Model Fitting and Forecasting
http://www.y1zhou.com/series/time-series/time-series-model-fitting-and-forecasting/
Mon, 14 Sep 2020 12:06:41 -0400http://www.y1zhou.com/series/time-series/time-series-model-fitting-and-forecasting/This model-building strategy consists of three steps: model specification (identification), model fitting, and model diagnostics.Vector Space
http://www.y1zhou.com/series/linear-algebra/linear-algebra-vector-space/
Mon, 31 Aug 2020 13:13:34 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-vector-space/We introduce some basic terminology - vector space, subspace, span, basis, and dimension. These concepts lay the foundation for future discussions on matrices and matrix properties.Definitions for Continuous Random Variables
http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-continuous-rv-definition/
Wed, 25 Sep 2019 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-continuous-rv-definition/The probability density function, cumulative distribution function, expectation and variance for a continuous random variable.Methods for Paired Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-paired-samples/
Mon, 29 Apr 2019 14:22:47 -0400http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-paired-samples/An obvious extension of the one-sample procedures.Common Continuous Random Variables
http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-common-continuous-rvs/
Fri, 01 Nov 2019 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/3-continuous-random-variables/mathematical-statistics-common-continuous-rvs/The uniform distribution, normal distribution, exponential distribution and their properties.Two Independent Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-two-independent-samples/
Thu, 02 May 2019 12:09:42 -0400http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-two-independent-samples/With two independent samples, we may ask about the centrality of the population distribution and see if there’s a shift. Wilcoxon-Mann-Whitney is here!Basic Tests for Three or More Samples
http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-three-or-more-samples/
Sat, 04 May 2019 12:09:42 -0400http://www.y1zhou.com/series/nonparam-stat/3-multiple-samples/nonparametric-methods-three-or-more-samples/Nonparametric analogues of the one-way classification ANOVA and the simplest two-way classifications, namely the Kruskal-Wallis test, the Jonckheere-Terpstra test, and the Friedman test.Logistic Regression
http://www.y1zhou.com/series/data-mining/data-mining-logistic-regression/
Fri, 12 Feb 2021 15:55:00 -0400http://www.y1zhou.com/series/data-mining/data-mining-logistic-regression/In linear regression, the function learned is used to estimate the value of the target $y$ using values of input $x$. While it could be used for classification purposes by setting the target value to a distinct constant for each class, it’s a poor choice for this task. The target attribute takes on a finite number of values, yet the linear model produces a continuous range.
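To make the mismatch concrete, here is a small sketch on made-up toy data (the numbers are assumptions for illustration only). It fits a least squares line to 0/1 class codes and shows that the predictions leave the [0, 1] range, while passing the same values through the logistic (sigmoid) function keeps them inside it. This is not a fitted logistic regression model, only a demonstration of the range problem.

```python
import math

# Toy data (assumed): one feature, two classes coded as 0 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# Closed-form simple least squares fit: y ~ intercept + slope * x.
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar


def linear_prediction(x):
    """Unbounded output of the fitted line."""
    return intercept + slope * x


def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for x in (-5.0, 2.5, 10.0):
    z = linear_prediction(x)
    print(f"x={x:5.1f}  linear={z:7.3f}  sigmoid(linear)={sigmoid(z):.3f}")
```

Far from the training range the line predicts values below 0 and above 1, which cannot be read as class labels or probabilities.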
For classification tasks, logistic regression is a better choice.Bayesian Inference for the Poisson Model
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-poisson/
Mon, 08 Feb 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-inference-poisson/This lecture discusses Bayesian inference for the Poisson model, including conjugate prior specification, a different way to specify a “non-informative” prior, and relevant posterior summaries.Multivariate Probability Distributions
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-multivariate-probability-distributions/
Wed, 06 Nov 2019 09:57:16 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-multivariate-probability-distributions/Joint probability distributions of two or more random variables defined on the same sample space. Also covers independence, conditional expectation and total expectation.Mean Trend
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-mean-trend/
Wed, 30 Sep 2020 11:30:42 -0400http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-mean-trend/We introduce detrending and differencing, two methods that aim to remove the mean trends in time series.Definitions in Arbitrary Linear Space
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-definitions-in-arbitrary-linear-space/
Mon, 14 Sep 2020 20:28:58 -0400http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-definitions-in-arbitrary-linear-space/This chapter provides an introduction to some fundamental geometrical ideas and results. We start by giving definitions for norm, distance, angle, inner product and orthogonality. The Cauchy-Schwarz inequality proves useful in many settings.Correlation and Concordance
http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-correlation-and-concordance/
Sun, 05 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-correlation-and-concordance/Measures for the strength of relationships between variables (two or more). The Spearman rank correlation coefficient, Kendall’s tau and Kendall’s W are introduced.ARIMA Models
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-arima/
Mon, 05 Oct 2020 11:31:11 -0400http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-arima/Combining differencing with ARMA models gives us ARIMA. The procedures of estimation, diagnosis and forecasting are very similar to those of ARMA models.Projection
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-projection/
Mon, 21 Sep 2020 20:38:19 -0400http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-projection/Geometrically speaking, what is the projection of a vector onto another vector, and the projection of a vector onto a subspace?Categorical Data
http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-categorical-data/
Mon, 06 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/4-association-analysis/nonparametric-methods-categorical-data/Dealing with contingency tables. Fisher’s exact test comes back, together with Chi-squared test and likelihood-ratio test. We also talk about testing goodness-of-fit.Unit Root Test
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-unit-root-test/
Wed, 07 Oct 2020 16:42:23 -0400http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-unit-root-test/A test that helps us determine whether differencing is needed or not. We also talk about over-differencing (don’t do it!) and model selection (AIC/BIC and MAPE).Orthogonalization
http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-orthogonalization/
Mon, 28 Sep 2020 21:55:43 -0400http://www.y1zhou.com/series/linear-algebra/4-geom-considerations/linear-algebra-orthogonalization/Introducing the Gram-Schmidt process, a method for constructing an orthogonal basis given a non-orthogonal basis.Variability of Nonstationary Time Series
http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-stationarity-variability/
Fri, 09 Oct 2020 11:30:59 -0400http://www.y1zhou.com/series/time-series/4-nonstationary/time-series-stationarity-variability/Using the Box-Cox power transformation to stabilize the variance. At the end of this section, the standard procedure for fitting an ARIMA model is discussed.Monte Carlo Sampling
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-monte-carlo-sampling/
Mon, 15 Feb 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-monte-carlo-sampling/This lecture discusses Monte Carlo approximations of the posterior distribution and summaries from it. While this might not seem entirely useful now, this underlies some of the key computational methods used for Bayesian inference that we will discuss further.Seasonal Time Series
http://www.y1zhou.com/series/time-series/seasonal-time-series/
Tue, 13 Oct 2020 23:36:46 -0400http://www.y1zhou.com/series/time-series/seasonal-time-series/We introduce seasonal differencing, seasonal ARMA models, and combine them to get SARIMA models.Linear Space of Matrices
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-linear-space/
Wed, 30 Sep 2020 13:23:18 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-linear-space/The column space, row space and rank of a matrix and their properties.Functions of Random Variables
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-functions-of-random-variables/
Sun, 08 Dec 2019 09:57:16 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-functions-of-random-variables/Finding the distribution of a real-valued function of multiple random variables. There’s the method of distribution functions, transformations and moment generating functions.Bootstrap
http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-bootstrap/
Mon, 06 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-bootstrap/The procedure and applications of the nonparametric bootstrap.Density Estimation
http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-density-estimation/
Mon, 06 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-density-estimation/Wanna know more about histograms and density plots?Modern Nonparametric Regression
http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-modern-nonparametric-regression/
Wed, 08 May 2019 10:46:18 -0400http://www.y1zhou.com/series/nonparam-stat/5-modern-methods/nonparametric-methods-modern-nonparametric-regression/LOWESS, penalized least squares and the cubic spline.Bayesian Inference for the Normal Model
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-distribution/
Mon, 22 Feb 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-distribution/The normal distribution has two parameters, but we focus on the one-parameter setting in this lecture. We also introduce the posterior predictive check as a way to assess model fit, and briefly discuss the issue with improper prior distributions.Decomposition and Smoothing Methods
http://www.y1zhou.com/series/time-series/time-series-decomposition-and-smoothing/
Mon, 02 Nov 2020 11:12:20 -0500http://www.y1zhou.com/series/time-series/time-series-decomposition-and-smoothing/Decomposition procedures to extract trend, seasonal and other components from a time series. Smoothing techniques like moving average and Lowess are often used, and exponential smoothing (Holt-Winters) is another powerful tool.Matrix Trace
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-trace/
Wed, 07 Oct 2020 13:07:16 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-trace/Such a simple concept with so many properties and applications!Sampling Distribution and Limit Theorems
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-sampling-distribution-and-limit-theorems/
Sat, 28 Dec 2019 09:57:16 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-sampling-distribution-and-limit-theorems/We observe a random sample from a probability distribution of interest and want to estimate its properties. The CLT also comes into play.The Normal Model in a Two Parameter Setting
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-two-param-setting/
Mon, 15 Mar 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-normal-two-param-setting/This lecture discusses Bayesian inference of the normal model, particularly the case where we are interested in joint posterior inference of the mean and variance simultaneously. We discuss approaches to prior specification, and introduce the Gibbs sampler as a way to generate posterior samples if full conditional distributions of the parameters are available in closed-form.Spectral Analysis
http://www.y1zhou.com/series/time-series/time-series-spectral-analysis/
Tue, 17 Nov 2020 15:12:15 -0500http://www.y1zhou.com/series/time-series/time-series-spectral-analysis/We talk about a method that helps us find the periodicity of a time series – the spectral density.Matrix Inverse
http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-inverse/
Mon, 12 Oct 2020 12:42:03 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-matrix-inverse/…for a nonsingular matrix. We talk about left and right inverses, the matrix inverse and orthogonal matrices.Brief Review Before STAT 6520
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-brief-review-before-6520/
Wed, 08 Jan 2020 09:57:16 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-brief-review-before-6520/A brief review of probability theory and statistics we’ve learnt so far.Metropolis-Hastings Algorithms
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-metropolis-hastings-algorithms/
Mon, 22 Mar 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-metropolis-hastings-algorithms/This lecture discusses the Metropolis and Metropolis-Hastings algorithms, two more tools for sampling from the posterior distribution when we do not have it in closed form. These are used when we are unable to obtain full conditional distributions. MCMC for the win!Conditional Heteroscedastic Models
http://www.y1zhou.com/series/time-series/time-series-conditional-heteroscedastic-models/
Mon, 23 Nov 2020 18:22:39 -0500http://www.y1zhou.com/series/time-series/time-series-conditional-heteroscedastic-models/Introducing volatility to our time series models. The properties and building procedures of ARCH and GARCH models are discussed.Generalized Inverse
http://www.y1zhou.com/series/linear-algebra/linear-algebra-generalized-inverse/
Wed, 21 Oct 2020 12:45:56 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-generalized-inverse/The generalized matrix inverse that applies to any $m \times n$ matrix.Bias and Variance
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-bias-and-variance/
Sat, 25 Jan 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-bias-and-variance/The bias, variance and mean squared error of an estimator. The efficiency is used to compare two estimators.Consistency
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-consistency/
Mon, 27 Jan 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-consistency/Introducing consistency, a concept about the convergence of estimators. We start from the convergence of non-random number sequences to convergence in probability, then to consistency of estimators and its properties.The Method of Moments
http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-method-of-moments/
Tue, 28 Jan 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/8-estimation/mathematical-statistics-method-of-moments/A fairly simple method of constructing estimators that’s not often used now.Hierarchical Models
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-hierarchical-models/
Mon, 29 Mar 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-hierarchical-models/This model is useful for accommodating data that are grouped or have multiple levels, as the main feature is the addition of a between-group layer which relates groups to each other. The presence of this layer forces group-level parameters to be more similar to each other, displaying the important properties of partial pooling and shrinkage.Projection Matrix
http://www.y1zhou.com/series/linear-algebra/linear-algebra-projection-matrix/
Fri, 23 Oct 2020 13:21:22 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-projection-matrix/We introduce idempotent matrices and the projection matrix. Both are very important concepts in statistical analyses such as linear regression.Maximum Likelihood Estimator
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-maximum-likelihood-estimator/
Wed, 29 Jan 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-maximum-likelihood-estimator/Under parametric families of distributions, there’s a much better way of constructing estimators - the maximum likelihood estimator.Sufficiency
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-sufficiency/
Thu, 30 Jan 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-sufficiency/Introducing sufficient statistics for the inference of parameters. The factorization theorem comes in handy!Optimal Unbiased Estimator
http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-optimal-unbiased-estimator/
Sun, 02 Feb 2020 10:46:18 -0400http://www.y1zhou.com/series/maths-stat/9-estimation-under-parametric-models/mathematical-statistics-optimal-unbiased-estimator/Introducing the Minimum Variance Unbiased Estimator and the procedure of deriving it.Bayesian Linear Regression
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-linear-regression/
Mon, 05 Apr 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-bayesian-linear-regression/The main difference with traditional approaches is in the specification of prior distributions for the regression parameters, which relate covariates to a continuous response variable. However, the Bayesian approach also provides a fairly intuitive way to add random effects (such as a random intercept or random slope), which results in what is traditionally known as a linear mixed model.Determinant
http://www.y1zhou.com/series/linear-algebra/linear-algebra-determinant/
Mon, 26 Oct 2020 13:25:26 -0400http://www.y1zhou.com/series/linear-algebra/linear-algebra-determinant/The determinant is a very important concept for square matrices, and its properties are key to various other notions such as block matrices and matrix inverses.Confidence Intervals
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-confidence-intervals/
Sat, 08 Feb 2020 09:57:16 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-confidence-intervals/Confidence intervals and methods of constructing them.Penalized Linear Regression and Model Selection
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-penalized-linear-regression-and-model-selection/
Mon, 12 Apr 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-penalized-linear-regression-and-model-selection/This lecture covers some Bayesian connections to penalized regression methods such as ridge regression and the LASSO. Further discussion of the posterior predictive distribution as well as a model selection criterion (DIC) is included.Quadratic Form
http://www.y1zhou.com/series/linear-algebra/linear-algebra-quadratic-form/
Wed, 04 Nov 2020 16:45:31 -0500http://www.y1zhou.com/series/linear-algebra/linear-algebra-quadratic-form/This long post covers the quadratic form and the positive definiteness of matrices. The decomposition of symmetric matrices is slightly touched on, and the entire post is mainly to prepare for the next chapter – eigenvalues and eigenvectors.Statistical Decision
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-decision/
Wed, 01 Apr 2020 16:55:50 -0400http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-decision/Up till now we’ve made the assumption that the data is generated from a statistical model controlled by some parameter(s). We used estimation to determine a point or a range of possible values of parameters based on the sample. On the other hand, the goal of data analysis is often to help make decisions, which is not directly addressed by estimation.
Drug approval example: suppose a new drug can be approved only if its effective rate is $\geq 90\%$.Statistical Test
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-test/
Wed, 01 Apr 2020 16:55:50 -0400http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-statistical-test/Here we introduce the elements of a statistical test, namely null and alternative hypotheses, test statistic, rejection region, and type I and type II errors. We then proceed to large-sample Z-tests and some small-sample tests derived from the small sample CIs.p-values
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-p-values/
Thu, 09 Apr 2020 18:26:34 -0400http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-p-values/Introducing the definition of p-values, and why they are important in statistical tests.Optimal Tests
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-optimal-tests/
Tue, 14 Apr 2020 18:26:34 -0400http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-optimal-tests/Briefly introducing the optimality of a statistical test and showing why it’s a difficult problem to solve.Likelihood Ratio Test
http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-likelihood-ratio-test/
Sat, 18 Apr 2020 22:15:07 -0400http://www.y1zhou.com/series/maths-stat/11-hypothesis-testing/mathematical-statistics-likelihood-ratio-test/In the previous section, we considered the situation where we
Test $H_0$: $\theta = \theta_0$ vs. $H_a$: $\theta = \theta_a$ using rejection rule $\frac{L(\theta_0)}{L(\theta_a)} < k_\alpha$.
Test $H_0$: $\theta = \theta_0$ vs. $H_a$: $\theta \in \Theta_a$ (typically one-sided) using the rejection rule $\frac{L(\theta_0)}{L(\theta_a)} < k_\alpha$ if it does not depend on $\theta \in \Theta_a$.
Beyond these situations, there are many other cases, such as
What if $H_0: \theta \in \Theta_0$ is composite?Bayesian Generalized Linear Models
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-generalized-linear-models/
Mon, 19 Apr 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-generalized-linear-models/This lecture discusses a simple logistic regression model for predicting a binary variable. GLMs are necessary when the response variable cannot be modeled appropriately by a normal distribution, and use a link function to connect parameters of the response distribution to the covariates.Eigenvalues and Eigenvectors
http://www.y1zhou.com/series/linear-algebra/linear-algebra-eigenvalues-and-eigenvectors/
Wed, 18 Nov 2020 18:45:52 -0500http://www.y1zhou.com/series/linear-algebra/linear-algebra-eigenvalues-and-eigenvectors/Probably the most important lecture in this course – we start from the calculation of eigenvalues and eigenvectors, and move on to related topics such as the eigendecomposition, singular value decomposition, and the Moore-Penrose inverse.Linear Models
http://www.y1zhou.com/series/maths-stat/mathematical-statistics-linear-models/
Tue, 21 Apr 2020 13:05:20 -0400http://www.y1zhou.com/series/maths-stat/mathematical-statistics-linear-models/So far we’ve finished the main material of this course: estimation and hypothesis testing. The starting point of all statistical analyses is really modeling. In other words, we assume that our data are generated by some random mechanism; specifically, we’ve been focusing on i.i.d. samples from a fixed population distribution.
Although this assumption can be regarded as reasonable for many applications, in practice there are other scenarios where this doesn’t make sense, e.A Bayesian Perspective on Missing Data Imputation
http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-missing-data-imputation/
Mon, 26 Apr 2021 10:20:00 -0500http://www.y1zhou.com/series/bayesian-stat/bayesian-stat-missing-data-imputation/This lecture discusses some approaches to handling missing data, primarily when missingness occurs completely at random. We discuss a procedure, MICE, which uses Gibbs sampling to create multiple “copies” of filled-in datasets.Molecular identification of protein kinase C beta in Alzheimer's disease
http://www.y1zhou.com/publications/2020-zhike-alzheimers/
Sun, 15 Nov 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-zhike-alzheimers/The purpose of this study was to investigate the potential roles of protein kinase C beta (PRKCB) in the pathogenesis of Alzheimer’s disease (AD). We identified 2,254 differentially expressed genes from 19,245 background genes in AD versus control as well as in PRKCB-low versus PRKCB-high groups. Five co-expression modules were constructed by weighted gene correlation network analysis. Among them, the 1,222 genes of the turquoise module had the strongest relation to AD and to low PRKCB expression, and were enriched in apoptosis, axon guidance, gap junction, Fc gamma receptor (FcγR)-mediated phagocytosis, mitogen-activated protein kinase (MAPK) and vascular endothelial growth factor (VEGF) signaling pathways.About
http://www.y1zhou.com/about/
Tue, 30 Jun 2020 00:00:00 +0000http://www.y1zhou.com/about/{ "name": "Yi", "job": { "PhD student": "bioinformatics", "Masters student": "statistics" }, "interests": [ "cancer", "systems biology", "metabolic reprogramming", "NLP", "data visualization" ], "skills": ["R", "Python", "Linux", "Docker"], "personality": "ENTP-A" } Co-expression based cancer staging and application
http://www.y1zhou.com/publications/2020-xiangchun-coexpression-classifier/
Tue, 30 Jun 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-xiangchun-coexpression-classifier/A novel method is developed for predicting the stage of a cancer tissue based on the consistency level between the co-expression patterns in the given sample and samples in a specific stage. The basis for the prediction method is that cancer samples of the same stage share common functionalities as reflected by the co-expression patterns, which are distinct from samples in the other stages. Test results reveal that our prediction results are as good as or potentially better than manually annotated stages by cancer pathologists.Metabolic Reprogramming in Cancer: the bridge that connects intracellular stresses and cancer behaviors
http://www.y1zhou.com/publications/2020-yi-nsr-perspective/
Thu, 30 Apr 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-yi-nsr-perspective/We outline in this perspective a novel framework for cancer study from the angle of stress-induced metabolic reprogramming. The driving question is: what may dictate the same or highly similar evolutionary trajectory across different cancers, consisting of cell proliferation, drug resistance, migration and metastasis? We have observed that cancer and cancer-forming cells are under a persistent intracellular alkaline stress, due to chronic inflammation and local iron overload. A wide range of reprogrammed metabolisms (RMs) are induced to keep the intracellular pH within a livable range for survival.Install Dependencies for Puppeteer on Manjaro Linux
http://www.y1zhou.com/posts/manjaro-puppeteer/
Mon, 13 Apr 2020 21:31:56 -0400http://www.y1zhou.com/posts/manjaro-puppeteer/Elucidation of Functional Roles of Sialic Acids in Cancer Migration
http://www.y1zhou.com/publications/2020-sun-sialic-acid/
Tue, 31 Mar 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-sun-sialic-acid/Sialic acids (SAs), negatively charged nine-carbon sugars, have been implicated in cancer metastasis since the 1960s, but their detailed functional roles remain elusive. We present a computational analysis of transcriptomic data of cancer vs. control tissues of eight types in TCGA, aiming to elucidate the possible reason for the increased production and utilization of SAs in cancer and their possible driving roles in cancer migration. Our analyses have revealed for all cancer types:Automatic and Interpretable Model for Periodontitis Diagnosis in Panoramic Radiographs
http://www.y1zhou.com/publications/2020-haoyang-miccai/
Sat, 14 Mar 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-haoyang-miccai/Periodontitis is a prevalent and irreversible chronic inflammatory disease in both developed and developing countries, and affects about 20% to 50% of the global population. A tool for automatically diagnosing periodontitis is in high demand for screening at-risk people, and early detection could prevent the onset of tooth loss, especially in local communities and health care settings with limited dental professionals. In the medical field, doctors need to understand and trust the decisions made by computational models, so proposing interpretable machine learning models is crucial for disease diagnosis.Neural Functions Play Different Roles in Triple Negative Breast Cancer (TNBC) and non-TNBC
http://www.y1zhou.com/publications/2020-renbo-neural-tnbc/
Thu, 20 Feb 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-renbo-neural-tnbc/Triple negative breast cancer (TNBC) represents the most malignant subtype of breast cancer, and yet its unique biology remains poorly understood. We have conducted a comparative computational analysis of transcriptomic data of TNBC and non-TNBC (NTNBC) tissue samples from the TCGA database, focusing on genes involved in neural functions. Our main discoveries are:
While both subtypes involve neural functions, TNBC has substantially more up-regulated neural genes than NTNBC, suggesting that TNBC is more complex than NTNBC; Non-neural functions related to cell-microenvironment interactions and intracellular damage processing are key inducers of the neural genes in both TNBC and NTNBC, but the inducer-responder relationships are different in the two cancer subtypes; Key neural functions such as neural crest formation are predicted to enhance adaptive immunity in TNBC while glia development, along with a few other neural functions, induces both innate and adaptive immunity in NTNBC.Metabolic Reprogramming in Cancer is Induced to Increase Proton Production
http://www.y1zhou.com/publications/2020-huiyan-metabolic-reprogramming/
Mon, 13 Jan 2020 00:00:00 +0000http://www.y1zhou.com/publications/2020-huiyan-metabolic-reprogramming/Considerable metabolic reprogramming has been observed in a conserved manner across multiple cancer types, but its true causes remain elusive. We present an analysis of around 50 such reprogrammed metabolisms (RMs) including the Warburg effect, nucleotide de novo synthesis and sialic acid biosynthesis in cancer.
Analyses of the biochemical reactions conducted by these RMs, coupled with gene expression data of their catalyzing enzymes, in 7,011 tissues of 14 cancer types, revealed that all RMs produce more H+ than their original metabolisms.Transcription regulation by DNA methylation under stressful conditions in human cancer
http://www.y1zhou.com/publications/2017-sha-transcription-methylation/
Thu, 23 Nov 2017 00:00:00 +0000http://www.y1zhou.com/publications/2017-sha-transcription-methylation/We aim to address one question: do cancer and normal tissue cells execute their transcription regulation in essentially the same way or differently, and why? We conducted an integrated computational study of cancer epigenomes and transcriptomes of 10 cancer types, using penalized linear regression models to evaluate the regulatory effects of DNA methylation on gene expression.