Sampling Methods
Why Sampling?
- Approximate Expectation
- Estimate statistics
- Posterior inference
- Visualization
Pros and Cons
- Pros
- Easy
- Can handle very complicated models
- Cons
- Too easy, so often used inappropriately
- Slow
- Hard to obtain "good" samples (ones that represent the whole distribution well)
- Difficult to assess (it is hard to know whether your approximation is good or not)
Monte Carlo Approximation [History]
- Credits: von Neumann, Fermi, Ulam
- Ulam's gambling uncle liked to visit the Monte Carlo Casino in Monaco
- The intuition is to evaluate the expectation of an intractable quantity.
- The most direct approach would be to average over all the data (to obtain an unbiased estimate of the expectation), but in practice this is usually infeasible.
- Core idea: the more samples you draw, the closer the estimated expectation gets to the true expectation (see the sketch below).
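As a minimal sketch of this core idea (assuming NumPy; the target E[X^2] under a standard normal is an illustrative choice, not from the original notes), the sample mean of f over draws converges to the true expectation as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: E[X^2] where X ~ N(0, 1); the true value is 1 (the variance).
f = lambda x: x ** 2

for n in [100, 10_000, 1_000_000]:
    samples = rng.standard_normal(n)
    estimate = f(samples).mean()  # Monte Carlo estimate: average of f over samples
    print(f"n={n:>9}: estimate={estimate:.4f} (true value: 1.0)")
```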
Example
- How can we estimate the measure of the set of black points in the figure above (the Mandelbrot set)?
- m(A)/m(S) = p(X ∈ A) = E[I(X ∈ A)]
A denotes the set of black points in the figure, S the set of all points, and X a uniformly sampled random point; I is the indicator that equals 1 when the point is black and 0 otherwise, so the probability equals the expectation of this indicator.
- m̂(A) = R * (1/n) * Σ I(x_i ∈ A)
Suppose the region in the figure has total area R; by sampling n points uniformly at random, we can estimate the measure of A, as in the sketch below.
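A minimal sketch of this estimator (assuming NumPy; the sampling rectangle, escape-radius test, and iteration cap are illustrative choices, not from the original notes): sample n points uniformly over a rectangle covering the Mandelbrot set, count the fraction that stay bounded, and scale by the rectangle's area R.

```python
import numpy as np

rng = np.random.default_rng(0)

def in_mandelbrot(c, max_iter=100):
    """Indicator I(c ∈ A): iterate z <- z^2 + c and treat points that
    never escape |z| > 2 within max_iter steps as members of the set."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return 0
    return 1

# Sampling rectangle S = [-2, 1] x [-1.5, 1.5], so R = m(S) = 9.
R = 3.0 * 3.0
n = 20_000
xs = rng.uniform(-2.0, 1.0, n)
ys = rng.uniform(-1.5, 1.5, n)

hits = sum(in_mandelbrot(complex(x, y)) for x, y in zip(xs, ys))
area_estimate = R * hits / n  # m̂(A) = R * (1/n) * Σ I(x_i ∈ A)
print(f"estimated area of the Mandelbrot set: {area_estimate:.3f}")
```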
Importance Sampling
- Is it possible to sample from a different distribution q and still recover expectations under the target distribution p?
- Writing the expectation as an integral over the PDF shows that the same expectation can be expressed under a different sampling distribution:
E_p[f(x)] = ∫ f(x) p(x) dx = ∫ f(x) (p(x)/q(x)) q(x) dx = E_q[f(x) * p(x)/q(x)]
(originally we sample from p(x) to estimate E[f(x)]; now we sample from q(x) to estimate E[f(x) * p(x)/q(x)])
- importance weight = p/q
- In theory an optimal q can be found, but in practice this is often hard; a well-chosen q can greatly reduce the variance!
- Two situations (see the sketch after this list):
- If p is unknown (or hard to sample from), find a q as close to p as possible.
- If p is known, use a better q to reduce the variance of the Monte Carlo approximation.
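A minimal sketch of the second situation (assuming NumPy and SciPy; the tail-probability target f(x) = 1{x > 3} under p = N(0, 1) and the proposal q = N(4, 1) are illustrative choices, not from the original notes). Naive Monte Carlo rarely lands in the tail, while sampling from a q centered on the tail and reweighting by p/q gives a much lower-variance estimate:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 10_000

# Target: E_p[f(X)] = P(X > 3) under p = N(0, 1) (true value ≈ 0.00135).
f = lambda x: (x > 3.0).astype(float)

# Naive Monte Carlo: sample from p directly; almost all samples miss the tail.
x_p = rng.standard_normal(n)
naive = f(x_p).mean()

# Importance sampling: sample from q = N(4, 1), which covers the tail,
# and reweight each sample by the importance weight w = p(x) / q(x).
x_q = rng.normal(4.0, 1.0, n)
w = norm.pdf(x_q, 0.0, 1.0) / norm.pdf(x_q, 4.0, 1.0)
is_est = (f(x_q) * w).mean()

print(f"naive MC:            {naive:.6f}")
print(f"importance sampling: {is_est:.6f}")
print(f"true value:          {1 - norm.cdf(3.0):.6f}")
```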