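Normally in research you receive the raw data and load it into statistical software for analysis. Sometimes, though, the raw data have been lost and all that remains are the previously computed means, standard deviations, and sample sizes. How do you get a p-value then?
Ask a student who is currently taking statistics and it is no problem at all: plug the numbers into the formula and look up the table. A working statistician who is fluent with statistical software, on the other hand, may have to think back to their student days. Below I walk through the steps and end with verified SAS code, so you can use it with confidence from now on. It is also handy for reviewing other people's papers or checking whether students' results are correct.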
Steps:
- Step 1. Create two groups of continuous data
- Step 2. Run Student's t test to get the mean, standard deviation, and n of each group
- Step 3. Find the p-value through the formula
Step 1. Create two groups of continuous data
data a;
Step 2. Run Student's t test to get the mean, standard deviation, and n of each group
proc ttest data=a;
  class gp;
  var height;
run;
Result:
n1=30; mean1=160.1; std1=11.8203;
n2=30; mean2=150.9; std2=9.4005;
The variances of the two groups do not differ significantly (p=0.2233), so read the p-value from the "equal variances" row: p=0.0015
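The equality-of-variances check can also be reproduced from the summary statistics alone. The following is a minimal sketch, assuming the check reported by PROC TTEST is the folded F test (larger sample variance over the smaller one, two-sided p-value). The dataset name `ftest` and the variables `fval`, `ndf`, `ddf`, and `p_f` are illustrative names of my own, not taken from the original program.

```sas
/* Sketch: folded F test for equal variances from summary statistics. */
/* Dataset and variable names here are illustrative only.             */
data ftest;
  /* Rounded values reported in Step 2 */
  n1 = 30; std1 = 11.8203;
  n2 = 30; std2 = 9.4005;

  /* Put the larger variance in the numerator so that fval >= 1 */
  if std1 >= std2 then do;
    fval = std1**2 / std2**2; ndf = n1 - 1; ddf = n2 - 1;
  end;
  else do;
    fval = std2**2 / std1**2; ndf = n2 - 1; ddf = n1 - 1;
  end;

  /* Two-sided p-value, capped at 1 */
  p_f = min(1, 2 * (1 - probf(fval, ndf, ddf)));
run;

proc print data=ftest noobs;
  var fval ndf ddf p_f;
run;
```

Because the inputs are rounded, `p_f` should land close to, but not necessarily exactly on, the 0.2233 reported by PROC TTEST.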
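Step 3. Find the p-value through the formula
data b;
  *Assumed the two distributions have the same variance;
  *The two population variances are not assumed to be equal;
Result: the t value is 3.34, which does not match the Step 2 result. The likely cause is that the mean and standard deviation entered here are rounded.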
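The DATA step above is truncated in this copy, so here is a minimal sketch of what such a calculation could look like. It assumes the standard pooled-variance and Welch (Satterthwaite) formulas and uses the rounded summary statistics from Step 2. Apart from `n1`, `mean1`, `std1`, `n2`, `mean2`, `std2`, `p1`, and `p2`, the variable names (`sp`, `v1`, `v2`, `t1`, `t2`, `df1`, `df2`) are my own choices for illustration and not necessarily those of the original program.

```sas
/* Sketch: two-sample t test from summary statistics only. */
data b;
  /* Rounded summary statistics reported in Step 2 */
  n1 = 30; mean1 = 160.1; std1 = 11.8203;
  n2 = 30; mean2 = 150.9; std2 = 9.4005;

  *Assumed the two distributions have the same variance;
  df1 = n1 + n2 - 2;
  sp  = sqrt(((n1 - 1)*std1**2 + (n2 - 1)*std2**2) / df1); /* pooled SD */
  t1  = (mean1 - mean2) / (sp * sqrt(1/n1 + 1/n2));
  p1  = 2 * (1 - probt(abs(t1), df1));                     /* two-sided */

  *The two population variances are not assumed to be equal;
  v1  = std1**2 / n1;
  v2  = std2**2 / n2;
  t2  = (mean1 - mean2) / sqrt(v1 + v2);
  df2 = (v1 + v2)**2 / (v1**2/(n1 - 1) + v2**2/(n2 - 1));  /* Satterthwaite df */
  p2  = 2 * (1 - probt(abs(t2), df2));
run;

proc print data=b noobs;
  var t1 df1 p1 t2 df2 p2;
run;
```

With these rounded inputs `t1` works out to about 3.34. Since both groups have n=30, `t1` and `t2` coincide and only the degrees of freedom differ between the two methods.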
Step 3.1 Use Excel to compute the exact mean and standard deviation
data b;
  *Assumed the two distributions have the same variance;
  *The two population variances are not assumed to be equal;
Result: the t value is 3.32, which matches the Step 2 result; p1 and p2 match as well.
ref:
https://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a000245936.htm
https://en.wikipedia.org/wiki/Student's_t-test
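In the formulas below, $\bar{x}_i$, $s_i$, and $n_i$ are the mean, standard deviation, and sample size of group $i$ (mean1, std1, n1 and mean2, std2, n2 above).
Assumed the two distributions have the same variance:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
where
$$ s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}} $$
df=n1+n2-2
The two population variances are not assumed to be equal:
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s_{\bar{\Delta}}} $$
where
$$ s_{\bar{\Delta}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}, \qquad df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} $$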