How to get a p-value when given only the mean, standard deviation, and n of two independent groups - Statistics Notes (統計散記): Vivian's Experience Sharing

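In research you normally receive the raw data and load it straight into statistical software for analysis. Sometimes, though, the raw data has been lost, and all that remains are the means, standard deviations, and sample sizes computed earlier. How do you get a p-value then?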

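Ask a student who is currently taking statistics and it is no problem at all: plug the numbers into the formula and look up the table. For a working statistician who has grown used to letting the software do everything, it can be a bit of a headache, and it is time to revisit those student days. Below I go through it in a few steps and finish with verified SAS code, so you can use it with confidence. It is also handy for reviewing other people's papers or checking whether a student's results were calculated correctly.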

Steps:

Step 1. Create two groups of continuous data
Step 2. Run an independent-samples t test to get each group's mean, standard deviation, and n
Step 3. Find the p-value through the formula

Step 1. Create two groups of continuous data


data a;
/* gp = group label (character), height = continuous measurement; list input */
input gp $ height;
datalines;
1   190
1   151
1   149
1   145
1   170
1   162
1   152
1   184
1   155
1   169
2   156
2   162
2   159
2   138
2   145
2   132
2   149
2   170
2   147
2   159
1   161
1   151
1   149
1   145
1   170
1   162
1   152
1   164
1   155
1   169
2   156
2   150
2   159
2   138
2   145
2   132
2   149
2   150
2   147
2   159
1   150
1   151
1   149
1   145
1   170
1   172
1   152
1   164
1   175
1   169
2   166
2   150
2   159
2   138
2   145
2   152
2   149
2   150
2   157
2   159
;
run;

Step 2. Run an independent-samples t test to get each group's mean, standard deviation, and n


proc ttest data=a;
  class gp;
  var height;
run;

Result:

n1=30; mean1=160.1; std1=11.8203;
n2=30; mean2=150.9; std2=9.4005;

The variances of the two groups do not differ significantly (p=0.2233), so take the p-value from the "Equal" variances row: p=0.0015
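
The equal-variance decision above can also be reproduced from the summary statistics alone. The following is a minimal sketch, assuming PROC TTEST's equality-of-variances check is the folded F test (larger sample variance over the smaller, two-sided p-value). The data set name `ftest` and the variables `F`, `ndf`, `ddf`, and `pF` are made up for illustration. If that assumption holds, `pF` should come out close to the 0.2233 reported above.

```sas
data ftest;
/* Standard deviations and n as reported in Step 2 */
n1=30; s1=11.8203;
n2=30; s2=9.4005;

/* Folded F: put the larger sample variance in the numerator */
if s1 >= s2 then do; F=s1**2/s2**2; ndf=n1-1; ddf=n2-1; end;
else do; F=s2**2/s1**2; ndf=n2-1; ddf=n1-1; end;

/* Two-sided p-value, capped at 1 */
pF=min(1, 2*(1-probf(F,ndf,ddf)));
run;

proc print data=ftest;var F ndf ddf pF;run;
```

A large `pF` supports using the pooled (equal-variance) result, and a small one points to the unequal-variance result instead.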


Step 3. Find the p-value through the formula
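
For reference, the formulas that the SAS code in this step implements can be written out as follows. This is a sketch in standard textbook notation. The symbols follow the variable names in the code (x1, x2 for the means, s1, s2 for the standard deviations), and $F_t(\cdot\,;\nu)$ is my shorthand for the cumulative distribution function of the t distribution with $\nu$ degrees of freedom, which is what `probt` returns.

```latex
% Equal variances assumed (pooled)
s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}, \qquad
t_1 = \frac{\bar{x}_1-\bar{x}_2}{s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \qquad
df_1 = n_1+n_2-2

% Equal variances not assumed (Satterthwaite approximation)
t_2 = \frac{\bar{x}_1-\bar{x}_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}, \qquad
df_2 = \frac{\left(\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}\right)^2}
            {\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}

% Two-sided p-value in both cases
p = 2\left(1 - F_t\!\left(|t|;\, df\right)\right)
```

When n1 = n2, as in this example, t1 and t2 are identical and only the degrees of freedom differ.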


data b;
/* Summary statistics as reported in Step 2 */
n1=30; x1=160.1; s1=11.8203;
n2=30; x2=150.9; s2=9.4005;

/* Case 1: the two populations are assumed to have equal variances (pooled) */
df1=n1+n2-2;
var1=((n1-1)*s1**2+(n2-1)*s2**2)/df1;  /* pooled variance */
std1=var1**0.5;                        /* pooled standard deviation */
ivn=(1/n1+1/n2)**0.5;
t1=(x1-x2)/(std1*ivn);
p1=(1-probt(abs(t1),df1))*2;           /* two-sided p-value */

/* Case 2: the two population variances are not assumed to be equal */
t2=(x1-x2)/(s1**2/n1+s2**2/n2)**0.5;
df2=(s1**2/n1+s2**2/n2)**2/(((s1**2/n1)**2)/(n1-1)+((s2**2/n2)**2)/(n2-1));
p2=(1-probt(abs(t2),df2))*2;           /* two-sided p-value */
run;

proc print data=b;var df1 df2 t1 t2 p1 p2;run;

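Result: t = 3.34, which does not match the Step 2 result. The formulas are not the cause. The inputs are: the mean of group 1 was entered as the rounded 160.1, while the exact mean of the 30 values is 160.0667, and that small gap is enough to shift the t value.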
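
One possible way to check the hand-coded formulas independently is to give PROC TTEST the same rounded summary values instead of raw data. The sketch below assumes, based on my reading of the PROC TTEST documentation, that the procedure accepts a summary data set with a character variable named `_STAT_` holding the values N, MEAN, and STD. The data set name `summ` is made up for illustration. If the formulas are right, the Pooled and Satterthwaite rows should agree with t1, t2, p1, and p2 from the data step, which would confirm that any remaining gap from Step 2 comes from the rounded inputs.

```sas
data summ;
/* One row per group and statistic; values are the rounded ones from Step 2 */
length _STAT_ $4;
input gp $ _STAT_ $ height;
datalines;
1 N    30
1 MEAN 160.1
1 STD  11.8203
2 N    30
2 MEAN 150.9
2 STD  9.4005
;
run;

proc ttest data=summ;
  class gp;
  var height;
run;
```

Replacing the MEAN and STD rows with more precise values is then a one-line change, which makes it easy to see how much the rounding matters.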


Step 3.1 Compute the exact means and standard deviations in Excel

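data b;
/* Unrounded means and standard deviations computed in Excel */
n1=30;x1=160.0666667;s1=11.82030204;
n2=30;x2=150.9;s2=9.400476877;

/* Case 1: the two populations are assumed to have equal variances (pooled) */
df1=n1+n2-2;
var1=((n1-1)*s1**2+(n2-1)*s2**2)/df1;  /* pooled variance */
std1=var1**0.5;                        /* pooled standard deviation */
ivn=(1/n1+1/n2)**0.5;
t1=(x1-x2)/(std1*ivn);
p1=(1-probt(abs(t1),df1))*2;           /* two-sided p-value */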