Statistical Methods for Biomedical Research

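Statistical Analysis and Data Presentation in Medical Journals

陳加忠, Biosystems Engineering Laboratory, National Chung Hsing University

To keep the statistical content of its papers sound, the journal Biochemia Medica published a review article, "Practical recommendations for statistical analysis and data presentation in Biochemia Medica journal" (2012), 22(1):15-23. The article is very helpful for anyone writing biomedical papers, and its main points are summarized here: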

I. Are the main findings presented in the Abstract?

Readers of a scientific paper start with the abstract, so it must be easy to understand and must convey the key points of the study. The sample size, the treatment groups, the statistical estimates, the confidence intervals, and the P values should all be stated. Two examples show the difference:

Poor presentation:

Results: The concentration of New BioMarker™ in patients with acute myocardial infarction was higher than in healthy controls (P<0.05). There was a significant correlation of New BioMarker™ with serum copeptin concentrations.

Good presentation:

Results: There were 250 patients with acute myocardial infarction and 232 healthy controls. The concentration of New BioMarker™ was higher in patients than in healthy controls (7.3±0.6 mmol/L vs. 5.4±0.5 mmol/L, respectively; P=0.002). New BioMarker™ was associated with serum copeptin concentration (r=0.67, P=0.026).

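II. Is the statistical analysis section well written, accurate, and easy to understand?

Statistical analysis usually appears as a subheading under Materials and methods. Under this subheading the authors need to explain which statistical techniques were used and why each one is appropriate. Every statistical method used must be named here, and every method named here must actually be used. The following questions need attention:

1. What type of data was collected: categorical or numerical?

2. How are the data described?

3. Are the data normally distributed? The normality test used should be named.

4. How were the statistical techniques chosen? How were possible differences tested, and how were associations in the data assessed?

5. Which tests were used for categorical data?

6. Is the sample large enough to detect the effect of interest?

7. What significance level was used for the tests?

8. Which statistical software was used? The version, manufacturer, and similar details must be given.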
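Item 6 above asks whether the sample is large enough to detect the effect. One common way to check this before a study is the normal-approximation sample size formula for comparing two means. The sketch below is not taken from the article: the function name `n_per_group`, the default power of 0.80, and the example effect size are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sided comparison of two means.

    delta: smallest difference in means worth detecting (assumed by the researcher)
    sd:    common standard deviation of the measurement (assumed)
    Uses the normal approximation n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = z.inv_cdf(power)           # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Hypothetical example: detect a difference of half a standard deviation.
print(n_per_group(delta=0.5, sd=1.0))  # 63
```

Because the formula uses normal rather than t quantiles, it slightly underestimates the requirement for small studies; dedicated power software is the safer choice for a real protocol.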

A comparison of two versions:

Poor presentation: Statistical analysis

Data were presented as mean ± standard deviation. Differences were tested by t-test. Pearson correlation was used to analyze the association between all studied parameters. Data analysis was done using MedCalc.

Good presentation: Statistical analysis

The Kolmogorov-Smirnov test was used to assess the normality of distribution of investigated parameters. All parameters in our study were distributed normally. Data were expressed as mean ± standard deviation. Differences were tested by two-tailed t-test. Pearson's correlation was used to analyze the association between all studied parameters. The values P < 0.05 were considered statistically significant. Statistical analysis was done using MedCalc 12.1.4.0 statistical software (MedCalc Software, Mariakerke, Belgium).

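III. Writing the Results section

The Results section must present the data and the statistical analysis with care. Authors should consider the following points:

1. Are the descriptive statistics appropriate?

2. Are the results presented with suitable accuracy and precision?

3. Are all estimates presented with confidence intervals?

4. Were the correct statistical tests applied to the data?

5. Are the figures and tables informative?

6. Is a P value given for every statistical test?

(1) Are the descriptive statistics appropriate?

Numerical data need suitable measures of central tendency and dispersion, and the data must be tested for normality before they are summarized. If the sample size is larger than 30 and the data are normally distributed, parametric summaries (mean and standard deviation) can be used. If the sample size is smaller than 30 or the data are not normally distributed, authors are advised to use the median and interquartile range (IQR, Q1 to Q3). There is no agreed standard for the critical sample size, but if n is below 30, nonparametric statistics are recommended.

The standard error of the mean (SEM) does not measure the dispersion of the data, so it should not be used to describe data. The correct choice is the standard deviation.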
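The rule above can be sketched in a few lines of Python using only the standard library. The function names, the `is_normal` flag, and the sample values are hypothetical; the normality decision itself (for example from a Kolmogorov-Smirnov test) is assumed to have been made elsewhere, since the standard library has no such test.

```python
from math import sqrt
from statistics import mean, median, quantiles, stdev


def describe(values, is_normal, min_n=30):
    """Summarize numerical data as mean ± SD or as median (Q1-Q3).

    is_normal is the outcome of a normality test run elsewhere (an assumption
    of this sketch). min_n=30 follows the rule of thumb quoted in the text.
    """
    n = len(values)
    if is_normal and n >= min_n:
        return f"{mean(values):.1f} ± {stdev(values):.1f} (mean ± SD, n={n})"
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return f"{median(values):.1f} ({q1:.1f}-{q3:.1f}) (median, Q1-Q3, n={n})"


def sem(values):
    """Standard error of the mean: precision of the mean, not spread of the data."""
    return stdev(values) / sqrt(len(values))


# Hypothetical small sample: too few observations for mean ± SD.
small_sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(describe(small_sample, is_normal=True))  # 5.0 (3.0-7.0) (median, Q1-Q3, n=9)
```

The SEM shrinks as the sample grows even when the scatter of the observations stays the same, which is why it cannot stand in for the standard deviation when describing data.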

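(2) Are the results presented with suitable precision and accuracy?

Data should be presented at the precision and accuracy of the measurement itself. For example, in a survey of how many cigarettes a person smokes per day, 10.21±3.16 is inappropriate, because cigarettes are counted in whole numbers. A sensible presentation is 10±3.

Tables 1a and 1b show what is and is not a reasonable way to present data: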

Poor presentation: Table 1a. The example for erroneously presented results for observations in two groups (groups A and B).
	Group A (N=11)	Group B (N=14)	P
Age (years)	55.905±2.112	58.107±4.016	0.687
WBC (×10⁹/L)	13.177 (6.837-15.272)	6.898 (3.283-11.496)	0.207
Female, N (%)	6 (54.5%)	8 (57.1%)	0.783
WBC - white blood cells

Good presentation: Table 1b. The example for correctly presented results for observations in two groups (groups A and B).
	Group A (N=11)	Group B (N=14)	P
Age (years)	56 (51-60)	58 (52-63)	0.687
WBC (×10⁹/L)	13.2 (6.8-15.3)	6.9 (3.3-11.5)	0.207
Female, N	6/11	8/14	0.783
WBC - white blood cells

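1. Age should be given in years, with at most one decimal place. If the subjects are children, age is better expressed in months and days. Age is also best presented as a median and range. Therefore 56 (51-60) is better than 55.905±2.112.

2. Laboratory measurements should be summarized (for example by mean and standard deviation) at a realistic precision. For WBC, reporting three decimal places is inappropriate, because a routine laboratory reports this value to one decimal place. Therefore 13.2 (6.8-15.3) ×10⁹/L is more appropriate than 13.177 (6.837-15.272) ×10⁹/L.

3. Because the sample is small, the actual counts are more meaningful than percentages: 6/11 and 8/14 are more meaningful than 54.5% and 57.1%.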
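The precision rule can be applied mechanically when formatting a table. The helper names below are hypothetical; the numbers are the WBC and cigarette examples from the text, and the number of decimals is something the author must choose to match how the quantity is actually measured.

```python
def fmt_median_iqr(med, q1, q3, decimals):
    """Format a median with its interquartile range at a fixed precision."""
    return f"{med:.{decimals}f} ({q1:.{decimals}f}-{q3:.{decimals}f})"


def fmt_mean_sd(mean, sd, decimals):
    """Format mean ± SD at the precision of the underlying measurement."""
    return f"{mean:.{decimals}f}±{sd:.{decimals}f}"


# WBC is reported by the laboratory to one decimal place.
print(fmt_median_iqr(13.177, 6.837, 15.272, decimals=1))  # 13.2 (6.8-15.3)

# Cigarettes are counted in whole numbers.
print(fmt_mean_sd(10.21, 3.16, decimals=0))  # 10±3
```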

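Some examples of how to present data:

1. If the number of observations is below 100, percentages are not needed; for example, write 0.67 instead of 67%.

2. Percentages should always be whole numbers unless the percentage is below 10%. Values below 10% may take one decimal place, for example 0.3%.

3. If the sample size is below 30, do not use percentages or proportions; giving the count out of the total is better. For example, 3/11 is better than 27%.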
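One possible reading of these three rules, combined into a single formatter, is sketched below. The function name `format_frequency` and the way the thresholds are chained (counts below 30, proportions below 100, percentages otherwise) are an interpretation for illustration, not something the article specifies.

```python
def format_frequency(count, total):
    """Format a frequency according to the group size.

    total < 30   -> raw count over total, e.g. "3/11"
    total < 100  -> proportion, e.g. "0.67"
    otherwise    -> percentage: whole number, or one decimal if below 10%
    """
    if total <= 0 or not 0 <= count <= total:
        raise ValueError("count must lie between 0 and a positive total")
    if total < 30:
        return f"{count}/{total}"
    if total < 100:
        return f"{count / total:.2f}"
    pct = 100 * count / total
    return f"{pct:.1f}%" if pct < 10 else f"{pct:.0f}%"


print(format_frequency(3, 11))      # 3/11
print(format_frequency(40, 60))     # 0.67
print(format_frequency(3, 1000))    # 0.3%
print(format_frequency(670, 1000))  # 67%
```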

Tables 1a and 1b illustrate how data should and should not be presented.

Authors should report confidence intervals and P values for all estimates, especially for diagnostic accuracy, odds ratios, relative risks, and regression analysis. Tables 2a and 2b show the difference.

Poor presentation: Table 2a. Examples for flawed presentation of results.
	Group A
Sensitivity	92%
AUC	0.783
Odds ratio	2.5
AUC - area under the curve

Good presentation: Table 2b. Examples for correct presentation of results.
	Group A	P
Sensitivity, % (95% CI)	92 (88-97)	0.021
AUC (95% CI)	0.78 (0.63-0.89)	0.038
Odds ratio (95% CI)	2.5 (1.7-12.3)	0.019
AUC - area under the curve
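Intervals of the kind shown in Table 2b can be computed directly from the underlying counts. The sketch below uses two standard textbook methods that the article does not name: the Wilson score interval for a proportion such as sensitivity, and the log-scale (Woolf) interval for an odds ratio. All counts are hypothetical and are not the data behind Table 2b.

```python
from math import exp, log, sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval


def wilson_ci(successes, n, z=Z95):
    """Wilson score confidence interval for a proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def odds_ratio_ci(a, b, c, d, z=Z95):
    """Odds ratio with a log-scale confidence interval for the 2x2 table [[a, b], [c, d]].

    All four cells must be non-zero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical: 92 of 100 diseased subjects test positive.
low, high = wilson_ci(92, 100)
print(f"Sensitivity 92% (95% CI {100 * low:.0f}-{100 * high:.0f})")

# Hypothetical 2x2 table of exposure by outcome.
or_, low, high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR {or_:.1f} (95% CI {low:.1f}-{high:.1f})")
```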

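A confidence interval matters because it expresses the precision of an estimate. A very wide confidence interval means the estimate has low precision. Confidence intervals can also be used to judge whether two treatments differ significantly, by checking whether the two intervals overlap. The following example illustrates this: when the AUC values of treatments A and B are compared using 95% confidence intervals, A is 0.78 (0.60-0.89) and B is 0.99 (0.80-0.99). Because the intervals overlap over 0.80-0.89, the two treatments are not significantly different at the significance level α=0.05.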
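The overlap check described above takes only a few lines. The function name is hypothetical, and the intervals are the AUC example from the text.

```python
def overlap_region(ci_a, ci_b):
    """Return the shared part of two intervals as (low, high), or None if they are disjoint."""
    low = max(ci_a[0], ci_b[0])
    high = min(ci_a[1], ci_b[1])
    return (low, high) if low <= high else None


auc_a = (0.60, 0.89)  # 95% CI for treatment A
auc_b = (0.80, 0.99)  # 95% CI for treatment B
print(overlap_region(auc_a, auc_b))  # (0.8, 0.89)
```

Overlap is a quick screen rather than a formal test; a direct test of the difference between the two estimates is what yields a P value.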

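(3) Are the statistical methods used correct?

Before choosing a statistical method, consider the following:

1. Are the data normally distributed?

2. Are the data numerical or categorical?

3. How many treatments (groups) are there?

4. How large is each study group?

5. Are the measurements independent?

Mistakes researchers commonly make include:

1. The authors did not check the assumptions of a statistical test before using it.

2. The authors did not describe how the statistical test was carried out.

3. The authors did not state which method was used.

If the data are not normally distributed or there are fewer than 30 observations, nonparametric tests are needed. Errors commonly seen in submissions to Biochemia Medica include:

1. No normality test was performed, and neither the distribution nor the number of observations was considered.

2. The measurements were paired (dependent), yet a paired t-test was not used.

3. In a 2x2 table with a small sample or low expected frequencies, the chi-square test was still used.

4. Pearson's correlation coefficient was used even though one variable was ordinal or not normally distributed.

5. A t-test was used to test differences among three or more groups, instead of ANOVA or the Kruskal-Wallis test.

6. ANOVA or the Kruskal-Wallis test showed that a difference exists (for example P<0.05), but no post hoc test followed.

7. The name of the post hoc test, and the reason for choosing it, were not given in the paper.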
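The questions and common errors above amount to a small decision table, sketched below. It names tests only and runs none of them. The function names are hypothetical, and three details are assumptions beyond the article: the Mann-Whitney U and Wilcoxon signed-rank tests as the nonparametric counterparts of the two t-tests, Fisher's exact test as the alternative to chi-square, and the conventional cutoff of 5 for "low" expected frequencies.

```python
def choose_test(n_groups, normal, min_group_size, paired=False):
    """Suggest a test for comparing a numerical variable between groups."""
    parametric = normal and min_group_size >= 30  # rule of thumb from the text
    if n_groups == 2:
        if parametric:
            return "paired t-test" if paired else "independent-samples t-test"
        return "Wilcoxon signed-rank test" if paired else "Mann-Whitney U test"
    if n_groups >= 3 and not paired:
        if parametric:
            return "one-way ANOVA, then a named post hoc test if significant"
        return "Kruskal-Wallis test, then a named post hoc test if significant"
    raise ValueError("this sketch covers 2 groups, or 3+ independent groups")


def choose_2x2_test(a, b, c, d, min_expected=5):
    """Suggest a test for the 2x2 table [[a, b], [c, d]] from its expected counts."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    expected = [r * col / n for r in rows for col in cols]
    if min(expected) < min_expected:
        return "Fisher's exact test"
    return "chi-square test"


# Table 1: 6 of 11 women in group A and 8 of 14 in group B.
print(choose_2x2_test(6, 5, 8, 6))  # Fisher's exact test
print(choose_test(n_groups=3, normal=False, min_group_size=12))
```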

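(4) Are P values reported for all tests?

P values must be reported to three decimal places, for example P=0.027. None of the following is acceptable: NS, P>0.05, P<0.05, p=0.00001. P is written as a capital letter and not in italics, and the smallest P value is reported as P<0.001.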
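These reporting rules are easy to enforce with a small formatter. The function name `format_p` is hypothetical.

```python
def format_p(p):
    """Format a P value: capital P, three decimals, floor at P<0.001."""
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.001:
        return "P<0.001"
    return f"P={p:.3f}"


print(format_p(0.0267))   # P=0.027
print(format_p(0.00001))  # P<0.001
print(format_p(0.687))    # P=0.687
```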

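(5) Interpreting the data

For a test to be valid, the significance level must be stated before the test is run. Wording such as the following is inappropriate: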

1. We have observed the difference between our study groups, although not statistically significant.

2. Though not statistically significant, concentration of glucose was higher in females than in males.

3. There was a trend towards higher values of marker X with increasing concentrations of marker Y. The observed association was unfortunately not statistically significant.

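(6) Correlation analysis

Many manuscripts handle correlation analysis incorrectly. Authors must first establish whether the correlation coefficient is statistically significant. Only a significant correlation should then be interpreted. For example, if P>0.05, the correlation coefficient is not significant and should not be interpreted.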

IV. Interpreting causality

When the studied parameters are found to be associated, authors often explain the association as cause and effect, which is a very risky argument. For example, higher serum C-reactive protein (CRP) is associated with higher body mass index (BMI), but this cannot be used to prove that CRP rises because BMI increases.

Only experiments or clinical studies allow researchers to draw causal conclusions. If the data come from observation, associations among the observed values cannot be explained as cause and effect.

The following examples contrast incorrect and correct wording:

Incorrect: Compared with the control group, ox-LDL levels were significantly increased in patients on hemodialysis (P=0.001).

Correct: Compared with the control group, ox-LDL levels were significantly higher in patients on hemodialysis (P=0.001).

Incorrect: We found a significantly decreased level of GPx in blood of asthmatic children as compared to age and sex matched controls (13.61±5.73 vs. 15.22±6.75, respectively; P=0.036).

Correct: We found a significantly lower level of GPx in blood of asthmatic children as compared to age and sex matched controls (13.61±5.73 vs. 15.22±6.75, respectively; P=0.036).

Incorrect: We observed that carrying AA genotype is significantly increased in healthy controls compared to patients (OR 2.5, 95% CI = 1.7-3.9; P=0.012).

Correct: We observed that frequency of AA genotype is significantly higher in healthy controls compared to patients (OR 2.5, 95% CI = 1.7-3.9; P=0.012).

Incorrect: Obstructive sleep apnea induced the increase in concentrations of hsCRP compared to healthy controls (P=0.045).

Correct: Concentrations of hsCRP were higher in children with obstructive sleep apnea, compared to healthy controls (P=0.045).

Incorrect: Logistic regression identified serum copeptin (OR 3.1, 95% CI = 1.7-12.4; P=0.043) as an independent predictor of 1-month mortality of patients suffering from traumatic brain injury. We therefore conclude that copeptin induces mortality after traumatic brain injury.

Correct: Logistic regression identified serum copeptin (OR 3.1, 95% CI = 1.7-12.4; P=0.043) as an independent predictor of 1-month mortality of patients suffering from traumatic brain injury. We therefore conclude that increased serum copeptin concentrations are associated with higher risk of mortality after traumatic brain injury.

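V. Checklist

Before submitting, it is best to check the statistical content of the manuscript. The checklist is as follows:

Questions

I. Abstract

1. Are the main findings of the study included in the abstract?

II. Materials and methods: statistical analysis

2. Are the study design, the treatments, and the sample size described?

3. Are all the statistical methods used listed?

4. Were all the listed statistical methods actually used in the study?

5. Are all the statistical methods used in the analysis correct?

6. Is the confidence level stated?

7. Are the publisher, version, and similar details of the statistical software given?

III. Results

8. Are the results presented precisely and accurately?

9. Are the descriptive statistics appropriate?

10. Are confidence intervals given for parameter estimates?

11. Is a P value given for every statistical test?

IV. Discussion

12. Are causal relationships reported in the conclusions only when the study was experimental?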