[TowerTalk] 1 or 2 dB

Thu May 19 13:47:55 EDT 2022

On 5/19/2022 8:43 AM, Kim Elmore wrote:
> My test worked, so I'll try again...
>
> To everyone that has commented: Thank you very much! This discussion 
> is fabulous!
>
> I concede the point that statistically, there are likely to be score 
> improvements with a 1 dB power increase and certainly 2 dB. These are 
> best described as statistical improvements and I suspect that the data 
> set has to be moderately large to detect a "significant" difference, 
> that is for a statistic to have much power. Thus a DXer, who is not a 
> contester (there are such things) probably won't notice a significant 
> improvement in how long it takes to break through a pile-up with a 
> power increase of 1 dB. Over a long run, a contester will be able to 
> see a score increase with a 1 dB power increase.  A smaller data 
> set (possibly much smaller) will show statistical score improvements 
> given a 2 dB increase, significant at some arbitrary p-value.
Perhaps this is what you are getting at, Kim, but unless an operator is 
switching between power levels on some fairly short time interval, so as 
to create two otherwise identical data sets, I don't see how you can get 
any meaningful statistics when looking at small differences in power. 
Comparing previous contests or within a single contest against another 
nearby station running a slightly different power still leaves you with 
a lot of uncontrolled variables.

> I tend to skepticism when someone says they can notice a significant 
> improvement given a *single* contest assuming everything else 
> (including the operator) is held constant. It will take several 
> contests to see a *statistical* improvement but I'll now bet it's 
> there. Fewer contests will be required for a 2 dB increase. I prefer 
> resampling (i.e. a permutation test) to parametric statistics simply 
> because parametric test assumptions are almost always violated, 
> leading to unknown degradations of the test's validity.
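
For anyone who has not run one, the permutation test Kim mentions can be 
sketched roughly as below. This is only an illustration: the function 
name, the one-sided form of the test, and the hourly-rate numbers are all 
made up for the example and are not anyone's real contest data.

```python
import random

def permutation_test(low, high, n_perm=10000, seed=0):
    """One-sided permutation test on the difference of means.

    low, high: lists of per-interval results (e.g. QSOs per hour) at the
    lower and higher power level. Returns (observed_diff, p_value).
    """
    rng = random.Random(seed)
    observed = sum(high) / len(high) - sum(low) / len(low)
    pooled = list(low) + list(high)
    n_low = len(low)
    at_least_as_large = 0
    for _ in range(n_perm):
        # Shuffle the labels: under the null hypothesis the power level
        # makes no difference, so any relabeling is equally likely.
        rng.shuffle(pooled)
        perm_low = pooled[:n_low]
        perm_high = pooled[n_low:]
        diff = sum(perm_high) / len(perm_high) - sum(perm_low) / len(perm_low)
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (at_least_as_large + 1) / (n_perm + 1)

# Hypothetical hourly rates, for illustration only.
rates_low = [52, 48, 61, 55, 47, 58]
rates_high = [56, 51, 60, 63, 50, 59]
diff, p = permutation_test(rates_low, rates_high)
print(f"mean difference = {diff:.2f} QSOs/hour, p = {p:.3f}")
```

No distributional assumptions are needed, but the shuffle is only valid if 
the intervals are exchangeable, which is the same uncontrolled-variables 
problem raised above.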

You could test this hypothesis by taking some public logs and dividing 
them in two by making one log out of even minute entries and another log 
out of odd minute entries and comparing the scores (QSOs, QSO points, 
mults, total score, etc.). That would give you some feel for how 
sensitive a test would be that interleaved Pout vs. Pout + N dB with 
similar dwell intervals. If the score difference between odd and even 
minutes in the control logs (i.e. constant-power logs) were consistently 
less than 10%, then the logs with interleaved power levels would have 
to show a score difference significantly greater than 10% to be 
considered statistically significant.
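
A rough sketch of that odd/even split, assuming each log entry has already 
been reduced to a (HHMM time, QSO points) pair. That reduced format, the 
function names, and the sample entries are assumptions for illustration; 
a real public log would need parsing first, and mults are ignored here.

```python
def split_by_minute_parity(qsos):
    """Split (hhmm, points) entries into even-minute and odd-minute logs."""
    even, odd = [], []
    for hhmm, points in qsos:
        minute = int(hhmm[-2:])  # last two digits of HHMM are the minute
        if minute % 2 == 0:
            even.append((hhmm, points))
        else:
            odd.append((hhmm, points))
    return even, odd

def relative_difference(a, b):
    """Absolute difference between a and b as a fraction of their mean."""
    mean = (a + b) / 2
    return abs(a - b) / mean if mean else 0.0

# Made-up entries, for illustration only.
sample_log = [
    ("1200", 1), ("1201", 3), ("1202", 1),
    ("1203", 1), ("1204", 3), ("1207", 2),
]

even_log, odd_log = split_by_minute_parity(sample_log)
even_points = sum(points for _, points in even_log)
odd_points = sum(points for _, points in odd_log)

print("even-minute QSOs:", len(even_log), "points:", even_points)
print("odd-minute QSOs:", len(odd_log), "points:", odd_points)
print("relative difference:", relative_difference(even_points, odd_points))
```

Running this over a number of constant-power logs would give the spread 
of odd/even differences to use as the noise floor for the comparison.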

73, Mike W4EF