Very well done, Jose. Thank you very much. I enjoyed reading it.
A couple of thoughts came to mind.
There are instances where one station can be heard on several frequencies:
1) Stations with parasitic AM modulation can have 3 signals spaced approx.
1.5 kHz apart.
2) Second and third harmonic signals.
3) Multi/Multi stations.
How does your algorithm treat these cases?
73, Igor UA9CDC
----- Original Message ----- From: "José Nunes CT1BOH" <ct1boh@gmail.com>
To: <cq-contest@contesting.com>
Sent: Tuesday, July 16, 2013 4:23 AM
Subject: [CQ-Contest] Implementing a Dynamic error free RBN-Skimmer network
The purpose of this post is to present a way to implement a dynamic error
free Skimmer-RBN/Packet network that automatically:
1. Flags and eliminates “Busted Spots” from the network
2. Flags and eliminates “wrong frequency” spots from the network
3. Prevents inaccurate Skimmer-Spotters from feeding incorrect spots to the
network
4. Eventually allows Skimmer-RBN users to customize reception of spots
according to the quality flag and several parameters of the algorithm
In my various CQWW CW operations since 1989
(http://www.qsl.net/ct1boh/operations.htm) I’ve only enjoyed
Skimmer-RBN/Packet networks from the pile-up end, as PY0F, P40E, CR3E,
etc.
Skimmer-RBN and packet networks have been a blessing for the DX operator as
a constant pile-up generator and a reason for the never-ending increase in
total QSO numbers over the years.
Recently, building on the idea of operating Assisted on CW from a DX location
for the first time, I began to study how to correctly use
Skimmer-RBN/Packet networks. Operating Assisted in minor contests from
home, I discovered several problems that made the callsign and
frequency information in an RBN-fed band map less than 100% reliable:
- Busted spots
- Non-existent spots on a particular frequency
- Small frequency shifting spots and
- “Unstable” band map with callsigns alternating, popping-in and
popping-out
Wanting to use Assistance and not being able to completely trust the
information is a strange concept to me. If I use Assistance I expect
not to waste time mentally working out whether the call in the band map is
good or bad.
Solving these problems requires a system that automatically assesses
each spot before sending it to the network and “learns” from itself. The
solution is a dynamic error free RBN-Skimmer algorithm.
Studying RBN data, which can be downloaded here
http://www.reversebeacon.net/raw_data/ I came to a solution. The algorithm
I propose has a simple beauty and works extraordinarily well. For every
new spot that is provided to the Reverse Beacon Network by a Skimmer
Spotter, or to a packet network by a human, the system will automatically
generate one of the following “Quality Tags” for each spot:
Good Spot
Good Call, New Frequency Spot?
Busted Spot
? Spot
In short, the algorithm can be described as follows:
Any new spot will be tagged as a “Good Spot” if, looking back 25 minutes,
there are two more spots with the same call as the new spot on
approximately the same frequency (+/- 0.3 kHz).
Any new spot will be tagged as a “Good Call, New Frequency Spot?” if,
looking back 25 minutes, there is a spot that was already tagged as “Good
Spot” with the same call as the new spot, but the new spot is on an
adjacent frequency (offset less than or equal to -0.4 kHz, or greater than
or equal to +0.4 kHz).
Any new spot will be tagged as “Busted Spot” if, looking back 25 minutes,
there are at least three spots already tagged “Good Spot” with a similar
call on approximately the same frequency as the new spot (+/- 0.1 kHz). A
similar call is a call that can be transformed into the new spot call by
character insertion, deletion or substitution.
Any spot that is not a “Good Spot”, a “Good Call, New Frequency Spot?” or a
“Busted Spot” is an undetermined spot, “?Spot”.
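To make the four rules above concrete, here is a minimal Python sketch of how they could be applied to a stream of spots. It is not the VBA macro from the spreadsheet. The names Spot, classify and tag_stream are invented for illustration, the order in which the rules are tested is an assumption (the post does not state a precedence), and a "similar call" is taken here to mean exactly one insertion, deletion or substitution.

```python
from dataclasses import dataclass

WINDOW_MIN = 25      # look-back window in minutes (from the post)
GOOD_TOL = 0.3       # kHz tolerance for "Good Spot" (from the post)
NEW_FREQ_MIN = 0.4   # kHz offset that counts as a new frequency (from the post)
BUST_TOL = 0.1       # kHz tolerance for "Busted Spot" (from the post)
EPS = 1e-6           # assumption: guards against floating-point noise in kHz

GOOD = "Good Spot"
NEW_FREQ = "Good Call, New Frequency Spot?"
BUSTED = "Busted Spot"
UNKNOWN = "?Spot"


@dataclass
class Spot:              # hypothetical record, not the RBN file layout
    call: str
    freq: float          # kHz
    time: int            # minutes since the start of the contest
    tag: str = UNKNOWN


def one_edit_apart(a: str, b: str) -> bool:
    """True if b is a with exactly one character inserted, deleted or replaced."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a                      # make a the shorter (or equal) call
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        return a[i + 1:] == b[i + 1:]    # one substitution
    return a[i:] == b[i + 1:]            # one extra character in b


def classify(new: Spot, history: list[Spot]) -> str:
    """Tag one incoming spot against the spots already seen."""
    recent = [s for s in history if 0 <= new.time - s.time <= WINDOW_MIN]
    same_call = [s for s in recent if s.call == new.call]

    # Rule 1: two earlier spots of the same call within +/- 0.3 kHz.
    close = [s for s in same_call if abs(s.freq - new.freq) <= GOOD_TOL + EPS]
    if len(close) >= 2:
        return GOOD

    # Rule 2: same call already tagged Good Spot, but 0.4 kHz or more away.
    if any(s.tag == GOOD and abs(s.freq - new.freq) >= NEW_FREQ_MIN - EPS
           for s in same_call):
        return NEW_FREQ

    # Rule 3: at least three Good Spots of a similar call within +/- 0.1 kHz.
    similar_good = [s for s in recent
                    if s.tag == GOOD
                    and abs(s.freq - new.freq) <= BUST_TOL + EPS
                    and one_edit_apart(s.call, new.call)]
    if len(similar_good) >= 3:
        return BUSTED

    return UNKNOWN


def tag_stream(spots: list[Spot]) -> list[str]:
    """Tag spots in arrival order; a tag is never revised once given."""
    history: list[Spot] = []
    for spot in spots:
        spot.tag = classify(spot, history)
        history.append(spot)
    return [s.tag for s in spots]
```

Fed three CR3E spots on 7045 at minutes 4, 5 and 5, tag_stream gives “?Spot” for the first two and “Good Spot” for the third, the same pattern as spots 711, 860 and 918 in the first table of the post.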
Let’s have a closer look at it, with examples from RBN spots from CQWW CW
2012:
“Good call” quality flag
#1     call1  freq1  time1  #Spotter  Quality tag
711    CR3E   7045   4      #G4HYG    ?Spot
860    CR3E   7045   5      #WB8BIL   ?Spot
918    CR3E   7045   5      #WB2LSI   Good Spot
3077   CR3E   7045   20     #G4HYG    Good Spot
3254   CR3E   7045   21     #S52AW    Good Spot
3336   CR3E   7045   22     #KB9AMG   Good Spot
3892   CR3E   7045   25     #DK9IP    Good Spot
…
CR3E started the contest (CQWW CW 2012) on 7045. The first two spots get
the quality flag “?Spot”, but by the third spot of WB2LSI skimmer CR3E is
flagged as a “good call” and all subsequent spots on 7045 will get the
“Good Spot” quality flag.
After a lot of testing I can say the system should operate with a bandwidth
filter of +/- 0.3 kHz. All spots that do not fall within this +/- 0.3 kHz
filter will not get the “Good Spot” quality tag.
“Good Call, New Frequency Spot?”
#1      call1  freq1   time1  #Spotter  Quality tag
…
33721   CR3E   7045    220    #F5MUX    Good Spot
34154   CR3E   7041.3  223    #KA9SWE   Good Call, new freq?
34460   CR3E   7045    225    #RU9CZD   Good Spot
…
40711   CR3E   7044.9  261    #KQ8M     Good Spot
40740   CR3E   7041.3  261    #KA9SWE   Good Call, new freq?
41213   CR3E   7045.1  264    #K3LR     Good Spot
…
CR3E continues to be on 7045. All of a sudden the #KA9SWE skimmer spots CR3E
on 7041.3. The system detects a frequency difference and flags it as a “Good
Call, New Frequency Spot?”. Is it really a QSY to a new frequency by CR3E? If
yes, then after two more spots on the new frequency the system will change
the flag to “Good Spot”. That is not the case in this example, because
skimmer #RU9CZD confirms there was no QSY. Later on we see that #KA9SWE
sends another spot on 7041.3. Obviously the #KA9SWE skimmer needs frequency
calibration.
After a lot of testing I can say the system should treat any offset greater
than +/- 0.3 kHz as a possible new frequency. All spots that do not fall
within this +/- 0.3 kHz filter will not get the “Good Spot” quality tag, and
this should be the accuracy threshold.
“Busted Spot”
#1      call1  freq1  time1  #Spotter  Quality tag
31159   CR3E   7045   204    #S52AW    Good Call
31172   KR3E   7045   204    #K9QC     Busted
31205   CR3E   7045   205    #G4HYG    Good Call
…
CR3E continues to be on 7045. All of a sudden the #K9QC skimmer sends a KR3E
spot. The system detects that KR3E is similar to CR3E, on the same
frequency as a Good Spot, and flags this spot as a Busted Spot. The
system uses Levenshtein distance to find similar calls
(http://en.wikipedia.org/wiki/Levenshtein_distance). Depending on the
length of the callsign it will look for calls that are x letters off.
After a lot of testing I can say the system should operate with a bandwidth
filter of +/- 0.1 kHz. A Busted Spot comes from a good spot, and usually
from a Skimmer that has already spotted the call, therefore a threshold of
0.1 kHz is what works best.
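For readers who have not met Levenshtein distance, here is a short Python sketch of the similar-call test. The post does not give the value of "x" for each callsign length, so the allowed_distance rule below (1 edit for calls of up to 5 characters, 2 for longer ones) is purely an assumption for illustration, and the function names are invented.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))           # distances from "" to b[:j]
    for i, ca in enumerate(a, 1):
        cur = [i]                            # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def allowed_distance(call: str) -> int:
    """Hypothetical 'x letters off' rule; the real thresholds are not in the post."""
    return 1 if len(call) <= 5 else 2


def is_similar(candidate: str, good_call: str) -> bool:
    """True if candidate looks like a busted copy of good_call."""
    d = levenshtein(candidate, good_call)
    return 0 < d <= allowed_distance(good_call)
```

With these assumptions is_similar("KR3E", "CR3E") is true (one substitution), while an identical call is never reported as a bust of itself.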
“?Spot”
All “?Spot” are spots that cannot be determined as “Good Spot” or as “Good
Call, New Frequency Spot?” or as “Busted Spot”. Some are good spots: the
first and the second spot on a new frequency when a run starts. But the
majority are “spots” sent by skimmers of stations calling RUN stations.
These spots should never be sent out to the network by a skimmer. They are
false positive running stations.
How well does this proposed system work?
I can say it works extraordinarily well!
I tested all 40 meter spots from CQWW CW 2012, almost a million spots:
Quality flag            Spots     % of spots
?                       46.593      4.69%
Busted                  20.734      2.09%
Good Call              855.227     86.08%
Good Call, new freq?    70.994      7.15%
Grand Total            993.548    100.00%
After running my algorithm I also went back to validate both “? Spots” and
“Good call, new freq?” spots. If I have the following spots:
711     CR3E   7045    4           #G4HYG       ?Spot
860     CR3E   7045    5           #WB8BIL      ?Spot
918     CR3E   7045    5           #WB2LSI      Good Spot
It is easy to determine after running the algorithm that spot #711 and
spot #860 are “Good Spots”, given spot #918. This cannot be done in a real
time system, because once a quality tag is given to a spot it stays.
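One way to picture this after-the-fact validation is the Python sketch below. It assumes that a “?Spot” counts as confirmed when a later “Good Spot” of the same call appears within 25 minutes and +/- 0.3 kHz. The post does not spell out the exact check that was used, so this rule, the dictionary layout and the function name are all assumptions.

```python
WINDOW_MIN = 25   # minutes; assumed to be the same look-back window, applied forward
GOOD_TOL = 0.3    # kHz; assumed to be the same "Good Spot" tolerance
EPS = 1e-6        # guards against floating-point noise in kHz values


def validate_unknowns(spots):
    """Return the indices of '?Spot' entries confirmed by a later 'Good Spot'.

    spots: list of dicts with keys call, freq (kHz), time (minutes) and tag,
    sorted by time. This runs offline, after every spot already has a tag.
    """
    confirmed = []
    for i, spot in enumerate(spots):
        if spot["tag"] != "?Spot":
            continue
        for later in spots[i + 1:]:
            if later["time"] - spot["time"] > WINDOW_MIN:
                break                        # too far ahead, stop looking
            if (later["tag"] == "Good Spot"
                    and later["call"] == spot["call"]
                    and abs(later["freq"] - spot["freq"]) <= GOOD_TOL + EPS):
                confirmed.append(i)
                break
    return confirmed
```

Run on the three CR3E spots above, it returns the positions of spots 711 and 860.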
Of the 46.593 spots with “?Spot” quality flag:
    20.768 are good calls (these are all the first and second spots of a run
that just started)
    25.825 spots are indeed “? Spots” (mostly stations calling in pile-ups)
Of the 70.994 spots with “Good Call, New Freq?” quality flag:
    24.502 are good calls (these are all the first and second spots of a
run that just started on a new frequency)
    46.492 spots are indeed “Good Call” spots that were sent to the network
with a wrong frequency by an uncalibrated skimmer.
My algorithm also allows the RBN to detect uncalibrated skimmer spotters.
Looking at the list of skimmers it is easy to build a list based on the %
of “Good call, New freq?” quality flag. Let’s take a look at the top ten
Skimmer spotters according to spots sent to the RBN:
Skimmer   Spots    % of “Good Call, New Freq?”
#K3MM     45.726    3.1%
#GW8IZR   29.984    8.7%
#S52AW    29.214    5.2%
#DL8LAS   26.906    3.7%
#DR1A     25.831    3.1%
#RU9CZD   24.301    6.4%
#HA6PX    23.317   38.6%
#OL5Q     23.006    4.4%
#W3LPL    21.889    2.0%
#KQ8M     21.833    7.1%
We can see that a calibrated skimmer should not have more than 3% of “Good
Call, New frequency?” spots, because that is the dynamic of people changing
frequency in the contest. Numbers greater than that point to uncalibrated
skimmers, as is the case with #GW8IZR, #HA6PX, or #KQ8M.
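The calibration check can be sketched in a few lines of Python. The input is assumed to be a list of (spotter, quality tag) pairs taken from already-tagged spots. The function names are invented, and the 3% default comes from the figure quoted above.

```python
from collections import Counter

NEW_FREQ = "Good Call, New Frequency Spot?"


def new_freq_share(tagged):
    """Map each spotter to its percentage of 'new frequency' tags.

    tagged: iterable of (spotter, quality_tag) pairs.
    """
    total = Counter()
    shifted = Counter()
    for spotter, tag in tagged:
        total[spotter] += 1
        if tag == NEW_FREQ:
            shifted[spotter] += 1
    return {sp: 100.0 * shifted[sp] / total[sp] for sp in total}


def suspect_skimmers(tagged, threshold_pct=3.0):
    """Spotters whose 'new frequency' share is above the threshold, sorted by call."""
    share = new_freq_share(tagged)
    return sorted(sp for sp, pct in share.items() if pct > threshold_pct)
```

Taken strictly, a 3% threshold would also flag skimmers only a little above it, so in practice the threshold would probably need some tuning.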
To finalize, several considerations:
1 RBN (Reverse Beacon Network) is a fantastic instrument for contesters and
DXers.
2 We have the instruments to turn the current RBN network into a dynamic
error free system
3 The system should allow the user to decide to filter out “Busted” spots,
“Good call, New Freq?” spots and “? Spots”. By giving a quality flag it
would be up to the user to use the quality flags to filter out spots
4 The system should warn uncalibrated skimmers
If you want to play with the data and with the algorithm you can:
The algorithm and several graphs that explain how well the system works are
available to download here
http://www.qsl.net/c/ct1boh//dl/
Please note that the Excel file is ~90MB
In sheet 1 there are 993.548 spots from CQWW CW 2012 on 40 meters
In sheet 2 there are the results of the algorithm. This would be the output
of the system adding a quality flag to each spot in real time (if you press
Alt-Q you will activate the macro; it takes 1892 seconds to flag all
993.548 spots on my PC, or on average 0.0019 seconds to flag each incoming
spot)
In sheet 3 there is a pivot table to manipulate data from sheet 2
In sheet 4 there is a list and graph that show the performance of skimmers
as far as frequency calibration is concerned
In sheet 5 there is a list and a graph of calls and the % of Busted of each
call according to number of spots
In sheet 6 there is the algorithm code (please note that I’m not a
programmer. I just learned VBA to do this)
--
José Nunes
CONTEST CT1BOH - http://www.qsl.net/ct1boh
_______________________________________________
CQ-Contest mailing list
CQ-Contest@contesting.com
http://lists.contesting.com/mailman/listinfo/cq-contest