I have a project with 127000 individuals.
I selected a subset of only 23 records.
The plugin says the estimated run time is only a few seconds, but it took over 50 minutes.
Why did it take so long?
Why not search only the 23 records?
* Find Duplicate Individuals; why so long time?
Re: Find Duplicate Individuals; why so long time?
Nearly 3 million comparisons.
That's a little over 53000 comparisons per minute.
FH V.6.2.7 Win 10 64 bit
- tatewise
- Megastar
- Posts: 28414
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Re: Find Duplicate Individuals; why so long time?
127,000 x 23 is actually just over 2.9 million and dividing by 50 gives over 58 thousand per minute.
But the plugin is doing more than that as it also compares close relatives to improve the hit rate and reduce false positives, so it is making much more than 2.9 million comparisons.
Have you checked the plugin Help & Advice button, which explains how it works?
There are also tips in the FAQ to avoid the long run-time on large GEDCOM databases.
After it has been run once the subsequent runs will be much faster as only changed records are considered.
I'm not sure why its estimated run time is so much less than the actual run time but it is only a guess and depends on many factors including your PC performance and the complexity of your Individual records.
Were the results useful?
If you only wanted the 23 Individual records to be compared amongst themselves then you would have to export a GEDCOM containing just those 23 and their close family records. Afterwards, the duplicate records would need to be fed back into your master project.
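The arithmetic at the start of this reply (127,000 x 23, spread over 50 minutes) can be checked with a short sketch. It only counts one comparison per selected record per project record, so it is a lower bound: the extra comparisons of close relatives are not modelled, and the function names are made up for illustration rather than taken from the plugin.

```python
# Figures quoted in this thread.
PROJECT_INDIVIDUALS = 127_000
SELECTED_RECORDS = 23
RUN_TIME_MINUTES = 50


def pairwise_comparisons(selected: int, total: int) -> int:
    """Lower bound: each selected record compared once with each project record."""
    return selected * total


def comparisons_per_minute(comparisons: int, minutes: float) -> float:
    """Average throughput over the whole run."""
    return comparisons / minutes


comparisons = pairwise_comparisons(SELECTED_RECORDS, PROJECT_INDIVIDUALS)
rate = comparisons_per_minute(comparisons, RUN_TIME_MINUTES)

print(f"{comparisons:,} comparisons")       # 2,921,000 comparisons
print(f"{rate:,.0f} comparisons per minute")  # 58,420 comparisons per minute
```

Because the relatives are compared as well, the real throughput is higher than this figure suggests.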
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Find Duplicate Individuals; why so long time?
screenshots from the run:
The results were OK, but I still don't understand how the subset of 23 is included.
If I did not select the 23, would there be 127000x127000 comparisons?
Re: Find Duplicate Individuals; why so long time?
Yes, if you did not select the 23, there would be 127000x127000 comparisons.
If you tick the top option when you run the plugin again it should be much faster.
If you do NOT tick that option and do NOT select that subset of 23 records, what is the estimated run time?
I can only apologise for it getting its estimate too low.
BTW: I am fascinated how you got those screenshots saying the Plugin Not Run Yet when you have clearly already run it for 50 minutes and got some results. Did you use Edit > Undo Plugin Updates?
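To put the 127000x127000 figure in proportion, here is a rough sketch that counts naive pairwise comparisons for the subset run and for a full run. It assumes every record is simply compared with every other record; it is not how the plugin actually schedules its work, so it should not be used to predict run time.

```python
# Figures quoted in this thread.
PROJECT_INDIVIDUALS = 127_000
SELECTED_RECORDS = 23

# Naive counts: every chosen record compared with every project record.
subset_run = SELECTED_RECORDS * PROJECT_INDIVIDUALS
full_run = PROJECT_INDIVIDUALS * PROJECT_INDIVIDUALS

print(f"Subset of 23: {subset_run:,}")               # Subset of 23: 2,921,000
print(f"Full project: {full_run:,}")                 # Full project: 16,129,000,000
print(f"Ratio: {full_run / subset_run:,.0f} times")  # Ratio: 5,522 times
```

On this naive count, the full run involves several thousand times more comparisons than the subset run.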
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Find Duplicate Individuals; why so long time?
Just started again, now saying 764 minutes ...
I just took a screenshot with Windows' Clip-and-draw.
Re: Find Duplicate Individuals; why so long time?
It is actually saying between 191 and 764 minutes.
Large numbers of comparisons inevitably take a long time.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry