* Find Duplicate Individuals; why so long time?

torleiffh
Gold
Posts: 26
Joined: 14 Dec 2014 00:11
Family Historian: V7

Find Duplicate Individuals; why so long time?

Post by torleiffh »

I have a project with 127,000 individuals.
I selected a subset of only 23 records.
The plugin says the estimated run time is only a few seconds, but it took over 50 minutes.

Why did it take so long?
Why does it not search only the 23 records?
Ron Melby
Megastar
Posts: 917
Joined: 15 Nov 2016 15:40
Family Historian: V6.2

Re: Find Duplicate Individuals; why so long time?

Post by Ron Melby »

Nearly 3 million comparisons.

That's a little over 53000 comparisons per minute.
FH V.6.2.7 Win 10 64 bit
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Find Duplicate Individuals; why so long time?

Post by tatewise »

127,000 x 23 is actually just over 2.9 million and dividing by 50 gives over 58 thousand per minute.

But the plugin is doing more than that, as it also compares close relatives to improve the hit rate and reduce false positives, so it is making many more than 2.9 million comparisons.
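
As a rough check of that baseline arithmetic, here is a minimal Python sketch. It counts only one comparison per pairing of a selected record with a project record, so it ignores the extra close-relative comparisons, and the variable names are purely illustrative.

```python
# Baseline only: one comparison per (selected record, project record) pairing.
# The extra close-relative comparisons the plugin makes are NOT counted here.
project_individuals = 127_000  # project size from the first post
selected_records = 23          # size of the selected subset
run_minutes = 50               # reported run time ("over 50 minutes")

baseline_comparisons = project_individuals * selected_records
comparisons_per_minute = baseline_comparisons / run_minutes

print(f"{baseline_comparisons:,} comparisons")      # 2,921,000 comparisons
print(f"{comparisons_per_minute:,.0f} per minute")  # 58,420 per minute
```

Since the real run does more work per record than this, the true number of comparisons per minute would be higher than the baseline figure.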

Have you checked the plugin Help & Advice button, which explains how it works?
There are also tips in the FAQ to avoid the long run-time on large GEDCOM databases.

After it has been run once the subsequent runs will be much faster as only changed records are considered.

I'm not sure why the estimated run time is so much less than the actual run time, but it is only a guess and depends on many factors, including your PC performance and the complexity of your Individual records.

Were the results useful?

If you only wanted the 23 Individual records to be compared amongst themselves then you would have to export a GEDCOM containing just those 23 and their close family records. Afterwards, the duplicate records would need to be fed back into your master project.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
torleiffh
Gold
Posts: 26
Joined: 14 Dec 2014 00:11
Family Historian: V7

Re: Find Duplicate Individuals; why so long time?

Post by torleiffh »

Screenshots from the run:
[Screenshot: Skjermbilde 2021-09-16 133317.png]
[Screenshot: Skjermbilde 2021-09-16 124557.png]
The results were OK, but I still don't understand how the subset of 23 was used.
If I had not selected the 23, would there be 127000 x 127000 comparisons?
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Find Duplicate Individuals; why so long time?

Post by tatewise »

Yes, if you did not select the 23, there would be 127000x127000 comparisons.
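
To put that figure in perspective, here is a small Python sketch of the pair counts. The unique-pairs variant is an assumption for illustration only; nothing in this thread says whether the plugin compares each pair once or twice.

```python
project_individuals = 127_000  # project size from the first post

# Every record against every record, i.e. the 127000 x 127000 figure.
ordered_pairs = project_individuals * project_individuals

# ASSUMPTION: each pair is compared only once and no record is compared
# with itself. The thread does not confirm how the plugin counts.
unique_pairs = project_individuals * (project_individuals - 1) // 2

print(f"{ordered_pairs:,}")  # 16,129,000,000
print(f"{unique_pairs:,}")   # 8,064,436,500
```

Either way the count is in the billions, compared with about 2.9 million for the 23-record subset.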

If you tick the top option when you run the plugin again it should be much faster.

If you do NOT tick that option and do NOT select that subset of 23 records, what is the estimated run time?

I can only apologise for it getting its estimate too low.

BTW: I am fascinated by how you got those screenshots saying the Plugin Not Run Yet when you have clearly already run it for 50 minutes and got some results. Did you use Edit > Undo Plugin Updates?
torleiffh
Gold
Posts: 26
Joined: 14 Dec 2014 00:11
Family Historian: V7

Re: Find Duplicate Individuals; why so long time?

Post by torleiffh »

I just started it again, and it is now saying 764 minutes ...
I simply took a screenshot with Windows' Clip-and-draw.
[Screenshot: Skjermbilde 2021-09-17 183214.png]
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Find Duplicate Individuals; why so long time?

Post by tatewise »

It is actually saying between 191 and 764 minutes.

Large numbers of comparisons inevitably take a long time.
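
For a sense of scale, the estimate range can be converted to hours with a short Python sketch. The helper name `to_hours_minutes` is made up for this example.

```python
def to_hours_minutes(minutes: int) -> tuple[int, int]:
    """Split a whole number of minutes into (hours, remaining minutes)."""
    return divmod(minutes, 60)

# Lower and upper bounds of the estimated run time quoted above.
for estimate in (191, 764):
    hours, rest = to_hours_minutes(estimate)
    print(f"{estimate} minutes = {hours} h {rest} min")
# 191 minutes = 3 h 11 min
# 764 minutes = 12 h 44 min
```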