* Issue with Lumped Source Splitter

Writing and using plugins for Version 5 and above.
Mark1834
Megastar
Posts: 1030
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Issue with Lumped Source Splitter

Post by Mark1834 » 24 Sep 2021 19:16

Odd. I don’t think witnesses make any difference, as it is the event that is witnessed, and the fact of being witnessed doesn’t have a separate source.

The plugin first examines each use of the lumped source to determine how many unique citations there are. Then it creates a new source for each one of these. Finally, it goes back to the sourced facts and transfers each citation to its new source. It sounds like this last step is not always happening. I’ll experiment, but if you notice anything different about the affected facts, that would help.
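The three steps above can be sketched in plain Lua. This is a minimal sketch, not the plugin's code: each citation is modelled as a hypothetical table with a fact label and a key string (assumed here to be whatever makes one citation distinct, such as its Where Within text), whereas the real plugin works with FH data pointers.

```lua
-- Sketch only: citations are plain tables {fact = ..., key = ...}.
-- "key" is a hypothetical stand-in for whatever distinguishes one citation
-- from another; the real plugin works with FH data pointers.
local function SplitLumpedSource(tblCitations)
    -- Step 1: examine each use of the lumped source, noting unique keys
    -- (citations without a key are left alone)
    local tblKeys, tblSeen = {}, {}
    for _, cit in ipairs(tblCitations) do
        if cit.key and not tblSeen[cit.key] then
            tblSeen[cit.key] = true
            tblKeys[#tblKeys + 1] = cit.key
        end
    end

    -- Step 2: create one new source for each unique citation
    local tblNewSources = {}
    for _, key in ipairs(tblKeys) do
        tblNewSources[key] = {title = key, facts = {}}
    end

    -- Step 3: go back to each sourced fact and transfer its citation
    for _, cit in ipairs(tblCitations) do
        local src = cit.key and tblNewSources[cit.key]
        if src then
            cit.source = src
            src.facts[#src.facts + 1] = cit.fact
        end
    end
    return tblNewSources, tblKeys
end

-- Made-up example data
local tblCitations = {
    {fact = "Death of A",   key = "Probate 1901 p.12"},
    {fact = "Probate of A", key = "Probate 1901 p.12"},
    {fact = "Probate of B", key = "Probate 1905 p.3"},
}
local tblNew, tblKeys = SplitLumpedSource(tblCitations)
print(#tblKeys)   -- 2 new sources
```

If step 3 is skipped for some facts, those facts keep citing the lumped source even though their new source exists, which matches the symptom described in the thread.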
Mark Draper

tatewise
Megastar
Posts: 22768
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise » 24 Sep 2021 19:26

Tip: Nicola, as long as you don't close FH, you can examine the before-Plugin and after-Plugin states quite easily.
Use Edit > Undo Plugin Updates and the database instantly reverts to the before-Plugin state.
Use Edit > Redo Plugin Updates and the database instantly returns to the after-Plugin state.
You can flip-flop back and forth as many times as you like looking for clues. I do it often when debugging plugins.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

NickiP
Famous
Posts: 188
Joined: 26 Feb 2013 12:36
Family Historian: V7
Location: UK

Post by NickiP » 25 Sep 2021 00:14

Thanks Mike, I'm well familiar with the "undo" function. :D

I chose not to undo the Probate Calendar entries but to link them manually, as they were already created and I expected the same thing would happen again if I re-ran it, but anything is possible. One weird thing I did notice was that the sources for some of the witness facts didn't show up against the linked witness; I had to link them manually. From memory it wasn't all of them, so it's a bit strange.

Even stranger, I've just run the plugin against another source record where a number of the citations had witness facts too. This time it ran correctly against them, although one anomaly was that it rejected a family record which had a citation. I'm not sure how that citation got created, because I don't create sources or citations for family records, only for the marriage events. Anyway, I deleted it and all is OK; the corresponding marriage event had a source record created from the same citation.

The only difference between the two sources was the type of witness: one had executors for probate and the other had marriage witnesses. I'm not sure whether that should make a difference, though.

Post by tatewise » 25 Sep 2021 10:17

You say "the sources for some of the witness facts didn't show up against the linked witness".
There are two scenarios related to shared Fact Witnesses and associated Source Citations.
You may already know this but I thought I would check.
  1. If the Source Citation is linked directly to the Principal Fact, then only that one Citation needs to be converted.
    It will appear not only against the Principal Fact (blue bullet) but also against every Witnessed Fact (blue arrow).
    Those Witnessed Fact Citations are not converted separately; they simply display the Principal Fact Citation.
    So I don't understand how the Principal Fact can have a converted Citation while the Witnessed Fact does not show it.
  2. If the shared Fact Witness has its own Source Citation, then it is shown only in the Witnesses for... window when that Fact Witness is selected. However, I think it unlikely that you have used such Citations.
Perhaps some screenshots showing the Citation and its omission would help.

Post by Mark1834 » 25 Sep 2021 10:23

Thanks Nicola - I can explain one of your issues, but not the other one.

If a fact has witnesses, that on its own does not matter. The witnesses are completely separate from the source, so are not affected by changing the source citation. If, however, the witnessing has its own source (which I must admit, I didn't realise until today was possible), that source is not seen by the plugin, so is not processed and remains cited to the original lumped source. I will fix that.

So far, I haven't been able to reproduce your unassigned new sources. I think I can see where in the plugin it is happening, but not why. If it sees a citation it doesn't understand, it aborts the linking and leaves the remaining records unlinked. It would be better to just abort that new source and continue with the remaining ones, but I missed modifying that step when I changed the linking process. I'll fix that as well, but we need to understand why it is getting upset.
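The "abort that new source and continue" behaviour can be sketched as follows. This is a hypothetical illustration in plain Lua, not the plugin's code: LinkSource and LinkAll are invented names, and a citation without a key stands in for one the plugin doesn't understand. Each source's citations are checked before any are changed, so a bad citation leaves only its own source unlinked.

```lua
-- Hypothetical sketch: a citation with no key stands in for one the
-- plugin "doesn't understand". Names are invented for illustration.
local function LinkSource(src)
    -- Check every citation first, so a bad one leaves this source untouched
    for _, cit in ipairs(src.citations) do
        if not cit.key then
            return false, "null key for " .. cit.fact
        end
    end
    for _, cit in ipairs(src.citations) do
        cit.linked = true
    end
    return true
end

-- Link every new source; a failure skips only that source
local function LinkAll(tblSources)
    local tblFailed = {}
    for _, src in ipairs(tblSources) do
        local ok, strError = LinkSource(src)
        if not ok then
            tblFailed[#tblFailed + 1] = src.title .. ": " .. strError
        end
    end
    return tblFailed
end

local tblSources = {
    {title = "S1", citations = {{fact = "Death of A", key = "p.12"}}},
    {title = "S2", citations = {{fact = "Probate of B"}}},   -- no key
    {title = "S3", citations = {{fact = "Death of C", key = "p.7"}}},
}
local tblFailed = LinkAll(tblSources)
print(#tblFailed, tblFailed[1])   -- 1   S2: null key for Probate of B
```

Here S3 is still linked even though S2 failed, and the failure list says which fact caused the problem.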

Could I ask you to make a small change to the plugin script please, which will help diagnose the record or fact that is causing the problem. In the plugin window, click on the More >> button in the bottom right corner to display extra options. Highlight version 1.1.2 of the plugin and select Edit.... Scroll down to line 445 and change it from

```lua
if not key then return end			-- do not process non-lumped citations
```
to

```lua
if not key then fhMessageBox('Null key detected for ' .. fhGetDisplayText(pFact)) return end			-- do not process non-lumped citations
```
Close the editor window and accept the option to save changes.

This will generate a message to indicate which fact is causing the problem, so hopefully we'll be able to determine what is different about it if it happens again.

Post by tatewise » 25 Sep 2021 11:23

Mark, I wonder how your plugin searches for Source Citations.
The usual technique to find a particular tag that may occur in various different positions is to use ptr:MoveNextSpecial() as illustrated in the Sample Plugin Scripts for Find Date Phrases and List All UDF Fields.
Then it does not matter what data items have a Source Citation the Plugin search loop will find every SOUR tag.
You don't need to know what item types may or may not have Citations as the exhaustive search will find them all.
e.g. Do you know that all the following tags may have a SOUR citation, rare though some may be:
NAME, FONE, PLAC, ROMN, OBJE, _SHAR, _SHAN & NOTE2 and of course INDI, FAM, NOTE & OBJE records, although any record that can contain NOTE2 tags may have SOUR tags.
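The idea behind an exhaustive search can be illustrated without FH. The sketch below is a plain-Lua stand-in, not the FH API: a record is a hypothetical nested table of tagged items, and a recursive walk visits every item, much as a ptr:MoveNextSpecial() loop does, so SOUR tags are found whatever their parent.

```lua
-- Hypothetical in-memory record: nested tables {tag=..., value=..., children={...}}
-- stand in for FH data items. This is not the FH API; it only mimics the
-- idea of visiting every item, as ptr:MoveNextSpecial() does.
local function FindAllTags(item, strTag, tblFound, strPath)
    tblFound = tblFound or {}
    strPath = strPath and (strPath .. "." .. item.tag) or item.tag
    if item.tag == strTag then
        tblFound[#tblFound + 1] = {path = strPath, value = item.value}
    end
    -- Recurse into every child, regardless of what kind of item it is
    for _, child in ipairs(item.children or {}) do
        FindAllTags(child, strTag, tblFound, strPath)
    end
    return tblFound
end

local recIndi = {tag = "INDI", children = {
    {tag = "NAME", value = "John /Tate/", children = {
        {tag = "SOUR", value = "@S1@"}}},
    {tag = "BIRT", children = {
        {tag = "PLAC", value = "Torbay", children = {
            {tag = "SOUR", value = "@S2@"}}},
        {tag = "SOUR", value = "@S1@"}}},
    {tag = "NOTE2", value = "local note", children = {
        {tag = "SOUR", value = "@S3@"}}},
}}

for _, hit in ipairs(FindAllTags(recIndi, "SOUR")) do
    print(hit.path, hit.value)
end
```

The walk finds the citations on the NAME, the PLAC and the local note as well as on the BIRT fact itself, without any list of which items may carry a citation.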

Post by Mark1834 » 25 Sep 2021 12:27

Good tip, thanks. I'd covered most of those by specific searches but using that method should speed it up as well as capturing the more obscure sources automatically. There are also some techniques I developed for assessing data equivalence in connection with FH/RM syncing that might be useful here. The more general I can make the script, the more resilient it should be to unusual data that didn't exist in the test datasets.

Post by tatewise » 25 Sep 2021 13:10

If you are interested, there are some efficiency tricks in my Where Used Record Links plugin, which has to solve a similar problem of searching for links to a subset of selected records. See roughly lines 600 to 650.
You will see where I've inserted progbar.Step() and collectgarbage("step",0) in the ptrItem:MoveNextSpecial() loop.
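As a rough illustration of where such a call sits, here is a plain-Lua loop that creates many short-lived tables, standing in for the temporary objects a long database search produces, and calls collectgarbage("step", 0) once per iteration to run one small incremental collection step. The loop body and the once-per-iteration frequency are assumptions for illustration; the actual plugin code may differ.

```lua
-- Illustrative only: each iteration creates a short-lived table, standing
-- in for the temporary objects a long database search produces.
local function LongSearch(intItems)
    local intMatches = 0
    for i = 1, intItems do
        local tblItem = {index = i}          -- becomes garbage next iteration
        if tblItem.index % 2 == 0 then
            intMatches = intMatches + 1
        end
        collectgarbage("step", 0)            -- one small incremental GC step
    end
    return intMatches
end

print(LongSearch(10000))   -- 5000
```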

The main trick is a shortcut to determine if the record link found by the search refers to one of the selected records.
Originally, it searched the list of selected records in the arrRecs table until it found a ptrLink match:

```lua
local ptrLink = fhGetValueAsLink(ptrItem)
for intRecord, ptrRecord in ipairs (arrRecs) do
	if ptrLink:IsSame(ptrRecord) then	-- Found a selected record in slow search loop
		-- do stuff with record --
		break
	end
end
```
Now it first builds a lookup table dicRecs using Record Id as the key derived from arrRecs:

```lua
local dicRecs = {}				-- Table of Record Id keys and arrRecs keys as values
for intRecord, ptrRecord in ipairs (arrRecs) do	-- Build a Record Id index to target records
	local intRecId = fhGetRecordId(ptrRecord)
	dicRecs[intRecId] = intRecord		-- This avoids searching the array of records
end
```
Then to determine if ptrLink matches a selected record it just indexes the dicRecs table:

```lua
local ptrLink = fhGetValueAsLink(ptrItem)
local intRecord = dicRecs[fhGetRecordId(ptrLink)]
if intRecord then				-- Found a selected record without a slow search loop
	local ptrRecord = arrRecs[intRecord]
	-- do stuff with record --
end
```
The speed of the plugin was originally governed by the number of selected record links to search for.
The greater the number of selected record links, the slower it got, especially in large Projects.
Now it runs dramatically faster and is independent of the number of selected record links.
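The same trick can be tried outside FH. In this minimal sketch, made-up integer Record Ids stand in for record pointers, so FindSlow and FindFast (hypothetical names) only compare numbers. The slow version scans the array for every link, so its cost grows with the number of selected records; the fast version builds the dicRecs index once and then needs a single table lookup per link.

```lua
-- Made-up Record Ids stand in for FH record pointers
local arrRecs = {101, 205, 333, 478}

-- Slow: scan the whole array for every link found
local function FindSlow(intLinkId)
    for intRecord, intRecId in ipairs(arrRecs) do
        if intRecId == intLinkId then
            return intRecord
        end
    end
    return nil
end

-- Fast: build the Record Id index once...
local dicRecs = {}
for intRecord, intRecId in ipairs(arrRecs) do
    dicRecs[intRecId] = intRecord
end

-- ...then each check is a single table lookup
local function FindFast(intLinkId)
    return dicRecs[intLinkId]
end

-- Both agree, including for a link to an unselected record (999)
for _, intLinkId in ipairs({205, 999, 478}) do
    assert(FindSlow(intLinkId) == FindFast(intLinkId))
end
print(FindFast(333))   -- 3
```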

Mark, I suspect you already do this, but to provide regression tests for my plugins I create a specific Project for each Plugin and populate it with test cases to ensure they continue to be handled correctly.
I also have some very large Projects to test run time performance, memory issues, progress bars, etc.

Post by NickiP » 25 Sep 2021 14:46

Thanks Mike, I wasn't aware you could have two scenarios for shared Fact Witnesses and associated Source Citations.

I think it's quite likely that's the difference between when it ran OK and when it failed. The second time, where there were Fact Witnesses, the marriages would have been edited first to include text for "Where Within", because the Citations to source record search returns those at the top of the list. However, for the Probate Calendar witnesses, I think it's quite likely I'd edited the source citation to include Where Within for a witness before I got to the principal Fact, and then had to add it again. The text would have been the same, but it was obviously a different citation from the one on the Fact Witness. I'm assuming that is what upset it, and it aborted after successfully linking to three new sources. The next one must have been one of those, and it didn't like something about it.

Mark I've added the extra code to the plugin as requested. I'm not sure though whether I'll have any more Fact Witnesses amongst the sources still to split but anything is possible. I wasn't expecting the ones I found last night that completed successfully. If I do find one I may have a play with how I add the Where Within text and see what happens.

Post by Mark1834 » 25 Sep 2021 15:41

No problem - the change should catch other similar issues as well, not just witnesses.

Out of curiosity, I ran my 50k record test project through my original Python splitter to see how it compared. The logic there is similar in principle to MoveNextSpecial, as it just searches for matching sources without being too bothered about what they are attached to. A run that grinds to a halt in FH7 without active garbage collection takes about 2-3 minutes in FH7 with collection (or in FH6 without it), and about 20-25 seconds in the Python version. Hopefully, with some further optimisation, I can reduce plugin completion time a bit more for larger projects (and possibly even simplify the script in the process).

Post by tatewise » 25 Sep 2021 16:34

Mark, I have taken a quick look at your plugin.

Firstly, it only reviews INDI, FAM & OBJE records, whereas many more record types may contain SOUR citations.
Any record type that may contain a local Note (NOTE2) may also have SOUR citations.
That includes Source, Repository, Submitter, Place and Research Note records.

Secondly, the CountCitations() function is called once for each selected Source in tblS.
Thus, all the INDI, FAM & OBJE records (and maybe more) need to be searched again and again for each Source.
That is not a significant problem for a small Project with only a few selected Source records.
But I ran it on a Project with about 25,000 Individual & Family records and selected all 3,000 Source records.
It took about 10 seconds to count the Citations for each of the Source records. It would have taken hours for all 3,000.
That can only get worse if more record types are searched and ptr:MoveNextSpecial() is used.
Whereas, the current Where Used Record Links finds all the Citations for all 3,000 Source records in about a minute.
The earlier version without the shortcut trick would have taken hours.

So, perhaps your plugin can find all the Citations for each of the multiple Sources in a preliminary pass of the database saving the results in a table, which would be quick, and then handle their conversion one Source at a time.
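A minimal sketch of that suggestion in plain Lua, assuming the exhaustive pass has already produced a flat list of citations, each tagged with the id of the source it cites (the data and the IndexCitations name are hypothetical). The citations are grouped by source in a single pass, and each selected source is then handled with one table lookup instead of a fresh search of the database.

```lua
-- Hypothetical flat list of every citation found in one exhaustive pass
-- of the database; in FH each entry would hold data pointers instead.
local tblAllCitations = {
    {source = 1, fact = "Death of A"},
    {source = 2, fact = "Marriage of B & C"},
    {source = 1, fact = "Probate of A"},
    {source = 3, fact = "Census of D"},
}

-- One pass: group citations by source id
local function IndexCitations(tblCitations)
    local dicBySource = {}
    for _, cit in ipairs(tblCitations) do
        local list = dicBySource[cit.source]
        if not list then
            list = {}
            dicBySource[cit.source] = list
        end
        list[#list + 1] = cit
    end
    return dicBySource
end

local dicBySource = IndexCitations(tblAllCitations)

-- Then handle each selected source without searching the database again
local tblSelected = {1, 3}
for _, intSource in ipairs(tblSelected) do
    local list = dicBySource[intSource] or {}
    print(intSource, #list)
end
```

The cost of the search is then paid once, however many Sources are selected.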

Post by Mark1834 » 25 Sep 2021 16:49

That's exactly what we've been discussing today. The existing logic is fine for selective splitting in simple databases up to a few thousand records, but needs updating for more general coverage and large datasets.

Post by Mark1834 » 28 Sep 2021 09:27

Changes in this latest update:
  • All source citations are processed, no matter how obscure. Earlier versions searched only the commonly used source items, so did not process things like distinct sources for witnessing.
  • The initial citation search is stored for reuse, so subsequent splits are much faster. Probably not noticeable for projects up to a few thousand records, but significant for larger ones.
  • The progress bar has been revised to reduce its drain on the system (progress bars generally slow down the process they are monitoring as they divert resource, so the ideal compromise is to update the bar only as frequently as is needed for a useful output).
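The progress bar point can be illustrated with a plain-Lua sketch. ShowProgress is a hypothetical stand-in for a real progress bar that just counts redraws; the loop redraws only when the whole-number percentage changes, which is one common throttling rule and not necessarily the one version 1.1.3 uses.

```lua
-- Hypothetical stand-in for a progress bar: it only counts redraws
local intRedraws = 0
local function ShowProgress(intPercent)
    intRedraws = intRedraws + 1
end

-- Process intItems items, redrawing only when the whole-number
-- percentage changes rather than on every item
local function ProcessItems(intItems)
    local intLastPercent = -1
    for i = 1, intItems do
        -- (per-item work would go here)
        local intPercent = math.floor(i * 100 / intItems)
        if intPercent ~= intLastPercent then
            ShowProgress(intPercent)
            intLastPercent = intPercent
        end
    end
end

ProcessItems(50000)
print(intRedraws)   -- 101 (0% to 100%) rather than 50000
```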
Attachment: Lumped Source Splitter (1.1.3).fh_lua (20.63 KiB)

Post by NickiP » 29 Sep 2021 17:22

Thanks Mark, that's much quicker at checking records.

Post by Mark1834 » 20 Oct 2021 10:29

The update is now in the Plugin Store as version 1.2.