* Issue with Lumped Source Splitter

NickiP
Famous
Posts: 188
Joined: 26 Feb 2013 12:36
Family Historian: V7
Location: UK

Post by NickiP » 17 Sep 2021 16:37

I'm running Mark's Lumped Source Splitter against some of my early source records as I've moved census entries from citations to full sources and need to convert all the early ones I created.

It's been very helpful so far. However, I have encountered an issue when trying to close the plugin using the X in the top right-hand corner once it has completed. On three occasions now it has just started running again and FH goes "Not Responding" (which, to be fair, it does when it's running normally). The first time it did stop after a short while, but on two occasions it has just gone back to running, although I don't know against what, as the original source record should now be deleted, and FH stays "Not Responding" indefinitely. Obviously if I end the FH task, I lose the changes because it hasn't saved them yet, and I have to run it again. :?

I'm not sure why it's doing this, but it's getting rather annoying. Just thought I'd mention it, as I will continue trying to get the lumped sources split, but I don't know if it's known behaviour or something else.

Nicola

tatewise
Megastar
Posts: 22768
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise » 17 Sep 2021 17:40

Mark will no doubt respond in due course.
However, he may find it useful if you could make a copy of your project now as a test case for investigating the problem.
i.e. In the Project Window use More Tasks... > Copy Project and give it a memorable name.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry

Mark1834
Megastar
Posts: 1030
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 » 17 Sep 2021 18:45

I’ve not seen that before, but happy to investigate.

A bit more detail would be useful please. Are you splitting generic or templated sources? One source at a time or in batches? Roughly how many individuals are there in your project? Typically, how many citations does the split source have (few, dozens, hundreds, thousands?). If you are seeing “not responding” during a normal run, it suggests the plugin may be running a bit slowly, as splitting is virtually instantaneous for me.
Mark Draper

Post by NickiP » 18 Sep 2021 00:53

Hi Mark

Sorry, I should have included more details. I'm splitting generic sources, one source at a time. I've just under 68k individuals in the project, but the two sources that caused me issues had 74 citations (split into 39 sources) and 81 citations (split into 38 sources) respectively. Even when there have only been 8 citations to split (into 3 sources, from memory) it has taken a fair time to run, though not as long as the above. That said, it has split sources with 100-120 citations without any issues. However, each time I've run it, it has gone "Not Responding" and then recovered, apart from the two occasions when I clicked to close the plugin and it reverted to running again and went "Not Responding". I had to end the FH task as it didn't recover, and I lost the split sources. I have since managed to get it to split one of the two without issue but haven't tried the other one again yet.

In case it helps, it's a slightly elderly 8.5-year-old HP laptop, but it does have an SSD (not one of the newer, faster types whose name slips my mind), an i3 2.20 GHz (4 cores) and 8 GB RAM, so it's not that low-spec. It could of course just be my laptop's age causing the issue. :(

Post by Mark1834 » 18 Sep 2021 07:41

Thanks Nicola. I've a couple of mods in mind that might help, so I'll post an updated version here in a day or two.

Post by NickiP » 18 Sep 2021 08:35

Many thanks Mark. I've got sufficient data cleansing to do for now anyway. :D

Post by Mark1834 » 19 Sep 2021 09:56

Nicola,

I can't see anything obvious in the original script that could cause the issue you have seen, so we'll take it one step at a time. This new version has a new main menu with a dedicated "Cancel" button. When you close this, does it stay closed?

More user feedback would be useful while processing larger projects, so I will add that later.
Attachments
Lumped Source Splitter (1.1.1).fh_lua
(18.43 KiB)

Post by NickiP » 19 Sep 2021 12:04

Thanks Mark, I'll give it a try.

Post by NickiP » 19 Sep 2021 22:33

Hi Mark

I've run this against a few sources (I had to add "where within" citation text to a few, as they had literally just been linked to the source) and so far so good, although I've had one repeat of the issue.

The first three times I ran it today it worked as expected: the Cancel button closed the plugin after completion and it didn't start "running" again as it has before. There was no repeat of the problem with the 1901 Census source when I ran it against that source record again; in fact it strangely took only about a minute to report how many sources it wanted to create and another 3 minutes to complete (there were originally 81 sources and it reduced them to 51). I've never seen it complete that quickly! Usually it takes 5 or so minutes to report and about 20 to complete, irrespective of how many citations are involved. Unfortunately, with the fourth source record it started running again after I clicked Cancel when it completed. After a little while I ended the FH task, re-ran it, and it worked OK, so it's a little strange. Since then I've been closing FH before running the plugin against each individual source record, just in case it's a resource issue, though I wouldn't have expected that. In any case, Task Manager shows a maximum of only 20% CPU usage the whole time the plugin is running.

As for the "not responding" issue, to be honest it seems to do this sometimes when I just switch windows on the laptop, but Task Manager shows the CPU usage updating regularly, so I think it's just the display not responding while processing continues, as it did finish. Possibly down to the age of my laptop.

I've since run it against a couple more sources. I had issues with one, but I think that was more down to me leaving the laptop running while getting some tea, and it went to sleep. I left the process running when it woke up, but after a while I ended the task and ran it again, successfully, taking the usual 20 or so minutes to complete.

I've some more to run it against with varying numbers of citations. However, some updating is required on them as they don't currently have any "where within" citation text which is obviously needed to use the plugin.

The "publication info" field duplicates part of what is copied into the title, so for me personally I just have to edit each new source record and remove it. I don't know whether there is a way to include an option to delete this from each source once created (I suspect possibly not), but it's not that time-consuming to do.

That said the plugin has been very useful in tidying up my sources which are still in a bit of a mess.

Thanks

Nicola

Post by tatewise » 20 Sep 2021 10:04

When the plugin is running for several minutes is there any display output indicating progress?
If not, and the plugin is just busy working on converting data, then that can cause the 'not responding' status.
If the plugin does not interact via FH with Windows in any way, the OS thinks FH has gone dormant and 'not responding'.
The 'solution' is to provide some occasional progress display like the Progress Bar or use the new Plugin API function fhExhibitResponsiveness(). Some of my plugins use other techniques.
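A minimal sketch of the "occasional progress display" idea, written in plain Lua so it runs outside FH. `processRecords` and its `onProgress` callback are hypothetical names; in a real plugin the callback is where a progress bar update (or a call such as fhExhibitResponsiveness()) would go. Reporting only every `every` records keeps the display overhead small on a large project.

```lua
-- Hypothetical helper: process a list of records, reporting progress
-- every `every` items so the host application sees regular UI activity.
local function processRecords(records, work, onProgress, every)
  every = every or 100
  local total = #records
  for i, rec in ipairs(records) do
    work(rec)
    -- Report on every `every`-th record, and always on the last one.
    if i % every == 0 or i == total then
      onProgress(i, total)
    end
  end
end

-- Example usage with dummy data: 250 records, report every 100.
local records = {}
for i = 1, 250 do records[i] = { id = i } end

local reports = {}
processRecords(records,
  function(rec) rec.done = true end,
  function(done, total) reports[#reports + 1] = done .. "/" .. total end,
  100)

print(table.concat(reports, ", "))  -- 100/250, 200/250, 250/250
```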

Post by NickiP » 20 Sep 2021 17:17

There's no progress bar on the plugin. I tend to run it from the Plugin window because at least then you get the status "running" showing until it prompts for interaction and/or completes. If you "pin" it to the tools menu, you get nothing at all to show its running until it prompts.

Post by Mark1834 » 20 Sep 2021 18:55

I hadn’t appreciated quite how big the performance hit would be in a large project. In my test project (ca. 3k individuals, several hundred citations to process per source), the initial count is virtually instantaneous, and processing takes maybe two seconds!

If this experience is typical, it clearly needs a progress bar, so I’ll play a bit with very large projects this week and see what works best. May be a few days before I post back here though.

Post by Mark1834 » 21 Sep 2021 19:07

Now that I have added a Progress Bar, I can see what the problem is. I created a 50k individual test database by multiple clones of an unrelated individual with lots of sources. In FH6, all works as intended, and processing runs smoothly and completes in about 2-3 mins even on my old very basic laptop.

However, if I load the same project into FH7 on my desktop PC, it runs much more slowly. It often pauses for several seconds (sometimes much longer), and even after it exits the status remains “running” for several seconds.

I’ve played a little with garbage collection parameters, but haven’t yet found settings that enable the plugin to run smoothly to completion. I’ll raise a ticket with CP, as it’s not ideal for plugin authors to muddle through by trial and error. They’re the ones who know the internal plumbing, so need to get FH7 working properly, or at the very least provide “best practice” advice.

ColeValleyGirl
Megastar
Posts: 3238
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Post by ColeValleyGirl » 21 Sep 2021 19:10

Have you thought of asking on the Lua mailing list?

Post by Mark1834 » 21 Sep 2021 19:15

No - it’s not my job to fix FH7.

Post by ColeValleyGirl » 21 Sep 2021 20:15

I asked because it sounds like a difference between Lua 5.1 and 5.3… not an FH issue. Assuming you're using the same API calls.
Last edited by tatewise on 21 Sep 2021 20:19, edited 1 time in total.
Reason: Spelling corrections

Post by tatewise » 21 Sep 2021 20:17

There is already an open CP ticket 'Plugin issues with large Projects' #381229 dated June'21 that I've raised on this problem.
It seems to be related to garbage memory management in Lua that is much more sluggish in v5.3 than v5.1 was.

See the FHUG Plugin Discussions postings:
Export Gedcom / FH7 problem? (18417)
Export GEDCOM Plug - Problems with File Conversion (19092)
exporting to GED error plugin 4.4 (19316)

I have given Martin at CP some tests to run on large projects using two versions of my Show Project Statistics plugin.
One has no fix for garbage memory management and the other version does fix the problem.
However, he has not found time to complete those test runs.
Since no solution is on the horizon, I have started adding the fix to many of my plugins.

1) Simple Fix
In the iteration loop typically once per record, I have now been inserting the following statement:
collectgarbage("step",0)
That lets the plugin progress smoothly and faster, usually without any (Not Responding) status.

The recommended alternative of using the fhSleep() function in the same position does not help.
fhSleep(0,0) does not improve the symptoms at all.
fhSleep(1,1) dramatically increases the run time of the plugin.
Also, the new function fhExhibitResponsiveness() does not seem to help.
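A minimal sketch of where the simple fix sits, assuming a plain Lua 5.3 loop as a stand-in for iterating FH records (`records` and `buildSummary` are hypothetical placeholders, not FH API). The only difference from an ordinary loop is the single collectgarbage("step", 0) call per record, which asks the incremental collector to do one basic step of work each time round instead of letting garbage build up.

```lua
-- Hypothetical stand-in for per-record work that creates short-lived garbage
local function buildSummary(rec)
  local parts = {}
  for k = 1, 50 do
    parts[k] = rec.name .. ":" .. k   -- new strings each time = garbage
  end
  return #parts
end

-- Dummy "project" of 2000 records
local records = {}
for i = 1, 2000 do records[i] = { name = "Record" .. i } end

local total = 0
for _, rec in ipairs(records) do
  total = total + buildSummary(rec)
  -- The "simple fix": one small incremental GC step per record
  collectgarbage("step", 0)
end

print(total)  -- 100000
```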

2) Aggressive Fix
At the start of the plugin before other statements are executed, I have now been inserting these two statements:
collectgarbage("setpause",50)
collectgarbage("setstepmul",300)

They perform a more aggressive garbage memory management than the default but without any run time increase.
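For reference, here are the same two calls with comments on what each setting means for Lua 5.3's incremental collector, based on the Lua 5.3 reference manual. Each call returns the previous value; saving and restoring them at the end is my own optional addition, not part of the fix described above.

```lua
-- Place at the very top of the plugin, before any other work.

-- setpause: how long the collector waits before starting a new cycle.
-- Default 200 = wait until memory in use doubles; values below 100 mean
-- it does not wait at all, so 50 starts each new cycle straight away.
local oldPause = collectgarbage("setpause", 50)

-- setstepmul: how fast the collector works relative to memory allocation.
-- Default 200 = "twice" the allocation speed; 300 makes it more aggressive.
local oldStepMul = collectgarbage("setstepmul", 300)

-- ... main plugin body would go here ...

-- Optional (not part of the original fix): restore the previous settings.
collectgarbage("setpause", oldPause)
collectgarbage("setstepmul", oldStepMul)
```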

Post by ColeValleyGirl » 21 Sep 2021 20:21

I believe the Lua garbage collection approach changed between 5.1 and 5.3, but don't quote me on it.

https://stackoverflow.com/q/58116647

Ron Melby
Megastar
Posts: 797
Joined: 15 Nov 2016 15:40
Family Historian: V6.2

Post by Ron Melby » 21 Sep 2021 20:29

local __g = collectgarbage('count')  -- memory in use, in KB
if __g >= 622380 then                -- roughly 0.6 GB
  collectgarbage('collect')          -- force a full collection
end

I do the garbage collection for my table writing and table saving like this.

The idea is to keep it at around 3/4 to 5/8 of a gig of total run space; that's the best time I can get after weeks of testing.

I can place this strategically, but for general use it would need some kind of wrapper count or similar, so that it is not run every cycle. 'Automated' garbage collection is terrible on any version of Lua.
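One way to add the "wrapper count" Ron mentions, so the memory check is not run on every cycle. This is a sketch: `makeGcGuard` is a hypothetical helper, the 500-iteration interval is an arbitrary example, and 622380 KB (about 0.6 GB) is simply the figure from Ron's snippet, which he tuned on FH v6 / Lua 5.1.

```lua
-- Hypothetical wrapper: returns a function to call once per loop iteration.
-- It only looks at memory every `interval` calls, and forces a full
-- collection when usage (in KB) has reached `thresholdKB`.
local function makeGcGuard(thresholdKB, interval)
  local calls, collections = 0, 0
  return function()
    calls = calls + 1
    if calls % interval ~= 0 then
      return collections            -- cheap path: no memory check this time
    end
    if collectgarbage("count") >= thresholdKB then
      collectgarbage("collect")     -- full collection, as in Ron's snippet
      collections = collections + 1
    end
    return collections              -- number of forced collections so far
  end
end

-- Usage: check every 500 iterations against Ron's ~0.6 GB threshold.
local gcGuard = makeGcGuard(622380, 500)
for i = 1, 10000 do
  local _ = { i, tostring(i) }      -- some per-iteration garbage
  gcGuard()
end
```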
FH V.6.2.7 Win 10 64 bit

Post by tatewise » 21 Sep 2021 20:50

Thank you Ron, but here we are discussing FH v7 Lua 5.3 problems and you don't have v7 so your solutions may not help.

Post by Ron Melby » 21 Sep 2021 23:06

Lua 5.3 has a limit on program size as well. In 5.1 it looks to be about 1.24 GB; what is it in 5.3? You cannot malloc in 5.3, so you have fixed memory there as well. I have spent months on that tipping point in 5.1, and while it may be up to double that, it's no better: I hear of the same GC and memory problems in 5.3 AND 5.4 from people I can barely understand because their maths, programming skills and C++ expertise are much better than mine. So there is an optimal number for 5.3, and at some point, after some of the obvious bugs are worked out, I will get v7, experiment with it and find that number.

Right now, I am skating close to memory failure on a program that is essentially a stripped-down version of that _G table deconstructor you helped me make. It runs like blazes on something like your loc file, which is probably very real-world, but on my huge file it has been running for about a week now. Deconstructing that single table into several tables to display with fhOutputResultSetColumn (or whatever the display function is) has been running for two days so far; there is no printing or debug output, but I am sure that is where it is. It raises the problem that we still do not have object SIZE in the language, to judge whether a program should be run at all. Back to the GC, though: I doubt the number can be doubled; after all, the number is in kilobytes and still does not exceed 5 in length, in 5.3 or 5.4.
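On the point about there being no object SIZE in the language: standard Lua indeed has no built-in way to ask how much memory a table occupies. A rough workaround is to compare collectgarbage("count") before and after building a structure, forcing a full collection each time so that leftover garbage does not distort the reading. `approxSizeKB` is a hypothetical helper and the figure it returns is only an estimate.

```lua
-- Hypothetical helper: estimate the memory (in KB) retained by whatever
-- `build` returns, using the difference in collectgarbage("count").
local function approxSizeKB(build)
  collectgarbage("collect")              -- clear existing garbage first
  local before = collectgarbage("count")
  local obj = build()                    -- local reference keeps it alive
  collectgarbage("collect")              -- drop temporaries created by build
  local after = collectgarbage("count")
  return after - before, obj
end

-- Usage: roughly how big is a table of 100,000 short strings?
local kb = approxSizeKB(function()
  local t = {}
  for i = 1, 100000 do t[i] = "name" .. i end
  return t
end)
print(string.format("~%.0f KB", kb))
```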

Post by Mark1834 » 22 Sep 2021 16:42

Thanks folks. The "simple fix" solution seems to provide the more reliable option. With this, the plugin always completes, and is usually (but not always) smooth. There is still the occasional holdup, but it clears itself relatively quickly.

I've got some ideas about making the searching more efficient, so I'll implement both these changes before posting a new version.

Post by Mark1834 » 23 Sep 2021 20:12

Updated version attached, with 3 changes:
  • A simple progress bar, so it is clear whether the plugin is still working or ground to a halt.
  • Forced garbage collection, which definitely helps in keeping things a bit smoother.
  • Restructured search logic, so it remembers the results of counting citations to focus the linking on only relevant records. This makes the second step much faster - probably not noticeable with fewer than a couple of thousand records, but much more significant for a large project.
Is this version more reliable?
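A rough sketch of the two-pass idea in the third bullet, using made-up data structures in plain Lua. It does not reproduce the real plugin or the FH record API; `countCitations`, `linkSources` and the record layout are hypothetical. Pass 1 counts citations per "where within" value and remembers which records held them, so pass 2 only revisits those records instead of rescanning the whole project.

```lua
-- Hypothetical record layout (not FH's real data model):
--   { id = n, citations = { { source = "S1", page = "A" }, ... } }

-- Pass 1: count citations of `sourceId` per "where within" value and
-- remember which records contain at least one such citation.
local function countCitations(records, sourceId)
  local counts, relevant = {}, {}
  for _, rec in ipairs(records) do
    local hit = false
    for _, cit in ipairs(rec.citations) do
      if cit.source == sourceId then
        counts[cit.page] = (counts[cit.page] or 0) + 1
        hit = true
      end
    end
    if hit then relevant[#relevant + 1] = rec end
  end
  return counts, relevant
end

-- Pass 2: relink citations to the new per-page sources, visiting only the
-- records remembered in pass 1 rather than rescanning the whole project.
local function linkSources(relevant, sourceId, newSourceFor)
  local relinked = 0
  for _, rec in ipairs(relevant) do
    for _, cit in ipairs(rec.citations) do
      local target = newSourceFor[cit.page]
      if cit.source == sourceId and target then
        cit.source = target
        relinked = relinked + 1
      end
    end
  end
  return relinked
end

-- Usage with toy data: three records, two of which cite source "S1".
local records = {
  { id = 1, citations = { { source = "S1", page = "A" } } },
  { id = 2, citations = { { source = "S2", page = "A" } } },
  { id = 3, citations = { { source = "S1", page = "A" },
                          { source = "S1", page = "B" } } },
}
local counts, relevant = countCitations(records, "S1")
local n = linkSources(relevant, "S1", { A = "S1a", B = "S1b" })
print(counts.A, counts.B, #relevant, n)  -- 2 1 2 3 (tab-separated)
```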
Attachments
Lumped Source Splitter (1.1.2).fh_lua
(20.45 KiB)

Post by NickiP » 24 Sep 2021 13:33

Wow, thanks Mark, that has made a difference. I've not got many to run it against at the moment as I'm still trying to tidy up the sources, but I did have a 2-citation source. It searched through 91k+ records and then completed within a couple of minutes. Normally, even with very few citations, it would take about 20 minutes, so that does seem to have improved things. I'll run it against others with varying numbers of citations, but it may be a number of days before I can do this.

Anyway, very many thanks for this; it is much appreciated.

Nicola

Edit - I have actually run it now against a few more, the largest having 30 citations, and had none of the previous issues; it has completed within a few minutes each time. :D

Post by NickiP » 24 Sep 2021 16:31

Hmm, just had a weird result. I had 67 citations for the National Probate Calendar; I think it was 33 unique when it did the initial record check. It has created all the sources but only assigned 3 sources to the correct events. There are 61 citations remaining against the original source record and 28 source records not assigned to anything. I can't see why. I know a few have witness events, which may be the issue, but I didn't think they all did. It is, I suppose, possible. I will have to go through manually and change the sources to see.

I don't tend to use witness events, only did for a short while when first started using FH. I am in the process of removing them and creating individual events for the witnesses instead.
