* Audit RM Direct Import plugin

Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Audit RM Direct Import plugin

Post by Mark1834 »

There has been a cluster of RM direct import flaws identified recently -
  • Corruption of dual dates
  • Inconsistent and superfluous sort dates
  • Dropped media from shared citations
  • Inconsistent formatting of RootsMagic Import Fact Set
While I have every confidence that CP will correct these in time, they are still a potential irritant for users who have imported their RM database already and started working on it in FH.

I think we need an Audit RM Direct Import plugin that collects together the various fairly simple plugin solutions to identify and fix these problems in active projects. It could be kept up to date as new issues are identified, and linked from the current KB page on RM import.

It will be a few days before I have a draft (desk time is severely limited on long sunny June days ;)), but watch this space...
Mark Draper
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

I have a rough draft plugin, so I’ll share the methodology before posting in case somebody spots a potential problem.

Split date handling is just a copy of the earlier dedicated plugin, which I will delete when this one is live. I’ve noted an additional problem in testing, whereby incomplete split dates (e.g. “about Jan 1740/41”) are discarded completely on direct import.
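
As a rough illustration of what recognising a split (dual) date involves, here is a minimal sketch in Python rather than the plugin's Lua. The function name `split_dual_year` and the regular expression are assumptions made for this example, not code from the plugin. It only looks for the "1740/41" pattern and checks that the two-digit suffix is the following year; qualifiers such as "about" and partial dates are simply ignored.

```python
import re

# Four-digit year, a slash, then a two-digit suffix (e.g. "1740/41").
DUAL_YEAR = re.compile(r"\b(\d{4})/(\d{2})\b")

def split_dual_year(text):
    """Return (first_year, second_year) for a dual-year date, else None.

    Hypothetical helper: it accepts only suffixes that are exactly the
    year after the first one, so "1740/45" is rejected.
    """
    match = DUAL_YEAR.search(text)
    if match is None:
        return None
    first = int(match.group(1))
    second = first + 1
    # The suffix must equal the last two digits of the following year,
    # which also copes with a century roll-over such as "1799/00".
    if second % 100 != int(match.group(2)):
        return None
    return first, second

print(split_dual_year("about Jan 1740/41"))
```

A check like this picks up incomplete dates such as "about Jan 1740/41" as well as full ones, because it does not depend on a day or month being present.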

Invalid sort dates as a result of FH not deleting the default sort date from undated events are deleted (identified as sort dates where DatePt1 does not contain a valid year).

Superfluous sort dates matching the first date of a range are deleted (identified where event DatePt2 is not null and event DatePt1 and sort date DatePt1 return the same date).
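
The two sort date rules above can be sketched as follows. This is Python rather than the plugin's Lua, and the `DatePoint` and `Event` classes are hypothetical stand-ins for the FH date objects; only the two tests themselves (no valid year in the sort date's DatePt1; a range whose DatePt1 equals the sort date's DatePt1) come from the description.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DatePoint:
    """Hypothetical stand-in for one date point (DatePt1 or DatePt2)."""
    year: Optional[int] = None
    month: Optional[int] = None
    day: Optional[int] = None

@dataclass(frozen=True)
class Event:
    date_pt1: Optional[DatePoint]  # event date, first point (None if undated)
    date_pt2: Optional[DatePoint]  # second point, None unless the date is a range
    sort_pt1: Optional[DatePoint]  # sort date, first point (None if no sort date)

def classify_sort_date(event):
    """Return 'none', 'invalid', 'superfluous' or 'keep' for an event."""
    if event.sort_pt1 is None:
        return "none"
    # Invalid: the sort date has no usable year (assumed here to mean None).
    if event.sort_pt1.year is None:
        return "invalid"
    # Superfluous: a range whose start is identical to the sort date.
    if event.date_pt2 is not None and event.date_pt1 == event.sort_pt1:
        return "superfluous"
    return "keep"

ranged = Event(DatePoint(1900), DatePoint(1910), DatePoint(1900))
print(classify_sort_date(ranged))
```

Only the "invalid" and "superfluous" cases would be candidates for deletion; a sort date that differs from the start of the range is left alone.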

FH does not have a mechanism for tracking duplicate lumped citations (a long-term grump of mine), so I determine citations as equivalent if a concatenation of the source record number, citation text from source and all citation field name and contents evaluates as a common string. A first pass through the database determines which media files are linked to any occurrence of that citation, then a second pass fills the gaps such that all instances are linked to all media.
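
A minimal sketch of that two-pass approach, in Python rather than the plugin's Lua. The dictionary layout (`source_id`, `text_from_source`, `fields`, `media`) and the function names are assumptions for illustration only, and sorting the fields before joining them is my own addition to make the key independent of field order.

```python
from collections import defaultdict

def citation_key(citation):
    """Build one string that identifies equivalent citations."""
    parts = [str(citation["source_id"]), citation["text_from_source"]]
    # Sorted so that field order does not affect the key (an assumption).
    parts += [f"{name}={value}" for name, value in sorted(citation["fields"].items())]
    return "\x1f".join(parts)  # unit separator avoids accidental collisions

def fill_missing_media(citations):
    """Link every citation to all media seen on any equivalent citation.

    Returns the number of media links added.
    """
    # Pass 1: collect the media linked to any occurrence of each key.
    media_by_key = defaultdict(set)
    for citation in citations:
        media_by_key[citation_key(citation)].update(citation["media"])
    # Pass 2: fill the gaps.
    added = 0
    for citation in citations:
        missing = media_by_key[citation_key(citation)] - set(citation["media"])
        citation["media"] = list(citation["media"]) + sorted(missing)
        added += len(missing)
    return added

citations = [
    {"source_id": 12, "text_from_source": "p. 4", "fields": {"Page": "4"}, "media": ["a.jpg"]},
    {"source_id": 12, "text_from_source": "p. 4", "fields": {"Page": "4"}, "media": []},
]
print(fill_missing_media(citations), citations[1]["media"])
```

Note that the key is only as good as the text that distinguishes one citation from another: two genuinely different citations with identical fields would be treated as the same citation.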

I extract the [.index] section from its arbitrary position in the Fact Set file and reconstruct a new index at the top of the file omitting any undefined Facts (such as the RM Alternate Name).
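
The index rebuild could look something like the sketch below, again in Python rather than the plugin's Lua. The file layout is an assumption made purely for illustration: INI-style `[section]` headers, one section per Fact, and `n=FactName` entries inside `[.index]`. The real Fact Set file may differ in detail, so treat this as the shape of the idea rather than a working fix.

```python
def rebuild_index(lines):
    """Move [.index] ahead of all other sections, dropping undefined Facts.

    Assumes (hypothetically) INI-style sections and 'n=FactName' index entries.
    """
    preamble = []   # any lines before the first section header
    sections = []   # list of (name, body_lines) in file order
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            sections.append((stripped[1:-1], []))
        elif sections:
            sections[-1][1].append(line)
        else:
            preamble.append(line)

    defined = {name for name, _ in sections if name != ".index"}
    old_entries = [
        entry.split("=", 1)[1].strip()
        for name, body in sections if name == ".index"
        for entry in body if "=" in entry
    ]
    # Keep only index entries that have a matching Fact section.
    kept = [name for name in old_entries if name in defined]

    out = preamble + ["[.index]"]
    out += [f"{i}={name}" for i, name in enumerate(kept, start=1)]
    for name, body in sections:
        if name != ".index":
            out.append(f"[{name}]")
            out.extend(body)
    return out

sample = ["[FACT-A]", "Label=A", "[.index]", "1=FACT-A", "2=FACT-MISSING"]
print(rebuild_index(sample))
```

Renumbering the surviving entries from 1 is a design choice in this sketch; whether the real file needs that is not something the thread states.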

Any potential traps?
Mark Draper
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

Mark1834 wrote: 25 Jun 2022 11:04
Superfluous sort dates matching the first date of a range are deleted (identified where event DatePt2 is not null and event DatePt1 and sort date DatePt1 return the same date).
If I read this correctly, you plan to delete anything that is not deemed a valid date?

This is not a good thing. There are many of us who use sort 'dates' that are not really dates. For example, my Census sort dates are 4-digit numbers based on the census year, such as:

3850 -> 1850 Federal
3860 -> 1860 Federal
3870 -> 1870 Federal
3880 -> 1880 Federal
3885 -> 1885 State Census
etc....

There are a number of people who use such schemes to group facts together rather than having them scattered throughout the other facts.
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Thanks, that's just the type of challenge I'm after.

Could you describe how that technique is used in practice, please? RM allows free-form sort dates, but FH does not. You can only enter simple dates, so it does not look like a sustainable strategy for FH, as you cannot add any new data consistent with the scheme, and I'm not sure how FH would handle sorting on invalid dates such as these.
Mark Draper
tatewise
Megastar
Posts: 28333
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Audit RM Direct Import plugin

Post by tatewise »

There appear to be two misconceptions there!

Firstly, the quoted "Superfluous sort dates..." rule only applies where Range Dates are used, and I doubt if any Census event uses Range Dates.

Secondly, there is absolutely no problem with a Sort Date of the year 3850 or 3860. They are not "free-form dates".
FH is perfectly happy with those (I've tried them) and it won't be considered as a superfluous Sort Date by the plugin.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

My RM7 sort dates appear to have come into FH fine. When I open the Date Entry Assistant, it has placed my sort date into the 'Separate Sort Date' year field. When I entered the 1950 census in the example, I was able to enter 3950 as the sort date in the year field. It works fine with regard to ordering. I would post an image, but I haven't figured that bit out yet.
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

tatewise wrote: 25 Jun 2022 19:15
Firstly, the quoted "Superfluous sort dates..." only apply where Range Dates are used and I doubt if any Census event uses Range Dates.
I apparently missed the 'Date Ranges' part of things.
tatewise
Megastar
Posts: 28333
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Audit RM Direct Import plugin

Post by tatewise »

To post screenshots see FHUG Knowledge Base Forum Usage Tips under Attachments and Taking Screenshots.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Clearly a case of brain f**t here! RM sort dates can't be free form, as they are stored as a number in the database :?

There are a couple more wrinkles around sort dates that I have checked. RM has a useful facility to sort events on the same day into a defined order by adding a dash and a number after the sort date. FH has no mechanism to support these, so the suffixes are silently discarded. RM also allows incomplete sort dates without the year (e.g. "6 Jun", even though it probably shouldn't). These are retained on import to FH, but a placeholder year of 6383 is added, similar to the text sort date of "63 6383" that FH gives to undated events on import. The plugin will not delete these, as they are valid dates. Clearly that is part of the same FH bug so will be fixed for future imports, but they could be lurking in imported projects.
Mark Draper
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

Mark1834 wrote: 25 Jun 2022 21:17 RM has a useful facility to sort events on the same day into a defined order by adding a dash and a number after the sort date.
Yes, I make use of dates like 2022-1 and 2022-2 to sort things also. I also know that there is a certain point in RM where things go badly if the sort number is too large. As I recall, it is somewhere above 5000. My memory of this is old, but I think that when the numbers were too big, it added odd dates in the 100s or something, so those would be dates you might want to consider invalid. That is in part why my numbering scheme for the Census is what it is. I encountered problems if I used numbers that were too big.

Edit: Just checked, RM7 actually deletes sort date numbers greater than 4999, so that shouldn't cause any odd dates that need to be considered.
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Nobody picked up any flaws in the basic methodology, so the plugin is ready to go. There is a single menu screen (which respects FH7 custom fonts and zoom), as shown:
Capture.PNG (7.32 KiB)
  • Invalid sort dates are not visible in the Date Entry Assistant, but are clutter in your project file resulting from FH not deleting the RM placeholder sort date for undated events. Unnecessary sort dates are sort dates equal to the start of a date range, and cause inconsistencies in how facts are displayed compared with similar data entered directly in FH.
  • Missing citation media appear to arise only from RM8 shared or duplicated citations.
  • Fact Set integrity arises from the way FH imports the RM Alternate Name from version 7.0.9 onwards, and can cause problems for other existing plugins.
  • Dual dates are changed by a year on import, so this option compares FH records with your RM originals (automatic updating is not possible here, as the plugin cannot easily distinguish between a flawed import and a deliberate user update).
While most or all of these issues will be resolved for new imports in the next minor update of FH, that does not fix existing projects.

The nature of these bugs means that the vast majority of projects imported from RM are likely to be affected by some if not all of the issues noted, so this is a recommended plugin for all ex-RM users.

I have tested it on a number of RM databases, both short test ones created for that purpose and large user files kindly shared by FHUG members, and it appears to behave as expected. As ever with a new plugin, please ensure that you either back up your project before running it, or test it first on a project copy.

Any problems - you know where I am...

Added in edit: Plugin deleted pending replacement by a version with a "list" option.
Last edited by Mark1834 on 27 Jun 2022 09:06, edited 1 time in total.
Mark Draper
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

Is this action logged anywhere? I would love to see what FH is interpreting as dubious sort dates.
Attachments
Capture100.JPG (20.22 KiB)
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Not at the moment, but it is reasonable to be cautious with such a large change to your project. It's not difficult to provide a "list" rather than "delete" option, so I've deleted the first plugin draft, as it is only you who has downloaded it so far, and I'll replace it with a new version providing a list later today.
Mark Draper
MFriend
Famous
Posts: 111
Joined: 30 Jan 2021 07:43
Family Historian: V7

Re: Audit RM Direct Import plugin

Post by MFriend »

Hi Mark:
Once you have the revised plugin uploaded I'll do some testing today :)

Thank you for your work...

Matthew
MFriend
Famous
Posts: 111
Joined: 30 Jan 2021 07:43
Family Historian: V7

Re: Audit RM Direct Import plugin

Post by MFriend »

On a side note, and this is probably not something you can fix in your plugin, there is also the duplicate place name bug that I was told should be fixed in an upcoming update to FH. If you run place clean in RM8, for instance, you can easily end up with multiple copies of the same location, and if a duplicate is missed it imports into FH as blank.
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

I’d forgotten that one. I’ll have a look later, but places might be a bit tricky.

I have thought of a wrinkle in the way the plugin handles citation media. If there is no distinct text between different citations with different media, all citations get all media (so “false positives” of a missing link).

It’s not worth holding the plugin back for that, as it only affects RM8 users who have used shared or merged citations, but I have temporarily disabled the update option under the Citations button pending a fix.

The rest is ready to go, with a new option to list sort date issues prior to fixing.

(If the plugin has disappeared, there will be an update or explanation later in the thread)
Mark Draper
MFriend
Famous
Posts: 111
Joined: 30 Jan 2021 07:43
Family Historian: V7

Re: Audit RM Direct Import plugin

Post by MFriend »

Yeah, the citation merging issue was one that I first reported to RM support in about March 2021, and I've mentioned it a couple of times in the RM forums. The duplicate citation merge in RM8 does NOT look to see if the weblinks (webtags) or attached media are different. If the citation text or transcription match (for instance, I had hundreds and hundreds of Ancestry-created citations with missing citation info), then it all becomes one big junk citation covering dozens of unrelated people.

RM Support didn't think it was an issue worth fixing (by adding a check for different webtags/links or different media). It actually turned out well for me, as it forced me to spend several weeks opening each citation in Ancestry and copying the citation information into each citation (so my citations are much better). But if someone doesn't know better, it will mess up their citations in RM8.
kfunk_ia
Diamond
Posts: 77
Joined: 03 Dec 2019 22:50
Family Historian: V7
Location: Iowa, United States

Re: Audit RM Direct Import plugin

Post by kfunk_ia »

Ok, so if I understand the sort date part of the plugin, it is meant to replace all occurrences of no date with "63 6383" and in the cases where I have a date range, it is going to change the sort date to the first date in that range. So the bits below in the Sort Date column will become my new sort dates, correct?
Attachments
sort.JPG (105.46 KiB)
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

No - all the listed sort dates are deleted. "63 6383" is how the default RM sort date for undated events gets imported, and is always invalid. For events with a date range, FH always sorts them according to the start of the range anyway, so the sort date is superfluous and gives an inconsistent display in the Properties Box compared with entering the data into FH directly. If you have a range such as "between 1900 and 1910" and set the sort date to say "1902", that's fine. Everything is valid and nothing is changed.

On the issue of blank place names resulting from duplicate RM places, I think all the plugin can do is list the duplicated RM places along with blank FH names. Pairing them up for merging is best done manually, as there is no unambiguous way to do it automatically.
Mark Draper
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Thanks to Matthew very kindly sharing a copy of his original RM8 database, I now have an updated version of the Audit RM Direct Import plugin. There are a number of enhancements from the original version, which I have now deleted.
  • I have decided that missing citation media is essentially unfixable, as there is no solution that finds and corrects all examples automatically without relinking media that should not actually be linked, as in my pages of a book example. I have therefore limited the plugin to a more extensive test and count, where it lists the number of dropped media items per source. The only scenario that seems to create a problem is direct import from an RM8 database that contains shared or merged citations. Direct RM7 or RM8 import without shared/merged citations is fine, as is a GEDCOM import (where RM8 splits the citation when creating the GEDCOM), but this is not a general solution, as too many other features are left behind in GEDCOM. Fortunately, it is likely to affect only a limited number of users, but for this group reimporting the RM8 file when the bug is fixed in FH 7.0.12 is probably the best solution.
  • The dual date check has been enhanced to prevent date phrases such as "1880/81" being reported as dual dates (but the sort date check does list them, so the user can convert them to a more compliant format if they wish).
  • I have added an additional option to check the original RM database for duplicate place names, which FH does not support. It makes no attempt to correct the missing name on import, but gives the user a list of places to merge manually in FH.
Attachments
Audit RM Direct Import (0.1.2).fh_lua (19.4 KiB)
Mark Draper
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

FH 7.0.14.1/7.0.15 fix most of the RM import flaws checked by this plugin (dual date year corruption, invalid sort dates, blank place names, RM Import fact set structure, dropped citation media), but not all of them.

Deleting sort dates identical to fact dates does not work for double dates, so all double date facts still have a sort date that causes the event to be displayed inconsistently in the Property Box. More significantly, FH silently discards double dates defined only by the year (e.g. "1750/51"). While it is ambiguous what such a date actually means (see the recent thread on double dates), both RM and FH support such a format, so they should be imported correctly. I'll raise a ticket with CP.

I'll leave the plugin attached in the previous post, as it is useful for users who have already imported their RM database (however, only a small minority have downloaded it, probably a visibility issue for a forum discussion plugin).

The sort date check needs modifying, as it doesn't handle all double date sort dates correctly. Rather than update this plugin, I'll spin that feature off into a dedicated Store plugin, as it is potentially useful for all users, not just RM migrants.

Added in edit: CP have "logged for investigation". While it certainly doesn't warrant another emergency version, hopefully it will be sorted for 7.0.16 when its time comes.
Mark Draper
RandyC
Gold
Posts: 14
Joined: 14 Feb 2021 00:26
Family Historian: V7
Location: North Idaho

Re: Audit RM Direct Import plugin

Post by RandyC »

Hello Mark,
I want to thank you for the work, and I look forward to checking my database with this plugin. I have been "Making Hay" and haven't had a chance yet. I am one of those who imported a while ago, and I have done quite a bit of editing to the database since then, so re-importing isn't an option.

I think you may be right about the lack of downloads being due to "a visibility issue for a forum discussion". I only found out about this issue and the plugin because I have signed up for the weekly forum newsletter.

Thanks Again, Randy
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Audit RM Direct Import plugin

Post by Mark1834 »

Thanks Randy. That was a good prompt to upload the Check Sort Dates plugin, which is now available in the Plugin Store. It does a more comprehensive check for incorrect or unnecessary sort dates, and is recommended for use by anybody who has imported a project directly from either RM or TMG.
Mark Draper
fhtess65
Megastar
Posts: 634
Joined: 15 Feb 2018 21:34
Family Historian: V7
Location: British Columbia, Canada

Re: Audit RM Direct Import plugin

Post by fhtess65 »

I just tried it, Mark - looks great, though it does rather create some work... Add that project to my to-do list :lol:

Thanks for this valuable tool ...

Teresa
Mark1834 wrote: 03 Sep 2022 08:14 Thanks Randy. That was a good prompt to upload the Check Sort Dates plugin, which is now available in the Plugin Store. It does a more comprehensive check for incorrect or unnecessary sort dates, and is recommended for use by anybody who has imported a project directly from either RM or TMG.
---
Teresa Basińska Eckford
Librarian & family historian
http://writingmypast.wordpress.com
Researching: Spong, Ferdinando, Taylor, Lawley, Sinkins, Montgomery; Basiński, Hilferding, Ratowski, Paszkiewicz