* Using Excel VBA to convert from lumped to split sources

Importing from another genealogy program? This is the place to ask. Questions about Exporting should go in the Exporting sub-forum of the General Usage forum.
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Having just migrated to FH from Family Tree Maker as a result of recent changes with the latter, I was faced with having to change around 700 or so detailed census records from the “source lumping” that is at the heart of the way FTM is structured (and generally does a pretty good job with) to “source splitting” to fit in with the inherent weaknesses in GEDCOM for complex database relationships.

I couldn’t find a plug-in to do it for me, so have taken a different approach – clean up the GEDCOM file before presenting it to FH. I have dabbled with using Microsoft Excel macros to produce customised reports from raw GEDCOM files, so that seemed like a possible approach here. I couldn’t find much (any?) discussion of this on FHUG Forums, but it was worth a try.

After a few late nights (!) I now have a prototype that seems to work as planned. While doing this, I have also taken the opportunity to tidy up some other differences in GEDCOM format between FTM and FH that were cluttering up the import log.
  • Moved all top-level data from the census records to NOTE tags (“1 CENS data record” to “2 NOTE data record”) as well as removing duplicated sources and changing citations accordingly.

    Ditto for other facts, such as death and burial. There is an issue with how this is handled for facts that already have notes recorded, but left to its own devices FH doesn’t seem to do a particularly neat job anyway (the new and old notes are both there, but not readily visible together).

    Converted custom _FOOT tags to NOTEs, as these were causing duplicated information to appear in report source citations.

    Converted FTM’s @Mnnn@ multimedia tags to FH’s @Onnn@, changed the tags to those supported by FH, and corrected invalid GEDCOM date/time data.

    Only later versions of FTM appended multimedia captions automatically, so my reports were a mix of captioned and uncaptioned images, which looked horrible. My file names were fairly structured (e.g. “Baptism – 1835 – John Smith – Chelsea MDX.jpg”) so I used the filename without the extension as the default caption to match what FTM was doing.
So far so good. I now have split census sources, and a FH import log that is down to a handful of minor issues that I can review individually (mostly around GEDCOM 5.5.1 tags that FH does not seem to support) rather than hundreds or even thousands of repeat comments that stopped me seeing the wood for the trees.

Before going too far, I’d like to ask whether anybody else has experience of a similar approach. Any traps for the unwary that I’m about to fall into (apart of course from carefully auditing both the macro code and the output data to ensure that it is really doing what I think it is doing!)?

Thanks. Looking forward to getting more familiar with FH!

Mark
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

Welcome to the FHUG Mark.

That certainly is a novel approach. I don't recall anyone doing anything similar.

If you are that competent with VBA, then I suspect you would pick up writing Plugins in Lua quite quickly.
That would stand you in good stead for any future data adjustments that may be needed.
Probably all those FTM structure changes you described could be performed by a Plugin.

As a newcomer may I suggest you study how_to:key_features_for_newcomers|> Key Features for Newcomers.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Thanks for the quick reply, Mike. Indeed, plug-ins and the transparency of the data storage in standard(ish) GEDCOM were two big pulls towards FH. I wanted to give FH as clean a dataset as possible in order to minimise dead time from reworking old data, and concentrate on playing with the fun new toys.....!
Mark Draper
User avatar
ColeValleyGirl
Megastar
Posts: 5465
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by ColeValleyGirl »

Mark,

Out of interest, did you have images/media linked to your 'lumped' sources? If so, have you managed to associate that media with the 'split' sources?
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Helen,

Yes, images came across fine. A typical entry is shown below:

As exported from FTM as a lumped source:

1 CENS Retired Policeman, age 62
2 DATE 31 MAR 1901
2 PLAC Walpole St Peter, Norfolk, England
2 SOUR @S7@
3 PAGE RG13/1557 folio 22 p1 (#1)
3 DATA
4 TEXT Appleton Croft, Walpole St Peter, NFK (>4 rooms)
5 CONT Thomas Draper, Head, M, 62, Retired Policeman, Pensioner, b Erpingham
5 CONC NFK
5 CONT Mary Draper, Wife, M, 61, b Winkfield SFK
5 CONT Frank Draper, Son, S, 22, Militia Private, b Walpole St Andrew NFK
3 OBJE @M265@
.....
0 @S7@ SOUR
1 TITL Census, 1901

After processing to split the sources, with changed fields highlighted:

1 CENS
2 DATE 31 MAR 1901
2 PLAC Walpole St Peter, Norfolk, England
2 SOUR @S380@
2 NOTE Retired Policeman, age 62
---
0 @S380@ SOUR
1 TITL Census - 1901 (UK) - Walpole St Peter, Norfolk, England (RG13/1557 folio 22 p1 (#1))
1 _TYPE Census
1 PUBL RG13/1557 folio 22 p1 (#1)
1 TEXT Appleton Croft, Walpole St Peter, NFK (>4 rooms)
2 CONT Thomas Draper, Head, M, 62, Retired Policeman, Pensioner, b Erpingham
2 CONC NFK
2 CONT Mary Draper, Wife, M, 61, b Winkfield SFK
2 CONT Frank Draper, Son, S, 22, Militia Private, b Walpole St Andrew NFK
1 OBJE @O265@

@M/O265@ is the linked image file. I still need to make sure it works ok on more complex examples (multiple media files, existing fact note, etc).
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

Mark, that change is certainly possible with a Plugin.
It could also deal with merging the CENS Value with any existing NOTE text.
OR
In your example, create an Occupation 'Retired Policeman' fact, and set Age fields to 62, but that hinges on whether all such CENS values provide standardised 'occupation, age 99' details.

The side advantage of a Plugin is that only valid FH data structures can be created within the Project environment.

Are the changes discussed here specific to your use of FTM?
Or would every exported FTM GEDCOM file use the CENS value, PAGE and TEXT fields in the same way?
i.e. They would always migrate to NOTE/OCCU, PUBL, TEXT.
One thing that different users may want constructed in different ways is the Source TITL.
Do you see this techniques being applicable only to CENS facts or to others as well?

I ask because such a Plugin could benefit many FTM migrants if it could be generalised.
Whereas the Excel VBA technique requires user to install MS Office Excel.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

Mark, I have given your VBA Source Citation conversion some more thought and have some concerns.

(1) Shared Sources

Your VBA converts each and every Method 2 'lumper' Citation into a separate Method 1 'splitter' Source record.

Taking your Census example, that means Thomas Draper, Mary Draper & Frank Draper will each have a 1901 Census Event Citation that links to a separate Source record. However, those three Source records will be identical, and that violates the database convention of avoiding duplicate data entry. They ought to be just one Source record shared by the three Citations.

Many other similar scenarios can apply where one Source document spawns multiple facts, e.g. Birth Certificate.

A Plugin could construct a dictionary of new Source records as they are created, and when a duplicate is encountered, it would automatically share the existing Source record rather than create a new duplicate. Can your VBA do anything similar?

(2) Source Types

In your example of Census Events it is very obvious that a Census Source record is required and how to construct its Title.

However, other Events may be more complicated. For example, a Birth Event may have Citations for a Birth Certificate, &/or a Parish Baptism record, &/or a Death Certificate that states birth details, &/or Census records that give birth details.

How should each such Citation be converted to a Source record with the appropriate Type and Title?
It would need some analysis of the Citation details and perhaps the Method 2 'lumper' Source record details.
A Plugin can perform that kind of analysis, and could be guided by user supplied parameters.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Mike,

Thanks for the detailed comments. It wasn't my intention to produce a universal script that would be useful across many different record keeping conventions, but really just to tidy up my own data prior to importing into FH. I suspect that for the experienced plug-in author, that would be the preferred method, but for me that is running before I can walk! I want to learn the basics of FH with a clean data set before exploring more esoteric features such as plug-ins. The census sourcing was my main motivation, and the other changes just got added in as they were relatively easy to add with only minor coding changes. The discipline was deciding when to stop! For example, I see there is a good plug-in for tidying up notes already, and I'm happy to use that even if writing something similar is beyond my current skill level.

For me, Excel had two key advantages - it is on my PC and I already know how to write macros (at least at the home hobbyist level)!

To pick up some of your other points, though, as they raise good general issues:

Recording census data is always a bit of a can of worms. The convention I have adopted is to transcribe the census sheet verbatim in the source citation (apart from substituting Chapman codes for full county names), and in the FTM census fact use the general description field to note age and occupation (FTM does not have specific fields for these attributes in the census fact type).

I do not create Occupation records from the census, as I prefer to keep that as a single field for each person, recording summary occupation details such as Police Officer, Medical Practitioner, etc. There are lots of different ways of saying "Agricultural Labourer", and I'd rather not have multiple records that differ just in detailed wording. If it requires more detailed comment or explanation, that goes into the Occupation note.

I will be adding ages specifically as I review the data in FH and progress my research over the coming weeks and months, but that will be done manually.

Shared Sources brings us back to the original reason for writing the macro in the first place. Multiple copies of the same citation data are a big "no no" for me. FTM maintains only one copy of the citation, as its basic philosophy is built around source lumping, not splitting. When it exports to GEDCOM, it does of course produce multiple copies due to the limitations of GEDCOM, but every copy is by definition identical. My macro does indeed create a set of multiple source citations, following the structure of the GEDCOM file, but it processes them to remove duplicates and assign common source numbers. So in the full GEDCOM file, each person listed in the census has the same @Sxxx@ link to one copy of the citation. I assess duplication based on the value of the original PRO citation, so in database terms, that is the required unique primary key for the census sources.

Other source types are indeed food for thought for the future. At the moment, I just use GRO certificates as sources for the primary event being recorded, and do not create formal links to any of occupation, residence, age, etc, etc. But there could indeed be multiple sources for birth, for example. Another one that did occur to me might need more exploration is Probate records, as I do cite those for both death and probate itself (but not for anything else). But they only refer to a single person, so I'm happy to leave as is for the moment and see if they need any attention once in FH. At the moment, they are all "lumped" from master source types such as GRO Birth, Probate Calendar, individual parish registers, etc. But none of these have the many-to-many complexities of the census, where a census entry could contain a dozen individuals, and each individual could be in up to eight censuses, so again a lower priority to review.

By then, of course, I won't be able to use the macro, as the data will now be in the FH version of GEDCOM. That will be my driver to learn how to write plug-ins!

The macro is really just a single use program (actually two, one for each of my family trees). I'll keep a copy even if I never use it again, as much of the code can be re-used in the future, just as I used old code from my Family Group Sheet writer for this exercise. Despite that, it was still quicker than reworking all the census data manually (and a lot more interesting) or putting up with all the untidiness of the imported data while I became familiar with FH.
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

Hi Mark, you seem to have the points I raised under control.

Occupations
The FH Tools > Work with Data > Occupations feature allows Occupation titles to be managed, merged, edited, etc.
We recommend that Agricultural Labourer is recorded in the Occupation fact in the same format throughout, regardless of how it is recorded in the Census and captured in the Text From Source transcript. The same goes for all other Occupations.

Shared Sources
You clearly have this under control ~ it just hadn't been mentioned.

Source Types
For consideration later.

BTW: You may not yet have discovered ancestralsources:index|> Ancestral Sources that is a tool for capturing the most popular source documents in a convenient and consistent and customisable way.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

It did raise a more fundamental question in my mind about how FH manages sources, but I'll do some experimenting over the weekend to explore. If necessary, I'll start a new thread in the general FH use bucket, as it is not specific to importing or macros...
Mark Draper
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

I understand what is happening now. It's mentioned in the Knowledge Base, but worth highlighting as it is a fundamental difference compared with FTM that could trip up the unwary migrant (me, for instance :) ).

When any citation from a lumped source is used in more than one "Fact", each citation is, and remains permanently, a separate instance. Copy/paste creates a second independent citation, whereas FTM simply links the same citation to additional Events. I can see why FH does that, but GEDCOM storage does come at a cost of poor database design.

It's not just an academic point, as one of my first tasks once I become comfortable with FH is to go through my GRO birth and death index entries and add the new data now online. These indices are a prime candidate for lumping, as there are generally a large number of very simple individual citations, but I will have to remember to update any multiple copies.

I have found the excellent plug-in for listing lumped citations, but it would be even better if it highlighted inconsistencies automatically. Even better if it ran in the background and offered to update additional copies automatically as soon as a change was made, but I don't know if that is even possible.
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

The point is that FH fully implements the GEDCOM 5.5 specification ~ it is one of its USP ~ whereas some other products don't.
If they all did, then migration from product to product would be sooo much easier.

It is why we advise studying how_to:key_features_for_newcomers|> Key Features for Newcomers before going too far.

It would help if you identified the Plugin, and defined the inconsistencies you want highlighted.
Two Plugins come to mind; List All Citations for a Selected Source and Where Used Record Links.

Plugins cannot run in the background.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Sorry, was being slightly lazy, as I was doing the Forum on the iPad and couldn't remember what the plug-in was called :) .

I was thinking of List All Citations for a Source. If there were hundreds of entries to check, it might be useful to flag where Where Within Source is the same, but Text From Source is different. If it is in the same place, surely it should have the same content?? Even if it adheres strictly to GEDCOM, the data are still corrupt.

Realise though it might be difficult to generalise this and citations are not updated that often, so it is probably something best done by the individual user according to how they use the data and how big their dataset is.
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

In this context there is little difference between the two Plugins.
But Where Used Record Links allows any number of records to be selected, and works for any type of record.
So it shows wherever Media records are used, or wherever Note records or Place records are used, etc, etc.

Regarding inconsistencies between Where within Source and Text From Source (or for that matter Entry Date, Assessment, Note and linked Media) the Result Set can be sorted on any of those columns by clicking its header. Then it would be relatively easy to eyeball them to spot any inconsistencies.

Additionally, you could customise the Plugin to provide that extra feature, or discuss it in these Forums and the Author may add that feature, either as a special Attachment to a thread just for you, or published in the Plugin Store for everyone.

However, if there are that many Citations with identical content it may be better to convert to a Method 1 'splitter' Source for each such identical set of Citations. That could be automated by a Plugin.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
AdrianBruce
Megastar
Posts: 2090
Joined: 09 Aug 2003 21:02
Family Historian: V7
Location: South Cheshire
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by AdrianBruce »

Mark1834 wrote:... it might be useful to flag where Where Within Source is the same, but Text From Source is different. If it is in the same place, surely it should have the same content?? ...
Coming slightly late to this.... But I think that might depend on how you (or I) had used the Text From Source.

If I had a baptism from a PR used to justify (say) residence and parents' trade, then the Where Within Source might be the same (the reference number of the baptism entry) but the Text From Source would be different as I'd just quote the exact words, e.g. "Nantwich" for one and "Cordwainer" for the other.

Similarly a more usual (for me) 'lumping' source - a Directory, say, where Where Within Source might be just the page number but Text From Source could be just one entry from a number on that page.
Adrian
User avatar
David2416
Superstar
Posts: 398
Joined: 12 Nov 2017 16:37
Family Historian: V7
Location: Suffolk UK

Re: Using Excel VBA to convert from lumped to split sources

Post by David2416 »

Hi Mark,

I have exactly this problem importing a Gedcom from FTM 2017. Having worked with Lumped Sources, looks like I need to convert to split sources.

Would your code work in Open Office? And would you be able to share it?

Regards

David
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Hi David,

Unfortunately, Excel macro code only works in "proper" versions of Excel, so not Open Office, not LibreOffice, and even not Excel for the iPad!!

I've been away from this project for a few days as I've been busy with other things, but before I fully commit my data to FH and regard that as the master version I'm trying to set out what other changes might best be made to the data while it is still in FTM. So far, I've come up with

1. Decide which sources will need splitting in FH. Census obviously, as the most complex of the common many-to-many sources that GEDCOM can't handle properly. I've also decided to split out GRO indexes from actual certificates. At the moment, I have single lumped master sources for GRO Births, Deaths and Marriages, and FTM doesn't care whether it is a simple index citation or a more complex certificate image linked to multiple events. I think FH will best handle the certificates if they are eventually split, so I will create new lumped sources for BM&D Certificates, rearrange the citations in FTM, and import them into FH lumped with the intention of splitting later in FH once I learn how to write plug-ins.
2. I think I will also split original parish registers and probate index records, as they generally have multiple citations, but that can come later in FH.
3. Things like FamilySearch and FreeReg, which are not really sources at all, just finding aids pending obtaining the actual source, can stay lumped as we should always aim to minimise reliance on secondary material such as this.
4. Review Death, Burial and Census entries that already have associated notes. Both my macro and the actual FH import move the illegal descriptions to notes, but FH does not seem to cope very well with events that have multiple notes. Only one is visible with no immediate indication that there is more to come. Even when the first note is deleted, the second one doesn't seem to become visible until the database is reloaded. This is contrary to FH's declared objective of having as much as possible visible on screen at once. I did play with concatenating the two notes together, but FTM likes its NOTE records types, and if one is a note record you get something like "Electroplater, age 45 @N123@", which is not very helpful! I've only got a handful of such records, so I will modify those while still in FTM to avoid multiple notes in FH.

My "final" version of the macro reads the GEDCOM file into Excel, and carries out the following substitutions in sequence. A separate worksheet is created after each step so the data can be reviewed manually if required. The final data are then output as a conventional GEDCOM file.

1. Move illegal DEAT and BURI descriptions to NOTEs.
2. Delete orphan NOTE records (FTM duplicates in-line and NOTE records when exporting to GEDCOM so you end up with a pile of orphan notes).
3. Removes custom _FOOT tags that are used within FTM for formatting reports. FH does not process these on import correctly, so you end up with duplicated citations in reports.
4. Coverts FTM's use of @Mxxx@ for linked media to FH's @Oxxx@, and tidies up formatting. Adds the file name as the default media caption where no caption has been specified.
5. Process Census entries to convert from lumped to split. Create new SOUR records as required. Caveat - this is specific to my test database, where all the census data are UK 1841-1911, with no 1939 Register and no overseas census entries, so it may need modifying to be more general.

As discussed before, I recognise that steps 1-4 could all be done within FH after importing, but apart from removing source redundancy, my other objective was to tidy up the import so I can concentrate on any real issues it identifies. My log file now flags only a half dozen or so lines where action was required, rather than hundreds or even thousands.

If you are comfortable with sending me a copy of your GEDCOM file, I'm happy to run it though the macro to see what happens. Any of the intermediate steps can be readily omitted if you prefer just to concentrate on the census splitting. I can return to you the processed GEDCOM file and the intermediate files in Excel that third party programs should be able to read without any problems, as they are simple text based Excel workbooks. I will of course delete your file afterwards, and not retain any copies.
Mark Draper
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

PS - In principle, it should be possible to generalise the code, such that it reads the input GEDCOM file, presents the user with a list of all the lumped source titles (GRO Birth, Probate Calendar, etc), and asks which ones should be split. Food for thought later......
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

There are several more detailed comments I could make, but the overriding observation is about FH not making things visible.

That is a common mistake made by newcomers focussing on the Facts tab of the Property Box, which is a filtered view of the most popular data, and the Main tab is an even more filtered summary of the same data.

To see everything, use the All tab ( there is a clue in its name :) ) or the Records Window which is in effect a composite of every All tab. There for instance you can see the multiple Notes, and all UDF, and other fields not shown elsewhere, plus they can all be created, edited & deleted there too.

BTW:
Tools > Preferences > File Load/Save > Move invalid data into notes fields if possible can be unticked and the illegal descriptions get created in a dedicated _UNCAT UDF instead of a Note, and can thus be uniquely detected by a Plugin to move the text wherever required depending on whatever conditions are applicable. It would also be possible to prune the associated messages from the Log File.

You imply step 5 could not be performed by a Plugin, but I am certain it could as I have a Plugin that does a similar form of lumped to split conversion.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Thanks Mike - I was aware of the All tab of course, as it is clear on screen and frequently referred to in the documentation, but I still don't like the fact that the summary shows one note without alerting the user to the extra detail available there, and doesn't refresh after editing. So it is not readily visible. But it's only personal views on how the interface is designed (i.e. what we are used to and how we would do it if we were the designer ;) ), so there is no right or wrong answer to that one!

Which plug-in are you thinking of for splitting other source types? I'll give that a try with my GRO Certificates and Probate records, and if it does exactly what is required already there is obviously no point in developing another less generally useful way of doing the same thing.
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

You may not be aware how many subsidiary fields and multiple instances of fields are possible for each fact.
Explore the All tab, right-click on every type of expanded item, and see how many Add options there are.
If all those were alerted to the user, then the Facts tab would be extremely cluttered.

I suspect when you delete the 1st of two Notes, you simply delete the text in the Note: box on the Facts tab.
That only deletes the text, leaving an empty Note field, which is only pruned when FH re-opens the Project.
In general that strategy applies to all fields, and remember File > Project Windows allows a Project to be re-opened.

On the All tab you can Delete any Note field (or any other field) directly, and immediately reveal the other Note.
While two Notes exist you can use the arrows top right to reverse them to reveal the 2nd Note and hide the 1st Note.

On the Notes tab, if you X delete the 1st Note, then the 2nd Note is immediately revealed.

BTW: All those edits can be reversed using Edit > Undo Ctrl-Z or Edit > Redo Ctrl-Y commands.

See thread Import ged from Ancestry to FH, general advice needed (15176) that among other things, deals with Event descriptions that would be moved to a Note field by default, but by changing Tools > Preferences moves it to a _UNCAT field instead. The attached Plugin moves those _UNCAT fields to various fields (Note, Cause, Address) depending on the type of Event (Census, Death, other). That should resolve your Event description Note problem, and is easily adapted for other cases if necessary.

Now onto Plugins for Sources, etc.
See Changing a source citation (14701) which refers to the Plugin Store Replace Selected Source Citations and a local Attachment extension to that Plugin to offer multiple replacements.
I have a related work-in-progress Separate BMD Sources Plugin that separates a lumped BMD Source into split BMD Sources but requires the names of the relevant Source records to be edited into the Plugin. See Attachment below.
Neither of these Plugins resolve your specific scenario, but demonstrate that it is feasible.
They could be adapted to cope with generic scenarios by prompting the user for their specific Source details.
One feature that none of the above offer is to transfer Where with Source and Text From Source fields from the Citation into the split Source record, and use those fields to identify replicated Citations.
Last edited by tatewise on 31 Jan 2024 18:53, edited 1 time in total.
Reason: Attachment deleted - contact Mike Tate if needed
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
User avatar
johnmorrisoniom
Megastar
Posts: 901
Joined: 18 Dec 2008 07:40
Family Historian: V7
Location: Isle of Man

Re: Using Excel VBA to convert from lumped to split sources

Post by johnmorrisoniom »

I was the original user of Mikes "Seperate BMD Sources" plugin, and found it extremely easy to use, although slight tweaking was required.
User avatar
David2416
Superstar
Posts: 398
Joined: 12 Nov 2017 16:37
Family Historian: V7
Location: Suffolk UK

Re: Using Excel VBA to convert from lumped to split sources

Post by David2416 »

Hi Mark,

Thanks for your reply and kind offer to process my Gedcom. I won't take you up on this at the moment as I really need to work out what I want to achieve. The main problem I see with the lumped source is how in Gedcom it leads to multiple copies of the citation data.

There's quite a lot in this thread I need to read and understand before going any further.

Regards

David
User avatar
Mark1834
Megastar
Posts: 2458
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: Using Excel VBA to convert from lumped to split sources

Post by Mark1834 »

Indeed - as somebody who's only a couple of pages ahead of you, my impression so far is that FH is strong on GEDCOM compliance, flexibility and transparency of its data structure, but that adherence to GEDCOM comes at a cost, as GEDCOM is not a proper database language so does not handle duplication and redundancy well. Fine if you are just starting out, but if you have a lot of data derived from another application that has different priorities (FTM is weaker on flexibility and transparency, but adheres more closely to traditional database concepts), moving from one to the other can cause problems.

Most of this thread is really about what is the preferred sticking plaster to reconcile these different approaches (not better or worse, that's subjective - just different :) ).
Mark Draper
User avatar
tatewise
Megastar
Posts: 28341
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK
Contact:

Re: Using Excel VBA to convert from lumped to split sources

Post by tatewise »

You are absolutely correct.
One of the FH unique selling points is its GEDCOM Release 5.5 100% compliance and use of the GEDCOM file as its database.
BTW: Another unique selling points is its dynamic Diagrams from which virtually all updates can be performed.

To be fair, the only significant area where duplication and redundancy may not be well handled is multiple Citations.
On the other hand, having a separate Citation for each Fact to Source relationship sometimes has advantages.
It means you can set the Assessment confidence independently for each Fact derived from one Source record.
You can also explain & justify in the Citation where & how a particular Fact is derived from a particular Source.
Those details cannot be easily recorded in either the Fact Note nor the Source record Note without having to identify which Fact to Source relationship is being discussed.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Post Reply