
Random Spaces introduced into TEXT imported from FTM

Posted: 14 Nov 2017 17:31
by David2416
Hi There,

Have imported a gedcom produced by FTM 2017 build 1343 into FH 6.2.5

The import worked well, however there are random spaces introduced into the TEXT lines.

Any thoughts on how to fix this?

Also, having lumped sources, I need to consider how to split these.

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 14 Nov 2017 18:25
by tatewise
Welcome to the FHUG David.
As a newcomer, study Key Features for Newcomers and Importing to Family Historian, and in particular Import from Family Tree Maker (FTM), especially the Check for word-wrapping errors window in the Quick Fix Plugins section. That should fix the random spaces.
[ EDIT: Above changed from TMG to FTM - sorry misread Subject line :oops: ]

Regarding conversion of Method 2 'lumped' to Method 1 'split' Sources, see Using Excel VBA to convert from lumped to split sources (15390) and maybe Mark can lend you his VBA script for FTM Gedcom.
OR
As discussed in that thread, a Plugin needs to be written, as the number of TMG migrants needing that conversion is growing.

After reading that other thread and noting all its advice, could you be specific about your requirements?

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 14 Nov 2017 19:46
by David2416
Hi Mike,

Thank you for your welcome. I had already studied the knowledge base item and post you suggested, before posting.

The check word wrap dialog did not appear when I imported the Gedcom from FTM.

Regarding the lumped/split source issue I really meant to echo what Mark had posted; haven't decided how to approach the problem as yet.

I am convinced that FH is a superior product.

Regards

David

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 14 Nov 2017 20:00
by tatewise
Did you import using File > Project Window > New Project > Import a GEDCOM file or just open the Gedcom file?
I am certain that the former mode cannot bypass the Check for word-wrapping errors window.

If you study Key Features for Newcomers, under Project Structure & Location it explains the major difference between a Project and a standalone Gedcom.

Migrants from various products adopt various Source review methods, and not necessarily the same for the whole Project.
Usually a mixture of manual reviews using Ancestral Sources and some automatic conversions is adopted.

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 15 Nov 2017 09:10
by David2416
Hi Mike,

Thank you for your continued support. I did use the File > Project Window > New Project > Import a GEDCOM file method to import the gedcom. The Check for word-wrapping errors window definitely did not appear as I expected it to. Is this a change in FH 6.2.5 I wonder?

The spaces are not random as I stated, but appear when a TEXT line is followed by a CONC line. I imagine the word wrapping dialog would address this?

I could manually edit the FTM output Gedcom but that would be a pain. I reckon I need a script to remove "<LF>% CONC"

Re the lumped/split sources, I will most likely use a mixed approach and perhaps amend later.

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 15 Nov 2017 11:07
by tatewise
Yes, I have run some experiments, and that Check for word-wrapping errors dialogue no longer appears.
Thus, if the CONC concatenation tag lines have extraneous space characters they are mistakenly retained.
You are correct in thinking that dialogue is designed to resolve such issues.

Further experiments suggest the missing dialogue only occurs with FTM GEDCOM files.
After editing the GEDCOM file with Windows Notepad such that header lines were as follows, the dialogue returned.
( See below for how to use Notepad. )
0 HEAD
1 SOUR OTHER
1 DATE 15 NOV 2017
1 CHAR UTF-8
1 SUBM @SUBM@
1 GEDC
2 VERS 5.5
2 FORM LINEAGE-LINKED

I shall report the missing dialogue to Calico Pie as that must be an error.

The rules for the CONC concatenation tag lines are well defined in the GEDCOM specification, but not always followed.
The line break must always be within a word and not the space between words.
There must be no trailing space on the preceding line.
There must be just one space between CONC and the following text.
e.g.
3 PAGE He never said why he took this step and it came as a surp
4 CONC rise that it was so popular.
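
The rules above can be sketched in a few lines of Python. This is only an illustration of compliant CONC handling, not the code any particular program uses. The function name is hypothetical, and the lines are assumed to have had their terminators stripped already.

```python
def join_conc(lines):
    """Merge CONC continuation lines into the line they continue.

    Each CONC value is appended with nothing in between, because a
    compliant writer splits inside a word, never at a space.
    """
    merged = []
    for line in lines:
        parts = line.split(" ", 2)  # level, tag, optional value
        if len(parts) >= 2 and parts[1] == "CONC" and merged:
            value = parts[2] if len(parts) == 3 else ""
            merged[-1] += value
        else:
            merged.append(line)
    return merged


example = [
    "3 PAGE He never said why he took this step and it came as a surp",
    "4 CONC rise that it was so popular.",
]
print(join_conc(example)[0])
# 3 PAGE He never said why he took this step and it came as a surprise that it was so popular.
```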

I suspect the extraneous spaces are following CONC.
So use a plain text editor such as Windows Notepad to open the GEDCOM file.
You must change the file types filter bottom right from Text Documents (*.txt) to All Files (*.*) to reveal .ged files.
Perform Edit > Replace with Find what: CONC with 2 spaces after and Replace with: CONC with 1 space after.
Tick Match case and click Replace All.
Finally File > Save.
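
The same replacement can be scripted. Below is a rough Python sketch, with a made-up function name, that works on the whole file held as one string. Unlike the single Notepad pass, it collapses any run of two or more spaces after CONC to one, and it only touches CONC where it appears as a tag at the start of a line.

```python
import re

# A level number, the CONC tag, then two or more spaces, at the start of a line.
_EXTRA_SPACES = re.compile(r"^(\d+ CONC) {2,}", re.MULTILINE)


def normalise_conc_spacing(text):
    """Collapse runs of spaces after a CONC tag to a single space."""
    return _EXTRA_SPACES.sub(r"\1 ", text)
```

When reading and writing the file, pass newline="" to open() so that the existing line terminators are kept as they are.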

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 15 Nov 2017 17:12
by David2416
Hi Mike,

Thanks again, here is a cut and paste of a line that gives rise to an additional space. I believe it's not a double space that's the problem but a non-printing character.

4 TEXT MUNROE James Allen (27) bachelor married ADAMS Helen Margaret (23) spins
5 CONC ter by licence. Copy of original in Helen's possession

There is only one space. I have tried various methods for concatenating the lines, including Notepad++, but unfortunately the non-printing <LF> (or is it <LF><CR>?) isn't included in the Find/Replace. However, I have found UltraEdit (trial version) does the job, as it can remove the hex 0a 0d. So in effect I replaced "<LF><CR>5 CONC " with " ", repeating this for 2 CONC.

Regards David

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 15 Nov 2017 20:38
by tatewise
Sorry, but I am not sure about the exact details of the problem or the cure.

Are you saying the resultant text after import has spaces in the middle of spinster and if so how many?
MUNROE James Allen (27) bachelor married ADAMS Helen Margaret (23) spins ter by licence. Copy of original in Helen's possession

Is there a space character on the end of the 4 TEXT MUNROE James Allen. . . line after spins?

Furthermore, if you replace "<LF><CR>5 CONC " with a single space the GEDCOM will be:
4 TEXT MUNROE James Allen (27) bachelor married ADAMS Helen Margaret (23) spins ter by licence. Copy of original in Helen's possession
and that still has a space in the middle of spinster.

Please try my suggested change to the first few lines of the GEDCOM file, then the dialogue should return and fix the problem without any other editing.

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 16 Nov 2017 10:58
by David2416
Hi Mike,

I am sorry, I have created confusion, which started with my saying 'Random Spaces' - I should have said 'Unwanted Spaces'.

To restate the issue:

I created a Gedcom from my project file FTM 2017 build 1343. This included lines such as:

4 TEXT CARROT Jasper (32) bachelor married NIXON formerly known a
5 CONC s MEREDITH Rowena (28) spinster by certificate. Original in my pos
5 CONC session


There are non-printable characters at the end of each line: <cr><lf>. I have established this by looking at the hex code, which shows 0D 0A (previously I indicated <lf><cr>).

I imported the Gedcom using FH 6.2.5 File > Project Window > New Project > Import a GEDCOM file. No Check for word-wrapping errors dialog was encountered. You reported its absence as an issue.

Looking at the imported Gedcom the above lines have been concatenated as

4 TEXT CARROT Jasper (32) bachelor married NIXON formerly known a s MEREDITH Rowena (28) spinster by certificate. Original in my pos session

As can be seen there are single unwanted spaces in 'a s' and 'pos session'

After various attempts and with your kind advice I have worked around this issue by using UltraEdit (Trial Version) to remove

'<cr><lf>5 CONC ' (note one trailing space) by replacing this phrase with an empty string '' (with no space)

Having edited the Gedcom in this way the line is now imported as

4 TEXT CARROT Jasper (32) bachelor married NIXON formerly known as MEREDITH Rowena (28) spinster by certificate. Original in my possession

which is entirely satisfactory.
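
For reference, the editor replacement described in this post could be approximated in Python as below. This is a sketch under assumptions: the function name is hypothetical, the whole file is held as one string, and every CONC line is taken to continue mid-word as in the sample. Note that it produces long physical lines, which may exceed the 255-character line limit in the GEDCOM 5.5 specification.

```python
import re

# Optional CR, then LF, a level number, the CONC tag and its one separator space.
_CONC_BREAK = re.compile(r"\r?\n\d+ CONC ")


def unwrap_conc(text):
    """Delete each line break plus 'n CONC ' so the pieces join with no gap."""
    return _CONC_BREAK.sub("", text)


sample = (
    "4 TEXT CARROT Jasper (32) bachelor married NIXON formerly known a\r\n"
    "5 CONC s MEREDITH Rowena (28) spinster by certificate. Original in my pos\r\n"
    "5 CONC session\r\n"
)
print(unwrap_conc(sample))
```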

Hopefully this clarifies my previous clunky attempts at describing the issue and the solution.

I have continued the discussion on lumped/split sources elsewhere. Best to stick to one thing at a time I feel.

Thanks again for your input which has helped clarify my thinking and to find a solution.

Regards David

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 16 Nov 2017 12:02
by tatewise
David, the problem revolves around the way FTM handles the CONCatenation tag.
The <cr><lf> 0D 0A are the standard line terminators that end every line in a plain text file such as GEDCOM.
They are NOT the real cause of the problem as the CONCatenation algorithm discards them.
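
If you want to check this for yourself without a hex editor, a small Python sketch along these lines would do. The function names are invented for illustration, and the input is the raw bytes of the GEDCOM file. The first function counts the kinds of line terminator present. The second lists any line that ends in a space and is followed by a CONC line, which is the sort of rule violation that does matter.

```python
def count_terminators(data):
    """Count CRLF, bare LF and bare CR terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF only": data.count(b"\n") - crlf,
        "CR only": data.count(b"\r") - crlf,
    }


def trailing_space_before_conc(data):
    """Return 1-based numbers of lines ending in a space whose next line is CONC."""
    lines = data.splitlines()  # splits on CRLF, LF or CR and drops them
    hits = []
    for i in range(len(lines) - 1):
        nxt = lines[i + 1].split(b" ", 2)
        if len(nxt) >= 2 and nxt[1] == b"CONC" and lines[i].endswith(b" "):
            hits.append(i + 1)
    return hits
```

Read the file with open(path, "rb").read() so that the terminators are not translated before they are counted.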

Previous versions of FTM violated the CONC rules and split lines on word boundaries.
It seems that FH recognises FTM GEDCOM files and automatically deals with the CONC lines assuming the lines are split on word boundaries and inhibits the Check for word-wrapping errors dialogue.
Thus FH ensures a space character appears in the CONCatenated text where the line is split, which is exactly as you describe.

However, the latest FTM 2017 as reported in Import Gedcom from a Mac? (15427) today is now "Exporting concatenation tags correctly". The lines are no longer split on word boundaries, but within words, which is compliant with the GEDCOM rules. That is exactly as you have illustrated. BUT FH does not know that yet and is still inserting a space at each line split.
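
The difference between the two concatenation modes can be modelled in a few lines of Python. This is only a model of the behaviour described here, not FH's actual code, and the function and parameter names are made up. The pieces are the TEXT value and CONC values from David's example.

```python
def concatenate(values, split_on_word_boundary):
    """Join a value and its CONC continuations.

    split_on_word_boundary=True models the handling described for older
    FTM files: a space is restored at each split.
    False follows the GEDCOM rule: pieces are joined with nothing added.
    """
    separator = " " if split_on_word_boundary else ""
    return separator.join(values)


pieces = [
    "CARROT Jasper (32) bachelor married NIXON formerly known a",
    "s MEREDITH Rowena (28) spinster by certificate. Original in my pos",
    "session",
]
print(concatenate(pieces, split_on_word_boundary=True))   # ... known a s MEREDITH ... my pos session
print(concatenate(pieces, split_on_word_boundary=False))  # ... known as MEREDITH ... my possession
```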

FOR THE THIRD TIME OF ASKING
Please simply edit the first few lines of the GEDCOM file to remove reference to FTM as illustrated previously.
That will force the Check for word-wrapping errors dialogue and allow the correct CONCatenation mode to be chosen.

I am in discussion with Calico Pie about this issue, which has arisen due to FTM 2017 correcting its CONC implementation.
The workaround is to disguise the FTM origin, and I will amend the Import from Family Tree Maker (FTM) advice ASAP.

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 16 Nov 2017 12:31
by David2416
Hi Mike,

FINALLY I have got the message. Your workaround works a treat. Thank you.

Regards

David

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 17 Nov 2017 09:13
by David2416
Hi,

I just revisited the advice for importing from FTM, Import from Family Tree Maker (FTM), in the knowledge base.

It's brilliant that this has been updated to take account of changes based on FTM 2017. This is a great encouragement to using FH.


Thanks

David

Re: Random Spaces introduced into TEXT imported from FTM

Posted: 17 Nov 2017 13:31
by tatewise
Calico Pie acknowledge the problem is exactly as described here and will enable the Check for word-wrapping errors dialogue for FTM imports in a future release of FH. They seem to approve the workaround of editing the GEDCOM HEADer.