Check For Possible Duplicate Media (FH7) plugin - first prototype

tatewise
Megastar
Posts: 28436
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: Check For Possible Duplicate Media (FH7) plugin - first prototype

Post by tatewise »

johnmorrisoniom wrote (05 Jan 2024 11:11):
"The new Check for Duplicate Media functionality would do away with the need to run the unlinked media plugin."
That does not explain why you need Check for Duplicate Media to move identical files instead of deleting them.
The files it targets not only have identical content but also identical filenames apart from the (n) suffix, and their Media records are identical too. It is mainly intended to deal with the import bug that has now been fixed.

Also, Check for Duplicate Media does not intend to auto-merge duplicates where the Media records differ.
So those will need to be merged manually using File > Compare/Merge Records..., and the Check for Unlinked Media plugin is still needed to clean up any media files left unlinked by the merging.

The important point is that Check for Duplicate Media only involves duplicate files which are probably a mistake and redundant, whereas Check for Unlinked Media does not know whether files are duplicates or not.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
jelv
Megastar
Posts: 611
Joined: 03 Feb 2020 22:57
Family Historian: V7
Location: Mere, Wiltshire

Post by jelv »

@johnmorrisoniom

Given that there is already a plugin that deals with unlinked files I fail to see the point of adding that functionality to this plugin - far better that Mark concentrates his attention on getting the existing functionality fully implemented, tested and the plugin released.

In any case we are talking about a process that most people will only do very infrequently (or probably just once), so again I can't see the point of enhancing a process that will only save minutes over the time taken to run the other plugin. If you think you will need to run this fairly often, it points to a serious defect in your method of working!
John Elvin
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

I have deliberately put this to one side for a while to see if any other issues emerge from the 20-odd downloads so far. Nothing else has come up, so I will go ahead with the final minor changes discussed above later this week.

However, I have discovered a flaw in the way the fork plugin manages merging media records linked from a Rich Text note. If a note contains links to both the master record and the duplicate, after merging the note then has two different link IDs pointing to the same record. FH doesn't object to that immediately, but if the note is subsequently edited, it removes the duplication, breaking one of the links.

I've played with editing the GEDCOM file directly to simulate plugin changes, and I think the only way around this is to edit the actual note to change the link references from the duplicate record to the master one to keep things in sync. It's extra code to cope with a special case, but are there any suggestions for a simpler alternative?
Mark Draper
tatewise
Megastar
Posts: 28436
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise »

Well spotted! I think the link ID change you propose is the only option.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Thanks - I can see in principle how to do it, so I’ll play later…
Mark Draper
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Unfortunately, I don’t think it is possible. Any attempt to edit existing Rich Text directly seems to blow away the associated link table, probably because of potential inconsistencies. Rather than get involved in potentially complex workarounds for a rare circumstance, the easiest option will be to flag instances where both the master record and its duplicate are present in the same note (they are easy enough to detect) and leave it to the user to resolve.
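Detecting the problem case is straightforward once a note's link table has been read into a plain Lua table. The sketch below assumes a hypothetical representation in which each link ID (the _LKID value) maps to the ID of the record it points to, and the planned merges map each duplicate record ID to its master record ID. The name findConflicts is illustrative only; it is not part of the plugin or the FH API.

```lua
-- Hypothetical representation (assumption, for illustration):
--   linkTable:  link ID (_LKID value) -> record ID it points to
--   mergePairs: duplicate record ID -> master record ID
-- Returns a list of {duplicate=, master=} pairs where the note links to both.
function findConflicts(linkTable, mergePairs)
  -- Collect the set of record IDs this note links to
  local linked = {}
  for _, recordId in pairs(linkTable) do
    linked[recordId] = true
  end

  -- Flag any planned merge where both records appear in the same note
  local conflicts = {}
  for duplicate, master in pairs(mergePairs) do
    if linked[duplicate] and linked[master] then
      conflicts[#conflicts + 1] = { duplicate = duplicate, master = master }
    end
  end
  return conflicts
end
```

A note linking to records 10, 20 and 30 would be flagged for a planned merge of 20 into 10, but not for a merge of 40 into 30, because record 40 does not appear in the note.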
Mark Draper
tatewise
Megastar
Posts: 28436
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise »

In my Search and Replace plugin I similarly found editing Rich Text required the LinkId to be managed.
The following script is what I used. Does that help?

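-- ptrItem (the text field being edited) and strNewVal (its new text) are supplied by the surrounding plugin code
local isOK = false					-- Result of the update
local isRichText = false				-- Detect plain text v rich text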
local arrLink = {}
local ptrLink = fhNewItemPtr()
ptrLink:MoveToFirstChildItem(ptrItem)			-- Save any record link details
while ptrLink:IsNotNull() do
	local strLink = fhGetTag(ptrLink)
	if strLink:match("^_LINK_%u$") then
		local intLink = fhGetValueAsInteger(fhGetItemPtr(ptrLink,'~._LKID'))
		arrLink[intLink] = { Tag=strLink; Ptr=fhGetValueAsLink(ptrLink); }
	end
	isRichText = true				-- Subsidiary _LINK_ or _FMT 1 tag found	
	ptrLink:MoveNext()
end
if isRichText then					-- Update richtext field
	isOK = fhSetValueAsRichText(ptrItem,fhNewRichText(strNewVal))
else
	isOK = fhSetValueAsText(ptrItem,strNewVal)	-- Update plain text field
end
for intLink, dicLink in pairs(arrLink) do
	ptrLink = fhCreateItem(dicLink.Tag,ptrItem)	-- Reconstruct any record link details
	if ptrLink:IsNotNull() then
		fhSetValueAsLink(ptrLink,dicLink.Ptr)
		ptrLink = fhCreateItem('_LKID',ptrLink)
		if ptrLink:IsNotNull() then
			fhSetValueAsInteger(ptrLink,intLink)
		end
	end
end
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

I don’t think it does, unfortunately. The issue here seems to be a mismatch between the _LKID value and the actual text.

It’s not worth delaying the plugin trying to fix an obscure issue like this - it can always be added later if I come up with a fix.
Mark Draper
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Came back to it this evening with a fresh brain, and all now sorted. It’s a rather convoluted process, but none of the steps is overly complex:
  • save the link table associated with the Rich Text object as a table of record ID => link ID values
  • determine the link ID values corresponding to the master Media record and its duplicate
  • edit the Rich Text to change the link values from duplicate to master (which blows away the Rich Text link table)
  • clear the now redundant duplicate link ID from the link table
  • recreate and populate the Rich Text link table as a series of _LINK_O and _LKID values.

All that to do what’s a single line of code for regular record links or Rich Text where the master Media record isn’t also referenced… :o
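The string-level part of the steps described above can be sketched in plain Lua. This assumes a hypothetical markup in which each record link in the note text appears as <link id=N>...</link> (the real FH Rich Text format may well differ), with the link table held as record ID => link ID as in the first step. Reading and rewriting the actual _LINK_O/_LKID items through the FH API is not shown, and remapNoteLinks is an illustrative name.

```lua
-- Sketch only. Assumptions (hypothetical, for illustration):
--   * links in the note text appear as "<link id=N>...</link>"
--   * linkIds maps record ID -> link ID, saved before editing (step 1)
function remapNoteLinks(text, linkIds, duplicateId, masterId)
  -- Step 2: find the link IDs used for the duplicate and the master
  local dupLink = linkIds[duplicateId]
  local masterLink = linkIds[masterId]
  if not (dupLink and masterLink) then
    -- Not the special case: a normal record merge handles it
    return text, linkIds
  end

  -- Step 3: point every duplicate link marker at the master's link ID.
  -- Returning nil from the function leaves other markers unchanged.
  local newText = text:gsub("<link id=(%d+)>", function(id)
    if tonumber(id) == dupLink then
      return "<link id=" .. masterLink .. ">"
    end
  end)

  -- Steps 4 and 5: drop the redundant duplicate entry; the remaining
  -- record ID -> link ID pairs are what the link table is rebuilt from
  local newIds = {}
  for recordId, linkId in pairs(linkIds) do
    if recordId ~= duplicateId then
      newIds[recordId] = linkId
    end
  end
  return newText, newIds
end
```

The returned table would then drive the rebuild, creating one _LINK_O item with its _LKID value per entry, much like the reconstruction loop in Mike's Search and Replace script earlier in the thread.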
Mark Draper
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Version 0.4 is attached, with hopefully the final amendments before uploading to the Plugin Store:
  • Tweaking the column widths didn't gain any useful benefit in overall report width and just made the output look cluttered and difficult to read, so I have added an option for an alternative basic report that just displays the duplicate records without the detailed file information. This may be more suitable for users with relatively small displays.
  • Record merging now includes any link from Rich Text as well as more conventional record links.
  • The legacy issue with duplicated file names on import is addressed by optionally including these in the merge process. Rather than merge all record pairs where the only difference is the file name (which may merge distinct records, as discussed above), it is restricted to Media folder files that differ by a numerical suffix in parentheses.
  • Picking up the discussion on what to do with any redundant files, I have chosen to do nothing, as any further action (moving, deleting, renaming, etc.) can be achieved by other means.
  • If anybody does try to run a plugin called Check For Possible Duplicate Media (FH7) in FH6, it will now tell the user what they are doing wrong rather than just curl up and die.
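The suffix rule in the third point can be illustrated with Lua string patterns. This is a sketch under two assumptions: the import bug produces names such as "photo (1).jpg" (with or without the space before the parenthesis), and files in the project Media folder have relative paths beginning "Media\". All function names here are hypothetical.

```lua
-- Hypothetical helpers illustrating the "(n)" suffix rule; names are illustrative.

-- Assumption: project Media folder files are stored as relative paths "Media\...".
function inMediaFolder(path)
  return path:sub(1, 6):lower() == "media\\"
end

-- Remove a trailing "(n)" (optionally preceded by a space) from the file stem,
-- keeping the extension. Returns the stripped name and whether a suffix was found.
function stripNumericSuffix(filename)
  local stem, ext = filename:match("^(.-)(%.[^%.\\/]*)$")
  if not stem then
    stem, ext = filename, ""          -- no extension
  end
  local base = stem:match("^(.-) ?%(%d+%)$")
  if base then
    return base .. ext, true
  end
  return filename, false
end

-- True if both files are in the Media folder and their names differ only by the suffix.
-- Windows file names are case-insensitive, so compare in lower case.
function isImportDuplicatePair(a, b)
  if not (inMediaFolder(a) and inMediaFolder(b)) then
    return false
  end
  local baseA, hasA = stripNumericSuffix(a)
  local baseB, hasB = stripNumericSuffix(b)
  return (hasA or hasB) and baseA:lower() == baseB:lower()
end
```

Restricting the match this tightly means records whose files merely share content but have unrelated names (as in the census example later in the thread) are never auto-merged by this rule.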
Mark Draper
BillH
Megastar
Posts: 2257
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Post by BillH »

I ran the latest version of the plugin and have a lot of potential duplicates. Here is one example.

[image: image.png]

I'm not sure why these would be considered duplicates. The records have different names. The file names are different and they are located in different folders. Help me understand why these would be reported.

Thanks,
Bill
Bill Henshaw
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

The files have different names, but is one a copy of the other that you have renamed? Each row in the output represents a pair of records where the media files have the same content, irrespective of file name.
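Because matching is on content rather than name, two records are paired whenever their files are byte-for-byte identical. A minimal sketch of that idea groups records by their raw file contents. It is not necessarily how this plugin or Jane's compares files, and the record list format ({id=..., path=...}) and function names are assumed for illustration.

```lua
-- Read a whole file as a binary string (nil if it cannot be opened).
-- "*a" is accepted by Lua 5.1 through 5.4.
function readFile(path)
  local f = io.open(path, "rb")
  if not f then return nil end
  local data = f:read("*a")
  f:close()
  return data
end

-- Pair up records whose media files have identical content, whatever their names.
-- records: list of {id=..., path=...} (assumed format); reader defaults to readFile.
function findContentDuplicates(records, reader)
  reader = reader or readFile
  local byContent = {}
  for _, rec in ipairs(records) do
    local data = reader(rec.path)
    if data then
      byContent[data] = byContent[data] or {}
      table.insert(byContent[data], rec.id)
    end
  end

  -- Report every pair within each group of identical files
  local found = {}
  for _, ids in pairs(byContent) do
    for i = 1, #ids - 1 do
      for j = i + 1, #ids do
        found[#found + 1] = { ids[i], ids[j] }
      end
    end
  end
  return found
end
```

Using whole file contents as table keys keeps the sketch short but holds every file in memory at once; comparing file sizes first and only reading files whose sizes match would be a cheaper design for a large media collection.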

Does Jane’s original plugin find the same pairs?
Mark Draper
BillH
Megastar
Posts: 2257
Joined: 31 May 2010 03:40
Family Historian: V7
Location: Washington State, USA

Post by BillH »

Thanks for the explanation Mark.

I always store images of census pages in a folder for the head of the household, and the file name includes the head of household's name. If there is more than one household on the same census page, I save the image twice, once in each head of household's folder. Hence multiple files with the same contents.

I hadn't run Jane's version in years. I was testing yours just because I was curious. I ran the old version just now and it gave 68 duplicates, the same as yours. I didn't check all of them, but the ones I checked were reported by both plugins.

Thanks,
Bill
Bill Henshaw
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

The new version has had a good number of downloads with no new issues raised, so I think it’s ready for publication.

This new plugin replaces the Check for Possible Duplicated Media fork (which has had only three downloads in nearly two months, including mine), so I’ll wait for Mike to delete the redundant plugin before submitting.

The only changes will be to complete the online help file and remove the expiry date (all the prototypes stop working at the end of February to prevent continued use of drafts that could have significant flaws).
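A time-limited prototype needs only a date comparison at start-up. A minimal sketch, assuming the cut-off is the start of 1 March 2024 in local time (the year is inferred from the January 2024 dates in this thread); the names PROTOTYPE_EXPIRY and prototypeExpired are hypothetical.

```lua
-- Assumed cut-off: start of 1 March 2024 (local time), i.e. the end of February.
PROTOTYPE_EXPIRY = os.time({ year = 2024, month = 3, day = 1, hour = 0 })

-- Returns true once the prototype should refuse to run.
-- 'now' can be passed in for testing; it defaults to the current time.
function prototypeExpired(now)
  return (now or os.time()) >= PROTOTYPE_EXPIRY
end
```

A plugin would call prototypeExpired() before doing any work and, if it returns true, display a message and stop; removing the expiry for the published version then means deleting just this check.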
Mark Draper
tatewise
Megastar
Posts: 28436
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Post by tatewise »

I have deleted the Check for Possible Duplicated Media fork.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

Thanks Mike. The new plugin is now available in the Plugin Store.

I’ve had to delete the help file due to the long-standing but uncorrected store bug where it cannot cope with a help file published before the plugin it relates to, but I’ll reload an updated version when I’m back at the desk this evening.
Mark Draper
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

The help file is now updated, and correctly linked to the plugin. It is accessible either directly from the plugin or via the Plugin Store list of help pages. However, it is only visible from the actual plugin page when I'm logged into the store with my author account. Without this, it still shows as no help available and does not display the download count.

This usually corrects itself after a day or two, but I'm never sure whether that's because store admin do something manually or it's an automatic process that runs periodically. I suspect CP don't know either, as it's a long-standing issue that hasn't been fixed (and I get told off here if I report it again :)).
Mark Draper
wellera
Diamond
Posts: 50
Joined: 07 Mar 2011 09:09
Family Historian: V7
Location: Bristol, UK

Post by wellera »

Hi

I've arrived late to the party on this interesting thread....

I just tried the new Plugin (thanks Mike for all your work), but it says it only works on FH Projects. My work isn't in a Project structure (I appreciate folks will have opinions on whether it should be).

Could the Plugin be tweaked to run without this restriction?

Thanks again
Mark1834
Megastar
Posts: 2519
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Post by Mark1834 »

As they say on the BBC, other plugin authors are available… :)

I currently use the project file as a convenient tool for optimising how the plugin runs under WINE/Crossover, but if there is no fundamental reason why it can’t be run on a standalone GEDCOM file (and I can’t think of one off the top of my head), I can probably modify that in the next update.

That may not be for a while, as I’ll give it time to see if any other issues arise. If you want to try the plugin in the meantime, the simplest option would be to just create a temporary project from a copy of your GEDCOM file. It would at least tell you whether there are any issues to be addressed.
Mark Draper
wellera
Diamond
Posts: 50
Joined: 07 Mar 2011 09:09
Family Historian: V7
Location: Bristol, UK

Post by wellera »

Hi Mark

Thanks for the suggestion - I created a temporary Project using my existing GED file.

The plugin ran perfectly and picked up a few duplicates, which I checked and confirmed.

A really useful plugin, thanks!

Andrew