* Error when renaming file with accented characters
I have a plugin that changes the names of the media linked to a selection of sources to the auto title of the source.
Works very well but not with accented characters. As soon as ë or é etc. are in the title os.rename throws an error.
Correcting it manually is no problem (the filename, media name and auto title all support accented characters).
Any suggestions on how to fix this are very welcome.
- ColeValleyGirl
- Megastar
- Posts: 4853
- Joined: 28 Dec 2005 22:02
- Family Historian: V7
- Location: Cirencester, Gloucestershire
- Contact:
Re: Error when renaming file with accented characters
See https://stackoverflow.com/questions/541 ... unicode-ch for some ideas.
The winapi library would be the best solution, but it's not been made available by Calico Pie (and when I asked some years ago, they were unwilling to do so). If, however, you're writing for your own personal use you could consider it: https://stackoverflow.com/a/36719757/1943174
Helen Wright
ColeValleyGirl's family history
- tatewise
- Megastar
- Posts: 27080
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Re: Error when renaming file with accented characters
Yes, that is a tricky issue and is only likely to get worse as more accented characters get used in filenames.
I assume that your Plugin File > Encoding is UTF-8
The problem is that Lua file handling tools like os.rename and the lfs library only understand ANSI.
So, use the fhConvertUTF8toANSI(...) function on the filename before supplying to the os.rename function.
If the names include UTF-8 characters outside the ANSI set then Helen's solution is an option and another is:
https://github.com/cloudwu/luawinfile luawinfile has UTF-8 filename alternatives for lfs and Lua functions.
I raised that with CP last June/July under Unicode in Folder/File Paths in Plugins log #138290.
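A minimal sketch of the conversion step described above, assuming the Plugin runs inside Family Historian where fhConvertUTF8toANSI is defined. The helper name renameUtf8 is hypothetical, and the identity fallback is only there so the snippet can be tried outside FH.

```lua
-- Hypothetical helper, not part of the FH API.
-- Inside Family Historian, fhConvertUTF8toANSI is a global function.
-- Outside FH it is absent, so fall back to an identity function
-- (assumption: the fallback is for trying the sketch only).
local toAnsi = fhConvertUTF8toANSI or function(text) return text end

-- Rename a file whose old and new paths are held as UTF-8 strings.
-- Returns true on success, or nil plus the error message from os.rename.
local function renameUtf8(oldPathUtf8, newPathUtf8)
  local ok, err = os.rename(toAnsi(oldPathUtf8), toAnsi(newPathUtf8))
  if not ok then
    return nil, err
  end
  return true
end
```

Only the strings handed to os.rename are converted; the UTF-8 originals stay untouched for anything that goes back into FH.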
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Error when renaming file with accented characters
Indeed tricky.
I now have os.rename working with Helen's solution (https://stackoverflow.com/a/36719757/1943174) as well as Mike's (fhConvertUTF8toANSI). As far as I can see both conversion results are the same.
The tricky thing is that I need to store the filename in the media record without the ANSI conversion, and when using it to test whether the file exists (io.open) I need to convert it before the test. Saving the ANSI filename in the media record doesn't work and will also break the link. Don't really understand why. But it works and that's something.
Thanks Helen and Mike.
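A rough sketch of that pattern: keep the UTF-8 name for the media record and convert only at the moment a Lua file function is called. The name fileExistsUtf8 and the example path are made up for illustration, and the identity fallback is an assumption so the snippet also runs outside FH.

```lua
-- Inside FH this is the real conversion function; the fallback is an
-- assumption for running the sketch elsewhere.
local toAnsi = fhConvertUTF8toANSI or function(text) return text end

-- Hypothetical helper: test whether a file exists, given its UTF-8 path.
local function fileExistsUtf8(pathUtf8)
  local handle = io.open(toAnsi(pathUtf8), "rb")
  if handle then
    handle:close()
    return true
  end
  return false
end

-- The UTF-8 string is what would be stored in the media record;
-- the converted string only lives for the duration of the file call.
local recordPathUtf8 = "Media\\Caf\195\169.jpg"  -- example value only
if not fileExistsUtf8(recordPathUtf8) then
  print("File not found: " .. recordPathUtf8)
end
```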
- tatewise
Re: Error when renaming file with accented characters
The explanation is entirely down to the character encoding.
The binary code used to represent characters depends on the character encoding.
ASCII encoding defines 7-bit numbers for the popular characters 0-9, A-Z, a-z, and most symbols on a keyboard.
Most other encodings include these initial 128 characters.
With ANSI encoding, the character encoding increases to an 8-bit number, i.e. one byte.
The extra 128 characters provide some accented letters and extra symbols.
However, different countries have variants that substitute different characters.
Unicode with its UTF-8 and UTF-16 encodings is a worldwide standard for characters from every language and a vast array of symbols but requires more bits to represent them.
So the binary codes for ë and é in ANSI are different from their codes in UTF-8.
Internally in all its records including Media FH uses UTF-8.
Lua functions such as os.rename require ANSI codes.
So in Plugins, you must convert from one to the other.
But even that will fail if the Media record has filename UTF-8 characters that are not supported by ANSI because no conversion is possible and a ? will get substituted.
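To make the byte difference concrete, here is a small pure-Lua sketch. It is not how fhConvertUTF8toANSI works internally; it is a simplified converter (hypothetical name utf8ToCp1252Subset) that handles only ASCII and the code points U+00A0 to U+00FF, where the Code Page 1252 byte equals the code point. Everything else, including characters such as the euro sign that Code Page 1252 does support, is deliberately turned into ? to keep the example short.

```lua
-- Convert UTF-8 text to Code Page 1252 bytes, for a subset of characters only.
local function utf8ToCp1252Subset(text)
  local out = {}
  local i = 1
  while i <= #text do
    local b1 = text:byte(i)
    local b2 = text:byte(i + 1)
    if b1 < 0x80 then
      -- ASCII: the same single byte in both encodings.
      out[#out + 1] = string.char(b1)
      i = i + 1
    elseif b1 >= 0xC2 and b1 <= 0xDF and b2 and b2 >= 0x80 and b2 <= 0xBF then
      -- Two-byte UTF-8 sequence: rebuild the code point from both bytes.
      local codePoint = (b1 - 0xC0) * 64 + (b2 - 0x80)
      if codePoint >= 0xA0 and codePoint <= 0xFF then
        out[#out + 1] = string.char(codePoint)
      else
        out[#out + 1] = "?"
      end
      i = i + 2
    else
      -- Longer or malformed sequence: emit one "?" and skip the
      -- continuation bytes that follow the lead byte.
      out[#out + 1] = "?"
      i = i + 1
      while i <= #text and text:byte(i) >= 0x80 and text:byte(i) <= 0xBF do
        i = i + 1
      end
    end
  end
  return table.concat(out)
end

-- "é" is two bytes (195, 169) in UTF-8 and one byte (233) in Code Page 1252.
print(("caf\195\169"):byte(1, -1))
print(utf8ToCp1252Subset("caf\195\169"):byte(1, -1))
```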
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Error when renaming file with accented characters
Clear explanation.
Thanks Mike!
- tatewise
Re: Error when renaming file with accented characters
I have been looking at http://stevedonovan.github.io/winapi/api.html.
Exactly how did you implement that winapi solution and what functions did you utilise?
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Error when renaming file with accented characters
Mike,
I didn't use the api, but used the conversion table in the stackoverflow example starting with: utf8_to_cp1252 = (
function(cp1252_description)....
That gives me the same results as using fhConvertUTF8toANSI.
Yesterday I published the updated version (1.2) of Rename Selected Source Media in the plugin store. Both approaches are in the plugin code, but it uses fhConvertUTF8toANSI; the other one is left in there for reference only.
- tatewise
Re: Error when renaming file with accented characters
Understood, but your Plugin needs to warn users that where any Unicode (UTF-8) character in the Source Title does not translate to ANSI Code Page 1252 then a question mark (?) will be substituted in the filename.
If you are interested in eliminating that exception then request CP to support either the luawinfile or winapi library.
My preference is for the luawinfile library that provides substitutes for the Lua os library and lfs module.
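One possible way to detect the substitution so the Plugin can warn about it, sketched under the assumption that a lossy conversion shows up as extra question marks. The function names are hypothetical. Note that it will not catch accented letters that are quietly mapped to plain letters.

```lua
-- Count literal question marks in a string.
local function countQuestionMarks(text)
  local _, count = text:gsub("%?", "")
  return count
end

-- Hypothetical check: true when the converted text contains more "?"
-- characters than the UTF-8 original, i.e. something was substituted.
local function hasSubstitution(originalUtf8, convertedAnsi)
  return countQuestionMarks(convertedAnsi) > countQuestionMarks(originalUtf8)
end
```

In a Plugin the second argument would be the result of fhConvertUTF8toANSI on the first, and a true result would trigger the warning.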
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: Error when renaming file with accented characters
Mike, luawinfile isn't available in Luarocks, doesn't document its dependencies, and there only appears to be a C module available. Have you found a dll anywhere, and documented dependencies? Without those, it's a non-starter I believe. If you've used it, do you have a dll and where did you get it from?
winapi is in Luarocks (so can be compiled) and the package documents its dependencies. It has a wider set of functions, including UTF-16 support, as well as registry access functionality, process and window handling, and some others.
I'd be willing to put in the work to get winapi supported, but there's no starting point for luawinfile... And just asking for a library without doing the prep work is not going to get any priority attention.
Helen Wright
ColeValleyGirl's family history
- tatewise
Re: Error when renaming file with accented characters
It seems luawinfile has sunk without trace. The links I had no longer work. I never had any dll. So winapi is the solution.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Error when renaming file with accented characters
@Mike I'll put the warning in the plugin and also check for when it happens.
@Helen/@Mike Can I help to get winapi supported by CP and if so how?
- ColeValleyGirl
Re: Error when renaming file with accented characters
JoopvB, I'll get the library compiled and post it here for testing, if you (and maybe Mike) would be willing to test it. Once we know it works, I can raise a request with Calico Pie and mention you (and Mike) as supporting it, and we'll see what they say.
Helen Wright
ColeValleyGirl's family history
- ColeValleyGirl
Re: Error when renaming file with accented characters
I can't attach the file because it's a dll, but here's a zip:
It needs to be dropped into C:\Program Files (x86)\Family Historian\Program\Lua and required by
    require('winapi')
Documentation at http://stevedonovan.github.io/winapi/api.html
UTF8 filenames: refer to the short_path function; there's an example piece of code using it.
Helen Wright
ColeValleyGirl's family history
Re: Error when renaming file with accented characters
Can't get it from Dropbox either; zip will do.
- ColeValleyGirl
Re: Error when renaming file with accented characters
It hasn't finished synching. Try the zip attached.
Helen Wright
ColeValleyGirl's family history
- tatewise
Re: Error when renaming file with accented characters
I have extracted and copied winapi.dll to
C:\Program Files (x86)\Family Historian\Program\Lua\winapi.dll Created: 27 Feb 2021, 17:37:44 Size: 41.0 KB (41,984 bytes)
It has the same Security Permissions as other similar dll.
But require('winapi') says:
An error has occurred - plugin failed to complete
error loading module 'winapi' from file 'C:\Program Files (x86)\Family Historian\Program\Lua\winapi.dll':
The specified module could not be found.
Does it need something else like winapi.lua or a winapi folder?
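As an aside, a load failure like this one can be caught instead of stopping the Plugin. The sketch below wraps require in pcall, which is standard Lua; the helper name tryRequire is hypothetical, and what to do when winapi is missing is left as an assumption (here, just a message).

```lua
-- Hypothetical helper: load a module if possible.
-- Returns the module, or nil plus the error message from require.
local function tryRequire(moduleName)
  local ok, result = pcall(require, moduleName)
  if ok then
    return result
  end
  return nil, tostring(result)
end

local winapi, loadError = tryRequire("winapi")
if not winapi then
  -- Assumption: a Plugin would fall back to the ANSI conversion route here.
  print("winapi not available: " .. loadError)
end
```

This does not fix the missing dependency; it only lets the Plugin carry on with another approach.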
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: Error when renaming file with accented characters
what size is your winapi.dll?
it needs static linking with some other dlls not dynamic linking
some I remember are:
msvcrt.dll (must be the same version)
libgcc_s_dw2.dll
lua5.1-32.dll
it has been so long since I read about it, it might need some googling.
FH V.6.2.7 Win 10 64 bit
- ColeValleyGirl
Re: Error when renaming file with accented characters
I've had it working in the past. I'll have a look at the problem tomorrow
Helen Wright
ColeValleyGirl's family history
- ColeValleyGirl
Re: Error when renaming file with accented characters
OK, or rather, not OK.
It doesn't look as if the author of winapi (Steve Donovan) has updated it for Lua 5.3.
Steve is also the original author of the Penlight libraries, for which maintenance seems to have been taken over by Thijs Schreijer/Tieske, so I'm not confident that winapi will be updated (last updates were in 2014, before 5.3 was released). Steve's attention seems to have transferred to Rust (not the metallic kind).
There are a load of forks for winapi, but none by names I recognise or trust. I can also find a binary (dll) which purports to be Lua 5.3 compatible, but that doesn't work either.
We may be at a dead end -- unless we have a C programmer amongst us who is willing to fork the original code and update it for 5.3...
Which is a pity, because it's an incredibly useful library.
Helen Wright
ColeValleyGirl's family history
Re: Error when renaming file with accented characters
Helen, thanks for looking into it. Would have been nice, but fhConvertUTF8toANSI will do for most accented characters.
I think I'll change the plugin to replace "invalid" characters with '~' and report it in the result set.
- tatewise
Re: Error when renaming file with accented characters
It seems that being able to support UTF-8 filenames in Lua is fated to fail. That is a shame and surprising.
Am I correct in thinking the winapi.dll should work with FH V6.2?
Replacing ? with ~ is necessary because ? is not an allowed character in Windows filenames.
Presumably, you are doing the same for all the other disallowed characters \ / : * " < > |
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: Error when renaming file with accented characters
winapi does work with FH 6.2, or at least it did some time ago. I used it to access UTF-16 files and also to do window handling when FH hid popup windows behind the main program window. Only for private use, of course.
Helen Wright
ColeValleyGirl's family history
Re: Error when renaming file with accented characters
@Mike
When using fhConvertUTF8toANSI it converts the characters in the range (0-255) as expected and the plugin reports the conversion. For characters outside this range fhConvertUTF8toANSI will convert to something as close as possible (according to the help; I tested it and it's doing a nice job). The plugin reports the conversion and I've put in a warning as well.
- tatewise
Re: Error when renaming file with accented characters
I presume by "characters in the range (0-255)" you mean the UTF-8 characters derived from ANSI codes 0 - 255.
Only the codes 0 - 127 represent the same characters in both UTF-8 and ANSI.
The ANSI characters with codes 128 - 255 have completely different codes in UTF-8.
Yes, the fhConvertUTF8toANSI function does its best to convert UTF-8 characters to similar ANSI characters.
Except for exact ANSI equivalents, all forms of accented A become plain A, all forms of accented b become plain b, and so on.
But there are plenty of UTF-8 characters where there is no such obvious mapping and they do become ?.
It is necessary to convert all disallowed filename characters \ / : * ? " < > | to a valid character such as ~.
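A short sketch of that replacement using a Lua character class. The function name sanitiseFileName is hypothetical, and it is meant for the file name part only, since applying it to a full path would also replace the folder separators and the drive colon.

```lua
-- Replace every character that Windows disallows in a file name
-- ( \ / : * ? " < > | ) with a tilde.
local function sanitiseFileName(name)
  -- gsub returns two values; keep only the resulting string.
  local cleaned = name:gsub('[\\/:*?"<>|]', "~")
  return cleaned
end

print(sanitiseFileName('Smith: Birth? "1901"'))
```

Running it after the UTF-8 to ANSI conversion means any ? substituted by the conversion is caught by the same pass.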
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry