* text wrap
I am trying to devise an efficient text-wrapping algorithm.
Say I have a string of text 2048 characters long and want to chop it into lines of at most 80 characters.
function WRAP(str, plen)
-- plen must be the maximum length to split lines at
local wrap = {}
slen = #tostring(str)
llen = math.floor(slen/plen +.5) -- (this will often be enough, with one line left over) not even sure I need it.
txtstr = 1
while txtstr < slen do
txt = str:sub(txtstr, plen)
--[[characters to split on are ' ' = space, '.' = period, ',' = comma, ':' = colon, '-' = hyphen, '?' = question mark, maybe more]]
-- here is the tricky part that I need help with:
-- searching backwards from the end of txt, find the first occurrence of any of those characters ([' .,:;-?']) (not sure how to write that) and get its index
fnd = that index in the string.
table.insert(wrap, txt:sub(txtstr, fnd))
txtstr= txtstr - (fnd - plen) -- adjust next start to what you didn't take to insert.
end
The issue is that I am unable to code it.
The biggest issue is that someone is already doing this very elegantly and efficiently, and I don't know how to find the code.
Thanks for any help on this.
FH V.6.2.7 Win 10 64 bit
- tatewise
- Megastar
- Posts: 28410
- Joined: 25 May 2010 11:00
- Family Historian: V7
- Location: Torbay, Devon, UK
- Contact:
Re: text wrap
I have done something like that in the Export Gedcom File plugin when deciding how much text will fit in one 250 character line and wrap on the last appropriate non-space character.
Let me dig out that code and adapt it to your requirements tomorrow.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- tatewise
Re: text wrap
You suggested characters to split on are ' ' = space, '.' = a period, ',' = comma, ':' = colon, '-'= hyphen , '?'= question
But consider some text such as:
"The FH v7.0.5 has 12,345 place records with Lat./Long. values such as 52.97894, -0.026577 ..."
Would you want that to wrap on '.' = a period, ',' = comma, '-'= hyphen ?
That might wrap in 7.0.5 or 12,345 or -0.0 which would not be sensible.
Now '.' = a period, ',' = comma, ':' = colon, '?'= question will usually be followed by a space, so they are not needed.
'-'= hyphen might need to wrap in long hyphenated words but that will need a complex conditional test.
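To make the problem concrete, here is a minimal sketch (not taken from any plugin; `lastBreak` is a hypothetical helper name) that finds the last break character in a piece of text with a greedy Lua pattern. It also shows one way to write the "search from the end for any of these characters" step asked about in the first post. With punctuation in the set, the break lands inside a number; with spaces only, it does not.

```lua
-- Return the index of the last character in txt that belongs to the
-- pattern character class `class`, or nil if there is none.
-- The greedy ".*" consumes as much as possible, so the position capture "()"
-- records where the LAST matching character sits.
local function lastBreak(txt, class)
  return txt:match(".*()[" .. class .. "]")
end

local txt = "place records: 12,345"
print(lastBreak(txt, " .,:;?%-"))  -- 18: the comma inside 12,345
print(lastBreak(txt, " "))         -- 15: the space before the number
```

Inside the class the hyphen is written as `%-` so that it is not read as a range.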
So, the function to wrap on space characters only would be as shown below.
The wrapped lines are more even in length if the line length (llen) has the number of lines (lnum) added.
Code: Select all
function WRAP(str, plen)
  str = tostring(str)
  local wrap = {}
  local slen = #str
  local lnum = math.max( 1, math.floor( slen/plen + 0.5 ) )  -- Number of lines (at least 1)
  local llen = math.floor( slen/lnum ) + lnum  -- Length of lines
  repeat
    local len = llen
    local txt = str:sub(1,len)  -- Next text line
    if #str > llen then
      len = txt:find("([^ ]-)$") - 1  -- Find trailing non-spaces
      if len <= 0 then len = llen end  -- No space found, so break at llen
      txt = str:sub(1,len)
    end
    table.insert(wrap,txt)  -- Save wrap line
    str = str:sub(len+1)  -- Extract tail of text
  until #str == 0
  return wrap
end
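As a quick check of the arithmetic, the sketch below evaluates the two formulas from the function above for the 2048-character, 80-column case in the original question. `lineMetrics` is a hypothetical name used only for this illustration.

```lua
-- Line-count and line-length formulas as used in the function above.
local function lineMetrics(slen, plen)
  local lnum = math.floor(slen / plen + 0.5)   -- number of lines
  local llen = math.floor(slen / lnum) + lnum  -- target line length
  return lnum, llen
end

print(lineMetrics(2048, 80))  -- 26   104
```

So for this input the target length is 104 rather than 80: adding lnum evens out the lines, but it also means a line can be longer than plen. If plen has to be a hard maximum, one option would be to use plen itself as the line length; that is an observation from the arithmetic, not something stated in the thread.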
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
Re: text wrap
Yes, absolutely sensible, I was only considering note fields, not any wrap of alphameric 'text' -- space is the gravamen.
But it raises the question: should I gsub \t to a space, and gsub \n to a space, or treat \n as a break regardless of line length?
I guess my lengths will have to be checked against utf8len because I have Norwegian and French relations.
-- get length of utf8 string
function UTF8len(str)
  str = tostring(str or '')  -- nil becomes '' rather than the string 'nil'
  if str == '' then
    return 0
  end
  local isFlag = fhIsConversionLossFlagSet()  -- remember the current flag
  str = fhConvertUTF8toANSI(str)  -- convert to ANSI before measuring
  fhSetConversionLossFlag(isFlag)  -- restore the flag
  return string.len(str)
end -- fn UTF8len
I haven't even considered printing for rnotes since I have not arrived into the 90's yet and am still on 6.7.2
What haven't I thought of, or what am I not considering that will hurt me here? In other words, what social and political faux pas am I about to make?
thanks.
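For comparison, a UTF-8 character count can also be sketched in plain Lua without the FH conversion functions or any extra library. This is a common idiom rather than anything from this thread; `utf8Count` is a hypothetical name, and the sketch assumes the input is valid UTF-8.

```lua
-- Count UTF-8 characters by counting every byte that is NOT a
-- continuation byte (continuation bytes are 128-191, i.e. 0x80-0xBF).
-- No validation is done, so malformed input gives a meaningless count.
local function utf8Count(str)
  str = tostring(str or "")
  local _, count = str:gsub("[^\128-\191]", "")
  return count
end

local word = "bl\195\165b\195\166r"   -- "blåbær" written with byte escapes
print(#word, utf8Count(word))          -- 8   6
```

The byte length is 8 because å and æ each take two bytes, while the character count is 6.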
FH V.6.2.7 Win 10 64 bit
- Mark1834
- Megastar
- Posts: 2511
- Joined: 27 Oct 2017 19:33
- Family Historian: V7
- Location: South Cheshire, UK
Re: text wrap
If I were coding that, I would do it a little differently. I’d just have a loop that located the last space in the first n + 1 characters of the input string (where n is the max line length), and split the line at that point.
It is significantly less code, and easier to follow. It might be slightly less efficient, but whether that is of any practical significance probably depends on whether it has to format 100 strings, or 10,000!
Neither version is right or wrong - just different relative weighting of clarity, efficiency, and technical “elegance”.
Mark Draper
- ColeValleyGirl
- Megastar
- Posts: 5499
- Joined: 28 Dec 2005 22:02
- Family Historian: V7
- Location: Cirencester, Gloucestershire
- Contact:
Re: text wrap
And if I was coding it I'd use the compat53 and utf8 libraries and use utf8.len
Helen Wright
ColeValleyGirl's family history
Re: text wrap
If I was going to use those, I would have to find a compat53.lua and a 'utf8 libraries' in either the store or in knowledge base. I found a kepler on github that required c++ compiling, and a bunch of junk with it. (loadrequire was found in knowledge base, but not as an lua, only as a copy and paste.) It requires an ephemeral or hard to find compat53.lua
after some sloshing around here is the documentation for the 'utf8 libraries' from the link 'Documentation':
https://github.com/Stepets/utf8.lua/blo ... t/test.lua
utf8.len ----- hmmmmmmmm.
I will wait until the api documentation and code has reached its infancy.
You; Helen and Mike are way above my level and think in this stuff it is prima facie intuitive to you........I am but a mere nouveau dilettante in pc coding of any kind.
I am afraid the learning curve to get the length of a UTF8 with several libraries, while not even being able to determine if fh uses luaJIT is simply beyond my meager ken.
love to do it, but yaml and luarocks and sticking c++ code in my libs is a far piece down the road for me.
FH V.6.2.7 Win 10 64 bit
- tatewise
Re: text wrap
If there are possible Norwegian and French foreign characters then the compat53 and utf8 libraries are helpful.
See FHUG KB Lua References and Library Modules and https://github.com/Stepets/utf8.lua e.g. utf8.len().
If there are only a few foreign characters then I suspect ignoring UTF8 will do little harm.
Try my solution without the libraries.
Mark, a loop that located the last space in the first n + 1 characters would be as below but not sure it is any clearer:
Code: Select all
local len = llen
if #str > llen then
  while str:sub(len,len) ~= " " do  -- Find last space
    len = len - 1
    if len <= 0 then  -- Cater for no spaces
      len = llen
      break
    end
  end
end
local txt = str:sub(1,len)
table.insert(wrap,txt)  -- Save wrap line
str = str:sub(len+1)  -- Extract tail of text
Ron, layout characters such as \t and \n should be considered.
It would be reasonable to convert \t to a single space.
It would be best to retain \n and wrap each line separately as shown below with a few refinements.
Code: Select all
function WRAP(str, plen)
  str = tostring(str)
  local wrap = {}
  local lines = {}
  str:gsub( "([^\n]+)", function(txt) lines[#lines+1] = txt end )  -- Split lines
  for _, str in ipairs (lines) do
    local slen = #str
    local lnum = math.ceil( slen/plen + 0.5 )  -- Number of lines
    local llen = math.ceil( slen/lnum + 0.5 ) + lnum  -- Length of lines
    repeat
      local len = llen
      local txt = str:sub(1,len)  -- Next text line
      if #str > llen then
        len = txt:find("([^ ]-)$") - 1  -- Find trailing non-spaces
        if len <= 0 then len = llen end
        txt = str:sub(1,len)
      end
      table.insert(wrap,txt)  -- Save wrap line
      str = str:sub(len+1)  -- Extract tail of text
    until #str == 0
  end
  return wrap
end
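One detail of the line split above: the pattern "([^\n]+)" only matches non-empty runs, so blank lines between paragraphs are dropped. If they need to survive, a pre-processing step along these lines could be used. It is a sketch only; `splitLines` is a hypothetical name, and converting CR/LF pairs is an extra assumption about the input.

```lua
-- Normalise layout characters, then split on "\n" while keeping empty lines.
local function splitLines(str)
  -- CR/LF pairs become "\n" and each tab becomes a single space
  str = tostring(str):gsub("\r\n", "\n"):gsub("\t", " ")
  local lines = {}
  -- Appending "\n" lets the pattern capture the final line as well
  for line in (str .. "\n"):gmatch("(.-)\n") do
    lines[#lines + 1] = line
  end
  return lines
end

for i, line in ipairs(splitLines("first\tline\r\n\r\nsecond line")) do
  print(i, "[" .. line .. "]")
end
```

Each non-empty entry could then be passed to the wrap function, with empty entries copied straight to the output to preserve paragraph breaks.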
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: text wrap
Ron Melby wrote: If I was going to use those, I would have to find a compat53.lua and a 'utf8 libraries' in either the store or in knowledge base),
All you need do is loadrequire them -- I've done all the hard work with Luarocks etc. and Jane has hosted them. As documented at Lua References and Library Modules (and already pointed to by Mike):
Code: Select all
loadrequire("utf8")
loadrequire("compat53")
utf8 = require(".utf8"):init()
Helen Wright
ColeValleyGirl's family history
Re: text wrap
I used loadrequire and it got compat53, and got utf8 with no utf8.len that I can find
there is an artifact copy of util in utf8.
is it one of those, I have to use it before I know whats in it? it grabs modules as I use them?
FH V.6.2.7 Win 10 64 bit
- tatewise
Re: text wrap
Have you added all three lines that Helen posted?
utf8 = require(".utf8"):init() is crucial. Without that there will be no utf8.len()
There should be a global table called utf8 and in that table will be len and char etc...
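If a plugin has to run whether or not the library is installed, the load can be wrapped in pcall. The sketch below assumes the require/init sequence quoted in this thread and falls back to string.len (plain byte length) when that fails; `getLengthFunction` is a hypothetical name.

```lua
-- Try the extended utf8 library; fall back to byte length if it is missing.
local function getLengthFunction()
  local ok, lib = pcall(function()
    return require(".utf8"):init()   -- load sequence as quoted in this thread
  end)
  if ok and type(lib) == "table" and type(lib.len) == "function" then
    return lib.len, "utf8 library"
  end
  return string.len, "string.len fallback"
end

local charLen, source = getLengthFunction()
print(source, charLen("abcd"))
```

The second return value only reports which path was taken, which helps when checking whether the library was actually found.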
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: text wrap
Did you use the up-to-date version of loadrequire at Module Require With Load that handles subdirectories?
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: text wrap
You're right Mike - the way you have implemented my comment is not particularly clear, but that is not what I would do!
Not necessarily fully bomb-proofed against all strings that could upset it, but I'm just making the point that sometimes simple is perfectly adequate. It's not "better" than more sophisticated versions, but it is easier for just about any coder to understand, no matter what their expertise level. Too much coding here goes off at the deep end when it's not necessary - just because an experienced author knows how to make it complicated doesn't mean that they should!
As a test of how practical it is, I converted a copy of my main GEDCOM file to one long 2 MB string in Notepad++ by converting all the CR/LF combinations to spaces. A simple plugin using this code parsed it into about 33,000 80 character lines in about 10 seconds of processing. That's good enough for me!
Code: Select all
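function wrap(S, LineLength)
  local T = {}
  while S:len() > LineLength do
    local j
    for i = 1, LineLength, 1 do
      if S:sub(i, i) == ' ' then j = i end  -- Remember the last space seen
    end
    if j then
      table.insert(T, S:sub(1, j - 1))  -- Line up to the last space
      S = S:sub(j + 1)  -- Drop the space itself
    else
      table.insert(T, S:sub(1, LineLength))  -- No space found: hard break
      S = S:sub(LineLength + 1)
    end
  end
  table.insert(T, S)
  return T
end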
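On the efficiency point: each pass of the loop above re-slices the remaining text with S:sub(j + 1), which builds a new copy of the rest of the string for every output line. A variant that keeps a start index and only slices a small window is sketched below. It has not been timed against the 2 MB test described in this post; `wrapByIndex` is a hypothetical name, and the window of n + 1 characters follows the approach described earlier in the thread.

```lua
-- Wrap on the last space within the first lineLength + 1 bytes of what is left,
-- tracking a start index instead of repeatedly copying the remaining text.
local function wrapByIndex(s, lineLength)
  local lines = {}
  local start, n = 1, #s
  while n - start + 1 > lineLength do
    local window = s:sub(start, start + lineLength)  -- lineLength + 1 bytes
    local cut = window:match(".*() ")                -- index of last space, or nil
    if cut then
      lines[#lines + 1] = window:sub(1, cut - 1)     -- text before the space
      start = start + cut                            -- skip past the space
    else
      lines[#lines + 1] = window:sub(1, lineLength)  -- no space: hard break
      start = start + lineLength
    end
  end
  lines[#lines + 1] = s:sub(start)                   -- whatever remains
  return lines
end

for _, line in ipairs(wrapByIndex("the quick brown fox", 10)) do
  print(line)
end
-- the quick
-- brown fox
```

Like the loop version, this counts bytes, so a hard break could in principle fall inside a multi-byte UTF-8 character when a line has no spaces at all.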
Mark Draper
- tatewise
Re: text wrap
That ...\Plugins\utf8 folder contents looks just like the FH v7.0 equivalent, so the utf8 library has been loaded and the compat53 library is listed too.
In Tools > Plugins use the New... option and enter the following statements into the New plugin window:
local utf8 = require('.utf8'):init()
local leng = utf8.len('abcd')
print(utf8,leng)
Use Go to run them and it should print something like table: 0435FE80 followed by 4 in the lower-left pane (the table address will vary), to show the utf8 table exists and utf8.len() is working.
Does that work or do you get an error message?
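To see what a library table actually contains without hunting for documentation, its members can simply be listed. The sketch below is demonstrated on the standard string table so that it runs anywhere; `listMembers` is a hypothetical name. After the require/init statements above, the utf8 table could be passed in the same way.

```lua
-- Return the sorted names and value types found in a library table.
local function listMembers(lib)
  local names = {}
  for name, value in pairs(lib) do
    names[#names + 1] = tostring(name) .. " (" .. type(value) .. ")"
  end
  table.sort(names)
  return names
end

-- Demonstrated on the standard string library.
for _, entry in ipairs(listMembers(string)) do
  print(entry)
end
```

This only shows names and types, not arguments or return values, so it complements the documentation rather than replacing it.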
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: text wrap
Minor correction to Mike's suggested test code:
require will work here because you've already downloaded the two libraries.
Code: Select all
require("utf8")
require("compat53")
local utf8 = require('.utf8'):init()
local leng = utf8.len('abcd')
print(utf8,leng)
Helen Wright
ColeValleyGirl's family history
- tatewise
Re: text wrap
Helen, I tested my suggested statements in FH v6.2.7 and they work fine without those require() statements.
For the purposes of testing the existence of utf8.len() they are sufficient.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: text wrap
I wouldn't dare question that you'd tested it, Mike. However, for the benefit of anyone who finds this topic, I believe it's best to be consistent about the sequence needed to fully exploit the extended utf8 library.
Helen Wright
ColeValleyGirl's family history
Re: text wrap
maybe I am missing something very fundamental here. but I am not.
where is the documentation of utf8.whatevers and how to use them?
where is the code? if its only binaries, and since I don't know good old Steve-o, we are at the
end of any discussion of using this scrap, because it looks flaky to me.
what table is it in?
what is the address of utf8.invisible divided by utf8.imadethisup and multiplied by utf8.blks?
apparently this library is a dessert topping and a floor wax, since you can name anything anything you need, and it does what you want, since no documentation exists in human readable format, as I have sent you the link I got for that documentation. And both of you are talking magic.
let's start simple, where is the function utf8.len and its args, and rtx values?
utf8.dll is in compat53. whats in it? how do I use it? the api is not documented in the link for documentation.
utf8.whatareyoutalkingabout?
Last edited by Ron Melby on 06 Jun 2021 11:21, edited 1 time in total.
FH V.6.2.7 Win 10 64 bit
- tatewise
Re: text wrap
Ron, the utf8 documentation is at https://github.com/Stepets/utf8.lua which displays the README guide.
Under Usage it says "It also provides all functions from Lua 5.3 UTF-8 module except utf8.len (s [, i [, j]])"
Click on that module link and it takes you to the Lua 5.3 UTF-8 documentation:
https://www.lua.org/manual/5.3/manual.html#6.5 that describes all the functions.
I don't understand why it says "except utf8.len (s [, i [, j]])" which seems to work the same in FH v6.2 and FH v7.0.
In fact, I cannot get the parameters i and j to work in either. They are ignored completely.
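For reference, the Lua 5.3 manual defines utf8.len(s, i, j) as the number of characters that start between byte positions i and j, with defaults i = 1 and j = -1. If that range behaviour is needed while the library ignores the parameters, it can be approximated in plain Lua. This is a sketch under the assumption of valid UTF-8 input, with no error reporting; `lenRange` is a hypothetical name.

```lua
-- Count characters whose first byte lies between byte positions i and j.
-- Mirrors the documented defaults (i = 1, j = -1) but does NOT validate input.
local function lenRange(s, i, j)
  local piece = s:sub(i or 1, j or -1)
  -- Every byte outside 128-191 starts a new character
  local _, count = piece:gsub("[^\128-\191]", "")
  return count
end

local s = "a\195\165b"        -- "aåb": 4 bytes, 3 characters
print(lenRange(s))            -- 3
print(lenRange(s, 2, 3))      -- 1 (only the å starts in bytes 2..3)
print(lenRange(s, 3))         -- 1 (byte 3 is a continuation byte; b starts at 4)
```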
For compat53 the documentation is at https://github.com/keplerproject/lua-compat-5.3.
Under What's implemented it explains what it offers.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
- ColeValleyGirl
Re: text wrap
Yes, you are. You're not reading what you're directed to.
Follow the documentation links that Mike has repeatedly linked to (they're also in the KnowledgeBase).
The code for the stepets library is at https://github.com/Stepets/utf8.lua -- it's a pure lua library, so the code is available to you.
Ron Melby wrote: utf8.dll is in compat53. whats in it? how do I use it? the api is not documented in the link for documentation.
The documentation is linked to from the stepets library. utf8.dll is part of lua 5.3 backported to 5.1, so not flaky at all. It's a c library, so you can't inspect the code (or you can but you may not understand it), but then it's the same for the code that makes up lua but you seem happy to use it.
By now, I'd hope you had come to understand that Mike rarely leads you astray; I certainly shan't be doing so in future because I'm out-of-here -- I have less patience than Mike with people who don't listen to what they're told and then complain things don't work.
Helen Wright
ColeValleyGirl's family history
Re: text wrap
mike, ok, well that is the long way around the barn.
since I am not writing to the gedcom much as of yet, I have the code here, and it looks like when fh 7 gets some more of its bugs out if I get the upgrade, then this sort of thing is wrapped in.
I think my UTF8len works sufficiently to write displayable text.
helen, this link was provided but no further instructions: https://github.com/Stepets/utf8.lua
Now, I want you to go down to the readme there, and find the link 'Documentation' and click on it.
Oh, foolish me to believe that link pointed to something akin to explication.
It is only after mike provided additional USEFUL information did I understand the situation. and that Documentation does not mean Documentation.
FH V.6.2.7 Win 10 64 bit
- ColeValleyGirl
Re: text wrap
Ron Melby wrote: ↑06 Jun 2021 11:50 helen, this link was provided but no further instructions: https://github.com/Stepets/utf8.lua
Now, I want you to go down to the readme there, and find the link 'Documentation' and click on it.
The readme is the documentation. That's what a readme file is. Which is why it's linked as Online Documentation from the KB article on libraries. https://en.wikipedia.org/wiki/README
Similarly the KB links to the readme file for compat53.
readme files may be outwith your previous experience, but you could at least read them!
Helen Wright
ColeValleyGirl's family history
- Mark1834
Re: text wrap
Without wishing to take sides in this squabble, I am sympathetic to the view that useful documentation of some of the more advanced features of Lua is very thin on the ground. It’s a highly technical subject, and most of the stuff I’ve seen is written by experts for other experts. There’s very little tutorial type material around compared with say Python.
Fortunately, the overwhelming majority of FH users don’t have to worry about it. Only a small subset of users write plugins, and only a small subset of that subset would go beyond the standard libraries and FH API.
We’ve come a long way from the original question of how to parse a text string... .
Mark Draper