* patterns: finding the nil where positional parameters are important

For plugin authors to discuss plugin programming
Ron Melby
Megastar
Posts: 928
Joined: 15 Nov 2016 15:40
Family Historian: V6.2

patterns: finding the nil where positional parameters are important

Post by Ron Melby »

Code: Select all

 
 loc = 'Bruce,, Rusk, WI, USA'
--[[ another:]]
 -- loc = ', Valley, Marshall, MN, USA'

  local FLD = {}
  local idx = 0
  for field in loc:gmatch('([^,]+)') do
    if field == nil then field = ' ' end
    idx = idx + 1
    FLD[idx] = field:gsub('^ ', '') or ' ' 
  end
**NB: ' ' is a space character
My ADDR and PLAC fields are comma-separated, and the position of each field matters.
loc is:
City, township, county, state, nation
As you can see, I do not have the township for Bruce, but I need to record that it is missing by placing a space character in FLD[2].
I have tried everything I can think of (I am not good at patterns) to capture this case, but to no avail.
Thanks.
Last edited by Ron Melby on 12 Oct 2023 13:33, edited 1 time in total.
FH V.6.2.7 Win 10 64 bit
ColeValleyGirl
Megastar
Posts: 5520
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: patterns: finding the nil where positional parameters are important

Post by ColeValleyGirl »

I'm no good at patterns either, but have you tried splitting the string on commas, saving the results in an array, and testing the values in the array? It might not be elegant, but it should work.
Ron Melby
Megastar
Posts: 928
Joined: 15 Nov 2016 15:40
Family Historian: V6.2

Re: patterns: finding the nil where positional parameters are important

Post by Ron Melby »

That's what I think I am doing in that code.
FH V.6.2.7 Win 10 64 bit
ColeValleyGirl
Megastar
Posts: 5520
Joined: 28 Dec 2005 22:02
Family Historian: V7
Location: Cirencester, Gloucestershire

Re: patterns: finding the nil where positional parameters are important

Post by ColeValleyGirl »

http://lua-users.org/wiki/SplitJoin has a discussion of the problems and some suggestions for code that will work with empty values. Your code treats multiple separator characters as a single one.
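
To see that effect on the sample string, here is a minimal sketch that runs the original '([^,]+)' pattern. The collect helper is hypothetical, added only for illustration:

```lua
-- Hypothetical helper for illustration: gather every gmatch capture into a table.
local function collect(s, pattern)
  local out = {}
  for field in s:gmatch(pattern) do
    out[#out + 1] = field
  end
  return out
end

local loc = 'Bruce,, Rusk, WI, USA'

-- '+' needs at least one non-comma character, so the empty township
-- between the two adjacent commas never produces a match.
local fields = collect(loc, '([^,]+)')
print(#fields)                    -- 4, not 5
print('[' .. fields[2] .. ']')    -- [ Rusk]: the county has slid into the township slot
```

Because gmatch only ever returns successful matches, the loop variable is never nil, which is why the `field == nil` test in the original loop cannot catch the missing township.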
Mark1834
Megastar
Posts: 2535
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: patterns: finding the nil where positional parameters are important

Post by Mark1834 »

The key is being able to find zero occurrences. In Lua patterns, + means one or more matches, while zero or more matches is *. The following code works if there are no spaces either side of the comma delimiters, which is probably simpler and more elegant than having spaces after the commas:

Code: Select all

loc = 'Bruce,,Rusk,WI,USA'

city, township, county, state, nation = loc:match('([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)')

print(city, township, county, state, nation)
Mark Draper
jelv
Megastar
Posts: 646
Joined: 03 Feb 2020 22:57
Family Historian: V7
Location: Mere, Wiltshire

Re: patterns: finding the nil where positional parameters are important

Post by jelv »

Spaces are likely to be after the commas. This discards those if present:

Code: Select all

loc = 'Bruce,, Rusk, WI, USA'

city, township, county, state, nation = loc:match('([^,]*), *([^,]*), *([^,]*), *([^,]*), *([^,]*)')

print(city, township, county, state, nation)
John Elvin
Mark1834
Megastar
Posts: 2535
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: patterns: finding the nil where positional parameters are important

Post by Mark1834 »

A good enhancement. I would probably use %s rather than a literal space, as it is easier to proofread, and as a belt-and-braces check, also define the start and end of the string to ensure that there are no additional fields lurking.

Code: Select all

city, township, county, state, nation = loc:match('^([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*)$')
Mark Draper
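
As a rough illustration of what the ^ and $ anchors buy, the sketch below runs the same pattern with and without them on a six-field string. The sample string and the names loose and strict are made up for the example:

```lua
local loose  = '([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*)'
local strict = '^' .. loose .. '$'

-- Made-up sample with one field too many.
local extra = 'Bruce, Big Bend, Rusk, WI, USA, Earth'

-- Unanchored: the first five fields match and the sixth is silently dropped.
local city, township, county, state, nation = extra:match(loose)
print(city, township, county, state, nation)

-- Anchored: the whole string must fit the five-field layout, so match returns nil.
print(extra:match(strict))
```

A nil result is then a clear signal that the input does not have exactly five fields.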
Ron Melby
Megastar
Posts: 928
Joined: 15 Nov 2016 15:40
Family Historian: V6.2

Re: patterns: finding the nil where positional parameters are important

Post by Ron Melby »

Perusing the link that ColeValleyGirl gave me (and understanding only one line of code in the whole thing), it appears that a straightforward solicitation of the occurrence is de minimis:

Code: Select all

  local FLD = {}
  local idx = 0
  local str = 1  -- start of the trailing field; stays 1 if loc contains no commas
  for field, pos in loc:gmatch('(.-),()') do 
    idx = idx + 1
    if field == '' then field = ' ' end
    FLD[idx] = trim(field)
    str = pos
  end
  FLD[#FLD + 1] = string.sub(loc, str)
*NB: trim (field)
if field == ' ' then return field end
yadda yadda yadda blah blah blah

thanks for all the ideas, I will use them here and there.
FH V.6.2.7 Win 10 64 bit
jelv
Megastar
Posts: 646
Joined: 03 Feb 2020 22:57
Family Historian: V7
Location: Mere, Wiltshire

Re: patterns: finding the nil where positional parameters are important

Post by jelv »

Using Mark's suggestion you can achieve the exact equivalent in two lines:

Code: Select all

local FLD = {}

FLD[1], FLD[2], FLD[3], FLD[4], FLD[5] = loc:match('^([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*)$')
[Attachment: SplitCSV.png]
John Elvin
tatewise
Megastar
Posts: 28488
Joined: 25 May 2010 11:00
Family Historian: V7
Location: Torbay, Devon, UK

Re: patterns: finding the nil where positional parameters are important

Post by tatewise »

Sorry I'm late to this discussion but I've been away from my PC and unable to double-check my solution.

In the OP sample script replace the for loop statement with:

for field in (loc .. ','):gmatch('([^,]-),') do

That uses the pattern '([^,]-),' which matches any non-comma string including the empty string followed by a comma.
It will cope with adjacent commas but needs a trailing comma added to the input text to match the last field.
Unlike some other solutions, it will work for any number of comma-separated fields.

Unlike Ron's latest solution, it does not need a special fix to extract the trailing field.
Mike Tate ~ researching the Tate and Scott family history ~ tatewise ancestry
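
Putting that loop into a function might look like the sketch below. The name split_csv is hypothetical. Note one extra step: with this pattern an empty position arrives as the empty string '', which is neither nil nor false in Lua, so the space placeholder has to be set by an explicit comparison with ''.

```lua
-- Hypothetical helper: split on commas, keeping every position,
-- and store ' ' wherever a field is empty.
local function split_csv(loc)
  local FLD = {}
  -- Appending ',' lets the last field be matched by the same pattern.
  for field in (loc .. ','):gmatch('([^,]-),') do
    field = field:gsub('^%s+', '')         -- drop leading whitespace (first gsub result only)
    if field == '' then field = ' ' end    -- placeholder for an empty position
    FLD[#FLD + 1] = field
  end
  return FLD
end

local a = split_csv('Bruce,, Rusk, WI, USA')
print(#a, '[' .. a[2] .. ']', a[3])        -- 5 fields, a[2] is a single space

local b = split_csv(', Valley, Marshall, MN, USA')
print(#b, '[' .. b[1] .. ']', b[2])        -- 5 fields, b[1] is a single space
```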
jelv
Megastar
Posts: 646
Joined: 03 Feb 2020 22:57
Family Historian: V7
Location: Mere, Wiltshire

Re: patterns: finding the nil where positional parameters are important

Post by jelv »

It has been pointed out to me (thanks Mark) that my suggestion is double the number of lines needed. The optimal complete code to achieve the desired result is:

Code: Select all

local FLD = {loc:match('^([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*)$')}
[Attachment: SplitCSV.png]
John Elvin
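
Two details may matter when using the one-liner: a missing field comes back as the empty string '' rather than a space, and a string with the wrong number of fields makes match return nil, leaving the table empty. A minimal sketch that handles both, assuming a wrapper function (split_place and PATTERN are hypothetical names):

```lua
local PATTERN = '^([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*),%s*([^,]*)$'

-- split_place is a hypothetical name, not part of the code posted above.
-- Returns a table of five fields, or nil when loc does not contain exactly four commas.
local function split_place(loc)
  local FLD = { loc:match(PATTERN) }
  if FLD[1] == nil then return nil end      -- the match failed
  for i = 1, #FLD do
    if FLD[i] == '' then FLD[i] = ' ' end   -- space placeholder for an empty position
  end
  return FLD
end

local FLD = split_place('Bruce,, Rusk, WI, USA')
print(FLD[1], '[' .. FLD[2] .. ']', FLD[3])

print(split_place('Rusk, WI, USA'))         -- nil: only three fields
```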
Mark1834
Megastar
Posts: 2535
Joined: 27 Oct 2017 19:33
Family Historian: V7
Location: South Cheshire, UK

Re: patterns: finding the nil where positional parameters are important

Post by Mark1834 »

Welcome back Mike - I didn't think for a second that you would pass up a discussion on patterns without chipping in ;).

If we need a more generic solution to cope with an arbitrary number of null and non-null fields, I'd be tempted to follow Helen's suggestion and resort to traditional string functions:

Code: Select all

loc = 'Bruce,,Rusk,WI,USA'

local tblT = {}
while true do
	local i = loc:find(',')
	if not i then							--	no more delimiters
		table.insert(tblT, loc)
		break
	end
	table.insert(tblT, loc:sub(1,i-1))
	loc = loc:sub(i+1)
end

The advantage of this technique is that it is broadly language independent. I've used similar logic in VBA arrays, Python lists and Lua tables, and it is clear and transparent to just about any author, while the discussion above illustrates that pattern matching can be a bit of a specialist area.

I know string handling is much less memory-efficient than tables in Lua, but if large data arrays are expected, that is easily fixed with appropriate garbage collection. The pattern technique suggested requires modifying the input string anyway.
Mark Draper
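
For what it's worth, the same find-based loop can be written so that it leaves the input string untouched, by passing a start position to string.find instead of chopping the string down each time. A sketch under that assumption (split_plain is a hypothetical name; the third and fourth arguments to find are the standard init and plain parameters):

```lua
-- Hypothetical helper: split s on a literal separator without modifying s.
local function split_plain(s, sep)
  assert(sep ~= '', 'separator must not be empty')
  local out, start = {}, 1
  while true do
    -- plain = true treats sep as literal text, not as a pattern
    local i, j = s:find(sep, start, true)
    if not i then                           -- no more delimiters
      out[#out + 1] = s:sub(start)
      break
    end
    out[#out + 1] = s:sub(start, i - 1)     -- may be '' for adjacent separators
    start = j + 1
  end
  return out
end

local loc = 'Bruce,,Rusk,WI,USA'
local tblT = split_plain(loc, ',')
print(#tblT, '[' .. tblT[2] .. ']', tblT[5])
print(loc)                                  -- input string is unchanged
```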