DIY-Software-Club

This is where my Regex resource for myself will be. It is the most frustrating of all programming tools/languages. It is the only one that does what it does, and it has to be used in conjunction with all other languages if string searches are part of an application. It needs an upgrade to properly standardize and simplify some things. I'll show examples of that here...

Regex

See DIY-Software club Regex Master Ref File to right

REGEX Master with confusions noted

PROBLEM EXAMPLES 1

EXAMPLE 1: The goal is to get SEND from between the brackets. With computer programming, this seems like it would be common and easy. If I told you the answer for doing that was \[(.*?)\] would you be inspired to study programming? Then if I told you that it didn't work for me in JavaScript for two reasons, how does that sound?

To the left is the explanation for (.*?). Once you know the system it doesn't seem too bad EXCEPT 1) there is seemingly no standard for 'extract' with the () in JavaScript (the captured text comes back at index 1 of the match result, not index 0) and 2) apparently the ? didn't work for me there either...
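For what it's worth, here is a minimal sketch of the bracket extraction in plain JavaScript. The input string is made up for illustration. As far as I can tell, the lazy ? behaves the same as in other regex flavors, and the catch is only that index 0 of the result holds the whole match (brackets included) while the capture sits at index 1.

```javascript
// Hypothetical input; only the text between the first pair of brackets is wanted.
const text = "xxx [SEND] yyy [LATER]";

// \[ and \] match literal brackets; (.*?) lazily captures whatever sits between them.
const match = text.match(/\[(.*?)\]/);

// match[0] is the whole match, brackets included; match[1] is the captured part.
const inside = match ? match[1] : null;
console.log(inside);
```

Without the ? the capture would run on to the last closing bracket in the string, which is the "narrow the scope" problem.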

RegEx - GDoc (JB)

This is a file/info I got from someone years ago. I changed it from a programming file format to a gDoc a while back for easier reading and editing. This file is being served from the BRICs gDrive. I have a similar one without annotations on my own drive (click here). All other files are shared from there. I won't duplicate those (note to self more so than for others).

Other useful GSheets...

=REGEXEXTRACT(C9,"\|\|(.*)") would get SEND from xxx||SEND and you only need one of the leading | markers

=REGEXEXTRACT(C9,"\|\|(.*)\|\|") would get SEND from xxx||SEND|| and you only need one of the leading and trailing | markers

If you need to narrow the scope, add the ? (as in (.*?)) so the match stops at the first closing marker.

If it's wrapped in common delimiters, splitting is likely better.
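The splitting idea can be sketched in JavaScript like this. The cell value is a made-up stand-in for whatever is in C9, and the || marker is taken from the examples above.

```javascript
// Hypothetical cell value with the wanted text wrapped in "||" markers.
const cellValue = "xxx||SEND||";

// split() takes the delimiter as a plain string here, so nothing needs escaping.
const pieces = cellValue.split("||");

// pieces[0] is the text before the first marker, pieces[1] is the wrapped text,
// and pieces[2] is the (empty) text after the last marker.
const wanted = pieces[1];
console.log(wanted);
```

No regex, no escaping of the | characters, and no greedy vs. lazy question to think about.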

PROBLEM EXAMPLES 2

Gdoc to Gmail - numerous simple Regex problems

Problem Example

On 5/17/2023 I tried to do "something simple". I tried to store text in a gDoc and/or a gMail and then use that text as a template for future emails. The level of regex interference I ran into trying to do something that "should have been simple" was startling.

The goal was to store text (and images) in a gDoc, call the gDoc, grab the stuff and put it into an email. The format for the gDoc was going to be XML, a very friendly way to store data in text:

---- gDoc File content --

<subject> BRICs Course Offerings 2023</subject>
<attachments> {a file reference somehow} </attachments>
<body>

Hi - This email provides our Course offerings for 2023.

Please support our Work by purchasing just 5 hours of your CE credits from us each renewal period.

Regards

Bryan and Ric (with signature images, preferably)

</body>

---------------------------

This should be a no brainer for regEx, right? Not only not at all... not f--ing at all...

I should have been able to do something like this in JavaScript / google script

pattern = / <body>(.+)</body> /

arrMatches = gDocText.match(pattern)

And I should have gotten just

the text between the tags (for the <subject> tags that would be "BRICs Course Offerings 2023") in the arrMatches[0] position

The parentheses should have acted as a "capture" and the .+ should have acted as a wild card for all characters that exist until you find the </body> tag. Simple stuff... is not so simple....

Problem 1 - the forward slash as bookends - The forward slash is the JavaScript delimiter on both ends, so obviously it can't show up in the pattern without an "escape" somehow, BUT with regEx the / slash is not a character that requires escape? If you do it, the Google Script processor will accept the escape and compile, but the match then never seemed to work... Why not change the start and end pattern stuff to something crazy and/or customizable? Even better, thus nothing is ever a problem...
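As far as I can tell there are two ways around the bookend problem in JavaScript, sketched here with a made-up one-line sample. One is to escape the slash as \/ inside the /.../ literal. The other is to build the pattern from an ordinary string with new RegExp(), which has no slash bookends at all. Also worth knowing: any spaces typed just inside the bookends become part of the pattern, which may be one reason a match never shows up.

```javascript
// Hypothetical one-line sample in the same tag style as the gDoc template.
const sampleLine = "<subject>BRICs Course Offerings 2023</subject>";

// Option 1: a regex literal, with the slash in the closing tag escaped as \/.
const literalPattern = /<subject>(.+)<\/subject>/;

// Option 2: a pattern built from a plain string; no bookends, so no slash escape.
const stringPattern = new RegExp("<subject>(.+)</subject>");

const fromLiteral = sampleLine.match(literalPattern);
const fromString = sampleLine.match(stringPattern);
console.log(fromLiteral[1]);
console.log(fromString[1]);

// Spaces inside the bookends are matched literally, so this one finds nothing here.
const spacedPattern = / <subject>(.+)<\/subject> /;
const fromSpaced = sampleLine.match(spacedPattern);
console.log(fromSpaced);
```

With the string form, a backslash that is meant for the regex (like \s) has to be doubled ("\\s") because the string eats one level first.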

objRegex.pattern_bookends = "//$//"

objRegex.pattern = //$// <body>(.+)</body> //$//

{what a novel concept... then the backside code just uses a split command and position 1 for the pattern. and the bookends are never problematic again}

Problem 2 - the capture didn't work - The ( ) brackets have several uses. One was/is supposed to be a subset capture somehow. In many applications the goal is to get info that is surrounded by known info that you do NOT need. You can always add that back if you need it?! Why is there no perfect and always functional capture system for a subset of the match?!

Problem 3 - Line feeds and carriage returns -- I struggled with LF and carriage returns for a bit when I had the tags on their own lines. I finally did something like .\S\s to get it all. Why not put "commas" in to separate the stuff with a simple comma escape system like the double quotes system... just for readability...
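Here is a minimal sketch of how the three problems might be handled together in JavaScript. The helper name getTagContent is made up, the sample text is a shortened copy of the template above, and it assumes tag names contain no regex special characters. The character class [\s\S] means "whitespace or non-whitespace", so unlike the plain dot it also covers line feeds, and trim() deals with the leftover blank lines.

```javascript
// Shortened copy of the gDoc template, with the body tags on their own lines.
const gDocText = [
  "<subject> BRICs Course Offerings 2023</subject>",
  "<body>",
  "",
  "Hi - This email provides our Course offerings for 2023.",
  "",
  "Regards",
  "",
  "</body>",
].join("\n");

// Hypothetical helper: returns the trimmed text between <tag> and </tag>, or null.
// Assumes the tag name has no regex special characters in it.
function getTagContent(text, tag) {
  // Built from a string, so the slash in the closing tag needs no escape.
  // [\s\S]*? matches any characters, line feeds included, up to the first closing tag.
  const pattern = new RegExp("<" + tag + ">([\\s\\S]*?)</" + tag + ">");
  const match = text.match(pattern);
  // Index 1 holds the capture; trim() removes leading/trailing spaces and line feeds.
  return match ? match[1].trim() : null;
}

console.log(getTagContent(gDocText, "subject"));
console.log(getTagContent(gDocText, "body"));
```

A tag that isn't in the text gives back null rather than throwing, so the caller can decide what to do about a missing section.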

=====================================================================

I spent the better part of an hour or two trying to get this to work before having to go an entirely different direction. This isn't a useful system.

I ended up having to use <subject> <end_subject>, grab an entire string and then remove the tags. Truthfully, having to remove the tags is not a big deal, but then I had to contend with white space (line feeds) too. Unsure if any trim would have worked easily. The capture of a lesser set than the full search criteria seems like it should be a no brainer and is not...

Gdoc to Gmail - the photos

I never even got to trying to figure out if/how I could get the photos from the page and transfer them to the email. Later I did the gmail to gmail and figured out the photo blob for that, so maybe that could work for this but unsure at this time.

Gmail to Gmail - numerous simple Regex problems

Here's the Challenge

The Gmail to Gmail copy was a nightmare too... I didn't mind the fact I had to get parts and resubmit those after that became clear. (I don't want to use APIs that are subject to breaking.) I didn't even mind the fact I had to recreate an image blob (object) once I figured out that's what I had to do. What I minded after all of that was "figuring out" how useless regex was again without a functional "capture" system for a simple parsing task. I used 3 primary forum / blog posts to figure this stuff out. They are noted below. Here's how this all plays out with regex only as part of the problem.

To get the data from an existing email or draft and use it to create a new draft is like programming jungle gym 101. Once explained clearly it's okay, but those out there explaining now aren't so clear...

1. You can get an array of the images from the message template with a Google Script Msg object. It seems there used to be a problem with the order they were given in, but that seems no longer a problem. They seem to be in order of appearance on the page (but maybe not) and ironically these have no meta data with them to tell you what local IDs they are associated with...
2. You can create an array of image IDs from the body of the message template using RegEx.
3. You can put those two together to create a dictionary object for submission to the new message (assuming the image array is in the order of the names you can get from the document).

Below are the details. We'll start with the end goal (3) then do 1 and 2 and 3.

The goal is to create a "Dictionary Object / Blob" that has the following structure...

images = { imgID1: imageFile1, imgID2: imageFile2, imgID3: imageFile3 }

A system such as this must exist in the meta data for the email we are using as a template, but remarkably when you ask for it, all you get is an array of images without access to the IDs ???

When images are pasted into a Gmail with no global reference, an ID is created and associated with the image. It's stored in some type of blob dictionary behind the scenes as suggested above, but we can't get to that -- so we have to get all the parts for that and recreate it for submission to create the new email from this one as a template. AND we have to submit it as a dictionary object because we aren't pasting in the images to allow them to go through that ID assignment process. AND we are pasting in the email body with the image IDs hard coded...

Step 1 - get the images as an array from the existing message.

This step is "weird" and you can tell that whoever created the Google scripting for this was not thinking big enough about how to access stuff when they did it.

Embedded Images are Attachments
Attachments like you are used to thinking of are Attachments

So how are they differentiated?

Ironically it's by asking for all attachments that are NOT

Why didn't they build a wrapper called "getInlineImages"? Dunno. Maybe b/c Sundar's wife works for Intuit and they are more interested in self serving than global ed, but that's probably not the root cause given this would have happened before his time. Then, when you want attachments, just ask for them, but be sure to tell the system not to give you anything that is NOT an attachment (an inline image). NOTE: with this example, which I copied from others, they left off the includeAttachments in the second line because obviously it defaults to true, BUT leaving it off is a bad move as it creates confusion. I'll add it for easier reading on my next edit.
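The missing wrapper could be sketched like this. The two function names are made up; the call underneath is the getAttachments(options) method on a Gmail message with its includeInlineImages and includeAttachments options, both spelled out so nothing rides on a default. The message argument is assumed to be a GmailMessage obtained elsewhere.

```javascript
// Hypothetical wrapper: only the images embedded in the body of the message.
function getInlineImages(message) {
  return message.getAttachments({
    includeInlineImages: true,
    includeAttachments: false,
  });
}

// Hypothetical wrapper: only the regular (paper clip) attachments.
function getRegularAttachments(message) {
  return message.getAttachments({
    includeInlineImages: false,
    includeAttachments: true,
  });
}
```

Usage would be something like arrEmailImages = getInlineImages(msg) and arrEmailAttachments = getRegularAttachments(msg), where msg is the template message.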

Also..

eImages is in fact an array. I made up that name, but I'll rename that arrEmailImages when I go back into this for edits...

eAttachments is in fact an array. I made up that name, but I'll rename that arrEmailAttachments when I go back into this for edits..

NOTE: I have no clue why eAttachments is grey. It did that last night for no reason. They have over-controlled the insertion of bracketing such that if you open something it automatically adds a closing along with all the coloring, and it's too much. It prevented nice flow. Working in this interface is like trying to make music plucking sand.

Step 2 - get the imageIDs from the existing message.

A reference to the imageID is included in the html in the body of the email. It's a CID number... ( cid: xxxxxxx )

The nice identifier like "cid" with an easy split character of : is great.

Regex on the text string to get the unique IDs should be a no brainer. (But it's not.)

I can in fact get this from one of two attributes of each relevant tag.

I should have been able to do something simple like this...

pattern = / src="cid:(.+)">/

arrMatches = gDocText.match(pattern)

and I should have gotten back arrMatches = [ ii_lhsil6fg0 , ii_lhsil6fn1 ]. The parentheses should have denoted the capture content and the .+ inside should have been sufficient to cover everything until the next " was found.

Problem 1 - the capture didn't work (again) - The idea of being able to easily capture a subset of a match is almost all I use regEx for? Why is it not core to the language? Even if I can't get exactly what I need, excluding that which I can use to identify but know I don't need is a no brainer, yet not easily part of this system in a reliable manner?

Problem 2 - the WildCards - I had to eventually go to "any character but", which is fine, although I think I got lucky as I don't believe that covers the non white spaces? It's like throwing mud at a wall...
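A minimal sketch of the ID extraction, assuming the html body looks roughly like the made-up string here (the two IDs are the ones from above). Two things seem to matter. First, [^"]+ means "one or more of any character except a double quote", which covers white space and non white space alike, so the capture stops at the closing quote, whereas a greedy .+ keeps running to the last "> in the text. Second, match() with the g flag hands back whole matches only, while matchAll() keeps the capture groups, so the IDs can be pulled straight from index 1.

```javascript
// Hypothetical html body with two inline images; the IDs are from the example above.
const htmlBody =
  '<div><img src="cid:ii_lhsil6fg0" alt="a.png"><br>' +
  '<img src="cid:ii_lhsil6fn1" alt="b.png"></div>';

// The g flag finds every occurrence; ([^"]+) captures up to the closing quote.
const cidPattern = /src="cid:([^"]+)"/g;

// matchAll() yields one result per occurrence, each with the capture at index 1.
const imageIds = Array.from(htmlBody.matchAll(cidPattern), (m) => m[1]);
console.log(imageIds);

// For comparison: match() with the g flag returns the whole matches, wrapper text included.
const wholeMatches = htmlBody.match(cidPattern);
console.log(wholeMatches);
```

With the matchAll() form there is nothing left to cut away afterwards.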

The circle 2 below has the final solution for getting an array with more than I needed. Sure, the regex looks "short" because it is. The time spent to create that short line was an hour because more logical stuff didn't work...

Step 3 - Clean up the image ID array and create the Dictionary in one loop

In step three we took the image ID array (text) and assigned each item to a temp variable while cutting away the excess text. Then we do inlineImgeObj[tempID] = eImages[i] (we need to rename temp to tempID to make that clear). Also, this method of assigning items to an object may not look familiar. It's required because the other method, inlineImgeObj.ii_233322 = {image blob}, can't work: the dot form needs the key typed into the code, and we only have the key in a variable. With the bracket form, the key is added to the object along with the value simultaneously.

The result is

inlineImgeObj = { imgID1: imageFile1, imgID2: imageFile2 } (in our case we had two images)
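The loop can be sketched roughly as below. The function name is made up, and the shape of the raw matches (src="cid:..." with the wrapper text still attached) is an assumption about what the regex step handed back. It also assumes the images array is in the same order as the IDs, which is the open question noted in these notes.

```javascript
// Hypothetical helper: pairs each raw match with the image at the same position.
// Assumes rawMatches look like 'src="cid:ii_lhsil6fg0"' and both arrays are in the same order.
function buildInlineImages(rawMatches, images) {
  const inlineImages = {};
  for (let i = 0; i < rawMatches.length; i++) {
    // Cut away the wrapper text so only the ID is left.
    const tempID = rawMatches[i].replace(/^src="cid:/, "").replace(/"$/, "");
    // Bracket notation: the key comes from the variable and is created along with the value.
    inlineImages[tempID] = images[i];
  }
  return inlineImages;
}

// Placeholder strings stand in for the image blobs in this sketch.
const exampleResult = buildInlineImages(
  ['src="cid:ii_lhsil6fg0"', 'src="cid:ii_lhsil6fn1"'],
  ["imageFile1", "imageFile2"]
);
console.log(exampleResult);
```

If the two arrays ever come back with different lengths, the extra IDs end up with undefined values, so a length check before the loop might be worth adding.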

In our case, the images are in the correct order and they did line up. I'm not sure if that will always be the case. There was an open tech support ticket by the "hawksey" person above that seems to be resolved. See his link for a link to that.

And all of this plus some is required to use a Gmail email or draft as a template for a new gmail/email draft...
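For the last hop, a rough sketch of handing everything to the new draft. The function name is made up. The gmailApp parameter stands in for the GmailApp service and template for the Gmail message being used as the template; they are passed in here only so the sketch doesn't depend on globals. The createDraft call takes the html body and the image dictionary through its htmlBody and inlineImages options.

```javascript
// Hypothetical helper: creates a new draft from a template message.
// inlineImages is the { imageID: imageBlob } dictionary built in step 3.
function createDraftFromTemplate(gmailApp, template, recipient, inlineImages) {
  return gmailApp.createDraft(
    recipient,
    template.getSubject(),
    // Plain text version, used by mail clients that don't show html.
    template.getPlainBody(),
    {
      // The html still contains the hard coded cid: references...
      htmlBody: template.getBody(),
      // ...and this dictionary is what those references resolve against.
      inlineImages: inlineImages,
    }
  );
}
```

In Google Script the call would look something like createDraftFromTemplate(GmailApp, msg, "someone@example.com", inlineImgeObj), with the address being a placeholder.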

and 100's of ppl work together to figure this stuff out...

Many without true financial incentive to make it make sense...

They do it b/c it feels like it's what they are supposed to do to contribute...
