littled.net

Web design & development; online and offline ramblings


Variables in Python regular expressions: an example

February 11th, 2008

This is a very useful technique that I picked up from a mailing list, so I’m certainly not claiming to have come up with it.

The problem: you need to replace some text or code using a regular expression in Python, but the pattern has to contain the value of a variable so that it can be rebuilt on each pass of a loop. How do you do this?

The example below is the actual one I was working on. I needed to replace a <div> tag whose id changes with the contents of a file within Zope, in order to insert HTML snippets (YouTube video, Google code etc.) into a page. The page editing tool we use (EditonPro) sensibly strips out this kind of markup, but this was a special case where the site editor really needed it. The inserted tags looked something like this:

 <div id="snippet-123456">
   [snippet: YouTube video of something]
 </div>

Then I defined an External Method (getSnippet.py) with the following code:

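import re

def getSnippet(self, content):
    # Strip newlines so each snippet div sits on a single line
    content = content.replace('\n', '')
    # Group 2 captures the id that follows "snippet-"
    pattern = re.compile(r"""(<div id="snippet-)([^"]*)(">[^<]*)(</div>)""", re.MULTILINE)
    reg = re.findall(pattern, content)
    if reg:
        count = len(reg)
        i = 0
        while i < count:
            snpid = str(reg[i][1])
            # Build a pattern that matches only the div with this id
            sr = r"""<div id="snippet-%s"[^<]*</div>"""
            src = re.compile(sr % snpid)
            srch = src.findall(content)
            srch = srch[0]
            try:
                # The id doubles as the name of a file in the parent folder
                repl = self.aq_parent[snpid].data
            except Exception:
                repl = ''
            # Swap the placeholder div for the file contents
            content = content.replace(srch, repl)
            i = i + 1
    return content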

Briefly, this is how it works: the document_view page template calls getSnippet like this:

<div tal:define="content here/CookedBody"
     tal:replace="structure python: here.getSnippet(content)"/>

The page content is passed to the getSnippet External Method, which first runs a regular expression to find all the snippet divs. If the search returns anything in the variable “reg”, a while loop steps through the matches.
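To make the shape of “reg” concrete, here is a small sketch run on a made-up page string (the sample HTML and the ids are invented for illustration). Because the pattern has four groups, re.findall returns a list of 4-tuples, and the id is the second item in each tuple, which is why the method reads reg[i][1].

```python
import re

# Sample content invented for illustration; newlines are already stripped,
# as getSnippet does before searching.
content = (
    '<p>Intro</p>'
    '<div id="snippet-123456">[snippet: YouTube video]</div>'
    '<p>Middle</p>'
    '<div id="snippet-777">[snippet: Google code]</div>'
)

pattern = re.compile(r"""(<div id="snippet-)([^"]*)(">[^<]*)(</div>)""")

reg = re.findall(pattern, content)

# One tuple per match, one item per group; index 1 holds the id.
snippet_ids = [match[1] for match in reg]
print(snippet_ids)  # ['123456', '777']
```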

The div ids, which match the names of files stored in Zope, are extracted from “reg” while this loop runs. We use each one to construct a regular expression which will match the relevant snippet code. The key lines are these:

sr = r"""<div id="snippet-%s"[^<]*</div>"""

src = re.compile(sr % snpid)

We put a %s placeholder in the pattern string and substitute the variable into it with the % operator when compiling it.
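As a minimal stand-alone sketch of the same idea, the loop below builds a fresh pattern for each id. The ids, the sample string and the helper name find_snippet are all invented for illustration. One addition that is not in the method above: re.escape is applied to the variable before it is substituted, which seems worth doing whenever the value could contain characters that are special in a regular expression (a dot or a plus sign, say).

```python
import re

# Invented sample: the second id contains a dot, which is a regex wildcard.
content = (
    '<div id="snippet-aXb">[plain id]</div>'
    '<div id="snippet-a.b">[dotted id]</div>'
)

template = r"""<div id="snippet-%s"[^<]*</div>"""

def find_snippet(snpid, text):
    """Return the first div whose id is snippet-<snpid>, or None."""
    # re.escape makes the id match literally, so "a.b" cannot match "aXb".
    src = re.compile(template % re.escape(snpid))
    match = src.search(text)
    return match.group(0) if match else None

for snpid in ['a.b', 'aXb', 'missing']:
    print(snpid, '->', find_snippet(snpid, content))
```

Without the re.escape call, the pattern built for "a.b" would also match the "aXb" div, and since that div comes first in the sample it would be the one returned.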

It works, great stuff.
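For comparison, here is a sketch of a different approach from the one above: a single re.sub call with a replacement function, so that no per-id pattern has to be compiled at all. The dictionary named snippets is a hypothetical stand-in for the Zope lookup (self.aq_parent[snpid].data), and the function name and sample strings are invented too. This is not the method from the mailing list, just an alternative I'd consider.

```python
import re

# Group 1 captures the id that follows "snippet-".
SNIPPET_DIV = re.compile(r"""<div id="snippet-([^"]*)"[^<]*</div>""")

def insert_snippets(content, snippets):
    """Replace each snippet div with snippets[id], or '' if the id is unknown."""
    content = content.replace('\n', '')

    def lookup(match):
        # Called once per matched div; the return value replaces that div.
        return snippets.get(match.group(1), '')

    return SNIPPET_DIV.sub(lookup, content)

# Hypothetical stand-in for the files held in the Zope folder.
snippets = {'123456': '<object>video</object>'}

page = (
    '<p>Before</p>\n'
    '<div id="snippet-123456">\n  [snippet: YouTube video of something]\n</div>\n'
    '<div id="snippet-999">[snippet: unknown]</div>'
    '<p>After</p>'
)

print(insert_snippets(page, snippets))
```

The trade-off is that the variable no longer appears in the pattern at all: the id is captured by a group and looked up inside the function instead.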

Tags: python · zope
