Friday, April 15, 2011

Regex not returning 2 groups

Hello everyone,

I'm having a bit of trouble with my regex and was wondering if anyone could please shed some light on what to do.

Basically, I have this Regex:

\[(link='\d+') (type='\w+')](.*|)\[/link]

For example, when I pass it the string:

[link='8' type='gig']Blur[/link] are playing [link='19' type='venue']Hyde Park[/link]

It only returns a single match from the opening [link] tag to the last [/link] tag.

I'm just wondering if anyone could please help me with what to put in my (.*|) section to only select one [link][/link] section at a time.

Thanks!

From stackoverflow
  • Regular Expressions Info is a fantastic site. This page gives an example of dealing with HTML tags. There's also an Eclipse plugin that lets you develop expressions and see the matching in real time.

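  • You need to make the wildcard selection non-greedy (lazy) by putting the "?" operator after the quantifier, so that .* becomes .*? and stops at the first [/link] instead of the last. I make it: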

    /\[(link='\d+')\s+(type='\w+')\](.*?)\[\/link\]/
    

    Of course, this all falls down for any kind of nesting, in which case the language is no longer regular and regexes aren't suitable; use a parser instead.

    annakata : I had to change some other aspects of the regex for it to make sense to my ecmascript brain...
    fishkopter : Thanks a lot! Works perfectly!
    Tomalak : @annakata: I think this question would have been a reasonable candidate for the "regexhtmlparserquestions" tag you once put up. ;-)
    annakata : sigh, I do miss that tag :)
    Tomalak : There is still one question that has it. You can still go for the Taxonomist badge. :-)
  • You need to make the .* in the middle of your regex non-greedy. Look up the syntax and/or flag for non-greedy mode in your flavor of regular expressions.
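To make the difference between the two modes concrete, here is a minimal sketch in Python's re module (an assumption on my part; the question does not say which regex flavor is in use, and the variable names are purely illustrative). It runs the greedy and the non-greedy version of the pattern against the sample string from the question:

```python
import re

# Sample string from the question (without the stray trailing quote).
text = ("[link='8' type='gig']Blur[/link] are playing "
        "[link='19' type='venue']Hyde Park[/link]")

# Greedy: .* grabs as much as possible, so the match runs from the
# first opening tag to the LAST [/link] in the string.
greedy = re.compile(r"\[(link='\d+')\s+(type='\w+')\](.*)\[/link\]")

# Non-greedy: .*? grabs as little as possible, so each match ends at
# the first [/link] that follows its opening tag.
lazy = re.compile(r"\[(link='\d+')\s+(type='\w+')\](.*?)\[/link\]")

greedy_matches = greedy.findall(text)
lazy_matches = lazy.findall(text)

print(len(greedy_matches))  # 1
for link, kind, body in lazy_matches:
    print(link, kind, body)
```

With the non-greedy pattern, findall yields one tuple per [link]...[/link] pair, each holding the link attribute, the type attribute and the enclosed text. As noted above, this still does not handle nested tags.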
