08-16-2006, 08:44 PM   #2
rjplummer
Re: regexp implementation help

Your regexp seems overly complex. First, nothing can come between the < and the "a" in an anchor tag.

I'm assuming you're only trying to find normal anchors, that is, not forms that submit to billiardsforum.info and not anchors where the URL is in the onclick handler.

I'd use a pattern like:

<a[ \x09\x0A\x0D][^>]*href="?([^>?" \x09\x0A\x0D]*[./]billiardsforum\.info[^>" \x09\x0A\x0D]*)[>" \x09\x0A\x0D]

[ \x09\x0A\x0D] matches the valid whitespace characters: space, tab (\x09), line feed (\x0A), and carriage return (\x0D).
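
To see how the pattern behaves, here is a rough Python sketch. It assumes Python's re module, escapes the dot in the domain so it only matches a literal ".", and adds case-insensitive matching; the function name find_links and the sample HTML are made up for illustration.

```python
import re

# The pattern from the post. The dot in "billiardsforum\.info" is escaped
# so it matches only a literal "."; IGNORECASE is an added assumption so
# that <A HREF=...> is caught as well.
LINK_RE = re.compile(
    r'<a[ \x09\x0A\x0D][^>]*href="?'
    r'([^>?" \x09\x0A\x0D]*[./]billiardsforum\.info[^>" \x09\x0A\x0D]*)'
    r'[>" \x09\x0A\x0D]',
    re.IGNORECASE,
)


def find_links(html):
    """Return the href values of anchors pointing at billiardsforum.info."""
    return LINK_RE.findall(html)


# Hypothetical sample input, not taken from the original thread.
sample = (
    '<p><a class="x" href="http://www.billiardsforum.info/forum/">Forum</a> '
    '<a href="http://example.com/">Other</a></p>'
)
print(find_links(sample))
```

Because "?" is excluded from the text before the domain, a link such as http://r.example/go?u=http://www.billiardsforum.info/ is not reported by this version.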
You may not want to exclude "?" in the text preceding billiardsforum.info, since that keeps you from catching links that go through redirectors. If you drop that exclusion, though, you'd also have to allow "%2f" (a URL-encoded "/") as well as "." and "/" immediately before billiardsforum.info, i.e. "(%2f|\.|/)" in place of "[./]".
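
A rough sketch of that redirector-friendly variant in Python, under the same assumptions as before (Python's re module, escaped dot, case-insensitive matching so "%2F" is accepted too). The name find_links_with_redirects and the redirector URL are hypothetical.

```python
import re

# Variant of the pattern: "?" is no longer excluded before the domain,
# and "%2f" is accepted alongside "." and "/" immediately before it.
# IGNORECASE (an added assumption) also lets "%2F" match.
REDIRECT_LINK_RE = re.compile(
    r'<a[ \x09\x0A\x0D][^>]*href="?'
    r'([^>" \x09\x0A\x0D]*(?:%2f|\.|/)billiardsforum\.info[^>" \x09\x0A\x0D]*)'
    r'[>" \x09\x0A\x0D]',
    re.IGNORECASE,
)


def find_links_with_redirects(html):
    """Return hrefs that mention billiardsforum.info, redirectors included."""
    return REDIRECT_LINK_RE.findall(html)


# Hypothetical redirector link with a URL-encoded target.
redirected = (
    '<a href="http://r.example/go?u=http%3A%2F%2Fbilliardsforum.info%2F">x</a>'
)
print(find_links_with_redirects(redirected))
```

Direct links still match, since "." and "/" remain allowed before the domain.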