Webmaster Forums - Webmaster forum for HTML, PHP, ASP, CSS and more

08-13-2006, 04:53 PM   #1
grobar
Junior Member
 
Join Date: May 2006
Posts: 7
regexp implementation help

Okay, so I've got a regexp that works perfectly in an online tester. I am having trouble implementing it in my existing code.

It takes a variable (xmlhttp.responseText) that holds an external page's source code and checks it for the existence of a link to a specified domain. It should then show ONLY the HTML for that link, and then break it up into substrings.

What it currently DOES is return the whole string EXCEPT the link, in effect doing the opposite.

Any help is appreciated. The page I'm testing against is robarspages.ca/index.asp, and I'm looking for the link to billiardsforum.info.

The VB code thus far:

Code:
' Fetch the remote page
url = objRSanc("la_blt_link_url")
set xmlhttp = server.CreateObject("MSXML2.ServerXMLHTTP")
on error resume next
xmlhttp.open "GET", url, false
xmlhttp.send ""

if err.number <> 0 then
    ' The request failed
    VerifyUrl=false
    response.redirect("google.com/?0000000000000000000000")
else
    ' Replace every match with capture group 2 and print the result
    set regizzo = New RegExp
    regizzo.Pattern = "(<[^>]*?a[^>]*?(?:billiardsforum.info)[^>]*>)((?:.*?(?:<[ \r\t]*a[^>]*>?.*?(?:<.*?/.*?a.*?>)?)*)*)(<[^>]*?/[^>]*?a[^>]*?>)"
    regizzo.Global = True
    regizzo.IgnoreCase = True
    dim stipulation
    stipulation = regizzo.replace(xmlhttp.responseText, "$2")
    response.write stipulation
    Set regizzo = Nothing

    ' Update the stored anchor text
    dim updiz
    Set updiz = Server.CreateObject("ADODB.Connection")
    updiz.Open strdbee
    Set objRSupd = Server.CreateObject("ADODB.RecordSet")
    strsqlu = "UPDATE la_blt SET la_blt_link_anchor='somevar' WHERE la_blt_id =44"
    objRSupd.Open strSQLu, updiz, adLockOptimistic
end if
response.write("done.")
%>
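A note on the "opposite" behaviour described above: RegExp.Replace returns a copy of the whole input with each match swapped for the replacement text, so everything outside the link survives and only the link itself changes. RegExp.Execute, by contrast, returns just the matches. The sketch below shows the difference on a made-up sample string with a deliberately simplified pattern (both are assumptions for illustration, not the pattern from the post), and is meant to sit inside an ASP <% ... %> block.

```vbscript
Dim re, sample, m

' Made-up sample input; the real code would use xmlhttp.responseText
sample = "<p>before</p><a href=""http://www.billiardsforum.info/"">Pool</a><p>after</p>"

Set re = New RegExp
' Simplified illustrative pattern: group 1 is the anchor text
re.Pattern = "<a\s[^>]*billiardsforum\.info[^>]*>(.*?)</a>"
re.Global = True
re.IgnoreCase = True

' Replace keeps the surrounding page and swaps each link for its anchor text,
' so the two <p> elements stay and only "Pool" remains of the link
Response.Write Server.HTMLEncode(re.Replace(sample, "$1")) & "<br>"

' Execute returns only the matches
For Each m In re.Execute(sample)
    ' m.Value is the whole link, m.SubMatches(0) is capture group 1
    Response.Write Server.HTMLEncode(m.Value) & "<br>"
    Response.Write Server.HTMLEncode(m.SubMatches(0)) & "<br>"
Next

Set re = Nothing
```

With the pattern from the post, the same switch from Replace to Execute would expose the tag and anchor text through m.SubMatches(0) and m.SubMatches(1).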
08-16-2006, 08:44 PM   #2
rjplummer
Junior Member
 
Join Date: Aug 2006
Posts: 4
Re: regexp implementation help

Your regexp seems overly complex. First, nothing can come between the < and the "a" in an anchor tag.

I'm assuming you're only trying to find normal anchors, e.g., not forms that submit to billiardsforum.info and not anchors where the URL is in the onclick handler.

I'd use a pattern like:

<a[ \x09\x0A\x0D][^>]*href="?([^>?" \x09\x0A\x0D]*[./]billiardsforum\.info[^>" \x09\x0A\x0D]*)[>" \x09\x0A\x0D]

[ \x09\x0A\x0D] matches the valid whitespace characters (space, tab, line feed, carriage return).
You may not want to exclude "?" from the text preceding billiardsforum.info, since excluding it keeps you from catching links that go through redirectors. If you allow it, though, you'd also have to accept "%2f" as well as "." and "/" immediately before billiardsforum.info, i.e. "(%2f|\.|/)"
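To make the two variants concrete, here is a minimal sketch of how the suggested pattern might be wired into VBScript, with a flag that switches between the strict form and the redirector-friendly form. FindLinkUrl, its allowRedirectors parameter, and the example.com redirector URL are hypothetical names made up for this illustration, and the code assumes it runs inside an ASP <% ... %> block.

```vbscript
' Hypothetical helper: returns the first href value that points at
' billiardsforum.info, or "" when there is none.
Function FindLinkUrl(html, allowRedirectors)
    Const WS = " \x09\x0A\x0D"   ' the whitespace class from the post
    Dim re, matches, urlStart, separator

    If allowRedirectors Then
        ' "?" is allowed before the domain, and %2f counts as a separator
        urlStart = "[^>""" & WS & "]*"
        separator = "(?:%2f|\.|/)"
    Else
        ' strict form: no "?" before the domain
        urlStart = "[^>?""" & WS & "]*"
        separator = "[./]"
    End If

    Set re = New RegExp
    re.IgnoreCase = True          ' also lets %2f match %2F
    re.Global = False
    re.Pattern = "<a[" & WS & "][^>]*href=""?(" & urlStart & separator & _
                 "billiardsforum\.info[^>""" & WS & "]*)[>""" & WS & "]"

    FindLinkUrl = ""
    Set matches = re.Execute(html)
    If matches.Count > 0 Then FindLinkUrl = matches(0).SubMatches(0)
End Function

Dim direct, redirected
direct = "<a href=""http://www.billiardsforum.info/forum"">Pool</a>"
redirected = "<a href=""http://example.com/go?u=http%3A%2F%2Fbilliardsforum.info"">Pool</a>"

' The direct link is found in strict mode
Response.Write Server.HTMLEncode(FindLinkUrl(direct, False)) & "<br>"
' The redirector link is only found once "?" and %2f are allowed
Response.Write Server.HTMLEncode(FindLinkUrl(redirected, False)) & "<br>"
Response.Write Server.HTMLEncode(FindLinkUrl(redirected, True)) & "<br>"
```

Like the pattern it is built from, this only handles href values that are double-quoted or unquoted.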
08-19-2006, 11:13 AM   #3
grobar
Junior Member
 
Join Date: May 2006
Posts: 7
Re: regexp implementation help

Thanks for the detailed reply.

The one I was using returned the anchor text as well as the whole HTML tag of the link (helpful for other tests I run, such as checking for the nofollow attribute).

I'm not too savvy with regex. How could I add to your suggestion so that it returns the anchor text and the entire <a> string?
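One possible way to do that, sketched under assumptions rather than tested against the real page: wrap the opening tag in one capture group, let it run to the closing ">", and add a second, lazy group for everything up to the closing tag. FindBacklink and the sample string are hypothetical names invented for this example, and the nofollow check is a crude substring test. The code assumes an ASP <% ... %> block.

```vbscript
' Hypothetical helper: returns Array(fullLink, openingTag, anchorText) for the
' first link to billiardsforum.info, or Empty when no link is found.
Function FindBacklink(html)
    Const WS = " \x09\x0A\x0D"
    Dim re, matches
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = False
    ' Group 1: the whole opening <a ...> tag
    ' Group 2: everything up to the closing tag ([\s\S] also spans line breaks)
    re.Pattern = "(<a[" & WS & "][^>]*href=""?[^>?""" & WS & "]*[./]billiardsforum\.info[^>]*>)" & _
                 "([\s\S]*?)</a[" & WS & "]*>"

    FindBacklink = Empty
    Set matches = re.Execute(html)
    If matches.Count > 0 Then
        FindBacklink = Array(matches(0).Value, _
                             matches(0).SubMatches(0), _
                             matches(0).SubMatches(1))
    End If
End Function

Dim sample, link
' Made-up sample input; the real code would pass xmlhttp.responseText
sample = "<p>x</p><a href=""http://www.billiardsforum.info/"" rel=""nofollow"">Pool hall</a>"

link = FindBacklink(sample)
If IsArray(link) Then
    Response.Write Server.HTMLEncode(link(0)) & "<br>"   ' entire <a>...</a> string
    Response.Write Server.HTMLEncode(link(1)) & "<br>"   ' opening tag only
    Response.Write Server.HTMLEncode(link(2)) & "<br>"   ' anchor text
    ' Crude nofollow check on the opening tag (case-insensitive substring test)
    If InStr(1, link(1), "nofollow", vbTextCompare) > 0 Then
        Response.Write "nofollow found<br>"
    End If
Else
    Response.Write "no link found<br>"
End If
```

Setting re.Global = True and looping over the matches would return every link on the page instead of only the first.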
All times are GMT -4.


