Retrieving all URLs in a web page with ASP
We will build a simple class that retrieves all the URLs in a web page.

The class has one public method, RetrieveUrls, which in turn calls two
private methods: RetrieveContents and GetAllUrls.


RetrieveContents sends a request to the web page and receives the page's
content in response.
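
The request and response step can be sketched roughly as follows. This is a minimal sketch, not the article's own RetrieveContents: the class name PageFetchSketch and the helper ReadAll are hypothetical, and the page is assumed to be UTF-8 text. It uses WebRequest from System.Net, which matches the namespaces the article imports; on recent .NET versions that type is marked obsolete in favor of HttpClient.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class PageFetchSketch
{
    // Reads the whole stream as text (UTF-8 is an assumption).
    public static string ReadAll(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    // Sends a request to the page and returns the body as a string.
    public static string RetrieveContents(string webPage)
    {
        WebRequest request = WebRequest.Create(webPage);

        // The using blocks close the response and its stream,
        // even if reading throws.
        using (WebResponse response = request.GetResponse())
        using (Stream respStream = response.GetResponseStream())
        {
            return ReadAll(respStream);
        }
    }
}
```

A caller would pass a full address, for example PageFetchSketch.RetrieveContents("http://example.com/"), and hand the returned string to the URL-matching step.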


GetAllUrls uses a simple regular expression to find all the URLs in the
page, prints them to the screen, and also writes them to a log file.
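
The matching loop can be sketched on its own as below. This is a simplified sketch under stated assumptions: the class name HrefSketch, the method FindUrls, and the pattern are all hypothetical, and the pattern is deliberately simpler than the article's expression. Instead of a negative lookahead, it captures every href value and then skips fragment, mailto, and javascript links in ordinary code.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class HrefSketch
{
    // Simplified pattern (an assumption, not the article's expression):
    // "href", optional whitespace, "=", an optional quote, then the value
    // up to the next quote, whitespace, or ">".
    private static readonly Regex HrefPattern = new Regex(
        @"href\s*=\s*[""']?(?<url>[^""'\s>]+)",
        RegexOptions.IgnoreCase);

    public static List<string> FindUrls(string content)
    {
        var urls = new List<string>();

        // Get the first match, then advance with NextMatch until none is left.
        Match match = HrefPattern.Match(content);
        while (match.Success)
        {
            string url = match.Groups["url"].Value;

            // Skip links that do not point at another page.
            bool skip =
                url.StartsWith("#", StringComparison.Ordinal) ||
                url.StartsWith("mailto:", StringComparison.OrdinalIgnoreCase) ||
                url.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase);

            if (!skip)
            {
                urls.Add(url);
            }

            match = match.NextMatch();
        }

        return urls;
    }
}
```

Filtering after the match keeps the expression short and avoids lookahead alternatives that scan the rest of the line, at the cost of a few extra string comparisons per link.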

Below is the code for the class:
using System;
using System.Collections.Generic;
using System.Text;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;

namespace FindAllUrls
{
    class GetUrls
    {

        //public method called from your application
        public void RetrieveUrls( string webPage )
        {

                response.Close();
                respStream.Close();
            }
        }

        //using a regular expression, find all of the href or urls
        //in the content of the page
        private void GetAllUrls( string content )
        {
            //regular expression; the group name "url" is an assumption,
            //since the name was lost from the source text
            string pattern =
                @"(?:href\s*=)(?:[\s""']*)(?!#|mailto|location.|javascript|.*css|.*this\.)(?<url>.*?)(?:[\s>""'])";

            //Set up regex object
            Regex RegExpr = new Regex(pattern, RegexOptions.IgnoreCase);

            //get the first match
            Match match = RegExpr.Match(content);

            //loop through matches
            while (match.Success)
            {

                //output the match info
                Console.WriteLine("href match: " + match.Groups[0].Value);
                WriteToLog(@"C:\matchlog.txt", "href match: " + match.Groups[0].Value +
                    "\r\n");

                Console.WriteLine("Url match: " + match.Groups[1].Value);
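
The listing calls WriteToLog, but its definition does not appear in the text. A minimal sketch of what such a helper might look like is given below, assuming it simply appends the message to the named file. The class name LogSketch and the append behavior are assumptions; only the two-string call shape comes from the listing.

```csharp
using System;
using System.IO;

// Hypothetical helper class, for illustration only.
public static class LogSketch
{
    // Appends the message to the file, creating the file if it does not exist.
    public static void WriteToLog(string file, string message)
    {
        File.AppendAllText(file, message);
    }
}
```

Appending keeps earlier matches in the log across calls, so each call adds one entry to the end of the file.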
