Evaluating whether URLs are valid or not

Overview

This script uses LWP::UserAgent to check whether each URL in a list is valid, that is, whether the page can actually be retrieved.

Flow

  1. Load LWP::UserAgent
  2. Put the URLs to be checked into an array
  3. Create an LWP::UserAgent object
  4. Set the timeout in seconds
  5. Check the URLs one by one by accessing each site
  6. Display the results

Sample code

 use LWP::UserAgent;
 
 my @urllist = qw(
               http://www.hidekik.com/
               http://www.hidekik.com/cookbook/
               http://www.fakefortest.abc
                 );
 
 my $ua = LWP::UserAgent->new;
 $ua->timeout(5);
 
 print "Content-Type:text/html\n\n";
 
 foreach (@urllist) {
    my $response = $ua->get($_);
    if ($response->is_success){
        print "$_ : is valid<br>";
    } else {
        print "$_ : is invalid<br>";
    }
    print $response->status_line;
    print "<p>";
 }

Description of the code

 use LWP::UserAgent;

Load LWP::UserAgent.

 my @urllist = qw(
               http://www.hidekik.com/
               http://www.hidekik.com/cookbook/
               http://www.fakefortest.abc
                 );

Put the URLs to be checked into an array. The script will access each of these URLs.

 my $ua = LWP::UserAgent->new;

Create an LWP::UserAgent object. This example uses the default settings and passes no options.

 $ua->timeout(5);

Set the timeout in seconds. In this example it is set to 5 seconds: if the server does not respond within 5 seconds, the request is treated as a timeout.
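
The timeout can also be passed to the constructor instead of being set afterwards. The following is a minimal sketch, assuming the same 5-second value as above. Calling timeout with no argument returns the current setting.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Pass the timeout (in seconds) as a constructor option.
my $ua = LWP::UserAgent->new(timeout => 5);

# Called without an argument, timeout() returns the current value.
print "timeout is ", $ua->timeout, " seconds\n";
```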

 print "Content-Type:text/html\n\n";

Print the HTTP header so that the results can be displayed in a browser.

 foreach (@urllist) {

Take each URL from the array in turn. Inside the loop, the current URL is in $_.

 my $response = $ua->get($_);

Access the URL with the get method. If the page cannot be retrieved with get, the request is treated as an error. The head method would be preferable, since it requests only the headers and not the page body. However, some pages do not accept head requests and return an error even though the page exists, so this example uses get. To use the head method instead, write the following.

 my $response = $ua->head($_);

$response will contain a response object (an HTTP::Response).
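
One way to combine the two methods is to try head first and fall back to get only when head fails. The following is a sketch, not part of the original script. The helper name check_url is made up for illustration, and the URL is the first one from the sample list.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: try HEAD first, then fall back to GET if HEAD fails.
sub check_url {
    my ($ua, $url) = @_;
    my $response = $ua->head($url);
    # Some servers reject HEAD, so retry with GET before giving up.
    $response = $ua->get($url) unless $response->is_success;
    return $response;
}

my $ua = LWP::UserAgent->new(timeout => 5);
my $response = check_url($ua, 'http://www.hidekik.com/');
print $response->status_line, "\n";
```

The cost of this approach is a second request for every URL that rejects head, so a check of many failing URLs can take up to twice as long.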

 if ($response->is_success){
     print "$_ : is valid<br>";
 } else {
     print "$_ : is invalid<br>";
 }

Check whether the get method retrieved the page successfully using the $response->is_success method. On success, display ``URL is valid'' in the browser; on failure, display ``URL is invalid''. Refer to HTTP::Response for details of the is_success method.
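
To see how is_success behaves without touching the network, response objects can be built by hand. This sketch assumes HTTP::Response is installed (LWP depends on it). The status codes are chosen only for illustration and are not real results.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build responses by hand; the codes are illustrative, not real results.
my $ok      = HTTP::Response->new(200, 'OK');
my $missing = HTTP::Response->new(404, 'Not Found');

for my $res ($ok, $missing) {
    # is_success is true only for 2xx status codes.
    my $verdict = $res->is_success ? 'valid' : 'invalid';
    print $res->code, " : is $verdict\n";
}
```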

 print $response->status_line;

Display the status line of the response, that is, the status code followed by its message.
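
The status line is the numeric code and the message joined by a space. A small sketch with a hand-built response (the 404 here is only an example value) shows the three related accessors:

```perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built response used only to show the accessors.
my $res = HTTP::Response->new(404, 'Not Found');

print $res->code, "\n";         # numeric status code
print $res->message, "\n";      # textual message
print $res->status_line, "\n";  # code and message together
```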