Author Topic: cURL help  (Read 457 times)

Offline rrrick8

  • Senior Contributor
  • ****
  • Posts: 233
    • Vermilion weather
cURL help
« on: March 12, 2019, 09:22:56 AM »
Background... For years I've parsed the headlines of 3 local news agencies, showing each with a short snippet of the story and a link to the full story on the agency's website. This is done with their permission, as it increases their hits. One is done with an RSS feed parser and the other 2 via PHP's file_get_contents.

Problem... A couple of weeks ago one of the scripts began failing with this error:
Code: [Select]
failed to open stream: Redirection limit reached, aborting in...
Doing research, I've found some possible causes of this error, and quite a few sources state that switching to cURL to get the page data will correct it (along with other possible remedies if that doesn't work, including improper PHP handling on the source site, meaning the news agency).
I use the exact same php script to get the data from 2 of the sources but I only get the error on one while the other works fine. The news agency's tech help has said that nothing changed on their site that they can find and they are confounded by this as well.
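
One way to see what the "Redirection limit reached" error is hiding is to ask for the first response only, without following redirects, and look at its status line and Location header. The following is a minimal sketch, assuming PHP 7 or later; the helper names summarizeResponse and probeFirstResponse are hypothetical, and the stream context options used are follow_location, ignore_errors and timeout.

```php
<?php
// Hypothetical helper: pull the status line and Location header out of a header list.
function summarizeResponse(array $headers): array
{
    $summary = ['status' => null, 'location' => null];
    foreach ($headers as $line) {
        if (stripos($line, 'HTTP/') === 0) {
            $summary['status'] = trim($line);
        } elseif (stripos($line, 'Location:') === 0) {
            $summary['location'] = trim(substr($line, strlen('Location:')));
        }
    }
    return $summary;
}

// Hypothetical helper: fetch only the first response, without following redirects.
function probeFirstResponse(string $url): array
{
    $context = stream_context_create(['http' => [
        'follow_location' => 0,    // stop at the first response
        'ignore_errors'   => true, // still return non-200 responses
        'timeout'         => 5,
    ]]);
    @file_get_contents($url, false, $context);
    // PHP fills $http_response_header in the local scope after an HTTP fetch.
    $headers = $http_response_header ?? [];
    return summarizeResponse($headers);
}
```

Calling print_r(probeFirstResponse("http://www.vermilioncountyfirst.com/category/more-news")); should show whether the site answers with a redirect and where it points, which may help when comparing the failing source with the working one.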
So this is my script to fetch the data, along with some preg_replace calls to correct characters and styling to fit my page...
Code: [Select]
<?php
error_reporting(E_ALL & ~E_NOTICE); // Report all errors except E_NOTICE
ini_set('display_errors', 0); // do not show errors on the page
ini_set('log_errors', 1); // log errors
ini_set('error_log', dirname(__FILE__) . '/error_log.txt'); // where to log errors

$html = ''; // stays empty if the fetch or the match fails

$data = file_get_contents("http://www.vermilioncountyfirst.com/category/more-news");

// Capture everything between the "More News" heading and the bottom navigation
if (preg_match('/More News<\/span>(.+?)<div id=\"nav-below\"/s', $data, $n)) {

    $html = trim($n[1]);

    // Replace typographic characters with plain equivalents
    $html = preg_replace("/’/", "'", $html);
    $html = preg_replace("/‘/", "'", $html);
    $html = preg_replace("/“/", " ", $html);
    $html = preg_replace("/”/", " ", $html);
    $html = preg_replace("/é/", "e", $html);
    $html = preg_replace("/—/", " ", $html);
    $html = preg_replace("/medium”/", "1", $html);
    $html = preg_replace("/—/", " - ", $html);
    $html = preg_replace("/–/", " - ", $html);
    $html = preg_replace("/Â;/", " ", $html);
    $html = preg_replace("/\*/", " ", $html);

    // Adjust font sizes and colors to fit my page
    $html = preg_replace("/x-large/", "1", $html);
    $html = preg_replace("/#000000/", "khaki", $html);
    $html = preg_replace("/~/", " ", $html);
    $html = preg_replace("/&#8221;/", " ", $html);
    $html = preg_replace("/color: black/", "color: khaki", $html);
}

echo $html;

?>

I can do basic PHP, enough to get by on most things I need, but my cURL knowledge is very limited. So my request is: can someone help me with a cURL script to fetch the data in place of the PHP file_get_contents method?
This is the webpage that the 3 scripts run on and display. You'll notice that the first one, "VCF", is empty.
http://vermilionweather.com/news.php

Thanks for any help in advance,
Rick

Severe Weather Manager-Vermilion County EMA
CWOP-CW9931 KILDANVI5

Offline the beteljuice

  • the beteljuice
  • Forecaster
  • *****
  • Posts: 316
    • test site
Re: cURL help
« Reply #1 on: March 12, 2019, 10:17:55 AM »
Try something like:
EDIT: Missing last line added !!!
Code: [Select]
  $ch = curl_init();                                           // initialize a cURL session
  curl_setopt($ch, CURLOPT_URL, "http://www.vermilioncountyfirst.com/category/more-news");  // connect to provided URL
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);                 // don't verify peer certificate
  curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);  // pass along the visitor's user agent
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);                 // connection timeout (seconds)
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);              // return the data instead of printing it
  curl_setopt($ch, CURLOPT_NOBODY, false);                     // fetch the body too, not just the headers
  curl_setopt($ch, CURLOPT_HEADER, true);                      // include the response headers in the returned data
  curl_setopt($ch, CURLOPT_TIMEOUT, 4);                        // overall timeout, start to finish (seconds)

  $data = curl_exec($ch);                                      // execute session
  curl_close($ch);                                             // close the session
« Last Edit: March 12, 2019, 08:42:10 PM by beteljuice »
Imagine what you will KNOW tomorrow !

Offline rrrick8

  • Senior Contributor
  • ****
  • Posts: 233
    • Vermilion weather
Re: cURL help
« Reply #2 on: March 12, 2019, 10:22:34 AM »
Thanks. I'll try it in a bit. Stepped out to help a neighbor with a problem.

Offline rrrick8

  • Senior Contributor
  • ****
  • Posts: 233
    • Vermilion weather
Re: cURL help
« Reply #3 on: March 12, 2019, 11:16:04 AM »
Tried it, but I'm not getting anything from it.
I just replaced file_get_contents with the cURL code; I get no output, but no errors either.
This is what I'm trying...
Code: [Select]
<?php
error_reporting(E_ALL & ~E_NOTICE); // Report all errors except E_NOTICE
ini_set('display_errors', 0); // do not show errors on the page
ini_set('log_errors', 1); // log errors
ini_set('error_log', dirname(__FILE__) . '/error_log.txt'); // where to log errors

$html = ''; // stays empty if the fetch or the match fails

$ch = curl_init();                                           // initialize a cURL session
curl_setopt($ch, CURLOPT_URL, "http://www.vermilioncountyfirst.com/category/more-news");  // connect to provided URL
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);                 // don't verify peer certificate
curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);  // pass along the visitor's user agent
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);                 // connection timeout (seconds)
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);              // return the data instead of printing it
curl_setopt($ch, CURLOPT_NOBODY, false);                     // fetch the body too, not just the headers
curl_setopt($ch, CURLOPT_HEADER, true);                      // include the response headers in the returned data
curl_setopt($ch, CURLOPT_TIMEOUT, 4);                        // overall timeout, start to finish (seconds)

$data = curl_exec($ch);                                      // execute session

// Capture everything between the "More News" heading and the bottom navigation
if (preg_match('/More News<\/span>(.+?)<div id=\"nav-below\"/s', $data, $n)) {

    $html = trim($n[1]);

    // Replace typographic characters with plain equivalents
    $html = preg_replace("/’/", "'", $html);
    $html = preg_replace("/‘/", "'", $html);
    $html = preg_replace("/“/", " ", $html);
    $html = preg_replace("/”/", " ", $html);
    $html = preg_replace("/é/", "e", $html);
    $html = preg_replace("/—/", " ", $html);
    $html = preg_replace("/medium”/", "1", $html);
    $html = preg_replace("/—/", " - ", $html);
    $html = preg_replace("/–/", " - ", $html);
    $html = preg_replace("/Â;/", " ", $html);
    $html = preg_replace("/\*/", " ", $html);

    // Adjust font sizes and colors to fit my page
    $html = preg_replace("/x-large/", "1", $html);
    $html = preg_replace("/#000000/", "khaki", $html);
    $html = preg_replace("/~/", " ", $html);
    $html = preg_replace("/&#8221;/", " ", $html);
    $html = preg_replace("/color: black/", "color: khaki", $html);
}

echo $html;

?>
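
Getting "nothing but also no errors" is what a script like this produces when curl_exec either fails or returns something other than the expected page, because the result is never checked and display_errors is off. A minimal sketch of one way to log what actually came back, assuming PHP 7 or later; describeCurlResult and fetchWithDiagnostics are hypothetical names, while curl_error and curl_getinfo with CURLINFO_HTTP_CODE are standard cURL extension calls.

```php
<?php
// Hypothetical helper: turn the outcome of a cURL call into a one-line message.
function describeCurlResult($data, string $error, int $httpCode): string
{
    if ($data === false) {
        return "cURL failed: $error";
    }
    if ($httpCode >= 300) {
        return "HTTP $httpCode returned (not the page itself)";
    }
    return "OK, " . strlen($data) . " bytes";
}

// Hypothetical helper: fetch a URL and write a diagnostic line to the error log.
function fetchWithDiagnostics(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    curl_setopt($ch, CURLOPT_TIMEOUT, 4);
    $data = curl_exec($ch);
    $httpCode = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $message = describeCurlResult($data, curl_error($ch), $httpCode);
    curl_close($ch);
    error_log($message); // goes to the error_log file configured at the top of the script
    return $data === false ? '' : $data;
}
```

Swapping $data = curl_exec($ch); for $data = fetchWithDiagnostics("http://www.vermilioncountyfirst.com/category/more-news"); should leave a line in error_log.txt saying whether the request failed, was redirected, or returned a page.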

Offline SteveFitz1

  • Forecaster
  • *****
  • Posts: 521
    • Tyler Texas Weather
Re: cURL help
« Reply #4 on: March 12, 2019, 06:20:08 PM »
I don't believe your "preg_match" is coming up "true". Did something in the html following "More News" change that would cause the "new" expression not to match your code?

Steve

Offline rrrick8

  • Senior Contributor
  • ****
  • Posts: 233
    • Vermilion weather
Re: cURL help
« Reply #5 on: March 12, 2019, 06:27:50 PM »
Quote
I don't believe your "preg_match" is coming up "true". Did something in the html following "More News" change that would cause the "new" expression not to match your code?

Steve

Not that I noticed. Their IT person said there were no changes recently.

Offline SteveFitz1

  • Forecaster
  • *****
  • Posts: 521
    • Tyler Texas Weather
Re: cURL help
« Reply #6 on: March 12, 2019, 06:37:21 PM »
When I do a "View Source" on the page you're going after, I don't see any line with "More News" that is followed by the remaining characters you're trying to match.

Steve
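
A quick way to confirm whether the markers are present in the fetched data is to move the match into a small function that reports failure explicitly instead of leaving $html unset. This is a minimal sketch, assuming PHP 7.1 or later; extractNewsBlock is a hypothetical name, and the pattern is the one from the script above.

```php
<?php
// Hypothetical helper: returns the captured block, or null when the markers are not in the page.
function extractNewsBlock(string $data): ?string
{
    $pattern = '/More News<\/span>(.+?)<div id="nav-below"/s';
    if (preg_match($pattern, $data, $n) === 1) {
        return trim($n[1]);
    }
    return null;
}
```

In the script this could be used as $html = extractNewsBlock($data); followed by a check such as if ($html === null) { error_log('No match; page starts with: ' . substr($data, 0, 200)); }, which would show what was fetched in place of the expected page.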

Offline the beteljuice

  • the beteljuice
  • Forecaster
  • *****
  • Posts: 316
    • test site
Re: cURL help
« Reply #7 on: March 12, 2019, 08:39:51 PM »
Couple of things .....

First of all, an oops: the last line of code was MISSING !!

curl_close($ch);

.... but ....
The URL you are after is 'protected' (that's why you were getting the redirect limit error).
Quote
The website you are visiting is protected and accelerated by Incapsula. Your computer may have been infected by malware and therefore flagged by the Incapsula network. Incapsula displays this page for you to verify that an actual human is the source of the traffic to this site,

I think I've found what they are doing.

Cookie: visid_incap_150465
value: yVBnMcYHTXuSnpFKLeG6EQJCiFwAAAAAQkIPAAAAAACA582KAVImBNp1Ta5vFybeG85FQn35PYBw
Expiry: 11 March 2020, 09:14:44

Try the code with the missing closing line added first. If that still doesn't do the business, send the cookie as well.

So, to put that in cURL:
Code: [Select]
  $ch = curl_init();                                           // initialize a cURL session
  curl_setopt($ch, CURLOPT_URL, "http://www.vermilioncountyfirst.com/category/more-news");  // connect to provided URL
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);                 // don't verify peer certificate
  curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER['HTTP_USER_AGENT']);  // pass along the visitor's user agent
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);                 // connection timeout (seconds)
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);              // return the data instead of printing it
  curl_setopt($ch, CURLOPT_NOBODY, false);                     // fetch the body too, not just the headers
  curl_setopt($ch, CURLOPT_HEADER, true);                      // include the response headers in the returned data
  curl_setopt($ch, CURLOPT_COOKIE, 'visid_incap_150465=yVBnMcYHTXuSnpFKLeG6EQJCiFwAAAAAQkIPAAAAAACA582KAVImBNp1Ta5vFybeG85FQn35PYBw');  // send the visitor cookie
  curl_setopt($ch, CURLOPT_TIMEOUT, 4);                        // overall timeout, start to finish (seconds)

  $data = curl_exec($ch);                                      // execute session
  curl_close($ch);                                             // close the session

If it works, it probably means that sometime within the next year (the cookie expires on 11 March 2020) you will have to visit the site in your browser and look for an updated cookie.

If it doesn't work, sorry - at the limits of beteljuice cURL knowledge base !
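
An alternative to pasting a cookie copied from a browser is to let cURL keep its own cookies in a file and follow redirects itself. The following is only a sketch, assuming PHP 7 or later and a writable cookie file path chosen by the caller; buildCurlOptions and fetchWithCookieJar are hypothetical names. Whether this is enough to get past the protection page is not known, since such pages may also expect a browser to run JavaScript.

```php
<?php
// Hypothetical helper: options for a fetch that follows redirects and keeps cookies.
function buildCurlOptions(string $url, string $cookieFile): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,        // return the data instead of printing it
        CURLOPT_FOLLOWLOCATION => true,        // follow Location: redirects
        CURLOPT_MAXREDIRS      => 5,           // give up after 5 redirects
        CURLOPT_COOKIEJAR      => $cookieFile, // write received cookies here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // read cookies from here and send them back
        CURLOPT_CONNECTTIMEOUT => 3,
        CURLOPT_TIMEOUT        => 10,
    ];
}

// Hypothetical helper: run the request; returns the page data, or false on failure.
function fetchWithCookieJar(string $url, string $cookieFile)
{
    $ch = curl_init();
    curl_setopt_array($ch, buildCurlOptions($url, $cookieFile));
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
```

A call might look like $data = fetchWithCookieJar("http://www.vermilioncountyfirst.com/category/more-news", dirname(__FILE__) . '/cookies.txt'); where cookies.txt is an assumed file name. Cookies saved on one run are sent again on the next.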
« Last Edit: March 12, 2019, 08:43:09 PM by beteljuice »

Offline the beteljuice

  • the beteljuice
  • Forecaster
  • *****
  • Posts: 316
    • test site
Re: cURL help
« Reply #8 on: March 13, 2019, 03:14:54 AM »
Just tried a more thorough test - it FAILS.

cURL is picking up a 302 page with a single line saying "Loading".

I think that is the 'security' page, as it is trying to set a session cookie:
Set-Cookie: incap_ses_534_150465=i701XY8a0z1PPKDkDydpBx2riFwAAAAAFDOX7wBK0/o8mMlBC7+YVQ==;

I sent that back to them, with a more positive 'response', but exactly the same single-word content came back.

HTTP/1.1 302 Found
Cache-Control: no-cache
Content-Type: text/html
Connection: close
Content-Length: 122
X-Iinfo: 9-25749619-0 0NNN RT(1552461775820 0) q(0 -1 -1 1) r(0 -1) B13(11,151897,0) U18
Set-Cookie: visid_incap_150465=SPP/sU3RTjClVnrv4NOEPM+viFwAAAAAQUIPAAAAAAAqBgILilwNRCcBlVamviFV; expires=Wed, 11 Mar 2020 13:29:27 GMT; path=/; Domain=.vermilioncountyfirst.com
Set-Cookie: incap_ses_534_150465=yrJ+OZj9SEljEKTkDydpB8+viFwAAAAAiqyhKkuw4lSxz52eoCXTAg==; path=/; Domain=.vermilioncountyfirst.com
Set-Cookie: ___utmvmSauZfmF=IYYfGljOteg; path=/; Max-Age=900
Set-Cookie: ___utmvaSauZfmF=uqgnVqQ; path=/; Max-Age=900
Set-Cookie: ___utmvbSauZfmF=aZY
    XUqOwals: FtB; path=/; Max-Age=900
Location:

So what needs to be sent to them I don't know.

Of course the whole point of the 'captcha' page is to stop you doing what you want to do !

You'll have to try the provider again and see if there is some sort of 'back door'.

Sorry ........
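
For anyone repeating this test, the cookies in a header dump like the one above can be collected programmatically instead of being copied by hand. This is a minimal sketch, assuming PHP 7 or later and that the raw header text is already in a string; parseSetCookies is a hypothetical name, and it keeps only the name=value part of each Set-Cookie line.

```php
<?php
// Hypothetical helper: collect name => value pairs from Set-Cookie header lines.
function parseSetCookies(string $rawHeaders): array
{
    $cookies = [];
    foreach (preg_split('/\r\n|\n|\r/', $rawHeaders) as $line) {
        if (stripos($line, 'Set-Cookie:') !== 0) {
            continue; // not a Set-Cookie line
        }
        // Keep only the "name=value" part before the first ";" attribute.
        $pair = trim(explode(';', substr($line, strlen('Set-Cookie:')), 2)[0]);
        $parts = explode('=', $pair, 2);
        if (count($parts) === 2) {
            $cookies[$parts[0]] = $parts[1];
        }
    }
    return $cookies;
}
```

When CURLOPT_HEADER is true, the header text can be separated from the body with substr($data, 0, curl_getinfo($ch, CURLINFO_HEADER_SIZE)), called before curl_close($ch).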
« Last Edit: March 13, 2019, 08:30:47 AM by beteljuice »

Offline rrrick8

  • Senior Contributor
  • ****
  • Posts: 233
    • Vermilion weather
Re: cURL help
« Reply #9 on: March 13, 2019, 08:16:28 AM »
Thank you so much for your effort. I tried your code last night but had no luck either.