Is there any tool, software, or program that can crawl the id values of a hyperlink?

Last edited by thken on 2015-5-27 12:15

Say I have a link, for example:
http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-000000

I want to find out which pages in a list like this actually exist:
http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-000000
http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-000001
...
http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-999999
that is, the ones that do not respond with "No this user!".

Is there any tool, program, or software that can crawl the id values of such a hyperlink, or detect which ids are valid?

Thanks in advance, everyone!!!

Reply to 1# thken


    Write a PHP script and use cURL to check what each URL returns.


I don't know much about programming. Ching, could you give me a bit more of a hint??


Reply to 3# thken
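<?php
// Probe ids cyc-000000 .. cyc-000005 and collect the ones that return a real page.
set_time_limit(0); // remove the execution time limit; a full scan takes a long time
$base = "http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-";
$result = array();
for ($i = 0; $i <= 5; $i++) {
    // Zero-pad the counter to six digits, e.g. 3 -> "000003".
    $body = @file_get_contents($base . str_pad($i, 6, "0", STR_PAD_LEFT));
    // Skip failed requests (false) and the "not found" message.
    if ($body !== false && trim($body) != "No this user!") {
        $result[] = $i;
    }
}
echo '<pre>' . print_r($result, true) . '</pre>';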

Try Scrapy, an open-source web-crawling framework written in Python:
http://scrapy.org/


Last edited by 7h1r733n on 2015-6-2 06:44

Reply to 5# hongkong_netcop

Life is short, use Python
import urllib.request as r

BASE = "http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=mag-c"

for i in range(200100, 200107):
    # Fetch the page and compare its body with the "not found" message.
    body = r.urlopen(BASE + str(i).zfill(6), timeout=10).read()
    if body.strip() != b"No this user!":
        print("found " + str(i))
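For a longer scan, the loop above stops at the first network error. Below is a minimal sketch of a more forgiving version, assuming (as in the posts above) that an unused id returns exactly "No this user!". The names `fetch`, `find_valid_ids`, `fetch_fn`, and `delay` are made up for illustration and are not from any library.

```python
import time
import urllib.error
import urllib.request

BASE = "http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id="  # URL from the thread
NOT_FOUND = b"No this user!"  # body for an unused id, as reported in the thread


def fetch(url, timeout=10):
    """Return the response body, or None on a URL or socket error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, OSError):
        return None


def find_valid_ids(ids, fetch_fn=fetch, delay=0.0):
    """Yield each id whose page could be fetched and is not the 'not found' message."""
    for page_id in ids:
        body = fetch_fn(BASE + page_id)
        if body is not None and body.strip() != NOT_FOUND:
            yield page_id
        if delay:
            time.sleep(delay)  # pause between requests to go easy on the server
```

Something like `find_valid_ids(("cyc-" + str(i).zfill(6) for i in range(6)), delay=0.5)` would probe the first six ids. Passing the fetch function in as a parameter is only there so the logic can be tried offline with a fake.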

How do I change the ID so that it can include letters as well? e.g. 1234aa


http://www.hkedcity.net/ihouse_tools/ihouse.phtml?id=cyc-03041a

Because one of the pages that does open may have letters mixed into its id.


Because one of the pages that does open may have letters mixed into its id.
Posted by thken on 2015-6-7 00:58


Looks like hex....
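If the ids really are hex (only a guess based on the single sample `cyc-03041a`), the candidates could be generated as below. This is a rough sketch: `hex_ids` and its parameters are hypothetical names, and lowercase digits are assumed because the sample is lowercase.

```python
def hex_ids(start, stop, width=6, prefix="cyc-"):
    """Yield ids such as 'cyc-03041a': a prefix plus a zero-padded lowercase hex number."""
    for n in range(start, stop):
        # format(n, "06x") renders n in lowercase hex, left-padded with zeros to 6 characters
        yield prefix + format(n, "0{}x".format(width))


# The three ids starting at the sample from the thread (0x03041a).
for page_id in hex_ids(0x03041A, 0x03041D):
    print(page_id)
```

A six-character hex id has 16**6 = 16,777,216 possible values, compared with 1,000,000 for digits only, so a full scan takes roughly 17 times as many requests.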


You could try Beautiful Soup; it is easier to use.

http://www.crummy.com/software/B ... 4/doc/index.zh.html
