BeautifulSoup - Python scraping href links




My goal is to scrape the href links from the base_url site.

My code:

    from bs4 import BeautifulSoup
    from selenium import webdriver
    import requests, csv, re

    game_links = []
    link_pages = []
    base_url = "http://www.basket.fi/sarjat/ohjelma_tulokset/?season_id=93783&league_id=4#mbt:2-303$f&stage=177155:$p&0="

    # Render the JavaScript-driven page in a headless browser, then parse the resulting HTML
    browser = webdriver.PhantomJS()
    browser.get(base_url)
    table = BeautifulSoup(browser.page_source, 'lxml')

    # Print the href of every <a> tag whose game_id attribute is numeric
    for game in table.find_all("a", {'game_id': re.compile(r'\d+')}):
        href = game.get("href")
        print(href)

Result:

    http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4
    ......

The problem is that I can't understand why each href link appears twice in the result.

As you can see in the image, there are two links with the same game_id.

Modified code, to get only 1 link per game:

    for game in table.find_all("a", {'game_id': re.compile(r'\d+')}):
        if game.children:
            href = game.get("href")
            print(href)
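Since the page apparently contains two `<a>` elements for each game_id, `find_all` returns both of them, so every href is printed twice. Note also that `game.children` is an iterator and is therefore always truthy, so the `if` check on its own does not filter anything out. One possible workaround, sketched below, is to collect the hrefs and drop the repeats while keeping their original order. The helper name `unique_in_order` is hypothetical, and the sample list is copied from the result above; in the real script it would be built from `game.get("href")` inside the loop.

```python
def unique_in_order(hrefs):
    """Return hrefs with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+);
    # empty or missing hrefs (None) are skipped.
    return list(dict.fromkeys(h for h in hrefs if h))


# Sample values taken from the result shown in the question.
# Assumption: in the real script this list would come from
# [game.get("href") for game in table.find_all(...)].
hrefs = [
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
]

game_links = unique_in_order(hrefs)
for href in game_links:
    print(href)
```

This treats two links as the same game only when their href strings are identical, which matches the output shown above. If the two anchors ever differed in their URLs, the comparison would need to be made on the game_id value instead.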



