BeautifulSoup - Python scraping href links
My goal is to scrape the href links on the base_url site.
My code:
    from bs4 import BeautifulSoup
    from selenium import webdriver
    import requests, csv, re

    game_links = []
    link_pages = []
    base_url = "http://www.basket.fi/sarjat/ohjelma_tulokset/?season_id=93783&league_id=4#mbt:2-303$f&stage=177155:$p&0="

    # Render the JavaScript-driven page in a headless browser, then parse it
    browser = webdriver.PhantomJS()
    browser.get(base_url)
    table = BeautifulSoup(browser.page_source, 'lxml')

    # Print the href of every matching <a> tag
    for game in table.find_all("a", {'game_id': re.compile(r'\d+')}):
        href = game.get("href")
        print(href)
Result:
    http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4
    http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4
    ......
The problem is that I can't understand why each href link appears twice in the result.
As you can see in the image, there are 2 links with the same game_id.
Modified code, which gives 1 link:
    for game in table.find_all("a", {'game_id': re.compile('\d+')}):
        if game.children:
            href = game.get("href")
            print(href)
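One possible explanation, which I can't confirm without seeing the page markup, is that each game row contains two separate <a> elements pointing to the same URL, so find_all returns both. If that is the case, the repeats can be dropped after collecting the hrefs. This is only a minimal sketch: unique_in_order is a hypothetical helper name, and the sample list is copied from the result above instead of being scraped live.

```python
def unique_in_order(links):
    """Return the links with duplicates removed, keeping first-seen order."""
    # dict keys are unique and keep insertion order (Python 3.7+),
    # so fromkeys() drops later repeats without reordering anything.
    return list(dict.fromkeys(links))


# Sample hrefs copied from the printed result in the question.
hrefs = [
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502579&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
    "http://www.basket.fi/sarjat/ottelu/?game_id=3502523&season_id=93783&league_id=4",
]

for href in unique_in_order(hrefs):
    print(href)
```

In the scraping loop, the same idea would mean appending each href to a list (game_links, for example) and passing that list through the helper before printing or saving it.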