Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.

0
votes
1answer
13 views

How do I scrape data from web pages using BeautifulSoup in Python?

I am trying to scrape the data from the link given below, a link, and I am saving it to a CSV file. I got all the movie names, but in a different format; please see below: I am getting the format below ...
0
votes
0answers
9 views

How do I create a dataframe of jobs and companies that includes hyperlinks?

I am making a function to print a list of links so I can add them to a list of companies and job titles. However, I am having difficulties navigating tag sub-contents. I am looking to list all the '...
0
votes
1answer
23 views

How do I scrape pdf and html from search results without obvious url

I would like to scrape the pdfs and htmls from the search results from this page: http://www.nas.gov.sg/archivesonline/speeches/search-result?search-type=advanced&speaker=Lee%20Kuan%20Yew and ...
0
votes
1answer
28 views

Scrape Span Text from Google

I'm new to scraping and I'm trying to scrape text from google search results but I keep getting empty results. I have a list of names and I need to get their google search Text results from <span ...
0
votes
0answers
26 views

How to download an Excel file that has no link, using BeautifulSoup?

I want to download an Excel file from a website. However, there is no link for that file when I check the HTML code. The file I download after I click the "Excel Output" button in the web page is ...
0
votes
0answers
18 views

Issue with SSL Certificate with python 3.7

So I have recently started working with beautifulsoup to automate getting data and storing it for specific tasks. Recently, I started getting an SSL Certificate Failed error message. After some ...
1
vote
0answers
46 views

How to web scrape from two websites in one script?

I am currently working on a model and need to gather information not just regarding game results (this link https://www.hltv.org/stats/teams/matches/4991/fnatic?startDate=2019-01-01&endDate=2019-...
2
votes
2answers
39 views

Get the text and remove all tags but retain tags for the titles and bolds

I am extracting the texts from a website using the text = soup.find('div', class_="entity").get_text(" ") , but there are some tags/titles (<p><b>Micro customers:</b></p>) ...
-1
votes
1answer
29 views

Get rid of script text in HTML using beautifulsoup

I want to analyze all visible text from an HTML. Url To get rid of all HTML elements I currently use: from bs4 import BeautifulSoup import re soup = BeautifulSoup(test.content, 'html.parser') ...
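One common way to drop script text before extracting visible text is to remove the `<script>` and `<style>` elements from the tree first. The sketch below is only an illustration: the `html` string is a made-up stand-in for `test.content`, and it is not the accepted answer to this question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample document standing in for test.content.
html = (
    "<html><head><style>p {color: red;}</style>"
    "<script>var x = 1;</script></head>"
    "<body><p>Visible text</p><script>alert('hi');</script></body></html>"
)

soup = BeautifulSoup(html, "html.parser")

# Calling the soup like a function is shorthand for find_all();
# decompose() removes each matched tag and its contents from the tree.
for tag in soup(["script", "style"]):
    tag.decompose()

# Join the remaining text nodes with a space and strip whitespace.
visible = soup.get_text(" ", strip=True)
print(visible)
```

Text hidden by CSS or injected later by JavaScript is not handled by this approach.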
0
votes
1answer
25 views

Is there a way to find the exact path of an element in the requests module in Python?

Is there a way to select the exact "div" in a source of a Beautiful Soup object? For example, let's say we have soup like this: <div class="dialog-shadow" id="popupMenu1" onblur="hidePopup();" ...
0
votes
1answer
38 views

How to Scrape Fidelity.com with BeautifulSoup

I am trying to scrape the stock symbol from this page: https://quotes.fidelity.com/mmnet/SymLookup.phtml?reqforlookup=REQUESTFORLOOKUP&productid=mmnet&isLoggedIn=mmnet&rows=50&for=...
1
vote
3answers
47 views

Not able to find a link in a product page

I am trying to make a list of the links that are inside a product page. I have multiple links through which I want to get the links of the product page. I am just posting the code for a single link. ...
1
vote
1answer
17 views

How to get the text with BeautifulSoup in this HTML code: <span id="pass_0" class="text-success">c#</span>

I'm writing a program that cracks some hashes, using selenium and beautifulsoup with this website: https://hashkiller.co.uk/Cracker from selenium import webdriver from selenium.webdriver.common.keys ...
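For the span shown in the title, a minimal sketch of reading its text by `id` could look like the following. It assumes the page source has already been obtained (for example from the Selenium driver) and uses a hard-coded string in its place.

```python
from bs4 import BeautifulSoup

# Stand-in for the page source; in the question this would come from Selenium.
html = '<span id="pass_0" class="text-success">c#</span>'

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so guard before reading the text.
span = soup.find("span", id="pass_0")
password = span.get_text() if span is not None else None
print(password)
```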
-2
votes
2answers
23 views

Why is my Scraper Pulling Text as though I'm not Logged In?

I'm using log in credentials to access the pricing of a specific webpage. However, my code is pulling "See My Price" instead of the actual price (as if I'm not logged in). The Chrome session driver ...
1
vote
0answers
44 views

Using regex or bs4.find on HTML with JavaScript: which has better performance?

Over the past few days I have been thinking a lot about which is better for scraping in the long term, in terms of memory usage, CPU usage, etc. Let's say, for example, we have this code: <!DOCTYPE ...
1
vote
0answers
26 views

BeautifulSoup4 find_all strange behavior when called from another object

I have an HTML doc with a form that I read and pass on to BeautifulSoup to find forms in it as follows: soup = BeautifulSoup(html_doc, 'html.parser') soup.find_all('form') This returns a result set ...
2
votes
1answer
27 views

Losing information when using BeautifulSoup

I am following the guide of 'Automate the Boring Stuff with Python' practicing a project called 'Project: “I’m Feeling Lucky” Google Search' but the CSS selector returns nothing import requests,sys,...
0
votes
4answers
46 views

How to get the desired value in BeautifulSoup?

Suppose we have the html code as follows: html = '<div class="dt name">abc</div><div class="name">xyz</div>' soup = BeautifulSoup(html, 'lxml') I want to get the name xyz. ...
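Because `class_="name"` matches any tag whose class list contains `name`, it would return both divs here. One possible sketch for keeping only the div whose class is exactly `name` is to compare the full class list. It uses the built-in `html.parser` instead of the `lxml` parser from the question, purely to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

html = '<div class="dt name">abc</div><div class="name">xyz</div>'
soup = BeautifulSoup(html, "html.parser")

# Beautiful Soup stores class as a list, e.g. ["dt", "name"] or ["name"],
# so an equality check on the list keeps only the exact match.
exact = [div for div in soup.find_all("div") if div.get("class") == ["name"]]
names = [div.get_text() for div in exact]
print(names)
```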
0
votes
1answer
19 views

using a for loop for web scraping - cannot “pass” certain data

The following code is supposed to scrape the rating and the date the rating was posted. The issue here is that employees answer the negative reviews, and the date of their post is scraped as well. ...
0
votes
2answers
33 views

can't select specific html element using beautiful soup

I'm trying to find an element that's a tbody nested inside the all_totals id (it's definitely there, I checked). import requests from bs4 import BeautifulSoup, Comment url = 'https://www.basketball-...
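The `Comment` import in the excerpt hints that the table markup may sit inside an HTML comment, in which case `find` cannot see it as a tag. That is an assumption, not something the excerpt confirms. Under that assumption, and with a made-up HTML fragment, a sketch of re-parsing the comment text could look like this:

```python
from bs4 import BeautifulSoup, Comment

# Hypothetical fragment: the table is wrapped in an HTML comment.
html = (
    '<div id="all_totals">'
    "<!-- <table><tbody><tr><td>42</td></tr></tbody></table> -->"
    "</div>"
)

soup = BeautifulSoup(html, "html.parser")
wrapper = soup.find(id="all_totals")

# Collect the comment nodes under the wrapper element.
comments = wrapper.find_all(string=lambda s: isinstance(s, Comment))

# Parse the first comment's text as HTML so its tags become searchable.
inner = BeautifulSoup(comments[0], "html.parser") if comments else None
tbody = inner.find("tbody") if inner is not None else None
print(tbody)
```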
0
votes
1answer
22 views

How can I use the CSS selector to select a form when there are multiple forms on a page?

I am trying to fill out a specific form on a webpage, but the CSS selector I am using returns an error. I have tried these separately: 1. browser.select_form('form[method="post"]') 2. browser....
0
votes
1answer
36 views

Console returns none 12 times. There are 12 images. Can images not be scraped?

I'm trying to build a scraper to get all the listings images from this site. I figured out how to get all the pages into a .txt file, but while trying to do the first page with this code the console ...
0
votes
1answer
29 views

Scraping Wikipedia Content from Picture of The Day

I am trying to scrape a certain type of Wikipedia page and want to generalize it enough so I can iterate my scraping over multiple pages. You may use this page as an example page: https://en.m....
0
votes
0answers
19 views

I want to download all CSV files from a website with HTTP authentication [on hold]

I want to download all CSV files from a website protected by HTTP authentication. The site has this structure: site/Month_07/Day_18/T_13-46-59-885_dir/dir/file.csv site/Month_08/Day_20/T_13-46-59-...
-2
votes
0answers
34 views

Syntax error when I use “for” twice [on hold]

I am trying to scrape several pieces of information from multiple pages of a restaurant. This is part of my header code: for i in range(260,1231): my_url = "https://www.tripadvisor.fr/Restaurant_Review-...
1
vote
3answers
71 views

Trying to get only the text between two strong tags

I am currently trying to get only the HTML text (a list of names) that is between the first two occurrences of the strong tag. Here is a short example of the HTML I scraped <h3>Title of ...
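One way to collect the text between the first two `<strong>` tags is to walk the siblings of the first one and stop at the next. The HTML below is invented for illustration, since the question's sample is truncated, and it assumes the names are direct siblings of the `<strong>` tags.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup; the original example is cut off in the excerpt.
html = (
    "<h3>Title of section</h3>"
    "<strong>Group A</strong> Alice, Bob <br/> Carol "
    "<strong>Group B</strong> Dave"
)

soup = BeautifulSoup(html, "html.parser")
first = soup.find("strong")

names = []
if first is not None:
    for node in first.next_siblings:
        # Stop as soon as the second <strong> tag is reached.
        if getattr(node, "name", None) == "strong":
            break
        # Keep only non-empty text nodes; skip tags such as <br/>.
        if isinstance(node, NavigableString):
            text = node.strip()
            if text:
                names.append(text)

print(names)
```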
3
votes
2answers
37 views

How to efficiently parse html list into a dict?

I am wondering how I can streamline this mess of code and put the output into a nice dictionary instead of list of tuples. Can I use BeautifulSoup in a better way, how? from bs4 import BeautifulSoup ...
1
vote
1answer
50 views

Python bs4: How to Repeat “For” Loop with a Different Scraped Page if a Certain Condition is Met?

I am trying to create a for loop where once it gets to the last search_result attribute in the scraped page, it will repeat the loop but with the data of a new scraped web page. After the for loop ...
1
vote
1answer
32 views

How do I retrieve URLs and data from the URLs from a list of weblinks

Hello, I am quite new to web scraping. I recently retrieved a list of web links, and there are URLs within these links containing data from tables. I am planning to scrape the data but can't seem to ...
3
votes
3answers
82 views

Slicing function on <for> loop

I am a beginner coder, using Python 3.7.1 on Windows 10 with Visual Studio Code. As an exercise I am trying to scrape some data organized in a table from a webpage. Now, I want to extract some ...
0
votes
1answer
63 views

How to scrape multiple pages and different items on each page?

I'm a beginner in Python, with just a few weeks of trying to write my web scraper. I need to scrape multiple pages of one restaurant on tripadvisor, using beautifulsoup on windows32. In each page, I need to take ...
0
votes
2answers
41 views

Beautiful Soup page table parsing problem

I want to get the data (numbers) from this page. With those numbers I want to do some math. My current code: import requests from bs4 import BeautifulSoup result = requests.get("http://www.tsetmc....
1
vote
0answers
26 views

How to parse the <style> element's contents using beautifulsoup or any css parser tools? [duplicate]

I have the following sample page and I would like to parse the contents of the style element: <html> <head> <style> body { background-color: lightblue; } h1 { ...
1
vote
1answer
32 views

Extracting time using beautiful soup with regular expressions

I need help coming up with the correct syntax when using Beautiful Soup with regular expressions. I am using the code below to scrape only the time. The time is located in a DIV that includes a paragraph....
2
votes
2answers
48 views

Why am I getting no output?

I have a pretty simple question really: why am I getting no output? This is the site: https://riven.market/list/PC/Veiled. I thought the problem was the spaces in the class name, but it turns out it's natural ...
0
votes
1answer
37 views

Can't find text from page using Python BS4

I am trying to learn how to use BS4 but I ran into this problem. I try to find the text in the Google Search results page showing the number of results for the search, but I can't find any text 'results'...
1
vote
1answer
36 views

Speed up BeautifulSoup parsing?

I need to process weather data from this website (https://www.ftp.ncep.noaa.gov/data/nccf/com/gfs/prod/gfs.20190814/06/), each file is around 300MB. Once I download the file, I only need to read in a ...
1
vote
1answer
25 views

Python bs4: How to Repeat “For” Loop with a Different Expression List if a Certain Condition is Met?

I am trying to create a for loop where once it gets to the last comment-index attribute on page 1, it will repeat the loop but with the data of page 2. data_page_1 = '''<div> <div> &...
0
votes
0answers
10 views

How to 'force' render javascript when using HTMLSession.render() during web scraping?

I need to scrape the postcode data from the website. https://www.pos.com.my/postal-services/quick-access/?postcode-finder#postcodeIds=01000 First I started with the usual BeautifulSoup workflow but ...
-1
votes
1answer
14 views

BS4 error 'NoneType' object has no attribute 'find_all'. Cannot parse html data

BS4 error 'NoneType' object has no attribute 'find_all'. Cannot parse html data. import requests from bs4 import BeautifulSoup as bs session = requests.session() def get_sizes_in_stock(): ...
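This error generally means an earlier `find` call returned `None` because nothing matched, and `find_all` was then called on that `None`. The question's code is truncated, so the sketch below uses an invented fragment and a hypothetical `id` value only to show the guard pattern.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that does NOT contain the element being looked up.
html = '<div id="other"><span>S</span><span>M</span></div>'
soup = BeautifulSoup(html, "html.parser")

# "sizes" is an assumed id for illustration; find() returns None here.
container = soup.find("div", id="sizes")

# Guard against None before chaining find_all().
if container is not None:
    sizes = [span.get_text() for span in container.find_all("span")]
else:
    sizes = []

print(sizes)
```

If the element exists in the browser but not in the fetched HTML, the page may be building it with JavaScript or serving different markup to the script.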
0
votes
1answer
13 views

Scraping a webpage with embedded tweet

I am trying to scrape a web page which has an embedded tweet https://thehill.com/homenews/news/376608-west-virginia-teachers-to-continue-strike-after-state-senate-passes-lower-raise. When I use ...
0
votes
1answer
26 views

BeautifulSoup RSS Feed extract a tab returning “1”

Using Python 3 and BeautifulSoup, I am trying to read an RSS feed. Inside the <description> tag there are <a> and <img> tags. I want to get only the <a> tag's href and the <img> tag's src. import ...
-6
votes
0answers
47 views

Python web scraping and parsing using beautifulsoup [on hold]

I am scraping a website using Python and parsing the HTML with beautifulsoup. This is done by switching from tab to tab. When I check the requests and responses they are correct, but the end result gives me an error as "...
0
votes
3answers
46 views

How to extract link under a <li> tag with a specific class?

<li class="a-last"><a href="/macbook-pro">Buy Now</a></li> How can you extract the link /macbook-pro inside the class a-last? Efficiency is a consideration.
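A minimal sketch of one approach uses a CSS selector to reach the anchor inside the `li` with class `a-last` and then reads its `href`. It uses the snippet from the question as input and makes no claim about which approach is fastest.

```python
from bs4 import BeautifulSoup

html = '<li class="a-last"><a href="/macbook-pro">Buy Now</a></li>'
soup = BeautifulSoup(html, "html.parser")

# select_one() returns the first matching element, or None if there is none.
link = soup.select_one("li.a-last > a[href]")
href = link["href"] if link is not None else None
print(href)
```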
0
votes
1answer
30 views

Python BeautifulSoup: How to Find Last Occurrence of Tag with Specific Attribute

How can you find the last occurrence of a tag with this attribute: data-index without having the value of it? I have written the code below but it returns IndexError: list index out of range although ...
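Passing `True` as an attribute value to `find_all` matches any tag that has the attribute, whatever its value. A sketch on invented markup, with a check for the empty list that would otherwise raise `IndexError`:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration.
html = """
<div data-index="1">first</div>
<div>no attribute</div>
<div data-index="7">last</div>
"""
soup = BeautifulSoup(html, "html.parser")

# True means "the attribute is present", regardless of its value.
tags = soup.find_all(attrs={"data-index": True})

# Indexing an empty list raises IndexError, so check before taking [-1].
last = tags[-1] if tags else None
print(last)
```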
0
votes
1answer
71 views

Getting a return of [] when using BS4 on a webpage

I'm trying to return data from a website using bs4. I'm not sure if I'm targeting the right classes or using bs4 the wrong way to get the table of information I want. I've tried using different ...
1
vote
2answers
74 views

How to web scrape through multiple pages with BeautifulSoup

I am trying to web scrape multiple pages with BeautifulSoup and I have successfully retrieved data for a single page. Now I wonder how I should implement a loop to retrieve the data from ...
0
votes
3answers
42 views

BeautifulSoup: `find_all` and `get_text`

I have some xml that is formatted like this: <Paragraph Type="Character"> <Text> TED </Text> </Paragraph> <Paragraph Type="Dialogue"> <Text> ...
0
votes
1answer
25 views

How to find correct tags in nested HTML using Beautiful Soup, receiving list index out of range error or empty list

Printing out the tags under ('a') works perfectly to bring out the description for each of the houses on the website. Trying to replicate this for the price using any tag('price' for example) doesn't ...
-1
votes
1answer
45 views

I am looking for the best way to parse HTML code [on hold]

I'm working on an application for school. Its task is to read the subject name and the div class name. The class name differs depending on whether you were present or not. At the end I have to summarize the attendance ...