Questions tagged [beautifulsoup]

Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.
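
As a rough illustration of the description above, the following minimal sketch parses a small document with `bs4`. The markup is made up for this example, and `"html.parser"` (the parser bundled with Python) is just one choice; `lxml` or `html5lib` could be substituted if installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = (
    "<html><body>"
    "<h1 id='title'>Hello</h1>"
    "<p class='note'>First</p>"
    "<p>Second</p>"
    "</body></html>"
)

# "html.parser" ships with Python, so no extra parser package is needed.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; find_all() returns every match.
heading = soup.find("h1")
paragraphs = [p.get_text() for p in soup.find_all("p")]

print(heading["id"], heading.get_text())
print(paragraphs)
```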

-4
votes
0answers
38 views

How to parse Google search results in Python 3?

I am trying to parse Google search results. I am getting data, but I want to filter out unneeded URLs. Here is the parser code: def Praser(Req_content): Link_list = [] for Link_Data in BS(...
0
votes
1answer
39 views

How to fetch Parent and child items completely, Python 3.6

The image linked below shows the hierarchy of one provider. We have many similar providers; can I fetch all the details for all the providers using the API? This is the output .csv which I am able to fetch ...
-1
votes
3answers
47 views

In BeautifulSoup, how do I search for an element that contains text but also has an ancestor with a certain class?

I'm using BeautifulSoup 4 with Python 3.7. I want to find an element that has the text " points" in its element, but also has an ancestor DIV whose class attribute contains "article". I have figured ...
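
One possible way to approach the question above is sketched here. The markup is hypothetical, and the sketch assumes that "contains article" means the DIV carries `article` as one of its class names. It searches for matching strings and then keeps only those whose enclosing tag has such an ancestor.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the first span sits inside a div with class "article".
html = """
<div class="article featured"><span class="score">12 points</span></div>
<div class="sidebar"><span class="score">7 points</span></div>
"""
soup = BeautifulSoup(html, "html.parser")


def in_article(tag):
    # class_="article" matches a div when any one of its classes equals "article".
    return tag.find_parent("div", class_="article") is not None


# find_all(string=...) yields the matching text nodes; .parent is the enclosing tag.
matches = [
    text.parent
    for text in soup.find_all(string=lambda s: s is not None and " points" in s)
    if in_article(text.parent)
]

for tag in matches:
    print(tag)
```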
0
votes
2answers
56 views

Beautiful Soup does not get full div

BeautifulSoup does something weird and I can't figure out why. import requests from bs4 import BeautifulSoup url = "nsfw" r = requests.get(url) soup = BeautifulSoup(r.text, 'html.parser') cards = ...
1
vote
1answer
49 views

Fetch complete List of Items using BeautifulSoup, Python 3.6

I am learning BeautifulSoup and I have chosen the link https://www.bundesbank.de/dynamic/action/en/statistics/time-series-databases/time-series-databases/743796/743796?treeAnchor=BANKEN&statisticType=...
-1
votes
2answers
36 views

Python Beautiful soup html parsing

I have HTML content as below <h3>Features</h3> <ul id="features"> <li>Light weight fabric with fast Wicking technology for quick drying even during heavy sweating.</li&...
0
votes
0answers
28 views

Website Traffic data scraping using Python [closed]

The Serpscrap library only gives the rank details of a given keyword. import serpscrap keywords = ['one', 'two'] scrap = serpscrap.SerpScrap() scrap.init(keywords=keywords) result = scrap....
-2
votes
1answer
30 views

Parsing table to csv Python

I need to parse table from https://ege.hse.ru/rating/2019/81031971/all/?rlist=&ptype=0&vuz-abiturients-budget-order=ge&vuz-abiturients-budget-val=10 import requests from bs4 import ...
0
votes
1answer
40 views

How to scrape same class name data

I was trying to scrape some real estate websites, but the one I came across uses the same class name under one div, and that div contains 2 more divs which also have the same class name. I want to scrape child class ...
0
votes
1answer
46 views

Beautiful Soup Extract Information

I'm trying to extract the name of the chemical, its occurrences/uses and date added by using beautiful soup. This is the one example of the chemical in the list https://oehha.ca.gov/chemicals/...
0
votes
2answers
41 views

How can I save output from BeautifulSoup as a csv?

I'm a beginner with python, and I'm trying to use it to scrape data from: https://www.spotrac.com/nfl/arizona-cardinals/sam-bradford-6510/cash-earnings/ (and other such pages) I really just need ...
0
votes
2answers
30 views

Extract Links from HTML In Line with Text with Python/BeautifulSoup

There are many answers to how to convert HTML to text using BeautifulSoup (for example /a/24618186/3946214) There are also many answers on how to extract links from HTML ...
-3
votes
1answer
33 views

Beautiful Soup: can't get the text (price) out of this [duplicate]

How can I get the price out of this mess? <li class="price-current"> <span class="price-current-label"> </span> 36,659 <a class="price-current-...
-1
votes
1answer
35 views

BeautifulSoup unsuccessful request

I'm using urlopen to extract data from a list of websites right now but keep running into problems with unsuccessful requests. Can anyone help me with it? I save the website as HTML file path = "/...
1
vote
1answer
23 views

Get all values of href from a class in HTML snippet using beautifulSoup

I am trying to build a web scraping tool but am stuck retrieving the css field. Given an HTML snippet, how can I get all such values of href using BeautifulSoup? Also, the class is uniquely used for ...
1
vote
1answer
35 views

beautifulsoup extracts only first 10 elements

I'm trying to extract information from the Volkswagen page on kununu. For example "Pro" information. url = 'https://www.kununu.com/de/volkswagen/kommentare' page = requests.get(url) soup = bs(page.text, ...
-1
votes
1answer
24 views

beautifulsoup extract by class value text

I want to extract paragraph data based on h2 class value. Below is html code. <div class="myClass"> <div itemprop="reviewBody" class="review-body"> <h2 class="h3">Test1</h2>&...
0
votes
1answer
34 views

Webscraping returns Variables instead of actual values

I am trying to scrape data from https://sunshinetour.com/stats/ however, if I try to access the anchor tags, it returns a variable instead of the actual value. This is my code: from bs4 import ...
0
votes
1answer
36 views

Requests / BeautifulSoup Facebook language error

I want to scrape Facebook company pages for their date (if they have one). The problem is that when I try to retrieve the HTML, I get the Hebrew version of it (I'm located in Israel). This is part of the result: ...
0
votes
0answers
21 views

Getting data from other relative pages after Login using BeautifulSoup

The following is the code that I have written, and I am stuck after the login. I want to scrape through the data of the user after login. How can I do that? import requests from bs4 import BeautifulSoup ...
0
votes
2answers
26 views

BeautifulSoup returns only text of a tag when selecting any tag by value, was expecting to get full tag returned

When I try to select tags based on the 'string' value of the tag, but without specifying a specific tag, I get only the String value returned and not the full tag. If I specify a tag along with a ...
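
The behaviour described above can be reproduced with a small, made-up document. As a hedged sketch: searching by `string` alone returns the text nodes themselves, and one way to get the full tags is to step up to each node's `.parent`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two different tags share the same text.
html = "<div><p>target</p><span>target</span><b>other</b></div>"
soup = BeautifulSoup(html, "html.parser")

# With only string=..., the results are NavigableString objects, not tags.
strings = soup.find_all(string="target")

# Each string knows its enclosing tag, so .parent recovers the full element.
tags = [s.parent for s in strings]

print(strings)
print(tags)
```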
0
votes
1answer
19 views

Retrieving URLs from href when 'a' is encased within 'strong' using Python 3 / BeautifulSoup

I am having a bit of trouble on this one. Here is an example of the html: <tr data-row="8"> <th scope="row" class="left " data-append-csv="abramjo01" data-stat="player"> &...
-1
votes
1answer
15 views

With BeautifulSoup 4 (lxml parser), how do I extract inner HTML from a tag (decode_contents not working)?

I'm using BeautifulSoup 4 and Python 3.7. I want to extract the inner HTML from a found article. I have this soup = BeautifulSoup(html, features="lxml") ... article_elt = top_article_elt.select('...
0
votes
0answers
20 views

BeautifulSoup not working cannot import name 'BeautifulSoup' from partially initialized module 'bs4'

I keep getting error code: from bs4 import BeautifulSoup ImportError: cannot import name 'BeautifulSoup' from partially initialized module 'bs4' (most likely due to a circular import) Can not find ...
0
votes
1answer
15 views

Get element in a list Beautiful soup

I'm using a CSS selector via Chrome to get an image on a webpage, but it returns a list with one element that contains a long string containing the element that I'm looking for. How can I get the ...
0
votes
1answer
27 views

Parse Table tag in Python

I am trying to extract data from a HTML file using python. I am trying to extract the table content from the file. Below is the HTML content of the table: <table class="radiobutton" id="...
0
votes
1answer
35 views

Extracting images from multiple urls

I would like to iterate through a list of urls and extract images from each page. However there are certain cases where an image does not exist and the url is different from the pattern of urls I ...
1
vote
1answer
27 views

how can we make our scraping to look like a real person browsing

So, I am scraping a website but every now and then I will get temp-banned for some minutes. I am using headers in my code for scraping, but I was wondering if there is anything more we can do to ...
0
votes
2answers
29 views

Getting ID attribute from an Element.tag

What is the best way to get the ID value (2758769 in below example) from a BeautifulSoup Element Tag and assign to a variable? type(an_element) Out[13]: bs4.element.Tag an_element Out[14]: <span ...
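
A minimal sketch of reading an attribute from a `Tag` follows. The span below is a hypothetical stand-in, since the original element is truncated in the excerpt; only the ID value is taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the truncated element in the question.
soup = BeautifulSoup('<span id="2758769" class="item">example</span>', "html.parser")
an_element = soup.find("span")

# Dictionary-style access raises KeyError if the attribute is missing ...
element_id = an_element["id"]

# ... while .get() returns None (or a supplied default) instead.
missing = an_element.get("data-missing")

print(element_id, missing)
```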
0
votes
3answers
53 views

Python Request not getting all data

I'm trying to scrape data from Google translate for educational purpose. Here is the code from urllib.request import Request, urlopen from bs4 import BeautifulSoup #https://translate.google.com/#...
0
votes
2answers
27 views

BeautifulSoup python … soup.find(id="productTitle") does not return anything

I am new to web scraping and would like to pull some information from amazon. I've written these few basic lines, but they are not working... import requests from bs4 import BeautifulSoup URL ='https:/...
-3
votes
1answer
28 views

unable to scrape 'div' from the website through beautifulsoup [closed]

Tried to scrape car information from Turo, unable to retrieve the specific 'div'
0
votes
1answer
55 views

BeautifulSoup - find + iterate through a table

I am having some trouble trying to cleanly iterate through a table of sold property listings using BeautifulSoup. In this example, some rows in the main table are irrelevant (like "set search filters"...
-1
votes
1answer
52 views

How to scrape stats from NBA website with python

I am trying to scrape advanced stats from the NBA website, more specifically from this link https://stats.nba.com/leaders/?StatCategory=FG3M&PerMode=Totals. However, I seem to be getting the error ...
1
vote
2answers
41 views

Python Beautifulsoup extract text from different span with same class

As I'm new to datascience I'm trying to webscrape a real estate website in order to create a dataset with the listing, the problem that I run into is that different elements (rooms, surface and number ...
-1
votes
1answer
29 views

Beautiful Soup in Python is not giving me the correct number of links on the page

I am trying to count the number of links on a Web page using the following code: import requests from requests.exceptions import HTTPError from bs4 import BeautifulSoup import pandas as pd webpage =...
0
votes
0answers
18 views

How to scrape m3u8 from Network-Stream after a js reload (Python)

I run a Python program using beautifulsoup and requests to scrape embedded video URLs, but to download these videos I need to bypass ad popups and a JavaScript reload; only then the m3u8 files ...
0
votes
1answer
22 views

How can I get only one of the hrefs using beautifulsoup if there are two hrefs in the same line?

I want to get the href using beautifulsoup from this HTML code, <a href="first_url" class="class" href="2nd_url" style="15px;">text</a> From here, I want to get first_url But using ...
-1
votes
1answer
54 views

Scraping - find the name of all sub-class

I'm trying to find a way to get number of sub-class and their name contained in a root class. For example, I would like to have in return for the class 'o-container__left u-mt-lg': class= "c-...
2
votes
1answer
29 views

BeautifulSoup cannot work in multithread program of Python

When I try to parse HTML with BeautifulSoup in multiple threads, I find that it does not work. To present the problem, two experiments are run. The first one is used to demonstrate the two subprograms are ...
0
votes
1answer
13 views

how do i import beautifulsoup4 in Pycharm?

I installed bs4 through the terminal (pip3 install beautifulsoup4), and when I typed 'from bs4 import beaufitulsoup' it seemed to work, but it turns out it does not. How can I solve this problem? I installed bs4 ...
0
votes
1answer
15 views

Retrieving text from twitter account using BeautifulSoup and Splinter

I am trying to retrieve the text from the latest tweet from https://twitter.com/marswxreport?lang=en I have tried the following: twitter_url = 'https://twitter.com/marswxreport?lang=en' ...
0
votes
0answers
45 views

Web scraping with VBA without browser automation [closed]

I need to scrape web data and update in Excel sheet with VBA. I have done it by Selenium Chrome driver. But using Selenium is like automating a browser to fetch the data. Is there any other library ...
0
votes
0answers
36 views

How can I get python to stop skipping cells when writing with xlwt

I have a set of files that I am reading data from and printing out into an Excel file. For some reason, it is skipping the first 3 things to print and is skipping to the fourth string and cell. I ...
0
votes
2answers
39 views

Python Beautifulsoup : how to find a tag by attribute value without knowing corresponding attribute name?

Let's assume we have an Attribute value "xyz" without knowing the Attribute Name. It means we could match <a href="xyz"> but also <div class="xyz"> Is it possible to search for ...
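
One way this kind of search might be done is by passing a function to `find_all` that inspects every attribute of each tag. This is a sketch under assumptions: the markup and the helper name `has_attr_value` are hypothetical. Multi-valued attributes such as `class` are stored as lists by Beautiful Soup, so the helper checks list membership for those.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: "xyz" appears as an href value and as a class name.
html = '<a href="xyz">link</a><div class="xyz big">box</div><p id="abc">text</p>'
soup = BeautifulSoup(html, "html.parser")


def has_attr_value(value):
    """Build a filter matching tags where any attribute holds the given value."""
    def matcher(tag):
        for attr_value in tag.attrs.values():
            if isinstance(attr_value, list):
                # Multi-valued attributes (e.g. class) are lists of tokens.
                if value in attr_value:
                    return True
            elif attr_value == value:
                return True
        return False
    return matcher


found = soup.find_all(has_attr_value("xyz"))
print([tag.name for tag in found])
```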
0
votes
2answers
26 views

Change HTML text and saveback to HTML

I am working on a simple way to wrap each sentence of an ebook formatted in HTML in span tags. I am using a trained machine learning model to classify end of sentence punctuation (".!?" ...) and get ...
0
votes
0answers
20 views

lxml etree error when using custom parser target

I am trying to parse the page with lxml using a custom parser target which stores the specific elements in a list and returns the rest. But I am getting a strange error on http://yahoo.com. File "...
-2
votes
0answers
34 views

BeautifulSoup and <script>

I'm using BeautifulSoup to extract some info on OTC from a site, just to learn how to do it, but I'm facing a problem. When I looked up the site I thought it was just getting the right tags with the ...
1
vote
1answer
22 views

BeautifulSoup not handling HTML table inside anchor tag

Consider the sample HTML code: <!DOCTYPE html> <html lang="en"> <head> <title>Testing</title> </head> <body> <a href="https://www.google.com"> ...
1
vote
4answers
39 views

BeautifulSoup finding multiple Categories

I'm trying to scrape a Wiki page, just for training, and I'm stuck. I want to print the title of the page, the last modified date and the categories. This is my code: from bs4 import BeautifulSoup import ...