Questions tagged [scrapy]

Scrapy is a fast, open-source, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

0
votes
0answers
20 views

Scrapy. find a tag by its content

How to find a tag by its content? This is how I find the necessary elements, but the structure on some pages is different and this does not always work. yield { ... 'Education'...
0
votes
1answer
18 views

How do I write something in fields with scrapy?

Is it possible to write something in fields with Scrapy? For example, I want to write my username and password in these fields.
0
votes
2answers
30 views

How to scrape information about a specific product using search bar

I'm making a system, mostly in Python with Scrapy, in which I can, basically, find information about a specific product. But the thing is that the request URL is huge. I got a clue that I ...
0
votes
0answers
22 views

Scrapy is not using MY FilesPipeline, even though I've done all to enable it

I was using FilesPipeline for downloading files, but the files got downloaded with a hash in their name, so I decided to change that. I used these instructions, but the spider doesn't seem to ever use ...
0
votes
0answers
9 views

How to hook up scrapy-splash with aquarium

I am trying to crawl a website using Scrapy and Aquarium; the latter is a load balancer that handles multiple Splash instances for rendering JavaScript. I'm running Aquarium using docker-compose up and ...
0
votes
0answers
10 views

How to import scrapy-user-agents with conda

The package can't be installed with the conda install command, and its documentation shows only pip install. This is the error: PackagesNotFoundError: The following packages are not available from ...
1
vote
1answer
32 views

Scrapy: Extract Dictionary Stored as Text in Script Tag

Subject: Extract Dictionary Stored in Script Tag. Hello, I am trying to scrape this data from the tag. The goal is to be able to extract the data dictionary and get the values for each of the ...
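A minimal sketch of one common approach to this kind of problem, using only the standard library. The HTML snippet, the variable name `data`, and the helper `extract_script_dict` are all hypothetical, and the sketch assumes the embedded dictionary is a valid JSON literal assigned with `var data = {...};`.

```python
import json
import re

# Hypothetical page content; the real page in the question is not shown.
HTML = """
<html><body>
<script>var data = {"name": "Widget", "price": 9.99, "tags": ["a", "b"]};</script>
</body></html>
"""


def extract_script_dict(html, var_name="data"):
    """Return the dict assigned to `var_name` inside a script tag, or None."""
    # Capture the object literal between "var <name> =" and the closing ";".
    pattern = re.compile(
        r"var\s+" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;",
        re.DOTALL,
    )
    match = pattern.search(html)
    if match is None:
        return None
    # Assumes the literal is valid JSON (double-quoted keys and strings).
    return json.loads(match.group(1))


result = extract_script_dict(HTML)
print(result["name"], result["price"])
```

In a spider, the script text would typically come from the response body instead of a hard-coded string. If the literal is JavaScript rather than strict JSON (single quotes, unquoted keys), `json.loads` will fail and a different parser is needed.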
-2
votes
0answers
20 views

Intelligence web scraping [closed]

I'm new to Scrapy and I want to crawl the products of an e-commerce website. I want an algorithm that re-crawls products at a frequency based on price and seller changes. For example the ...
0
votes
1answer
23 views

Send data to a url instead of saving when scrapy is done

I have been reading a lot on Scrapy and have my code done to scrape a printer web page to get the information I want. Currently I can run the script with -o data.json. What I am looking for is one of ...
0
votes
1answer
44 views

How to get the href link from this a tag?

I successfully get the href links from the http://quotes.toscrape.com/ example by implementing: response.css('div.quote > span > a::attr(href)').extract() and it gives all the partial links inside the href of ...
0
votes
0answers
20 views

Scrapy + Selenium 3 level depth pagination Spider

Please give me a tip on how to solve my problem; I'd really appreciate it! I need to scrape data from the website tirebuyer.com. I want to scrape all info about each tire they have starting from ...
0
votes
2answers
30 views

Using scrapy to query a database for PDFs then download them

I am new to scrapy and python so please bear that in mind :) I am doing a piece of research and I need to download a lot of publicly available PDF docs from a government website. The problem is that ...
0
votes
0answers
19 views

Python Scrapy modify spider from extension

I want to modify my spider's start_requests method from a custom extension. The main goal is to send a URL to the spider when I get a response from my custom API. But the method that I handed to the spider doesn't run. ...
0
votes
0answers
22 views

Export Scrapy output as CSV without a header on every re-run

Here is my code: items.py from scrapy import Item, Field class NetmallScrapyItem(Item): # define the fields for your item here like: phoneNum = Field() workTime = Field() spider.py ...
0
votes
0answers
15 views

How do I filter out escape sequences while scraping tables using CSS selectors?

I am trying to scrape a table using CSS Selectors in Scrapy. The method I used is scraping row by row into a single scrapy.Field() in an item object. However, the data scraped contains a "\n\t\t" ...
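A minimal sketch of one way to strip such whitespace escape sequences from scraped cell text, assuming the unwanted characters are ordinary whitespace (newlines, tabs, spaces). The sample row and the helper `clean_cell` are hypothetical and not taken from the question.

```python
def clean_cell(text):
    """Collapse all runs of whitespace to single spaces and trim the ends."""
    # str.split() with no arguments splits on any whitespace, including \n and \t.
    return " ".join(text.split())


# Hypothetical raw values, similar to what a row-by-row selector might return.
raw_row = ["\n\t\tAlice\n\t\t", "\n\t\t42 \n", "\t\tNew\n\t\tYork\t"]
clean_row = [clean_cell(cell) for cell in raw_row]
print(clean_row)
```

The same function could be applied to each extracted string before it is stored in the item field.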
0
votes
1answer
30 views

Cannot grab value within span within h2 within div

I'm trying to grab a VALUE that is in a span within a h2 tag within a div that has a CLASSNAME <div class="CLASSNAME"> <h2>TITLE</h2> <h3><span title="VALUE">$VALUE</...
0
votes
0answers
14 views

Make the JSON string Unicode string type

I'm trying to solve this simple Scrapy exercise - https://scrapingclub.com/exercise/detail_json/ But I can't find a way to convert the list returned by re.findall to a JSON file. I already did a lot of ...
-2
votes
0answers
19 views

Following urls in javascript - Scrapy Splash

I am extremely new to web scraping. I manage to extract information from static websites but am now trying my hand at following URLs and extracting data (which of course involves some JavaScript). I have ...
0
votes
0answers
38 views

Cannot scrape German website using Scrapy

I am doing web scraping using Scrapy and have successfully created a spider that crawls the full website, including internal links on the same domain. I used Link Extractor to achieve this. This ...
-1
votes
0answers
9 views

SyntaxError when running scrapy crawl function on anaconda prompt [duplicate]

When I run this on the Anaconda prompt: scrapy crawl oscars -o oscars.csv I get: In [12]: scrapy crawl oscars -o oscars.csv File "<ipython-input-12-b3c639b05380>", line 1 scrapy crawl ...
0
votes
0answers
13 views

I need help debugging python scrapy custom made request function

So I wanted to test myself and try to make Scrapy more like Requests, mainly for ease of use, but I'm running into the error below. I've read through it like 10 times and even changed some stuff so ...
0
votes
1answer
42 views

Extract Href using scrapy python

I am trying to extract the href from the element below: <a aria-label="Flap Diaper Bag. By Burberry Kids. $1,190.00. Style: Archive Beige. " data-style-id="4851207" itemprop="url" class="Qc" href="/p/burberry-...
0
votes
1answer
45 views

Unsuccessful web scraping with selenium and scrapy

I'm trying to scrape this page (further, main page) using selenium + scrapy. All content here loads with javascript when scrolling down the page. I scrape every particular product page in the parse ...
0
votes
0answers
27 views

Cannot run a spider successfully after finishing scraping data by another spider through running a script

I am following code from these previous Stack Overflow posts: "How to schedule Scrapy crawl execution programmatically" and "Running Scrapy multiple times in the same process". The following script works well ...
0
votes
1answer
13 views

Error response from daemon - scrapinghub/splash

I have installed scrapy-splash and Docker Toolbox for Windows 7 in order to be able to scrape data from websites that use JavaScript. Installation seems to be fine since all the checks are giving ...
-1
votes
1answer
17 views

How to solve JavaScript redirect issue on Python Scrapy?

I am fairly new to scrapy and following docs to scrape info on https://pbejobbers.com/abrasives using my script: import scrapy class CrwSpider(scrapy.Spider): name = "Otim" def ...
0
votes
0answers
19 views

Unable to identify the Get parameter for Python get requests

I am trying to get data from a website, but I am not able to find the following GET parameter and h Get parameters runprog:thuk mode:page2 postcode_entered: postcode_entered_2:Enter Location (eg London - ...
0
votes
2answers
32 views

Crawling multiple tables using Scrapy

I need data from different tables. In this case tables [0:17] and table [18]. I don't need table [17]. How can I solve this in one Scrapy spider? This solution does not work. Scrapy currently fetches ...
0
votes
1answer
11 views

How can I yield the current response URL in scrapy_splash

If I try to yield the URL using response.request.url in my parse() method, it returns: http://192.168.99.100:8050/execute Returning the URL in the Lua script works, but I don't know how I can yield it in parse()...
0
votes
0answers
17 views

unable to recognize twisted reactor [duplicate]

I am trying to run a scrapy spider using twisted network. But the reactor is not recognized in my Pylint: from urllib.parse import urlparse from twisted.internet import reactor from scrapy.crawler ...
0
votes
2answers
23 views

Scrapy Extracting text based on a specific pattern in the class

I am trying to extract information based on a specific pattern in the HTML code. Ideally, I would like to extract the text for the div class that mentions "bg-deep-green" only. I am new to regular ...
0
votes
0answers
16 views

Multiple Inheritance in Scrapy's spiders, how to use methods from both parent classes

I have two spiders; each one starts from a different URL and ends in a different final page from the same domain. What I need is in the final page, but they have to arrive there following a flow, due to ...
-3
votes
0answers
22 views

Amazon product detail scraping using Python [duplicate]

I want to scrape all details on a defined variable var. But when I create the soup object and use find_all, find or select it gives me an empty list. How can I fix this? import requests from bs4 ...
0
votes
1answer
53 views

How do I target specific element using class

I'm trying to scrape this website called startup-India, from which I scrape the URL and name of a company. But to scrape the URL and the name I have to target them, and I don't know which is the right way to ...
0
votes
0answers
19 views

After running my spider(scrapy) for Amazon on the server I get 503 Service Unavailable

After deploying my Scrapy project for Amazon to the server, I get this error [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.amazon.com/s?k=rings&i=aps&ref=nb_sb_noss_2&...
1
vote
1answer
21 views

Azure App Service: specifying docker run port on container side

I want to use the scrapinghub/splash container on Azure App Service (Web App for Containers) on Linux. But the docker run command on deploy randomly changes the binding port on the container side (see the log ...
0
votes
0answers
35 views

How to retrieve a URL to a function

I've come quite far, but at this point I need help completing my code: I'm trying to pass my URLs to my parse_static_content to scrape the name of the company. The only help I'll be ...
1
vote
1answer
25 views

scrapy with payload request

I'm trying to make a POST request, but I don't know what's wrong with my code; no data comes back. The following message is displayed: HTTP status code is not handled or not allowed This is ...
1
vote
1answer
28 views

Match multiple user-agents in robots.txt with Scrapy

I am new to Scrapy and I would like to know how to make the spider obey the rules of two or more User-agents in the robots.txt file (for instance, Googlebot and Googlebot-Mobile). I am currently ...
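A minimal sketch of the underlying idea, using Python's standard-library `urllib.robotparser` rather than Scrapy's own robots.txt handling: a URL is treated as allowed only if every listed user agent may fetch it. The robots.txt content, the URLs, and the helper `allowed_for_all` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The more specific agent is listed first because
# the standard-library parser matches agent names by substring, in file order.
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
Disallow: /desktop-only/

User-agent: Googlebot
Disallow: /private/
"""


def allowed_for_all(robots_txt, user_agents, url):
    """Return True only if every user agent is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return all(parser.can_fetch(agent, url) for agent in user_agents)


agents = ["Googlebot", "Googlebot-Mobile"]
print(allowed_for_all(ROBOTS_TXT, agents, "https://example.org/public/page"))
print(allowed_for_all(ROBOTS_TXT, agents, "https://example.org/private/page"))
```

Whether and how such a check can be plugged into a Scrapy project depends on the Scrapy version and settings in use, so this is only an illustration of the combined rule, not a drop-in solution.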
0
votes
0answers
28 views

How to completely exit Scrapy shell?

I run my shell using the inspect_response() function. I'd like to exit the Scrapy shell, so I use Ctrl-D (or Ctrl-Z in Windows) to do this. However, I cannot exit completely, because the spider crawls ...
0
votes
0answers
21 views

ScrapyD with Django stops running after some time

I have a whole Django (2.2) application with scrapyd deployed on the production server. It seems to work fine but I noticed a trend recently that the crawler just stops running after some days. I run ...
0
votes
3answers
39 views

Retrieve items text inside dropdown list xpath

I have a select like this <select name="super_attribute[93]" data-selector="super_attribute[93]" data-validate="{required:true}" id="attribute93" class="super-attribute-select" aria-required="true"...
0
votes
0answers
8 views

scrapy get builtins.KeyError: -2

In my Scrapy crawl I sometimes get this error: builtins.KeyError: -2. But what does it mean? I can't find anything about it in Scrapy. Here is the full log: 2020-01-20 13:28:08 [twisted] ...
0
votes
1answer
20 views

Decoupling single spider into different spiders in scrapy

I'd like to decouple parsing into different spiders. Currently I have: class CategoriesSpider(scrapy.Spider): name = 'categories' allowed_domains = ['example.org'] start_urls = ['https:...
-3
votes
0answers
17 views

Scraping Top Videos From Instagram Hashtag [closed]

I am working on a scraping task in which I have to get the top 10 videos for a given hashtag, but how do I filter only the top videos from the search? The search returns top posts and recent posts; there ...
-2
votes
0answers
14 views

not able to install scrapy pip in visual code [closed]

copying src\twisted\internet\test\fake_CAs\thing1.pem -> build\lib.win32-3.8\twisted\internet\test\fake_CAs copying src\twisted\internet\test\fake_CAs\thing2-duplicate.pem -> build\lib.win32-3.8\...
-2
votes
0answers
17 views

What tool can I use to scrape Alibaba's current dynamic shipping costs? [closed]

I am currently using Python + Scrapy to obtain data from the Alibaba site, but I encountered a dynamic link that, when clicked, shows a dynamic pop-up with different shipping prices. E.g. LINK (Top ...
-1
votes
0answers
25 views

How to get a parameter of a XHR request with scrapy

I need to scrape some documents on the website docplayer. As described here /a/57380171/6548527, it is possible to grab the PDF url which always has the template 'http://...
-1
votes
0answers
31 views

Scrape data from a public page on Facebook [closed]

I have tried different ways to scrape a public Facebook page using the Facebook API, but I still get an error. With version 5.0 everything has changed. I just want to scrape the content of a public ...
0
votes
1answer
36 views

List via loop not being created

I'm trying to build a list of urls with a loop and then grab a data point from each url, but it only seems to do it for the last item (MMM) of the list and not all of them... what am I doing wrong? ...