Questions tagged [scrapy]

Scrapy is a fast, open-source, high-level screen scraping and web crawling framework written in Python, used to crawl websites and extract structured data from their pages. It can be used for a wide range of purposes, from data mining to monitoring and automated testing.

0
votes
1answer
9 views

How to pass a variable from Lua script to Javascript using splash and scrapy?

I was working on a scraping project built with Scrapy and Splash. I am a newbie in Lua and JavaScript. I am in a situation where I need to send a variable from Lua to JavaScript, but I am not able ...
0
votes
1answer
17 views

Scrapy won't connect to MSSQL database

My spider is fully working and I can export the data to JSON, CSV and to a MongoDB. However, since I will be dealing with large chunks of data, I would like to use MSSQL. I've browsed through Google ...
1
vote
0answers
16 views

Unable to paginate with Selenium and Scrapy

I scrape a website with Scrapy. My problem is that the pagination is using javascript. So I can't loop through a link. I try to figure that out with Selenium but I have multiple errors with a lot of (...
0
votes
0answers
11 views

Scraping the next page with Scrapy using CSS

I am scraping a Zomato page and I need the item name and description from the next page. I am comfortable with CSS tags, so I am using those. I have created another function parsse_next to do so but am not able to find ...
-2
votes
0answers
19 views

Why am I getting this error in JSON: price = data["offers"]["priceCurrency"] TypeError: string indices must be integers

in parse price = data["offers"]["priceCurrency"] TypeError: string indices must be integers 2019-08-23 17:38:00 [scrapy] INFO: Closing spider (finished) 2019-08-23 17:38:00 [scrapy] log.msg('...
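For readers landing on this error: "string indices must be integers" is what Python raises when a str is indexed with a string key, so it usually means `data` or `data["offers"]` is still a string (or a list) rather than a dict. The excerpt does not show the actual data, so the following is only a minimal sketch under that assumption; the helper name `price_currency` and the sample payload are hypothetical.

```python
import json


def price_currency(data):
    """Return offers.priceCurrency, tolerating JSON strings and lists.

    Hypothetical helper: assumes the payload looks like JSON-LD product data.
    """
    # If the whole payload is still raw JSON text, decode it first.
    if isinstance(data, str):
        data = json.loads(data)
    offers = data["offers"]
    # Some pages embed the offers block as a JSON-encoded string.
    if isinstance(offers, str):
        offers = json.loads(offers)
    # Some pages publish a list of offers; take the first one here.
    if isinstance(offers, list):
        offers = offers[0]
    return offers["priceCurrency"]


# Sample payload (made up for illustration).
raw = '{"name": "Widget", "offers": {"price": "9.99", "priceCurrency": "USD"}}'
print(price_currency(raw))
```

Checking `type(data)` and `type(data["offers"])` in the spider before indexing is a quick way to see which of these cases applies.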
0
votes
2answers
25 views

writing scrapy logs into a file

I have a Scrapy spider with LOG_LEVEL = 'DEBUG'. How can I write the log messages that appear (while the spider is running) into a simple text file instead of reading them directly from the terminal? Note: I ...
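One common approach, sketched here without knowing the rest of the asker's setup, is Scrapy's `LOG_FILE` setting, which sends log output to a file instead of the terminal. The file name `spider.log` is only an example; the same effect should be available from the command line via the `--logfile` option of `scrapy crawl`.

```python
# settings.py (sketch)
# LOG_LEVEL and LOG_FILE are standard Scrapy settings;
# "spider.log" is an example file name, not taken from the question.
LOG_LEVEL = "DEBUG"
LOG_FILE = "spider.log"
```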
0
votes
1answer
26 views

Scrapy spider not saving items to PostgreSQL database

I have some Scrapy spiders that get property advertisement info and store it in a database. It was already working when I started at the company, but we had to migrate our DB from GCP to AWS, so I've ...
0
votes
2answers
31 views

Why can importing scrapy.utils.project load settings?

Following the post "Reading settings in spider scrapy", I loaded settings in middlewares.py successfully. from scrapy.utils.project import get_project_settings settings=get_project_settings() I wonder ...
0
votes
1answer
32 views

Scrapy output empty

I am trying to use Scrapy to extract paper titles from IEEE Xplore by scrapy shell 'https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=5962385' For the first paper title, I used copy ...
0
votes
0answers
34 views

Is there a way to get the class that shows “$0” element when inspecting HTML?

I'm a newbie and I'm trying to scrape postcodes from this website "https://www.doogal.co.uk/UKPostcodes.php?Search=AB". Got stuck when trying to create a variable for a td tag that has a class "$0". ...
0
votes
1answer
14 views

Scraping text between pseudo elements

I am trying to crawl an auction website (https://onlineonly.christies.com/s/first-open-post-war-contemporary-art/massimo-vitali-b-1944-203/43092). I want to use css.selector to select the price of the ...
0
votes
1answer
17 views

scrapy shell unable to open response in firefox

I had some trouble using Firefox today, so I removed it, then reinstalled it, and it worked fine. However, I use it as my default browser but I can't open any scrapy shell view(response) in it, although the ...
1
vote
0answers
14 views

Python + Scrapy: Issues running “ImagesPipeline” when running crawler from script

I'm brand new to Python so I apologize if there's a dumb mistake here...I've been scouring the web for days, looking at similar issues and combing through Scrapy docs and nothing seems to really ...
0
votes
0answers
14 views

Why did the crawler get a 307 status code when deployed on Scrapinghub?

The target url: https://finance.yahoo.com/quote/0002.HK/balance-sheet?p=0002.HK A Scrapy project works locally and extracts info from the target url successfully. Now I deploy it to Scrapinghub, start ...
0
votes
1answer
22 views

Scrapy concatenate array elements inside div in python

I need to concatenate some text inside a <div> with xpath in Scrapy. The div has the next structure: <div class="col-12 e-description" itemprop="description"> "-Text1" <br> &...
2
votes
3answers
67 views

How to get rid of duplicate links while crawling a website using python scrapy?

I have the following code which crawls the given website address, but the problem is that it duplicates URLs while crawling. I need a unique and complete list of URLs which can be reached from home ...
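Duplicates in a collected link list often come from the same page being referenced through relative paths or `#fragment` variants. The question's code is not shown here, so this is only a sketch of one way to normalise and de-duplicate extracted hrefs with the standard library; the function name `unique_links` and the example.com URLs are placeholders.

```python
from urllib.parse import urljoin, urldefrag


def unique_links(base_url, hrefs):
    """Resolve hrefs against base_url, drop fragments, keep first-seen order."""
    seen = set()
    ordered = []
    for href in hrefs:
        # Make the link absolute, then strip any "#fragment" part.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in seen:
            seen.add(absolute)
            ordered.append(absolute)
    return ordered


links = unique_links(
    "https://example.com/",
    ["/a", "a", "/a#top", "https://example.com/b"],
)
print(links)
```

In a spider, the hrefs would typically come from something like `response.css("a::attr(href)").getall()`, and the `seen` set would need to live on the spider so it persists across callbacks.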
2
votes
1answer
20 views

Scrapy - “scrapy crawl” catches exceptions internally and hides them from Jenkins' “catch” clause

I'm running scrapy through Jenkins on a daily basis, and I want exceptions to be sent to me in emails. This is an example spider: class ExceptionTestSpider(Spider): name = 'exception_test' ...
1
vote
2answers
36 views

Scrapy crawler gives out KeyError

I am trying to scrape drugs information from a UK site using Scrapy, but I am getting "KeyError: 'Item does not support field: title'". I can't figure out what's the problem here. I have tried ...
0
votes
2answers
33 views

How to conditionally save scrapy results with IF statement?

I want to generate some urls dynamically and crawl the generated addresses. The shell command which I use here, will generate 128 different URLs in every single query and save them in a txt file. ...
0
votes
1answer
29 views

How to parse a link within the function parse, and return locally? (Scrapy) [on hold]

I run through the function parse, and within this function I want to return some data from another link. I do not want to exit the function parse, but rather return the value locally within the ...
0
votes
1answer
19 views

How to exclude a tag in certain position within certain class with xpath?

I have this sample tag : <div class='aaa'> <p>aaa</p> <div>bbb</div> <div>ccc</div> <div class='ddd'> <div>ddd</div>...
0
votes
1answer
40 views

Scrapy gets redirected to follow 302 and does not crawl the site

Scrapy gets a 302 redirect to another link. In the link 'https://xxxxxx.queue-it.net?c.....com' Scrapy does not add the '/'. It should be 'https://xxxxxx.queue-it.net/?c.....com'. I have tried adding '...
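If the missing '/' really is the problem, one option is to rewrite the redirect target before requesting it. The sketch below only shows the URL rewrite itself, using the standard library; the helper name `ensure_root_path` and the example.com URLs are hypothetical, and where to call it (for instance on the `Location` header value before issuing a new request) is left open because the rest of the question is not shown.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_root_path(url):
    """Insert '/' as the path when a URL has a host but an empty path."""
    parts = urlsplit(url)
    if parts.netloc and not parts.path:
        # SplitResult is a namedtuple, so _replace returns a modified copy.
        parts = parts._replace(path="/")
    return urlunsplit(parts)


fixed = ensure_root_path("https://example.com?c=1")
print(fixed)
```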
0
votes
0answers
19 views

Scraping AJAX content with Scrapy when page needs POST information first

I'm trying to scrape a site with Scrapy and the site has Ajax content. I can't post a link because it requires a login to view the content. The content displays a loading message for a few seconds and ...
1
vote
2answers
40 views

Airport Data - How do I scrape data and maintain the hierarchy?

I am trying to scrape information from https://skytraxratings.com/airports/hamad-international-airport-rating I want to make a csv/excel which has the following columns - category, - subcategory - ...
0
votes
0answers
22 views

How to exit Scrapy with status 1 when running in an Airflow DAG

I'm trying to exit from scrapy with the status code 1 on exception. The script is running via DAG. But the task is not exiting with status code 1 try: photo = requests.get(self.img_url + '/' + ...
0
votes
2answers
55 views

How to save all the extracted data from scrapy while crawling?

I am using Scrapy to crawl all the links from a website, but I can't find a way to save all the extracted links. Although I am able to add the extracted links to a Python set, I am not able ...
0
votes
0answers
21 views

Scrapy two-factor authentication with different crawlers

I am trying to scrape gitlab.com with two-factor authentication, which works fine when I have a single spider. I log in to GitLab, then take input for the OTP from the console and submit it. ...
0
votes
3answers
34 views

Why can't this class be found with this CSS selector?

I am trying to use either an xpath shortcut or a css selector to find all objects on the page that fit this: <span class="perWord ng-binding">$0.20</span> I am struggling to understand ...
0
votes
0answers
17 views

Scraping different attributes from html [duplicate]

I am trying to scrape data from html div tag, but I need to get other attributes from it. I understand that I could get it if it was text itself, but now I need the text which says "Max persons:2". ...
0
votes
0answers
22 views

scrapy keeps redirecting (meta refresh)

I am trying to scrape a website. However, the host keeps redirecting the spider until the maximum number of redirections is reached. Logs are as follows: 2019-08-21 17:10:56 [scrapy.downloadermiddlewares.redirect] ...
0
votes
1answer
25 views

Scrapy response returning [] but prints in terminal

I'm attempting to scrape Indeed.com and want to get information pertaining to each job in their respective div. The response will print out in the terminal, but when I write to a file or run the ...
0
votes
0answers
25 views

scrapy startproject tutorial command throwing errors

After I installed Scrapy using pip3 install scrapy in my Ubuntu shell on Windows 10 (using Windows Subsystem for Linux), when I try the command scrapy startproject tutorial I get this error ...
2
votes
1answer
26 views

Issue with scraping href in Python using Scrapy Spyder

I am currently trying to scrape the href from the title on a Craigslist page. I am using Python Scrapy and have been having trouble with it. I have tried several things; I don't understand what is ...
0
votes
0answers
12 views

How to fix HttpErrorMiddleware in scrapy spider

I am trying to do horizontal crawling using Scrapy. Using XPath, I am getting the link of each listing on a real-estate site, as well as the next-page link. However, when I run my spider I keep ...
0
votes
2answers
25 views

Using python requests library on Scrapy

How can I use requests on a spider in Scrapy? import scrapy, requests def parse(self, response): # do things... # then yield requests.get(response.url, callback=self.parse, dont_filter=...
0
votes
1answer
15 views

How to do horizontal crawling with scrapy

I am trying to do horizontal crawling with Scrapy. With an XPath, I am getting the link that is going to lead me to the next page. Then I am trying to concatenate this link to the url of the site ...
0
votes
0answers
13 views

No module named urllib3/faker while using Scrapy

I'm trying to use urllib3 and faker with Scrapy. Without Scrapy it's fine, but when I include them in Scrapy it gives me this error. You can see urllib3 is installed. I wonder if I need to set some ...
-2
votes
0answers
18 views

I need XPath to get details from this site [on hold]

I need an XPath to get details from this site: https://www.noon.com/saudi-en/p-12766?limit=150
0
votes
0answers
29 views

POST request is not being sent

On the last step, filling dropdown menus, the POST request is not sent. import scrapy class WisecoSpider(scrapy.Spider): name = 'wiseco' search_url ='http://www.wiseco.com/ProductSearch.aspx' ...
0
votes
1answer
41 views

Scrape dynamic data using scrapy [on hold]

I would like to scrape option chain of stock from nasdaq website using scrapy (along with other data) Nasdaq recently updated their website. Here is the url I am talking about. The data is not ...
0
votes
1answer
16 views

How to define a rule in scrapy which crawls the website link recursively?

I am trying to build an application which uses Scrapy to crawl a website to get all the links which are on the homepage plus the links which can be reached from the homepage links. But the problem is ...
0
votes
0answers
31 views

XPath can't find element and returns None

I'm facing a problem with this page: https://www.ouedkniss.com/op%C3%A9rateur-sur-machine-bejaia-boudjellil-algerie-offres-d-emploi-d19820393 I want to scrape these elements: Employeur : SARL UFMATP ...
-2
votes
0answers
32 views

How can my crawler know the right content for the keyword? [on hold]

I want to build a social media monitoring tool that will scrape data from social media. I plan to use Python with Scrapy. I find it possible to store paragraphs with one of the keywords from the ...
0
votes
1answer
31 views

Can't find the value of some request headers on an .aspx website

In the webpage http://www.wiseco.com/ProductSearch.aspx, I'm trying to call the dropdown menu selection result, and I can't find the value of two headers in the post request: ctl00$...
0
votes
1answer
34 views

Scrapy: Duplicate item fields due to multiple for loops

My question is almost identical to: Scrapy - Why Item Inside For Loop Has The Same Value While Accessed in Another Parser Except I have two For loops so creating a new item will cause me to lose the ...
0
votes
1answer
35 views

Trying to log into site with scrapy and response shows login page

I'm new to Scrapy and I'm trying to get a log in working, starting in the shell. This is the site I'm trying to log into: https://www.acdd.com/customer/account/login/ First I did from scrapy.http ...
0
votes
2answers
36 views

If statement to only write new values to PostgreSQL db in Scrapy

I have a Scrapy spider that writes the scraped data to a PostgreSQL database using psycopg2. I have Scrapyd running and item exporters and everything is setup fine. I'm scraping the labor section of ...
0
votes
0answers
21 views

How to send a response when the Scrapy spider has finished?

I crawled data from a website and used Django for the server. When the spider has finished, I want to notify my server whether the crawl completed or not. I found the signals module that can be used for the ...
-2
votes
2answers
54 views

How to scrape video url (.m3u8) from the website [on hold]

Thanks for choosing to read this question. I am trying to scrape video from this url https://www.elle.com/es/moda/noticias/a28442606/zara-seccion-temporal-vestidos-monos-faldas-tops-baratos/ I have ...
0
votes
1answer
25 views

Scrapy does not extract the text in certain selectors

I am crawling a website using Scrapy but when I select certain selectors, it does not extract the text in them. The website is https://www.chopo.com.mx/estudios/super-quimica-de-35-elementos/# and ...