[Python] How do you import a variable from a file located in another directory?
Marco Ippolito
ippolito.marco at gmail.com
Sat 8 Feb 2014 20:03:49 CET
Hi Enrico,
thank you for your help.
To give some context and narrow down what I'm trying to do: I'm
porting a scraper (built with scrapy) that already works correctly
into a standalone script.
Below I'm pasting what I wrote today on the scrapy Google group,
where unfortunately I haven't received any replies so far.
Hi everybody,
I'm following the instructions here:
http://scrapy.readthedocs.org/en/0.18/topics/practices.html
There, from testspiders.spiders.followall import FollowAllSpider means:
import the class "FollowAllSpider" contained in the file followall.py,
which is located in the folder testspiders/spiders.
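To illustrate that mapping, here is a minimal sketch that builds a throwaway testspiders/spiders/followall.py and imports from it. The temporary directory and the stub body of FollowAllSpider are assumptions made only for this example; the real class lives in the testspiders project.

```python
import os
import sys
import tempfile

# Throwaway directory standing in for the folder that contains "testspiders"
root = tempfile.mkdtemp()
spiders_dir = os.path.join(root, "testspiders", "spiders")
os.makedirs(spiders_dir)

# Empty __init__.py files mark both folders as packages
for folder in (os.path.join(root, "testspiders"), spiders_dir):
    open(os.path.join(folder, "__init__.py"), "w").close()

# Stub module: the class body is a placeholder, not the real spider
with open(os.path.join(spiders_dir, "followall.py"), "w") as handle:
    handle.write("class FollowAllSpider(object):\n    name = 'followall'\n")

# The folder that CONTAINS the top-level package must be on sys.path
sys.path.insert(0, root)
from testspiders.spiders.followall import FollowAllSpider

# Each dot in the import path corresponds to one directory level
module_file = sys.modules[FollowAllSpider.__module__].__file__
relative_path = os.path.relpath(module_file, root)
print(relative_path)
```

The point of the sketch is that the dotted path is resolved relative to an entry of sys.path, not relative to the file doing the import.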
I'm trying to turn my working scraper into a script.
This is my file:
#!/usr/bin/python
from twisted.internet import reactor
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy import log, signals
from sole24ore.sole24ore.spiders.sole import SoleSpider
spider = SoleSpider(domain='sole24ore.com')
crawler = Crawler(Settings())
crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run() # the script will block here until the spider_closed signal was sent
But when I run the script, I get:
python soleLinksScrapy.py
Traceback (most recent call last):
  File "soleLinksScrapy.py", line 25, in <module>
    from sole24ore.sole24ore.spiders.sole import SoleSpider
  File "/home/ubuntu/ggc/prove/sole24ore/sole24ore/spiders/sole.py", line 6, in <module>
    from sole24ore.items import Sole24OreItem
ImportError: No module named items
The scraper works fine when I run scrapy crawl sole from inside its folder:
scrapy crawl sole
2014-02-08 17:00:52+0000 [scrapy] INFO: Scrapy 0.18.4 started (bot: sole24ore)
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Optional features available: ssl, http11, boto
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Overridden settings: {'NEWSPIDER_MODULE': 'sole24ore.spiders', 'SPIDER_MODULES': ['sole24ore.spiders'], 'BOT_NAME': 'sole24ore'}
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Enabled item pipelines:
2014-02-08 17:00:52+0000 [sole] INFO: Spider opened
2014-02-08 17:00:52+0000 [sole] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-02-08 17:00:52+0000 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-02-08 17:00:53+0000 [sole] DEBUG: Redirecting (301) to <GET http://www.ilsole24ore.com/> from <GET http://www.sole24ore.com/>
2014-02-08 17:00:53+0000 [sole] DEBUG: Crawled (200) <GET http://www.ilsole24ore.com/> (referer: None)
[s] Available Scrapy objects:
[s]   hxs        <HtmlXPathSelector xpath=None data=u'<html xmlns="http://www.w3.org/1999/xhtm'>
[s]   item       {}
[s]   request    <GET http://www.ilsole24ore.com/>
[s]   response   <200 http://www.ilsole24ore.com/>
[s]   settings   <CrawlerSettings module=<module 'sole24ore.settings' from '/home/ubuntu/ggc/prove/sole24ore/sole24ore/settings.pyc'>>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]:
Do you really want to exit ([y]/n)? y
2014-02-08 17:00:58+0000 [sole] DEBUG: Scraped from <200 http://www.ilsole24ore.com/>
{'url':
[u'http://www.ilsole24ore.com/ebook/norme-e-tributi/2014/crisi_impresa/index.shtml',
u'http://www.ilsole24ore.com/ebook/norme-e-tributi/2014/crisi_impresa/index.shtml',
u'http://www.ilsole24ore.com/cultura.shtml',
u'http://www.casa24.ilsole24ore.com/',
u'http://www.moda24.ilsole24ore.com/',
u'http://food24.ilsole24ore.com/',
u'http://www.motori24.ilsole24ore.com/',
u'http://job24.ilsole24ore.com/',
u'http://stream24.ilsole24ore.com/',
u'http://www.viaggi24.ilsole24ore.com/',
u'http://www.salute24.ilsole24ore.com/',
u'http://www.shopping24.ilsole24ore.com/',
u'http://www.radio24.ilsole24ore.com/',
The scraper's folder is the 'sole24ore' folder, which is in ~/ggc/prove/sole24ore...
while the script I would like to get working is in ~/ggc/prove
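One thing that might be worth trying, given this layout, is to put the project folder itself on sys.path, so that "sole24ore" resolves to the inner package that contains items.py. The sketch below is untested against the real project: it rebuilds the layout in a temporary directory (a stand-in for ~/ggc/prove), and the bodies of Sole24OreItem and SoleSpider are placeholder stubs, not my actual Scrapy classes.

```python
import os
import sys
import tempfile
import textwrap

# Hypothetical stand-in for ~/ggc/prove
prove_dir = tempfile.mkdtemp()
project_dir = os.path.join(prove_dir, "sole24ore")    # Scrapy project folder
package_dir = os.path.join(project_dir, "sole24ore")  # inner Python package
spiders_dir = os.path.join(package_dir, "spiders")
os.makedirs(spiders_dir)


def write(path, source=""):
    """Create a file with the given (dedented) source text."""
    with open(path, "w") as handle:
        handle.write(textwrap.dedent(source))


write(os.path.join(package_dir, "__init__.py"))
write(os.path.join(spiders_dir, "__init__.py"))

# Placeholder item class (the real one would subclass scrapy's Item)
write(os.path.join(package_dir, "items.py"), """
    class Sole24OreItem(dict):
        pass
""")

# Same import line as line 6 of sole.py in the traceback
write(os.path.join(spiders_dir, "sole.py"), """
    from sole24ore.items import Sole24OreItem

    class SoleSpider(object):
        name = "sole"
""")

# Put the PROJECT folder on the path, so "sole24ore" means the inner package
sys.path.insert(0, project_dir)
from sole24ore.spiders.sole import SoleSpider

print(SoleSpider.name)
```

In the real script this would presumably mean inserting the path of ~/ggc/prove/sole24ore into sys.path and importing from sole24ore.spiders.sole instead of sole24ore.sole24ore.spiders.sole, but I have not verified that this is the actual cause of the ImportError.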
Any hints?
Thanks for your help.
Kind regards.
Marco