[Python] Parsing the HTML page

Tue 16 Jul 2013 14:34:29 CEST

On Jul 16, 2013, at 12:09 PM, Nicola Larosa <nico a tekNico.net> wrote:

> Here are the docs for text_content:
> 
> <http://lxml.de/lxmlhtml.html#html-element-methods>
> 
> In the same place you will find the docs for:
> 
> - find_class (if you know the CSS class of the elements you are
>  interested in);
> - get_element_by_id (if you know the id of the element you are
>  interested in);
> - cssselect (to use CSS selectors, which are very powerful);
> - a brief mention of xpath, documented elsewhere
>  <http://lxml.de/xpathxslt.html#xpath>, which is also very powerful.
> 
> The example uses find_class <http://lxml.de/lxmlhtml.html#examples>.
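
A minimal sketch of the lxml methods listed above, assuming lxml is
installed. The sample page, the "price" class and the "main" id are made
up for illustration and do not come from the original question.

```python
from lxml import html

# Hypothetical sample page: markup, class name and id are invented.
PAGE = """
<html><body>
  <div id="main">
    <p class="price">10 EUR</p>
    <p class="price">25 EUR</p>
    <p>Other <b>text</b></p>
  </div>
</body></html>
"""

doc = html.fromstring(PAGE)

# find_class: every element carrying the given CSS class.
# text_content: the text of an element and all of its descendants.
prices = [el.text_content().strip() for el in doc.find_class("price")]

# get_element_by_id: the single element with that id.
main = doc.get_element_by_id("main")

# xpath: an arbitrary XPath query, here all <p> children of the div.
paragraphs = doc.xpath("//div[@id='main']/p")
last_text = paragraphs[-1].text_content()

print(prices, main.tag, len(paragraphs), last_text)
```

cssselect is left out of the sketch because it depends on the separate
cssselect package being installed.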

As an alternative, for this kind of web scraping I have always used
BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/)

Btw:
[…]
Beautiful Soup sits on top of popular Python parsers like lxml and html5lib, allowing you to try out different parsing strategies or trade speed for flexibility.
[…]
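
A rough sketch of the same kind of lookup with Beautiful Soup, assuming
the bs4 package (Beautiful Soup 4) is installed. The sample page and its
class and id names are hypothetical. The parser is chosen by name:
"html.parser" is the one bundled with Python, and "lxml" or "html5lib"
can be passed instead if those libraries are available.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page: markup, class name and id are invented.
PAGE = """
<html><body>
  <div id="main">
    <p class="price">10 EUR</p>
    <p class="price">25 EUR</p>
    <p>Other <b>text</b></p>
  </div>
</body></html>
"""

# Second argument selects the underlying parser.
soup = BeautifulSoup(PAGE, "html.parser")

# All <p> elements with the CSS class "price", text stripped of whitespace.
prices = [p.get_text(strip=True) for p in soup.find_all("p", class_="price")]

# The element with id "main", then every <p> nested inside it.
main = soup.find(id="main")
paragraph_count = len(main.find_all("p"))

print(prices, main.name, paragraph_count)
```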

--
Valerio
