[Python] Data manipulation and trees

Thu 29 Sep 2011 16:21:18 CEST

Scrapy is excellent too... with BeautifulSoup you can't run the more
complex XPath queries.
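
To give an idea of what an XPath-style query buys you, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree, which only understands a limited XPath subset and needs well-formed markup; on real HTML you would run the same kind of expression through Scrapy's selectors or lxml. The PAGE markup, its class names and its ids are assumptions made up for the example, not the actual site.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed results page (an assumption, not the real site).
PAGE = """
<html><body>
  <div class="league" id="serie-a">
    <h2>Serie A</h2>
    <table>
      <tr class="match"><td>18:00</td><td>Bologna</td><td>Inter</td><td>1:3</td></tr>
      <tr class="match"><td>20:45</td><td>Milan</td><td>Cesena</td><td>1:0</td></tr>
    </table>
  </div>
  <div class="league" id="serie-b">
    <h2>Serie B</h2>
    <table>
      <tr class="match"><td>20:45</td><td>Napoli</td><td>Fiorentina</td><td>0:0</td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(PAGE.strip())

# One path expression selects every match row of a single league.
rows = root.findall(".//div[@id='serie-a']//tr[@class='match']")
serie_a = [[td.text for td in row.findall("td")] for row in rows]

# A positional predicate picks the fourth cell (the score) of every match row.
scores = [td.text for td in root.findall(".//tr[@class='match']/td[4]")]

print(serie_a)
print(scores)
```

The attribute and positional predicates do the filtering that would otherwise need hand-written loops over the parsed tree.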

2011/9/28 Daniel Pyrathon <pirosb3 at gmail.com>

> Hi Balan
>
> I wrote a small component that parses a text file (structured the way you
> want) and builds a list of dictionaries from it.
>
> Given this input:
> Serie A
> 18:00
> Bologna
> Inter
> 1:3
> 20:45
> Milan
> Cesena
> 1:0
> 20:45
> Napoli
> Fiorentina
> 0:0
> Serie B
> 18:00
> Bologna
> Inter
> 1:3
> 20:45
> Milan
> Cesena
> 1:0
> 20:45
> Napoli
> Fiorentina
> 0:0
>
> it would return:
>
> [{'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter',
> 'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b':
> 'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli',
> 'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie A'}, {'teams':
> [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time':
> '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena',
> 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b':
> 'Fiorentina', 'time': '20:45'}], 'title': 'Serie B'}]
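
A structure like this is easy to walk with two nested loops. As a rough sketch of how it could be consumed (the winners function is a hypothetical helper, not part of Daniel's script, and the data is shortened to one series):

```python
# Same shape as the structure shown above, shortened to two matches.
results = [
    {'title': 'Serie A',
     'teams': [
         {'time': '18:00', 'team_a': 'Bologna', 'team_b': 'Inter',
          'final_score': '1:3'},
         {'time': '20:45', 'team_a': 'Milan', 'team_b': 'Cesena',
          'final_score': '1:0'},
     ]},
]

def winners(results):
    """Return (series title, winning team or None for a draw) for every match."""
    out = []
    for series in results:
        for match in series['teams']:
            # 'final_score' is a string such as '1:3'; split it into two ints.
            goals_a, goals_b = (int(g) for g in match['final_score'].split(':'))
            if goals_a > goals_b:
                winner = match['team_a']
            elif goals_b > goals_a:
                winner = match['team_b']
            else:
                winner = None
            out.append((series['title'], winner))
    return out

print(winners(results))
```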
>
> Script:
>
> import re
>
> class TeamParser(object):
>
>   def __init__(self, file_path):
>     self._file_path = file_path
>     self._result = None
>
>   @property
>   def result(self):
>     # Parse lazily, on first access only
>     if self._result is None:
>       self._result = self._parse_file()
>     return self._result
>
>   def _parse_file(self):
>     result = []
>     current_series = None
>
>     with open(self._file_path, 'r') as f:
>       for line in f:
>         line = line.strip()
>         # Skip blank lines
>         if not line:
>           continue
>
>         # New series header: flush the previous series and start a new one
>         if re.search(r'Serie\s\w$', line):
>           if current_series:
>             result.append(self._parse_team(current_series))
>           current_series = []
>
>         # Ignore any line that comes before the first series header
>         if current_series is not None:
>           current_series.append(line)
>
>     # End of file: flush the last series, if any
>     if current_series:
>       result.append(self._parse_team(current_series))
>     return result
>
>   def _parse_team(self, series):
>     # series[0] is the title; each following group of 4 lines is one match
>     result = {'title': series[0], 'teams': []}
>     number_games = (len(series) - 1) // 4
>
>     for game in range(number_games):
>       index = 1 + game * 4
>       team = series[index:index + 4]
>       result['teams'].append({'time': team[0], 'team_a': team[1],
>                               'team_b': team[2], 'final_score': team[3]})
>     return result
>
>
> x = TeamParser('path to your file')
> print(x.result)  # <-- results
>
> pastebin: http://pastebin.com/JN0pSQ0j
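
To try the same idea without a file on disk, here is a compact sketch that works on a string instead. The name parse_results and the SAMPLE text are assumptions for the example; this is a variation on the approach in the script above, not Daniel's code.

```python
import re

SAMPLE = """\
Serie A
18:00
Bologna
Inter
1:3
20:45
Milan
Cesena
1:0
Serie B
18:00
Napoli
Fiorentina
0:0
"""

def parse_results(text):
    """Group lines under 'Serie X' headers, then cut each group into 4-line matches."""
    series_list = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if re.match(r'Serie\s\w$', line):
            # A header opens a new series.
            series_list.append({'title': line, 'lines': []})
        elif series_list:
            # Lines before the first header are ignored.
            series_list[-1]['lines'].append(line)

    keys = ('time', 'team_a', 'team_b', 'final_score')
    result = []
    for series in series_list:
        lines = series['lines']
        # Step through the lines four at a time; an incomplete trailing
        # group is dropped.
        teams = [dict(zip(keys, lines[i:i + 4]))
                 for i in range(0, len(lines) - 3, 4)]
        result.append({'title': series['title'], 'teams': teams})
    return result

parsed = parse_results(SAMPLE)
print(parsed)
```

Like the original, it relies on every match taking exactly four lines (time, two teams, score), so it shares the same limitation on less regular input.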
>
> I don't think it will work with your second file; in that case go for
> scraping. There are plenty of good libraries for it, including
> BeautifulSoup, which is fantastic and written entirely in Python.
>
> Best regards, and feel free to ask about anything!
>
> Daniel Pyrathon
>
> On 28 September 2011 10:59, Balan Victor <balan.victor0 at gmail.com> wrote:
>
> What would these better things be? Thanks
>>
>> On 28 September 2011 08:44, Enrico Franchi <
>> enrico.franchi at gmail.com> wrote:
>>
>>  Balan Victor wrote:
>>>
>>> I think I managed to do what I wanted... what do you think?
>>>>
>>>
>>> Let's just say I've seen you write better things... ;)
>>>
>>>
>>>
>>>
>>> --
>>> .
>>> ..: -enrico-
>>>
>>>
>>> _______________________________________________
>>> Python mailing list
>>> Python at lists.python.it
>>> http://lists.python.it/mailman/listinfo/python
>>>
>>
>
>
> --
> *************
>
> PirosB3
>
> http://pirosb3.com
>