[Python] Data manipulation and trees
Balan Victor
balan.victor0 a gmail.com
Fri 30 Sep 2011 16:35:47 CEST
Thanks everyone for the replies...
I need to cut and paste the quoted section, and for some strange
reason it gets laid out that way!
On 29 September 2011 16:21, Antonio <antonioposta a gmail.com> wrote:
> Scrapy is excellent too... with BeautifulSoup you can't run more
> complex XPath queries.
>
>
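To give an idea of the kind of XPath query being discussed, here is a minimal sketch. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in; Scrapy's selectors accept richer expressions. The markup, the `league`/`match` class names and the `title` attribute are invented for illustration and are not taken from any real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup (an assumption for this sketch).
# Real pages are rarely valid XML and would need an HTML-tolerant parser.
PAGE = """
<div>
  <table class="league" title="Serie A">
    <tr class="match"><td>18:00</td><td>Bologna</td><td>Inter</td><td>1:3</td></tr>
    <tr class="header"><td>ignored</td></tr>
    <tr class="match"><td>20:45</td><td>Milan</td><td>Cesena</td><td>1:0</td></tr>
  </table>
</div>
"""

root = ET.fromstring(PAGE)

# Select only the rows marked as matches inside the "Serie A" table.
rows = root.findall(".//table[@title='Serie A']/tr[@class='match']")

# Turn each row into a list of its cell texts.
matches = [[td.text for td in row.findall("td")] for row in rows]
print(matches)
```

The path expression filters on attributes at two levels in one step, which is the part that gets awkward without XPath support.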
> 2011/9/28 Daniel Pyrathon <pirosb3 a gmail.com>
>
>> Hi Balan
>>
>> I wrote a small component that parses a text file
>> (structured the way you want) and builds a list of dictionaries from it.
>>
>> Given:
>> Serie A
>> 18:00
>> Bologna
>> Inter
>> 1:3
>> 20:45
>> Milan
>> Cesena
>> 1:0
>> 20:45
>> Napoli
>> Fiorentina
>> 0:0
>> Serie B
>> 18:00
>> Bologna
>> Inter
>> 1:3
>> 20:45
>> Milan
>> Cesena
>> 1:0
>> 20:45
>> Napoli
>> Fiorentina
>> 0:0
>>
>> it would return:
>>
>> [{'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter',
>> 'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b':
>> 'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli',
>> 'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie A'}, {'teams':
>> [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time':
>> '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena',
>> 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b':
>> 'Fiorentina', 'time': '20:45'}], 'title': 'Serie B'}]
>>
>> Script:
>>
>> import re
>>
>> class TeamParser(object):
>>     def __init__(self, file_path):
>>         self._file_path = file_path
>>         self._result = None
>>
>>     @property
>>     def result(self):
>>         # Parse lazily, only on first access
>>         if self._result is None:
>>             self._result = self._parse_file()
>>         return self._result
>>
>>     def _parse_file(self):
>>         result = []
>>         current_series = None
>>         with open(self._file_path, 'r') as f:
>>             for line in f:
>>                 line = line.strip()
>>                 if not line:
>>                     continue  # skip blank lines
>>                 # If new series, flush the previous one and reset the list
>>                 if re.search(r'Serie\s\w$', line):
>>                     if current_series:
>>                         result.append(self._parse_team(current_series))
>>                     current_series = []
>>                 # Lines before the first series title are ignored
>>                 if current_series is not None:
>>                     current_series.append(line)
>>         # End of file: flush the last series, if any
>>         if current_series:
>>             result.append(self._parse_team(current_series))
>>         return result
>>
>>     def _parse_team(self, series):
>>         result = {'title': series[0], 'teams': []}
>>         # Each game takes 4 lines: time, team_a, team_b, final_score
>>         number_games = (len(series) - 1) // 4
>>         for game in range(number_games):
>>             index = 1 + 4 * game
>>             team = series[index:index + 4]
>>             result['teams'].append({'time': team[0], 'team_a': team[1],
>>                                     'team_b': team[2], 'final_score': team[3]})
>>         return result
>>
>> x = TeamParser('path to your file')
>> print(x.result)  # <-- results
>>
>> pastebin: http://pastebin.com/JN0pSQ0j
>>
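The same grouping can also be sketched as a plain function over an iterable of lines, which makes it easy to try without a file on disk. This is only an illustration of the idea under the input format quoted above: the names `parse_lines` and `SERIES_RE` are made up here, and the sample is a shortened version of the quoted data.

```python
import re

# Same "Serie X" pattern as the quoted script, anchored at both ends.
SERIES_RE = re.compile(r"^Serie\s\w$")
KEYS = ("time", "team_a", "team_b", "final_score")

def parse_lines(lines):
    """Group lines into [{'title': ..., 'teams': [...]}, ...]."""
    result = []
    fields = []  # values collected so far for the current game
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # blank lines are skipped, not treated as end of input
        if SERIES_RE.match(line):
            # A series title starts a new group and drops any partial game.
            result.append({"title": line, "teams": []})
            fields = []
            continue
        if not result:
            continue  # ignore anything before the first series title
        fields.append(line)
        if len(fields) == len(KEYS):
            result[-1]["teams"].append(dict(zip(KEYS, fields)))
            fields = []
    return result

# Shortened sample in the quoted format (an assumption for this sketch).
sample = """Serie A
18:00
Bologna
Inter
1:3
20:45
Milan
Cesena
1:0
Serie B
18:00
Napoli
Fiorentina
0:0
""".splitlines()

parsed = parse_lines(sample)
print(parsed)
```

Working on lines rather than a path means the same function can be fed `open(path)` directly, or a list built in a test.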
>> I don't think it will work with your second file; in that case do some
>> scraping. There are many nice libraries for that, including
>> BeautifulSoup, which is fantastic and written entirely in Python.
>>
>> Cheers, and feel free to ask about anything!
>>
>> Daniel Pyrathon
>>
>> On 28 September 2011 10:59, Balan Victor <balan.victor0 a gmail.com> wrote:
>>
>> what do these better things consist of? thanks
>>>
>>> On 28 September 2011 08:44, Enrico Franchi <
>>> enrico.franchi a gmail.com> wrote:
>>>
>>> Balan Victor wrote:
>>>>
>>>> I think I managed to do what I wanted... what do you think?
>>>>>
>>>>
>>>> Let's say I've seen you write better things... ;)
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> .
>>>> ..: -enrico-
>>>>
>>>>
>>>> _______________________________________________
>>>> Python mailing list
>>>> Python a lists.python.it
>>>> http://lists.python.it/mailman/listinfo/python
>>>>
>>>
>>>
>>
>>
>> --
>> *************
>>
>> PirosB3
>>
>> http://pirosb3.com
>>
>>
>