[Python] Data manipulation and trees

Wed 28 Sep 2011 14:11:07 CEST

Hi Balan,

I wrote a small component that parses a text file (structured the way
you want) and builds a list of dictionaries from it.

Given this input:
Serie A
18:00
Bologna
Inter
1:3
20:45
Milan
Cesena
1:0
20:45
Napoli
Fiorentina
0:0
Serie B
18:00
Bologna
Inter
1:3
20:45
Milan
Cesena
1:0
20:45
Napoli
Fiorentina
0:0

it would return:

[{'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter',
'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b':
'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli',
'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie A'}, {'teams':
[{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time':
'18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena',
'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b':
'Fiorentina', 'time': '20:45'}], 'title': 'Serie B'}]
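
Once you have that structure, walking it is just two nested loops. As a rough sketch (not part of the script below), here is one way to pull out the winner of each game. The helper names `winner` and `winners_by_series` are made up for this example, and it assumes the first number in `final_score` belongs to `team_a`:

```python
# Sample data copied from the output above (Serie A only, for brevity).
result = [
    {'title': 'Serie A',
     'teams': [
         {'time': '18:00', 'team_a': 'Bologna', 'team_b': 'Inter',
          'final_score': '1:3'},
         {'time': '20:45', 'team_a': 'Milan', 'team_b': 'Cesena',
          'final_score': '1:0'},
         {'time': '20:45', 'team_a': 'Napoli', 'team_b': 'Fiorentina',
          'final_score': '0:0'},
     ]},
]


def winner(game):
    """Return the winning team's name, or None for a draw.

    Hypothetical helper: assumes 'final_score' is 'goals_a:goals_b'.
    """
    goals_a, goals_b = (int(g) for g in game['final_score'].split(':'))
    if goals_a > goals_b:
        return game['team_a']
    if goals_b > goals_a:
        return game['team_b']
    return None


def winners_by_series(series_list):
    """Map each series title to its list of winners, one entry per game."""
    return {series['title']: [winner(game) for game in series['teams']]
            for series in series_list}


summary = winners_by_series(result)
print(summary)
```

The same loops work unchanged on the full two-series list, since every series has the same shape.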

Script:

import re


class TeamParser(object):
    """Parse a text file of match results into a list of dictionaries."""

    def __init__(self, file_path):
        self._file_path = file_path
        self._result = None

    @property
    def result(self):
        # Parse lazily on first access, then reuse the cached list.
        if self._result is None:
            self._result = self._parse_file()
        return self._result

    def _parse_file(self):
        result = []
        current_series = None
        with open(self._file_path, 'r') as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue  # skip blank lines instead of stopping
                # A line such as "Serie A" starts a new series.
                if re.search(r'Serie\s\w$', line):
                    # Store the previous series, then reset the list.
                    if current_series:
                        result.append(self._parse_team(current_series))
                    current_series = []
                elif current_series is None:
                    continue  # ignore anything before the first header
                current_series.append(line)
        # End of file: store the last series, if there was one.
        if current_series:
            result.append(self._parse_team(current_series))
        return result

    def _parse_team(self, series):
        result = {'title': series[0], 'teams': []}
        # Each game takes four lines: time, team A, team B, final score.
        # An incomplete group at the end of a series is ignored.
        for index in range(1, len(series) - 3, 4):
            time, team_a, team_b, final_score = series[index:index + 4]
            result['teams'].append({'time': time, 'team_a': team_a,
                                    'team_b': team_b,
                                    'final_score': final_score})
        return result


x = TeamParser('path to your file')

print(x.result)  # <-- results

pastebin: http://pastebin.com/JN0pSQ0j

I don't think it will work with your second file; in that case you
should scrape it. There are plenty of good libraries for that, including
BeautifulSoup, which is fantastic and written entirely in Python.
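
To give a rough idea of what scraping means here, this sketch pulls the rows out of an HTML table and turns them into the same kind of dictionaries. To stay runnable without installing anything, it uses the Python 3 standard library's `html.parser` in place of BeautifulSoup. The HTML snippet, `RowCollector` and `KEYS` are all invented for illustration, since the layout of the second file isn't shown in this thread:

```python
from html.parser import HTMLParser

# Hypothetical HTML: this table is invented purely for illustration.
SAMPLE_HTML = """
<table>
  <tr><td>18:00</td><td>Bologna</td><td>Inter</td><td>1:3</td></tr>
  <tr><td>20:45</td><td>Milan</td><td>Cesena</td><td>1:0</td></tr>
</table>
"""


class RowCollector(HTMLParser):
    """Collect the text of every <td>, grouped by <tr> (illustrative name)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per <tr>
        self._in_cell = False   # True while we are inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag == 'td' and self.rows:
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_cell:
            self.rows[-1][-1] += data


collector = RowCollector()
collector.feed(SAMPLE_HTML)

KEYS = ('time', 'team_a', 'team_b', 'final_score')
# Keep only rows with exactly four cells and label each cell.
games = [dict(zip(KEYS, (cell.strip() for cell in row)))
         for row in collector.rows if len(row) == len(KEYS)]
print(games)
```

With a real page you would adapt the tag names to whatever markup the site actually uses; a library like BeautifulSoup mostly saves you from writing the collector class by hand.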

Best regards, and feel free to ask about anything!

Daniel Pyrathon

On 28 September 2011 at 10:59, Balan Victor <balan.victor0 at gmail.com>
wrote:

> What do these better things consist of? Thanks
>
> On 28 September 2011 at 08:44, Enrico Franchi <
> enrico.franchi at gmail.com> wrote:
>
> Balan Victor wrote:
>>
>>  I think I managed to do what I wanted... what do you think?
>>>
>>
>> Let's say I've seen you write better things... ;)
>>
>>
>>
>>
>> --
>> .
>> ..: -enrico-
>>
>>
>> _______________________________________________
>> Python mailing list
>> Python at lists.python.it
>> http://lists.python.it/mailman/listinfo/python
>>

-- 
*************

PirosB3

http://pirosb3.com