Scrapy is excellent too... with BeautifulSoup you can't write the more complex XPath queries.<br><br>
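<div>To give an idea of what an XPath-style query looks like, here is a minimal sketch. It uses the standard library's <code>xml.etree.ElementTree</code>, which supports only a limited subset of XPath, instead of Scrapy, so that it runs without installing anything. The markup and the element names (<code>league</code>, <code>match</code>, <code>home</code>) are assumptions made up for the illustration; the real page is not shown in this thread.</div>

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; the real page layout is not shown in the thread.
DOC = """
<results>
  <league name="Serie A">
    <match time="18:00"><home>Bologna</home><away>Inter</away><score>1:3</score></match>
    <match time="20:45"><home>Milan</home><away>Cesena</away><score>1:0</score></match>
    <match time="20:45"><home>Napoli</home><away>Fiorentina</away><score>0:0</score></match>
  </league>
  <league name="Serie B">
    <match time="18:00"><home>Bologna</home><away>Inter</away><score>1:3</score></match>
  </league>
</results>
"""

root = ET.fromstring(DOC)

# Two attribute predicates plus a child step:
# home teams of the 20:45 games in Serie A only.
query = ".//league[@name='Serie A']/match[@time='20:45']/home"
late_serie_a = [home.text for home in root.findall(query)]
print(late_serie_a)
```

<div>The query selects by league and by kick-off time in a single expression, which is the kind of filtering that otherwise needs nested loops.</div>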
<div class="gmail_quote">2011/9/28 Daniel Pyrathon <span dir="ltr">&lt;<a href="mailto:pirosb3@gmail.com">pirosb3@gmail.com</a>&gt;</span><br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">Hi Balan
<div><br></div>
<div>I wrote a small component that parses a text file (structured the way you described) and builds a list of dictionaries from it.</div>
<div><br></div>
<div>Given this input:</div>
<div>
<div class="im">
<div>Serie A</div>
<div>18:00</div>
<div>Bologna</div>
<div>Inter</div>
<div>1:3</div></div>
<div class="im">
<div>20:45</div>
<div>Milan</div>
<div>Cesena</div>
<div>1:0</div>
<div>20:45</div>
<div>Napoli</div>
<div>Fiorentina</div>
<div>0:0</div>
<div>Serie B</div></div>
<div class="im">
<div>18:00</div>
<div>Bologna</div>
<div>Inter</div>
<div>1:3</div></div>
<div class="im">
<div>20:45</div>
<div>Milan</div>
<div>Cesena</div>
<div>1:0</div>
<div>20:45</div>
<div>Napoli</div>
<div>Fiorentina</div>
<div>0:0</div></div></div>
<div><br></div>
<div>it would return:</div>
<div>
<p>[{'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie A'}, {'teams': [{'final_score': '1:3', 'team_a': 'Bologna', 'team_b': 'Inter', 'time': '18:00'}, {'final_score': '1:0', 'team_a': 'Milan', 'team_b': 'Cesena', 'time': '20:45'}, {'final_score': '0:0', 'team_a': 'Napoli', 'team_b': 'Fiorentina', 'time': '20:45'}], 'title': 'Serie B'}]</p>
<p>Script:</p>
<p></p>
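<p>import re</p>
<p><br></p>
<p>class TeamParser(object):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;def __init__(self, file_path):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self._file_path = file_path</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self._result = None</p>
<p><br></p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;@property</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;def result(self):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Parse lazily, on first access only</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if self._result is None:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;self._result = self._parse_file()</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return self._result</p>
<p><br></p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;def _parse_file(self):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result = []</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;current_series = None</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;with open(self._file_path, 'r') as f:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for line in f:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;line = line.strip()</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Skip blank lines instead of treating them as end of file</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if not line:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;continue</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# New series header: store the previous series and start a new one</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if re.search(r'Serie\s\w$', line):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if current_series:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result.append(self._parse_team(current_series))</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;current_series = []</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Ignore anything that comes before the first series header</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if current_series is not None:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;current_series.append(line)</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# End of file: store the last series, if there is one</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if current_series:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result.append(self._parse_team(current_series))</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return result</p>
<p><br></p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;def _parse_team(self, series):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result = {'title': series[0], 'teams': []}</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;# Each game takes four lines: time, team_a, team_b, final_score</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;for index in range(1, len(series) - 3, 4):</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;team = series[index:index + 4]</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;result['teams'].append({'time': team[0], 'team_a': team[1], 'team_b': team[2], 'final_score': team[3]})</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;return result</p>
<p><br></p>
<p>x = TeamParser('path to your file')</p>
<p>print(x.result)&nbsp;&nbsp;# the parsed results</p>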
<p></p></div>
<div>pastebin: <a href="http://pastebin.com/JN0pSQ0j" target="_blank">http://pastebin.com/JN0pSQ0j</a></div>
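<div>To try the grouping logic without creating a file, here is a minimal sketch of the same idea applied to a list of lines already in memory. The helper name <code>group_series</code> is hypothetical and is not part of the script above; it assumes, like the script, that every game takes exactly four lines after a "Serie X" header.</div>

```python
import re

def group_series(lines):
    """Group flat lines into one dict per 'Serie X' header (hypothetical helper)."""
    series = []
    rows = []  # lines collected for the game currently being read
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if re.search(r'Serie\s\w$', line):
            # New header: open a new series and drop any incomplete game
            series.append({'title': line, 'teams': []})
            rows = []
        elif series:
            rows.append(line)
            if len(rows) == 4:
                time, team_a, team_b, final_score = rows
                series[-1]['teams'].append({
                    'time': time,
                    'team_a': team_a,
                    'team_b': team_b,
                    'final_score': final_score,
                })
                rows = []
    return series

sample = """Serie A
18:00
Bologna
Inter
1:3
Serie B
20:45
Milan
Cesena
1:0
""".splitlines()

parsed = group_series(sample)
print(parsed)
```

<div>Because games are stored as soon as their fourth line arrives, a trailing incomplete game is simply dropped rather than raising an error.</div>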
<div><br></div>
<div>I don't think it will work with your second file. In that case you need to scrape the page: there are plenty of good libraries for that, including BeautifulSoup, which is fantastic and written entirely in Python.</div>
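<div>As a rough idea of what scraping looks like, here is a minimal sketch. It uses the standard library's <code>html.parser</code> as a stand-in for BeautifulSoup, so that it runs without installing anything. The markup in <code>PAGE</code> and the class name <code>CellText</code> are assumptions made up for the illustration, since the second file is not shown in this thread.</div>

```python
from html.parser import HTMLParser

class CellText(HTMLParser):
    """Collect the text of every <td> cell (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self._in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_cell = True
            self.cells.append('')

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._in_cell:
            self.cells[-1] += data

# Assumed page fragment: one table row per game.
PAGE = "<table><tr><td>18:00</td><td>Bologna</td><td>Inter</td><td>1:3</td></tr></table>"

parser = CellText()
parser.feed(PAGE)
parser.close()
cells = [cell.strip() for cell in parser.cells]
print(cells)
```

<div>The resulting flat list has the same time, team, team, score layout as the text file, so it can be grouped four items at a time in the same way.</div>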
<div><br></div>
<div>Regards, and feel free to ask if you need anything!</div>
<div><br></div>
<div>Daniel Pyrathon</div>
<div><br>
<div class="gmail_quote">On 28 September 2011 at 10:59, Balan Victor <span dir="ltr">&lt;<a href="mailto:balan.victor0@gmail.com" target="_blank">balan.victor0@gmail.com</a>&gt;</span> wrote:
<div>
<div></div>
<div class="h5"><br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">What would these better things look like? Thanks<br><br>
<div class="gmail_quote">On 28 September 2011 at 08:44, Enrico Franchi <span dir="ltr">&lt;<a href="mailto:enrico.franchi@gmail.com" target="_blank">enrico.franchi@gmail.com</a>&gt;</span> wrote:
<div>
<div></div>
<div><br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">
<div>Balan Victor wrote:<br><br>
<blockquote style="BORDER-LEFT: #ccc 1px solid; MARGIN: 0px 0px 0px 0.8ex; PADDING-LEFT: 1ex" class="gmail_quote">I think I managed to do what I wanted... what do you think?<br></blockquote><br></div>Let's just say I've seen you write better things... ;)<br>
<font color="#888888"><br><br><br><br>-- <br>.<br>..: -enrico-</font>
<div>
<div></div>
<div><br><br>_______________________________________________<br>Python mailing list<br><a href="mailto:Python@lists.python.it" target="_blank">Python@lists.python.it</a><br><a href="http://lists.python.it/mailman/listinfo/python" target="_blank">http://lists.python.it/mailman/listinfo/python</a><br>
</div></div></blockquote></div></div></div><br>
<br></blockquote></div></div></div><font color="#888888"><br><br clear="all">
<div><br></div>-- <br>*************
<div><br>
<div>PirosB3</div>
<div><br></div>
<div><a href="http://pirosb3.com/" target="_blank">http://pirosb3.com</a></div></div><br></font></div><br></blockquote></div><br>