[Python] A very easy question about a unicode manipulation case

Marco Ippolito ippolito.marco a gmail.com
Thu 29 Jan 2015 11:49:35 CET


On 29 January 2015 11:18, Marco Ippolito <ippolito.marco a gmail.com> wrote:
>> but if it is news taken from the internet, you should already know the
>> encoding from the HTML.
>> What data sources do you have?
> The article in question is this one:
> http://www.ilsole24ore.com/art/english-version/2014-05-29/signs-of-light-the-credit-darkness-032044.shtml?uuid=ABJTc3LB

Anyway, the HTML of that web page contains:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

So I tried putting this at the top of my script:
#!/usr/bin/python
#-*- coding: iso-8859-1 -*-

but when I save the file in order to run it, Emacs shows this:

These default coding systems were tried to encode text
  (iso-latin-1-unix (293 . 8212) (298 . 8217) (303 . 8221) (308
  . 8220) (1067 . 8220) (1088 . 8221) (1109 . 8212) (1130 . 8217)
  (2227 . 8220) (2279 . 8220) (2360 . 8221))
However, each of them encountered characters it couldn't encode:
  iso-latin-1-unix cannot encode these: — ’ ” “ “ ” — ’ “ “ ...

Click on a character (or switch to this window by `C-x o'
and select the characters by RET) to jump to the place it appears,
where `C-u C-x =' will give information about it.
Select one of the safe coding systems listed below,
or cancel the writing with C-g and edit the buffer
   to remove or modify the problematic characters,
or specify any other coding system (and risk losing
   the problematic characters).

  utf-8 euc-jis-2004 euc-jp windows-1256 windows-1258 iso-2022-jp-2004
  next macintosh windows-1254 windows-1252 gb18030 gbk utf-7 utf-16
  utf-16be-with-signature utf-16le-with-signature utf-16be utf-16le
   iso-2022-7bit utf-8-auto utf-8-with-signature eucjp-ms
   georgian-academy georgian-ps japanese-shift-jis-2004
   japanese-iso-7bit-1978-irv utf-7-imap utf-8-emacs
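
The numbers paired with each position in the message above (8212, 8217, 8221, 8220) are the decimal code points of the characters Emacs could not encode. A minimal sketch for checking which encodings can represent them follows; the names PROBLEM_CHARS and can_encode are made up for illustration and are not part of the original script.

```python
from __future__ import print_function

# The four distinct characters Emacs complained about, written as escapes
# so that this file itself stays pure ASCII (hypothetical name).
PROBLEM_CHARS = u"\u2014\u2019\u201d\u201c"


def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Decimal code points, to compare with the numbers in the Emacs message.
code_points = [ord(ch) for ch in PROBLEM_CHARS]

latin1_ok = can_encode(PROBLEM_CHARS, "iso-8859-1")
cp1252_ok = can_encode(PROBLEM_CHARS, "windows-1252")
utf8_ok = can_encode(PROBLEM_CHARS, "utf-8")

print(code_points)
print("iso-8859-1:", latin1_ok)
print("windows-1252:", cp1252_ok)
print("utf-8:", utf8_ok)
```

Under these assumptions, iso-8859-1 fails while windows-1252 and utf-8 succeed, which agrees with the list of safe coding systems Emacs offers. Saving the file as utf-8 and declaring `# -*- coding: utf-8 -*-` is probably the simplest way out, since the coding line describes the bytes of the source file itself, not the charset of the page being processed.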

