Ticket #3660 (new defect)
Opened 12 months ago
Wikipedia: Chars # and " in the article title break data generation process
|Reported by:||godiard||Owned by:||godiard|
|Priority:||Unspecified by Maintainer||Milestone:||Unspecified by Release Team|
After running pages_parser.py, the .links file contains links with '"' and '#' in them, and make_selection.py then adds these links to pages_selected-level-1.
The '"' character causes errors when inserting into the SQL database, and links containing '#' point to an anchor inside another article, so they should be ignored.
Some of the errors have been fixed (and others were avoided by editing the pages_selected-level-1 file by hand), but these characters should be removed earlier in the process (probably in make_selection.py).
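
A minimal sketch of the kind of filter that could be applied before titles are written to the selection file. The names FORBIDDEN_CHARS, is_safe_title and filter_titles are hypothetical and do not come from make_selection.py. The sketch assumes that a title containing either character is dropped entirely; whether titles with '"' should be escaped rather than dropped is left open.

```python
# Hypothetical helper, not taken from make_selection.py.
# Characters named in this ticket as breaking data generation.
FORBIDDEN_CHARS = ('"', '#')


def is_safe_title(title):
    """Return True if the title contains none of the problem characters."""
    return not any(ch in title for ch in FORBIDDEN_CHARS)


def filter_titles(titles):
    """Keep only the titles that are safe to write to the selection file."""
    return [t for t in titles if is_safe_title(t)]


if __name__ == "__main__":
    sample = ['Python', 'Python#History', 'The "Real" Thing', 'Sugar_(desktop)']
    for title in filter_titles(sample):
        print(title)
```

With the sample list above, only 'Python' and 'Sugar_(desktop)' pass the filter; the entry with '#' and the entry with '"' are both skipped.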