Opened 11 years ago
#3660 new defect
Wikipedia: Chars # and " in the article title break data generation process
Reported by: | godiard | Owned by: | godiard |
---|---|---|---|
Priority: | Unspecified by Maintainer | Milestone: | Unspecified |
Component: | Wikipedia | Version: | Unspecified |
Severity: | Unspecified | Keywords: | |
Cc: | | Distribution/OS: | Unspecified |
Bug Status: | Unconfirmed | | |
Description
After running pages_parser.py, the .links file contains links with '"' and '#' characters, and make_selection.py then adds them to pages_selected-level-1.
The '"' causes errors when inserting into the SQL database, and a '#' points to an anchor inside another article, so those links should be ignored.
Some of the errors were fixed (and others were avoided by editing the pages_selected-level-1 file by hand), but these characters should be removed earlier in the process (probably in make_selection.py).
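
A minimal sketch of the kind of filter that could run before the selection is written out. The name filter_titles and the idea of passing titles as a plain list are assumptions for illustration, not the actual code in make_selection.py. It also assumes that dropping the affected titles entirely is acceptable.

```python
# Hypothetical helper; not part of the actual make_selection.py.
# Titles containing any of these characters are dropped.
PROBLEM_CHARS = ('#', '"')


def filter_titles(titles):
    """Return only the titles that contain neither '#' nor '"'."""
    return [t for t in titles if not any(c in t for c in PROBLEM_CHARS)]


if __name__ == "__main__":
    sample = ['Python', 'Python#History', 'The "Big" Book', 'Peru']
    print(filter_titles(sample))  # prints ['Python', 'Peru']
```

If articles with '"' in the title need to be kept, an alternative would be to leave them in the selection and use parameterized queries for the SQL insert, so that only the '#' links are filtered here.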