Opened 13 years ago
Closed 10 years ago
#2064 closed defect (obsolete)
Sugar dies with SIGABRT when clicking on icons
Reported by: | bernie | Owned by: | alsroot |
---|---|---|---|
Priority: | Unspecified by Maintainer | Milestone: | Unspecified |
Component: | Sugar | Version: | 0.88.x |
Severity: | Critical | Keywords: | r+ dextrose |
Cc: | aa, tch, tonyforster | Distribution/OS: | Unspecified |
Bug Status: | Needinfo |
Description (last modified by bernie)
Sometimes, clicking on journal entries or activity icons in a seemingly innocent way, causes Sugar to die instantly.
This has been reported against F11-0.88 by other testers as well.
I've only seen it twice, It's very hard to trigger. It may very well be a platform bug (DBus, GTK, Python bindings...).
1277655977.154171 WARNING root: No gtk.AccelGroup in the top level window. /usr/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters super(Connection, self).__init__(*args, **kwargs) ** (sugar-session:3202): DEBUG: accept_ice_connection() ** (sugar-session:3202): DEBUG: New client '0xa541c60' ** (sugar-session:3202): DEBUG: Initializing client 0xa541c60 ** (sugar-session:3202): DEBUG: Client '0xa541c60' received RegisterClient(NULL) ** (sugar-session:3202): DEBUG: Adding new client (null) to session ** (sugar-session:3202): DEBUG: Sending RegisterClientReply to '0xa541c60 [105cc12965cbe6b7b4127765671187810100000032020000]' ** (sugar-session:3202): DEBUG: Sending initial SaveYourself ** (sugar-session:3202): DEBUG: Set properties from client '0xa541c60 [105cc12965cbe6b7b4127765671187810100000032020000]' ** (sugar-session:3202): DEBUG: Program = 'sugar-activity' ** (sugar-session:3202): DEBUG: CloneCommand = 'sugar-activity' ** (sugar-session:3202): DEBUG: RestartCommand = 'sugar-activity' '--sm-client-id' '105cc12965cbe6b7b4127765671187810100000032020000' ** (sugar-session:3202): DEBUG: UserID = 'olpc' ** (sugar-session:3202): DEBUG: ProcessID = '3692' ** (sugar-session:3202): DEBUG: RestartStyleHint = 0 ** (sugar-session:3202): DEBUG: Client '0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000]' received SaveYourselfDone(success = True) ** (sugar-session:3202): DEBUG: sms_error_handler (0xa69e160, FALSE, 3, 9, 32771, 0) ** (sugar-session:3202): DEBUG: ice_io_error_handler (0xa55c3b8) ** (sugar-session:3202): DEBUG: IceProcessMessagesIOError on '0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000]' ** (sugar-session:3202): DEBUG: xsmp_finalize (0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000]) 1277656858.845542 WARNING root: No gtk.AccelGroup in the top level window. 1277656858.848256 WARNING root: No gtk.AccelGroup in the top level window. 1277656858.886509 ERROR root: set_active() failed: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.24 was not provided by any .service files python: Modules/gcmodule.c:262: update_refs: La declaración `gc->gc.gc_refs != 0' no se cumple.
Attachments (2)
Change History (32)
comment:1 follow-up: ↓ 2 Changed 13 years ago by aa
comment:2 in reply to: ↑ 1 Changed 13 years ago by bernie
Replying to aa:
I couldn't find any destructors in our C extensions, so this may not be our bug.
I'd blame DBus:
1277656858.886509 ERROR root: set_active() failed: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.24 was not provided by any .service files
Any idea what this error means?
comment:3 Changed 13 years ago by alsroot
Looks like sugar dies on trying to call dbus method on activity dbus object. Is this issue reproducible with arbitrary activity?
comment:4 Changed 13 years ago by alsroot
- Bug Status changed from Unconfirmed to Needinfo
Also try to reproduce issue with 1st patch attached
Changed 13 years ago by alsroot
comment:5 Changed 13 years ago by bernie
I have not yet been able to reproduce the issue (without the patch attached).
In all cases, I was definitely clicking on some journal icon, but I can't remember which one. I tried various tricks, including clicking twice quickly and clicking on an already open activity.
comment:6 Changed 13 years ago by bernie
This seems like the same bug:
comment:7 Changed 13 years ago by bernie
- Cc tch added
We found out that this bug is _much_ easier to trigger while you're downloading a file to the journal. Probably, frequent dbus communication to update the progress bar causes the error.
Martin, while I'm away, could you: apply this patch to the Sugar spec file, regenerate the rpm and install it on Martin Oesterreich's laptop. He knows how to reproduce this bug all the time (he does it by downloading TuxPaint-3 from ASLO).
comment:8 Changed 13 years ago by bernie
I haven't seen this bug yet after applying alsroot's patch.
Could #2154 be related to this bug?
comment:9 follow-up: ↓ 10 Changed 13 years ago by tomeu
I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.
That said, Aleksey's patch looks like a good one to me.
comment:10 in reply to: ↑ 9 ; follow-up: ↓ 11 Changed 13 years ago by bernie
- Keywords r? added
- Resolution set to fixed
- Status changed from new to closed
Replying to tomeu:
I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.
That said, Aleksey's patch looks like a good one to me.
Yeah, I think we could commit alsroot's patch and close this bug as fixed.
comment:11 in reply to: ↑ 10 ; follow-up: ↓ 15 Changed 13 years ago by tomeu
Replying to bernie:
Replying to tomeu:
I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.
That said, Aleksey's patch looks like a good one to me.
Yeah, I think we could commit alsroot's patch and close this bug as fixed.
Sorry, what was the reasoning to think that the warning fixed by Aleksey's patch is the cause of the crash?
comment:12 Changed 13 years ago by tomeu
- Keywords r+ added; r? removed
+ if (old and len(old)) and (not new and not len(new)):
Why not just "if old and not new"?
+ logging.debug('Activity %s went away', name)
I would add the name of the function to the string so it's easier to grep
Please push with these changes, thanks!
(but don't close this ticket just now)
comment:13 Changed 13 years ago by tomeu
- Resolution fixed deleted
- Status changed from closed to reopened
comment:14 Changed 13 years ago by alsroot
comment:15 in reply to: ↑ 11 Changed 13 years ago by bernie
Replying to tomeu:
Sorry, what was the reasoning to think that the warning fixed by Aleksey's patch is the cause of the crash?
After several days of testing with aleksey's patch applied, this particular bug was not seen any more.
However, I have seen this bug elsewhere, #2154, so it's not clear that it's gone.
comment:16 Changed 13 years ago by bernie
Today I managed to make Sugar crash once by clicking on the "sort by..." icon on the right of the Journal toolbar!
The journal sorting icon is a Dextrose-specific feature, but the last line in the log is the same as this bug:
update_refs: Assertion `gc->gc.gc_refs != 0' failed.
Note that I was just clicking on the icon that pops up the menu, not on a particular menu item.
comment:17 Changed 13 years ago by bernie
- Severity changed from Unspecified to Critical
comment:18 Changed 13 years ago by bernie
- Keywords dextrose added
comment:19 Changed 13 years ago by bernie
Attaching a debugger is hard... with debug info loaded, gdb eats up too much memory on the XO-1. The bug is also very hard to trigger.
comment:20 Changed 13 years ago by seeta
- Status changed from reopened to assigned
comment:21 Changed 13 years ago by bernie
To help diagnose this, I've added this custom script on the dextrose-2 builds:
cat >"$INSTALL_ROOT/home/olpc/.xsession" <<__EOF__ ulimit -c unlimited __EOF__
I did not test if this is sufficient to get the core of the sugar process dumped in the user's home. Perhaps we also have to tweak /proc/sys/kernel/core_pattern .
comment:22 Changed 13 years ago by bernie
- Description modified (diff)
- Summary changed from Sudden death while clicking around on the journal to Sugar dies with SIGABRT when clicking on icons
comment:23 Changed 13 years ago by bernie
Reproduced in Fedora 14 too: http://bugzilla.redhat.com/show_bug.cgi?id=640972
comment:24 Changed 13 years ago by bernie
gc->gc.gc_refs really is 0:
(gdb) p generations[0].head
$8 = {gc = {gc_next = 0x1e336f0, gc_prev = 0x1c66770, gc_refs = 0}, dummy = <invalid float value>}
Full coredump available here: http://people.sugarlabs.org/bernie/sugar/sl2064-coredump
Resume it with:
gdb /usr/bin/python sl2064-coredump
comment:25 Changed 13 years ago by alsroot
Was this issue reproduced only on XO, maybe it is free memory lack issue?
comment:26 Changed 13 years ago by tonyforster
May be also #2437
comment:27 Changed 13 years ago by tonyforster
- Cc tonyforster added
comment:28 Changed 12 years ago by cjl
Possibly resolved by dsd patch for redhat:640972?
comment:29 Changed 10 years ago by dnarvaez
- Component changed from journal to sugar
comment:30 Changed 10 years ago by godiard
- Resolution set to obsolete
- Status changed from assigned to closed
I haven't seen this issue in the last two years.
Closing as obsolete, reopen if needed.
This may be tough to debug without reproducing it in a gdb run. Here's what Python/Modules/gcmodule.c has to say about this:
I couldn't find any destructors in our C extensions, so this may not be our bug.