Opened 9 years ago

Closed 6 years ago

#2064 closed defect (obsolete)

Sugar dies with SIGABRT when clicking on icons

Reported by: bernie Owned by: alsroot
Priority: Unspecified by Maintainer Milestone: Unspecified
Component: Sugar Version: 0.88.x
Severity: Critical Keywords: r+ dextrose
Cc: aa, tch, tonyforster Distribution/OS: Unspecified
Bug Status: Needinfo

Description (last modified by bernie)

Sometimes, clicking on journal entries or activity icons in a seemingly innocent way, causes Sugar to die instantly.

This has been reported against F11-0.88 by other testers as well.

I've only seen it twice, It's very hard to trigger. It may very well be a platform bug (DBus, GTK, Python bindings...).

1277655977.154171 WARNING root: No gtk.AccelGroup in the top level window.
/usr/lib/python2.6/site-packages/dbus/connection.py:242: DeprecationWarning: object.__init__() takes no parameters
  super(Connection, self).__init__(*args, **kwargs)
** (sugar-session:3202): DEBUG: accept_ice_connection()
** (sugar-session:3202): DEBUG: New client '0xa541c60'
** (sugar-session:3202): DEBUG: Initializing client 0xa541c60
** (sugar-session:3202): DEBUG: Client '0xa541c60' received RegisterClient(NULL)
** (sugar-session:3202): DEBUG: Adding new client (null) to session
** (sugar-session:3202): DEBUG: Sending RegisterClientReply to '0xa541c60 [105cc12965cbe6b7b4127765671187810100000032020000]'
** (sugar-session:3202): DEBUG: Sending initial SaveYourself
** (sugar-session:3202): DEBUG: Set properties from client '0xa541c60 [105cc12965cbe6b7b4127765671187810100000032020000]'
** (sugar-session:3202): DEBUG:   Program = 'sugar-activity'
** (sugar-session:3202): DEBUG:   CloneCommand = 'sugar-activity' 
** (sugar-session:3202): DEBUG:   RestartCommand = 'sugar-activity' '--sm-client-id' '105cc12965cbe6b7b4127765671187810100000032020000' 
** (sugar-session:3202): DEBUG:   UserID = 'olpc'
** (sugar-session:3202): DEBUG:   ProcessID = '3692'
** (sugar-session:3202): DEBUG:   RestartStyleHint = 0
** (sugar-session:3202): DEBUG: Client '0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000]' received SaveYourselfDone(success = True)
** (sugar-session:3202): DEBUG: sms_error_handler (0xa69e160, FALSE, 3, 9, 32771, 0)
** (sugar-session:3202): DEBUG: ice_io_error_handler (0xa55c3b8)
** (sugar-session:3202): DEBUG: IceProcessMessagesIOError on '0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000]'
** (sugar-session:3202): DEBUG: xsmp_finalize (0xa541c60 [sugar-activity 105cc12965cbe6b7b4127765671187810100000032020000])
1277656858.845542 WARNING root: No gtk.AccelGroup in the top level window.
1277656858.848256 WARNING root: No gtk.AccelGroup in the top level window.
1277656858.886509 ERROR root: set_active() failed: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.24 was not provided by any .service files
python: Modules/gcmodule.c:262: update_refs: La declaración `gc->gc.gc_refs != 0' no se cumple.

Attachments (2)

0001-Listen-for-NameOwnerChanged-DBus-singall-all-time-fo.patch (1.3 KB) - added by alsroot 9 years ago.
sl2064-full-backtrace.txt (98.9 KB) - added by bernie 9 years ago.
Fully detailed backtrace of the crash from rh#640972

Download all attachments as: .zip

Change History (32)

comment:1 follow-up: Changed 9 years ago by aa

This may be tough to debug without reproducing it in a gdb run. Here's what Python/Modules/gcmodule.c has to say about this:

update_refs(PyGC_Head *containers)
{
    PyGC_Head *gc = containers->gc.gc_next;
    for (; gc != containers; gc = gc->gc.gc_next) {
        assert(gc->gc.gc_refs == GC_REACHABLE);
        gc->gc.gc_refs = Py_REFCNT(FROM_GC(gc));
        /* Python's cyclic gc should never see an incoming refcount
         * of 0:  if something decref'ed to 0, it should have been
         * deallocated immediately at that time.
         * Possible cause (if the assert triggers):  a tp_dealloc
         * routine left a gc-aware object tracked during its teardown
         * phase, and did something-- or allowed something to happen --
         * that called back into Python.  gc can trigger then, and may
         * see the still-tracked dying object.  Before this assert
         * was added, such mistakes went on to allow gc to try to
         * delete the object again.  In a debug build, that caused
         * a mysterious segfault, when _Py_ForgetReference tried
         * to remove the object from the doubly-linked list of all
         * objects a second time.  In a release build, an actual
         * double deallocation occurred, which leads to corruption
         * of the allocator's internal bookkeeping pointers.  That's
         * so serious that maybe this should be a release-build
         * check instead of an assert?
         */
        assert(gc->gc.gc_refs != 0);
    }
}

I couldn't find any destructors in our C extensions, so this may not be our bug.

comment:2 in reply to: ↑ 1 Changed 9 years ago by bernie

Replying to aa:

I couldn't find any destructors in our C extensions, so this may not be our bug.

I'd blame DBus:

1277656858.886509 ERROR root: set_active() failed: org.freedesktop.DBus.Error.ServiceUnknown: The name :1.24 was not provided by any .service files

Any idea what this error means?

comment:3 Changed 9 years ago by alsroot

Looks like sugar dies on trying to call dbus method on activity dbus object. Is this issue reproducible with arbitrary activity?

comment:4 Changed 9 years ago by alsroot

  • Bug Status changed from Unconfirmed to Needinfo

Also try to reproduce issue with 1st patch attached

comment:5 Changed 9 years ago by bernie

I have not yet been able to reproduce the issue (without the patch attached).

In all cases, I was definitely clicking on some journal icon, but I can't remember which one. I tried various tricks, including clicking twice quickly and clicking on an already open activity.

comment:6 Changed 9 years ago by bernie

comment:7 Changed 9 years ago by bernie

  • Cc tch added

We found out that this bug is _much_ easier to trigger while you're downloading a file to the journal. Probably, frequent dbus communication to update the progress bar causes the error.

Martin, while I'm away, could you: apply this patch to the Sugar spec file, regenerate the rpm and install it on Martin Oesterreich's laptop. He knows how to reproduce this bug all the time (he does it by downloading TuxPaint-3 from ASLO).

comment:8 Changed 9 years ago by bernie

I haven't seen this bug yet after applying alsroot's patch.

Could #2154 be related to this bug?

comment:9 follow-up: Changed 9 years ago by tomeu

I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.

That said, Aleksey's patch looks like a good one to me.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 9 years ago by bernie

  • Keywords r? added
  • Resolution set to fixed
  • Status changed from new to closed

Replying to tomeu:

I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.

That said, Aleksey's patch looks like a good one to me.

Yeah, I think we could commit alsroot's patch and close this bug as fixed.

comment:11 in reply to: ↑ 10 ; follow-up: Changed 9 years ago by tomeu

Replying to bernie:

Replying to tomeu:

I'm not so convinced of (python-)dbus being a suspect here. Thousands of python instructions could have been executed since that message was printed. And normally when we get that message there's no crash.

That said, Aleksey's patch looks like a good one to me.

Yeah, I think we could commit alsroot's patch and close this bug as fixed.

Sorry, what was the reasoning to think that the warning fixed by Aleksey's patch is the cause of the crash?

comment:12 Changed 9 years ago by tomeu

  • Keywords r+ added; r? removed

+ if (old and len(old)) and (not new and not len(new)):

Why not just "if old and not new"?

+ logging.debug('Activity %s went away', name)

I would add the name of the function to the string so it's easier to grep

Please push with these changes, thanks!

(but don't close this ticket just now)

comment:13 Changed 9 years ago by tomeu

  • Resolution fixed deleted
  • Status changed from closed to reopened

comment:15 in reply to: ↑ 11 Changed 9 years ago by bernie

Replying to tomeu:

Sorry, what was the reasoning to think that the warning fixed by Aleksey's patch is the cause of the crash?

After several days of testing with aleksey's patch applied, this particular bug was not seen any more.

However, I have seen this bug elsewhere, #2154, so it's not clear that it's gone.

comment:16 Changed 9 years ago by bernie

Today I managed to make Sugar crash once by clicking on the "sort by..." icon on the right of the Journal toolbar!

The journal sorting icon is a Dextrose-specific feature, but the last line in the log is the same as this bug:

 update_refs: Assertion `gc->gc.gc_refs != 0' failed.

Note that I was just clicking on the icon that pops up the menu, not on a particular menu item.

comment:17 Changed 9 years ago by bernie

  • Severity changed from Unspecified to Critical

comment:18 Changed 9 years ago by bernie

  • Keywords dextrose added

comment:19 Changed 9 years ago by bernie

Attaching a debugger is hard... with debug info loaded, gdb eats up too much memory on the XO-1. The bug is also very hard to trigger.

comment:20 Changed 9 years ago by seeta

  • Status changed from reopened to assigned

comment:21 Changed 9 years ago by bernie

To help diagnose this, I've added this custom script on the dextrose-2 builds:

cat >"$INSTALL_ROOT/home/olpc/.xsession" <<__EOF__
ulimit -c unlimited
__EOF__

I did not test if this is sufficient to get the core of the sugar process dumped in the user's home. Perhaps we also have to tweak /proc/sys/kernel/core_pattern .

comment:22 Changed 9 years ago by bernie

  • Description modified (diff)
  • Summary changed from Sudden death while clicking around on the journal to Sugar dies with SIGABRT when clicking on icons

comment:23 Changed 9 years ago by bernie

Changed 9 years ago by bernie

Fully detailed backtrace of the crash from rh#640972

comment:24 Changed 9 years ago by bernie

gc->gc.gc_refs really is 0:

(gdb) p generations[0].head
$8 = {gc = {gc_next = 0x1e336f0, gc_prev = 0x1c66770, gc_refs = 0}, dummy = <invalid float value>}

Full coredump available here: http://people.sugarlabs.org/bernie/sugar/sl2064-coredump

Resume it with:

gdb /usr/bin/python sl2064-coredump

comment:25 Changed 9 years ago by alsroot

Was this issue reproduced only on XO, maybe it is free memory lack issue?

comment:26 Changed 9 years ago by tonyforster

May be also #2437

comment:27 Changed 9 years ago by tonyforster

  • Cc tonyforster added

comment:28 Changed 8 years ago by cjl

Possibly resolved by dsd patch for redhat:640972?

comment:29 Changed 6 years ago by dnarvaez

  • Component changed from journal to sugar

comment:30 Changed 6 years ago by godiard

  • Resolution set to obsolete
  • Status changed from assigned to closed

I haven't seen this issue in the last two years.

Closing as obsolete, reopen if needed.

Note: See TracTickets for help on using tickets.