#4061 closed defect (fixed)
Global gesture: can crash Sugar under certain circumstances
Reported by: | erikos | Owned by: | garnacho |
---|---|---|---|
Priority: | Immediate | Milestone: | |
Component: | Sugar | Version: | 0.97.x |
Severity: | Blocker | Keywords: | r+ olpc-test-passed |
Cc: | garnacho, dsd | Distribution/OS: | OLPC |
Bug Status: | Assigned |
Description
Seen in: os7
Steps to reproduce:
- start Sugar
- tap on the canvas in the Home View
- tap on the Owner (XO) icon
- tap on the canvas in the Home View
Attachments (2)
Change History (18)
comment:1 Changed 9 years ago by erikos
- Cc garnacho added
comment:2 Changed 9 years ago by dsd
- Cc dsd added
This is not a crash as such - the process exits with code 1 after the final touch is made. No signal is raised.
Nothing obvious in the logs. I suspected that libX11 might be calling exit, and indeed this is the case:
Breakpoint 3, _XIOError (warning: Unable to fetch general register.
warning: Unable to fetch general register.
dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
1601 {
(gdb) bt
#0 _XIOError (dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
#1 0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>) at xcb_io.c:365
#2 0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>) at xcb_io.c:365
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Setting a breakpoint on filter_function in sugar-gesture-grabber shows that this function is not called on the "crash touch" - the process exits before sugar-gesture-grabber is called into.
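The distinction drawn above (exit code 1, no signal) can be checked from a parent process with the POSIX waitpid() status macros. This is an illustrative sketch only: run_and_classify, xlib_style_exit and abort_like_xserver are hypothetical names. xlib_style_exit merely imitates Xlib's default I/O error handler, which reports the error and calls exit(1), while abort_like_xserver imitates the X server path seen in the backtrace in comment:4, where calloc() ends in abort() and SIGABRT.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <limits.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a client whose X connection dies:
 * Xlib's default I/O error handler reports the error and calls exit(1). */
static void xlib_style_exit(void)
{
    exit(1);
}

/* Hypothetical stand-in for the X server crash: glibc detects heap
 * corruption inside calloc() and calls abort(), raising SIGABRT. */
static void abort_like_xserver(void)
{
    abort();
}

/* Run child_fn in a forked child and report how the child terminated:
 * returns the exit code (>= 0) for a normal exit, the negated signal
 * number (< 0) if a signal killed it, or INT_MIN on fork/wait failure. */
static int run_and_classify(void (*child_fn)(void))
{
    assert(child_fn != NULL);
    fflush(NULL);                     /* avoid duplicating buffered stdio output */

    pid_t pid = fork();
    if (pid < 0)
        return INT_MIN;
    if (pid == 0) {
        child_fn();
        _exit(0);                     /* child_fn returned normally */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return INT_MIN;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* e.g. Sugar here: exit code 1, no signal */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);     /* e.g. the X server: killed by SIGABRT */
    return INT_MIN;
}
```

With no SIGABRT handler installed, run_and_classify(xlib_style_exit) returns 1 and run_and_classify(abort_like_xserver) returns -SIGABRT, which is why the Sugar process looked like a clean exit even though the underlying cause was the X server dying.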
comment:3 Changed 9 years ago by garnacho
Apparently we've been looking at things from the wrong perspective... I traced that error to completely legitimate pipe errors caused by X crashing :). From what I can see, it always crashes around libc memory functions (malloc/calloc/free), in random (although touch-related) places.
From my experience this is a memory corruption issue. It would be great to be able to start X under valgrind, although I need to learn 1) how to disable X autorestart and 2) how to start everything manually as the session script would. I'll also try to replicate this on my development machine; it looks like the presence of both a passive touch grab and an active grab triggers it.
comment:4 Changed 9 years ago by dsd
Yes, it's an X crash. I got this trace with debuginfo installed:
(gdb) bt
#0 0xb6a4ee38 in raise () from /lib/libc.so.6
#1 0xb6a50478 in abort () from /lib/libc.so.6
#2 0xb6a8b4e0 in ?? () from /lib/libc.so.6
#3 0xb6a91c08 in ?? () from /lib/libc.so.6
#4 0xb6a94788 in ?? () from /lib/libc.so.6
#5 0xb6a97428 in calloc () from /lib/libc.so.6
#6 0x0013a9b4 in ProcXIQueryPointer (client=0x3961c8) at xiquerypointer.c:153
#7 0x0013011c in ProcIDispatch (client=<optimized out>) at extinit.c:406
#8 0x000388b8 in Dispatch () at dispatch.c:428
#9 0x00028138 in main (argc=8, argv=0x28138 <main+1052>, envp=<optimized out>) at main.c:295
The crash site (calloc), together with the fact that another trace I took was different, points the same way: this must be some kind of memory corruption.
To stop X restarting you can do:
systemctl stop olpc-dm.service
To start X just once via the normal display-manager path (integrating with PAM and all that):
olpc-dm
To run Sugar in a more bare bones environment:
X # on one terminal
DISPLAY=:0 sugar-session # on another terminal
(I can't reproduce the crash under that environment)
Running X under valgrind doesn't seem like it is going to work on ARM. Take a simple test:
bash-4.2# valgrind X
==2638== Memcheck, a memory error detector
==2638== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2638== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2638== Command: X
==2638== disInstr(arm): unhandled instruction: 0xECECA102
cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
==2638== valgrind: Unrecognised instruction at address 0x4018638.
==2638== at 0x4018638: ??? (in /usr/lib/ld-2.16.so)
==2638== Your program just tried to execute an instruction that Valgrind
==2638== did not recognise. There are two possible reasons for this.
==2638== 1. Your program has a bug and erroneously jumped to a non-code
==2638== location. If you are running Memcheck and you just saw a
==2638== warning about a bad jump, it's probably your program's fault.
==2638== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2638== i.e. it's Valgrind's fault. If you think this is the case or
==2638== you are not sure, please let us know and we'll try to fix it.
==2638== Either way, Valgrind will now raise a SIGILL signal which will
==2638== probably kill your program.
Maybe you will have more luck debugging this on your development machine (x86, I assume).
comment:5 Changed 8 years ago by garnacho
I've made some modest progress today: I was able to get valgrind logs of the crashing operation, and there are definitely invalid writes in dix/touch.c and Xi/exevents.c, specifically around the TouchListener array in the TouchPointInfoPtr struct. Possibly something is going out of bounds and overwriting other memory. I'm now compiling an xserver with extra debugging traces.
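The invalid writes described above fit a classic heap-overflow pattern: appending to a fixed-capacity listener array without checking its bounds. The overflowing write itself rarely crashes; it damages neighbouring heap blocks or malloc's bookkeeping, and the abort happens later in some unrelated allocation, which matches the varying crash sites in malloc/calloc/free noted earlier. The following is a minimal sketch of a bounds-checked append, assuming a simple growable array. touch_point, listener_t, touch_add_listener and touch_point_free are hypothetical names, not the X server's TouchPointInfoRec or TouchListener definitions, and this is not the upstream patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical types for illustration only. */
typedef struct {
    int client_id;   /* client receiving events for this touch */
    bool is_grab;    /* true for a (passive or active) grab, false for a selection */
} listener_t;

typedef struct {
    listener_t *listeners;
    size_t num_listeners;
    size_t capacity;
} touch_point;

/* Bounds-checked append: grows the array before writing.
 * Returns false (leaving tp unchanged) if memory cannot be allocated. */
static bool touch_add_listener(touch_point *tp, listener_t l)
{
    if (tp->num_listeners == tp->capacity) {
        size_t new_cap = tp->capacity ? tp->capacity * 2 : 2;
        listener_t *grown = realloc(tp->listeners, new_cap * sizeof *grown);
        if (grown == NULL)
            return false;
        tp->listeners = grown;
        tp->capacity = new_cap;
    }
    /* An unchecked version would perform only the write below; once
     * num_listeners reached capacity it would write past the end of the
     * heap block, corrupting whatever memory follows it. */
    tp->listeners[tp->num_listeners++] = l;
    assert(tp->num_listeners <= tp->capacity);
    return true;
}

static void touch_point_free(touch_point *tp)
{
    free(tp->listeners);
    tp->listeners = NULL;
    tp->num_listeners = 0;
    tp->capacity = 0;
}
```

Growing by doubling keeps appends cheap on average; the important property for this class of bug is simply that every write is preceded by a capacity check, so adding one more listener (for example a grab on top of an existing selection) can never step outside the allocation.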
Changed 8 years ago by garnacho
Excerpt of valgrind logs for reference; all entries occurred during the crashing operation.
comment:6 Changed 8 years ago by garnacho
This patch fixes the memory corruption; it has already been sent to the xorg-devel mailing list for review.
comment:7 Changed 8 years ago by garnacho
comment:8 Changed 8 years ago by erikos
- Keywords r+ added
Patch accepted upstream: http://lists.x.org/archives/xorg-devel/2012-October/034166.html
Will hopefully be available soon.
comment:9 Changed 8 years ago by erikos
Not merged: http://cgit.freedesktop.org/xorg/xserver/ Carlos, please nag them to actually include the fix as well.
comment:10 Changed 8 years ago by dsd
Shipped in https://admin.fedoraproject.org/updates/xorg-x11-server-1.13.0-7.fc18 - just waiting for an ARM build.
Also, Peter H maintains his own input tree and merges regularly with master. So I would not worry about the lack of visibility in master yet, it should happen soon.
comment:11 Changed 8 years ago by erikos
- Owner changed from erikos to garnacho
- Status changed from new to assigned
comment:12 Changed 8 years ago by dsd
- Keywords olpc-test-pending added
Fixed in xorg-x11-server-1.13.0-7.fc18, leaving the ticket open until the patch appears in master.
comment:13 Changed 8 years ago by dsd
- Resolution set to fixed
- Status changed from assigned to closed
This is now in xserver master and proposed for inclusion in 1.13.
comment:14 Changed 8 years ago by greenfeld
- Keywords olpc-test-passed added; olpc-test-pending removed
xorg-x11-server-Xorg-1.13.0-11.fc18 is in 13.1.0 os20 for XO-4, and I could not reproduce this issue.
comment:15 Changed 8 years ago by dnarvaez
- Component changed from sugar-toolkit-gtk3 to Sugar
The Sugar crash does not happen if I disable the global gesture code.