#4061 closed defect (fixed)
Global gesture: can crash Sugar under certain circumstances
Reported by: | erikos | Owned by: | garnacho |
---|---|---|---|
Priority: | Immediate | Milestone: | |
Component: | Sugar | Version: | 0.97.x |
Severity: | Blocker | Keywords: | r+ olpc-test-passed |
Cc: | garnacho, dsd | Distribution/OS: | OLPC |
Bug Status: | Assigned |
Description
Seen in: os7
Steps to reproduce:
- start Sugar
- tap on the canvas in the Home View
- tap on the Owner (XO) icon
- tap on the canvas in the Home View
Attachments (2)
Change History (18)
comment:1 Changed 11 years ago by erikos
- Cc garnacho added
comment:2 Changed 11 years ago by dsd
- Cc dsd added
This is not a crash as such - the process exits with code 1 after the final touch is made. No signal is raised.
Nothing obvious in the logs. I suspected that libX11 might be calling exit, and indeed this is the case:
Breakpoint 3, _XIOError (warning: Unable to fetch general register.
warning: Unable to fetch general register.
dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
1601 {
(gdb) bt
#0 _XIOError (dpy=dpy@entry=<unavailable>) at XlibInt.c:1601
#1 0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>) at xcb_io.c:365
#2 0xb5d36258 in _XEventsQueued (dpy=<unavailable>, mode=<optimized out>) at xcb_io.c:365
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Setting a breakpoint on filter_function in sugar-gesture-grabber shows that this function is not called on the "crash touch" - the process exits before sugar-gesture-grabber is called into.
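For reference, the difference between "exits with code 1" and "killed by a signal" can be checked from a parent process. The following is a minimal POSIX sketch, not code from Sugar or libX11: `run_and_classify`, `exits_like_io_error` and `dies_by_signal` are hypothetical names, and the two child functions are only stand-ins for a library calling exit and for an abort.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*child_fn)(void);

/* Hypothetical helper: run fn in a forked child and report how it ended.
 * Returns the exit code (>= 0) after a normal exit, the negated signal
 * number if the child was killed by a signal, or -1000 on fork/wait failure. */
static int run_and_classify(child_fn fn)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1000;
    if (pid == 0) {
        fn();
        _exit(0); /* only reached if fn returns */
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1000;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);   /* normal exit, no signal involved */
    if (WIFSIGNALED(status))
        return -WTERMSIG(status);     /* terminated by a signal */
    return -1000;
}

/* Stand-in for a library terminating the process with exit code 1. */
static void exits_like_io_error(void) { _exit(1); }

/* Stand-in for a real crash: abort() raises SIGABRT. */
static void dies_by_signal(void) { abort(); }
```

Calling `run_and_classify(exits_like_io_error)` reports a plain exit code, while `run_and_classify(dies_by_signal)` reports a negated signal number, which is the distinction described above.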
comment:3 Changed 11 years ago by garnacho
Apparently we've been looking at things from the wrong perspective... I traced that error to completely legitimate pipe errors caused by X crashing :). From what I can see it always crashes around libc memory functions (malloc/calloc/free) in random (although touch-related) places.
In my experience this is a memory corruption issue. It would be great to be able to start X under valgrind, but I first need to learn 1) how to disable X autorestart and 2) how to start everything manually as if running from the session script. I'll try to replicate this on my development machine too; it looks like having both a passive touch grab and an active grab present triggers it.
comment:4 Changed 11 years ago by dsd
Yes, it's an X crash. I got this trace with debuginfo installed:
(gdb) bt
#0 0xb6a4ee38 in raise () from /lib/libc.so.6
#1 0xb6a50478 in abort () from /lib/libc.so.6
#2 0xb6a8b4e0 in ?? () from /lib/libc.so.6
#3 0xb6a91c08 in ?? () from /lib/libc.so.6
#4 0xb6a94788 in ?? () from /lib/libc.so.6
#5 0xb6a97428 in calloc () from /lib/libc.so.6
#6 0x0013a9b4 in ProcXIQueryPointer (client=0x3961c8) at xiquerypointer.c:153
#7 0x0013011c in ProcIDispatch (client=<optimized out>) at extinit.c:406
#8 0x000388b8 in Dispatch () at dispatch.c:428
#9 0x00028138 in main (argc=8, argv=0x28138 <main+1052>, envp=<optimized out>) at main.c:295
The crash site (calloc), plus the fact that the other trace I took is different, agrees with that: this must be some kind of memory corruption.
To stop X restarting you can do:
systemctl stop olpc-dm.service
To start X just once via the normal display-manager path (integrating with PAM and all that):
olpc-dm
To run Sugar in a more bare bones environment:
X # on one terminal
DISPLAY=:0 sugar-session # on another terminal
(I can't reproduce the crash under that environment)
Running X under valgrind doesn't seem like it is going to work on ARM. Take a simple test:
bash-4.2# valgrind X
==2638== Memcheck, a memory error detector
==2638== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==2638== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==2638== Command: X
==2638==
disInstr(arm): unhandled instruction: 0xECECA102
cond=14(0xE) 27:20=206(0xCE) 4:4=0 3:0=2(0x2)
==2638== valgrind: Unrecognised instruction at address 0x4018638.
==2638== at 0x4018638: ??? (in /usr/lib/ld-2.16.so)
==2638== Your program just tried to execute an instruction that Valgrind
==2638== did not recognise. There are two possible reasons for this.
==2638== 1. Your program has a bug and erroneously jumped to a non-code
==2638== location. If you are running Memcheck and you just saw a
==2638== warning about a bad jump, it's probably your program's fault.
==2638== 2. The instruction is legitimate but Valgrind doesn't handle it,
==2638== i.e. it's Valgrind's fault. If you think this is the case or
==2638== you are not sure, please let us know and we'll try to fix it.
==2638== Either way, Valgrind will now raise a SIGILL signal which will
==2638== probably kill your program.
Maybe you will have more luck debugging this on your development machine (x86, I assume).
comment:5 Changed 11 years ago by garnacho
I've made some mild progress throughout the day: I was able to get valgrind logs of the crashing operation, and there are certainly invalid writes in dix/touch.c and Xi/exevents.c, concretely around the TouchListener array in the TouchPointInfoPtr struct. Possibly something is going out of bounds and stepping over other memory. I'm compiling an xserver with extra debugging traces.
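To illustrate the kind of bug suspected here, the sketch below shows a bounds-checked append to a dynamically sized listener array. It is a generic illustration only: `DemoListener`, `DemoTouchPoint`, `demo_add_listener` and `demo_free` are hypothetical names, not the xserver's real definitions, and this is not the patch attached to this ticket. Writing to a slot without the capacity check would be an invalid write of the sort valgrind reports, and could corrupt allocator metadata so that a later malloc/calloc/free aborts.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the xserver's real structs. */
typedef struct {
    unsigned int listener_id;
} DemoListener;

typedef struct {
    DemoListener *listeners;  /* heap array of listeners */
    size_t num_listeners;     /* slots currently in use */
    size_t capacity;          /* slots currently allocated */
} DemoTouchPoint;

/* Append a listener, growing the array before writing past its end.
 * Returns 1 on success, 0 if the allocation failed (array left unchanged). */
static int demo_add_listener(DemoTouchPoint *tp, unsigned int id)
{
    assert(tp != NULL);
    if (tp->num_listeners == tp->capacity) {
        size_t new_cap = tp->capacity ? tp->capacity * 2 : 2;
        DemoListener *grown = realloc(tp->listeners, new_cap * sizeof(*grown));
        if (grown == NULL)
            return 0;
        tp->listeners = grown;
        tp->capacity = new_cap;
    }
    /* The index is now guaranteed to be inside the allocation. */
    tp->listeners[tp->num_listeners++].listener_id = id;
    return 1;
}

/* Release the array and reset the bookkeeping fields. */
static void demo_free(DemoTouchPoint *tp)
{
    free(tp->listeners);
    memset(tp, 0, sizeof(*tp));
}
```

A zero-initialised `DemoTouchPoint` can be filled by repeated calls to `demo_add_listener`; the point of the sketch is only that the count is compared against the allocated size before every write.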
Changed 11 years ago by garnacho
excerpt of valgrind logs for reference, all entries happen during the crashing operation
comment:6 Changed 11 years ago by garnacho
This patch fixes the memory corruption. It has already been sent to the xorg-devel mailing list for review.
comment:7 Changed 11 years ago by garnacho
comment:8 Changed 11 years ago by erikos
- Keywords r+ added
Patch accepted upstream: http://lists.x.org/archives/xorg-devel/2012-October/034166.html
Will hopefully be available soon.
comment:9 Changed 11 years ago by erikos
Not merged yet: http://cgit.freedesktop.org/xorg/xserver/ Carlos, please nag them to actually include the fix.
comment:10 Changed 11 years ago by dsd
Shipped in https://admin.fedoraproject.org/updates/xorg-x11-server-1.13.0-7.fc18 - just waiting for an ARM build.
Also, Peter H maintains his own input tree and merges regularly with master. So I would not worry about the lack of visibility in master yet, it should happen soon.
comment:11 Changed 11 years ago by erikos
- Owner changed from erikos to garnacho
- Status changed from new to assigned
comment:12 Changed 11 years ago by dsd
- Keywords olpc-test-pending added
Fixed in xorg-x11-server-1.13.0-7.fc18, leaving the ticket open until the patch appears in master.
comment:13 Changed 11 years ago by dsd
- Resolution set to fixed
- Status changed from assigned to closed
This is now in xserver master and proposed for inclusion in 1.13.
comment:14 Changed 10 years ago by greenfeld
- Keywords olpc-test-passed added; olpc-test-pending removed
xorg-x11-server-Xorg-1.13.0-11.fc18 is in 13.1.0 os20 for XO-4, and I could not reproduce this issue.
comment:15 Changed 10 years ago by dnarvaez
- Component changed from sugar-toolkit-gtk3 to Sugar
The Sugar crash does not happen if I disable the global gesture code.