#2132 closed defect (fixed)
Datastore index corruption
Reported by: | bernie | Owned by: | tomeu |
---|---|---|---|
Priority: | Urgent | Milestone: | Unspecified |
Component: | Sugar | Version: | Git as of bugdate |
Severity: | Blocker | Keywords: | dextrose, r+ |
Cc: | tch, alsroot | Distribution/OS: | Unspecified |
Bug Status: | Unconfirmed |
Description (last modified by bernie)
Today we figured out one of the possibly many ways in which the index of the datastore can get corrupted in Sugar.
Here's an almost infallible recipe to reproduce it:
- open Write
- type something
- close Write
- wait a few seconds
- kill -9 the datastore process
- restart sugar (ctrl-alt-del)
Your saved entry is gone. It still takes up space on disk, but it's no longer visible until you rebuild the index.
Step 5 is just an artificial way to reproduce the problem. Other equally effective ways to trigger this data loss issue in the real-world include running out of battery, holding the power button for 4 seconds, and triggering the kernel out-of-memory killer.
Attachments (2)
Change History (13)
comment:1 follow-up: ↓ 2 Changed 13 years ago by alsroot
- Cc alsroot added
comment:2 in reply to: ↑ 1 Changed 13 years ago by bernie
Replying to alsroot:
The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.
This might indeed help a lot... We'll test it today.
However, I feel there are so many more cases of index corruption that we cannot do anything about... Until we fix *all* of them, we need a way to force index rebuild when shit happens.
comment:3 Changed 13 years ago by bernie
Aleksey, your patch seems to work well. However, it exposes the current bug even more, by triggering a datastore reindex every time we reset Sugar shortly after closing an activity.
We figured out that the datastore delays writes to disk by about 60 seconds. This braindead behavior appears to be a *designed* optimization!
Is this in Rarian or in the datastore code itelf?
Changed 13 years ago by alsroot
comment:4 follow-up: ↓ 5 Changed 13 years ago by alsroot
It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.
comment:5 in reply to: ↑ 4 Changed 13 years ago by bernie
Replying to alsroot:
It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.
Yes, many kids hold the power button for 4 seconds to shut down their machine. Your patch seems good.
comment:6 Changed 13 years ago by bernie
- Keywords r? added
I've attached a patch to reduce the timeout from 60 seconds to 5 seconds.
Along with your patch, it should considerably lower the chance of data loss in the common case in which users immediately perform a hard-shutdown after quitting an activity.
comment:7 Changed 13 years ago by alsroot
- Keywords r+ added; r? removed
comment:8 Changed 13 years ago by erikos
This one can be closed, right?
comment:9 Changed 13 years ago by alsroot
at least 0001-Set-index_updated-flag-on-ds-shutting-down.patch was commited
comment:10 Changed 13 years ago by bernie
- Description modified (diff)
- Resolution set to fixed
- Status changed from new to closed
Both patches are committed.
This hopefully closes one of the nastiest data loss bugs that plagued Sugar for several years!
comment:11 Changed 10 years ago by dnarvaez
- Component changed from sugar-datastore to Sugar
The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.