Ticket #2132 (closed defect: fixed)

Opened 3 years ago

Last modified 3 years ago

Datastore index corruption

Reported by: bernie Owned by: tomeu
Priority: Urgent Milestone: Unspecified by Release Team
Component: sugar-datastore Version: Git as of bugdate
Severity: Blocker Keywords: dextrose, r+
Cc: tch, alsroot Distribution/OS: Unspecified
Bug Status: Unconfirmed

Description (last modified by bernie) (diff)

Today we figured out one of the possibly many ways in which the index of the datastore can get corrupted in Sugar.

Here's an almost infallible recipe to reproduce it:

1. open Write
2. type something
3. close Write
4. wait a few seconds
5. kill -9 the datastore process
6. restart sugar (ctrl-alt-del)

Your saved entry is gone. It still takes up space on disk, but it's no longer visible until you rebuild the index.

Step 5 is just an artificial way to reproduce the problem. Other equally effective ways to trigger this data loss issue in the real-world include running out of battery, holding the power button for 4 seconds, and triggering the kernel out-of-memory killer.

Attachments

0001-Set-index_updated-flag-on-ds-shutting-down.patch Download (1.0 KB) - added by alsroot 3 years ago.
sl2132-reduce-_FLUSH_TIMEOUT-to-5sec.patch Download (1.0 KB) - added by bernie 3 years ago.
Mitigate the bug by reducing the timeout

Change History

follow-up: ↓ 2   Changed 3 years ago by alsroot

  • cc alsroot added

The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.

in reply to: ↑ 1   Changed 3 years ago by bernie

Replying to alsroot:

The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.

This might indeed help a lot... We'll test it today.

However, I feel there are so many more cases of index corruption that we cannot do anything about... Until we fix *all* of them, we need a way to force index rebuild when shit happens.

  Changed 3 years ago by bernie

Aleksey, your patch seems to work well. However, it exposes the current bug even more, by triggering a datastore reindex every time we reset Sugar shortly after closing an activity.

We figured out that the datastore delays writes to disk by about 60 seconds. This braindead behavior appears to be a *designed* optimization!

Is this in Rarian or in the datastore code itelf?

Changed 3 years ago by alsroot

follow-up: ↓ 5   Changed 3 years ago by alsroot

It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.

in reply to: ↑ 4   Changed 3 years ago by bernie

Replying to alsroot:

It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.

Yes, many kids hold the power button for 4 seconds to shut down their machine. Your patch seems good.

Changed 3 years ago by bernie

Mitigate the bug by reducing the timeout

  Changed 3 years ago by bernie

  • keywords dextrose, r? added; dextrose removed

I've attached a patch to reduce the timeout from 60 seconds to 5 seconds.

Along with your patch, it should considerably lower the chance of data loss in the common case in which users immediately perform a hard-shutdown after quitting an activity.

  Changed 3 years ago by erikos

This one can be closed, right?

  Changed 3 years ago by alsroot

at least 0001-Set-index_updated-flag-on-ds-shutting-down.patch was commited

 http://git.sugarlabs.org/projects/sugar-datastore/repos/mainline/commits/7faf55c9466b13c2e16f6242373a6e9bf5a27f8e

  Changed 3 years ago by bernie

  • status changed from new to closed
  • resolution set to fixed
  • description modified (diff)

Both patches are committed.

This hopefully closes one of the nastiest data loss bugs that plagued Sugar for several years!

Note: See TracTickets for help on using tickets.