Opened 14 years ago

Closed 14 years ago

Last modified 11 years ago

#2132 closed defect (fixed)

Datastore index corruption — at Version 10

Reported by: bernie Owned by: tomeu
Priority: Urgent Milestone: Unspecified
Component: Sugar Version: Git as of bugdate
Severity: Blocker Keywords: dextrose, r+
Cc: tch, alsroot Distribution/OS: Unspecified
Bug Status: Unconfirmed

Description (last modified by bernie)

Today we figured out one of the possibly many ways in which the index of the datastore can get corrupted in Sugar.

Here's an almost infallible recipe to reproduce it:

  1. open Write
  2. type something
  3. close Write
  4. wait a few seconds
  5. kill -9 the datastore process
  6. restart sugar (ctrl-alt-del)

Your saved entry is gone. It still takes up space on disk, but it's no longer visible until you rebuild the index.

Step 5 is just an artificial way to reproduce the problem. Other equally effective ways to trigger this data loss issue in the real-world include running out of battery, holding the power button for 4 seconds, and triggering the kernel out-of-memory killer.

Change History (12)

comment:1 follow-up: Changed 14 years ago by alsroot

  • Cc alsroot added

The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.

comment:2 in reply to: ↑ 1 Changed 14 years ago by bernie

Replying to alsroot:

The fix that could be useful here was landed by #2095 to ds-0.89.1. It would be useful to test it with this ticket scenario.

This might indeed help a lot... We'll test it today.

However, I feel there are so many more cases of index corruption that we cannot do anything about... Until we fix *all* of them, we need a way to force index rebuild when shit happens.

comment:3 Changed 14 years ago by bernie

Aleksey, your patch seems to work well. However, it exposes the current bug even more, by triggering a datastore reindex every time we reset Sugar shortly after closing an activity.

We figured out that the datastore delays writes to disk by about 60 seconds. This braindead behavior appears to be a *designed* optimization!

Is this in Rarian or in the datastore code itelf?

comment:4 follow-up: Changed 14 years ago by alsroot

It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.

comment:5 in reply to: ↑ 4 Changed 14 years ago by bernie

Replying to alsroot:

It is by design (I attached patch to set index_updated on regular ds close) i.e. if you power-off your box w/o shuting it down, on the first boot, system will check fs since unmount wasn't done. If hard power-off (w/o choosing "Shutdown" in menu) is popular in the field, we can expose flush() dbus method, and sugar will catch power button pressing via ACPI and will call it before XO will be de-energized.

Yes, many kids hold the power button for 4 seconds to shut down their machine. Your patch seems good.

Changed 14 years ago by bernie

Mitigate the bug by reducing the timeout

comment:6 Changed 14 years ago by bernie

  • Keywords r? added

I've attached a patch to reduce the timeout from 60 seconds to 5 seconds.

Along with your patch, it should considerably lower the chance of data loss in the common case in which users immediately perform a hard-shutdown after quitting an activity.

comment:8 Changed 14 years ago by erikos

This one can be closed, right?

comment:9 Changed 14 years ago by alsroot

at least 0001-Set-index_updated-flag-on-ds-shutting-down.patch was commited

http://git.sugarlabs.org/projects/sugar-datastore/repos/mainline/commits/7faf55c9466b13c2e16f6242373a6e9bf5a27f8e

comment:10 Changed 14 years ago by bernie

  • Description modified (diff)
  • Resolution set to fixed
  • Status changed from new to closed

Both patches are committed.

This hopefully closes one of the nastiest data loss bugs that plagued Sugar for several years!

Note: See TracTickets for help on using tickets.