Ticket #1160 (reopened defect)

Opened 4 years ago

Last modified 3 years ago

no way to wait for index to finish updating/(re)building

Reported by: sascha_silbe Owned by: tomeu
Priority: Unspecified by Maintainer Milestone: 0.90
Component: sugar-datastore Version: Git as of bugdate
Severity: Unspecified Keywords: r!
Cc: Distribution/OS: Unspecified
Bug Status: New

Description

The datastore (re)builds the index in the background (idle loop), but has no API to wait for this to finish. That creates a race condition when using the datastore from a test suite:
The test suite needs to run from a clean directory every time (so it's deterministic), thus causing the datastore to do an index build on startup. The datastore is started automatically by DBus and the tests fire their API requests right away, so any API call that uses the index will fail to work as intended (for regular use).

Attachments

0001-add-API-to-check-wait-for-index-rebuild-to-finish.patch Download (2.4 KB) - added by sascha_silbe 4 years ago.
add API to check and wait for index rebuild to finish

Change History

Changed 4 years ago by sascha_silbe

add API to check and wait for index rebuild to finish

  Changed 4 years ago by sascha_silbe

  • keywords r? added

Implemented the check_ready() method and Ready() signal from my datastore redesign proposal. Also added a wait_ready() method that blocks the caller until the index (re)build has finished. pep8.py/pylint clean.

in reply to: ↑ description   Changed 4 years ago by tomeu

Replying to sascha_silbe:

The test suite needs to run from a clean directory every time (so it's deterministic), thus causing the datastore to do an index build on startup.

But in order to be deterministic, you also need to have the same data and metadata, having the same index is not enough. So, why not just delete the whole DS, then adding entries with Create?

follow-up: ↓ 4   Changed 4 years ago by sascha_silbe

But in order to be deterministic, you also need to have the same data and metadata, having the same index is not enough. So, why not just delete the whole DS, then adding entries with Create?

That's exactly what I'm doing. The test suite creates a temporary directory and starts up a new datastore instance with HOME set to the temporary directory. But since an empty directory doesn't contain an index, the datastore will schedule an index (re)build. I need to wait for the index build to finish, otherwise API calls like get_uniquevalues() or find() won't work properly.

in reply to: ↑ 3   Changed 4 years ago by tomeu

Replying to sascha_silbe:

But in order to be deterministic, you also need to have the same data and metadata, having the same index is not enough. So, why not just delete the whole DS, then adding entries with Create?

That's exactly what I'm doing. The test suite creates a temporary directory and starts up a new datastore instance with HOME set to the temporary directory. But since an empty directory doesn't contain an index, the datastore will schedule an index (re)build. I need to wait for the index build to finish, otherwise API calls like get_uniquevalues() or find() won't work properly.

Right, but I was suggesting creating the index cia create() calls, instead of reindexing the whole directory. Is this so impractical for your use case to warrant new API?

  Changed 4 years ago by tomeu

  • status changed from new to closed
  • resolution set to obsolete

Turned this was actually a bug in the DS because the index was marked as rebuilding even when it didn't existed at startup and there were no entries. Fixed that bug, I understand that this new API is not needed any more.

  Changed 4 years ago by tomeu

  • keywords r? removed

follow-up: ↓ 8   Changed 4 years ago by sascha_silbe

  • keywords r? added
  • status changed from closed to reopened
  • resolution obsolete deleted

Still relevant because if the index gets corrupted (e.g. due to unclean shutdown on a non-dat journalling filesystem) it will be rebuilt in the background and return different results than during normal operation.

The old data store apparently had a similar method called complete_indexing() (though "indexing" meant full-text extraction there). My data store rewrite proposal contains the same API as proposed here with slightly different semantics as well ("regular" API calls are disallowed until the data store has finished recovering, instead of returning bogus results).

Even though there are slight semantic differences their usage by a caller is the same (wait for data store to be ready to server reliable results). As it's currently not used by any caller (complete_indexing() is still in sugar.datastore.datastore, but not in carquinyol.datastore) it's a low-risk change and would ease transition to the version-enabled data store.

in reply to: ↑ 7   Changed 4 years ago by tomeu

Replying to sascha_silbe:

Still relevant because if the index gets corrupted (e.g. due to unclean shutdown on a non-dat journalling filesystem) it will be rebuilt in the background and return different results than during normal operation.

That's fine, but why add API if nobody will use it?

follow-up: ↓ 10   Changed 4 years ago by sascha_silbe

Because I intend for it to get used. Introducing the API now allows for independant, small changes in other code.

in reply to: ↑ 9   Changed 4 years ago by tomeu

  • keywords r! added; r? removed

Replying to sascha_silbe:

Because I intend for it to get used. Introducing the API now allows for independant, small changes in other code.

I really believe we should be putting our energies in issues that are real right now. I also don't think it's a good idea to design APIs when we don't have real needs.

  Changed 4 years ago by sascha_silbe

  • milestone changed from 0.86 to 0.88

Moved to 0.88 as discussed on IRC.

  Changed 3 years ago by erikos

  • milestone changed from 0.88 to 0.90

Any update on this one? Moving out as it is not critical.

  Changed 3 years ago by sascha_silbe

Related: #1546 (migrating large data store causes DBus timeout)

Note: See TracTickets for help on using tickets.