Opened 10 years ago

Closed 10 years ago

Last modified 7 years ago

#1342 closed defect (fixed)

range queries use lexical comparison, not numerical

Reported by: sascha_silbe Owned by: sascha_silbe
Priority: Normal Milestone:
Component: Sugar Version: Git as of bugdate
Severity: Minor Keywords: r+
Cc: Distribution/OS: Unspecified
Bug Status: Assigned

Description

While adding a range query example to the test suite I discovered a bug: Range queries currently do lexical comparisons, not a numerical ones. This means that a range query of 9000..11000 will _not_ match 10000.
For the current code this is a minor issue as only the 'timestamp' property supports range queries and all dates between about mid-2001 and 2033 have exactly ten digits (so for regular Journal entries it shouldn't make any difference).

Attachments (2)

0001-use-xapian.sortable_serialise-for-storing-searching-.patch (2.1 KB) - added by sascha_silbe 10 years ago.
use xapian.sortable_serialise() for storing/searching numeric values
1342.patch (1.8 KB) - added by sascha_silbe 10 years ago.
fix range query for timestamp to do numerical comparison instead of lexical

Download all attachments as: .zip

Change History (17)

Changed 10 years ago by sascha_silbe

use xapian.sortable_serialise() for storing/searching numeric values

comment:1 Changed 10 years ago by tomeu

  • Priority changed from Unspecified by Maintainer to Immediate

comment:2 follow-up: Changed 10 years ago by tomeu

>>> xapian.sortable_serialise('2.0')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: in method 'sortable_serialise', argument 1 of type 'double'

I think we should be able to accept strings in propertiestimestamp?.

comment:3 in reply to: ↑ 2 ; follow-up: Changed 10 years ago by sascha_silbe

Replying to tomeu:

I think we should be able to accept strings in propertiestimestamp?.

For that to work we need to add type information to _QUERY_VALUE_MAP (or special-case timestamp somewhere else in the code) so we know we need to convert the string to an integer first.

I suggest moving this ticket to 0.88.

comment:4 in reply to: ↑ 3 ; follow-up: Changed 10 years ago by tomeu

Replying to sascha_silbe:

Replying to tomeu:

I think we should be able to accept strings in propertiestimestamp?.

For that to work we need to add type information to _QUERY_VALUE_MAP (or special-case timestamp somewhere else in the code) so we know we need to convert the string to an integer first.

Why not just doing this?

document.add_value(_VALUE_TIMESTAMP, xapian.sortable_serialise(float(properties['timestamp'])))

It will fail if people put there non-numeric values, but at least it won't fail if people happen to pass a dbus string instead of a dbus numeric type.

comment:5 in reply to: ↑ 4 ; follow-up: Changed 10 years ago by sascha_silbe

Replying to tomeu:

Why not just doing this?

document.add_value(_VALUE_TIMESTAMP, xapian.sortable_serialise(float(properties['timestamp'])))

Because that's only half of the solution and IMO the less important half. Timestamps are usually generated by the sugar.* framework while saving the entry, I can't think of any reason a (Python) activity would want to do it on their own and then I'd rather have it fail right away if they don't pass a numeric type, especially if passing a string for querying doesn't work.
The other half is querying and this is where is makes an important difference: We need to know we're expecting a numeric type in order to invoke sortable_serialise() before comparison. sortable_serialise() produces binary data, so comparing with a decimal string doesn't make any sense.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 10 years ago by tomeu

Replying to sascha_silbe:

Replying to tomeu:

Why not just doing this?

document.add_value(_VALUE_TIMESTAMP, xapian.sortable_serialise(float(properties['timestamp'])))

Because that's only half of the solution and IMO the less important half.

Well, for me it's important that we don't add more points of failure to activity development.

Timestamps are usually generated by the sugar.* framework while saving the entry, I can't think of any reason a (Python) activity would want to do it on their own and then I'd rather have it fail right away if they don't pass a numeric type, especially if passing a string for querying doesn't work.

I really don't see why you don't think it's good to play a bit safer.

The other half is querying and this is where is makes an important difference: We need to know we're expecting a numeric type in order to invoke sortable_serialise() before comparison. sortable_serialise() produces binary data, so comparing with a decimal string doesn't make any sense.

I'm just saying to accept the timestamp property in both string and numerical dbus types. It of course needs to contain a number, just don't see what we gain by enforcing it to be a numerical dbus type there.

comment:7 in reply to: ↑ 6 ; follow-up: Changed 10 years ago by sascha_silbe

Replying to tomeu:

I really don't see why you don't think it's good to play a bit safer.

I'm mostly arguing against "playing safe" early on (create() / update()) and failing silently later (find()).

comment:8 in reply to: ↑ 7 ; follow-up: Changed 10 years ago by tomeu

Replying to sascha_silbe:

Replying to tomeu:

I really don't see why you don't think it's good to play a bit safer.

I'm mostly arguing against "playing safe" early on (create() / update()) and failing silently later (find()).

Sorry, I'm still not getting it. Why would we fail silently when querying?

comment:9 in reply to: ↑ 8 ; follow-up: Changed 10 years ago by sascha_silbe

Replying to tomeu:

Sorry, I'm still not getting it. Why would we fail silently when querying?

Because timestamp is stored using sortable_serialise() on create()/update(), but if you pass a (decimal) string for timestamp to find() it will use it as-is. Comparing a decimal string to one returned by sortable_serialise() doesn't give any useful result.

comment:10 in reply to: ↑ 9 ; follow-up: Changed 10 years ago by tomeu

Replying to sascha_silbe:

Replying to tomeu:

Sorry, I'm still not getting it. Why would we fail silently when querying?

Because timestamp is stored using sortable_serialise() on create()/update(), but if you pass a (decimal) string for timestamp to find() it will use it as-is. Comparing a decimal string to one returned by sortable_serialise() doesn't give any useful result.

But if we know that timestamp is a serialized decimal number, why not treat it as such in find()?

comment:11 in reply to: ↑ 10 Changed 10 years ago by sascha_silbe

  • Keywords r? removed

Replying to tomeu:

But if we know that timestamp is a serialized decimal number, why not treat it as such in find()?

That's exactly what I said above:

For that to work we need to add type information to _QUERY_VALUE_MAP (or special-case timestamp somewhere else in the code) so we know we need to convert the string to an integer first.

We can special-case timestamp in find(), but since the impact for 0.86 is minimal I'd rather modify the generic code for 0.88 so it works for other properties (that didn't make it into 0.86 but hopefully will for 0.88) as well.

Changed 10 years ago by sascha_silbe

fix range query for timestamp to do numerical comparison instead of lexical

comment:12 Changed 10 years ago by sascha_silbe

  • Keywords r? added
  • Milestone changed from 0.86 to 0.88
  • Priority changed from Immediate to Normal
  • Status changed from new to accepted

New patch, based on #1437.

comment:13 Changed 10 years ago by alsroot

  • Keywords r+ added; r? removed
  • Resolution set to fixed
  • Status changed from accepted to closed

comment:14 Changed 7 years ago by dnarvaez

  • Component changed from sugar-datastore to Sugar

comment:15 Changed 7 years ago by dnarvaez

  • Milestone 0.88 deleted

Milestone 0.88 deleted

Note: See TracTickets for help on using tickets.