0.4.14 maximum query size #162

Open
Rosadosa opened this issue Feb 2, 2021 · 10 comments

@Rosadosa

Rosadosa commented Feb 2, 2021

We encountered an issue using this package together with impyla to query a Hive database in a Cloudera stack.
When we fired off the query with impyla, it arrived at the cluster and was executed.
However, we did not receive the results and instead got an impala.error.HiveServer2Error: Invalid query handle: error.

We found by trial and error that when we put a LIMIT on our query, we could retrieve data up to a maximum size of about 15 kB.
The issue was resolved by installing thriftpy2 version 0.4.12, so it seems something in 0.4.14 is causing this issue.
We used impyla version 1.16.3.
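
To pin down the size threshold more precisely than by trial and error, one option is to bisect on the LIMIT value. The sketch below is a hypothetical helper, not part of impyla or thriftpy2. `query_succeeds` is an assumed callback that runs the query with a given LIMIT and returns True when results come back, and the sketch assumes that once a LIMIT fails, every larger LIMIT fails too.

```python
def find_max_working_limit(query_succeeds, low, high):
    """Return the largest LIMIT in [low, high] for which query_succeeds(limit) is True.

    Assumes query_succeeds(low) is True and that failures are monotonic:
    once a LIMIT fails, every larger LIMIT fails as well.
    """
    if not query_succeeds(low):
        raise ValueError("query fails even at the lower bound")
    while low < high:
        mid = (low + high + 1) // 2  # round up so the loop always makes progress
        if query_succeeds(mid):
            low = mid
        else:
            high = mid - 1
    return low


if __name__ == "__main__":
    # Stand-in for a real cluster: pretend anything above 137 rows fails.
    fake_threshold = 137
    print(find_max_working_limit(lambda n: n <= fake_threshold, 1, 1000))
```

Against a real cluster, `query_succeeds` would wrap `cursor.execute(...)` and `cursor.fetchall()` and return False when HiveServer2Error is raised.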

@ethe
Member

ethe commented Feb 2, 2021

It may be caused by #158. Could you please try installing the master branch of thriftpy2?

@Rosadosa
Author

Rosadosa commented Feb 2, 2021

Hey ethe,
I installed it with pip install thriftpy2==0.4.14 and that did not work. When I installed thriftpy2==0.4.12, it did.
I am installing the master branch when I do a pip install, right?
Sorry if I misunderstood what you mean.

@ethe
Member

ethe commented Feb 3, 2021

Not 0.4.14. Please install the master branch directly from the source code. The fix for #158 has not been released yet, so it would help if you could quickly verify it.

@Rosadosa
Author

Rosadosa commented Feb 8, 2021

Hey Ethe,

A colleague of mine has tested it with commit 108cca5, and we still get the same error as before.

Here is the stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/tmp.lVmdprplc8/my-code-path/db.py", line 95, in fetchall
    res = as_pandas(cur)
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/util.py", line 63, in as_pandas
    return DataFrame.from_records(cursor.fetchall(), columns=names,
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 535, in fetchall
    return list(self)
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 583, in __next__
    convert_types=self.convert_types)
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 1242, in fetch
    resp = self._rpc('FetchResults', req)
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 994, in _rpc
    err_if_rpc_not_ok(response)
  File "/tmp/tmp.lVmdprplc8/.venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 748, in err_if_rpc_not_ok
    raise HiveServer2Error(resp.status.errorMessage)
impala.error.HiveServer2Error: Invalid query handle: 7b42846c45a5272f:d5d8cd1300000000

@Rosadosa
Author

Rosadosa commented Feb 8, 2021

We also found this report online, which describes the same issue as ours: cloudera/thrift_sasl#28
Maybe it has been related to thrift-sasl all along.

@Maxsparrow

Thanks for pointing this out @Rosadosa. Yes, my issue appears to be the same. Are you also using thrift-sasl? It still isn't clear to me which library is at fault, as the debug stack trace I got (linked in the thrift-sasl issue) calls both of them just before erroring.

I tried thriftpy2==0.4.14, 0.4.12, 0.4.11, and also thriftpy2@master from GitHub, and I still had the same issue with all of them.

The only way I can get it to work is by using thrift-sasl==0.2.1, which uses thriftpy, not thriftpy2, which again means it could be a problem with either of them. Using thrift-sasl==0.2.1 is not a good solution though because thriftpy is deprecated and it doesn't install right on Python 3.7+.
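
Since the outcome seems to depend on the exact combination of package versions, it may help to record them with each test run. The following is a minimal sketch of a hypothetical helper (`installed_versions` is not part of any of these libraries) that reports what is installed in the active environment. It uses `importlib.metadata` where available and falls back to `pkg_resources` on Python 3.7.

```python
def installed_versions(packages=("impyla", "thrift-sasl", "thriftpy2", "thriftpy")):
    """Map each distribution name to its installed version, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7 fallback: setuptools' pkg_resources
        import pkg_resources

        def version(name):
            return pkg_resources.get_distribution(name).version

        PackageNotFoundError = pkg_resources.DistributionNotFound

    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(name, ver if ver is not None else "not installed")
```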

@ethe
Member

ethe commented Feb 9, 2021

Hi @Maxsparrow @Rosadosa, could you please give me a test case to reproduce this issue?

@Maxsparrow

Sure, this is what I run against our kerberized Impala cluster to reproduce the problem:

Python 3.7

pip install impyla thrift-sasl==0.4.2 "kerberos>=1.3.0"
# Results in thriftpy2==0.4.14
  • Get a valid Kerberos ticket
  • Run the query in Python:
from impala.dbapi import connect
conn = connect(host='our-impala-host', auth_mechanism='GSSAPI', timeout=20, use_ssl=False, ca_cert=None, ldap_user=None, ldap_password=None)
cursor = conn.cursor()
cursor.execute("SELECT * FROM any_database.any_table LIMIT 1000")
f = cursor.fetchall()
print(len(f))

Error:

impala.error.HiveServer2Error: Invalid query handle: eb4a0579e37e6ce8:133415b900000000

Full debug stack trace is available in the linked thrift-sasl ticket - do you want me to copy it here?

Using LIMIT 1 instead of LIMIT 1000 works. As @Rosadosa noted, there is a certain size where it stops working (they said 15kB, but I didn't test to find the exact amount).

I've tried a variety of other things like using cursor.fetchone() instead of fetchall(), but I get the same issue. Impala daemon logs make it look like a normal successful query.
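
To see how many rows actually arrive before the error, a diagnostic along these lines might help. It is only a sketch: `count_rows_until_failure` is a hypothetical helper that relies on the standard DB-API `fetchmany()` method, and sqlite3 stands in for the Impala connection so the example runs anywhere. Whether fetching in smaller batches changes the behaviour on the cluster is not established in this thread.

```python
import sqlite3


def count_rows_until_failure(cursor, batch_size=100):
    """Fetch with fetchmany() and return (rows_received, error).

    error is None if the whole result set was read without an exception.
    """
    received = 0
    while True:
        try:
            batch = cursor.fetchmany(batch_size)
        except Exception as exc:  # report whatever the driver raises
            return received, exc
        if not batch:
            return received, None
        received += len(batch)


if __name__ == "__main__":
    # sqlite3 is a stand-in here; with impyla the cursor would come from connect(...).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(250)])
    cur = conn.cursor()
    cur.execute("SELECT * FROM t")
    print(count_rows_until_failure(cur, batch_size=100))
    conn.close()
```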

@ethe
Member

ethe commented Feb 9, 2021

@Maxsparrow Sorry, I am not familiar with samba or thrift-sasl. It would be better to have more exact information showing that this is caused by thriftpy2.

@Maxsparrow

Hi @ethe, from the other ticket, the debug stack trace shows that thriftpy2 throws a timeout:

Attempting to open transport (tries_left=3)
Transport opened
Failed to open transport (tries_left=3)
Traceback (most recent call last):
  File "/opt/venv/lib/python3.7/site-packages/impala/hiveserver2.py", line 1010, in _execute
    return func(request)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/thrift.py", line 219, in _req
    return self._recv(_api)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/thrift.py", line 238, in _recv
    result.read(self._iprot)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/thrift.py", line 160, in read
    iprot.read_struct(self)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 387, in read_struct
    return read_struct(self.trans, obj, self.decode_response)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 316, in read_struct
    read_val(inbuf, f_type, f_container_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 289, in read_val
    read_struct(inbuf, obj, decode_response)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 316, in read_struct
    read_val(inbuf, f_type, f_container_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 289, in read_val
    read_struct(inbuf, obj, decode_response)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 316, in read_struct
    read_val(inbuf, f_type, f_container_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 256, in read_val
    result.append(read_val(inbuf, v_type, v_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 289, in read_val
    read_struct(inbuf, obj, decode_response)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 316, in read_struct
    read_val(inbuf, f_type, f_container_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 289, in read_val
    read_struct(inbuf, obj, decode_response)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 316, in read_struct
    read_val(inbuf, f_type, f_container_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 256, in read_val
    result.append(read_val(inbuf, v_type, v_spec, decode_response))
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/protocol/binary.py", line 229, in read_val
    sz = unpack_i32(inbuf.read(4))
  File "/opt/venv/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 173, in read
    self._read_frame()
  File "/opt/venv/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 177, in _read_frame
    header = self._trans_read_all(4)
  File "/opt/venv/lib/python3.7/site-packages/thrift_sasl/__init__.py", line 198, in _trans_read_all
    return read_all(sz)
  File "/opt/venv/lib/python3.7/site-packages/thriftpy2/transport/socket.py", line 110, in read
    buff = self.sock.recv(sz)
socket.timeout: timed out
Closing transport (tries_left=3)

I don't really know how to read this though. Do you have any ideas as to why it would time out?
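
One way to read the traceback above: thriftpy2's binary protocol is part-way through decoding a nested response and asks the transport for 4 more bytes (`inbuf.read(4)`). thrift_sasl then tries to read the 4-byte header of a new frame (`_trans_read_all(4)`), and the underlying `sock.recv` receives nothing before the socket timeout expires. The sketch below reproduces that pattern with a plain socket pair. It is an illustration only, not thrift_sasl's actual code; `read_exact` and `read_frame` are hypothetical names, and the big-endian length prefix is an assumption.

```python
import socket
import struct


def read_exact(sock, size):
    """Read exactly `size` bytes, looping because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise EOFError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_frame(sock):
    """Read one length-prefixed frame: a 4-byte big-endian size, then the payload."""
    (length,) = struct.unpack("!i", read_exact(sock, 4))
    return read_exact(sock, length)


if __name__ == "__main__":
    server, client = socket.socketpair()
    client.settimeout(0.2)

    # A complete frame arrives, so the read succeeds.
    server.sendall(struct.pack("!i", 5) + b"hello")
    print(read_frame(client))

    # The reader expects another frame, but the peer sends nothing more.
    try:
        read_frame(client)
    except socket.timeout:
        print("timed out waiting for the next frame header")

    server.close()
    client.close()
```

The sketch shows only where the timeout surfaces. It does not explain why the server stops sending data, which is the open question in this thread.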
