HIVE-27829 : New command "Show Processlist" to display current operations and related details in Hiveserver2 #5319

Open: wants to merge 1 commit into base: master


rtrivedi12
Contributor

What changes were proposed in this pull request?

This PR introduces a new command, "Show Processlist", which displays the operations running on HiveServer2 together with their session and operation details. It is similar to the MySQL SHOW PROCESSLIST statement.

Why are the changes needed?

This command will help troubleshoot issues with HiveServer2, especially stuck queries. It shows the load on a given HS2 instance and helps identify inappropriate connections so they can be terminated.
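To illustrate the stuck-query use case, here is a minimal sketch of how admin tooling might pick RUNNING operations from process-list rows whose elapsed time exceeds a threshold and turn their query IDs into kill statements. The ProcessRow and StuckQueryFinder classes are hypothetical, not part of this PR. The generated text assumes the KILL QUERY '<query id>' syntax available in Hive 3.x, so verify it against your Hive version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, minimal view of one process-list row (not the PR's actual classes).
final class ProcessRow {
    final String queryId;
    final String state;
    final long elapsedSeconds;

    ProcessRow(String queryId, String state, long elapsedSeconds) {
        this.queryId = queryId;
        this.state = state;
        this.elapsedSeconds = elapsedSeconds;
    }
}

final class StuckQueryFinder {
    // Builds one KILL QUERY statement per RUNNING operation whose elapsed time
    // exceeds thresholdSeconds. Assumes Hive 3.x KILL QUERY syntax with a quoted id.
    static List<String> killStatements(List<ProcessRow> rows, long thresholdSeconds) {
        List<String> statements = new ArrayList<>();
        for (ProcessRow row : rows) {
            if ("RUNNING".equals(row.state) && row.elapsedSeconds > thresholdSeconds) {
                statements.add("KILL QUERY '" + row.queryId + "'");
            }
        }
        return statements;
    }
}
```

The statements are returned rather than executed so an operator can review them first. Kill authorization still applies when they are run.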

Does this PR introduce any user-facing change?

Yes. Sample output of the command:


0: jdbc:hive2://localhost:10000> show processlist;
+------------+------------+-------------------+---------------------------------------+--------------------------+------------------------+--------------------------------------------------------------+-----------+-------------------+-------------------+---------------+-----------------+
| User Name  |  Ip Addr   | Execution Engine  |              Session Id               | Session Active Time (s)  | Session Idle Time (s)  |                           Query ID                           |   State   | Opened Timestamp  | Elapsed Time (s)  |  Runtime (s)  |      Query      |
+------------+------------+-------------------+---------------------------------------+--------------------------+------------------------+--------------------------------------------------------------+-----------+-------------------+-------------------+---------------+-----------------+
| hive       | 127.0.0.1  | mr                | 6cdf027f-90e1-48af-b4d8-a11aac71ca1b  | 154                      | 19                     | rtrivedi_20240624193743_fb4b2bf8-a02a-4b76-a92c-4e3ee4bb6e9e | FINISHED  | 1719275863493     | 102               | 83            | show tables     |
| hive       | 127.0.0.1  | mr                | 43945f54-d65c-424d-b523-08e7675d8223  | 165                      | 67                     | rtrivedi_20240624193826_42bff3ed-fb8d-4478-9500-fc6ff2173041 | RUNNING   | 1719275906721     | 59                | Not finished  | show databases  |
+------------+------------+-------------------+---------------------------------------+--------------------------+------------------------+--------------------------------------------------------------+-----------+-------------------+-------------------+---------------+-----------------+
2 rows selected (4.149 seconds)
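The Elapsed Time column in this sample is consistent with elapsed = (current time - Opened Timestamp) / 1000. The two Opened Timestamp values differ by 1719275906721 - 1719275863493 = 43228 ms, and the Elapsed Time values differ by 102 - 59 = 43 s. Under that reading, the two derived time columns could be computed as in the minimal sketch below. The class name, the separate start/end inputs for Runtime, and truncation to whole seconds are assumptions, not the PR's code.

```java
import java.util.OptionalLong;

// Hypothetical helper for the derived time columns; inputs are epoch milliseconds,
// matching the format of the "Opened Timestamp" column in the sample output.
final class ProcessListTimes {
    // Whole seconds between when the operation was opened and "now".
    static long elapsedSeconds(long openedMillis, long nowMillis) {
        return (nowMillis - openedMillis) / 1000;
    }

    // Whole seconds between start and end, or "Not finished" while the
    // operation has no end time yet (assumed semantics of the Runtime column).
    static String runtime(long startMillis, OptionalLong endMillis) {
        return endMillis.isPresent()
                ? Long.toString((endMillis.getAsLong() - startMillis) / 1000)
                : "Not finished";
    }
}
```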

Is the change a dependency upgrade?

No

How was this patch tested?

Manually tested.
Added test class TestHiveCommandOpForProcessList, run with: mvn test -Dtest=TestHiveCommandOpForProcessList

@zhangbutao
Contributor

Is this an administrator-level command? Should ordinary users be able to see this information?
I am afraid this could lead to permission/authorization issues.

@nrg4878
Contributor

nrg4878 commented Jun 25, 2024

@zhangbutao Is your concern that we are showing the "query string", or the query ID as well? I share your concern about the query string. Query cancellations are already authorized, but we should also confirm that unauthorized users aren't able to kill other users' queries. Other than that, I do not see an issue with showing the query IDs. Do you

@zhangbutao
Contributor


@nrg4878 Yes, the main concerns are the "query string" and the "query id". If we already have ways to prevent unauthorized users from seeing the query string and query ID, or from killing queries, then I think the new command is also cool. :)

@rtrivedi12
Contributor Author

@nrg4878 @zhangbutao We authorize the kill query operation against the SERVICE_NAME object. When authorization is disabled, the HiveServer2 process owner should have kill privileges. So I believe it is safe to display the query ID. I agree we should avoid displaying the query string in this command.
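The two points agreed on in this thread can be modeled in a short sketch. The first is a kill check that relies on a privilege over the service object when authorization is enabled and falls back to the HiveServer2 process owner when it is disabled. The second is a column projection that keeps Query ID but drops the query string. This is a hypothetical model, not Hive's authorization API. KillQueryPolicy, representing the service privilege as a set of user names, and treating the process owner as the only permitted user when authorization is off are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical policy model; not Hive's actual authorization classes.
final class KillQueryPolicy {
    // Authorization enabled: the user must hold the (modeled) privilege on the
    // service object. Authorization disabled: assume only the HS2 process owner may kill.
    static boolean canKill(String user, boolean authorizationEnabled,
                           Set<String> usersWithServicePrivilege, String processOwner) {
        if (authorizationEnabled) {
            return usersWithServicePrivilege.contains(user);
        }
        return user.equals(processOwner);
    }

    // Keeps every column (including "Query ID") except the query string column "Query".
    static List<String> visibleColumns(List<String> columns) {
        List<String> visible = new ArrayList<>(columns);
        visible.remove("Query"); // List.remove(Object): drops the first matching element
        return visible;
    }
}
```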
